External war factor analysis with SCCS - using the *.dta SCCS file
From InterSciWiki
Obsolete
- Right-click to download the data and R routines into the same R folder, as a single file named sccs.RData (data and R programs), then open it from your hard drive.
Review Factor analysis in R as part of this exercise, along with the External War variable lists for factors.
We use advanced open-source routines in R that do not handle text (alphanumeric) values themselves. These routines expect you to preprocess the data into numeric columns or data matrices, using "na.exclude" to drop cases with missing data.
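A minimal sketch of that preprocessing step, using a small made-up data frame. The object names `toy`, `complete` and `toy_mat` and all the values are illustrative assumptions, not SCCS data:

```r
# Toy data frame with missing values (made-up values, not SCCS data)
toy <- data.frame(a = c(1, 2, NA, 4, 5),
                  b = c(2, NA, 3, 4, 1),
                  c = c(5, 4, 3, 2, 1))

# na.exclude() drops every row that has at least one NA
complete <- na.exclude(toy)
nrow(complete)   # number of complete cases

# Convert to a numeric matrix for routines such as cor() or factanal()
toy_mat <- data.matrix(complete)
is.numeric(toy_mat)
```

Only rows that are complete on every selected column survive, so the usable N shrinks as more variables are added to the data frame.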
Start
help.start()
library(foreign) #the input dta file is in Stata format, which R reads through the foreign package; it was made from Spss
getwd() #see your working directory name: you might want to set it to
setwd("C:/Program Files/R/R-2.6.2/") #this for PC
#setwd("/Users/doug/Desktop") #this for MacBook (use instead of the PC line)
#See: http://web.csb.ias.edu/library/foreign/html/read.dta.html
#sccs<-read.dta("SCCSvar1-2008NoMap.dta") #if this doesn't work THEN:
#http://intersci.ss.uci.edu/wiki/pub/SCCSvar1-2008NoMap.dta Right click & save to your R working directory, then repeat the command above
sccs<-read.dta("http://intersci.ss.uci.edu/wiki/pub/SCCSvar1-2008NoMapStata8.dta") #the first time, download by right clicking the url and saving to your working directory
sccs<-read.dta("SCCSvar1-2008NoMapStata8.dta")
attach(sccs)
plot(v891,v893,xlab="Int War",ylab="Ext War-Attacked") #(test whether data has been read)
length(v891) # check that length is 186 as for SCCS
library(gmodels)
CrossTable(v891,v893,expected=TRUE,prop.chisq=TRUE,fisher=TRUE,dnn=c("v891 Int War","v893 External War:Attacked")) #delete these options if not needed
names(sccs)
length(sccs) #number of variables
Start
We start with alphanumeric variables, found in the <SCCS variable codebook>. After selecting variables, e.g. from the list below, the first task is to convert them to numeric. --> Example
SCCS factors for analytic study:
- v893 FREQUENCY OF EXTERNAL WAR - BEING ATTACKED
- v894 FORM OF MILITARY MOBILIZATION
- v892 FREQUENCY OF EXTERNAL WAR - ATTACKING N=134 p=.002
- v900 MILITARY EXPECTATIONS II-STATE N=128 p=.002
- v895 DECISION TO ENGAGE IN WAR N=134 p=.001
- v896 COMMENCEMENT OF WAR N=113 p=.24
- v897 CONCLUSION OF WAR N=113 p=.27
- v891 FREQUENCY OF INTERNAL WAR
- v774 (Low)R External Warfare
- v783 (Un)Acceptability of violence toward people in other societies
- v780 Hostility toward other societies
- v775+ Compliance of individuals w/ community norms (see SCCS_test_of_hypotheses#Sample_table for a very interesting cross-tabulation result for this variable with external war)
- v903 (Low) PRESTIGE ASSOCIATED WITH BEING A SOLDIER OR WARRIOR
- sccs<-read.spss(source("http://intersci.ss.uci.edu/wiki/pub/SCCSvar1-2008Map.sav")) #OUGHT TO WORK BUT DOESN'T (source() runs R code, so it should not wrap a data file)
- sccs<-read.spss("sccs_R/SCCSvar1-2008Map.sav") #OUGHT TO WORK BUT DOESN'T
- library(foreign)
- sccs<-read.spss("SCCSvar1-2008NoMap.sav") #New test
- RINNER Heinrich (http://r-help.com/msg/53107.html) says the problem is Spss 15.0; see http://wiki.math.yorku.ca/index.php/R:_Data_conversion_from_SPSS
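The conversion from alphanumeric to numeric mentioned at the top of this section can be sketched as follows. The labels and the object names `war`, `war_num` and `codes` are hypothetical and are not actual SCCS codes:

```r
# Hypothetical categorical variable with text labels (not actual SCCS codes)
war <- factor(c("Frequent", "Absent", "Occasional", NA, "Frequent"),
              levels = c("Absent", "Occasional", "Frequent"))

# as.numeric() on a factor returns the integer position of each level
war_num <- as.numeric(war)
war_num                            # 3 1 2 NA 3

# If the labels are themselves numbers stored as text, go through as.character()
codes <- factor(c("10", "20", "10"))
as.numeric(codes)                  # level positions: 1 2 1
as.numeric(as.character(codes))    # the labelled values: 10 20 10
```

Because as.numeric() returns level positions, any correlation computed afterwards depends on the order of the levels, so it is worth checking levels() before converting.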
More
The goal is to find some combination of variables (initially categorical), put them in a data frame that excludes every case with missing data, define x1, x2, ... as numeric variables, bind some subset of these into a matrix mat, correlate the matrix, and finally run factanal (factor analysis) on mat with the number of factors set to 1 to test the single-factor model. The result includes a significance test of the null hypothesis that one factor is sufficient.
#cols<-na.exclude(data.frame(v774,v903,v783,v780,v775,v891,v893,v894,v896,v892)) #categorical
#cols<-na.exclude(data.frame(v774,v783,v780,v775)) #categorical
#cols<-na.exclude(data.frame(v892,v893,v894,v891)) #categorical N=130 p=.78
#cols<-na.exclude(data.frame(v892,v893,v894,v896)) #categorical N=134 p=.24
#cols<-na.exclude(data.frame(v892,v893,v894,v897)) #categorical N=134 p=.27
#cols<-na.exclude(data.frame(v892,v893,v894,v896)) #categorical N=134 p=.27
#cols<-na.exclude(data.frame(v892,v893,v894)) #categorical N=134 p=.002
#cols<-na.exclude(data.frame(v892,v893,v894,v900)) #categorical N=128 p=.002 2 factors is too many for the 4 variables
#cols<-na.exclude(data.frame(v892,v893,v894,v895,v900)) #categorical N=113 p=.002
x1=as.numeric(v892)
x2=as.numeric(v893)
x3=as.numeric(v894)
x4=as.numeric(v895)
x5=as.numeric(v900) #these are numeric with NA, 186 cases
z5=as.numeric(x5)
cols<-na.exclude(data.frame(x1,x2,x3,x4,x5)) #reverts to categorical; right number of cases
x1=as.numeric(cols[,1])
x2=as.numeric(cols[,2])
x3=as.numeric(cols[,3])
x4=as.numeric(cols[,4])
x5=as.numeric(cols[,5])
length(x1);length(x5) #check matching lengths
mat <- cbind(x1,x2,x3,x4,x5) #this binds them in the proper alignment; checked against Spss
[103,] 1 1 2 2 2
[104,] 2 2 1 1 1
[105,] 2 2 2 1 2
[106,] 2 2 1 1 2
[107,] 1 1 2 1 2
[108,] 2 3 1 1 2
[109,] 2 2 2 2 2
[110,] 2 2 2 1 2
[111,] 1 1 2 1 2
[112,] 2 2 2 1 2
[113,] 3 3 2 2 2
factanal(mat,factors=1,rotation="none")
#See: http://tolstoy.newcastle.edu.au/R/help/04/12/8237.html
cor(mat)
x=edit(mat) #view data in editor
y=edit(cols) #view data in editor
2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, NA, 1, NA, NA, 1, 1, NA, 2,
#IN THIS METHOD THIS IS THE CORRECT MATRIX - THESE ARE CORRELATIONS AMONG LABELS !!
m1 #CORRELATION AFTER DELETING ALL CASES WITH MISSING VALUES
          x1          x2          x3        x4        x5
x1 1.0000000  0.59761795  0.10095934 0.2960275 0.1224148
x2 0.5976179  1.00000000 -0.05542798 0.1183924 0.1205201
x3 0.1009593 -0.05542798  1.00000000 0.2224718 0.2633117
x4 0.2960275  0.11839240  0.22247177 1.0000000 0.2491139
x5 0.1224148  0.12052008  0.26331167 0.2491139 1.0000000
#doesn't match Spss, however.
Call:
factanal(x = mat, factors = 1, rotation = "none")
Uniquenesses:
x1 x2 x3 x4 x5
0.005 0.641 0.990 0.912 0.985
Loadings:
Factor1
x1 0.997
x2 0.599
x3 0.101
x4 0.297
x5 0.123
Factor1
SS loadings 1.467
Proportion Var 0.293
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 19.47 on 5 degrees of freedom.
The p-value is 0.00157
Call:
factanal(x = mat, factors = 2, rotation = "none")
Uniquenesses:
x1 x2 x3 x4 x5
0.005 0.631 0.710 0.747 0.766
Loadings:
Factor1 Factor2
x1 0.997
x2 0.599 -0.101
x3 0.101 0.528 <-- higher on 2
x4 0.297 0.406 <-- higher on 2
x5 0.124 0.468 <-- higher on 2
Factor1 Factor2
SS loadings 1.468 0.673
Proportion Var 0.294 0.135
Cumulative Var 0.294 0.428
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.31 on 1 degree of freedom.
The p-value is 0.0689
Call:
factanal(x = mat, factors = 2, rotation = "varimax")
Uniquenesses:
x1 x2 x3 x4 x5
0.005 0.631 0.710 0.747 0.766
Loadings:
Factor1 Factor2
x1 0.978 0.196
x2 0.607
x3 0.538
x4 0.211 0.457
x5 0.483
Factor1 Factor2
SS loadings 1.371 0.770
Proportion Var 0.274 0.154
Cumulative Var 0.274 0.428
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.31 on 1 degree of freedom.
The p-value is 0.0689
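The degrees of freedom and p-values in these tests can be checked by hand. The sketch below uses the standard degrees-of-freedom count for a k-factor model on p variables, ((p - k)^2 - (p + k))/2, and the upper tail of the chi-square distribution. The helper name `factor_dof` is an assumption made for illustration and is not part of factanal's interface:

```r
# Degrees of freedom for a k-factor model fitted to p variables
factor_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2

factor_dof(5, 1)   # 5
factor_dof(5, 2)   # 1

# Upper-tail chi-square probabilities for the statistics printed above
p_one <- pchisq(19.47, df = factor_dof(5, 1), lower.tail = FALSE)
p_two <- pchisq(3.31,  df = factor_dof(5, 2), lower.tail = FALSE)
round(c(one_factor = p_one, two_factors = p_two), 4)
```

Read this way, the one-factor model is rejected at the .05 level, while the two-factor model is not, although its p-value of 0.0689 is only just above that threshold.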
skip
Is the thing to do in R to compute means and substitute them for NA values? Or to do pairwise correlations and, for each missing value, use the probability of i|j and j|i for each pair of variables to get probabilistic values, average them across the k variables that are not missing, and then substitute these averages for the missing values? Or to make a data frame for each pair of variables, compute the correlation, and then rebuild the m x m correlation table?
# introduce missing values?
x[sample(1:10,3),1] <- NA
x[sample(1:10,3),2] <- NA
x[sample(1:10,3),3] <- NA
mat <- cbind(x1,x2,x3,x4,x5) #,x6
cor(mat,use="pairwise.complete.obs") #THIS DOES CORRELATION (the use= argument belongs to cor(), not cbind())
#replace missing data with random number
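The three options raised above (listwise deletion, pairwise correlations, mean substitution) can be compared side by side. This is only a sketch on made-up data: the names `x`, `r_listwise`, `r_pairwise`, `x_mean` and `r_mean` are assumptions, and the values are random rather than SCCS codes:

```r
set.seed(1)  # reproducible made-up data, not SCCS values
n <- 50
x <- matrix(sample(1:3, n * 4, replace = TRUE), ncol = 4,
            dimnames = list(NULL, paste0("x", 1:4)))

# Introduce a few missing values in the first three columns
for (j in 1:3) x[sample(n, 5), j] <- NA

# Option 1: listwise deletion (the same cases that na.exclude() keeps)
r_listwise <- cor(x[complete.cases(x), ])

# Option 2: pairwise deletion, each correlation uses every case
# that is complete for that particular pair of variables
r_pairwise <- cor(x, use = "pairwise.complete.obs")

# Option 3: replace each NA with its column mean, then correlate
x_mean <- apply(x, 2, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
r_mean <- cor(x_mean)

# Largest disagreement between the pairwise and listwise estimates
round(max(abs(r_pairwise - r_listwise)), 3)
```

Mean substitution keeps every case but tends to pull correlations toward zero, while pairwise deletion keeps the most information per cell but bases different cells on different subsets of cases.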
More
Convert alphanumeric to numeric. NONE OF THIS WORKS YET --
- cols<-as.numeric(data.frame(v893,v894,v892,v900,v895)) #categorical N=134 p=.001 2-factor p=.06 #fails: as.numeric() does not accept a data frame
cols<-data.frame(v893,v894,v892,v900,v895) #categorical N=134 p=.001 2-factor p=.06
#colsnum=as.numeric(m1) #cols)
#colsnum=numeric(m1) #cols)
#c<-cols[1:186,1:5]
#cc=numeric(c)
V891=v891[1:186] #keeps alpha but eliminates missing data
# cor(c, use = "pairwise.complete.obs", method = c("pearson")) #, "kendall", "spearman"))
factor analysis with pairwise correlations
http://www.ma.hw.ac.uk/ams/Rhelp/library/stats/html/cor.html - R Documentation - Numeric Vectors
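x1=(cols[,1])
x2=(cols[,3])
y=(cols[,2])
cor(as.numeric(x1),as.numeric(x2), use = "pairwise.complete.obs", method = "pearson") #or "kendall", "spearman"
(The original call, cor(x1,x2,y = NULL, ...), was not working: cor() takes x and y as its first two arguments, so x2 and y = NULL cannot both be supplied, and the inputs must be numeric.)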
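One way to get from pairwise correlations to a factor analysis is to pass the correlation matrix to factanal() through its covmat argument, together with n.obs so that the chi-square test can still be computed. The sketch below uses made-up data with a single common factor. The names `dat`, `r_pair`, `n_pair` and `fit` are assumptions, and taking the smallest pairwise N as n.obs is a conservative choice made here, not something the source prescribes:

```r
set.seed(2)  # made-up data with one common factor, not SCCS values
n <- 200
f <- rnorm(n)
dat <- sapply(1:5, function(j) 0.7 * f + rnorm(n, sd = 0.7))
colnames(dat) <- paste0("x", 1:5)

# Knock out 50 of the 1000 entries at random
dat[sample(length(dat), 50)] <- NA

# Pairwise correlation matrix
r_pair <- cor(dat, use = "pairwise.complete.obs")

# Number of cases behind each correlation; use the smallest as n.obs
ok <- 1 - is.na(dat)            # 1 where observed, 0 where missing
n_pair <- min(crossprod(ok))

# Factor analysis from the correlation matrix instead of the raw data
fit <- factanal(covmat = r_pair, factors = 1, n.obs = n_pair)
fit$loadings
```

A pairwise correlation matrix is not guaranteed to be positive definite, so with heavier missingness factanal() may fail or give unstable results; factor scores are also unavailable when only a matrix is supplied.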
more
length(x1)
length(m1)
http://www.personality-project.org/r/html/count.pairwise.html
http://dataninja.wordpress.com/2006/02/18/basic-factor-analysis-in-r/ (also sells help), or http://rss.acs.unt.edu/Rdoc/library/stats/html/factanal.html, e.g.
> factors = factanal(cols,factors,scores=c("regression"),rotation="varimax")
where “cols” is our dataframe containing the appropriate variables, with no missing values, and “factors” is the number of factors to be extracted.
scores="…" and rotation="…" are optional, and varimax is the default rotation.
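A runnable version of that call, sketched on made-up data with two underlying factors. The column names a1-a3 and b1-b3, the object `fit` and the choice of two factors are assumptions for illustration only:

```r
set.seed(3)  # made-up data, not SCCS values
n <- 150
f1 <- rnorm(n)
f2 <- rnorm(n)
cols <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6),
  a2 = f1 + rnorm(n, sd = 0.6),
  a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6),
  b2 = f2 + rnorm(n, sd = 0.6),
  b3 = f2 + rnorm(n, sd = 0.6)
)

# Two factors, varimax rotation, regression-method factor scores
fit <- factanal(cols, factors = 2, scores = "regression", rotation = "varimax")

print(fit$loadings, cutoff = 0.3)  # hide small loadings for readability
head(fit$scores)                   # one row of factor scores per case
```

Factor scores are only returned when the raw data are supplied, which is one reason to build a complete-case data frame first.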
If you are doing factor analysis using the Standard Cross-Cultural Sample from any of the problem sets, consult the Index of Variables for a current factor list.
