External war factor analysis with SCCS - using the *.dta SCCS file

From InterSciWiki


Obsolete

Right-click to download the data and R routines into the same R folder. The file is named sccs.RData (data and R programs); once saved, open it from your hard drive.

As part of this exercise, review Factor analysis in R and the External War variable lists for factors.

We use advanced open-source routines in R that do not bother with intricate handling of text. These routines expect you to preprocess the data into numeric columns or data matrices, using "na.exclude" to exclude cases with missing data.
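As a minimal sketch of that preprocessing step, the following uses a small made-up data frame (the names toy, complete and num, and all values, are hypothetical and are not SCCS data). It drops incomplete cases with na.exclude and then converts the categorical columns to numeric codes.

```r
# Toy data frame standing in for a few categorical columns (values are made up)
toy <- data.frame(
  a = factor(c("low", "high", NA, "high", "low", "mid")),
  b = factor(c("no", "yes", "yes", NA, "no", "yes"))
)

# Drop every case (row) that has a missing value in any column
complete <- na.exclude(toy)

# as.numeric() on a factor returns the integer level codes, not the label text;
# by default the levels are sorted alphabetically ("high"=1, "low"=2, "mid"=3)
num <- data.frame(lapply(complete, as.numeric))

nrow(complete)  # number of complete cases
num             # numeric codes, one row per complete case
```

Note that the codes follow the alphabetical order of the labels, which is not necessarily the substantive order of the categories, so it is worth checking levels() before correlating.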

Start

help.start()
library(foreign) #the input .dta file is in Stata format, which the foreign package reads; it was made from Spss
getwd() #see your working directory name; you might want to set it with one of:
#setwd("C:/Program Files/R/R-2.6.2/") #this for PC
#setwd("/Users/doug/Desktop") #this for MacBook
#See: http://web.csb.ias.edu/library/foreign/html/read.dta.html
#sccs<-read.dta("SCCSvar1-2008NoMap.dta") #if this doesn't work THEN:
#http://intersci.ss.uci.edu/wiki/pub/SCCSvar1-2008NoMap.dta Right click & save to your R working directory, then repeat the command above
sccs<-read.dta("http://intersci.ss.uci.edu/wiki/pub/SCCSvar1-2008NoMapStata8.dta") #reads from the url; or, the first time, right click the url, save to your working directory, and then use:
#sccs<-read.dta("SCCSvar1-2008NoMapStata8.dta")
attach(sccs)
plot(v891,v893,xlab="Int War",ylab="Ext War-Attacked") #(test whether data has been read)
length(v891) # check that length is 186 as for SCCS
library(gmodels)
CrossTable(v891,v893,expected=TRUE,prop.chisq=TRUE,fisher=TRUE,dnn=c("v891 Int War","v893 External War:Attacked")) #delete these options if not needed
names(sccs)
length(sccs) #number of variables

Start

We start with alphanumeric variables, found in the <SCCS variable codebook>. After selecting variables, e.g. from the list that follows, the first task is to convert them to numeric. --> Example

SCCS factors for analytic study

  1. v893 FREQUENCY OF EXTERNAL WAR - BEING ATTACKED
  2. v894 FORM OF MILITARY MOBILIZATION
  3. v892 FREQUENCY OF EXTERNAL WAR - ATTACKING N=134 p=.002
  4. v900 MILITARY EXPECTATIONS II-STATE N=128 p=.002
  5. v895 DECISION TO ENGAGE IN WAR N=134 p=.001
  1. v896 COMMENCEMENT OF WAR N=113 p=.24
  2. v897 CONCLUSION OF WAR N=113 p=.27
  3. v891 FREQUENCY OF INTERNAL WAR
  4. v774 (Low)R External Warfare
  5. v783 //Un//Acceptability of violence toward people in other societies
  6. v780 Hostility toward other societies
  7. v775+ Compliance of individuals w/ community norms (see SCCS_test_of_hypotheses#Sample_table for a very interesting cross-tabulation result for this variable with external war)
  8. v903 //Low// PRESTIGE ASSOCIATED WITH BEING A SOLDIER OR WARRIOR

Earlier attempts to read the Spss (.sav) file directly:

#sccs<-read.spss(source("http://intersci.ss.uci.edu/wiki/pub/SCCSvar1-2008Map.sav")) #OUGHT TO WORK BUT DOESN'T
#sccs<-read.spss("sccs_R/SCCSvar1-2008Map.sav") #OUGHT TO WORK BUT DOESN'T
#library(foreign)
#sccs<-read.spss("SCCSvar1-2008NoMap.sav") #New test
#RINNER Heinrich http://r-help.com/msg/53107.html says the problem is Spss 15.0, see http://wiki.math.yorku.ca/index.php/R:_Data_conversion_from_SPSS

More

The goal is to find some combination of variables, initially categorical, and to proceed in steps: put them in a data frame that excludes every case with any missing data; define x1, x2, ... as numeric variables; bind some subset of these into a matrix mat; correlate the matrix; and finally factor analyze (factanal) the matrix with the number of factors set to 1 to test the single factor model. The results give a significance test of the null hypothesis that one factor is sufficient.

#cols<-na.exclude(data.frame(v774,v903,v783,v780,v775,v891,v893,v894,v896,v892)) #categorical 
#cols<-na.exclude(data.frame(v774,v783,v780,v775)) #categorical 
#cols<-na.exclude(data.frame(v892,v893,v894,v891)) #categorical N=130 p=.78
#cols<-na.exclude(data.frame(v892,v893,v894,v896)) #categorical N=134 p=.24
#cols<-na.exclude(data.frame(v892,v893,v894,v897)) #categorical N=134 p=.27
#cols<-na.exclude(data.frame(v892,v893,v894,v896)) #categorical N=134 p=.27
#cols<-na.exclude(data.frame(v892,v893,v894)) #categorical N=134 p=.002
#cols<-na.exclude(data.frame(v892,v893,v894,v900)) #categorical N=128 p=.002 2 factors is too many for the 4 variables
#cols<-na.exclude(data.frame(v892,v893,v894,v895,v900)) #categorical N=113 p=.002 
x1=as.numeric(v892)
x2=as.numeric(v893)
x3=as.numeric(v894)
x4=as.numeric(v895)
x5=as.numeric(v900) #these are numeric with NA, 186 cases
z5=as.numeric(x5)
cols<-na.exclude(data.frame(x1,x2,x3,x4,x5)) # drops cases with any NA; gives the right number of cases
x1=as.numeric(cols[,1])
x2=as.numeric(cols[,2]) 
x3=as.numeric(cols[,3]) 
x4=as.numeric(cols[,4]) 
x5=as.numeric(cols[,5]) 
length(x1);length(x5) #check matching lengths
mat <- cbind(x1,x2,x3,x4,x5) #this binds them in the proper alignment, and have checked this against Spss
mat #print the matrix; the last rows are shown
       x1 x2 x3 x4 x5
[103,]  1  1  2  2  2
[104,]  2  2  1  1  1
[105,]  2  2  2  1  2
[106,]  2  2  1  1  2
[107,]  1  1  2  1  2
[108,]  2  3  1  1  2
[109,]  2  2  2  2  2
[110,]  2  2  2  1  2
[111,]  1  1  2  1  2
[112,]  2  2  2  1  2
[113,]  3  3  2  2  2
factanal(mat,factors=1,rotation="none")
#See: http://tolstoy.newcastle.edu.au/R/help/04/12/8237.html
m1 <- cor(mat) #correlation matrix of the complete cases
x=edit(mat) #view data in editor
y=edit(cols) #view data in editor
#IN THIS METHOD THIS IS THE CORRECT MATRIX - THESE ARE CORRELATIONS AMONG LABELS !!
m1 #CORRELATION AFTER DELETING ALL CASES WITH MISSING VALUES
           x1          x2        x3        x4        x5
x1 1.0000000  0.59761795  0.10095934 0.2960275 0.1224148
x2 0.5976179  1.00000000 -0.05542798 0.1183924 0.1205201
x3 0.1009593 -0.05542798  1.00000000 0.2224718 0.2633117
x4 0.2960275  0.11839240  0.22247177 1.0000000 0.2491139
x5 0.1224148  0.12052008  0.26331167 0.2491139 1.0000000
#doesn't match Spss however.
Call:
factanal(x = mat, factors = 1, rotation = "none")
Uniquenesses:
  x1    x2    x3    x4    x5 
0.005 0.641 0.990 0.912 0.985 
Loadings:
  Factor1
x1 0.997  
x2 0.599  
x3 0.101  
x4 0.297  
x5 0.123  
              Factor1
SS loadings      1.467
Proportion Var   0.293
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 19.47 on 5 degrees of freedom.
The p-value is 0.00157 
Call:
factanal(x = mat, factors = 2, rotation = "none")
Uniquenesses:
  x1    x2    x3    x4    x5 
0.005 0.631 0.710 0.747 0.766 

Loadings:

  Factor1 Factor2
x1  0.997         
x2  0.599  -0.101 
x3  0.101   0.528 <-- higher on 2
x4  0.297   0.406 <-- higher on 2
x5  0.124   0.468 <-- higher on 2
              Factor1 Factor2
SS loadings      1.468   0.673
Proportion Var   0.294   0.135
Cumulative Var   0.294   0.428
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.31 on 1 degree of freedom.
The p-value is 0.0689
Call:
factanal(x = mat, factors = 2, rotation = "varimax")
Uniquenesses:
  x1    x2    x3    x4    x5 
0.005 0.631 0.710 0.747 0.766 
Loadings:
  Factor1 Factor2
x1  0.978   0.196 
x2  0.607         
x3          0.538 
x4  0.211   0.457 
x5          0.483 
              Factor1 Factor2
SS loadings      1.371   0.770
Proportion Var   0.274   0.154
Cumulative Var   0.274   0.428
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.31 on 1 degree of freedom.
The p-value is 0.0689
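
The workflow above can be sketched end to end on simulated data. This is only an illustration under stated assumptions: the data frame sim, the latent variable f and every loading value are made up, and as.matrix stands in for the as.numeric plus cbind steps because the simulated columns are already numeric.

```r
set.seed(1)
n <- 200
f <- rnorm(n)                      # one hypothetical latent factor

# Five simulated indicators of the same factor (loadings are made up)
sim <- data.frame(
  x1 = 0.8 * f + rnorm(n, sd = 0.6),
  x2 = 0.7 * f + rnorm(n, sd = 0.7),
  x3 = 0.6 * f + rnorm(n, sd = 0.8),
  x4 = 0.5 * f + rnorm(n, sd = 0.9),
  x5 = 0.4 * f + rnorm(n, sd = 0.9)
)
sim[sample(n, 20), "x5"] <- NA     # make 20 cases incomplete

cols_sim <- na.exclude(sim)        # listwise deletion of incomplete cases
mat_sim  <- as.matrix(cols_sim)    # numeric matrix, rows stay aligned

round(cor(mat_sim), 2)             # correlations among the complete cases

fit <- factanal(mat_sim, factors = 1, rotation = "none")
fit$dof    # degrees of freedom of the test (5 for five variables, one factor)
fit$PVAL   # p-value for the hypothesis that one factor is sufficient
```

A small p-value counts against the single factor model, which is how the 0.00157 in the one-factor output above is read; the two-factor runs are not rejected at the .05 level (p = 0.0689).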

skip

Is the thing to do in R to compute means and substitute them for NA values? Or to do pairwise correlations: for each missing value, use the probability of i|j and j|i for each pair of variables to get probabilistic values, average them across the k variables that are not missing, and then substitute these averages for the missing values? Or to make a data frame for each pair of variables, compute the correlation, and then rebuild the m x m correlation table?
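
Two of these options are easy to compare on made-up data. The sketch below assumes a hypothetical data frame d with random values; it shows mean substitution and then shows that rebuilding the correlation table one pair at a time gives the same result as cor with use="pairwise.complete.obs". The probabilistic substitution in the second question is not attempted here.

```r
set.seed(2)
d <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))  # made-up data
d$a[c(3, 7)] <- NA
d$b[c(7, 12, 20)] <- NA

# Option 1: substitute each column's mean for its NA values
d_mean <- data.frame(lapply(d, function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}))

# Option 3, done by cor(): each correlation uses the cases complete for that pair
r_pair <- cor(d, use = "pairwise.complete.obs")

# Option 3, done by hand: one pair of variables at a time, then rebuild the table
vars <- names(d)
r_hand <- diag(length(vars))
dimnames(r_hand) <- list(vars, vars)
for (i in vars) for (j in vars) {
  ok <- complete.cases(d[, c(i, j)])   # cases where both variables are present
  r_hand[i, j] <- cor(d[ok, i], d[ok, j])
}

all.equal(r_pair, r_hand)  # the two pairwise tables agree
```

Mean substitution keeps all cases but shrinks variances and correlations toward zero, so the two approaches will generally give different correlation tables.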

# introduce missing values?
x[sample(1:10,3),1] <- NA
x[sample(1:10,3),2] <- NA
x[sample(1:10,3),3] <- NA

mat <- cbind(x1,x2,x3,x4,x5) #,x6
cor(mat,use="pairwise.complete.obs") #THIS DOES CORRELATION
#replace missing data with random number

More

Convert alphanumeric to numeric. NONE OF THIS WORKS YET:

#cols<-as.numeric(data.frame(v893,v894,v892,v900,v895)) #categorical N=134 p=.001 2-factor p=.06
cols<-data.frame(v893,v894,v892,v900,v895) #categorical N=134 p=.001 2-factor p=.06
#colsnum=as.numeric(m1) #cols)
#colsnum=numeric(m1) #cols)
#c<-cols[1:186,1:5]
#cc=numeric(c)
V891=v891[1:186] #keeps alpha but eliminates missing data
# cor(c, use = "pairwise.complete.obs", method = c("pearson")) #, "kendall", "spearman"))

factor analysis with pairwise correlations

http://www.ma.hw.ac.uk/ams/Rhelp/library/stats/html/cor.html - R Documentation - Numeric Vectors

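x1=as.numeric(cols[,1]) #convert the categorical labels to numeric codes
x2=as.numeric(cols[,3])
y=as.numeric(cols[,2])
cor(cbind(x1,x2,y), use = "pairwise.complete.obs", method = "pearson") #or "kendall", "spearman"

(the original call, cor(x1,x2,y = NULL, ...), was not working: cor takes one matrix or two vectors, so the three columns are bound into a matrix first)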
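
One way to factor analyze pairwise correlations is to pass the correlation matrix to factanal through its covmat argument together with n.obs. The sketch below is an assumption-laden illustration, not a tested SCCS recipe: the data frame sim is simulated, and using the smallest pairwise count as n.obs is a conservative choice made here for the example.

```r
set.seed(3)
n <- 300
f <- rnorm(n)                       # one hypothetical latent factor
sim <- data.frame(                  # made-up indicators of that factor
  x1 = 0.8 * f + rnorm(n, sd = 0.6),
  x2 = 0.7 * f + rnorm(n, sd = 0.7),
  x3 = 0.6 * f + rnorm(n, sd = 0.8),
  x4 = 0.5 * f + rnorm(n, sd = 0.9),
  x5 = 0.4 * f + rnorm(n, sd = 0.9)
)
for (v in names(sim)) sim[sample(n, 15), v] <- NA   # scatter missing values

# Pairwise correlations: each entry uses the cases present for that pair
r_pair <- cor(sim, use = "pairwise.complete.obs")

# Number of cases available for each pair of variables
obs <- ifelse(is.na(as.matrix(sim)), 0, 1)
n_pair <- crossprod(obs)

# Factor analysis from the correlation matrix instead of the raw data
fit <- factanal(covmat = r_pair, factors = 1, n.obs = min(n_pair))
fit$PVAL   # test that one factor is sufficient, based on n.obs
```

Because each correlation comes from a different subset of cases, a pairwise matrix is not guaranteed to be positive definite, and the significance test depends on the n.obs chosen, so the result should be read with caution. Factor scores are not available this way, since factanal has no raw data to score.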

more

length(x1)

length(m1)

http://www.personality-project.org/r/html/count.pairwise.html

http://dataninja.wordpress.com/2006/02/18/basic-factor-analysis-in-r/ (also sells help), or http://rss.acs.unt.edu/Rdoc/library/stats/html/factanal.html, e.g.

> factors = factanal(cols,factors,scores=c("regression"),rotation="varimax")

where "cols" is our dataframe containing the appropriate variables, with no missing values, and "factors" is the number of factors to be extracted.

scores="..." and rotation="..." are optional, and varimax is the default rotation.
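
A minimal sketch of such a call with factor scores, on simulated data: the data frame cols_sim, the latent variables f1 and f2, and all loading values are hypothetical and only stand in for a real data frame with no missing values.

```r
set.seed(4)
n <- 300
f1 <- rnorm(n)   # first hypothetical latent factor
f2 <- rnorm(n)   # second hypothetical latent factor

# Six made-up variables: a1-a3 load on f1, b1-b3 load on f2
cols_sim <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  a3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.7 * f2 + rnorm(n, sd = 0.7),
  b3 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

fit <- factanal(cols_sim, factors = 2, scores = "regression", rotation = "varimax")

print(fit$loadings, cutoff = 0.3)  # hide small loadings for readability
head(fit$scores)                   # one row of factor scores per case
```

Scores require the raw data, so they are only returned when factanal is given a data frame or matrix rather than a correlation matrix.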

If you are doing factor analysis using the Standard Cross-Cultural Sample from any of the problem sets, consult the Index of Variables for a current factor list.

Back to A289

A289 required readings
