Manual for Dataset Construction and Causal Analysis

From InterSciWiki


http://capone.mtsu.edu/eaeff/Dow-Eff%20functions.html <-- manual site

RData and RDa read - SIL language classification

- Foragers under construction
We have distBin339.Rdata and Binford47vars339.RData; still needed: continent/country and, eventually, Language. That requires three steps:
Import Language data from ETHNOATLAS for 196 societies
Use closest neighbors (distance matrix) to rank-order the societies most likely to share a language
Pick the top-ranked guess.
EduFor1 - test run for the Binford data
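The neighbor-ranking guess in the three steps above can be sketched in base R; D, lang, and guess_language are illustrative names, not objects from the dataset:

```r
# Guess each missing language from the nearest society whose language is coded.
# D is a distance matrix (0 on the diagonal); lang has NA for unknown languages.
guess_language <- function(D, lang) {
  for (i in which(is.na(lang))) {
    ord <- order(D[i, ])                      # neighbors ranked by distance
    ord <- ord[!is.na(lang[ord]) & ord != i]  # keep coded societies only
    lang[i] <- lang[ord[1]]                   # take the top-ranked guess
  }
  lang
}
D <- matrix(c(0, 1, 5,
              1, 0, 4,
              5, 4, 0), 3, 3)
out <- guess_language(D, c("Khoe", NA, "Bantu"))
out  # the uncoded second society inherits "Khoe" from its nearest neighbor
```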

Chapter 1: Create the elements of the database

Rectangular database: cases in rows, variables in columns
Distance matrix
Phylogenetic matrix
Auxiliary database needed for imputing missing data

Create the rectangular database

  • Begin with a rectangular database in Excel
  • Save as *.csv that can be read in R (comma separated variables)
  • Make a codebook, matching simple variable names, e.g., v1, v2, to the columns.
  • Copy the *.csv file, rename it, and replace the column labels with v1, v2 ... and other short labels.
Calling variables, e.g., sccs$v49, is facilitated by short variable names.
  • The EduMod lab at UCI allows writing only to setwd("C:/My Documents/(subdirectories for databases)")
This is where on-line projects can be run off the mediawiki site
Hence, for compatibility, create a "C:/My Documents/(subdirectory)"
Copy the two *.csv files and the codebook to that subdirectory
  • Create the R file
setwd("C:/My Documents/YourSubDir")
YourRfile <- read.csv("YourDataFile.csv")
save(YourRfile,file="YourRfile.RData")
  • Restructure the distance matrix
  • Format: 0 on the diagonal, close pairs have high values (proximities), rows normalized to sum to 1.

Create the great-circle distances matrix with the spdep package

p340: "The simplest function in this case is an exponentially decaying distance function such as dij^-a, where dij is the distance from society i to society j and a is a suitable exponent."
p340: "White et al. (1981) and Dow (1984) measured linguistic distance using a count of the number of steps between societies on a hypothesized phylogenetic tree."
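A minimal base-R sketch of the decaying-distance weights described in the first quote, with the exponent a as a parameter (decay_w and the toy distances are illustrative, not part of the dataset):

```r
# W_ij proportional to d_ij^-a, zero diagonal, rows normalized to sum to 1
decay_w <- function(d, a = 2) {
  w <- 1/(d + 1e-6)^a   # small offset guards against division by zero
  diag(w) <- 0
  w/rowSums(w)
}
d <- matrix(c(0, 100, 200,
              100, 0, 150,
              200, 150, 0), 3, 3)
W <- decay_w(d)
rowSums(W)  # all 1
```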

Raw SCCS Distance Matrix 4-2014

setwd('/Users/drwhite/Documents')
library(spdep)
#http://cran.r-project.org/web/packages/Imap/Imap.pdf
library(Imap)
#Function signatures from the Imap manual:
#gdist(lon.1, lat.1, lon.2, lat.2, units = "nm", a = 6378137.0, b = 6356752.3142, verbose = FALSE)
#gdist.total(longlat, units = "nm", segments = TRUE, digits = 2)
#http://cran.r-project.org/web/packages/geosphere/geosphere.pdf
#http://stat.ethz.ch/R-manual/R-patched/library/stats/html/dist.html
#http://www.biostat.umn.edu/~sudiptob/Software/distonearth.R
ww<-read.csv("2colBin186.csv")
ww<-SpatialPoints(ww[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
#DISTANCES: For SplitsTree4
ww<-read.csv("2colBin186.csv")
ww<-SpatialPoints(ww[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
ww<-round(ww,0)
#ww<-1/(ww+.000001)^2  #Normalizes to row sum=1 for W
diag(ww)<-0
write.csv(ww, file = "SCCSdistances.csv")
#SCCSdistances&Names.csv  names added later
# FINAL: peoples186distance.csv  with spaces (->_) and parents eliminated
# ALSO: peoples186distance.xls  peoples186splits.nex   peoples186.nex    at FETCH mediawiki-1.20..w   0SplitsTree
#PROXIMITIES
diag(ww)<-1
proximities<-10000/ww
diag(proximities)<-0
#diagonalize(ww)  #Diagonalize in R?
write.csv(proximities, file = "SCCSproximities.csv") #, header=TRUE) #With names added:
#Society Names   V1  V2  V3
#Nama Hottentot   0  11   7
#Kung Bushmen    11   0   7
#Thonga           7   7   0
#triangularize: how to convert a symmetric matrix to the lower triangle of a matrix
#Row-normalize only when building a W matrix (not for the SplitsTree files above):
ww<-ww/rowSums(ww)
distBin<-ww
The following function computes the distance on the surface of the earth between two points, point1 and point2, each given as (Longitude, Latitude):

geodetic.distance <- function(point1, point2)
{
  R <- 6371                  #earth radius in km
  p1rad <- point1 * pi/180   #degrees to radians
  p2rad <- point2 * pi/180
  d <- sin(p1rad[2])*sin(p2rad[2])+cos(p1rad[2])*cos(p2rad[2])*cos(abs(p1rad[1]-p2rad[1]))
  d <- acos(d)
  R*d                        #distance in km
}
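A quick sanity check of the function (definition repeated so the snippet runs on its own); the London-to-Paris coordinates below are illustrative:

```r
geodetic.distance <- function(point1, point2) {  # as defined above
  R <- 6371                  # earth radius in km
  p1rad <- point1 * pi/180
  p2rad <- point2 * pi/180
  d <- sin(p1rad[2])*sin(p2rad[2]) +
       cos(p1rad[2])*cos(p2rad[2])*cos(abs(p1rad[1]-p2rad[1]))
  R * acos(d)
}
dd <- geodetic.distance(c(0, 51.5), c(2.35, 48.86))  # London to Paris
dd  # roughly 340 km
```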

Courtesy of Anthon Eff: Row Normalized distance matrix

setwd("C:/My Documents/Binford") 
setwd('/Users/drwhite/Documents/3sls/0Bin/')
#Courtesy of Anthon Eff:
library(spdep)
ww<-read.csv("2colBin.csv")
ww<-SpatialPoints(ww[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
ww<-1/(ww+.000001)^2
diag(ww)<-0
ww<-ww/rowSums(ww)
distBin<-ww
###save(distBin,file="distBin339.Rdata")
   #write.dta needs library(foreign) and a data frame, not a matrix:
   #write.dta(as.data.frame(distBin),file="distBin339.dta")
#or if you want a smaller file (3 decimals)
ww <- round(ww[1:339,1:339],3)
distBin<-ww
save(distBin,file="distBin2.Rdata")
Whenever you want to use the matrix, you can call it in this way:
load("distBin.Rdata",.GlobalEnv)
work on ecological W matrix

Goal: Binford's LRB$et and LRB$pet measure the amount of energy in sunlight through local temperature and precipitation, and are highly correlated. Networks linking localities that are more similar energetically (etDiff) and closer in distance (xdd) may be expected to be an important instrumental variable for controlling autocorrelation effects. The problem: the second for loop throws a multiplication error.

setwd('/Users/drwhite/Documents/3sls/0Bin/')  #Rdata
load("xdd.Rdata",.GlobalEnv) #the autocorrelation matrix for distances
load("LRB.Rdata",.GlobalEnv)
et<-LRB$et #the Effective temperature variable
m <- 339; n <- 339
etDiff  <- array(list(),c(m,n))
for (i in 1:339) {
 etDiff [i,i]<-17
    for (j in 1:339) {
 etDiff [i,j]<-  (17 - abs (et[i]-et[j] ))/17  #these are similarities, i.e., large when differences are small, with higher autocorrelation effects.
 etDiff [j,i]<-  (17 - abs (et[i]-et[j] ))/17  #these are similarities, i.e., large when differences are small, with higher autocorrelation effects.
    }
}
diag(etDiff)<-0
etDiff[23:30,23:30]
xdd[23:30,23:30]
etDist  <- array(list(),c(m,n))
for (i in 1:339) {
    for (j in 1:339) {
 etDist [i,j]<- xdd[i,j] * etDiff[i,j]  #these are similarities, i.e., large when differences are small, with higher autocorrelation effects.
 etDist [j,i]<- xdd[j,i] * etDiff[j,i]  #these are similarities, i.e., large when differences are small, with higher autocorrelation effects.
    }
}
etDiff[23:30,23:30]
Error in xdd[i, j] * etDiff[i, j] :  non-numeric argument to binary operator
#Cause: array(list(),c(m,n)) builds a matrix of lists, so etDiff[i,j] is a one-element
#list rather than a number; class(etDiff) and class(xdd) both print "matrix", hiding the difference.
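One hedged fix for the error above: building the similarity matrix with outer() on a plain numeric vector keeps everything numeric, with no list-matrix involved. et and xdd below are small toy stand-ins for the document's objects:

```r
et  <- c(10, 12, 17)                    # toy effective temperatures
xdd <- matrix(1, 3, 3); diag(xdd) <- 0  # toy distance-weight matrix
etDiff <- (17 - abs(outer(et, et, "-")))/17  # similarity: large when et's are close
diag(etDiff) <- 0
etDist <- xdd * etDiff                  # elementwise product now succeeds
```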
plots: right and wrong
The order of cases in Binford (2001:60-67) is that of successive sets of noncontiguous societies with a shared type of ecology, internally sorted from high to low ET (et: Effective Temperature, measured in degrees centigrade) for 9 sets with id numbers 1-28, 35-55, 60-79, 82-137, 143-189, 190-234, 240-260, 268-299, 315-390. ET is a calculated variable derived from MWM and MCM, the warmest and coldest month variables: see 2001:59, formula 4.01. cor(LRB$et,LRB$pet) = 0.9630294. See graphs p.81.
PET (Binford 2001:86-93, similar to that of ET, as shown on a separate graph), is potential evapotranspiration (2001:60-67): "Whereas AE is the amount of water, measured in millimeters (DRW: from weather station data), that is actually transpired and evaporated from the soil and the plant community during a given period of time, potential evapotranspiration (or PET) is the amount of potential water loss to the soil and the plant community, measured in millimeters, assuming that the only limiting factor is the availability of solar energy" (2001:75). See the following paragraph for the difference between AE and PET. Further definitions follow over several pages. cor(LRB$et,LRB$pet) = 0.9630294. See graphs p.81.


THESE TWO PLOTS HAVE THE RIGHT lati variables

  • THOSE TWO BELOW HAVE THE WRONG lati variables


old code
 #2colBin.csv has only 2 cols: lat, lon, and headers
setwd('/Users/drwhite/Documents/3sls/0Bin/')  #Rdata
     load("vaux.Rdata",.GlobalEnv)
     write.csv(vaux,"vaux2.csv")
     vaux2<-read.csv("vaux2.csv", header=TRUE)
     save(vaux2,file='vaux2.Rdata')
library(spdep)
ww<-read.csv("2colBin.csv") #two columns: latitude, longitude (header=TRUE is the default)
xx<-ww
ww<-SpatialPoints(ww[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
yy<-ww
ww<-1/(ww+.000001)^2                        #these are proximities, i.e., large when close, small when distances are large
diag(ww)<-0
ww<-ww/rowSums(ww)
ww[1:3,1:3]
ww[1:3,337:339]
ww[337:339,337:339]
ww<-ww/rowSums(ww)                #rowSums(ww) all 1
load("LRB.Rdata",.GlobalEnv)
et<-LRB$et
m <- 339; n <- 339
etDiff  <- array(list(),c(m,n))
for (i in 1:339) {
    for (j in 1:339) {
 etDiff [i,j]=.0000001+(.0000001+1/(.000001+(abs(et[i]-et[j])))) * (.0000001+ww[i,j])  #these are proximities, i.e., large when differences are small, small when differences are large
 etDiff [j,i]=.0000001+(.0000001+1/(.000001+(abs(et[i]-et[j])))) * (.0000001+ww[i,j])
    }
}
etD<-etDiff
diag(etD)<-0
dim(etD)  # ok to here  #[1] 339 339
class(etD)
etD<-etD/rowSums(etD)   #eliminated NANS and inf
dim(etD)  # ok to here  #[1] 339 339
class(etD)
############
#"numeric"
etD<-etDiff  #as.matrix(etDiff)
class(etD)
#[1] "matrix"
Old<-etD
#class(etD)
#at this point, save as *.csv, read as *.Rdata 
diag(etD)<-0
etD<-1000*Old
etD<-etD/rowSums(etD)  # will then work
etD[1:3,1:3]
etD[337:339,1:3]
etD[337:339,337:339]
rowSums(etD) #not working
    write.csv(etD,"etD.csv")
#now clear col 1
    etD<-read.csv("etD.csv", header=TRUE)
    #etD[,1]<-0
    save(etD,file='etD.Rdata')
    load("etD.Rdata",.GlobalEnv)
rowSums(etD)
ETD<-etD/rowSums(etD)
ETD<-ETD/rowSums(ETD)


rowSums(etD)   #THERE ARE NaNs
etD<-etD/rowSums(etD)  # will then work WATCH OUT FOR COLUMN 1
    write.csv(etD,"etD.csv")
    etD<-read.csv("etD.csv", header=TRUE)
    save(etD,file='etD.Rdata')
    load("etD.Rdata",.GlobalEnv)

Now copy labels left col, top row from xee.Rdata

    etD<-read.csv("etD.csv", header=TRUE)
    save(etD,file='etD.Rdata')
    load("etD.Rdata",.GlobalEnv)

    load("xdd.Rdata",.GlobalEnv)
    write.csv(xdd,"xdd.csv")
#Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
#  'x' must be numeric
class(rowSums(etD))

Try typing 'class(etDiff)' to see what kind of object it is. If it is not a matrix, you can convert it to one with as.matrix(etDiff).
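A small illustration of that advice; the catch is that a list-built matrix already prints "matrix" as its class, so is.numeric() is the sharper test, and unlist() (rather than as.matrix()) is what repairs it:

```r
bad <- array(list(1, 2, 3, 4), c(2, 2))  # a matrix whose cells are lists
class(bad)         # still reports "matrix"
is.numeric(bad)    # FALSE -- this is why arithmetic fails
fixed <- matrix(unlist(bad), 2, 2)       # rebuild as a numeric matrix
is.numeric(fixed)  # TRUE
```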

#round(R,2)
 junk
ww<-read.csv("2colBin.csv")
ww<-SpatialPoints(ww[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
ww<-1/(ww+.000001)^2
diag(ww)<-0
ww<-ww/rowSums(ww)
dim(ww)
#distBin<-round(ww[1:339,1:339],3)
#save(distBin,file="distBin.Rdata")
To look at the matrix
dim(ww)
round(ww[1:11,1:11],3)
ww <- round(ww[1:339,1:339],3)


distBin339 <- round(ww[1:339,1:339],3)
distBin339[1:11,1:11]

Create a continent/country matrix

setwd("C:/My Documents/Binford")
library(spdep)
input<-read.csv("regnMatBin339.csv") #you probably want to drop
 #rows that are all missing values
source("proximity.R")
W<-proximity(input,4,1,-1) #column 4 holds the continent;
 #score 1 if two societies share a continent, -1 otherwise
dim(W)
round(W[1:339,1:5],3)
W=W+1
W<-1/(W+.000001)^2
diag(W)<-0
W<-W/rowSums(W)
round(W[1:339,1:5],3)
save(W,file="regionMatBin339.Rdata")
save(W,file="regBin.Rdata")
#========
#Code for proximity.R, stored at "C:/My Documents/Binford"
#========
#You should create a new .R file and paste the following code into it:
proximity <- function(data,class_idx,same_score,diff_score) {
size<-dim(data)[1]
M <- matrix(0,size,size)
for (x in 1:size) {
 continent_x<-c(data[x,class_idx])
 for (y in 1:size) {
   continent_y<-c(data[y,class_idx])
   if (continent_x == continent_y) {
     M[x,y] <- same_score
   } else {
     M[x,y] <- diff_score
   }
 }
}
M  #return the similarity matrix
}
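A corrected, self-contained version of proximity() with a toy input (the data frame and scores are illustrative); note that the function must end by returning M:

```r
proximity <- function(data, class_idx, same_score, diff_score) {
  size <- dim(data)[1]
  M <- matrix(0, size, size)
  for (x in 1:size) {
    for (y in 1:size) {
      M[x, y] <- if (data[x, class_idx] == data[y, class_idx])
        same_score else diff_score
    }
  }
  M  # return the similarity matrix
}
input <- data.frame(society   = c("a", "b", "c"),
                    continent = c("Africa", "Africa", "Asia"))
W <- proximity(input, 2, 1, -1)
W  # 1 for same-continent pairs (including the diagonal), -1 otherwise
```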

Create the language distances matrix with the spdep package

  • Eff, E. Anthon. 2008. Weight Matrices for Cultural Proximity: Deriving Weights from a Language Phylogeny. Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(2), Article 9. http://repositories.cdlib.org/imbs/socdyn/sdeas/vol3/iss2/art9

  • For the Binford forager data, a language phylogeny will be difficult or impossible to construct.
  • Marcus Hamilton: "Unfortunately I don't think there are any language codes in the Binford data, and not sure how easy they would be to add as I'm not sure how well these language families are understood. We have run into the same problem as at various stages we would have liked to have controlled for phylogeny, but the best we could do was to introduce "continent" and "region" into a glm, and look for those effects, which end up basically controlling for colonization effects and spatial proximity."

So here is what we do:

setwd("C:/My Documents/Binford")
library(spdep)
regnBin<-read.csv("regnMatBin339.csv")
save(regnBin,file="regnMatBin339.Rdata")
  • For each pair of societies, read the continent column of regnMatBin339 and
add 10 to the similarity matrix ww if the two are from the same continent.
  • For each pair of societies, read the country column of regnMatBin339 (100 or so countries) and
add 5 to ww if the two are from the same country.
save(ww,file="regnWwBin339.Rdata")
ww<-ww/rowSums(ww)
dim(ww)
distBin<-round(ww[1:339,1:339],3)
save(distBin,file="distBin.Rdata")
To look at the matrix
dim(ww)
round(ww[1:11,1:11],3)
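The two additive steps above can be sketched with outer(); regn below is a toy stand-in for regnMatBin339, and the column names are illustrative:

```r
regn <- data.frame(continent = c("N.America", "N.America", "S.America", "S.America"),
                   country   = c("US", "CA", "BR", "BR"))
ww <- outer(regn$continent, regn$continent, "==") * 10 +  # +10 if same continent
      outer(regn$country,   regn$country,   "==") * 5     # +5 if same country
diag(ww) <- 0
ww <- ww/rowSums(ww)  # row-normalize as above
```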


ww<-SpatialPoints(ww[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
ww<-1/(ww+.000001)^2
diag(ww)<-0
ww<-ww/rowSums(ww)
distBin<-ww
save(distBin,file="distBin339.Rdata")

Create a fake language matrix from Distance with random error

Random numbers in R: rnorm
R: matrices
setwd("C:/My Documents/Binford") 
library(spdep)
ww<-read.csv("2colBin.csv")
v1 <-rnorm(339)/5
v2 <-rnorm(339)/5
matRnd<-cbind(v1[1:339],v2[1:339]) ##two columns of random jitter, matching the two
 #geocoordinate columns of ww, added to ww in the next line
Fake<-matRnd+ww
ww<-SpatialPoints(Fake[,c(2,1)])
ww<-spDists(coordinates(ww),longlat=TRUE)
ww<-1/(ww+.000001)^2
diag(ww)<-0
ww<-ww/rowSums(ww)
fakeBin<-ww
save(fakeBin,file="fakeBin339.Rdata")

Measure effective sample size from W matrix

Griffith, Daniel A. 2005. Effective Geographic Sample Size in the Presence of Spatial Autocorrelation. Annals of the Association of American Geographers 95(4): 740–760. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8306.2005.00484.x/full

library(foreign)
ww<-as.matrix(read.dta("/Users/drwhite/Documents/sccs/examples/data/dist25wm.dta")[,c(-1,-2,-1,-189)])  
dim(ww)
round(ww[1:11,1:11],3)
ww<-ww/rowSums(ww) #making sure that ww is row normalized
n<-NROW(ww)
w_inv<-solve(ww) #inverse of V -V-1
round(w_inv[1:11,1:11],3)
chk<-ww%*%w_inv # verifying that w_inv is indeed the inverse
table(diag(chk));table(colSums(chk))
TRw_inv<-sum(diag(w_inv)) #trace of inverse
one<-matrix(1,n,1) #column vector of ones 
nstar<-TRw_inv/(t(one)%*%w_inv%*%one)*n #formula
nstar

n* is calculated from a covariance matrix. For an SAR model the inverse covariance is (I - rho*W)'(I - rho*W), so the spatial autocorrelation parameter, rho, must be estimated first. I also include a very good approximation equation for n*, which requires only the estimate of rho and does not require the matrix inversion.
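A hedged sketch of that computation, mirroring the n* formula used in the snippet above; rho = 0.25 is only a placeholder for the estimated autocorrelation parameter and W is a toy row-normalized weight matrix:

```r
n   <- 5
rho <- 0.25                                   # placeholder; estimate rho first
W   <- matrix(1/(n - 1), n, n); diag(W) <- 0  # toy row-normalized weights
Sinv <- t(diag(n) - rho*W) %*% (diag(n) - rho*W)  # inverse SAR covariance
one  <- matrix(1, n, 1)                       # column vector of ones
nstar <- sum(diag(Sinv)) / drop(t(one) %*% Sinv %*% one) * n
nstar  # reduces to n when rho = 0
```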



ww<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-1,-189)])
#To look at the matrix
dim(ww)
round(ww[1:11,1:11],3)
ww<-ww/rowSums(ww) #making sure that ww is row normalized
w_inv<-solve(ww)    #inverse of V -> V^-1
#dim(w_inv)
round(w_inv[1:11,1:11],3)
trace<- sum(diag(w_inv)) #trace of inverse          #sum( w_inv * t(w_inv) )
transpose<-t(w_inv)
tr<-transpose
tr
n<-NROW(ww)
one<-matrix(1,n,1) #column vector of ones 
nstar<-n*trace/(t(one) %*% t(w_inv) %*% one) #formula
nstar


#the attempts below used elementwise *, which fails on matrices; with
#matrix products (%*%) this reduces to the nstar formula above:
n_ef<-n*trace/drop(t(one)%*%w_inv%*%one)
n_ef

Construct vaux.Rdata: Take out all columns with missing data

EduFor1
foragers.zip

Doug: the vaux data was based on the complete series from the SCCS (i.e., those with 186 observations), as well as data that I generated from GIS coverages, using the coordinates of the SCCS societies. So, for this forager data set, you can generate similar data by using those series that are complete, plus whatever data you can generate from a GIS, using the 339 coordinates. Once you have these data, you can group variables into groups with similar meanings (e.g., subsistence, climate, topography) and then create principal components for each group.

As far as the syntax of prcomp:

f<-prcomp(Vauxforag)

If you take a look at what f contains:

names(f)
[1] "sdev"     "rotation" "center"   "scale"    "x"

The object f$x contains the principal components; f$x[,1:3] is the first three principal components of Vauxforag.

f$rotation is the 35x35 matrix you were observing--it is the loadings for each component.

Anthon (Tony Eff)
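A self-contained illustration of those objects, substituting the built-in USArrests data for Vauxforag:

```r
f <- prcomp(USArrests, scale. = TRUE)
names(f)          # "sdev" "rotation" "center" "scale" "x"
head(f$x[, 1:3])  # first three principal components, one row per case
f$rotation        # loadings of each variable on each component
```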

From Doug: Anthon: I made an index of the completely coded variables in the Marcus Hamilton et al database (csv attached for this index and the data) focusing on group fractality and related variables. Some of these variables were collected from GIS. They were heavily analyzed by Binford into meaningful measures. I factor analyzed and rotated the factors, paid attention to factors 1-5. Looking over those variables I classified into three general categories and three single variables.

  1. FOOD - those features conducive to more or less food production, regardless of subsistence type, loaded on factor 1, including temperature, evapotranspiration
  2. SUBS - those features loaded on factor 2, contrasting hunting-gathering versus fishing, and loading with food type, mobility, rainfall,
  3. Dense - Pop.sqkm, sqkm territory, distance from Greenwich (E/W),
  4. SOIL (did not load on any of the 3 factors)
  5. SNOW (did not load on any of the 3 factors)
  6. POPulation (did not load on any of the 3 factors)

Would you take a look and see if these would seem to produce reasonable factors in the first set (1-3) and separate variables in the second (4-6)? I wanted to start with the Hamilton data only as it bears on fractality (those data have missing values), and we wanted to do some causal analysis of their dataset alone.

I have many other variables from the original Binford data that other researchers coded in excel. My idea is to add those later and at that point redo the vaux data.

From what I understand, I need to get 339 columns, with column labels for each society, and rows with the PCA values for the significant components, in this case a few from my FOOD, SUBS, Dense PCAs and three from my SOIL, SNOW and POP. Or do I have my axes reversed? Should I use rotated factors for the FOOD, SUBS, Dense PCAs or the unrotated PCA factors (if so, what is the command to get those?). And in general, what is the command for the factor loadings on societies?


Thanks, Doug
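One possible answer to the rotation question above, using only base R (USArrests and the choice of three components are stand-ins): varimax() rotates the loadings of the retained components, and rotated scores on the cases follow by multiplying the standardized data by the rotated loadings.

```r
f   <- prcomp(USArrests, scale. = TRUE)
rot <- varimax(f$rotation[, 1:3])                # rotate the first three components
rot$loadings                                     # rotated loadings per variable
scores_rot <- scale(USArrests) %*% rot$loadings  # rotated scores per case
head(scores_rot)
```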


Code on Old PC

setwd("C:/My Documents/sccs")
library(sccs)
data(sccs)   #this works
#Binford47vars339narrowCompleteOnlyNo5No7.xlsx
setwd("C:/My Documents/Binford/Best_files/Vaux5_7")
Vauxforag <- read.csv("Binford47vars339narrowCompleteOnlyNo5No7.csv")
save(Vauxforag,file="BinfordVaux.RData")
data(Vauxforag) #doesn't work, but never mind for now (needed when running the main program)
library(stats)
princomp(Vauxforag) #works
#now make VauxB.csv from the principal components
setwd("C:/My Documents/Binford/Best_files/Vaux5_7")
VauxB <- read.csv("VauxB.csv")
save(VauxB,file="VauxB.RData")


Create Binford data

--Binford data--
setwd("/home/anthon/Dropbox/males")
#setwd("e:/Dropbox/males")
#setwd("/home/eaeff/males/")
rm(list=ls(all=TRUE))
options(echo=TRUE)
library(foreign)
library(psych)
sessionInfo()
.libPaths()
ls("package:foreign")

Binford data

uu<-read.spss("Binfords_HG_file_with_value_labels_current_Dec2008.sav",
             use.value.labels = TRUE, to.data.frame = TRUE,
             max.value.labels = 30,trim.factor.names = TRUE)

gg<-read.spss("Binfords_HG_file_with_value_labels_current_Dec2008.sav",
             use.value.labels = FALSE, to.data.frame = TRUE,
             max.value.labels = 30,trim.factor.names = TRUE)

z<-which(sapply(gg,function(x) class(x))=="factor")
gg[,z]<-sapply(gg[,z],function(x) (as.character(x)))

table(sapply(uu,function(x) class(x)))
table(sapply(gg,function(x) class(x)))
head(gg[,z])
names(gg)[z]


gSimpStat<-function(ww){

 a<-data.frame(psych::describe(ww))  
 a<-round(a[,c(2:4,8:9)],3)
 return(a)

}


auu<-sapply(uu,function(x) class(x))
agg<-sapply(gg,function(x) class(x))
ncats<-sapply(gg,function(x) length(table(as.character(x))))
a<-cbind(auu,agg,data.frame(ncats),matrix(NA,length(ncats),5))

#z<-which(a$a1!="factor")

z<-which(sapply(gg,function(x) class(x))!="character")
f<-gSimpStat(gg[,z])
names(a)[-(1:3)]<-names(f)
k<-match(rownames(f),rownames(a))
a[k,names(f)]<-f
a[,"n"]<-apply(!is.na(gg),2,sum)
pff<-a
head(pff)

pff[c("subpop","perogat"),]
levels(uu$subpop)
table(uu$subpop,useNA="ifany")
table(gg$subpop,useNA="ifany")

table(uu$perogat,useNA="ifany")
table(gg$perogat,useNA="ifany")
g<-levels(uu$perogat)
levels(uu$perogat)<-paste(g,1:length(g))

table(uu$secno,useNA="ifany")
table(gg$secno,useNA="ifany")
gg$secno<-gsub(" ","",gg$secno)
uu$secno<-as.factor(gg$secno)

write.table("Binford Sample",file="z.txt", sep="\t",append=FALSE,

           row.names=FALSE,col.names=FALSE,quote=FALSE,eol="\r\n")

knn<-rownames(pff)
for (ii in knn){

 k<-match(ii,knn)
 write.table(" ",file="z.txt", sep="\t",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
 zzz<-paste(pff[ii,"agg"],"; ",pff[ii,"auu"],sep="")
 if (as.character(pff[ii,"agg"])==as.character(pff[ii,"auu"])){zzz<-pff[ii,"agg"]}
 write.table(paste(k,". ",ii,"; ",zzz,"; nobs=",pff[ii,"n"],"; ncats=",pff[ii,"ncats"],sep="")
             ,file="z.txt", sep="\t",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
 if(pff[ii,"ncats"]<30){
   b1<-table(factor(uu[,ii]),useNA="ifany")
   b2<-table(gg[,ii],useNA="ifany")
   b<-paste("DUPLICATED labels in factor: number of labels=",length(b1),"; number of factor categories=",length(b2),sep="")
   if (length(b1)==length(b2)){
   b<-cbind(data.frame(b2),names(b1))
   names(b)<-c("value","frequency","level")
   }
   write.table(b,file="z.txt", sep="\t",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
     }
 if(pff[ii,"ncats"]>=30 & pff[ii,"agg"]=="numeric"){
     b<-t(pff[ii,c("mean","sd","min","max")])
     write.table(b,file="z.txt", sep="\t",append=TRUE,row.names=TRUE,col.names=FALSE,quote=FALSE)
   }

}


ff<-read.csv("Binford_merged_dataset.csv",as.is=TRUE)
names(ff)<-paste("W",tolower(names(ff)),sep="")
head(ff)

k<-match(gg[,1],ff[,1])
j<-match((names(ff)),(names(gg)))
names(ff)[which(!is.na(j))]

hh<-cbind(gg,ff[k,])
hh[,c("name","Wname")]

table(sapply(ff,function(x) class(x)))
head(hh)

table(hh$Wsubpop,useNA="ifany")
table(gg$subpop,useNA="ifany")

LRB<-hh

save(LRB,file="LRB.Rdata")

write.table(" ",file="z.txt", sep="\t",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table("Variables from InterSciWiki Binford_merged_dataset",file="z.txt", sep="\t",append=TRUE,

           row.names=FALSE,col.names=FALSE,quote=FALSE,eol="\r\n")

knn<-names(ff)
for (ii in knn){

 k<-match(ii,knn)
 write.table(" ",file="z.txt", sep="\t",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
 write.table(paste(k,". ",ii,"; ",class(ff[,ii]),"; nobs=",sum(!is.na(ff[,ii])),
                   "; ncats=",length(table(ff[,ii])),sep=""),
             file="z.txt", sep="\t",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
 if(length(table(ff[,ii]))<30){
   b1<-table(ff[,ii],useNA="ifany")
   b<-cbind(data.frame(b1))
   write.table(b,file="z.txt", sep="\t",append=TRUE,row.names=TRUE,col.names=FALSE,quote=FALSE)
 }
 if (length(table(ff[,ii]))>=30 & class(ff[,ii])!="character"){
   write.table(t(gSimpStat(ff[,ii])),file="z.txt", sep="\t",append=TRUE,row.names=TRUE,col.names=FALSE,quote=FALSE)
 }

}

NewMac Code 2012, sort LRBinford.Rdata

Add Ethnologue data

 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/BinfordPrinComm') 
 load("LRBEff.Rdata",.GlobalEnv)
 #load('LRBEffnologue.Rdata',.GlobalEnv)
  #load("LRBinford.Rdata",.GlobalEnv) 
 LRBethno<-LRB  #(for some reason LRBEff attenuates to LRB)
 write.csv(LRBethno, file = "LRBethno.csv")
 #transfer new data to columns in LRBsortPlusWvRecod3.csv
 LRBsorted <- read.csv("LRBsortPlusWvRecod3.csv", header=TRUE)
 B<-LRBsorted
 sys3rec=LRBsorted$systate3recod
 table(B$systate3recod, round(B$Wpet/200,0)) ##r=-0. p-value = 0. (curvilinear)
 cor.test(B$systate3recod, round(B$Wpet/200,0))
    1  2  3  4  5  6  7  8  9  r=-0.43  P=0.0000000000000056
 2  1  2  0  2  2  4  3  3  3
 3  0  6  4  0  2  3  5  6  2
 4 11 13 26 16 19 14 11 20  4
 5  0  5  7  4  3  1  1  0  0
 6  5 12 36 24  3  1  0  0  0
 7  0  2 21  3  2  0  0  0  0
 8  0  6 17  2  0  1  0  0  1


#plot(B$shaman, round(B$Wpet/200,0))
#plot(round(B$Wpet/200,0),B$shaman)
#pet=round(B$Wpet/10,0)
pet=round(B$Wpet,0)  #uncommented: pet must be defined before the cbind below
#sham=B$shaman
#plo=cbind(pet,B$crr,2) #this works
plo=cbind(pet,sys3rec,2) #this works
plo2=na.omit(plo) 
pet=plo[,1]
sys3rec=plo[,2]
#ploo=cbind(pet,B$system3recod,2) # 
#plo=na.omit(ploo)
#xxxxxxxxxxx
#
#sys=plo[,2]
#pt=plo[,1]
#sys
#pt
#plo=cbind(pet,B$crr) #this works
plot(plo, main = "lowess(plo)")
#plot(plo, main = "lowess(plo)",ylab="system3recod")
plot(plo, main = "lowess(plo)",ylab="sys3rec")
lines(lowess(plo, f=.5, iter=250), col = "green")
lines(lowess(plo, f=.05, iter=50), col = "red")



 save(LRBsorted, file="LRBsorted.Rdata")
 LB<-LRBsorted
 save(LB, file="LB.Rdata") 
 load("LB.Rdata",.GlobalEnv)
 B<-LB
   LRBsorted <- read.csv("LRBsortPlusWvRecod.csv", header=TRUE)
 #LRBsorted <- read.csv("LRBsortPlusWvRecod.csv", header=TRUE)
 save(LRBsorted, file="LRBsorted.Rdata")
 LB<-LRBsorted
 save(LB, file="LB.Rdata") 
 load("LB.Rdata",.GlobalEnv)
 B<-LB

LRBsortPlusWvRecod.csv

 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/BinfordPrinComm') 
 #load() applies only to .Rdata files; the csv is read with read.csv below
   LRBsorted <- read.csv("LRBsortPlusWvRecod.csv", header=TRUE)
 #LRBsorted <- read.csv("LRBsortPlusWvRecod.csv", header=TRUE)
 save(LRBsorted, file="LRBsorted.Rdata")
 LB<-LRBsorted
 save(LB, file="LB.Rdata") 
 load("LB.Rdata",.GlobalEnv)
 B<-LB
 #table(B$shaman,B$systate3recod)
   table(B$shaman,B$systate3recod) #r=0.534 p-value = 0.00000000000000022
 table(B$shaman,B$systate3recod,B$forcol)
 #table(B$shaman,B$crr)
 table(B$shaman, round(B$Wpet/200,0)) ##r=-0.16 p-value = 0.004326
 #table(B$shaman,B$pet) 
#FROM BELOW:
 cor.test(LRB$shaman,pca$scores[,1]) #r=0.320 p-value = 0.0000000065
 cor.test(LRB$shaman,pca$scores[,2]) #r=0.114 p-value = 0.04306

Use LRBsortPlusWvRecod5.csv to make Vaux Ethno and VauxEnvirEthno.csv, Rdata

 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/BinfordPrinComm') 
 Bin_Vaux <- read.csv("VauxEnvirEthno.csv", header=TRUE)
 save(Bin_Vaux, file="Bin_Vaux.Rdata")
 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/BinfordPrinComm') 
 #load("LRBsorted.Rdata",.GlobalEnv)
 #load("LRBsortPlusWvRecod5.csv,.GlobalEnv)
 LRBsorted <- read.csv("LRBsortPlusWvRecod7.csv", header=TRUE)
 save(LRBsorted, file="LRBsorted.Rdata")
 LRB<-LRBsorted
 save(LRB, file="LRB.Rdata") 
 load("LRB.Rdata",.GlobalEnv)
 library(labdsv)
 #year has eight zeros: societies 56, 69, 88, 92, 150, 152, 153, 249
B$YEAR[56] <-1970
B$YEAR[69] <-1970
B$YEAR[88] <-1930
B$YEAR[92] <-1925
B$YEAR[150] <-1870
B$YEAR[152] <-1920
B$YEAR[153] <-1900
B$YEAR[249] <-1880  
#write.csv(LRBethno, file = "LRBethno.csv") #read.csv("LRBsortPlusWvRecod6.csv", header=TRUE)
B$hunting[70]=2
B$forcol[173]=1.5
write.csv(B, file = "LRBsortPlusWvRecod7.csv")

#table(B$YEAR) frequencies:
1570 1600 1700 1715 1760 1770 1800 1805 1830 1840 1848 1850 1860 1865 1870

  1    1    1    1    1    1    4    1    3    3    2   32   73    1   38 

1880 1890 1900 1901 1903 1906 1909 1910 1912 1920 1922 1924 1925 1926 1928

 47   10   21    1    1    1    1    6    1   16    1    1    1    1    2 

1930 1933 1934 1936 1937 1938 1940 1941 1946 1950 1952 1954 1960 1962 1963

  8    1    2    1    1    1    3    1    1    6    1    1    4    1    3 

1965 1968 1970 1972 1973 1974 1975 1976 1977 1978 1979 1980 1989 1990

  3    7    4    1    1    1    1    2    1    2    1    3    2    1
#pca<-princomp(data = B, cbind(B$owners, B$packord,
 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/BinfordPrinComm') 
 #load("LRBsorted.Rdata",.GlobalEnv)
 #load("LRBsortPlusWvRecod5.csv,.GlobalEnv)
 LRBsorted <- read.csv("LRBsortPlusWvRecod6.csv", header=TRUE)
 save(LRBsorted, file="LRBsorted.Rdata")
 LRB<-LRBsorted
 save(LRB, file="LRB.Rdata") 
 load("LRB.Rdata",.GlobalEnv)
 library(labdsv)
 #year has eight zeros: societies 56, 69, 88, 92, 150, 152, 153, 249
B$YEAR[56] <-1970
B$YEAR[69] <-1970
B$YEAR[88] <-1930
B$YEAR[92] <-1925
B$YEAR[150] <-1870
B$YEAR[152] <-1920
B$YEAR[153] <-1900
B$YEAR[249] <-1880 
#pca<-princomp(data = B, cbind(B$owners, B$packord, B$YEAR))
#pca<-princomp(data = B, B$YEAR, B$owners, B$packord)
cb=cbind(B$owners, B$packord, B$comstfun, B$money, B$dkinex, B$polpos, B$class, B$headman, B$war1, B$YEAR)
pca<-princomp(cb) 
 pca$scores
 pca$loadings 
pca$scores[,1:3]
plot(pca)
summary(pca)
biplot(pca)
#pca<-princomp(data = B, B$YEAR, B$owners, B$packord)
cb=cbind(B$owners, B$comstfun, B$packord, B$money, B$dkinex, B$polpos, B$class, B$headman, B$war1)
pca<-princomp(cb) 
 pca$scores
 pca$loadings 
pca$scores[,1:3]
plot(pca)
summary(pca)
biplot(pca)


#pca<-princomp(data = B, B$YEAR, B$owners, B$packord)
cb=cbind(B$owners, B$comstfun, B$packord)   #cmplxMod BEST MODEL
pca<-princomp(cb) 
 pca$scores
 pca$loadings 
pca$scores[,1:3]
plot(pca)
summary(pca)
biplot(pca)
#Make Vaux variables
pca$scores[,1]
pca$scores[,2]
pca$scores[,3] # is this needed?
#stack(pca$scores[,1:3]) http://cran.r-project.org/doc/manuals/R-data.html
write.csv(pca$scores[,1],"col1.csv")
write.csv(pca$scores[,2],"col2.csv")
write.csv(pca$scores[,3],"col3.csv")
cb=cbind(B$area, B$tlpop, B$systate3recod, B$occspe)   #THESE ARE NA, B$Wgroup1, B$Wgroup2) #scope-->  B$intrd not NA=0
#cb=cbind(B$sed, B$grpat, B$mobpat, B$mobpat2, B$nomov, B$dpmov)
cb=cbind(B$sed,  B$mobpat, B$mobp2)
cb=cbind(B$kincon, B$kinstr, B$kinbia1, B$kinder, B$kinbia2, B$nenept, B$kinexo)  #B$commun,
cb=cbind(B$subsp, B$subspx, B$forcol, B$hunt, B$huntfil, B$huntfil2, B$hunting, B$gath, B$gatherin, B$fish)  ###, B$fishing)
cb=cbind(B$area, B$tlpop, B$systate3recod, B$occspe)   #B$Wgroup1, B$Wgroup2 are all NA
pca<-princomp(cb) 
 pca$scores
 pca$loadings 
pca$scores[,1:3]
plot(pca)
summary(pca)
biplot(pca)
#Make Vaux variables
pca$scores[,1]
#stack(pca$scores[,1])
write.csv(pca$scores[,1],"col.csv")

LRBsortPlusWvars.csv

Corrections: 
 Recode latitude NA for 68 Hai//om Northern Namibia to 18.40111
 Recode forcol NA for 173 Northern Pomo to 1, taking the value from 163 Southern Pomo
 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/') 
 load("LRBsorted.Rdata",.GlobalEnv)
 #load("LRBsortPlusWvars.csv")  #wrong: load() reads only .Rdata files; use read.csv below
 LRBsorted <- read.csv("LRBsortPlusWvars.csv", header=TRUE)
 save(LRBsorted, file="LRBsorted.Rdata")
 LRB<-LRBsorted
 save(LRB, file="LRB.Rdata") 
 load("LRB.Rdata",.GlobalEnv)
 #pca<-princomp(data = B, Wae,bio5,bar5,nagp,snowac,growc,wltgrc,watret,density, defper,lden,setting,dposit, mrain,mtemp,sdrain,clim,et,rrcorr,rrcorr2,crr, Wmwm,mcm,pet,tlpop,area,subsp)  #wrong call: princomp takes a numeric matrix, not bare names; see the cbind() version below
 library(labdsv)
 pca<-princomp(cbind(B$Wae,B$bio5,B$bar5,B$nagp,B$snowac,B$growc,B$wltgrc,B$watret))  #v2$loadings
 pca$scores
 pca$loadings #PERFECTLY INDEPENDENT
               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125
Cumulative Var  0.125  0.250  0.375  0.500  0.625  0.750  0.875  1.000
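The "perfectly independent" pattern above is expected: princomp loadings are orthonormal eigenvectors, so the SS-loadings footer always prints 1.000 per component. This can be checked directly (a sketch, using pca from above):

```r
# crossprod of the loadings matrix is (up to rounding) the identity matrix
round(crossprod(unclass(pca$loadings)), 3)
```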

pca<-princomp(cbind(B$Wae,B$bio5,B$bar5,B$nagp,B$snowac,B$growc,B$wltgrc,B$watret,B$density,B$defper,B$lden,B$setting,B$dposit,B$mrain,B$mtemp,B$sdrain,B$clim, B$et,B$rrcorr,B$rrcorr2,B$crr,B$Wmwm,B$mcm,B$pet,B$tlpop,B$area,B$subsp,B$Wsoil,B$hunt))  #extra ")" removed; B$wltgrc had appeared twice and the duplicate is dropped

pca$scores[,1:3]
plot(pca)
summary(pca)
#biplot(pca)

pca<-princomp(cbind(B$crr,B$Wmwm,B$mcm,B$pet,B$tlpop,B$area,B$subsp,B$Wsoil,B$hunt))

#                                   1*         2          3          4***      5**         6            7          8            9
pca$scores[,1:3]
biplot(pca)
summary(pca)
#plot(pca)
LRBsortPlusWvars.csv ENVIRONMENT PCA BEST MODEL lowess
table(B$shaman, round(B$Wpet/200,0)) #works
pet=B$Wpet
sham=B$shaman
plo=cbind(pet,B$crr) #this works
#plo=cbind(pet,B$system3recod,NA=2)  #invalid: "NA=2" is not a cbind() argument
pet=round(B$Wpet,0)
sham=B$system3recod
plo=cbind(pet,B$crr) #this works
plo=cbind(pet,B$crr) #+sham) 
plo=cbind(pet,sham) #this plots but doesn't do lowess
plo=cbind(pet,B$crr) #this works
plot(plo, main = "lowess(plo)")
plot(plo, main = "lowess(plo)",ylab="System3recod")
lines(lowess(plo, f=.5, iter=250), col = "green")
lines(lowess(plo, f=.05, iter=50), col = "red")


pet=round(B$Wpet/10,0)
sham=B$shaman
plo=cbind(pet,sham)
#plo=cbind(pet,B$crr)
plot(plo, main = "lowess(plo)")
#xlab=Ubb["pdens","label"],ylab="Effect on Freq. Ext. War")  #stray fragment from the example below
lines(lowess(plo, f=.5), col = "green")
lines(lowess(plo, f=.05), col = "red")


#GOT TO GET RID OF NAs
lines(lowess(plo), col = "red")
lines(lowess(plo, f=.1), col = "blue")
legend(5, 120, c(paste("f = ", c("2/3", ".2"))), lty = 1, col = 2:3)
plo=cbind(pet,B$crr)
lowess(plo)
plot(pet,sham); lines(lowess(pet,B$shaman),col="red")
abline(h=0,col="blue",lty=3)
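One way to handle the NAs flagged above ("GOT TO GET RID OF NAs") is to keep only complete cases before plotting, since lowess fails on missing values (a sketch, assuming pet and sham as defined above):

```r
# drop rows with NA in either variable, then plot and smooth
ok  <- complete.cases(pet, sham)
plo <- cbind(pet[ok], sham[ok])
plot(plo, main = "lowess(plo)")
lines(lowess(plo, f = .5), col = "green")
```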
#TAKEN FROM EXAMPLE:
plot(g[,"pdens"],jj*g[,"pdens"],
main=paste("Effect of ",Ubb["pdens","label"],sep=""),
xlab=Ubb["pdens","label"],ylab="Effect on Freq. Ext. War")
lines(lowess(g[,"pdens"],jj*g[,"pdens"]),col="red")
abline(h=0,col="blue",lty=3)
pca<-princomp(cbind(B$crr,B$pet,B$tlpop))
pca<-princomp(cbind(B$crr,B$pet,B$wltgrc))  ###########BEST MODEL
#                                   1*         2          3          
pca$scores[,1:2]
biplot(pca)
summary(pca)
#plot(pca)
env1=pca$scores[,1]
env2=pca$scores[,2]
#env1=pca$scores[1,]  #wrong: [1,] selects a row (society), not a component; use [,1] as above
#env2=pca$scores[2,]
library(nnet)
cor.test(LRB$shaman,pca$scores[,1]) #r=0.320 p-value = 0.0000000065
cor.test(LRB$shaman,pca$scores[,2]) #r=0.114 p-value = 0.04306
#planned recode: LRB$shaman[LRB$shaman==1] <- 8
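library(nnet) is loaded above but never called; presumably it was intended for a multinomial fit of shaman on the component scores. A hedged sketch (the model specification is an assumption, not from the original):

```r
# multinomial logit of shaman on the first two environment components
m <- multinom(LRB$shaman ~ pca$scores[,1] + pca$scores[,2])
summary(m)
```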

pca<-princomp(cbind(B$crr,B$pet,B$wltgrc,B$Wsoil))

#                                   1*         2          3          4***      5**         6            7          8            9
pca$scores[,1:2]
biplot(pca)
summary(pca)
#plot(pca)

Loadings:

     Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
[1,]         0.275  0.287  0.887               -0.167  0.147               
[2,]                             -0.836  0.400 -0.317 -0.200               
[3,]                                                                       
[4,]                      -0.192        -0.569 -0.788  0.102               
[5,]                      -0.157  0.126  0.494 -0.203  0.799  0.177        
[6,]                                                                       
[7,]                                                                       
[8,]                              0.529  0.518 -0.441 -0.488 -0.118        
[9,]                                                  -0.181  0.716  0.574 

[rows [10,] through [26,] for Comp.1-Comp.10: column alignment lost in extraction; values not reliably recoverable]

     Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17 Comp.18
[1,]                                                                
[2,]                                                                
[3,]                  0.308  -0.929  -0.116                         
[4,]                                                                
[5,]                                                                
[6,]          0.129                  -0.174           0.294         
[7,]                                 -0.149          -0.501   0.410 
[8,]                                                                
[9,] -0.285  -0.160           0.115                                 

[rows [10,] through [26,] for Comp.11-Comp.18: column alignment lost in extraction; values not reliably recoverable]

     Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24 Comp.25 Comp.26
[1,]                                                                
[2,]                                                                
[3,]                                                                
[4,]                                                                
[5,]                                                                
[6,] -0.919                                                         
[7,] -0.117  -0.708           0.106                                 
[8,]                                                                
[9,]                                                                

[rows [10,] through [26,] for Comp.19-Comp.26: column alignment lost in extraction; values not reliably recoverable]


 THIS HAS BEEN DONE
 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/') 
 load("LRBinford.Rdata",.GlobalEnv) 
 write.csv(LRB, file = "LRBpresort.csv")
 #hsb2 <-  read.table('LRBpresort.csv', header=T, sep=",")
 #attach(hsb2)
 #hsb2[1:10, ]
 #sorted hand with excel
 LRBsorted <- read.csv("LRBsort.csv", header=TRUE)
 LRBsorted <- read.csv("LRBsortPlusWvars.csv", header=TRUE)
 save(LRBsorted, file="LRBsorted.Rdata")  #object is LRBsorted, not LRBsortPlus


 LRB<-LRBinford
 LRB$subpop
  [1] "x" "n" "n" "n" "n" "n" "x" "x" "x" "x" "x" "x" "n" "n" "x" "x" "n" "x" "x" "n" "n" "x" "n" "n" "x"
 [26] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "x" "n" "n" "n" "n"
 [51] "n" "n" "n" "n" "x" "n" "x" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
 [76] "n" "n" "n" "n" "n" "n" "x" "n" "x" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[101] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[126] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[151] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "x" "x" "x" "n" "n" "n" "x" "n" "x" "x" "x" "x" "x" "x"
[176] "x" "x" "x" "x" "x" "x" "x" "n" "n" "n" "x" "x" "x" "n" "x" "n" "x" "n" "n" "n" "n" "x" "x" "x" "x"
[201] "x" "n" "x" "x" "x" "n" "x" "x" "n" "x" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "x" "n"
[226] "n" "x" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "x" "x" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[251] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[276] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[301] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
[326] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n"
 LRB$systate3
  [1] 1 2 2 2 6 4 2 2 4 2 4 4 1 4 3 3 4 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [51] 4 4 4 4 2 4 2 4 4 4 4 4 4 4 1 4 4 4 4 4 4 1 4 4 1 1 1 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[101] 6 6 6 6 5 4 6 6 5 5 3 3 7 6 3 3 3 3 5 1 4 4 4 6 7 4 6 4 4 4 4 6 5 4 5 3 4 7 6 6 4 6 5 6 5 6 4 4 4 6
[151] 6 4 4 4 4 4 4 4 4 6 4 3 2 3 2 4 4 2 4 3 3 3 3 3 3 2 3 3 3 3 3 3 5 6 7 4 2 2 4 2 7 5 4 4 4 4 3 3 3 3
[201] 3 4 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 4 4 4 4 6 4 4 4 5 2 6 4 5 6 6 6 6 5 6 6 6
[251] 7 5 6 6 6 7 4 6 6 6 6 6 5 6 6 6 6 6 6 6 6 5 6 6 6 6 6 6 6 6 6 6 6 6 5 4 6 4 4 4 4 4 6 4 5 5 6 6 6 6
[301] 6 7 6 6 7 7 6 6 6 6 7 7 6 7 7 7 7 7 7 7 7 7 7 6 7 7 6 6 6 6 6 6 7 6 5 6 7 7 6


noantrap mhs kmov kspmov lkspmov lkmov famsz sz1fam szjoint szcomu szmean forcol packing





 library(foreign)
 setwd('/Users/drwhite/Documents/3sls/sccs/') 
 SPSSforag <- read.spss("Binfords_HG_file_with_value_labels_current_Dec2008.sav", use.value.labels = FALSE)
#SPSSforag <- VauxSPSSforag
 save(SPSSforag,file="Binford80vars339.RData") #not edited
 write.table(SPSSforag,file="SPSSforag.csv",sep=",",row.names=T)
 #save(SPSSforag,file="Binford80vars339.RData") #edited
 #SPSSforag <- read.csv("Binford80vars339extra3a.csv", use.value.labels = FALSE)
 SPSSforag <- read.csv("Binford80vars339extra3a.csv") #edited
s11=round(round(as.numeric(SPSSforag$s11))/100, digits=0)
s11
v11=round(round(as.numeric(SPSSforag$v11))/100, digits=0)
v11
table(v11,s11)
  s11

v11 0 1 2 3

 0  50   0   0   0
 1   0  99   0   0
 2   0   0 101   0
 3   2   0   0  92
s=SPSSforag$s13
v=SPSSforag$v13
rs=1
rv=10
s=round(round(as.numeric(s))/rs, digits=0)
v=round(round(as.numeric(v))/rv, digits=0)
table(v,s)
cor(v,s)
  s

v 0 1 2 3 4

 0  34   0   0   0   0
 1   1  93   0   0   0
 2   0  19 122   0   0
 3   0   0  21  39   0
 4   1   0   0   0  14

> cor(v,s)
[1] 0.9215152
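The rescale-and-crosstab pattern repeated through this section can be wrapped in a small helper (a sketch; the function name cmpcode is illustrative, SPSSforag as loaded above):

```r
# compare a source column s against a recoded column v after rescaling each:
# print the crosstab, return the correlation
cmpcode <- function(s, v, rs = 1, rv = 1) {
  s <- round(round(as.numeric(s)) / rs, digits = 0)
  v <- round(round(as.numeric(v)) / rv, digits = 0)
  print(table(v, s))
  cor(v, s)
}
cmpcode(SPSSforag$s13, SPSSforag$v13, rs = 1, rv = 10)
```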

s=SPSSforag$s14
v=SPSSforag$v14
rs=1
rv=1
s=round(round(as.numeric(s))/rs, digits=0)
v=round(round(as.numeric(v))/rv, digits=0)
table(v,s)
  s

v 0 1 2 3 4 5

 0 37  0  0  0  0  0
 1  1 97  0  0  0  0
 2  0 36 79  0  0  0
 3  0  0  1 49  0  0
 4  0  0  0  2 32  0
 5  1  0  0  0  1  8

> cor(v,s)
[1] 0.9430436

s=SPSSforag$s15
v=SPSSforag$v15
rs=10
rv=10
s=round(round(as.numeric(s))/rs, digits=0)
v=round(round(as.numeric(v))/rv, digits=0)
table(v,s)
cor(v,s)
  s

v 0 1 2 3 4 5 6

 0 53  0  0  0  0  0  0
 1  1 64  0  0  0  0  0
 2  0 19 55  1  0  0  0
 3  0  0  2 53  0  0  0
 4  0  0  0 12 27  0  0
 5  0  0  0  0 12 24  0
 6  1  0  0  0  0  6 12
 7  0  0  0  0  0  0  2

> cor(v,s)
[1] 0.9624305

s=SPSSforag$s16
v=SPSSforag$v16
rs=10
rv=10

NONONO


s=SPSSforag$s18
v=SPSSforag$v18
rs=10
rv=10
s=round(round(as.numeric(s))/rs, digits=0)
v=round(round(as.numeric(v))/rv, digits=0)
table(v,s)
  s

v 0 1 2 3 4 5 6

 0 127  20   8   2   3   5   3
 1   0  22   5   0   0   0   1
 2   2   0  31   2   0   0   0
 3   3   0   1  25   0   0   0
 4   2   0   0   2  27   3   0
 5   2   0   0   1   2  24   0
 6   1   1   0   2   0  11   6

> cor(v,s)
[1] 0.7826439

> s=SPSSforag$v018
> v=SPSSforag$v18
> rs=10
> rv=.1
>
> s=round(round(as.numeric(s))/rs, digits=0)
> v=round(round(as.numeric(v))/rv, digits=0)
> table(v,s)

  s

v 0 1 2 3 4 5 6

 0 165   0   0   1   0   0   2
 1   3  25   0   0   0   0   0
 2   0   4  31   0   0   0   0
 3   0   0   1  28   0   0   0
 4   0   0   0   0  34   0   0
 5   0   0   0   0   0  29   0
 6   1   0   0   0   0   0  20

> cor(v,s)
[1] 0.9568045

NOW DO 26

 s=SPSSforag$s27
 v=SPSSforag$v27  #was V27; the variable names are lower-case
 rs=10
 rv=10
 s=round(round(as.numeric(s))/rs, digits=0)
 v=round(round(as.numeric(v))/rv, digits=0)
 table(v,s)

> cor(v,s)
[1] 0.964592

   s

v 0 1 2 3 4 5 6 7 8 9 10 11

 0  208   3   3   2   3   0   1   3   4   1   2   0
 1    3   5   4   0   0   0   0   0   0   0   0   0
 2    0   0   8   4   0   0   0   1   0   0   0   0
 3    0   1   0   4   6   1   0   0   0   0   0   0
 4    0   0   0   0   4   8   1   0   0   0   0   0
 5    2   1   0   0   0   2  10   0   0   0   0   0
 6    2   0   0   0   0   0   3  13   0   0   0   0
 7    1   0   0   0   0   0   0   0   7   0   2   0
 8    2   0   0   0   0   0   0   0   1   8   0   0
 9    1   0   0   0   0   0   0   0   0   0   7   1
 10   0   0   0   0   0   0   0   0   0   0   0   1

> cor(v,s)
[1] 0.7966424
> s=SPSSforag$s28

v=SPSSforag$v28
rs=100
rv=100
s=round(round(as.numeric(s))/rs, digits=0)
v=round(round(as.numeric(v))/rv, digits=0)
table(v,s)
  s

v 0 1 2 3

 0  56   0   0   1
 1   0 101   6   0
 2   0   7 118   2
 3   2   0   2  49
 SPSSforag$subsp
 [1] 1 3 1 1 3 2 3 1 1 2 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1 2 1 1 1 2 1 1 2 2 1 1 1 2
[83] 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3 1 1 1 3 1 3 1 1 3 1 3 1 3 1 3 3 3 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2

[165] 2 3 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 3 3 3 2 3 2 2 2 3 2 2 3 3 3 2 2 2 2 2 2 2 2 3 3 2 3 3 3 2 2 2 3 3 2 3 3 3 3 2 3 3 2 2 1 2 3 2 3 2 2 2 3 3 2 3 3 2 2 2 2 2 2 2 2
[247] 2 2 2 2 3 2 2 2 3 3 3 2 2 2 3 2 3 3 2 2 2 2 2 2 3 3 3 2 3 3 3 1 3 2 3 1 2 3 3 2 2 2 2 1 1 2 1 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[329] 3 3 3 3 3 3 3 3 3 3 3
attr(,"value.labels")

   Herding Agriculture    Aquatics   Gathering     Hunting 
       "5"         "4"         "3"         "2"         "1" 

> table(VauxSPSSforag$subsp)

 1   2   3 
77 142 120 
#                                                                   setwd("C:/My Documents/sccs")  #<-- changed
library(sccs)
data(sccs)   #this works
#                                                                   setwd("C:/My Documents/sccs")  #<-- changed
#                                                                   library(sccs)
#                                                                   data(sccs)   #this works
#				Binford47vars339narrowCompleteOnlyNo5No7.xlsx 
#		Vauxforag <- Binford47vars339narrowCompleteOnlyNo5No7.csv")
#		 setwd("C:/My Documents/Binford/Best_files/Vaux5_7")
 setwd('/Users/drwhite/Documents/3sls/sccs/') 
 Vauxforag <- read.csv("Binford47vars339extra1a.csv")
 Vauxforag <- read.csv("Binford80vars339extra1a.csv") 


  > Vauxforag[1:12,1:91]
   v1            v2              v3   v4   AtlasName Atlas. EA Lang       v5    v6
1  NA                                                    NA           SUBPOP VEGNU
2  NA                                                    NA             p118  p118
3   1         Punan       Indonesia ASIA  PUNAN . .    1243      Mp  suspect     1
4   2         Batek     Philippines ASIA                 NA          suspect     1
5   3          Kubu       Indonesia ASIA  KUBU. . .    1114          suspect     1
6   4       Shompen Nicobar Islands ASIA                 NA         HGF only     1
7   5          Onge Andaman Islands ASIA                 NA         HGF only     1
8   6         Jarwa Andaman Islands ASIA                 NA         HGF only     1
9   7 Ayta-Pinatubo     Philippines ASIA                 NA          suspect     1
10  8  North Island Andaman Islands ASIA                 NA         HGF only     1
11  9        Semang        Malaysia ASIA  SEMANG. .     148          suspect     1
12 10        Veddah       Sri Lanka ASIA  VEDDA . .     145          suspect     2
                                                     v7   v8     v9   v10     v11      v12  v13  v14
1                                                   VEG SOIL SNOWAC GROWC     CRR    NAGP  GATH HUNT
2                                                  p118 p118    p86   p86     p60      p86 p118 p118
3                                  Tropical rain forest    U      0    12 3444.31 4,738.19   65   30
4                                  Tropical rain forest    H      0    12 2546.75 3,852.15   65   30
5                                  Tropical rain forest    U      0    12 3138.13 4,603.80   70   25
6                                  Tropical rain forest    O      0    12  3189.8 5,514.85   50   15
7                                  Tropical rain forest    O      0    12 3081.84 4,400.17   35   20
8                                  Tropical rain forest    O      0    12 2301.48 3,713.79   50   20
9                                  Tropical rain forest    H      0    12  3051.2 2,922.91   87    8
10                                 Tropical rain forest    O      0    12 1752.74 2,966.27   60   35
11                                 Tropical rain forest    U      0    12  2560.7 3,315.37   50   40
12 equatorial savannah woodland/broadleaf tree savannah    O      0    12 1728.16 3,251.70   65   30
    v15   v16    v17       v18   v19     v20     v21   v22    v23    v24    v25   v26         v27
1  FISH SUBSP GRPPAT %polygyny  HHSZ HHSZdry HHSZwet FAMSZ GROUP1 GROUP2 GROUP3 TLPOP     DENSITY
2  p118  p118   p118      p281  p288    p288    p288  p288   p118   p118   p118  p118        p118
3     5     2      1         4  2.91                  3.61     22     30     62   349 0.117905405
4     5     2      2         5  5.06                  3.95     19     58          424 0.432653061
5     5     2      1                                           12               11800 0.092000624
6    35     2      1         7  3.05                  3.62   19.3     31          342 0.397674419
7    45     3      1         0 14.04    2.08   26.00  3.11     10     23     70   393 0.401020408
8    30     2      1         0 17.86    3.72   32.00          8.6   25.8     65   255 0.447368421
9     5     2      2         8                                 11     35    200  1645 0.918994413
10    5     2      1            4.10                  3.77     11     43     80   464  0.33381295
11   10     2      1         3 13.21    2.91   23.50  3.45     17     34     71   366 0.175961538
12    5     2      1         5  4.23                  3.85     14     29     72    72 0.184615385
         v28     v29     v30      v30.1   v31    v32    v33    v34   v36   v37   v38 v39         v40
1  AREAx1000     PET      AE         ET   LAT   LONG R HIGH  R LOW   CRR  CMAT   MCM MWM male Height
2       p118     p86     p86        p60   p60    p60    p60    p60   p60   p60   p60 p60        p183
3       2960 1637.66 1637.66 25.2727273     3    114 361.02 211.58  26.4    26  26.8   4      1532.5
4        980 1778.69 1445.65 25.1911021    10 119.11 445.26  26.04 27.84 26.89 28.78   5        1531
5     128260 1609.52 1609.52 24.3897764  3.04 102.69 378.42 141.98  26.2  25.5 26.89            1578
6        860 1794.49 1794.47 24.3272727     7  93.77  485.8   49.1  28.2  26.7  29.7   7      1594.5
7        980 1679.34 1566.25  24.275122  10.7  92.47 521.72   28.7 27.17 26.04 28.29   0            
8        570 1650.56 1414.14 23.6888889 12.19  92.37 388.75  12.95 26.54 25.36 27.71   0            
9       1790 1678.16 1224.17 23.3745455  15.5 120.33 753.87   9.14 26.89 25.39 28.39   8        1464
10      1390 1605.91 1235.08 23.2911628 13.32  92.89 391.92   0.02 26.49 25.11 27.86            1491
11      2080 1320.69 1320.69 23.2576419  5.86    101 349.44 144.08  24.6 24.02 25.18   3        1491
12       390 1779.68 1305.34 22.8852989  8.58  81.25 375.36  18.54 27.75 25.56 29.94   5        1484
             v41     v42       v43   v44   v45     v46      v47       v48 CASE..      v49       v50
1  female Height male kg female kg MAAMm MAAMf     TFR  MALE PI AUSTRALIA CASE #       AE      BIO5
2           p183    p183      p183  p281  p281 MARLOWE MARLOWE?      DOUG             p86      BIOS
3           1468                                                        0      1 1,637.66 56,660.51
4           1432    46.5      40.6  19.9  15.1     5.2                  0      2 1,445.65 30,378.80
5           1462                                                        0      3 1,609.52 56,157.12
6           1480    57.9     47.55    18    15                          0      4 1,794.47 59,274.80
7                                                  2.6                  0      5 1,566.25 57,386.48
8                                                                       0      6 1,414.14 29,082.34
9           1388                      14    12                          0      7 1,224.17 57,366.23
10          1400                                                        0      8 1,235.08 20,875.11
11          1408                      19  14.5    4.85       80         0      9 1,320.69 50,350.34
12          1420                            14                          0     10 1,305.34 17,266.13
     v51    v52    v53    v54    v55   v56    v57       v58 v339list  X            X.1      v59
1   BAR5  WATD  WLTGRC WATRET DEFPER NOMOV DISMOV     HYTYP       NA NA          NAME  SYSTATE3
2  BARS            p86    p86    p86   p86    p86 p288col 3       NA NA                    p320
3  11.96      0      0 174.52   0.00    45    250      (Fn)        1  1         Punan         2
4   7.89 333.05      3  84.98  25.00     6     50      (Fn)        2  2         Batek         2
5  12.12      0      0 174.52   0.00     7    140                  3  3          Kubu         3
6  10.75      0      0 306.71   0.00                    (N)        4  4       Shompen         3
7  13.05 113.09      0 259.95  25.00                    (M)        5  5          Onge         4
8   7.83 236.42      2 196.74  41.67                      M        6  6         Jarwa         4
9  19.63 453.99      4  65.80  41.67                               7  7 Ayta-Pinatubo         2
10  7.04 370.84      3 178.00  41.67                               8  8  North Island         4
11 15.19      0      0 174.52   0.00                      N        9  9        Semang         4
12  5.31 474.32      2 129.28  41.67                     Fm       10 10        Veddah         4
      v60     v61     v62   v63      v64      v65      v66      v67       v68      v69      v70
1  MHSSET MHSSET2 FAMHOUS G2MHS G2HMSET2 G2HMSET3 G2BASORD PREVALUE PREDG2MH  GIFAMSZ  G2FAMSZ 
2    p320    p320    p320  p320     p320     p320     p320     p320      p320     p320     p320
3       3       2    1.68  4.93        1        1        3     7.98      6.35     6.09     8.31
4       5       1    1.30 11.42        5        3        6     4.20      4.21     4.88    14.91
5                                      9                                                       
6       5       1    0.84 10.16        6        3        4     3.75      3.75     5.33     8.56
7       1       2                                                                 3.27     7.52
8       1       2                                                                              
9       5       1                                                        3.82     2.72     8.66
10      5       1                                                        3.95     2.92    11.41
11      1       2                                                                 5.21    10.43
12      5       1                                                        5.07     3.64     7.53
      v71       v72     v73    v74      v75    v76     v77     v78       v79     c80  X.2
1  GIMHS  POLYSCAL  POLPOS  CLASS  PEROGAT  MONEY  OCCSPE  COMMUN  COMSTFUN  OWNERS  NO. 
2    p320      p320    p320   p320     p320   p320    p320    p320      p320    p320 Case
3                 2       3      1        2      2       1       4         1       1    1
4   3.752         2       3      1        1      1       1       2         1       4    2
5                 2       3      1        1      1       1       5         1       1    3
6   6.332         2       3      1        1      1       1       3         1       2    4
7   0.882         2       1      1        1      1       1       1         1       1    5
8   0.272         2       1      1        1      1       1       1         1       1    6
9   2.622         2       3      1        1      1       1       1         1       3    7
10  4.392         2       1      1        1      1       1       4         1       1    8
11  0.592         2       3      1        1      1       1       4         1       1    9
12  3.312         2       3      1        1      1       1       5         1       1   10
                X.3            X.4
1            STATE           NAME 
2                                 
3        Indonesia          Punan 
4      Philippines          Batek 
5        Indonesia           Kubu 
6  Nicobar Islands        Shompen 
7  Andaman Islands           Onge 
8  Andaman Islands          Jarwa 
9      Philippines  Ayta-Pinatubo 
10 Andaman Islands   North Island 
11        Malaysia         Semang 
12       Sri Lanka         Veddah 


 save(Vauxforag,file="BinfordVaux.RData")  #<-- March 10, 2012
 data(Vauxforag) #doesn't work here: data() only finds datasets in an installed package; not needed for now (only with the main program)
 data(BinfordVaux)
 library(stats)
 princomp(Vauxforag) #works
#now make VauxB.csv from the principal components
setwd("C:/My Documents/Binford/Best_files/Vaux5_7")
VauxB <- read.csv("VauxB.csv")
save(VauxB,file="VauxB.RData")

Extract results from princomp

Std and Rotation
#http://stat.ethz.ch/R-manual/R-devel/library/stats/html/princomp.html
##https://stat.ethz.ch/pipermail/r-devel/1998-August/018454.html 1998 revised
###http://www.stat.ualberta.ca/~kcarrier/STAT575/PCA_R.doc Labdsv library
#Modify their example:
princomp(Vauxforag, cor = TRUE) # =^= prcomp(Vauxforag, scale=TRUE)
prcomp(Vauxforag, scale=TRUE) #THESE ARE ALL THE RESULTS NEEDED STOP HERE!!!
## Similar, but different:
## The standard deviations differ by a factor of sqrt(49/50)

summary(pc.cr <- princomp(Vauxforag, cor = TRUE))
loadings(pc.cr)  ## note that blank entries are small but not zero
print(loadings(pc.cr), digits = 4, cutoff = 0.25)  ## note that blank entries are small but not zero; cutoff must be below 1, since loadings never exceed 1
prcomp(Vauxforag, scale=TRUE)
plot(pc.cr) # shows a screeplot.
biplot(pc.cr)
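The proportion of variance per component shown by summary() can also be computed directly from the princomp object (a sketch, using pc.cr from above):

```r
# proportion and cumulative proportion of variance explained per component
pv <- pc.cr$sdev^2 / sum(pc.cr$sdev^2)
cbind(Proportion = pv, Cumulative = cumsum(pv))
```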

#loadings(x)
## S3 method for class 'loadings':
#print(x, digits = 3, cutoff = 0.1, sort = FALSE, ...)
## S3 method for class 'factanal':
#print(x, digits = 3, ...)


## Formula interface
princomp(~ ., data = Vauxforag, cor = TRUE)

## NA-handling
Vauxforag[1, 2] <- NA
pc.cr <- princomp(~ Murder + Assault + UrbanPop,  #variable names copied from the USArrests help-page example; substitute Vauxforag columns
                  data = Vauxforag, na.action=na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]  #works

Chapter 2. Specific databases

Foragers

setwd("C:/My Documents/Binford/best_files")
forag <- read.csv("Binford47vars339.csv")
save(forag,file="Binford47vars339.RData")
#try:
forag <- read.csv("bin.csv")
save(forag,file="bin.RData")
save(forag,file="Binford47vars339.RData")


Hamilton, Marcus J., Bruce T. Milne, Robert S. Walker, Oskar Burger, and James H. Brown. 2007. The complex structure of hunter–gatherer social networks. Proceedings of the Royal Society B (UK). James H. Brown is known for "The fractal nature of nature: power laws, ecological complexity and biodiversity," with Vijay K. Gupta, Bai-Lian Li, Bruce T. Milne, Carla Restrepo and Geoffrey B. West. Data supplement.

Read and change an entry in an Rdata file

edit the csv file
setwd("C:/My Documents/MIjaj")
jajman<-read.csv("jaj.csv")
save(jajman,file="jajman.Rdata")
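Instead of re-editing the csv file, a single entry can be changed on the loaded object and re-saved directly (a sketch; the row index and column name are illustrative, since the columns of jaj.csv are not shown here):

```r
load("jajman.Rdata")            # restores the object jajman
jajman[3, "v2"] <- 7            # illustrative row and column
save(jajman, file="jajman.Rdata")
```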
EduMod85 - EduMod86

Chapter 3. Extending the database

Use Microsoft Access to