Submission Summary


Screen Shot 2013-08-16 at 10.24.02 AM.png
Title: Complex Social Science (CoSSci) Supercomputer Gateway: Autocorrelation Modeling, Kinship Network Modeling, k- and pairwise cohesion in Large Networks & Open Opportunities for Online Education.
Author(s): White, Douglas R. (1); Oztan, Tolga B. (2); Sinkovits, Robert (3); Menezes, Telmo (4)

Institute(s): (1) UC Irvine, IMBS, La Jolla, CA, United States; (2) UC Irvine, MBS, Irvine, CA, United States; (3) SDSC San Diego Supercomputer, Gordon Applications Lead, San Diego, CA, United States; (4) French National Center for Scientific Research (CNRS), EHESS, CAMS, Paris, France

Co-PIs Sept 2013-Sept 2014
Suresh Marru
Tolga Oztan
Paul Rodriguez
Michael D. Fischer CSAC
Potential 2014: David Henig at Kent, PhD of Lyon, CSAC
Potential 2014: Daniel Wigmore-Shepherd, M.Phil. of Lyon CSAC
Potential 2014: Tom Uram
Potential 2014: Stephen M. Lyon CSAC
"Complex Social Science at SDSC Progress Report 2012-2013"

Progress Report ComplexSocialScienceatSDSCprogressReportB.docx

The Complex Social Science (CoSSci) Supercomputer Gateway project developed a Galaxy gateway site at UCI connected to Trestles and to a Virtual Machine (VM) at UCI, with analytic R software, much improved from last year, duplicated on each machine. For the Standard Cross-Cultural Sample (N=186, V=2800), VM run time for a single variable is two minutes, but 15-20 minutes at Trestles because of queue time. In 2014, with the help of Paul Rodriguez, we will implement a randomForest application at Trestles and the VM that will estimate likely near-complete subsets of variables, so that Trestles can do more of the main modeling in 1-3 runs. Early modeling projects on Trestles and in the R GUI have produced outstanding results, each matching the other, with world maps of key variables. Multiple mapping using R scripts for original and imputed variables was a major accomplishment that greatly enhances research and classroom learning. The Galaxy site is much easier for students to use than R GUI modeling on work, home or classroom computers. Downloads of working R GUI scripts from Trestles, together with model output, provide a learning ramp for students and researchers. The first online class starts on Sept 15, 2013 and runs for 12 weeks. Online coursework will include distribution through C-Commons to new instructors, which will greatly boost usage. Some fundamental research questions have already been addressed by some of the 30 chapter authors of the Wiley Companion to Cross-Cultural Research, and conference presentations by the core researchers (White, Eff, Dow, Oztan) have spread the word about the new statistical modeling and datasets now widely available through the CoSSci project. We are working on eventual servicing to expand usage, software communities of use, and courseware with the help of Co-PI Suresh Marru and Adobe software (e.g., Education and Science Communities).
UCILearn online is distributing courseware for our social science Gateways projects, which also include access to large-network software for measuring cohesive subgroups and effects of multiconnectivity at Gordon. The CoSSci Gateway will grow to include Complex Network analysis and simulation models of the evolutionary aspects of human complexity. Datasets include environmental and climatic data and will grow to include disease and genetic data at the population level, as well as historical data on the growth of cities and trade routes, historical empires, and complex economies, while also modeling interfaces between ethnographic and historical data and archaeology. A related historical project will provide interfaces with data on the comparative study of historical empires. A new database corrects postcolonial Ethnographic Atlas coding biases when compared against coded archaeological data. As computational power grows for managing networked data (limited causal graph explorations, but also larger networks of observed data, path analysis, and panel analysis of temporal sequences), larger-scale supercomputer modeling can take on more complex questions in the social, economic and historical sciences. In addition to updated analytic software contributions from Fischer's group at CSAC, University of Kent (UK), Co-PI Fischer will provide the resource services framework for people to integrate summaries of ethnographic information relevant to coded data variables, and will provide modeling examples and discussions of statistical inferences and problems of interpretation and validation. He and UK’s Janet Bagg have created a summarizing algorithm for ethnographic literature that can link specific categories in coded data, through Murdock's Outline of Cultural Materials (OCM), to deliver summarized content from ethnography page references, a tremendous boon for students, coders, and analysts. Virtual servers at Kent (UK) will link to CoSSci.
In 2014 we will provide major online service to the 30 chapter authors of our group's Wiley Companion to Cross-Cultural Research (Editors White, Dow, Eff and Gray) who are using our CoSSci modeling facilities. These authors are likely contributors to future courses (online and off) that use our portal for their students.

From Mike Fischer, CSAC, Kent: I do not know how this figures into the process, but if not premature, it might be worth mentioning that in 2014 there will be an attempt to build up the services framework for the resource so that people can integrate it into specific applications (probably mostly for teaching, in the form of canned examples/problems), and to integrate external services to support the work on the resource platform (e.g. summaries of ethnographic information relevant to the variables).

Janet Bagg and I have created a summarising algorithm for ethnographic literature that can be standalone or leverage the OCM. If we can link the variables used in the SCCS with some content to the OCM, we should be able to deliver summarised content with page references (which avoids many of the problems we have with copyright at HRAF). I am going to New Haven the last week of this month to finalise an architecture that reorganises the back end of the HRAF application, which includes hooks for services like this. I also have the full contents of eHRAF at Kent, and can provide this service on an experimental basis from one of my virtual servers rather sooner than an official HRAF service, which would be at least late summer of 2014.

I also have to check to see if I have to do any paperwork even though there are no resources in the proposal for Kent. Will do that today.

I'm travelling on Thursday to US, but will have good email etc. while there.

I mean relate the codes used in the SCCS to the OCM codes, which can then be used to fetch text. HRAF analysts use OCM subject categories and values to mark up every paragraph of each text. It's actually a poor thesaurus, but works surprisingly well as a context coding system, which is what you want for comparative research … Janet and I have alternative ways to find topics in the texts; leveraging the OCM coding they do helps a lot.

Abstract The Complex Social Science (CoSSci) Supercomputer Gateway (portal implementation 2013 at UCI/SDSC@UCSD) provides remote access for researchers and classrooms or online classes to do advanced computing in social science and environmental comparative studies of human societies. Four major comparative databases are available to date with the following N=cases and V=variables: Standard Cross-Cultural Sample (N=186,V=2800); Binford Foragers (N=339,V=1800); Ethnographic Atlas (N=1270,V=399); Western Indians: Comparative Environments, Languages and Cultures of 172 Western American Indian Tribes. Modeling includes autocorrelation controls, imputation of missing data, Hausman tests for exogeneity, and many other inferential statistical tests, world and detailed mapping of variables. Work in 2014 will include randomForest estimation of clusters of independent variables and systemfit modeling of networks of variables to obtain path analyses and temporal panel effects.


  • The UC Complex Social Science (CoSSci) Supercomputer Gateway (portal implementation 2013 at UCI/SDSC@UCSD) provides remote access for researchers and classrooms or online classes to do advanced computing. (Large) network k-cohesion (White et al.) and pairwise cohesion (Oztan et al.) return linked lists of all k-connected subsets and k-connected pairs. Menezes' Synthetic tools analyze and perform evolutionary modeling of complex networks, including the 90+ kinship networks in *net format hosted at the Kinsources website, and return variables for societal databases such as those below.
  • Causal graph modeling for rectangular databases, with network W matrices for inclusion of autocorrelation effects, is available on-line for a growing number of datasets. Currently these include the Ethnographic Atlas (n=1500 societies), Standard Cross-Cultural Sample (n=186), Binford's Foragers (n=339), Jorgensen's Western Indians (n=172), and will eventually include many new cross-national, cross-polity, cross-corporate and comparative psychology datasets. Each new dataset requires its own W-matrix networks and, if missing data are to be imputed, principal components of fully coded data suitable for multiple imputation. These datasets are intended for use in online courses (Coursera; Moodle) on Complex Networks, Cross-Cultural/-Polity/-National/-Economic studies, quantitative methods in the Social Sciences, and a great variety of topical courses. Results of early studies are reported. A Wiley 2013 textbook, Companion to Cross-Cultural Research (Eds. White, Eff, Dow, Gray), will be useful for instructors and contains chapters published on-line that are useful guides for students learning complex network and comparative approaches in the Social Sciences. Principal keywords: Causality, Complexity.
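The role of a W matrix in the autocorrelation models mentioned above can be illustrated with a minimal R sketch. Everything in it is an assumption for illustration: the coordinates and the dependent variable are synthetic, and inverse-distance weighting is just one common way to build W, not necessarily the one used in the CoSSci scripts.

```r
# Hypothetical coordinates for five societies (synthetic, not CoSSci data)
coords <- cbind(x = c(0, 1, 2, 5, 6), y = c(0, 1, 0, 4, 5))
d <- as.matrix(dist(coords))   # pairwise Euclidean distances

# Inverse-distance weights; the diagonal is set to zero so that
# no society counts as its own neighbour
W <- 1 / d
diag(W) <- 0

# Row-standardise so that each row of W sums to 1
W <- W / rowSums(W)

# Synthetic dependent variable and its network-lagged version:
# each entry of Wy is a weighted average of the other societies' values
y  <- c(2, 3, 2, 8, 9)
Wy <- as.vector(W %*% y)
print(round(Wy, 3))
```

A term such as Wy can then enter a regression as a control for network autocorrelation; a different network (language, for instance) would need its own W.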
Presentation type: Paper
Session title: Large Scale Networks Analysis
Keywords: Community, Software, Statistics

Models for v51 FaHelpsMoWithInfant

UCI VM CLICK AND EAF1c LOCAL for a 2 minute model.

Click each image twice to enlarge. Click command +++ to enlarge wiki page size

FaHelpsMoWithInfant rsq=.38 v51 v1257,v1258,v154,v52,v53,v626,v817,v921,v819,sqv819 (delete v819)
FaHelpsMoWithInfant rsq=NONE v51 v819 was dropped, BUT THIS DOES NOT WORK WITHOUT v819 as an UNrestricted variable (see below)
FaHelpsMoWithInfant rsq=.38 v51 v819 kept as an UNrestricted variable, which is squared in sqv819 to work as an independent variable
SQUARE OF FaHelpsMoWithInfant rsq=.35 sqv51 as depvar, created as NEW VARIABLE, but Rsq not improved
v1197 WiMo Avoidance v152,v154,v1685,v203,v234,v236,v64,v68,v80 (delete v80)
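The notes above use a squared predictor (sqv819) alongside its source variable (v819). A minimal sketch of that idea in plain R follows, on synthetic data with hypothetical variable names. It uses lm() directly rather than the Dow and Eff functions, so it does not reproduce their restricted/unrestricted variable lists.

```r
set.seed(1)
n <- 186
x   <- runif(n, 0, 10)                       # stand-in for a predictor such as v819
y   <- 1 + 0.8 * x - 0.07 * x^2 + rnorm(n)   # synthetic curvilinear response
sqx <- x^2                                   # analogue of sqv819

fit_both   <- lm(y ~ x + sqx)   # linear and squared terms together
fit_linear <- lm(y ~ x)         # squared term omitted

# Compare explained variance of the two specifications
print(summary(fit_both)$r.squared)
print(summary(fit_linear)$r.squared)
```

Because the second model is nested in the first, its R-squared can never be higher; whether the gain is worth keeping the extra term is a separate judgment.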

Models for v1197 Wi Mo Avoidance

CLICK AND EAF1c LOCAL for UCI VM and 2 minute model results. Results show that you have to delete v80 because of its high VIF (variance inflation factor; it is collinear with v68). You can get your own copy of a csv file of results equivalent to Galaxy1-EAF1c.csv - 13 - 15 - 17 by filling in the EAF1c LOCAL windows appropriately and pressing the blue execute button at the bottom of the Galaxy screen. Each result will vary slightly because of probabilistic variation in the imputation of missing data.

Screen Shot 2013-06-05 at 9.35.47 AM.png v1197 Wife's Mother Avoidance rsq=.44 v152,v154,v1685,v203,v234,v236,v64,v68,v80

v152	+Scale 4- Urbanization
v154	-Scale 6- Land Transport
v1685	-Chronic Resource Problems (resolved Ratings)
v203	+Dependence on Gathering
v234	+Settlement Patterns (Complex settlements)
v236	+Jurisdictional Hierarchy of Local Community
v64	-Population Density
v68	-Form of Family (see 79, 80)
v80	+Family Size (Delete because of high VIF)
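The reason for deleting v80 is its variance inflation factor. As a rough sketch of what a VIF measures, the following R code computes it by hand: each predictor is regressed on the others and VIF = 1 / (1 - R²). The data are synthetic and the variable names are hypothetical stand-ins for v68, v80 and v64; the CoSSci output reports VIFs directly, so this is only illustrative.

```r
set.seed(2)
n <- 186
form_of_family <- rnorm(n)                                      # stand-in for v68
family_size    <- 0.95 * form_of_family + rnorm(n, sd = 0.2)    # stand-in for v80, nearly collinear
pop_density    <- rnorm(n)                                      # stand-in for v64
X <- data.frame(form_of_family, family_size, pop_density)

# VIF for each predictor: regress it on all the others, then 1 / (1 - R^2)
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))
```

In this toy setup the two family variables show large VIFs while the unrelated predictor stays near 1, which is the pattern that motivates dropping one member of a collinear pair.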

Screen Shot 2013-06-05 at 12.59.17 PM.png Wife's Mother Avoidance rsq=.425 v152,v154,v1685,v203,v234,v236,v64,v68 DROP v152

Screen Shot 2013-06-05 at 1.14.31 PM.png Wife's Mother Avoidance rsq=.414 v154,v1685,v203,v234,v236,v64,v68 DROP v154

Screen Shot 2013-06-05 at 1.41.20 PM.png Wife's Mother Avoidance rsq=.463 v154,v1685,v203,v234,v236,v64,v68,v818 ADDED v818

v154	-0.046 p=.157 Scale 6- Land Transport	
v1685	-0.089 p=.012 Chronic Resource Problems (resolved Ratings)
v203	+0.169 p=.003 Dependence on Gathering
v234	+0.055 p=.085 Settlement Patterns (Complex settlements)
v236	+0.246 p=.003 Jurisdictional Hierarchy of Local Community
v64	-0.142 p=.000 Population Density
v68	-0.036 p=.011 Form of Family (see 79, 80)
v818	-0.013 p=.022 Imptnc Gathering
table(sccsA$v203,sccsA$v818)  # cor.test = 0.7861941
    0  5 10 15 20 25 30 35 40 45 50 65 75
 0 19 63  0  1  0  3  0  0  0  0  0  0  0
 1  1 39  0  0  5  5  1  0  0  0  0  0  0
 2  0 11  1  0  3  8  0  0  0  0  0  0  0
 3  0  2  0  0  2  2  0  1  1  0  1  0  0
 4  0  0  0  0  0  3  1  1  2  1  0  1  0
 5  0  1  0  0  0  1  0  0  0  2  0  0  0
 6  0  0  0  0  0  0  0  0  0  2  0  0  1
 8  0  0  0  0  0  0  0  0  0  0  0  0  1
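The cross-tabulation above pairs v203 with v818, and the comment records a correlation of about 0.79 between them. A small sketch of the same two steps, table() and cor.test(), on synthetic data with hypothetical names (the real sccsA data frame is not used here):

```r
set.seed(3)
n <- 186
gathering_dep <- sample(0:8, n, replace = TRUE)   # stand-in for v203
# Stand-in for v818: a percentage in steps of 5 that tracks the first variable
gathering_pct <- pmin(75, pmax(0, 5 * round(1.5 * gathering_dep + rnorm(n))))

tab <- table(gathering_dep, gathering_pct)        # rows: first variable, columns: second
r   <- cor.test(gathering_dep, gathering_pct)$estimate
print(tab)
print(r)
```

Two predictors this strongly correlated carry overlapping information, so it is worth checking their VIFs when both appear in one model.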


CoSSci Background, Screenshots and Instructions

Screen Shot 2013-04-09 at 4.34.21 PM.png
Screen Shot 2013-04-09 at 4.18.52 PM.png

The Galaxy/CoSSci screen will have the blanks prefilled for entering variables for models in the EAF1c Dow & Eff Functions1 Model. A new dependent variable is being added at the top of the screen. Note: After 10 minutes, click the Name of your request and the whirly icon at the upper right; when the diskette image appears, click it, and the *.csv can be downloaded from your "downloads" list, which may not be visible on your screen but in the background.

How to upload large datasets

Run CoSSci

1: EAF1
format; tabular, database ?
press green button: to run this job again
1: EAF1
format; tabular, database ?
runs relaimpo: says

This is the global version of package relaimpo. If you are a non-US user, a version with the interesting additional metric pmvd is available from Ulrike Groempings web site at
[1] "addesc" "args

"1","Dependent variable='valchild': Degree to which society values children"
"v1260","Total Pathogen Stress",0.116312467067,0.078,0.394,"",1.406,0

Direct download

Use Dow and Eff Simple Functions Vers 0 at CoSSci (Alan Lomax YouTube video added: click the screen)

If you want to save the workspace on your own machine, do the following from your root directory (on a Mac, for example):
setwd('sccs')
load(url(""),.GlobalEnv)
save(bdd,bew,bll,brr,doOLS,doMI,kln,gSimpStat,CSVwrite,mkdummy,addesc,chkvarbs,chkpmc,newaux,sccsA,tt,sccsAkey,file="DE7.Rdata")

That saves all the stuff in the single Rdata file DE7.Rdata. Of course, if you just want to save one of the data files, you can do this (example is for sccsA): save(sccsA,file="sccsA.Rdata")
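As a quick check that a saved .Rdata file contains what you expect, the save/load round trip can be tried on toy objects first. The sketch below is self-contained; the object names and contents are hypothetical stand-ins, and the file goes to a temporary directory rather than to 'sccs'.

```r
# Toy objects standing in for a data frame and its key (hypothetical contents)
sccs_toy     <- data.frame(v51 = c(1, 2, NA), v819 = c(10, 20, 30))
sccs_toy_key <- data.frame(variable = c("v51", "v819"),
                           varbtype = c("ordinal", "continuous"))

# Save both objects into a single .Rdata file
path <- file.path(tempdir(), "toy.Rdata")
save(sccs_toy, sccs_toy_key, file = path)

# Load into a separate environment so nothing in the workspace is overwritten;
# load() returns the names of the objects it restored
restored <- new.env()
loaded   <- load(path, envir = restored)
print(loaded)
```

Loading into a fresh environment, rather than .GlobalEnv, makes it easy to inspect a file's contents before replacing objects of the same name.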

Now go to Dow and Eff Simple Functions

Now go to Dow_and_Eff_Simple_Functions_Vers_0

Start downloading from

Then, skip the load just below and substitute



a<-sccsAkey[evm,];a[grep("catego",a$varbtype),] #make sure variables are ordinal
Error: object 'sccsAkey' not found
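The error above indicates that sccsAkey was not in the workspace when the line ran. A hedged sketch of the same check with a guard is shown below. It assumes that the key is a data frame indexed by variable name with a varbtype column (the column name comes from the line above) and that evm is a character vector of variable names; the toy key and the helper function are hypothetical.

```r
# Return the rows of a key table whose type mentions "catego" (categorical)
categorical_rows <- function(key, vars) {
  a <- key[vars, , drop = FALSE]
  a[grep("catego", a$varbtype), , drop = FALSE]
}

# Toy key for illustration only
toy_key <- data.frame(varbtype = c("ordinal", "categorical", "continuous"),
                      row.names = c("v51", "v68", "v819"))
toy_vars <- c("v51", "v68")
flagged  <- categorical_rows(toy_key, toy_vars)
print(rownames(flagged))

# With the real workspace, check that the objects exist before using them
if (exists("sccsAkey") && exists("evm")) {
  print(categorical_rows(sccsAkey, evm))
} else {
  message("sccsAkey or evm not found: load the workspace first")
}
```

Any variable that this check flags is categorical rather than ordinal and would need recoding (for example into dummies) before entering the model.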

Gateway = UC Complex Social Science (CoSSci) Supercomputer Gateway

tool1 = to be named

later tool=

CoSSci: UC Complex Social Science (CoSSci) Supercomputer Gateway

Trestles & Gordon

  • Trestles - online courses usage
  • Gordon - researcher projects
  • Menezes - Synthetic tools - kinship data
  • Oztan - pairwise cohesion - foragers, coauthorships
  • White & Sinkovits - k-cohesion - Gordon: World economy 5 sectors, coauthorships
  • White & Sinkovits - regge - World economy / Tlaxcala 2 villages


- Core Software - Irvine Social Science Gateway Anthon Eff -- Manualv6.pdf-- CCDmanual0.pdf / ACCCR

How to Turn Your Project into a Science Gateway (background: Obsolete)

INSNA paper May 2013

What goes into CoSSci

Opening Screenshot

Screen Shot 2013-04-21 at 9.19.38 AM.png