CoSSci Supercomputer Gateway

From InterSciWiki
DELETE: NOT NOT and John Saska. Tom: that name will be fine. The only thing that matters is that we can ssh to it.

Submission Summary


Screen Shot 2013-08-16 at 10.24.02 AM.png
Title: Complex Social Science (CoSSci) Supercomputer Gateway: Autocorrelation Modeling, Kinship Network Modeling, k- and pairwise cohesion in Large Networks & Open Opportunities for Online Education.
Author(s): White, Douglas R. (1); Oztan, Tolga B. (2); Sinkovits, Robert (3); Menezes, Telmo (4)

Institute(s): (1) UC Irvine, IMBS, La Jolla, CA, United States; (2) UC Irvine, MBS, Irvine, CA, United States; (3) SDSC San Diego Supercomputer Center, Gordon Applications Lead, San Diego, CA, United States; (4) French National Center for Scientific Research (CNRS), EHESS, CAMS, Paris, France

Community Account completed by Tom Uram; next: Nicole Wolter. Galaxy Certificate? No phone. UCSD LEG-2230

Co-PIs Sept 2013-Sept 2014
Suresh Marru
Tolga Oztan
Paul Rodriguez
Michael D. Fischer CSAC
Potential 2014: David Henig at Kent, PhD of Lyon, CSAC
Potential 2014: Daniel Wigmore-Shepherd, M.Phil. of Lyon CSAC
Potential 2014: Tom Uram
Potential 2014: Stephen M. Lyon CSAC

for 2014

"Complex Social Science at SDSC Progress Report 2012-2013"

Progress Report ComplexSocialScienceatSDSCprogressReportB.docx

The Complex Social Science (CoSSci) Supercomputer Gateway project developed a Galaxy gateway site at UCI connected to Trestles and to a Virtual Machine (VM) at UCI, with analytic R software, much improved from last year, duplicated on each machine. For the Standard Cross-Cultural Sample (N=186, V=2800), run time for a single variable is two minutes on the VM but 15-20 minutes on Trestles because of queue time. In 2014, with the help of Paul Rodriguez, we will implement a randomForest application on Trestles and the VM that will estimate likely near-complete subsets of variables, so that Trestles can do more of the main modeling in 1-3 runs. Early modeling projects on Trestles and in the R GUI have produced outstanding results, each matching the other, with world maps of key variables. Multiple mapping using R scripts for original and imputed variables was a major accomplishment that greatly enhances research and classroom learning. The Galaxy site is much easier for students to use than R GUI modeling on work, home or classroom computers. Downloads of working R GUI scripts from Trestles, with model output, provide a learning ramp for students and researchers. The first online class starts on Sept 15, 2013, and runs for 12 weeks. Online coursework will include distribution through C-Commons to new instructors, which should greatly boost usage. Some fundamental research questions have already been addressed by some of the 30 chapter authors of the Wiley Companion to Cross-Cultural Research, and conference presentations by the core researchers (White, Eff, Dow, Oztan) have spread the word about the new statistical modeling and datasets now widely available through the CoSSci project. With the help of Co-PI Suresh Marru and Adobe software (e.g., Education and Science Communities), we are working toward services that expand usage, communities of software users, and courseware.
UCILearn online is distributing courseware for our social science Gateway projects, which also include access to large-network software at Gordon for measuring cohesive subgroups and the effects of multiconnectivity. The CoSSci Gateway will grow to include complex network analysis and simulation models of the evolutionary aspects of human complexity. Datasets include environmental and climatic data and will grow to include disease and genetic data at the population level, as well as historical data on the growth of cities and trade routes, historical empires, and complex economies, while also modeling interfaces between ethnographic and historical data and archaeology. A related historical project will provide interfaces with data on the comparative study of historical empires. A new database corrects postcolonial Ethnographic Atlas coding biases when compared against coded archaeological data. As computational power grows for managing networked data (limited causal graph explorations, but also path analysis of larger networks of observed data and panel analysis of temporal sequences), larger-scale supercomputer modeling can address more complex questions in the social, economic and historical sciences. In addition to updated analytic software contributions from Fischer's group at CSAC, University of Kent (UK), Co-PI Fischer will provide the resource services framework for people to integrate summaries of ethnographic information relevant to coded data variables, and will provide modeling examples and discussions of statistical inference and problems of interpretation and validation. He and UK’s Janet Bagg have created a summarizing algorithm for ethnographic literature that can link specific categories in coded data, through Murdock's Outline of Cultural Materials (OCM), to summarized content from ethnographies with page references, a tremendous boon for students, coders, and analysts. Virtual servers at Kent (UK) will link to CoSSci.
In 2014 we will provide major online service to the 30 chapter authors of our group's Wiley Companion to Cross-Cultural Research (Editors White, Dow, Eff and Gray) who are using our CoSSci modeling facilities. These authors are likely contributors to future courses (online and off) that use our portal for their students.

From Mike Fischer, CSAC, Kent: Know not how this figures into the process, but if not premature, might be worth mentioning that in 2014 will be an attempt to build up the services framework for the resource so that people can integrate into specific applications (probably mostly for teaching in the form of canned examples/problems), and integrating external services to support the work on the resource platform (e.g. summaries of ethnographic information relevant to the variables).

Janet Bagg and I have created a summarising algorithm for ethnographic literature that can be standalone or leverage the OCM. If we can link the variables used in the SCCS with some content to the OCM we should be able to deliver summarised content w/page references (which avoids many of the problems we have with copyright at HRAF). I am going to New Haven last week of this month to finalise an architecture that reorganises the back end of the HRAF application, which includes hooks for services like this. I also have the full contents of eHRAF at Kent, and can provide this service on an experimental basis from one of my virtual servers rather sooner than an official HRAF service which would be at least late summer of 2014.

I also have to check to see if I have to do any paperwork even though there are no resources in the proposal for Kent. Will do that today.

I'm travelling on Thursday to US, but will have good email etc. while there.

I mean relate the codes used in the SCCS to the OCM codes which can then be used to fetch text. HRAF analysts use OCM subject categories and values to mark up every paragraph of each text. It's actually a poor thesaurus, but works surprisingly well as a context coding system, which is what you want for comparative research … Janet and I have alternative ways to find topics in the texts, leveraging the OCM coding they do helps a lot.

Abstract The Complex Social Science (CoSSci) Supercomputer Gateway (portal implementation 2013 at UCI/SDSC@UCSD) provides remote access for researchers and classrooms or online classes to do advanced computing in social science and environmental comparative studies of human societies. Four major comparative databases are available to date with the following N=cases and V=variables: Standard Cross-Cultural Sample (N=186,V=2800); Binford Foragers (N=339,V=1800); Ethnographic Atlas (N=1270,V=399); Western Indians: Comparative Environments, Languages and Cultures of 172 Western American Indian Tribes. Modeling includes autocorrelation controls, imputation of missing data, Hausman tests for exogeneity, and many other inferential statistical tests, world and detailed mapping of variables. Work in 2014 will include randomForest estimation of clusters of independent variables and systemfit modeling of networks of variables to obtain path analyses and temporal panel effects.


  • The UC Complex Social Science (CoSSci) Supercomputer Gateway (portal implementation 2013 at UCI/SDSC@UCSD) provides remote access for researchers and classrooms or online classes to do advanced computing. (Large) network k-cohesion (White et al.) and pairwise cohesion (Oztan et al.) return linked lists of all k-connected subsets and k-connected pairs. Menezes' Synthetic tools analyze and perform evolutionary modeling of complex networks, including the 90+ kinship networks in *net format hosted at the Kinsources website, and return variables for societal databases such as those below.
  • Causal graph modeling for rectangular databases, with network W matrices for inclusion of autocorrelation effects, is available on-line for a growing number of datasets. Currently these include the Ethnographic Atlas (n=1500 societies), Standard Cross-Cultural Sample (n=186), Binford's Foragers (n=339), Jorgensen's Western Indians (n=172), and will eventually include many new cross-national, cross-polity, cross-corporate and comparative psychology datasets. Each new dataset requires its own W-matrix networks and, if missing data are to be imputed, principal components of fully coded data suitable for multiple imputation. These datasets are intended for use in online courses (Coursera; Moodle) on Complex Networks, Cross-Cultural/-Polity/-National/-Economic studies, quantitative methods in the Social Sciences, and a great variety of topical courses. Results of early studies are reported. A Wiley 2013 textbook, Companion to Cross-Cultural Research (Eds. White, Eff, Dow, Gray), will be useful for instructors and contains chapters published on-line that are useful guides for students learning complex network and comparative approaches in the Social Sciences. Principal keywords: Causality, Complexity.
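As a rough illustration of what a network W matrix contributes, the sketch below builds a row-normalised weight matrix from made-up distances between six hypothetical societies and uses it to form a network-lagged version of a variable. This is one common construction, assumed here for illustration only; the names `coords`, `D`, `W`, `y` and `Wy` are hypothetical and the code is not taken from the CoSSci scripts.

```r
# Toy example: 6 hypothetical societies with random map coordinates.
set.seed(4)
n      <- 6
coords <- cbind(runif(n), runif(n))

D <- as.matrix(dist(coords))   # pairwise distances between societies
W <- 1 / D                     # closer societies get larger weights
diag(W) <- 0                   # a society is not its own neighbour
W <- W / rowSums(W)            # row-normalise so each row sums to 1

y  <- rnorm(n)                 # stand-in for a coded variable
Wy <- as.vector(W %*% y)       # weighted average of the neighbours' values

print(round(W, 2))
print(round(Wy, 3))
```

Each entry of `Wy` is the weighted mean of the other societies' values, which is the kind of term an autocorrelation control can enter into a regression.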
Presentation type: Paper
Session title: Large Scale Networks Analysis
Keywords: Community, Software, Statistics

Models for v51 FaHelpsMoWithInfant

UCI VM CLICK AND EAF1c LOCAL for a 2 minute model.

Click each image twice to enlarge. Click command +++ to enlarge wiki page size

FaHelpsMoWithInfant rsq=.38 v51 v1257,v1258,v154,v52,v53,v626,v817,v921,v819,sqv819 rsq=.38 delete v819
FaHelpsMoWithInfant rsq=.NONE v51 v819 was dropped BUT THIS DOES NOT WORK WITHOUT v819 as an UNrestricted variable (see below)
FaHelpsMoWithInfant rsq=.38 v51 v819 an UNrestricted variable which is squared in sqv819 to work as an independent variable
SQUARE OF FaHelpsMoWithInfant rsq=.35 sqv51 as depvar created as NEW VARIABLE but Rsq not improved
v1197 WiMo Avoidance v152,v154,v1685,v203,v234,v236,v64,v68 delete v80
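The v819 / sqv819 notes describe a predictor that enters the model together with its square. As we read them, the squared term is computed from the original variable, so that variable has to stay available. A minimal sketch with simulated data, assuming plain `lm()` rather than the Dow & Eff functions (the names `x`, `sqx` and `y` are hypothetical stand-ins for v819, sqv819 and v51):

```r
# Simulated data with a curved relation between x and y.
set.seed(1)
n   <- 186                                   # same number of cases as the SCCS
x   <- runif(n, 0, 10)
y   <- 2 + 1.5 * x - 0.15 * x^2 + rnorm(n)
dat <- data.frame(y = y, x = x, sqx = x^2)   # sqx is derived from x

fit_linear <- lm(y ~ x, data = dat)          # linear term only
fit_square <- lm(y ~ x + sqx, data = dat)    # linear term plus its square

r2_linear <- summary(fit_linear)$r.squared
r2_square <- summary(fit_square)$r.squared
print(c(linear = r2_linear, with_square = r2_square))
```

Adding the squared term can never lower R-squared in ordinary least squares; whether the gain is worth keeping is a separate judgment.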

Models for v1197 Wi Mo Avoidance

CLICK AND EAF1c LOCAL for UCI VM and 2 minute model results. Results show that you have to delete v80 because of its high VIF (variance inflation factor, here from overlap with v68). You can get your own copy of a csv file of results equivalent to Galaxy1-EAF1c.csv - 13 - 15 - 17 by filling in the EAF1c LOCAL windows appropriately and pressing the blue Execute button at the bottom of the Galaxy screen. Each result will vary slightly because of probabilistic variation in the imputation of missing data.

Screen Shot 2013-06-05 at 9.35.47 AM.png v1197 Wife's Mother Avoidance rsq=.44 v152,v154,v1685,v203,v234,v236,v64,v68,v80

v152	+Scale 4- Urbanization
v154	-Scale 6- Land Transport
v1685	-Chronic Resource Problems (resolved Ratings)
v203	+Dependence on Gathering
v234	+Settlement Patterns (Complex settlements)
v236	+Jurisdictional Hierarchy of Local Community
v64	-Population Density
v68	-Form of Family (see 79, 80)
v80	+Family Size (Delete because of high VIF)
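To show why a high VIF flags a predictor such as v80 for deletion, here is a small base-R sketch on simulated data. It assumes the textbook definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors; `x1`, `x2` and `x3` are hypothetical stand-ins (x2 is built to overlap with x1, much as v80 overlaps with v68), and this is not the code the gateway runs.

```r
set.seed(2)
n  <- 186
x1 <- rnorm(n)                  # stand-in for v68
x2 <- x1 + rnorm(n, sd = 0.3)   # stand-in for v80, deliberately close to x1
x3 <- rnorm(n)                  # an unrelated predictor
X  <- data.frame(x1, x2, x3)

# VIF for predictor j: regress it on all the other predictors.
vif_one <- function(j, X) {
  r2 <- summary(lm(X[[j]] ~ ., data = X[-j]))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(seq_along(X), vif_one, X = X)
names(vifs) <- names(X)
print(round(vifs, 2))
```

The two overlapping predictors get large VIFs while the unrelated one stays near 1, so dropping one of the overlapping pair is the usual remedy.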

Screen Shot 2013-06-05 at 12.59.17 PM.png Wife's Mother Avoidance rsq=.425 v152,v154,v1685,v203,v234,v236,v64,v68 DROP v152

Screen Shot 2013-06-05 at 1.14.31 PM.png Wife's Mother Avoidance rsq=.414 v154,v1685,v203,v234,v236,v64,v68 DROP v154

Screen Shot 2013-06-05 at 1.41.20 PM.png Wife's Mother Avoidance rsq=.463 v154,v1685,v203,v234,v236,v64,v68,v818 ADDED v818

v154	-0.046 p=.157 Scale 6- Land Transport	
v1685	-0.089 p=.012 Chronic Resource Problems (resolved Ratings)
v203	+0.169 p=.003 Dependence on Gathering
v234	+0.055 p=.085 Settlement Patterns (Complex settlements)
v236	+0.246 p=.003 Jurisdictional Hierarchy of Local Community
v64	-0.142 p=.000 Population Density
v68	-0.036 p=.011 Form of Family (see 79, 80)
v818	-0.013 p=.022 Imptnc Gathering
table(sccsA$v203,sccsA$v818)  # cor.test = 0.7861941
    0  5 10 15 20 25 30 35 40 45 50 65 75
 0 19 63  0  1  0  3  0  0  0  0  0  0  0
 1  1 39  0  0  5  5  1  0  0  0  0  0  0
 2  0 11  1  0  3  8  0  0  0  0  0  0  0
 3  0  2  0  0  2  2  0  1  1  0  1  0  0
 4  0  0  0  0  0  3  1  1  2  1  0  1  0
 5  0  1  0  0  0  1  0  0  0  2  0  0  0
 6  0  0  0  0  0  0  0  0  0  2  0  0  1
 8  0  0  0  0  0  0  0  0  0  0  0  0  1
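The cross-tabulation above pairs `table()` with a correlation test to check how much two predictors overlap. The same check can be sketched on simulated data; `gather` and `imp_gath` below are hypothetical stand-ins for v203 and v818, not the SCCS values.

```r
set.seed(3)
gather   <- sample(0:8, 186, replace = TRUE)                       # ordinal scale
imp_gath <- 5 * gather + sample(c(0, 5, 10), 186, replace = TRUE)  # related percentage-like score

tab <- table(gather, imp_gath)     # rows: gather, columns: imp_gath
print(tab)

ct <- cor.test(gather, imp_gath)   # Pearson correlation by default
print(ct$estimate)
```

A correlation as high as the 0.786 noted above suggests the two variables carry much the same information, which is worth weighing when both appear in one model.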


Slides for talk at INSNA Hamburg 2013

CoSSci Background, Screenshots and Instructions <-- click here

Screen Shot 2013-04-09 at 4.34.21 PM.png
Screen Shot 2013-04-09 at 4.18.52 PM.png

The Galaxy/CoSSci screen will have the blanks prefilled for entering variables for models in the EAF1c Dow & Eff Functions1 Model. A new dependent variable is being added at the top of the screen. Note: After 10 minutes, click the name of your request and the upper-right whirly; when the diskette image appears, click it, and the *.csv can be downloaded from your "downloads" list, which may not be visible on your screen but open in the background.

How to upload large datasets

A new message has been posted to XSEDE User News.

Categories: Training, Science Gateways, Conferences
Start time: 16 Apr, 2013 09:00 CDT
End time: 18 Apr, 2013 18:00 CDT
Posted on 25 Mar, 2013 21:15 UTC by Suresh Marru

Globus Online is a service that makes reliable, secure transfer and sharing of large scientific datasets as simple as Dropbox. It’s also the first service accepted by XSEDE Operations for production deployment. That means Globus Online is an official software service on XSEDE, and an excellent tool to complement your use of XSEDE resources.

Whether your campus is a current or future user of XSEDE resources, we would like to invite you to GlobusWORLD 2013, April 16-18 at Argonne National Laboratory.

There will be a strong schedule of speakers and tutorials showcasing the latest and greatest Globus features, all geared toward solutions for research computing and HPC facility managers, administrators, and developers. We're also launching our new Globus Online sharing service, making GlobusWORLD the ideal time to learn about the powerful new capabilities Globus offers for managing big research data. Learn more and register today.

Here are a few highlights from our program of speakers:

  • David Lifka from Cornell on campus research computing needs and the role of cloud technology
  • Brock Palen of the University of Michigan on the dreams and desires of a centralized HPC provider
  • David Skinner from NERSC on web APIs for big science
  • State of the (Globus) Union from Ian Foster of Argonne and University of Chicago
  • Sneak Previews of Globus Online dataset and metadata management, and Globus Genomics
  • Tutorial topics include Advanced Scripting, Endpoint Setup with Globus Connect Multiuser, the Globus API, and Advanced Endpoint Configuration

You’ll also have plenty of time for networking and computing camaraderie. On Wednesday evening, we’ll head to the Adler Planetarium for dinner under the stars and the Chicago skyline.

More info and registration are here:

Proposed additions

Run CoSSci


1: EAF1
format; tabular, database ?
press green button: to run this job again
1: EAF1
format; tabular, database ?
runs relaimpo: says
Unnamed history
9.3 KB
2: DRW2
1: EAF1
192 lines
format: tabular, database: ?

This is the global version of package relaimpo. If you are a non-US user, a version with the interesting additional metric pmvd is available from Ulrike Groempings web site at
[1] "addesc" "args

"1","Dependent variable='valchild': Degree to which society values children"
"v1260","Total Pathogen Stress",0.116312467067,0.078,0.394,"",1.406,0

Direct download

Use Dow and Eff Simple Functions Vers 0 CoSsci ---> <-- Alan Lomax youtube added - CLICK THE SCREEN

If you want to save the workspace on your own machine, do the following from your root directory (on a Mac, for example):

 setwd('sccs')
 load(url(""),.GlobalEnv)
 save(bdd,bew,bll,brr,doOLS,doMI,kln,gSimpStat,CSVwrite,mkdummy,addesc,chkvarbs,chkpmc,newaux,sccsA,tt,sccsAkey, file="DE7.Rdata")

That saves everything in the single Rdata file DE7.Rdata. Of course, if you just want to save one of the data files, you can do this (example is for sccsA):

 save(sccsA,file="sccsA.Rdata")
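For readers who want to try the save/load round trip without the real workspace, the sketch below uses two made-up objects and a temporary file. `toyA` and `toyKey` are hypothetical stand-ins for sccsA and sccsAkey; only base R `save()` and `load()` are used.

```r
# Toy objects standing in for the real data and its key.
toyA   <- data.frame(socname = c("A", "B"), v51 = c(1, 3))
toyKey <- data.frame(variable = "v51", description = "toy description")

f <- file.path(tempdir(), "toy.Rdata")
save(toyA, toyKey, file = f)   # several objects go into one .Rdata file

rm(toyA, toyKey)               # simulate a fresh session
restored <- load(f)            # load() returns the names of the restored objects
print(restored)
print(toyA)
```

Checking the names returned by `load()` is a quick way to confirm that the objects a later script expects are actually in the file.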

Now go to Dow and Eff Simple Functions

Now go to Dow_and_Eff_Simple_Functions_Vers_0

Start downloading from

Then, skip the load just below and substitute



a<-sccsAkey[evm,];a[grep("catego",a$varbtype),] #make sure variables are ordinal
Error: object 'sccsAkey' not found

The shell

is it

Setup at UCI

Here is your current vhost config.

Virtual host socsci

 <VirtualHost>
    DocumentRoot /home/socsci/public_html
    ErrorLog /home/socsci/logs/error_log
    TransferLog /home/socsci/logs/access_log
    CustomLog /home/socsci/logs/access_log combined
    ScriptAlias /cgi-bin/ /home/socsci/public_html/cgi-bin/
    Options SymLinksIfOwnerMatch Includes
    DirectoryIndex index.php index.htm index.html index.cgi index.shtml

    RewriteEngine on
    #RewriteRule ^/static/style/(.*) /home/nate/galaxy-dist/static/june_2007_style/blue/$1 [L]
    #RewriteRule ^/static/scripts/(.*) /home/nate/galaxy-dist/static/scripts/packed/$1 [L]
    #RewriteRule ^/static/(.*) /home/nate/galaxy-dist/static/$1 [L]
    #RewriteRule ^/favicon.ico /home/nate/galaxy-dist/static/favicon.ico [L]
    #RewriteRule ^/robots.txt /home/nate/galaxy-dist/static/robots.txt [L]
    RewriteRule ^(.*)$1 [P]

    <Directory "/home/socsci/public_html">
        AllowOverride All
        Options Indexes FollowSymLinks ExecCGI
        Order allow,deny
        Allow from all
    TypesConfig /etc/mime.types



Gateway = UC Complex Social Science (CoSSci) Supercomputer Gateway

tool1 = to be named

later tool=

UChicago pre-version

Hi Doug:

While waiting for the UCI OIT setup, I've installed Galaxy at UChicago to get started on a system to migrate/recreate at UCI. This initial deployment can be found here:

On that first page, at the bottom of the left column, you'll see a UCISCCN tools section, which contains a ComputeTool1 entry. The next steps here are to include:

  • Integrating your initial analysis tools
  • Enabling compute on Trestles
  • Migrating the site to UCI

Not necessarily in that order.

You can read more about Galaxy here:

At this point, I just wanted you to see which direction we're headed. There's clearly still plenty to be done. Have you decided on "CoSSci Gateway" as the name?



Saska says:


working now. your home dir perms were modified.

/jsaska UCI/OIT/EUS CATEGORY = Unix PROBLEM TYPE = Miscellaneous

Thomas had said: not working yet is an alias for has address


Hi John:

I am able to log in. Could you address these questions from earlier?

>> - Will I have full filesystem access to the application directory used by Apache? If so, I can set up Galaxy once you've set up the related Apache bits.

>> - Do you run a database server where you could host a database for us? If not, I could start with just the Sqlite support for now.


John to Tom

you can log in here but site will not be active until DNS is updated

(every time i delegate a domain to a server, it normally takes about 2 whole days before i ever see the website again.WHY SO LONG...

Nameserver propogation - Web hosting‎)

host: user: socsci

mysql: user: socsci


/jsaska UCI/OIT/EUS John asks:

- Will I have full filesystem access? Francisco says:

no, you have full rwx access to your home dir.

- Do you run a database server where you could host a database for us? If not, I could start with just the Sqlite support for now.

Francisco says:

/usr/bin/mysql -u socsci -p user: socsci

On Mon, 17 Dec 2012, Francisco Lopez wrote:

> Hi John,
>
> Please create a temporary site with MySQL database for Thomas.
>
> Sincerely,
>
> -Francisco

CoSSci: UC Complex Social Science (CoSSci) Supercomputer Gateway - List of XSEDE Gateways

Trestles & Gordon

  • Trestles - online courses usage
  • Gordon - researcher projects
Menezes - Synthetic tools - kinship data
Oztan - pairwise cohesion - foragers, coauthorships
White & Sinkovits - k-cohesion - Gordon: World economy 5 sectors, coauthorships
White & Sinkovits - regge - World economy / Tlaxcala 2 villages


- Core Software - Irvine Social Science Gateway Anthon Eff -- Manualv6.pdf-- CCDmanual0.pdf / ACCCR

How to Turn Your Project into a Science Gateway (background: Obsolete)

INSNA paper May 2013

Dear Douglas White,

Thank you for submitting your abstract proposal entitled "Complex Social Science (CoSSci) Supercomputer Gateway: Autocorrelation Modeling, Kinship Network Modeling, k- and pairwise cohesion in Large Networks & Open Opportunities for Online Education." for the XXXIII Sunbelt Social Networks Conference of the International Network for Social Network Analysis (INSNA) which will take place in Hamburg, Germany from May 21-26, 2013!

For future reference, please keep your abstract reference number:


You will be notified about the acceptance by January 20, 2013.

For all questions concerning Lecture- and Poster-sessions please contact us at:

For technical questions regarding your abstract submission please contact

Abstract submission system:

Thank you again for your contribution.

Best regards,

The local organizers: Betina Hollstein, Sonja Drobnic and Michael Schnegg

What goes into CoSSci

Opening Screenshot

Screen Shot 2013-04-21 at 9.19.38 AM.png