May 20, 2014

From InterSciWiki

ECSS symposium -- Organized by Nancy Wilkins-Diehr

  • Click for the recording of the May 20, 2014 ECSS symposium, "CoSSci High Performance Computing for Anthropology and the Social Sciences," with Lukasz's live demo followed by Doug's presentation. Lukasz's slides start at minute 4:21 and include a live demo; Doug's start at 23:00. The YouTube video runs for an hour.
  • News item for the May 20, 2014 ECSS symposium
  • Comments: Woodrow W. Denham: I was one of 19 people logged in. I got the video with no problems and watched for an hour... very interesting. As you know, I have always been especially concerned with Galton's Problem with regard to Aboriginal Australia; maybe what you're doing will solve it. -- Good presentation (May 20, 2014). Watched the whole thing and I think it was excellent. Congratulations to all. Mike
  • Doug, you asked about the recording from the tail end of your April 2014 gateway presentation, Science Gateways Community Talk: Complex Social Science Gateway. It's up at [link]. It starts off near where the May 20 talk ends, with the example of the High Gods model explored with library(bnlearn) and library(bootstrap) on the Trestles HPC cluster.

CoSSci High Performance Computing for Anthropology and the Social Sciences. May 20, 2014


Douglas White and his co-authors, mathematical anthropologist Malcolm Dow and sociocultural econometrician Anthon Eff, who are editing the Wiley Companion to Cross-Cultural Research, designed R software functions (the Dow-Eff functions) that solve the crucial problem of controlling for autocorrelation, a prerequisite for far more accurate research in the social sciences and in the other observational sciences. They extended online access to four large anthropological datasets that now cover 3,000 to 5,000 coded variables for nearly all of the ethnographic literature that applies to specific times and locations. They also implemented state-of-the-art statistical tools for imputing missing data.
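Imputation-based analyses of this kind typically fit the same model to several imputed copies of the data and then pool the results. A minimal sketch of Rubin's rules, the standard way to pool one coefficient across m imputed datasets, follows; the function name `pool_rubin` is hypothetical, and whether the Dow-Eff R functions pool results in exactly this way is an assumption.

```python
def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that estimate on each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                                      # pooled point estimate
    within = sum(variances) / m                                     # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # spread across imputations
    total = within + (1 + 1 / m) * between                          # Rubin's total variance
    return q_bar, total


if __name__ == "__main__":
    # Illustrative numbers only, not results from any dataset.
    est, var = pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44],
                          [0.010, 0.012, 0.011, 0.009, 0.010])
    print(f"pooled estimate {est:.3f}, standard error {var ** 0.5:.3f}")
```

The between-imputation term is what keeps the pooled standard error honest: it grows when the imputed values disagree, so uncertainty about the missing data is not hidden.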

Under an ECSS award, XSEDE science gateway developers at Argonne National Lab (Tom Uram) and then at the University of Chicago (Lacinski and Rachana Ananthakrishnan) designed the Complex Social Science Gateway (CoSSci). It was built on the Galaxy framework, is hosted at UC Irvine, and will be replicated at the Santa Fe Institute and elsewhere for classroom use.

Currently, the Dow-Eff functions (DEf) aim at best practices for finding the hows and whys of variation in human culture and behavior, in terms that are both very general and very specific, including environments internal or external to human communities and processes that manifest in disease or in other biological or biosocial ways. Such findings may be of great value worldwide, because autocorrelation controls help to establish the equivalent of randomly chosen samples; clustered samples, by contrast, yield biased significance tests that distort research findings. The samples range from foragers to the full range of human societies, and with contributions from other datasets they may be complemented by cross-national, regional, or other units of study.
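One common way to control for autocorrelation is to build a weight matrix W from distances between societies and compute the lagged variable Wy, the weighted average of y among each society's neighbours. The sketch below assumes inverse-squared-distance weights, row-standardized so each row sums to 1; the function names are hypothetical. That DEf-style regressions add a term like Wy as a regressor (estimated with instruments) is stated here as an assumption, not as the exact DEf procedure.

```python
def row_standardized_weights(dist, power=2):
    """Weights w_ij proportional to 1 / d_ij**power, zero on the diagonal, each row summing to 1.

    dist: square matrix of pairwise distances; off-diagonal entries must be positive.
    """
    n = len(dist)
    weights = []
    for i in range(n):
        raw = [0.0 if i == j else 1.0 / dist[i][j] ** power for j in range(n)]
        total = sum(raw)
        weights.append([w / total for w in raw])
    return weights


def spatial_lag(weights, y):
    """(Wy)_i: the weighted average of y over society i's neighbours."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]


if __name__ == "__main__":
    # Three hypothetical societies on a line, one distance unit apart.
    W = row_standardized_weights([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
    print(spatial_lag(W, [1.0, 0.0, 1.0]))
```

Row standardization makes each lag a weighted mean on the same scale as y, so its coefficient reads as the strength of influence from nearby societies.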

With recent awards to other research colleagues of White's who are coding databases for longitudinal research in historical economics (Culture and Economic Growth; Evolution of World Religions) and working out of the Evolution Institute, later developments of the XSEDE CoSSci Galaxy project will help to explore the temporal dynamics of the behavioral, cultural, economic, and political aspects of human societies, including historical and archaeological studies.

Over the next two years, this project will use large sets of variables that are imputed in the course of DEf modeling and that can be analyzed as networks of variables, an analysis that benefits from HPC methods. These complex models may use Akaike's corrected information criterion (AICc) for model selection and, especially, Marco Scutari's (2014) new library(bnlearn), applied to networks of related variables to help reveal which complex, interrelated subsets of theoretically defined observed variables form potentially causal networks. Today's discussion illustrates this for world-scale religions.
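AICc adds a small-sample correction to Akaike's criterion: with maximized log-likelihood ln L, k estimated parameters, and n observations, AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1) / (n - k - 1); the model with the lower value is preferred. A minimal sketch (function names are illustrative):

```python
def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC; requires n > k + 1 observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for two models fitted to the same 50 societies.
    small = aicc(-100.0, k=3, n=50)
    large = aicc(-98.5, k=6, n=50)
    print("prefer the", "smaller" if small < large else "larger", "model")
```

The correction term grows quickly as k approaches n, which matters in cross-cultural samples where the number of candidate variables can be large relative to the number of societies.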

Simpler models can be computed as part of students' work in the courseware that the project facilitates, which stores modeling histories that an instructor can review (and compare with earlier attempts at similar models in the literature, which are often seen to fail without controls for autocorrelation). High-end HPC analyses of the more complex interactions in observed Bayesian networks of variables, however, allow experts to study more complex relationships among multiple observed variables.
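To show what learning a Bayesian network of variables involves, here is a small score-based structure search in pure Python, similar in spirit to the hill-climbing that bnlearn offers but far simpler: it assumes binary variables, uses the BIC score, and only tries adding or deleting single edges while keeping the graph acyclic. All names and the simulated chain X -> Y -> Z are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import permutations


def family_bic(data, child, parents, arity=2):
    """BIC score of one node given a parent set, for discrete data where every variable has `arity` states."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    margin = Counter(tuple(row[p] for p in parents) for row in data)
    log_lik = sum(c * math.log(c / margin[pa]) for (pa, _), c in joint.items())
    n_params = (arity - 1) * arity ** len(parents)   # free parameters in the conditional table
    return log_lik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a cycle, i.e. dst is already an ancestor of src."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hill_climb(data, variables, arity=2):
    """Greedy DAG search: apply the single edge addition or deletion that most improves BIC, until none does."""
    parents = {v: set() for v in variables}
    score = {v: family_bic(data, v, (), arity) for v in variables}
    while True:
        best_gain, best_move = 1e-9, None
        for src, dst in permutations(variables, 2):
            if src in parents[dst]:
                candidate = parents[dst] - {src}        # try deleting src -> dst
            elif creates_cycle(parents, src, dst):
                continue
            else:
                candidate = parents[dst] | {src}        # try adding src -> dst
            new_score = family_bic(data, dst, tuple(sorted(candidate)), arity)
            if new_score - score[dst] > best_gain:
                best_gain, best_move = new_score - score[dst], (dst, candidate, new_score)
        if best_move is None:                           # local optimum reached
            return parents
        dst, candidate, new_score = best_move
        parents[dst], score[dst] = candidate, new_score


def simulate_chain(n, flip=0.1, seed=1):
    """Hypothetical binary data from X -> Y -> Z; each child copies its parent with probability 1 - flip."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        y = x if rng.random() > flip else 1 - x
        z = y if rng.random() > flip else 1 - y
        rows.append({"X": x, "Y": y, "Z": z})
    return rows


if __name__ == "__main__":
    dag = hill_climb(simulate_chain(2000), ["X", "Y", "Z"])
    print({child: sorted(ps) for child, ps in dag.items()})
```

The search should recover the edges X-Y and Y-Z but not X-Z, because Z is independent of X once Y is known; edge directions within such a chain are not identifiable from observational data alone. With thousands of variables the number of candidate moves explodes, which is where HPC resources come in.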

In learning how to test theories against massive amounts of data, "Galton's problem" (the non-independence of cases that share traits through common descent or diffusion), which has plagued the analysis of samples based on naturalistic observation, is of paramount importance at both the simpler and the more complex levels of analysis. By the end of the next two years we should see a new florescence of coursework and research publications (including the Wiley Companion) that is likely to transform many subfields of cross-cultural studies through new discoveries.
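A standard diagnostic for Galton's problem is Moran's I, which measures how similar neighbouring units are on a variable (or on regression residuals). With no autocorrelation its expected value is -1/(n - 1); clearly larger values signal clustering. The sketch below uses a hypothetical chain of neighbouring societies; whether DEf reports exactly this statistic is an assumption.

```python
def morans_i(values, weights):
    """Moran's I for values on n units with an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(sum(row) for row in weights)               # total weight
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


def chain_weights(n):
    """Binary weights linking each unit to its immediate neighbours on a line."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    W = chain_weights(8)
    print(morans_i([1, 1, 1, 1, 0, 0, 0, 0], W))   # clustered trait: strongly positive
    print(morans_i([1, 0, 1, 0, 1, 0, 1, 0], W))   # alternating trait: negative
```

A strongly positive value means nearby societies resemble each other more than independent draws would, so ordinary significance tests overstate the evidence.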

----- Bob's slides

White is also an SFI networks and complexity researcher and is collaborating under a second ECSS award with Bob Sinkovits of SDSC to obtain new measurements for one of the most important and complex problems in network mathematics: large, overlapping sets of nodes that are structurally cohesive. Cohesion is measured both by multi-connectedness (the number of node-independent paths linking each pair of nodes) and by inseparability (the number of nodes that must be removed to disconnect the set), two measures proven to be equal by Menger's theorem, a fundamental theorem of networks. These larger-scale network models lend a high level of predictability to network science measures that are often loosely defined and imprecise. More complex methods of network measurement could transform our understanding, at a much larger scale, of how complex networks act dynamically in today's globally networked world, providing tools to understand how the larger contexts of human societies and their multilevel organizational entities are embedded.
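By Menger's theorem, the number of node-independent paths between two non-adjacent nodes equals the minimum number of nodes whose removal separates them. The sketch below counts those paths with a unit-capacity max flow (Edmonds-Karp) after splitting each node so that at most one path can pass through it. It illustrates the pairwise building block only; the function name is hypothetical, and a library such as NetworkX (`node_connectivity`, `k_components`) would normally be used for whole graphs.

```python
from collections import deque


def node_connectivity(edges, s, t):
    """Number of node-independent paths between non-adjacent nodes s and t of an undirected graph.

    Each node v becomes (v, "in") -> (v, "out") with capacity 1, and each undirected edge
    becomes two arcs, so the maximum s-t flow counts node-independent paths.
    """
    cap = {}

    def add_arc(u, v):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v].setdefault(u, 0)                  # residual (reverse) arc starts empty

    for v in {x for edge in edges for x in edge}:
        add_arc((v, "in"), (v, "out"))
    for u, v in edges:
        add_arc((u, "out"), (v, "in"))
        add_arc((v, "out"), (u, "in"))

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        v = sink
        while parent[v] is not None:             # push one unit of flow along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


if __name__ == "__main__":
    bowtie = [(1, 2), (2, 0), (0, 1), (0, 3), (3, 4), (4, 0)]
    print(node_connectivity(bowtie, 1, 3))   # node 0 is a cut point between the triangles
```

A structurally cohesive group of cohesion k is then one in which every pair of members passes this test at level k or more, which is why computing it over large, overlapping groups is expensive enough to call for HPC.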

----- Afterthought on organizing the slides, live demo, and slides, and on what's gained in terms of theory

In one case (Tolga's material) I do show the importance of multiple variables. The table in this case, with yellow arrows, shows the evolutionary sequence discovered from multiple models, in which the source of autocorrelation is also a clue to the evolutionary process: the original invention of wife's mother avoidance in the originating language groups, then husband's father avoidance in a new ecological setting, then diffusion following distance autocorrelation. I think understanding that one example conveys what can be learned and how, in this case, it links to the development of greater complexity, cooperation, and larger-scale cohesion in human societies, which then leads to the network project with Bob. A further project at the end takes the imputed data from individual models, combines it all, and uses new algorithms such as library(bnlearn) on Trestles to do really large-scale modeling of complex networks of variables. Anyway, that's what makes sense of all this to me.