# Cross-cultural causality project

Causal graphs from cross-cultural research - GIS server -- Ruth Mace

## New references

- Holden, C. J. and Mace, R. (In Press). 'The cow is the enemy of matriliny': using phylogenetic methods to investigate cultural evolution in Africa. In Mace, R., Holden, C. J. and Shennan, S. (eds.), The Evolution of Cultural Diversity: a Phylogenetic Approach. London: UCL Press.

- Holden, C. and Mace, R. (2003). Spread of cattle led to the loss of matriliny in Africa: a co-evolutionary analysis. Proceedings of the Royal Society B. Vol 270. 2425-2433.

## Proposal

## Effects of unobserved noise: Latent variable model

Mooij et al., [Probabilistic latent variable models for distinguishing between cause and effect](http://books.nips.cc/papers/files/nips23/NIPS2010_1270.pdf).

Abstract. We propose a novel method for inferring whether X causes Y or vice versa from joint observations of X and Y. The basic idea is to model the observed data using probabilistic latent variable models, which incorporate the effects of unobserved noise. To this end, we consider the hypothetical effect variable to be a function of the hypothetical cause variable and an independent noise term (not necessarily additive). An important novel aspect of our work is that we do not restrict the model class, but instead put general non-parametric priors on this function and on the distribution of the cause. The causal direction can then be inferred by using standard Bayesian model selection. We evaluate our approach on synthetic data and real-world data and report encouraging results.
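The modeling assumption described in the abstract can be written compactly. This is only a paraphrase of the abstract in symbols: the noise symbols, the data symbol D and the arrow notation are chosen here for illustration and are not taken from the paper, and the decision rule assumes equal prior weight on the two directions.

```latex
\begin{align*}
\text{Model } X \to Y:&\quad Y = f(X, E), \qquad E \text{ independent of } X \\
\text{Model } Y \to X:&\quad X = g(Y, \tilde{E}), \qquad \tilde{E} \text{ independent of } Y \\
\text{Prefer } X \to Y \text{ when}&\quad p(\mathcal{D} \mid X \to Y) > p(\mathcal{D} \mid Y \to X)
\end{align*}
```

Each term p(D | direction) is a marginal likelihood, with the non-parametric priors on the function and on the distribution of the cause integrated out.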

## eLorak

Doug:

- We can update our Java code to use either 1.1 or 1.2 formats in **Jung**.
- As a note, have you considered using GNU Octave (http://www.gnu.org/software/octave/) instead of MATLAB? Octave is a free and open source mathematical system that uses MATLAB syntax. MATLAB is OK too, of course.

- Mike

## Cassandra NoSQL vs. HBase

Which is best? Scott D. White joined Cassandra as a test.

## Connecting with Repast

Scott,

I am already doing a project using Repast with Pajek with Mike's collaborator Mark Altaweel. I sent Mark and Mike the time-coding rules for doing network freeze-dries.

Note, in the quotation from the Repast site below, how JUNG and Pajek as well as MATLAB are included in the automated connections. Maybe there is a way to build those connections into our project.

For example, we build a network showing how the independent and dependent variables connect, using the significant regression coefficients for hundreds of our variables analyzed for Bayesian causal graph inferences. There may be discrete generational levels (partitions of DAG structures) for connected components. This network could be made by our software, output in Pajek format, and read by Repast, with intra-layer permutation of connections among connected nodes to see whether the clustering is greater than random. If so, what kinds of nonrandom network motifs are forming in these graphs?
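The permutation idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's software and not a Repast call: the function names, the toy variables and the choice of global transitivity as the clustering measure are all hypothetical. Edge targets are shuffled only among edges whose targets lie in the same layer, so every node keeps its layer and the number of links arriving in each layer is unchanged. Duplicate links created by a shuffle simply collapse, which is a simplification.

```python
import random
from collections import defaultdict
from itertools import combinations


def transitivity(edges):
    """Global clustering: closed triples / all connected triples (direction ignored)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    closed = triples = 0
    for ns in nbrs.values():
        for a, b in combinations(ns, 2):
            triples += 1
            if b in nbrs[a]:
                closed += 1
    return closed / triples if triples else 0.0


def permute_within_layers(edges, layer, rng):
    """Shuffle edge targets among edges whose targets share the same layer."""
    by_layer = defaultdict(list)
    for i, (_, v) in enumerate(edges):
        by_layer[layer[v]].append(i)
    new_edges = list(edges)
    for idxs in by_layer.values():
        targets = [edges[i][1] for i in idxs]
        rng.shuffle(targets)
        for i, t in zip(idxs, targets):
            new_edges[i] = (edges[i][0], t)
    return new_edges


def clustering_test(edges, layer, n_perm=1000, seed=0):
    """Return observed transitivity and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = transitivity(edges)
    null = [transitivity(permute_within_layers(edges, layer, rng))
            for _ in range(n_perm)]
    p_value = (1 + sum(x >= observed for x in null)) / (1 + n_perm)
    return observed, p_value


# Hypothetical toy network: six variables in three layers, two closed triangles.
toy_layer = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
toy_edges = [("a", "c"), ("c", "e"), ("a", "e"),
             ("b", "d"), ("d", "f"), ("b", "f")]

if __name__ == "__main__":
    obs, p = clustering_test(toy_edges, toy_layer, n_perm=500)
    print(f"observed transitivity = {obs:.3f}, permutation p = {p:.3f}")
```

A small p-value would suggest more clustering than the layer structure alone produces, which is the point at which looking for specific motifs becomes worthwhile.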

This procedure corresponds to what I am already planning to do in Repast with Mark Altaweel: analyzing my test data as a validation example against a published result, where I used my Fortran programs to identify the network motif patterns for a kinship and marriage network (i.e., to identify marriage types that occur more often than expected given random intragenerational permutations of the marriages, thus holding everything else constant). Each motif's actual and simulated frequencies are compared with the actual and simulated frequencies of the subgraphs that could have generated the motif if one appropriate link had been completed, so the Fisher exact test is used to compare expected and actual motif frequencies in fourfold contingency tables. This is a great improvement over motif analyses that consider only the relative frequencies without expected values.
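As a rough sketch of the fourfold comparison, the one-sided Fisher exact test can be computed directly from the hypergeometric distribution using only the Python standard library. The table layout is an assumption made for illustration (rows: actual vs. simulated network; columns: completed motifs vs. subgraphs one link short of the motif), the function name is hypothetical, and the counts in the example are invented, not results from the kinship data or the Fortran programs.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
        a = completed motifs in the actual network
        b = one-link-short subgraphs in the actual network
        c = completed motifs in the simulated networks
        d = one-link-short subgraphs in the simulated networks
    Returns P(X >= a) with all row and column totals held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / comb(total, col1)


if __name__ == "__main__":
    # Invented counts, purely to show the call.
    p = fisher_exact_greater(12, 8, 5, 15)
    print(f"one-sided p = {p:.4f}")
```

A small p-value here would mean the motif is completed more often in the actual network than in the permuted ones, relative to the opportunities each had to complete it.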

Mike,

I am going to revise a first paper exemplifying our results with cross-cultural ethnographic variables for submission to Sociological Methods and Research, but I need to add a section that connects with Stephen L. Morgan and Christopher Winship. 2007. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research), which is the major current work in Sociology. That will take a week or so, at which point I will send the revised paper to give you an idea of exactly what Scott and I are doing with Chalak, Hal White and Judea Pearl on the NSF proposal project.

From the Repast site:

Repast Simphony is a free and open source agent-based modeling toolkit that simplifies model creation and use. Repast Simphony offers users a rich variety of features including the following:

- Fluid model component development using any mixture of Java, Groovy, and flowcharts in each project;
- A pure Java point-and-click model execution environment that includes built-in results logging and graphing tools as well as automated connections to a variety of optional external tools including the R statistics environment, *ORA and Pajek network analysis plugins, a live agent SQL query tool plugin, the VisAD scientific visualization package, the Weka data mining platform, many popular spreadsheets, the MATLAB computational mathematics environment, and the iReport visual report designer;
- An extremely flexible hierarchically nested definition of space including the ability to do point-and-click modeling and visualization of 2D environments; 3D environments; networks including full integration with the JUNG network modeling library as well as Microsoft Excel spreadsheets and UCINET DL file importing; and geographical spaces including 2D and 3D Geographical Information Systems (GIS) support;
- A range of data storage "freeze dryers" for model check pointing and restoration including XML file storage, text file storage, and database storage;
- A fully concurrent multithreaded discrete event scheduler;
- Libraries for genetic algorithms, neural networks, regression, random number generation, and specialized mathematics;
- An automated Monte Carlo simulation framework which supports multiple modes of model results optimization;
- Built-in tools for integrating external models;
- Distributed computing with Terracotta;
- Full object-orientation;
- Optional end-to-end XML simulation;
- A point-and-click model deployment system; and
- Availability on virtually all modern personal computing platforms including Windows, Mac OS, and Linux.

etc.

## Kinship motifs

Here is the Pul Eliya data, organized by 8 generational layers. When drawn in Pajek with the `*.net` and `*.clu` files, the early generations are higher and the later generations lower. See if these uncompress for you. This is an amazing case: the egocentric rule is "marry on the opposite side" as computed through female links (e.g., your MoBrDa is on the opposite side: two female links), but this applies only to blood kin, not to affinals (e.g., two sisters marrying two brothers). This gives room for slippage from a global structure with two opposing sides defined by male inheritance of sidedness and taking wives from the opposite side. The slippage lies entirely in how links work when they are not through blood kin. Doing, say, 10 Repast permutations of the dotted (female) lines still generates perfect sidedness when computed THROUGH blood ties (i.e., those with common ancestors). No English-speaking ethnographer or sociologist was ever able to comprehend this. Being able to do the 10 or so permutations through Repast will be a major computational advance.
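For readers without the attached files, the pairing of a network file with a generation partition can be sketched as follows. This is a minimal illustration, assuming the basic Pajek text layout (a `*Vertices` header with numbered, quoted labels, an `*Arcs` section, and a partition file with one class number per vertex). The function name and the three-person toy genealogy are hypothetical and are not the Pul Eliya data.

```python
def to_pajek(nodes, arcs, generation):
    """Return (net_text, clu_text) for a layered network.

    nodes: list of labels, in the order they should be numbered
    arcs: (source_label, target_label) pairs
    generation: label -> generation number (1 = earliest)
    """
    index = {label: i for i, label in enumerate(nodes, start=1)}

    # Network file: vertex list followed by directed arcs with weight 1.
    net = [f"*Vertices {len(nodes)}"]
    net += [f'{index[label]} "{label}"' for label in nodes]
    net.append("*Arcs")
    net += [f"{index[u]} {index[v]} 1" for u, v in arcs]

    # Partition file: one generation number per vertex, in vertex order.
    clu = [f"*Vertices {len(nodes)}"]
    clu += [str(generation[label]) for label in nodes]

    return "\n".join(net) + "\n", "\n".join(clu) + "\n"


if __name__ == "__main__":
    # Hypothetical toy genealogy: two parents in generation 1, one child in 2.
    net_text, clu_text = to_pajek(
        ["father", "mother", "child"],
        [("father", "child"), ("mother", "child")],
        {"father": 1, "mother": 1, "child": 2},
    )
    print(net_text)
    print(clu_text)
```

Keeping the generation numbers in a separate partition file is what lets the same network be redrawn or permuted layer by layer without touching the links themselves.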