Genealogies and networks from anthropological field data

From InterSciWiki

For more on A.F.C. Wallace see Positivism v. realism - Alyawarra kinship graphs


Among the oldest and most continuous ways of keeping track of people in anthropological studies are genealogies, censuses, 3x5 person-fiches, lists of office-holders, membership lists, marriage-lists, attendance-lists, place-lists and cross-references between them. Many anthropologists use free or commercial software to keep track of the most complex of these interpersonal data sources, in the form of genealogies and all the attendant data that go with them. What few researchers realize is how easy it has become to export systematic files of all sorts to more convenient forms that can be used to study social organization, social dynamics, social structure, networks, and emergent phenomena.

Most genealogical software, for example, exports to the standard GEDCOM (GEnealogical Data COMmunication) format. For up to 2,000,000 people, these files can be read by Pajek (Slovenian for "spider", pronounced with a soft j, as in "Payek", with the accent on the first syllable), a free package for large network analysis. Pajek exchanges data with the open source R environment, into which many analytic and data processing packages can be downloaded, including excellent network analysis packages (sna, network, statnet, igraph, and many others).
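As a rough illustration of how little is needed to get from a GEDCOM export to network data, the following Python sketch reads the family (FAM) records of a GEDCOM text and lists the parent-child pairs they imply. It assumes the common record layout with HUSB, WIFE and CHIL tags; the function names and the sample text are hypothetical, and a real export contains many more tags that this sketch simply ignores.

```python
def parse_gedcom_families(text):
    """Return {family_id: {"HUSB": id, "WIFE": id, "CHIL": [ids]}} from GEDCOM text."""
    families = {}
    current = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # Level-0 lines open a new record, e.g. "0 @F1@ FAM".
            if len(parts) == 3 and parts[2].strip() == "FAM":
                current = families.setdefault(
                    parts[1], {"HUSB": None, "WIFE": None, "CHIL": []}
                )
            else:
                current = None  # some other record type (INDI, HEAD, ...)
        elif level == "1" and current is not None and len(parts) == 3:
            tag, value = parts[1], parts[2].strip()
            if tag in ("HUSB", "WIFE"):
                current[tag] = value
            elif tag == "CHIL":
                current["CHIL"].append(value)
    return families


def parent_child_links(families):
    """List (parent, child) pairs implied by the family records."""
    links = []
    for fam in families.values():
        for parent in (fam["HUSB"], fam["WIFE"]):
            if parent is not None:
                links.extend((parent, child) for child in fam["CHIL"])
    return links


# Hypothetical three-person sample, not taken from any real genealogy.
SAMPLE = """0 @I1@ INDI
1 NAME Father /Example/
0 @I2@ INDI
1 NAME Mother /Example/
0 @I3@ INDI
1 NAME Child /Example/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
"""

families = parse_gedcom_families(SAMPLE)
links = parent_child_links(families)
```

The resulting pairs are the generative parent-child ties from which a network program can compute everything else.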

The three types of canonical kinship graphs discussed below vary in the elements modeled by the network: individuals only, marriages and parentage only, or individuals and marriages as separate kinds of nodes. Each has its advantages.

Coding field data

The essential step in coding anthropological field data for social networks of genealogical and other relations is to assign unique numbers to individuals in a sequential series from 1 to N, and similarly to assign unique numbers to places, 1 to P, offices, 1 to F, houses, 1 to H, groups, 1 to G, and so forth. Don't number such things as marriages, because they can be specified by a pair of individuals, i-j, in the set of persons 1 to N. The same holds for parent-child links.
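The numbering scheme can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names are hypothetical, and IDs are handed out in order of first appearance in the field notes.

```python
def assign_ids(names):
    """Map each distinct name to a sequential integer 1..N, in order of first appearance."""
    ids = {}
    for name in names:
        if name not in ids:
            ids[name] = len(ids) + 1
    return ids


# Hypothetical persons and places; a repeated name keeps its first number.
people = assign_ids(["Anna", "Ben", "Cara", "Ben", "Dan"])
places = assign_ids(["North camp", "South camp"])

# Marriages and parent-child links get no numbers of their own:
# each is simply a pair of person IDs.
marriages = [(people["Anna"], people["Ben"])]
parent_child = [
    (people["Anna"], people["Cara"]),
    (people["Ben"], people["Cara"]),
]
```

Keeping a separate 1-to-N series for each kind of entity (persons, places, offices) means the same integer can safely recur across series.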

Note that we typically don't know the "real" biological father, and some cultures (the Trio, for example) recognize two concurrent fathers, so what is taken down in a genealogy is what person i says about whether j is the father of k, and so forth, and another person may say something different. There is no need to "doctor" the data and decide which is correct: be precise in recording what people say, checking accuracy. But let the separate perspectives stand, if accurate, as part of the data. Data analysis can later take care of determining consensus judgments about relationships, and then flag the exceptions. Both may be important. This caveat takes care of David Schneider's criticism of kinship studies as tracing what were often thought to be biological relationships when a more correct view is that they are always sociological unless identified by DNA testing.

Generative relationships

One of the biggest mistakes in early computer analysis of genealogies was to try to code relationships in the form of a matrix: i-j and i-k are father-son pairs, so code j-k as brothers, and so forth. The proper procedure is to code generative relationships: marriages and parenthood in relation to offspring and, if at all possible, birth order. Typically, the marriage tie and parent ties are recognized reciprocally, but if not, then make careful notes of this fact, where it occurs, and code i→j (father recognizes son j) and j→i (son recognizes father) or j→|→i (son does not recognize father) as two separate entries. A child born outside wedlock, for example, might have a legal father (the mother's spouse) and a putative father, or no legal father and two or more recognized alternatives as putative fathers.

Similarly for coding other relations, like recognized friendships, where i→j AND j→i only if each person names the other as a friend.
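One way to keep the two directions of a tie as separate entries, and to find out afterwards which ties are reciprocated, is sketched below in Python. The nomination data are hypothetical and the function name is an assumption made for illustration.

```python
def split_by_reciprocity(nominations):
    """Separate directed ties (i, j) into mutual pairs and one-way ties."""
    ties = set(nominations)
    # A pair is mutual only if both i->j and j->i were recorded.
    mutual = sorted({tuple(sorted(t)) for t in ties if (t[1], t[0]) in ties})
    one_way = sorted(t for t in ties if (t[1], t[0]) not in ties)
    return mutual, one_way


# Hypothetical friendship nominations: (namer, named).
friend_nominations = [(1, 2), (2, 1), (1, 3), (4, 1)]
mutual, one_way = split_by_reciprocity(friend_nominations)
```

Here persons 1 and 2 name each other, while the ties 1→3 and 4→1 remain unreciprocated and stay in the data as such.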

Generated relationships

Generated relationships, such as friend of friend, "brother" (son of the same parents), or friend's "brothers", are potentially infinite in number. Pajek and other network programs will compute them, so don't bother coding these by hand! They are generated logically in the network "grid".
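To show what "generated logically" means, here is a minimal Python sketch that derives sibling pairs and friends of friends from the coded generative ties alone. It is not how Pajek does it internally; the data and function names are hypothetical, and "sibling" is taken here to mean sharing at least one recorded parent.

```python
from collections import defaultdict
from itertools import combinations


def siblings(parent_child):
    """Derive sibling pairs (children sharing a parent) from (parent, child) links."""
    children_of = defaultdict(set)
    for parent, child in parent_child:
        children_of[parent].add(child)
    pairs = set()
    for kids in children_of.values():
        pairs.update(combinations(sorted(kids), 2))
    return sorted(pairs)


def friends_of_friends(friendships, ego):
    """People two steps from ego in an undirected friendship list."""
    neighbours = defaultdict(set)
    for i, j in friendships:
        neighbours[i].add(j)
        neighbours[j].add(i)
    reach = set()
    for friend in neighbours[ego]:
        reach |= neighbours[friend]
    # Exclude ego and ego's direct friends.
    return sorted(reach - neighbours[ego] - {ego})


# Hypothetical coded data: persons 1 and 2 are parents of 3 and 4.
parent_child = [(1, 3), (2, 3), (1, 4), (2, 4), (5, 6)]
friendships = [(3, 7), (7, 8)]

sibling_pairs = siblings(parent_child)
two_step = friends_of_friends(friendships, 3)
```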

Recognized relationships

Relationships in the network "grid", especially if generated logically, may or may not be recognized. If yours is one of the very few network studies concerned with relationship recognition -- like those of W. W. Denham for the Australian Alyawarra or of David Krackhardt for who says who is related to whom in a corporation -- then (and only then) you need to code your data as triples: i says j-->k. Each respondent i might have a separate network recognition subfile. When the code numbers match across these subfiles, they can be compared analytically for consensus and for different sources of clustering or disagreement. In these studies, cognition and conceptual labeling can be connected to network structure through the analysis.
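A hedged Python sketch of the triple coding follows. The majority threshold and the assumption that every respondent was asked about every tie are choices made here for illustration only; a real consensus analysis would be considerably more careful.

```python
from collections import Counter


def consensus(reports, threshold=0.5):
    """
    reports: (respondent, j, k) triples meaning "respondent says j --> k".
    Returns ties reported by more than `threshold` of respondents, and the rest.
    """
    unique_reports = set(reports)  # ignore duplicate statements
    respondents = {r for r, _, _ in unique_reports}
    counts = Counter((j, k) for _, j, k in unique_reports)
    agreed = sorted(t for t, n in counts.items() if n / len(respondents) > threshold)
    disputed = sorted(t for t, n in counts.items() if n / len(respondents) <= threshold)
    return agreed, disputed


# Hypothetical data: three respondents report on who is the father of person 5.
reports = [(1, 4, 5), (2, 4, 5), (3, 4, 5), (1, 6, 5), (2, 7, 5)]
agreed, disputed = consensus(reports)
```

All three respondents name 4 as father of 5, while 6 and 7 are each named by a single respondent; both lists are kept, since the exceptions may matter as much as the consensus.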

Multiple relationships

Keep in mind that while kin, for example, might have consensus on their primary relationships (parents, children, marriage, for example), they may have many different paths of primary relations by which they are connected. If kinship terms (like uncle or cousin) are names that label a type of relation defined by a path, and there are two or more such paths, then there are alternative possible labels that must be reconciled. Here it would be useful to have data on the contexts of use of different alternatives.

The local (egocentric) networks and the global (whole network)

Earlier generations of anthropologists struggled to examine the variety of egocentric networks in the data they collected, as do those today who use only the conventional genealogical software. They looked for common patterns or differences and tried to draw inferences, generalities, or ranges of variation from them. Anthropologists from Goldenweiser to Levi-Strauss (1949) noted the bewildering complexity of social networks, and settled on the notion that the models or generalities they could provide were merely provisional and relative to the point of view of the observer as well as that of different members of a culture.

The interdependence of local and global networks

It turns out that the relativism spawned by early attempts to deal with network complexity was unjustified except in a more limited sense: there are definable but delimited sets of models for relational data. The mathematicians who developed the theory of graphs discovered basic theorems that exemplified this principle:

  1. 1735 Euler's Degree theorem. All lines in a connected network can be traversed by a single path, using each line exactly once, if and only if the network has no more than two odd vertices.
  2. etc.
  3. 1927 Menger’s Connectivity Theorem. Let G be a graph and A, B be non-adjacent vertices in G. Then the minimum number $\kappa$ of vertices separating A from B in G is equal to the maximum number of vertex-disjoint A-B paths in G. It follows that the maximal $\kappa$-separable subgraphs of G are maximal subgraphs of G with $\kappa$ disjoint paths between every pair of their nodes, and vice versa.
  4. 1927 Menger and 1956 Ford-Fulkerson Max-flow Min-cut Theorem.
  5. 1930 Ramsey's Dinner Guests Theorem...
  6. 1931 König's Bipartite Graphs Theorem...
  7. 1953 Harary signed graph balance theorem.
  8. 1953 Newcomb clustering theorem.
  9. 1976 Appel and Haken Four-color theorem.
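As a small worked instance of the first theorem in the list, the Python sketch below counts odd-degree vertices and checks connectivity. The bridge list is the familiar Königsberg layout (four land masses, seven bridges); the function names are hypothetical.

```python
from collections import Counter


def odd_vertices(edges):
    """Vertices touched by an odd number of lines (parallel lines count separately)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def is_connected(edges):
    """Depth-first search over the undirected graph defined by the edge list."""
    adjacent = {}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    if not adjacent:
        return True
    start = next(iter(adjacent))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacent[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(adjacent)


def has_euler_path(edges):
    """Euler's condition: connected, and no more than two odd vertices."""
    return is_connected(edges) and len(odd_vertices(edges)) <= 2


konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
simple_chain = [(1, 2), (2, 3)]
```

All four Königsberg land masses have odd degree, so no single path crosses every bridge exactly once, whereas the three-node chain has exactly two odd vertices and can be traversed.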

Alternate models, e.g., genealogy

Fig.1A. Alyawarra genealogies from Woodrow W. Denham assembled by D.R. White as marriage p-graph
Fig.1B. Alyawarra genealogies from Woodrow W. Denham assembled by D.R. White as marriage p-graph

Mathematician Øystein Ore (1960) created the "Ore graph" to represent the relations between father and children and mother and children, noting simply that in a circle of marriages among two men and two women, each pair producing "biological" offspring, the cycles connecting these individuals are always even-numbered in length.

Algebraist André Weil (1949), however, had noted a more interesting "cultural" kinship graph which took marriage types as nodes to express "rules of marriage" in graphs.

White and Jorion (1992) generalized Weil's insights by taking real empirical marriages as the nodes of a network...

White, Batagelj, and Mrvar (1999) developed the Pajek algorithms for kinship analysis. Renderings in SVG (scalable vector graphics) provided visualizations such as this genealogy for W. W. Denham's study of the Alyawarra. The pages show structural features of the Alyawarra kinship network, as in Fig.1B, which shows the exact same data as Fig.1A, simply rearranged to show numbered patrilines, marriages, and skewed generational structure.

Harary and White (2001) completed the generalization of network models for kinship.

Canonical kinship and network graphs


This first canonical format, the p-graph, invented by White and (p-)Jorion by generalizing Weil (1949) to the real world rather than the world of algebraic structures, is not only extremely useful for the study of marriage patterns but also allows the use of structural cohesion, cohesive blocks, and Pajek's bicomponent algorithms to find the limits of structural endogamy.
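The idea of taking marriages as nodes can be sketched as follows in Python. This is an illustrative reading of the p-graph, not the published algorithm: node labels ("M" for a marriage, "I" for an unmarried individual), the parent-to-child arc direction, and all data are assumptions made for the example.

```python
def build_pgraph(marriages, persons):
    """
    marriages: {marriage_id: (husband_id, wife_id)}
    persons:   {person_id: (sex, parents_marriage_id or None)}
    Returns arcs (parents_node, child_node, sex_of_child), where a married
    child is represented by his or her own marriage node(s).
    """
    nodes_of = {}
    for m, (husband, wife) in marriages.items():
        nodes_of.setdefault(husband, []).append(f"M{m}")
        nodes_of.setdefault(wife, []).append(f"M{m}")

    arcs = []
    for person, (sex, origin) in persons.items():
        if origin is None:
            continue  # parents unknown: no arc to draw
        # Unmarried individuals stand in as their own node.
        for node in nodes_of.get(person, [f"I{person}"]):
            arcs.append((f"M{origin}", node, sex))
    return sorted(arcs)


# Hypothetical data: marriage 1 (persons 1 and 2) has a married son (3)
# and an unmarried daughter (5); person 3 married person 4 in marriage 2.
marriages = {1: (1, 2), 2: (3, 4)}
persons = {
    1: ("m", None), 2: ("f", None),
    3: ("m", 1), 4: ("f", None), 5: ("f", 1),
}
arcs = build_pgraph(marriages, persons)
```

The sex label on each arc corresponds to the two line types (sons versus daughters) used when such a graph is drawn.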

examples: Alyawarra

Figures 1A and 1B are drawn in p-graph format where the nodes are marriages, red dotted lines connect to daughters and solid black lines to sons (i.e., patrilines). These graphs are ideal for showing patterns of marriage. The yellow strips that overlay one set of red (daughter) lines show a pattern where sisters marry earlier than brothers (and wives earlier than husbands), and they tend to cross-cut the patrilines in a pattern that resembles cycles of length six, reconnecting to patriline descendants four generations down. For regular marriages of this sort, male generation time is 1.5 times as long as female generation time, a feature noted by Denham. There is a second pattern of marriages, however, where instead of daughters marrying into the lineage to the left (wrapping around in a circle) they marry two lineages over to the right. These "girls", however, are often older women (often widowed) marrying younger husbands.

The visual network analysis is confirmed by but goes beyond further analysis of these data by members of the machine learning team at the MIT Department of Brain and Cognitive Sciences: Kemp, C., Griffiths, T. L. & Tenenbaum, J. B. (2004) Discovering latent classes in relational data. AI Memo 2004-019 (pdf) - see Part 4 on blocking Alyawarra kin terms ckemp at

What the visual analysis shows that the MIT algorithm does not is that the older women (having had daughters by their first husbands) are marrying in the right section but in an alternate implicit section to those of their sisters.

kin-tipp graphs

Ore graphs had the advantage of showing individuals, and kin-tipp graphs add marriage links and distinguish parent-child links by sex of parent and sex of offspring. This provides the basis for the use of Pajek to do specialized computing on genealogies by the use of macros that analyze types of marriage and show how they distribute in the genealogical networks.

Alyawarra 3D

There are a bunch of blue nodes towards the top, then a long row of vermilion nodes in the lower middle. Might these be the lineages Denham discussed that have the large populations?

Petri graphs

Petri kinship graphs have two types of nodes, one for couples (parents) and one for individuals, types C and I. Couples (C) spawn children (I) who marry (C) to have offspring (I). In kinship relations C do not connect to C nor I to I.

Friendships and other personal relationships can be added to a Petri kinship graph, to connect I to I. Similarly, certain types of socially institutionalized or ritual kinship relations, such as the Latin American compadrazgo or Italian comparaggio, can be formed only between couples. Godparenthood, in contrast, is either between C and I nodes, like kinship, or between I and I nodes.
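The two-mode structure described above can be sketched in Python as follows. The tuple labels ("C", id) and ("I", id), the function names, and the data are hypothetical; the check simply tests whether every arc joins a C node to an I node.

```python
def petri_kinship_graph(couples, children_of):
    """
    couples:     {couple_id: (person_id, person_id)}
    children_of: {couple_id: [person_id, ...]}
    Returns directed arcs between ("C", id) and ("I", id) nodes.
    """
    arcs = []
    for couple, kids in children_of.items():
        # A couple (C) spawns children (I).
        arcs.extend((("C", couple), ("I", kid)) for kid in kids)
    for couple, partners in couples.items():
        # Individuals (I) marry to form a couple (C).
        arcs.extend((("I", p), ("C", couple)) for p in partners)
    return arcs


def is_two_mode(arcs):
    """True if no arc joins C to C or I to I."""
    return all(source[0] != target[0] for source, target in arcs)


# Hypothetical data: couple 1 has children 3 and 5; person 3 marries person 4.
couples = {1: (1, 2), 2: (3, 4)}
children_of = {1: [3, 5]}
kinship_arcs = petri_kinship_graph(couples, children_of)

# Adding a friendship between two individuals breaks the strict C/I alternation.
with_friendship = kinship_arcs + [(("I", 5), ("I", 4))]
```

Keeping the kinship arcs and the added I-to-I or C-to-C relations in separate lists makes it easy to analyze the strictly bipartite kinship core on its own.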

Pajek, attributes, texts, and photos

Pajek handles all three types of canonical relationships through its Options/Read-Write settings, which include Ore (kin-tipp), p-graph, and bipartite p-graph (Petri graphs).

What about the attributes of the nodes: individuals, couples, places, organizations, offices, contexts, etcetera in a single or multirelational Pajek graph? They are handled in three ways:

  • Discrete attributes are handled as partitions: each node of a given class is assigned an integer from 1 to D, which also determines its color.
  • Quantities are handled as vectors: real number signed values assigned to nodes.
  • Texts are handled by hyperlinks embedded in a node that take you to the url(s) where the text files and photos are stored.
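The first two kinds of node data can be written out as plain text files alongside the network. The Python sketch below produces the text of a network file and of a partition or vector file in the basic Pajek layout (a `*Vertices N` header followed by one line per node); it is a minimal illustration with hypothetical names and data, and it leaves out the many optional fields Pajek accepts.

```python
def pajek_net(labels, arcs):
    """Text of a minimal Pajek network: numbered, labelled vertices and directed arcs."""
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for i, label in enumerate(labels, start=1)]
    lines.append("*Arcs")
    lines += [f"{i} {j} 1" for i, j in arcs]  # every arc given weight 1
    return "\n".join(lines) + "\n"


def pajek_values(values):
    """Shared layout of partitions (integers) and vectors (reals): one value per node."""
    return "\n".join([f"*Vertices {len(values)}"] + [str(v) for v in values]) + "\n"


# Hypothetical three-person example.
labels = ["Anna", "Ben", "Cara"]
net_text = pajek_net(labels, [(1, 3), (2, 3)])   # parent -> child arcs
sex_partition = pajek_values([1, 2, 1])          # e.g. 1 = female, 2 = male
age_vector = pajek_values([34.0, 36.5, 8.0])     # a numeric quantity per node
```

The i-th value line in a partition or vector refers to the i-th vertex of the network, which is why the sequential 1-to-N numbering of persons matters.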

Programs like Family Origins can turn a GEDCOM file into an on-line genealogical website, like that of the Aydinli nomads. The same GEDCOM file can also be read by Pajek, but Pajek cannot output a GEDCOM file. That can be done, however, by an old Fortran program called Ego2cpl, from one of its ASCII-file genealogical data entry formats. Contact Doug White for information.

SVG, movies, intranets and internets

As we have seen, Pajek makes SVG images (Fig. 1A) that are zoomable, high resolution, and copiable to other image formats. By the use of time-codes for each node and edge -- e.g., the actual marriage dates in the Alyawarra p-graph or the actual birthdates in an Alyawarra kin-tipp graph, or both in the Alyawarra Petri net -- you can automate the making of a network movie (see the Biotech page, where a dynamic gif shows the year-by-year evolution of interfirm contracts and grants in the industry).

You can keep all your files on your private intranet as *.html or use the internet to put them on a web site or, easiest of all, a wiki.

Simulations with R software

Because Pajek exports and imports *.net files to and from R, there are some programming options to develop in addition to those in the kin-tipp macros. White (1999), for example, used his Fortran package (now mostly obsolete) to take each generation of marriages in a kinship network and permute either the marriages of daughters only or of sons only, keeping the male or female descent lines constant, respectively. This keeps constant the size of nuclear families in each generation and the composition of one of the gender lines, while randomly reassigning marriage partners within each generation. He then used these random permutations to compute the expected distributions of different marriage types under a null hypothesis in which ONLY the marriages are randomized and the demographic structure is kept constant.

Compared to this controlled simulation, the Austrian farmer sample marriages relinked with non-link into cohesive marriage cycles within one and two generations (sibling set relinking, cousin set relinking). The Sri Lankan sample consistently married on opposite consanguineal patrisides in a globally consistent but bilateral dual organization consistent with egocentric (two-sided) Dravidian kinship terminology. The Muslim elites of a Javanese peasant village married endogamously with close kin, but no more than expected randomly from a constriction on available mates in a marriage system with status homophily but no other biases toward marriages with consanguines. In each case, the marriage pattern was assessed against a random baseline.

To replicate this controlled simulation in R is simply a matter of exporting the generational partitions of the kinship structure to R, along with the kinship network, and randomly permuting the marriages of either sons or daughters within each generation, then importing the simulated kinship network back into Pajek for a comparative analysis against the original. Given the ease of programming in R, this should only take a few lines of code, plus the import and export statements.
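The core permutation step might look roughly like the sketch below. It is written in Python rather than R purely for illustration, and it is not White's Fortran procedure: it assumes each marriage is a (husband, wife) pair indexed by the husband's generation, shuffles wives within each generation (the "daughters only" case), and leaves out the Pajek import/export and the counting of marriage types. All names and data are hypothetical.

```python
import random


def permute_marriages(marriages, generation, rng):
    """
    marriages:  list of (husband, wife) pairs
    generation: {husband: generation number}
    Randomly reassign wives among the husbands of the same generation,
    leaving the husbands (and hence the male lines) untouched.
    """
    by_generation = {}
    for index, (husband, _) in enumerate(marriages):
        by_generation.setdefault(generation[husband], []).append(index)

    simulated = list(marriages)
    for indices in by_generation.values():
        wives = [marriages[i][1] for i in indices]
        rng.shuffle(wives)  # permutation restricted to one generation
        for i, wife in zip(indices, wives):
            simulated[i] = (marriages[i][0], wife)
    return simulated


# Hypothetical data: three marriages in generation 1, two in generation 2.
marriages = [(1, 11), (2, 12), (3, 13), (4, 14), (5, 15)]
generation = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}

rng = random.Random(0)  # seeded so that a run can be repeated
simulated = permute_marriages(marriages, generation, rng)
```

Repeating the permutation many times and tabulating the marriage types found in each simulated network would give the expected distribution against which the observed network is compared.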

Simulations for other kinds of questions can also be done in R, and the degreenet, statnet and latentnet packages can be used to evaluate network and attribute structure. It is also possible, instead of testing null hypotheses, to compute rigorous goodness-of-fit probabilities for specific network models and to use likelihood tests to compare the fit of alternative models.

Alicia's question on ABMs and R

Alicia's question

Social structure and cognition

Murray Leaf (2007) reviews the "empirical formalist" paradigm for anthropological studies of cognition, including his own and Dwight Read's (1984) and A.F.C. Wallace and John Atkins's (1960) direct interview modeling of relative products (like those described above for kinship relations) in cognition. Houseman and White (1998) showed how network sidedness, an emergent structural property of the kinship network described in full genealogical detail in Edmund Leach's (1961) Pul Eliya monograph, matches up almost perfectly with the Pul Eliyan two-sided egocentric Dravidian kinship terminology. Read's cognitive modeling of kin terms, that is, matches perfectly the behavioral patterns by which sidedness is instantiated in marriage choices.

Policy implications of cognitive anthropology

Rockridge Institute: Comparing Climate Proposals

Network Ecolanguage and Representation

Edward Tufte provides new grammars of visualization, including statistical visualization

Lee A. Arnold has some network visualizations for systems, information, emergence, cognition, semiotics, holism. These can get you tuned to representations in a mental and graphic interactive data language.

Visual complexity


White et al

  1. Classificatory kinship
  2. 1992 White & Paul Jorion. Representing and Computing Kinship: A New Approach. Current Anthropology 33(4): 454-463.
  3. 1996 Houseman & White. Les structures réticulaires de la pratique matrimoniale. L'Homme 139: 59-85.
  4. 1997 Structural Endogamy and the Graphe de Parenté. Mathématiques, Informatique, et Sciences Humaines 137:107-125.
  5. 1997 Lilyan A. Brudner & White. Class, Property and Structural Endogamy: Visualizing Networked Histories. Theory and Society 25(2):161-208.
  6. 1998 Houseman & White. Network Mediation of Exchange Structures: Ambilateral Sidedness and Property Flows in Pul Eliya. pp. 59-89 in Kinship, Networks and Exchange, eds. T. Schweizer and D. R. White. Cambridge University Press.
  7. 1998 White & Schweizer. Kinship, Property Transmission, and Stratification in Javanese Villages. pp. 59-89 in Kinship, Networks and Exchange, eds. T. Schweizer and D. R. White. Cambridge University Press.
  8. 1999 Controlled Simulation of Marriage Systems. Journal of Artificial Societies and Social Simulation 2(3).
  9. 1999 White, Batagelj, and Mrvar. Analyzing Large Kinship and Marriage Networks With Pgraph and Pajek. Social Science Computer Review 17(3): 245-274.
  10. 2001 Harary and White P-Systems: A Structural Model for Kinship Studies CONNECTIONS 24(2): 35-46
  11. 2002 White & M. Houseman. Navigability of Strong Ties: Small Worlds, Tie Strength and Network Topology. Complexity 8(1):72-81
  12. 2005 White & Ulla Johansen. Chapter 1 Network Analysis and Ethnographic Problems: Process Models of a Turkish Nomad Clan. Boston: Lexington Press. 2006 Paper
  13. 2004 Ring Cohesion Theory in Marriage and Social Networks. Mathématiques et sciences humaines 43(168):5-28
  14. 2004 Klaus Hamberger, Michael Houseman, Isabelle Daillant, Douglas R. White and Laurent Barry. Matrimonial Ring Structures Mathématiques et sciences humaines 43(168):83-121. TIPP Kinship and computing replaces:
  15. 2005 Woodrow W. Denham and White. Multiple Measures of Alyawarra Kinship. Field Methods 17: 70-101.
  16. 2008 White and Woodrow W. Denham. The Indigenous Australian Marriage Paradox: Small-World Dynamics on a Continental Scale. Structure and Dynamics 3:1(forthcoming).

Topical publications: Douglas R. White

Other references

  1. Leach, Edmund 1961 Pul Eliya. Cambridge University Press.
  2. Leaf, Murray 2007 Empirical Formalism Structure and Dynamics 2(1):804-824.
  3. Read, Dwight. 1984. An Algebraic Account of the American Kinship Terminology Current Anthropology 25(4):417-449.
  4. Wallace, Anthony F. C. and John Atkins. 1960. The Meaning of Kinship Terms. American Anthropologist 62(1):58-80.

Genealogical software
