A289 required readings

From InterSciWiki


Exercises are also listed here.

week 1 - introduction

by week 2 - multiple working hypotheses and simple statistics

My post-hoc weekly riff on multiple working hypotheses, models, and statistics summarizes some of the main points of the discussion.

INSTALL R: go to http://www.r-project.org/ and click FAQ, then choose Mac or Windows. The first FAQs give the installation instructions.

  • DW tip: Before you buy into your model, look under the hood. Then read the fine print. If it's not what you thought was advertised, don't buy in. Find better alternatives. Replace the model with a better one.

You can experiment with the contingency table calculators listed at Chi-squared#Calculation.
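To see what such a calculator does under the hood, here is a minimal sketch in Python (the course tutorials themselves use R). The counts are made up for illustration, and the function names are my own, not taken from any of the linked calculators. The p-value shortcut only applies to a 2x2 table (1 degree of freedom).

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value for a chi-squared statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical 2x2 counts, not taken from the SCCS
table = [[20, 10], [10, 20]]
stat, df = chi_squared(table)
print(round(stat, 3), df, round(p_value_df1(stat), 4))
```

The expected counts are where the independence assumption enters: if the cases are not independent of one another, the p-value is not to be trusted even when the arithmetic is right.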

by week 3 - weaker models (more general) and stronger testing - using R for baby steps

R cross-tabs tutorial that involves v891 Freq of Internal War and v893 Freq of External War-Being Attacked and QuikStart_R#Comparative_research_R_tutorial.

Hand in at week 4 six contingency tables from SCCS: heterogeneous pairs of variables of your choice.

This is an in-class (above) and at-home (after class) exercise on contingency table analysis: having done v891 and v893, now do 10 other crosstabs for your favorite pairs of variables from the SCCS dataset. Bring printouts of your results.
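For those who want to see what a crosstab is doing before running the R tutorial, here is a rough sketch in Python. The two lists are invented codes standing in for a pair of variables (they are not actual SCCS values), None marks a missing code, and the function name crosstab is my own.

```python
from collections import Counter

def crosstab(xs, ys):
    """Cross-tabulate two equal-length lists of category codes.

    Cases where either code is missing (None) are dropped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    counts = Counter(pairs)
    rows = sorted({x for x, _ in pairs})
    cols = sorted({y for _, y in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical codes for eight societies (illustration only)
var_a = [1, 1, 2, 2, 1, None, 2, 1]
var_b = [1, 2, 2, 2, 1, 1, None, 1]

rows, cols, table = crosstab(var_a, var_b)
print("columns:", cols)
for r, row in zip(rows, table):
    print("row", r, row)
```

Note how two of the eight cases drop out because of missing codes. With SCCS variables it is worth checking how many societies are left in each table.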

Why R?

R is one of the most widely used open source (free and mostly fully documented) statistical packages and provides the equivalent of the best statistical, scientific, and commercial software for every sort of purpose. Once you get used to a command language and to looking up (and copying and pasting) commands found in manuals and on Google, it will bring you growing benefits in your data analysis. The level of sophistication far exceeds SPSS, for example, because those at the cutting edges of their fields typically write new software in R that goes far beyond the capabilities of older methods. R is available in a lot of the ACS computer labs on campus (--Alicia--), and has recently been installed at the SocSciBldg.

Class notes for learning and using R

http://www.ats.ucla.edu/stat/R/

Weaker models

As listed at Chi-squared, take a look at 2008 The Indigenous Australian Marriage Paradox: Small-World Dynamics on a Continental Scale (drw and Woodrow W. Denham), Structure and Dynamics 3:1 (forthcoming: LAST REVISED 4/14/2008), http://intersci.ss.uci.edu/wiki/pub/Paradox07b.pdf, if you are interested in a critique and replacement of 5 anthropological models:

  1. The classical Swanton models of Natchez social organization
  2. The classical mathematical model of Australian kinship and section systems (from Radcliffe-Brown to Harrison White to Eugene Hammel)
  3. The standard Australian ethnographic models of Australian kinship and section systems (with many ethnographers who fail to buy in)
  4. A standard simulation model of hunter-gatherers: Read, D. & S. LeBlanc, 2003. Population Growth, Carrying Capacity and Conflict. Current Anthropology 44(1):59-85.
  5. The double-helix model of closed Australian section-system kinship
  • Exercises: Chi-squared and Fisher tests for categorical variables, and the assumptions involved (usually invalid) about statistical independence.
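As a companion to the Fisher test exercise, here is a small sketch in Python of the exact test for a 2x2 table. It uses one common two-sided convention (sum the probabilities of all tables, with the same margins, that are no more probable than the observed one). The counts are hypothetical and the function name is my own.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so that ties with the observed table are counted
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts for illustration
print(fisher_exact_two_sided(3, 1, 1, 3))
```

The test conditions on the margins and, like chi-squared, still assumes that the cases are independent of one another, which is exactly the assumption the exercise asks you to question.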

Getting set up with network software (Mac users: in R)

For PC Users - the asymmetry here is more package programs with GUI menus

For Mac Users - to run PC software you might want to get the VMware Fusion PC emulator

Carter Butts Mark Handcock

  • see: Category:R software

(by) week 4 - Reliability and Single Factor Test multivariate analysis ! Prelude to Network Analysis

Ideas for readings (more to be nominated by participants)

  1. Doug White: Since we are reviewing Borgatti's network slides, review again 2007 Murray Leaf Empirical Formalism Structure and Dynamics 2(1):804-824 -- for his analysis of anthropological positivism. In Positivism v. realism (my first weekly riff) I review my take on where most of network analysis has gone wrong from Clyde Mitchell to the present. It also references an Introduction I wrote to network analysis.
  2. Sarah Baitzel: I was browsing JSTOR and found this, which might interest the archaeologists: Numerical Taxonomy, R-Mode Factor Analysis, and Archaeological Classification. Andrew L. Christenson and Dwight W. Read. American Antiquity, Vol. 42, No. 2 (Apr., 1977), pp. 163-179. Dwight is one of the good guys, while not infallible, and while the article is from 1977, the points are still valid -- the critique of the taxonomic structures found by cluster analysis on raw data echoes my essay's critique of the taxonomic approaches of positivist componential analysis of the 1960s and 70s. As in my essay, use of positivist (imposed) taxonomies fails to capture the nomothetic quality of how archaeologists normally classify. The discussion of the appropriate use of R-analysis with rotation--which we can already do with our R software, option=rotation factors=3 or more--is extremely valuable and still pertinent (check out the work by Robert Drennan for example). Q-mode analysis, btw, is turning the problem around to analyze cases rather than variables; we now have 2-mode and 3-mode factor analysis, e.g., cases, variables, and time, and correspondence analysis that handles all 2-3 dimensions.
  3. Cluster Analysis and Archaeological Classification. American Antiquity, Vol. 43, No. 3 (Jul., 1978), pp. 502-505. Mark S. Aldenderfer and Roger K. Blashfield -- For an alternative view to Christenson and Read, these authors disassociate cluster analysis from the goals of numerical taxonomy. They want "investigators [to] become more aware of the great diversity of clustering methods which currently exists, and of the appropriate conditions for their use."
  4. Maybe some of the archaeologists in the seminar can lead further discussion of the topics, and even prepare to look at Drennan's work later in the quarter.

After class additions

Scale development

We'll do a Factor analysis in R

Anthropological_Methods_and_Models_2008#Tutorials_in_data_analysis tutorial

Wikipedia:Factor analysis

Using the single factor model / Factor analysis in R

Factor Analysis in R - External War

In-class exercise: 1) try to find a better External War factor, changing variables. At home: Same for Internal War variables; one other topic drawing from the codebook.

We may be able to work out the procedures for pairwise missing data to create the correlation matrix, and with that the factor analysis.
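One way the pairwise procedure could look, sketched in Python rather than R: each correlation uses only the cases where both variables are coded, so different cells of the matrix may rest on different sets of cases. The variable names and values below are hypothetical, with None standing for a missing code.

```python
import math

def pairwise_corr(x, y):
    """Pearson correlation using only cases where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # no variation among the complete pairs
    return sxy / math.sqrt(sxx * syy)

def corr_matrix(columns):
    """Pairwise-complete correlation matrix as a dict of dicts."""
    names = list(columns)
    return {a: {b: pairwise_corr(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical variables with missing codes (illustration only)
data = {
    "var_a": [1, 2, 3, None, 5],
    "var_b": [2, 1, None, 4, 3],
    "var_c": [5, 4, 3, 2, None],
}
matrix = corr_matrix(data)
print(matrix["var_a"]["var_b"])
```

A caution: a matrix assembled this way is not guaranteed to be positive semi-definite, so a factor analysis run on it can misbehave. That is one of the things to watch for when we try this.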

BEFORE NEXT CLASS: Go through Steve Borgatti's Syllabus for Social Network Analysis seminar - Syllabus U. Kentucky - UCInet User's group - Install

Bone up on Structural cohesion (with R software: follow the links)

And do photos

A289 photos

And poll for substitute dates for two sessions when DRW is at the ETHZ in Zurich

Wed May 7, 9-12; Wed May 21, 9-12

(by) week 5 - Consensus and network analysis: Groups (aka Structural cohesion) and roles (aka Blockmodeling) in networks; kinship and kinship simulation; and all other relational or spatial data

This week's readings tie in with main topics on wiki seminar home page -- go through those items too: --- to ---. The focus here is on Genealogies and networks from anthropological field data, that is, how to think about, code, analyze and present your analytical field data as a relational database. My weekly riff is on Unification.

This begins SNA: Social Network Analysis (We will be joined by Carolina Berys, an undergraduate at UCSD majoring in Cognitive Science)

The tutorials InterSci wiki R Paper Examples: Using R to replicate a published study site now has extended tutorials.

Readings

  1. Weller, Susan C. (2007) Cultural consensus theory: Applications and frequently asked questions pdf. Field Methods 19: 339-368. CONSIDER: the multiple working hypotheses (a) single or multiple-group consensus? (b) from single source (book, teacher, mass media) (c) from common experience (d) vertical transmission in families (e) horizontal network influence. THEN (a) can be tested by Q-factor analysis where there is a single factor only. Do commonality scores then reflect "competence"? or (b) can the source be identified, and are there differences in closeness/exposure to source? (c) how to test independent experience? (d) Network correlates for family/kinship closeness (e) Network correlates for social interaction influencing exchange of information.
  2. Google: Cultural "Consensus analysis"
  3. 2003 Structural Cohesion and Embeddedness: A Hierarchical Conception of Social Groups. (J. Moody, drw) American Sociological Review 68(1):1-25. http://www2.asanet.org/journals/ASRFeb03MoodyWhite.pdf
  4. (newest application:) 2008 Cohesion and Power-Law as an “Elite Club” in a Large-Scale Industrial District: Flexible Specialization or Dual Economy? Tsutomu Nakano and Douglas R. White. Submitted to Industrial and Corporate Change http://intersci.ss.uci.edu/wiki/pub/ms_to_Ind_and_Corp_Change-ver16-bw.pdf - see p27 at 400% magnification
  5. 1999 Controlled Simulation of Marriage Systems. Journal of Artificial Societies and Social Simulation 2(3). http://jasss.soc.surrey.ac.uk/2/3/5.html - http://ideas.repec.org/a/jas/jasssj/1999-12-1.html
  6. click and read the "DURING CLASS" articles, wiki pages and short-takes below
  7. student suggestions: Preview Alicia's question
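To make hypothesis (a) in the Weller reading concrete, here is a deliberately simplified sketch in Python. It is not the formal consensus model (which corrects agreement for guessing and fits a factor model). It only builds a respondent-by-respondent agreement matrix and compares its two largest eigenvalues, on the idea that one dominant factor suggests a single shared answer key. The answers are invented, and all function names are my own.

```python
import math
import random

def agreement_matrix(answers):
    """Proportion of matching answers for every pair of respondents."""
    n = len(answers)
    k = len(answers[0])
    return [[sum(a == b for a, b in zip(answers[i], answers[j])) / k
             for j in range(n)] for i in range(n)]

def top_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and its eigenvector by power iteration.

    Assumes a symmetric, positive semi-definite matrix.
    """
    rng = random.Random(seed)
    n = len(matrix)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v

def first_two_eigenvalues(matrix):
    """Two largest eigenvalues, the second found after deflating the first."""
    l1, v1 = top_eigenpair(matrix)
    n = len(matrix)
    deflated = [[matrix[i][j] - l1 * v1[i] * v1[j] for j in range(n)]
                for i in range(n)]
    l2, _ = top_eigenpair(deflated, seed=1)
    return l1, l2

# Hypothetical answers: three respondents agree, one answers the opposite way
answers = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0]]
l1, l2 = first_two_eigenvalues(agreement_matrix(answers))
print(round(l1, 3), round(l2, 3))
```

A large ratio of the first to the second eigenvalue is usually read as support for a single factor; see Weller (2007) for the criteria she recommends. In this toy case the second factor is just the one dissenting respondent.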

Browse

BEFORE CLASS: Go through Steve Borgatti's Syllabus for Social Network Analysis seminar - Syllabus U. Kentucky - UCInet User's group (http://tech.groups.yahoo.com/group/ucinet) - Install

DURING CLASS: We will cover

Software: Kinship simulation Readings item 3: Pajek to R, network analysis in R - not fully debugged

... and other topics? (Deferred to class 9)

Personal networks analysis using Egonet Chris McCarty, University of Florida -- suggested by Ashwin Budden

by Meeting 6 WEDNES MAY 7- Univariate distributions, entailment analysis, multiple regression

Cover last week's online tutorials, and review the modeling articles that go with them in the replication of published articles pages. My weekly riff is on generating processes as the ways that variables relate -- to themselves (univariate distributions) and to others (bivariate, multivariate) -- but is mercifully short (so far).

From last and this week: discuss one reading - exercise in R - replication

  1. (Sarah will discuss Friday. That doesn't mean that this approach is "taken", as there are other modifications and variants on this approach that use the large SCCS database. Sarah will explain her variant on the core Eff model.) DURING CLASS practicum #1 below: Does Mr. Galton still have a Problem? - using network or Spatial Auto-Regression with software by Anthon Eff - SAR code in R plus replication study #1 of Female Contribution to Subsistence
  2. DURING CLASS potential practicum #2 below: R software by Carter Butts, UCI Sociology, for Doug's Entailment analysis, applied in a paper on Division of labor by Gender. This is replication study set #3. Sociology students have used this successfully, but it has a series of components. It would take a longer time and needs a wiki tutorial to be put together. Great for an article or a thesis if you have a suitable problem.
  3. paper 3 above plus Structural cohesion - Wikipedia:Menger's theorem plus replication study #7 - this could be a class report and useful for any problem where you want to compare the intensive and extensive aspects of structural cohesion over time or between groups or as a predictive or consequential variable related to other variables. The http://intersci.ss.uci.edu/wiki/pub/Dynamics_of_Human_Behavior6_abstract2.pdf Roadmap to a longer Encyclopedia article on the Dynamics of Human Behavior may be of interest to elucidate, if only in outline form.
  4. DURING CLASS Univariate Distributional analysis. The R code by Cosma Shalizi, the speaker for next Friday afternoon (May 16th), does Maximum Likelihood Estimation for all kinds of distributions - Pareto (Power law), q-Exponential (open system entropy, power law tail), Exponential (what uniform randomness produces), Normal, Log-Normal. These are documented at Pareto I and II and Open systems entropy and Aaron Clauset and Cosma Shalizi's R and Matlab software. Some special assistance would be required here, but this is the best instance to date where we have R and Matlab code for the all-important Kolmogorov-Smirnov test that I discuss under Basics of Modeling (and Multiple Working Hypotheses) (http://intersci.ss.uci.edu/wiki/index.php/Multiple_working_hypotheses#The_Basics_of_Modeling_.28and_Multiple_working_methods.29). Complicated, but illustrates probabilistic model fitting: testing alternative models of possible generative processes.
  5. paper 5 above plus Software: Kinship simulation plus replication study #7 - this could be a class report and useful for any problem where you compare an actual distribution of network ties across categories of an ordered permutation with simulated ties....
  6. Since we are partly in the business this week of looking for project materials, consider Henry T. Wright's codebook and database in his Atlas of Chiefdoms and Early States Structure and Dynamics 1#4(1).
  7. Andrew Somerville's project - combining Q and R analysis (and possibly spatial autocorrelates) calls up the need for correspondence analysis in R.
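The distribution-fitting item above (maximum likelihood plus the Kolmogorov-Smirnov test) can be illustrated with a short sketch in Python. This is not the Shalizi or Clauset code, just the textbook closed-form estimators for two of the distributions named, applied to synthetic data generated from exact Pareto quantiles. The function names and the choice of xmin = 1 and exponent 2.5 are my own assumptions.

```python
import math

def fit_exponential(xs):
    """MLE of the exponential rate: one over the sample mean."""
    return len(xs) / sum(xs)

def fit_pareto(xs, xmin):
    """MLE of alpha for the continuous power law p(x) ~ x^(-alpha), x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(xs, cdf):
    """Kolmogorov-Smirnov distance between the data and a model CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the model with the empirical CDF just before and at x
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

# Synthetic data: exact quantiles of a Pareto with alpha = 2.5, xmin = 1
n, alpha_true, xmin = 200, 2.5, 1.0
xs = [xmin * (1 - (i + 0.5) / n) ** (-1 / (alpha_true - 1)) for i in range(n)]

alpha_hat = fit_pareto(xs, xmin)
d_pareto = ks_distance(xs, lambda x: 1 - (x / xmin) ** (-(alpha_hat - 1)))

# Rival model: exponential fitted to the excess over xmin
shifted = [x - xmin for x in xs]
rate_hat = fit_exponential(shifted)
d_exp = ks_distance(shifted, lambda x: 1 - math.exp(-rate_hat * x))

print(round(alpha_hat, 3), round(d_pareto, 4), round(d_exp, 4))
```

The smaller KS distance points to the better-fitting candidate, which is the multiple-working-hypotheses idea in miniature. Turning a KS distance into a probability takes a further step (bootstrapping), since the parameters were estimated from the same data.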

DURING CLASS: We will cover

multiple regression, network autocorrelation --this is working!
Entailment analysis (with R software)
Fitting Univariate Distributions with bootstrap probabilities --this needs *.m or Matlab programs, will try to find equivalents in R. Some recent univariate distribution examples were never fully worked out, but required only a format for the data to be entered.
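As a rough idea of what "bootstrap probabilities" means for a fitted distribution, here is a sketch in Python of a parametric bootstrap for the exponential case only. It stands in for, and is much simpler than, the Matlab programs mentioned above. The function names, the seed, and the number of replicates are my own choices, and the data are synthetic.

```python
import math
import random

def fit_rate(xs):
    """MLE of the exponential rate."""
    return len(xs) / sum(xs)

def ks_exponential(xs, rate):
    """KS distance between the data and an exponential CDF with this rate."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1 - math.exp(-rate * x)
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

def bootstrap_p_value(xs, n_boot=500, seed=1):
    """Share of synthetic samples that fit the model worse than the data do."""
    rng = random.Random(seed)
    rate = fit_rate(xs)
    d_observed = ks_exponential(xs, rate)
    worse = 0
    for _ in range(n_boot):
        # Draw from the fitted model, then refit, as was done for the real data
        synthetic = [rng.expovariate(rate) for _ in xs]
        if ks_exponential(synthetic, fit_rate(synthetic)) >= d_observed:
            worse += 1
    return worse / n_boot

# Synthetic data: exact quantiles of an exponential with rate 2
n = 50
data = [-math.log(1 - (i + 0.5) / n) / 2.0 for i in range(n)]
p = bootstrap_p_value(data)
print(p)
```

A small p means the data fit the model worse than nearly all samples actually drawn from it, so the model is a poor candidate. A large p only means the model cannot be ruled out.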

by meeting 7 - Fri MAY 9 - bootstrap and MLE - advanced stats

weekly riff is on what does it mean to find generating processes?

Student presentations

  • Sarah Baitzel - Eff in R (Eff added the routine for estimating R2 or R^2 (Rsq), Doug added his model with a new measure of polygyny, Sarah added other variables)
  • Ashwin Budden - how he is coding his questionnaire for Correspondence Analysis, Brazilian Amazon religious pluralism (right title?)

Initial ideas for using correspondence analysis -

  • Andrew Somerville - Comparison features for Mesoamerican and Southwestern sites
  • Alicia Boswell - Comparison features for site groups of a Tiwanaku colony
  • Laurent Tambayong - relating his simulation findings to the q-exponentials we used for cities, which were also found for the "Generative Feedback Model" (Social Circles Complex Network Model) discussed in class. Starting to find these everywhere. Cumulative distributions: the exponential distribution is linear in semilog, the power law in loglog, the q-exponential in semi-q-log. Now investigating relation between very low-density graphs and complex dynamics. See: Estimating Tsallis q for degree distributions where the figure is from his study, also the fitting, and the working out of the fitting method with Doug.
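The remark about cumulative distributions being linear under the right transform can be checked numerically. Below is a small sketch in Python using the standard Tsallis q-logarithm and q-exponential, and assuming the cumulative (survival) form P(X > x) = e_q(-x/kappa). The parameter values q = 1.5 and kappa = 2 are arbitrary choices for illustration.

```python
import math

def q_log(x, q):
    """Tsallis q-logarithm; reduces to the natural log when q = 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1 - q) - 1) / (1 - q)

def q_exp(x, q):
    """Tsallis q-exponential; reduces to exp when q = 1."""
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    if base <= 0:
        raise ValueError("argument outside the range handled in this sketch")
    return base ** (1 / (1 - q))

def survival(x, q, kappa):
    """Assumed cumulative form: P(X > x) = e_q(-x / kappa)."""
    return q_exp(-x / kappa, q)

q, kappa = 1.5, 2.0
for x in [0.0, 1.0, 5.0, 20.0]:
    # Under the q-log, the survival function is the straight line -x / kappa
    print(x, q_log(survival(x, q, kappa), q), -x / kappa)
```

With q = 1 the same check is the familiar semilog plot of an exponential, and for q > 1 the tail of the survival function behaves like a power law, which is why the q-exponential sits between the two.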

Other

Preparation for the talk the following week by Cosma Shalizi (University of Michigan CSCS postdoc, SFI postdoc and CMU Asst. Professor), who will give half of this seminar and open discussion on methods of complexity research in anthropology and the social sciences. Class will end at 11:20, followed by the videoconference presentation by Shalizi, same room, 1:00-3:30. Assignments will include two articles of which he is co-author on bootstrap analysis of univariate distributions, and the Distribution-fitting Practicum with Archaeological site data.

Maybe here the complexities of open system entropy, testing the model: q exponential, Kolmogorov-Smirnov tests, and Open entropy historical cities and city-sizes, Oscillatory dynamics of city-size distributions in world historical systems, etc. Then a riff?, perhaps: what it means to find generating processes? Description, comparison, simulation, explanation, causality?

Readings:

  1. 2008 Oscillatory dynamics of city-size distributions in world historical systems. (drw, L. Tambayong, and N. Kejžar). In, G. Modelski, T. Devezas and W. Thompson, eds. Globalization as Evolutionary Process: Modeling, Simulating, and Forecasting Global Change. pp. 190-225. London: Routledge. http://intersci.ss.uci.edu/wiki/pw/ModelskiCh9WTK.pdf
  2. 2008 Innovation in the Context of Networks, Hierarchies, and Cohesion. To appear in, Complexity Perspectives on Innovation and Social Change. D.Lane, D.Pumain, S. van der Leeuw and G.West (eds) (Springer Methodos series). http://eclectic.ss.uci.edu/~drwhite/pub/ch5revMay-20.pdf

-- Here is a great introduction to social network methods by R. Hanneman and M. Riddle. http://faculty.ucr.edu/~hanneman/nettext/ -Ashwin

by meeting 8 May 16 also Shalizi - methods for complexity analysis, dynamics, univariates

R Package list for Social Science and Networks now has all the resource packages in R that will be installed (hopefully) at the ACS and SS labs at UCSD.

Weekly riff: From *simple* systems to *complex* -- the revolution in methods and models

Readings and wiki reports to read

Roadmap for Complex Systems --> selected 2-page readings by topic

Estimating Tsallis q for degree distributions, distributions for site sizes, city sizes, etcetera --> this first experiment appears to be revolutionary for site-size analysis in archaeology

Community structure and structural cohesion

Journal of Quantitative Anthropology content pages (by authors) available now for browsing in PDF (see our course page for JQA and other online journals, including Methods-R-us by Jeffrey Johnson)

Student presentations

Some related pdf demos by Doug White: dataset on trade networks in Medieval Europe also available in .xls
http://eclectic.ss.uci.edu/~drwhite/pub/WorldCities.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/DRWsfiAug04a.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/CivsAsDynNetsSASci.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/BudaCivsAsDynNets.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/CivilizationsasDynamicNetworks.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/QuantitativeNetworkAnalysis2.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/Eurasian_city_system_dynamics_in_the_last_milleniumColumbia.pdf
http://eclectic.ss.uci.edu/~drwhite/ppt/CivilizationsasDynamicNetworksParis.pdf Merchant and financial capital alternation slide 22

Other

Cosma Shalizi (University of Michigan CSCS postdoc, SFI postdoc and CMU Asst. Professor) will give half of this seminar and open discussion on methods of complexity research in anthropology and the social sciences. Class will end at 11:20, followed by the videoconference presentation by Shalizi, same room, 1:00-3:30. Assignments will include two articles of which he is co-author on bootstrap analysis of univariate distributions, and the Distribution-fitting Practicum with Archaeological site data.

Simulation: in R, specialized ABM languages (Repast, Swarm, etc)

Alicia's question re: ABM and R

Hi Doug, I have a question about R. How can it be useful for agent-based modeling systems? Can you perform the statistics in R and then use agent-based modeling software for simulation? Or are the statistics for ABMS usually done in the ABM software?

Doug's response: Good question, and yes, given data you can do stats in R, simulate process in ABMs, analyze stats from simulation in R.

Some use R for the simulation itself. I originally wrote a simulation published in 1999 and another published in 2006 in Fortran, and am now rewriting the latter in just two dozen lines in R; my student-colleague-coauthor Natasa Kejzar rewrote the other in R, and then it was rewritten more succinctly in Python's NetworkX. Many of the older ABM platforms are getting obsolete, but those that have good internal diagnostics to show what is going on as the simulation runs ("probes" that show on your screen) are very worthwhile. Laurent wrote his simulation -- with just 2 parameters -- in a few pages of Fortran; it could easily be rewritten in (translated into) R. For probes he simply output the networks as they evolve successively with time codes, and used their equilibrium states to analyze for statistical outcomes.

In general, although the advantages of ABMs often lie in the display screens and probes that display status while you run the simulation and view the screen to see interaction of agents and environment, I think one of the great ways to understand and explain why the ABM leads to its results is to put network probes -- keeping track of the interactions -- INTO the simulation with time codes, and then network analysis can help solve for the explanation (tipping points, etc.) of emergent phenomena. All this can be done easily making R the simulation language. OR the ABM can save the output needed to code the longitudinal interaction network.

Social-circles as complex networks: Generative feedback model provides an example where the simulation written in R outputs a new network at each time step and adds time codes to each network so as to be able to play back a Pajek movie of the evolution of the simulated network. Network analysis and statistical analysis can also be done at each step to understand the processes of evolution.
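The idea of a network probe with time codes can be shown with a toy example, written here in Python (it would translate readily into R). This is not the Generative Feedback Model: the rule, one random tie per time step, is a placeholder chosen only to show the bookkeeping, and all names and parameter values are hypothetical.

```python
import random

def simulate(n_agents=10, n_steps=20, seed=42):
    """Toy process: one random pair of agents interacts at each time step.

    Returns the probe log, a list of (time, agent_i, agent_j) records.
    """
    rng = random.Random(seed)
    log = []
    for t in range(1, n_steps + 1):
        i, j = rng.sample(range(n_agents), 2)  # two distinct agents
        log.append((t, i, j))
    return log

def network_at(log, t):
    """Undirected edge set accumulated up to and including time t."""
    return {frozenset((i, j)) for (s, i, j) in log if s <= t}

def degree_sequence(edges, n_agents):
    """Number of distinct partners of each agent in an edge set."""
    degree = [0] * n_agents
    for edge in edges:
        for agent in edge:
            degree[agent] += 1
    return degree

log = simulate()
for t in (5, 10, 20):
    edges = network_at(log, t)
    print(t, len(edges), degree_sequence(edges, 10))
```

Because every interaction carries a time code, the network at any step can be rebuilt afterwards and analyzed step by step, or the log can be exported in whatever format a playback tool expects.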

by meeting 9 WEDNES MAY 21

Readings and resources

  • Andrew Somerville and Alicia Boswell discovered the ANTHROPAC site, Steve Borgatti's analytical tools, which include techniques that are unique to Anthropology, such as consensus analysis, as well as standard multivariate tools such as multiple regression, factor analysis, cluster analysis, multidimensional scaling and correspondence analysis. In addition, the program provides a wide variety of data manipulation and transformation tools, plus a full-featured matrix algebra language.

Friday's speaker Cosma Shalizi, coauthoring with Doug and Laurent Tambayong, finished a new paper based on using the new discrete estimator and producing sampling distribution plots that will allow the q-exponential fitting of data from networks, archaeological sites, city sizes, and the like. The tutorial in R is finished and working. The new article is at q-Exponential Distributions in Empirical Data (c) 2008 Laurent Tambayong, Aaron Clauset, Cosma Shalizi, and Douglas R. White

Student presentations

  • Celia de Jong - will continue with her discussion
  • Erez Ben-Yosef Archeological methods in event reconstruction
  • Catherine Forsman - will be posted to wiki - Ethnographic interview and text coding design for the study of Katrina refugees

DURING CLASS: We will cover

weekly riff on Musing: connecting dynamics to statistical inference
Structural cohesion (with R software) (compare to the disparate materials on the Borgatti site) See Dynamics of human behavior#Background
Roles in Networks summary see 1988 summary (precursor): world trade network roles

by meeting 10 May 23

weekly riff is on why quantify?

Student presentations

  • Karen Nickels The relationship between agriculture and the environment. I found Pryor's 1986 article that's referenced in the Standard Cross Cultural Codebook. I'd like to look at his statistical analysis and see what other factors may come into play.
  • Yong Ming Kow Ethnography and web trawler data

Second rounds: progress made, suggestions asked at stage 2. Papers are due at the end of finals week; you can submit by email or by wiki site.

Other

Reviewing and Expanding Practice in Replication of published studies with R

InterSci wiki R Paper Examples: Using R to replicate a published study

Continuance by wiki and email

I have to leave for Zurich for a one-month visiting research appointment (leave of absence) at ETHZ but we can continue on the wiki, by email, and by FTP of papers; I will enter grades remotely.

back to Anthropological Methods and Models 2008