Spss dataset used in class

From InterSciWiki

These are basic instructions for using the SCCS (Standard Cross-Cultural Sample) database with Spss in a computer lab or at home. To open the Spss dataset, navigate to directory M:Anth174 and click on one of the SPSS....sav files with the longer names (e.g., SCCSvar1-2008Map.sav, which will eventually enable you to make maps of the distributions of sociocultural features across the continents). At the bottom left, make sure the |Data View| tab is selected, not |Variable View|. The names of the societies, in rows 1-186, are in the first columns. From row 187 onward, SCCSvar1-2008Map.sav holds coastal city coordinates that are used in making maps. The |Variable View| tab lets you see the names of the variables in the data columns.

Codebook for variables

on-line Codebook of variables for the Spss data

Start Spss

  1. Click the Windows Explorer folder
  2. Go to and open M: (Uglapps directory for undergrad lab applications)
  3. Click Ant174
  4. Find and click the file SCCSDatabase.sav to open it in Spss
  5. Spss will open the file (186 rows for societies; click a column to see the variable)
  6. Click View / Value Labels to see codings

To see a particular code of one society

  1. Spss is laid out like a spreadsheet: societies are in the rows, variables in the columns.
  2. Know the numbers of your variables from the on-line Codebook of variables for the Spss data
  3. Know the number (row) of your society from the <list of SCCS societies>
  4. Back to Ethnographic reading essay

In Spss, do a single-factor test

  1. In Spss main menu
  2. click edit / options / display names (under the "Output" tab), and in each box choose: Names and Labels / Apply / OK
  3. and in edit / options / display names (under the "General" tab), Variable lists: Display labels / Apply / OK
That option will list variables for the analysis by number, e.g., v892 v894. If you had instead used
the "General" tab to choose Variable lists: Alphabetical and Display Names / Apply / OK,
you would be able to select variables alphabetically, such as ExtWar - External War - War -
etcetera, but you might miss relevant variables in the codebook, and you might mistakenly
include a variable that already measures a factor constructed from some of your variables,
e.g., the ExtWarFactor.
...
You can use <Guttman scale> variables (codebook v239, 615,
663, 669, 694, 695, 711, 877, 878), but (a) do not use them together with the variables used in them, and
(b) you might consider instead using the variables used for the Guttman scale individually.
...
BEFORE running factor analysis (below), check your variables to see:
1. Do they form a proper ordering of categories, so that correlation can be used?
2. Do they need recodes?
Recodes are done from the menu by: Transform / Recode / Same or Different variable.
You could recode 99 - Unknown, for example, to "." which is the missing value code, or you
could recode 1 2 3 as 1 and 4 5 as 2, to create 1 = absent 2 = present.
You could reorder a series such as 1 2 3 4 5 so as to exchange the 4 and 5 to get a proper order.
WHEN YOU DO THIS it is best to use a Different variable because labels for categories are not
automatically reordered when you use Same variable. Use Variable View (bottom of page) to give
your new variable a name; there is also a column where you can insert category labels.
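For anyone working outside Spss, the three recodes described above can be sketched in Python. This is only an illustration: the `recode` helper and the sample codes are hypothetical, and `None` stands in for the Spss missing value ".". Each recode builds a new list, which mirrors the advice to recode into a Different variable.

```python
# Hypothetical helper, not part of Spss. None plays the role of the "." missing code.
def recode(values, mapping):
    """Return a NEW list with each code replaced according to `mapping`.

    Codes not listed in `mapping` are kept unchanged; missing (None) stays missing.
    """
    return [None if v is None else mapping.get(v, v) for v in values]


raw = [1, 2, 3, 4, 5, 99, 3]  # made-up codes, not real SCCS data

# 99 = Unknown becomes missing
cleaned = recode(raw, {99: None})

# collapse 1 2 3 -> 1 (absent) and 4 5 -> 2 (present)
binary = recode(cleaned, {1: 1, 2: 1, 3: 1, 4: 2, 5: 2})

# exchange 4 and 5 to get a proper order
reordered = recode(cleaned, {4: 5, 5: 4})

print(cleaned)
print(binary)
print(reordered)
```

Because the original list is never modified, you can always compare the recoded values against the raw codes, and you still need to write new category labels for the recoded variable yourself.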
  1. In Spss main menu, click analyze / data reduction / factor
  2. click a variable from the left box and send with arrow click to the right box
  3. continue to add new variables to the right box SO LONG AS YOU THINK THEY MEASURE (NEARLY) THE SAME THING, e.g., measures of external war: frequency, duration, intensity
  4. --OR-- you want to construct a "composite variable" of quite different but correlated variables
  5. --AND-- be very careful never to include variables that are not defined independently of one another, e.g. variables v664-668 were used to construct v669 and v669 was used to construct v670, so use ONLY ONE of these variables.
  6. before you click "Ok" when your list is done, click "Options" in the lower right and then under "Missing values" click the second item, "exclude cases pairwise"
  7. before you click "Ok" click "Scores" and "Save as new variable", then click "Ok" to generate an output file (separate window)
  8. check the last table in your output file labeled "Component matrix." If there is only one column of numbers you have a single factor. Ideally you want those numbers to be greater than 0.6 (the maximum is 1.0); they measure the extent to which each variable shares covariance with the single factor.
  9. the first value in the Eigenvalues column in the "Total Variance Explained" table should ideally be 3.0 or greater (or close to 3, e.g. 2.6).
  10. if you have any variables that load highly on a second, third, or higher component in this table, redo the analysis and remove those variables.
  11. now look at the % of variance explained for the first component in the output table labeled "Total variance explained". The higher that % the better the factor.
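To make the three quantities checked in the steps above more concrete (loadings in the "Component matrix", the first eigenvalue, and % of variance explained), here is a minimal Python sketch. It assumes principal components extraction, which the "Component matrix" table name suggests, and it assumes complete data rather than pairwise exclusion. The function names and the three sample variables are hypothetical, and the numbers are made up, not SCCS data.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between equal-length columns with no missing values."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    devs = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    p = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def first_component(R, iterations=500):
    """Largest eigenvalue of R and the loadings on it, found by power iteration.

    Assumes the variables are mostly positively correlated, so that the
    all-equal starting vector is not orthogonal to the first component.
    """
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Made-up scores for three hypothetical external-war measures
frequency = [1, 2, 2, 3, 4, 4, 5, 5]
duration = [1, 1, 2, 3, 3, 4, 4, 5]
intensity = [2, 1, 3, 3, 4, 5, 4, 5]

R = correlation_matrix([frequency, duration, intensity])
eigenvalue, loadings = first_component(R)
percent_variance = 100.0 * eigenvalue / len(R)

print("first eigenvalue:", round(eigenvalue, 3))
print("loadings:", [round(x, 3) for x in loadings])
print("% of variance explained:", round(percent_variance, 1))
```

With a correlation matrix the eigenvalues sum to the number of variables, so the first eigenvalue can never exceed the number of variables you entered, and the % of variance is simply the eigenvalue divided by that number.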

Save your single factor and correlate other variables with it

9 If you have a single factor result and you have saved your factor scores, then

In your database window, bottom left, click "Variable View", move the slider to the bottom, 
and give your saved factor scores a name, e.g. MyExtWarFactor56, the 56 meaning 56 cases 
in the factor.

10 Now correlate other new variables with this factor: At Main menu, click analyze / correlate / bivariate

then move your factor from the left to the right window,  
and add as many other variables as you want (e.g. 5 or 10 or 20, but not too many or you won't be
able to read the table). (Don't include variables used in your factor.) Click "Ok" when done.
In the output window, scan the first column. Correlations close to -1 or +1 are better than
those near 0, and "Sig (2-tailed)" values closer to 0 are better than those closer to one
(these are the null hypothesis probabilities). The third item, N = xx, is the number of cases
for each correlation.

11 The upper left FACTxFACT correlation 1 box has N = xx for the number of cases assigned a factor score
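The correlation and the N reported for each pair can be illustrated with a short Python sketch. It assumes that each correlation uses only the cases where both variables are present, which is why N can differ from one pair to the next. The function name and the data are hypothetical, `None` marks a missing value, and the significance value is not computed here.

```python
import math


def pearson_pairwise(x, y):
    """Return (r, n) using only cases where both values are present (not None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Made-up factor scores and one other variable; None = missing
factor_scores = [0.5, -1.2, None, 0.3, 1.1, -0.7]
other_variable = [3, 1, 2, None, 4, 2]

r, n = pearson_pairwise(factor_scores, other_variable)
print("r =", round(r, 3), "N =", n)
```

Six societies are listed, but only four have both values, so the correlation rests on N = 4.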

background on Factor Analysis and single-factor models

http://www.statsoft.com/textbook/stfacan.html

http://eclectic.ss.uci.edu/~drwhite/pub/Single-Factor_Models.pdf

http://eclectic.ss.uci.edu/~drwhite/pub/Reliability1990e.pdf

Back to Factors of culture analytic essay (#2)

This ends Paper 2 Part 2 (8 pages). You will come back here for Paper 3.

In Spss, do a cross-tab with stats and a graph with trendline

Configure Spss for cross-tab labels

  1. In Spss main menu
  2. click edit / options / Output Labels
  3. change each of the four windows, e.g.
  4. Names and Labels etc. if you want both the numbers of variables or values AND the title

Make a crosstab

  1. FOR ONE CROSS-TAB click analyze / descriptive / crosstabs
  2. pick two variables from the codebook; one or both may be single factors (composite or single-concept factors). (Why? Because one of your hypotheses has made a prediction about how these two variables are related. Note that a smaller sample size or total N automatically lowers significance but does not affect the strength of correlation.)
  3. If you use a factor then recode the factor variable using breakpoints.
  4. click CELLS and percentage for rows (or cols, but not by totals), then OK. (Why? You want to interpret correlations in terms of differences between the percentages by row or by column, according to your hypothesis, e.g., (a) where variable x is low, y has higher (lower) percentages. Is the converse also true or not? That is, (b) where variable y is low, x has higher (lower) percentages.)
  5. click STATISTICS and then click buttons "Cramer's V" and "Tau-B" (only these). (Why? If both (a) and (b) are true, then interpret "Tau-B" as a correlation coefficient, and the square of "Tau-B" as percent variance explained. If only (a) or (b) is true, but not both, then "Cramer's V" (which is always positive, but should be interpreted as negative if "Tau-B" is negative) can be interpreted for a conditional relationship where it is only the higher (lower) values of one variable that are predictive of the other, not the converse.)
  6. "Ok" to run cross tab
  7. check the categories in the rows and the columns:
     • are they ordered? If not, then RECODE
     • do any of the categories in one have exactly the same meaning for the other? For example, in variables v1745 and v1756, 0=no formal political office present, so all the 0 cases for the row will be in column 0. If so, recode one or both as "." for MISSING DATA, then rerun the cross tab. The 0=0 equivalence will then no longer create a spurious correlation that is true by definition.
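The crosstab steps above can be sketched in Python to show what the row percentages, "Cramer's V" and "Tau-B" are computed from. This is an illustration under stated assumptions, not a reproduction of Spss output: the function names and data are hypothetical, `None` marks missing data, both variables need at least two observed categories, and the category codes must already be in their proper order.

```python
import math
from collections import Counter


def crosstab(rows, cols):
    """Count cases per (row code, column code); cases with a missing value are skipped."""
    pairs = [(r, c) for r, c in zip(rows, cols) if r is not None and c is not None]
    counts = Counter(pairs)
    r_codes = sorted({r for r, _ in pairs})
    c_codes = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in c_codes] for r in r_codes]
    return r_codes, c_codes, table


def row_percentages(table):
    """Each cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


def cramers_v(table):
    """Cramer's V from the chi-square statistic of the table (always >= 0)."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(row_tot)) for j in range(len(col_tot)))
    return math.sqrt(chi2 / (n * (min(len(row_tot), len(col_tot)) - 1)))


def kendall_tau_b(table):
    """Kendall's tau-b for a table whose rows and columns are in ascending order."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    n = sum(map(sum, table))
    all_pairs = n * (n - 1) / 2
    tied_rows = sum(t * (t - 1) / 2 for t in (sum(row) for row in table))
    tied_cols = sum(t * (t - 1) / 2 for t in (sum(col) for col in zip(*table)))
    return (concordant - discordant) / math.sqrt(
        (all_pairs - tied_rows) * (all_pairs - tied_cols))


# Made-up ordered codes for two variables; None = missing
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
y = [1, 1, 2, 1, 2, 2, 2, 2, 2, None]

r_codes, c_codes, table = crosstab(x, y)
print("counts:", table)
print("row %:", row_percentages(table))
print("Cramer's V:", round(cramers_v(table), 3))
print("tau-b:", round(kendall_tau_b(table), 3))
```

Note that Cramer's V comes from a square root and so carries no sign, while tau-b depends on the order of the categories, which is why unordered categories need a recode before tau-b means anything.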

Then make a graph to go with the crosstab

  1. GRAPH WITH TRENDLINE for the same two variables
  2. In Spss main menu
  3. click Graph / Interactive / Scatterplot
  4. Move first variable to y axis box, second to x axis box, then OK
  5. doubleright click the image / click SPSS Interactive graph image / Edit
  6. click the second icon on the top row / Dot-line (pretty crude)

Now make crosstabs and graphs for testing each of your hypotheses

Interpreting the cross-tab

see SCCS test of hypotheses but let me put some caveats here (and will repeat there):
  1. compare percents that are significantly greater than total percents over the whole table.
  2. If you want to compare, say, two same-column row percentages (percentaged on rows) then also use the Fisher Exact 2x2 test to compare raw frequencies of cases for row a freq / row a total / row b freq / row b total. Percent differences don't mean much if the samples are small.
  3. if you have more than 3 x 3, or 8-9 cells total, Cramer's V won't be valid because the expected values will be too close to zero. Then rely on tau-B or on a graphic result, and use the extrapolation line in the graph.
  4. If you do a cross tab that includes a FACTOR SCORE:
     • use only the Tau-b statistic, not Cramer's V (there are too many cells for it to be valid)
     • include only the statistic results; DO NOT include the cross-tab itself in the paper
     • it would be better only to CORRELATE and show results graphically than to do the cross-tab
  5. Never include the Case Processing Summary in a paper!
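The Fisher Exact 2x2 test mentioned in the caveats above can be sketched in Python as follows. The function name is hypothetical, and the two-sided p-value here uses one common convention (summing the probabilities of all tables that are no more likely than the observed one); other software may define the two-sided value differently.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b are the raw frequencies in row a; c, d are those in row b.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x cases in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance keeps tables with equal probability from being
    # dropped because of floating-point rounding.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Made-up counts: 3 of 4 cases in row a versus 1 of 4 in row b (75% vs 25%)
p_value = fisher_exact_2x2(3, 1, 1, 3)
print("two-sided p =", round(p_value, 4))
```

The example shows why percent differences on small samples can mislead: a difference of 75% against 25% looks large, yet with only four cases per row it is far from significant.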

Legacy graph with trendline

  • doubleright click the graph / SPSS Chart object / open
  • click the 7th icon (squiggly line) in the lowest icon bar (Properties)
  • In Properties, click / Interpolation line / step / center step

for multiple crosstabs

  1. In step 1 for Make a crosstab, put two or more variables in the ROWS
  2. or two or more variables in the COLS

Control variables

With any given cross-tab or set of cross-tabs, you can use a "control variable" in the third window, below the "row(s)" window and the "column(s)" window. Ideally this should be a binary variable or one with not too many categories. v200, world regions, is always a useful control to see if correlations replicate on different continents, though you needn't include it in your papers.

<An example of a binary controls> came up in our discussion of Karina Ritter's paper 2 results (she did this on day 1 of the paper 2 tutorial although I added the codes from the variable list/codebook). The idea was to use the binary ISLAM yes/no variable at the end of the Spss data file as a control for her factor/other variable correlations to see if they replicated within Islam and for non Islamic societies, as her results might reflect simply an historical contrast between world religions and their social practices and general social organization.
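The idea of a control variable, checking whether a relationship replicates within each category of a third variable, can be sketched in Python. The function names and data are hypothetical, `None` marks missing data, and the binary control simply stands in for a yes/no variable such as the ISLAM one. Each group needs some variation in both variables, otherwise the correlation is undefined.

```python
import math
from collections import defaultdict


def pearson(pairs):
    """Pearson r for a list of (x, y) pairs; needs variation in both x and y."""
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_control(x, y, control):
    """Return {control code: (r, n)}, computed separately within each control category."""
    groups = defaultdict(list)
    for a, b, g in zip(x, y, control):
        if a is not None and b is not None and g is not None:
            groups[g].append((a, b))
    return {g: (pearson(p), len(p)) for g, p in groups.items()}


# Made-up data: the relationship reverses between the two control groups
x = [1, 2, 3, 1, 2, 3]
y = [1, 2, 3, 3, 2, 1]
control = [0, 0, 0, 1, 1, 1]  # hypothetical binary control, 0 = no, 1 = yes

for code, (r, n) in sorted(correlation_by_control(x, y, control).items()):
    print("control =", code, "r =", round(r, 3), "N =", n)
```

In this made-up case the correlation is positive in one group and negative in the other, so it does not replicate across the control categories, which is exactly what such a check is meant to reveal.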

Back to Human Social Complexity and World Cultures: Hypothesis test (SCCS) on how a pair of topics relate

This ends Paper 3 (8 pages).

background to SCCS

more on Spss

You might be able to consult the Tutorial on the c: drive of the lab.

If you want to run tables from home using Spss

If you want to run tables from home with free software you can use R

Links

SCCS library references

How to set up a new SPSS file with headers and data - Tutorial Video