CoSSci Background, Screenshots and Instructions

From InterSciWiki

Doug will be away till May 9. Meanwhile Tolga Oztan <boztan@uci.edu> is willing to take questions and can make corrections in the wiki pages. Thanks, Tolga -- Doug

Email invitation: Potential participants in the Wiley Companion to Cross-Cultural Research and the accompanying Complex Social Science Galaxy (CoSSci)

The site http://socscicompute.ss.uci.edu now has the two instructional YouTube videos (saved histories, 2 min, and overview of CoSSci Galaxy, 20 min), and http://capone.mtsu.edu/eaeff/DEf01SCCS.html has the codebooks and R GUI code used in CoSSci. Let me know about doing a chapter for the Wiley Companion based on one or more CoSSci-derived models. The trick to improving a model that starts from the depvar, a few independent variables, and the Independent UNrestricted model variables (used to help with covariates for imputation) is to inspect the *.csv output for the "To Try" variables, which may be tested as additions. Each tends to increase predictive variance when taken one at a time, and may provide a meaningful addition to the model. The "Galaxy" framework is commonly used in many of the Science Gateways listed at http://intersci.ss.uci.edu/wiki/index.php/Main_Page (https://www.xsede.org/gateways-listing).

-- Tolga will respond to requests for help; you can pepper me with more general questions. You and your students will find CoSSci Galaxy much easier to use than the R GUI, which R aficionados will like.

Douglas R. White

http://intersci.ss.uci.edu/wiki/index.php/DW_home
Editor, Structure and Dynamics: eJournal of Anthropological and Related Sciences
https://submit.escholarship.org/ojs/index.php/imbs_socdyn_sdeas

What (click here) is DEf? and (next) CoSSci? http://socscicompute.ss.uci.edu and the http://capone.mtsu.edu/eaeff/DEf01SCCS.html codebooks: these are the entry points for the Complex Social Science Gateway (CoSSci) at UCI, which is also the startup for the Wiley Companion to Cross-Cultural Research chapters. It has been used in a fall 2013 class by Ren Feng, and by Doug and 9 prospective authors at the session in Albuquerque for Celebrating, a site that now contains all the PowerPoints and two YouTube videos with instructions.

(1) See the YouTube videos: How to share CoSSci histories with Students (see your Published Histories) and the 20 min clickable CoSSci overview. See also Gateway Instructions.

We may be building:
(2) "An inventory of CoSSci data entry variables" for users to see the kinds of entries to put into the screen for a given model.
(3) Better screenshots of the site, with explanations of the steps in analysis.
(4) A series of screenshots that "open the site". Items 6 - NoVA - and 7 are new as of Jan 25th 2014 and will be edited to explain in better detail what the screenshots show by way of instructions.

(5) Output will come in the form of a *.csv file as a download on your computer (just click the image of a diskette to open the *.csv results).

  • Users can log in, set their password, then change the name of the upper left "History" file, and share the history on CoSSci or publicly by pressing the upper right *. That history can include models saved at that time, and instructors can then see students' work, or students can share with other students, etc. Don't rely much on the screenshots below: this was day one, and all this will be edited (and gotten ready for online courses, or use of CoSSci from classrooms). Each student can open their own CoSSci site.
What you'll see when CoSSci starts is a yellow window above the green (upper right). When the run is finished the yellow window turns green and eventually a "Diskette" image appears. Click that for your *.csv output.
Screenshot4CoSSci2.jpeg

Other Screenshots 1: Using a Gateway shell (one used by biologists)

http://www.apple.com/findouthow/mac/#capturescreen

Ctrl+/- to affect screenshot or Click to enlarge: Looking at some bio tools we can use

Tom and I were discussing the statistical packages used by biologists. Note that they are using lm() without missing data imputation or 2SLS detection of autocorrelation (Galton's problem).

Our tools are further up on this page. We will leave the biology tools up until we learn to use some of them, like text manipulation, filter and sort, statistics, graphing and sorting data, etc., at the upper left of this screen.

On the upper right is a slideable window of Tom and Doug's review of Tom's History site, where 57:EAF1 is his 57th use of the "EAF1" tool. This is one of Anthon's scripts: it takes variables of interest, maps them into the Dow-Eff functions format, and prints the compilation of input variables (where we see "Dependent variable='valchild'" from the article Eff and Dow (2009) in Structure and Dynamics, an early prototype now obsolete). So for the time being ignore the biology-specific tools.

This is the top half of that page:

Ctrl+/- top half of screenshot above

The X and Y and lm are from the biologists' tools for regression, not ours. We cannot use their tools for our data and Dow-Eff functions. But we can get some ideas from exploring their use of a Galaxy site, which is a common format for building Science Gateways (https://portal.xsede.org/science-gateways).

The biology prototype that we built from (Thomas Uram at Argonne Labs: modifying Galaxy Python scripts) is part of the Galaxy project.

Now let's go to our own CoSSci Galaxy, http://socscigate.oit.uci.edu/uci/root, installed at UCI, thanks to OIT's John Saska and Francisco Lopez. The "Gateway" was installed at UC Irvine but links to XSEDE, which is part of UCSD's San Diego Supercomputer Center, SDSC, right near Doug's home.

Choosing options for autocorrelation now available (distance, language, ecology)

Screenshots 2: Getting output

At the top dark row of the previous screens you'll see "Analyze data" at one end and "User" at the other. I just logged in with my email id and made up a password. Now we see 56: EAF1 under 57: EAF1, so we're working backward. At the bottom of that green text-filled rectangle there is an image of a whirling blue arrow. When clicked it says "Run this job again."

Now the history that Tom has been sharing with me disappears, and the screen is filled with text that is not aligned. The file that is downloaded to your PC, however, is a *.csv file, and it looks like this:

Screen shot 2013-01-24 at 5.33.29 PM.png

That's the top of one of the olsresults.csv outputs that you would have gotten if you ran this script at home in R. To see how this was generated, click the (static image of the) whirling blue arrow that will say "Run this job again." When done, click the diskette image in the row of three and you get the output directly on the white screen.

Screen shot 2013-01-24 at 6.11.27 PM.png

Screenshots 3: Setting input for a new model

Click the little i in the row of three and you get the input specifications of the job just run, on the screen. (BTW, if after pressing i you see "Filesize: ?" rather than a size in KB, the run is still waiting for output.)

Screen shot 2013-01-24 at 6.23.53 PM.png


But where did the definition of the depvar disappear to? Well, let's do the data definitions again: click the swirling blue arrow to the right. We see the screen below. Now we see where to define the dependent variable as (dx$v473+dx$v474+dx$v475+dx$v476)

Screen shot 2013-01-24 at 6.23.53 PM.png
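
As a rough illustration of what a definition such as (dx$v473+dx$v474+dx$v475+dx$v476) computes, here is a minimal sketch in Python (CoSSci itself runs R scripts; Python is used here only for illustration). The row values are made up, and the helper name row_sum is hypothetical. The sketch assumes the behavior of R's + operator, where a missing value in any component makes the sum missing.

```python
# Hypothetical rows standing in for the dataset dx; the values are invented.
rows = [
    {"v473": 1, "v474": 2, "v475": 0, "v476": 3},
    {"v473": 2, "v474": None, "v475": 1, "v476": 1},  # v474 is missing
]

COMPONENTS = ["v473", "v474", "v475", "v476"]


def row_sum(row, columns):
    """Add the named columns for one row; return None if any value is missing."""
    values = [row[name] for name in columns]
    if any(value is None for value in values):
        return None
    return sum(values)


# One composite value per society (row), analogous to dx$valchild.
valchild = [row_sum(row, COMPONENTS) for row in rows]
```

The second row comes out missing, which is one reason a composite depvar can have more missing cases than any single component.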

We can see now that the entire model can be defined by the following:

  1. depvar name
  2. all UNRESTRICTED indep var definitions, e.g., v1260,v203,v204,
  3. all RESTRICTED indep var definitions, e.g., v1260,v155,v233d4,
  4. the DATASET Name of the depvar dx$valchild
  5. the DATASET Definitions of the depvar (dx$v473+dx$v474+dx$v475+dx$v476)
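
The five entries above can be pictured as one small record. The following is only a sketch in Python of how such a model specification might be held together; the class ModelSpec and the helper parse_var_list are hypothetical names for illustration, not part of CoSSci or the Dow-Eff functions. The variable lists are the partial examples given in the list above.

```python
from dataclasses import dataclass, field


def parse_var_list(text):
    """Split a comma-separated entry such as 'v1260,v203,v204,' into names."""
    return [name.strip() for name in text.split(",") if name.strip()]


@dataclass
class ModelSpec:
    """Hypothetical container for the entries typed into the CoSSci form."""
    depvar: str
    unrestricted: list
    restricted: list
    # Maps each DATASET Name to its DATASET Definition (an R expression as text).
    definitions: dict = field(default_factory=dict)


spec = ModelSpec(
    depvar="valchild",
    unrestricted=parse_var_list("v1260,v203,v204,"),
    restricted=parse_var_list("v1260,v155,v233d4,"),
    definitions={"dx$valchild": "(dx$v473+dx$v474+dx$v475+dx$v476)"},
)
```

Keeping the definitions in a mapping from name to expression mirrors the form, where each variable is entered as a DATASET Name paired with a DATASET Definition.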

But what about other variables that require DATASET Names and DATASET Definitions?

Ingeniously, Tom Uram has observed that these are defined IN THE SAME WAY as the depvar, so JUST CLICK |Add a new variable| and you will see:

the same combination of a DATASET Name and DATASET Definition for each new variable,
and then you simply add the names of these new variables to the list of UNRESTRICTED or RESTRICTED independent variables.

Screenshots 4: How to add the Named variables that are from recodes of the original dataset dx$v000 numbers?

Well, here you have it: these added Named variables can be continued indefinitely, but when adding to the RESTRICTED variables list you must also add to the UNRESTRICTED list (although not necessarily in the same order).
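
The rule just stated (every variable in the RESTRICTED list must also appear in the UNRESTRICTED list, in any order) is easy to check before pressing EXECUTE. Below is a minimal sketch in Python; the function name and the variable "newvar" are hypothetical, and the lists are invented for illustration rather than taken from a real model.

```python
def missing_from_unrestricted(restricted, unrestricted):
    """Return the restricted names that are absent from the unrestricted list."""
    allowed = set(unrestricted)
    return [name for name in restricted if name not in allowed]


# Invented example: "newvar" stands for a variable added with |Add a new variable|.
unrestricted = ["v1260", "v203", "v204", "newvar"]
restricted = ["newvar", "v1260"]  # order differs from the unrestricted list

problems = missing_from_unrestricted(restricted, unrestricted)
if problems:
    print("Add these to the UNRESTRICTED list too:", ", ".join(problems))
```

An empty result means the two lists are consistent; order is ignored because the check uses set membership.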

Screen shot 2013-01-24 at 6.43.09 PM.png

Now, when you're done with Added variables (as many as needed), press EXECUTE. Once EXECUTE is finished, click the diskette image and receive the *.csv as a download. Once downloaded, just click to open the *.csv results.

So print, memorize, split screen or annotate these pages and go to http://socscigate.oit.uci.edu/uci/ or http://socscigate.oit.uci.edu/uci/root Doug (talk) 8:55, 25 January 2013 (PST) (Previous 18:55 the 24th)

Temporal Sequence of Screenshots

Sequence of Screenshots: This is what you see as a few minutes go by. The system works, but we need more explanation.

THIS YOU CAN IGNORE - Illustrating NoVA - Networks of Variables Analysis

These are put together from four dependent-variable models. Illustrations from Wiley Companion Chapter 5 Diagrams 4 & 5 <-- click for the pdf of the draft article - Invitations to Critique

ReissRelabelv54Sanday2Yellow.png

<-- Legend: Arrows point to Dependent variables. Blue arrows are consistent positive regressions among variables that are anti-female. Red arrows are consistent negative regressions among variables that are anti-female. Together they form a cluster of anti-female variables. Below: --> Names of variables reversed to form the same cluster of variables, now pro-female. All dotted (red) arrows show negative regression coefficients; solid (black) arrows show positive coefficients.

ReissRelabelv54Sanday2tall.png

Sharing, Publishing, and viewing Shared Data

In the middle of the top black bar you'll see -->Shared Data<--. If you click there, the first drop-down item is:

Published Histories

And there you will see one shared history, that of Thomas Uram. You may run any of his models, although 56: EAF1 and 57: EAF1 are the most recent (working) models.

From Tom Uram (our Argonne Labs Galaxy -python- Programmer) Jan 24 2013

You can learn plenty about Galaxy from these screencasts:

http://wiki.galaxyproject.org/Learn/Screencasts

Some of them will be very bio-centric. If you can ignore the bio details and concentrate instead on datasets and tools, you'll see how it can support your work with your tools.

There's also this Galaxy101 page, which is a good example, but you'll have to gloss over the bio details here, too.

https://main.g2.bx.psu.edu/u/aun1/p/galaxy101

And lest you think it's only about biology, it's being applied in several other domains: at Argonne, we are using it for high energy physics simulations.

From an early Wiley Chapter draft these were some drafts of tables but there are better ways now: Illustrations from Wiley Companion Chapter 5 Diagrams 4 & 5 <-- click for the pdf of a draft article