Software: Kinship simulation

From InterSciWiki
Jump to: navigation, search
n=source("http://intersci.ss.uci.edu/wiki/pub/PajekR-KinSimGF.R") #open and copy to R console
source("http://intersci.ss.uci.edu/wiki/pub/KinSim1.R") #copy this line to R console, repeatedly

debug manual

Fig. 1: P-graph lines to parents drawn by Pajek: Red from wife/daughter, black lines from husband/son
Fig. 2: Drawn by R

Reprogramming task notes

New developments with a new simulator in R

Tolga:> Alright. I am now writing the code with three options given to the user. I'll pull the H and HH vectors from the adjacency matrix as well.

Doug>> Once you finish we'll write Vlado and ask him to write for us a macro that will give you the G and F data directly as well as the x-coords and the generational vector, all sent over into an R file that replace the file for the current toy dataset. Then, with no matrix manipulation on your code a new flat file for the simulated genealogy will be produced. If you want to be fancy since this is compact output you can set a user option for n replications of simulated data.

Then we can use the earlier "marriage census" tools we did several years ago and census the n simulated datasets. We can automate the comparison of the actual genealogy as a census in comparison to the statistical distribution of the simulated datasets.

Fancier still we can set our simulator to excluded certain type of marriage e.g., no FaBrDa etc marriages and others because these cover some of the "proscriptive rules" of the actual genealogy.

And so forth.

Doug to Vlado and Andrej> You will recall http://intersci.ss.uci.edu/wiki/pub/PajekR-KinSimGF.R which one or both of your wrote for us to develop a Kinship simulator in R, i.e., you gave us the matrix n (here 13x13) for a p-graph, the generations as an integer variables (y axis) and the x value of the graph. I wrote out by hand the function G (n=13 vector) that takes males to parents, and our simulators computes F for females to parents.

http://intersci.ss.uci.edu/wiki/index.php/Software:_Kinship_simulation was a graphic for a prototype program that we have now almost finished with our R code: it holds generations constant and takes all couples in each generation, permutes G or F or both to simulate a random reordering of the parents.

WHAT I WOULD LOVE NOW is if you could build a macro for us that computes generations, the x vector for east-west position of nodes, and then computes G and F for each node rather than the matrix n, and outputs into an R script.

You'll see in http://intersci.ss.uci.edu/wiki/pub/PajekR-KinSimGF.R that I added G by hand so we could do R script permutations of the parents of males, then later add the same for F.

Using vectors G and F and not the matrix as in n1 in script you wrote for us earlier, we will be able to do permutations for large pgraph networks.

We're going to use the "test for motifs" option in Pajek to do the marriage-type censuses according to what we did in 2005 with Klaus Hamberger -- but now for both the actual genealogy and simulations according to my 1999 article. Keeping the number of motifs fairly small, we should be able to so multiple simulations and compare their frequency distributions of motifs with the actual genealogy.

Introduction

This is an R routine that emulates the monte carlo/metropolitan simulation of Douglas R. White. 1999 Controlled Simulation of Marriage Systems. Journal of Artificial Societies and Social Simulation 2(3). As in the graph, nodes are marriages, red dotted line run from wife and/or daughter to parents, black solid lines from husband and/or son to parents. The goal of the simulation is to (1) randomize marriages in each generation (2) create a distribution of randomized marriages (3) compare the distributions of actual marriages to those of these randomized distributions as a test of the null hypothesis. A further possibility is to (4) add a probabilistic preferential bias for a certain type of marriage to the simulation, (5) generate a distribution ("bootstrap") from the preferential bias parameter (6) test the goodness of fit of the observed data to the bootstrap distribution. The TIPP Kinship and computing project in anthropology and history, with software developed by Douglas R. White and Klaus Hamberger further supported by the French Agence National de la recherche (ANR), PIs Michael Houseman and Cyril Grange, provides the software for computing the rates and frequencies of different kinds of marriage from small to extremely large genealogical datasets.

  1. 2004 Ring Cohesion Theory in Marriage and Social Networks. Mathématiques et sciences humaines 43(168):5-28 http://www.ehess.fr/revue-msh/recherche.php?numero=168 http://eclectic.ss.uci.edu/download/MarriageNetTools.htm
  2. 2004 Matrimonial Ring Structures (Klaus Hamberger, Michael Houseman, Isabelle Daillant, Douglas R. White and Laurent Barry). Mathématiques et sciences humaines 43(168):83-121. TIPP Kinship and computing replaces: http://eclectic.ss.uci.edu/download/MarriageNetTools.htm
  3. InterSci wiki R Paper Examples: Using R to replicate a published study#Simulation

Export Pajek to R

Given a P-graph-format kinship network in Pajek, the /Network/Partition/Depth/Generation option computes a partition of generations. The /Network/Partition/Depth/Genealogical option also computes a partition of generations but for large networks often gives two extra generations beyond the minimum number required. Generations can also be computed from birth or marriage dates and grouped into decades or other temporal intervals. The simulation requires some basis for assuming that it includes those who could have made other marriage choices within this temporal frame.

Both the network and the generation partition are exported to R. Read the *.paj into RECENT VERSION OF Pajek (N=13 nodes), /Partion/to Vector and export into R by Tools/R/Send to R/All Networks and Vectors - v1 is the generation vector, and adjacency matrix n1 and net are the square value matrices.

To replicate the 2D visualization of the pajek screen, use Net/Vector/Get Coordinate for the x and y coordinates.

R

Each married person has a one or more marriage links that are top-wired to the same parental node, and that connects to a spouse below.

The R routine allows three options

  1. Rewire all women's marriages
  2. Rewire all men's marriages
  3. Rewire woman's and men's marriages

Egos in the category selected retain their parental link and are disconnected from their spouse links, then randomly rewired to a detached spouse link in the generation of ego's marriage.

iterating through parental generations 1-N
  1. For each ego in the category selected (male/female/either), the parent generation, parent node, child index, and marriage node is saved in an array.
  2. For this parent generation, each child of each parent node is rewired to a randomly selected empty marriage node in the saved array. That node is filled before the next rewiring.

End

A test file for pajek

As R: http://intersci.ss.uci.edu/wiki/pub/PajekR-testSim.R

KinSim.paj:

*Network KinSim
*Vertices     13
1
2
3
4
5
6
7
8
9
10
11
12
13
*Arcs
     13        5 1 c Black p Solid
      1        5 1 c Black p Solid 
      1        6 2 c Red p Dots 
      2        6 1 c Black p Solid 
      2        7 2 c Red p Dots 
      3        7 1 c Black p Solid 
      3        8 2 c Red p Dots 
      4        8 1 c Black p Solid 
      4        5 2 c Red p Dots 
      5        9 1 c Black p Solid 
      5       10 2 c Red p Dots 
      6       10 1 c Black p Solid 
      6       11 2 c Red p Dots 
      7       11 1 c Black p Solid 
      7       12 2 c Red p Dots 
      8       12 1 c Black p Solid  
      8        9 2 c Red p Dots 
*Partition (partition into generations)
3
1
1
1
1
2
2
2
2
3
3
3
3

Load Pajek data into R

RIGHT CLICK and unzip for KinSim3gen.paj RIGHT CLICK and unzip to get all Pajek files

-----------------------------------------------------------------------
The following networks/matrices read:
n1  :  KinSim3gen (13)
The following vectors read::
v1  :  From partition 1 (13)
v2  :  x-coordinate of N1 (13)
v3  :  z-coordinate of N1 (13)
Use objects() to get list of available objects           
Use comment(?) to get information about selected object  
Use savevector(v?,'???.vec') to save vector to Pajek input file  
Use savematrix(n?,'???.net') to save matrix to Pajek input file (.MAT) 
    savematrix(n?,'???.net',2) to request a 2-mode matrix (.MAT) 
Use savenetwork(n?,'???.net') to save matrix to Pajek input file (.NET) 
    savenetwork(n?,'???.net',2) to request a 2-mode network (.NET) 
Use v?<-loadvector('???.vec') to load vector(s) from Pajek input file  
Use n?<-loadmatrix('???.mat') to load matrix from Pajek input file  
-----------------------------------------------------------------------

savenetwork, savevector, savematrix are programs... to pass networks back to Pajek

Read the *.paj into RECENT VERSION OF Pajek (N=13 nodes), /Partion/to Vector and export into R by Tools/R/Send to R/All Networks and Vectors - v1 is the generation vector, and adjacency matrix n1 and net are the square value matrices. See Export Pajek to R for v2, v3 coordinates.

The commands below are in SNA, by Carter Butts.

Fig. 3: Export to R image, as above, commands below by SNA, Carter Butts
Carter Butts SNA Manual

Syntax of multiple embedded for loops | Saved:Looped version | Turkish Nomads

Copy and run

#Step 1 - copy and paste R code from the following to your R console
http://intersci.ss.uci.edu/wiki/pub/PajekR-KinSimGF.R   # testSim.R    #PajekR.r from Pajek MUST have v2=the x dim vector
#A. CUT AND PASTE DATA FROM THE NEXT LINE
library(sna)
net=source("http://intersci.ss.uci.edu/wiki/pub/PajekR-KinSimGF.R")    #C:/Program Files/pajek/PAJEK//PajekR-testSim.R")  #PajekR.r
gplot(n1,  coord = cbind(4*v2,v1),gmode = "digraph", vertex.col=v1, edge.col=c(1,2), vertex.cex=2, label = c(1:dim(n2)[2])) #mode = coord
source("http://intersci.ss.uci.edu/wiki/pub/KinSim1.R")
dim(n2)
#B. CUT AND PASTE CODE
n2=n1
gplot(n1,  coord = cbind(4*v2,v1),gmode = "digraph", vertex.col=v1, edge.col=c(1,2), vertex.cex=2, label = c(1:dim(n2)[2])) #mode = coord
#skip these
#gplot(n1,  coord = cbind(v1,1-v1),gmode = "digraph", vertex.col=v1, vertex.cex=2) 
#gplot(n1, mode = "fruchtermanreingold",gmode = "digraph", vertex.col=v1) #mode = 
#gplot(n1, mode = "mds",gmode = "digraph", vertex.col=v1) #mode = "fruchtermanreingold"
#gplot3d(n1, mode = "mds",gmode = "digraph", vertex.col=v1) #mode = "fruchtermanreingold"
#NOW DO SIMULATION
is.matrix(n1)
is.matrix(n2) 
dim(n2)
lensqrt=length(n1)^.5
library(gtools)
m=max(v1)
#WITH REPETITION START HERE
n2=n1
m3=n2
for (igen in 1:2) {#0 m3 for each generation WORKS FOR 1,2 BUT NOT BOTH AND ARROWS IN 2:2 (upper are backwards; lower 1 generation back to original)
n2 <- m3
numnodes=0
#create permutation
F1=0 #female row i save FOR THIS GENERATION
for(i in 1:lensqrt) {#1         i = CHILD
 for(j in 1:lensqrt) {#2        j = MOTHER
                     if (v1[i]==igen & n2[i,j]==2) {#3
                     numnodes=numnodes+1
                     F1[numnodes] <- i 
                     }#3 put CHILD row i in the list to permute PARENT 
 }#2
}#1
numnodes=0
#save overlaps to avoid in WiPar and HuPar lists in a0
a0=0 #child row i save FOR THIS GENERATION
for(i in 1:lensqrt) {#1         i = CHILD
 if (v1[i]==igen) {#2
 for(j in 1:lensqrt) #        j = WiPar
                     if (n2[i,j]==2) {numnodes=numnodes+1 #3
 for(k in 1:lensqrt) #        k = HuPar
                     if (n2[i,k]==1) a0[numnodes] <- k
                     }#3 put CHILD row i in the list to permute PARENT 
}#2
}#1
a0
gplot(n1,  coord = cbind(4*v2,v1),gmode = "digraph", vertex.col=v1, edge.col=c(1,2), vertex.cex=2, label = c(1:dim(n2)[2])) #mode = coord


#ADD NEW PARENTAL LINKS
numnodes=0
FJ=0 #female col j save FOR THIS GENERATION
for(i in 1:lensqrt) {#1
 if (v1[i]==igen) {#2
 for(j in 1:lensqrt) 
                    if (n2[i,j]==2)     {#3
                    numnodes=numnodes+1  
                    FJ[numnodes] <- j    }#3
                    #FJ[j] <- numnodes   }#3 #WRONG DIRECTION??
                    #3 put PARENT col j in list with same order as CHILD i
 }#2
}#1
gplot(m2,  coord = cbind(4*v2,v1),gmode = "digraph", vertex.col=v1, edge.col=c(1,2), vertex.cex=2, label = c(1:dim(n2)[2])) #mode = coord
numnodes 
#PERMUTATION FINISHED HERE
#?("if")  ?("for") help - not much use
m2=n2
num=length(F1)
f1=0
f2=0
fj=0
f1  <- F1[1:num] #to permute i list
fj  <- FJ[1:num] #matching j list
f1 #rows to permute
f2 <- permute(f1) #permute to permute list
f2 #these are the permuted-i locations
 #Permutation for Sibling marriage HuPar=WiPar avoided
 nSib=0
 for(i in 1:num) if (a0[i]==f2[i]) nSib=nSib+1
 #(
 #Permutation with avoidance
 #f2 <- permute(f1) ########permute to permute list
 #}
#Permutation with avoidance
#for(k in 1:Len1) {
#F3=0
#F3=f2
#f2 <- permute(f1) ########permute to permute list
#permutation created for columns
numnodes
#CREATE NEW NETWORK
#delete the old ties of type 2
for (i in 1:lensqrt) {#1
if (v1[i]==igen) {#2
 for (j in 1:lensqrt) 
                     if (n2[i,j]==2) m2[i,j] <- 0
 }#2
}#1
#create permuted tie as type 2 #3
for (i in 1:num) m2[f2[i],fj[i]] <- 2 #3 #OT THE RIGHT ROW AND COLUMN?
#for (i in 1:num) m2[fj[i],f2[i]] <- 2 #3 #IS THIS RIGHT FOR ROW AND COLUMN?

#http://rss.acs.unt.edu/Rdoc/library/sna/html/gplot.html
m3=m2
}#0


gplot(m3,  coord = cbind(4*v2,v1),gmode = "digraph", vertex.col=v1, edge.col=c(1,2), vertex.cex=2, label = c(1:dim(n2)[2])) #mode = coord


F1
f1
f2
a0
FJ
fj
matrix (m3,13,13)
#if successful you can repeat with
source("http://intersci.ss.uci.edu/wiki/pub/KinSim1.R")
Fig. 4: After permutation

Matrix

matrix(n1,13,13)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]    0    0    0    0    1    2    0    0    0     0     0     0     0
[2,]    0    0    0    0    0    1    2    0    0     0     0     0     0
[3,]    0    0    0    0    0    0    1    2    0     0     0     0     0
[4,]    0    0    0    0    2    0    0    1    0     0     0     0     0
[5,]    0    0    0    0    0    0    0    0    1     2     0     0     0
[6,]    0    0    0    0    0    0    0    0    0     1     2     0     0
[7,]    0    0    0    0    0    0    0    0    0     0     1     2     0
[8,]    0    0    0    0    0    0    0    0    2     0     0     1     0
[9,]    0    0    0    0    0    0    0    0    0     0     0     0     0
10,]    0    0    0    0    0    0    0    0    0     0     0     0     0
11,]    0    0    0    0    0    0    0    0    0     0     0     0     0
12,]    0    0    0    0    0    0    0    0    0     0     0     0     0
13,]    0    0    0    0    1    0    0    0    0     0     0     0     0

Still errors

#A DEBUG PROBLEM HERE WITH THE DIRECTION OF ARROWS SOMETIMES RECIPROCAL
#OCCASIONALLY ONE SPOUSE GETS LOST (Either generation)
#IS THE NUMBERING OFF BY ONE?
# gplot(m2,  coord = cbind(4*v2,4-v1),gmode = "digraph", vertex.col=v1, vertex.cex=2) #mode = coord

Objects created in R

n1
   1 2 3 4 5 6 7 8 9 10 11 12
1  0 0 0 0 1 2 0 0 0  0  0  0
2  0 0 0 0 0 1 2 0 0  0  0  0
3  0 0 0 0 0 0 1 2 0  0  0  0
4  0 0 0 0 2 0 0 1 0  0  0  0
5  0 0 0 0 0 0 0 0 1  2  0  0
6  0 0 0 0 0 0 0 0 0  1  2  0
7  0 0 0 0 0 0 0 0 0  0  1  2
8  0 0 0 0 0 0 0 0 2  0  0  1
9  0 0 0 0 0 0 0 0 0  0  0  0
10 0 0 0 0 0 0 0 0 0  0  0  0
11 0 0 0 0 0 0 0 0 0  0  0  0
12 0 0 0 0 0 0 0 0 0  0  0  0

v1
[1] 1 1 1 1 2 2 2 2 3 3 3 3
v2
[1] 0.6658 0.4858 0.3058 0.1258 0.7172 0.5372 0.3572 0.1772 0.7674 0.5874
[11] 0.4074 0.2274 0.8229

Some functions

n2=as.numeric(n1)
n2
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#[38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 0 0 0
#[75] 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#[112] 2 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0
#[149] 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
length(n2)
#[1] 169

Output n1, i=0,71 has a square matrix in link list, including a zero diagonal n2=as.numeric(n1)

n1[3,7] #is row 3 (child) column 7 (parent)
for (i in 1:13) {for (j in 1:13)
{
{ print (i,j, n1[i,1])} 
} }

Afterwards

http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/HowToR.htm

Compilation drawing on gfortran?

http://r.research.att.com/tools/

Permutation code in R

Generation permutation code

OBSOLETE FROM HERE- testpit - to write the R program

The place to start is with a program created by Pajek when files are written to R, such as:

http://intersci.ss.uci.edu/wiki/pub/PajekR.r #(this one has only 12 nodes)

Old Fortran code

Although its old Fortran spagetti code, http://intersci.ss.uci.edu/wiki/pub/PGRAPH-KinSim.txt is a reminder that it was highly efficient for large-scale simulations to use two functions:


Simulation code in progress

  • F(i)=returns the parental node of a female
  • G(i)=returns the parental node of a male

let i=0,N-1 be the nodes in the p-graph. And g=1,Gen be the generation partitions (v1), 1 the top ancestors AND DONT FORGET TO EXPORT THE X COORD

net=source("C:/Program Files/pajek/PAJEK/PajekR.r")
library(sna)
gplot(n1,  coord = cbind(4*v2,4-v1),gmode = "digraph", vertex.col=v1, vertex.cex=2) #mode = coord
# n2= # matrix of links between marriages 1=Gmale 2=Ffemale
%REPLACE the value 2 elements in n2 with those of permute(f1)
is.matrix(n1)
[1] TRUE
is.matrix(n2)
[1] TRUE
dim(n2)
[1] 13 13
 






n2=n1
m2=n2
F1=0
F2=0


ro=0
co=0
numnodes=0
for (i in 1:(length(n2)-1)) {if (n2[i]==2) #if a 2 here at i
{numnodes=numnodes+1};
{F1[numnodes] <- i}; #then put i in the permutation list
#{F2[numnodes] <- co[i]}; #remember the column location
}
numnodes 
dimsq=length(n2)^.5
f1  <- F1[1:numnodes]
f1
library(gtools)
f2 <- permute(f1) #permute the permutation list
f2 #these are the permuted-i locations
#http://www.math.montana.edu/Rweb/Rhelp/00Index.html
m2=n2 # make the array copy
numnodes=0
for (i in 1:(length(n2)-1)) {if (n2[i]==2) #if a 2 here at i (again)
m2[i] <- 0;  #then zero this location in the array copy (this works)
for (i in 1:length(n2)) {ro[i+1] <- (i+0)%/%(dimsq+0)};
for (i in 1:length(n2)+0) {co[i+1] <- 1+(i%%dimsq)}  #modulo
}
ro
co


numnodes


numnodes=0
for (i in 1:(length(n2)-1)) {if (n2[i]==2) #if a 2 here at i (again)
{numnodes=numnodes+1;  #this works
m2[f2[numnodes]+1] <- 2}  # put a 2 in a permuted-i location (DOESNT WORK)
}
#m2[f2[numnodes]+1] <- 2} is INTENDED to put 2s back with COLUMNS permuted, ROWs the same
#I see my error -- it IS putting them back, right where they were before
#I need now to transfer the vector code into a row column location
dimsq #right=13 in this case
i=14
co[i]
ro[i]
numnodes 
m3=as.numeric(m2)
n3=as.numeric(n2)
n3
m3
#check f2
f3=0
for (i in 1:numnodes) {f3[i]=f2[i]}
f1
f2
f3
  • AND ADD A FOR LOOP TO DO THIS SEPARATELY FOR EACH GENERATION
  • This was pseudocode en route toward R code for the permutation problem
   write "error: no more than one parental couple"
N=length(n1)   #169 matrix entries 13x13
n=length(n2)   #169 matrix entries (numeric)
nr=n^.5         #13 row/col length = number of marriages = 13 in this example
ro=1 #initialize row vector
co=1 #initialize col vector
for (i in 1:length(n2)) {ro[i]<-i%/%13}
for (i in 1:length(n2)) {co[i]<-i%%13}
ro #check
co #check

Permute female marriages: R base functions - http://www.faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_Programming.html

Gen=max(v1)
n=length(n2)
F=1 #initialize parental node of a female
#for (g in 1:Gen) {
 #Get F,G links
 for (i in 1:length(n2)) {F[i]=0} # check0
 for (i in 1:length(n2)) {if (n2[i]==2) F[i]<-co[i]}
 F1=NULL #initialize F array location for a generation G
 F1=NA #initialize F array location for a generation G
 F2=NULL #initialize F array location for a generation G
 F2=NA #initialize F array  parental node in generation G m2=n2
numnodes=0
for (i in 1:(length(n2)-1)) (if (n2[i]==2) 
 numnodes=numnodes+1;
 F1[numnodes] <- i; 
 F2[numnodes] <- co[i]; 
 m2[i]==n2[i];
 m2[n2[i]]==0;
 }
numnodes 
f1  <- F1[1:numnodes]
f1
library(gtools)
f2 <- permute(f1)
f2


G<-c(5,6,7,8,9,10,11,12,0,0,0,0,5)
Gran<-G
z<-which(Gran!=0) #not equal to zero
v<-sample(Gran[z]) #permute those
Gran[z]<-v              #put permuted back in Gran

You need not do this for F. You can permute G alone, F alone, or F and G. Should be able to mark certain kinds of relatives as unmarriageable as well, save their array of numbers, convert to 0s,

Will it work to permute the remainder, put the unmarriageables back where they were?


x = c(1,2,0,0,3,0)
idxs = which(x > 0)
permutation = sample(x[idxs])
x[idxs] = permutation