# Network tools

These are instructional and resource pages so add, reorganize, explain!

Graph exploration: GUESS freeware and JUNG open source

Large networks: Pajek freeware

Python for networks: Python plus NetworkX open source

R for networks: R open source


## R software for cohesive blocking

See Peter McMahan's page for R code implementing the cohesive blocking algorithm introduced by James Moody and Douglas R. White (2003) in "Structural Cohesion and Embeddedness: A Hierarchical Conception of Social Groups," *American Sociological Review* 68(1):1-25, which won the 2004 Outstanding Article Award in Mathematical Sociology, American Sociological Association.

This measure of structural cohesion was also used in 2005 by W. W. Powell, D. R. White, K. W. Koput & J. Owen-Smith in "Network Dynamics and Field Evolution: The Growth of Interorganizational Collaboration in the Life Sciences," *American Journal of Sociology* 110(4):901-975, which won the Viviana Zelizer Best Paper in Economic Sociology Award (2005-2006), American Sociological Association.
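
As a rough illustration of the quantity that cohesive blocking is built on, the sketch below computes the structural cohesion of a small graph by brute force, taking cohesion to be the minimum number of nodes whose removal disconnects the group. It is not Peter McMahan's R code and not the cohesive blocking algorithm itself; the function names `from_edges`, `is_connected` and `cohesion` are hypothetical, and the exhaustive search is only practical for a handful of nodes.

```python
from itertools import combinations

def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is non-empty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def cohesion(adj):
    """Smallest number of nodes whose removal disconnects the graph.

    Returns 0 for a disconnected graph and n - 1 for a complete graph
    on n nodes (which no removal can disconnect).
    """
    nodes = list(adj)
    n = len(nodes)
    if not is_connected(adj, nodes):
        return 0
    for k in range(1, n - 1):
        for removed in combinations(nodes, k):
            if not is_connected(adj, set(nodes) - set(removed)):
                return k
    return n - 1

# Made-up examples: a 4-cycle, and two triangles sharing one node.
ring = from_edges([(1, 2), (2, 3), (3, 4), (4, 1)])
bowtie = from_edges([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)])
print(cohesion(ring), cohesion(bowtie))
```

Cohesive blocking applies this kind of connectivity measure recursively to find nested subgroups of increasing cohesion; that recursion is not shown here.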

## Matlab programs

Network data and algorithms for brain connectivity (Olaf Sporns): http://www.indiana.edu/~cortex/connectivity_toolbox.html

## SNA

SNA, the Social Network Analysis R package by Carter Butts, available from CRAN

## StatNet packages

StatNet (more advanced): a collection of functions to fit, simulate from, plot, and evaluate exponential random graph models.

The main function within the statnet package is the ergm function. It is designed to fit linear exponential random graph models, in which the probability of a graph depends on a vector of graph statistics specified by the user; it can return either a maximum pseudo-likelihood estimate or an approximate MLE based on a Markov chain Monte Carlo (MCMC) scheme.

A second commonly used function is simulate, which simulates exponential random graphs from an ergm model in which the graph depends on a vector of graph statistics and associated parameters specified by the user; it returns a realization of the graph based on a draw from the Markov chain.

statnet contains many other functions as well; for a guide to the basic types of functionality these provide, see User’s Guide – Basic Functionality of ERGM.

When you download StatNet you actually get:

- statnet - a base package for fitting, assessing, and simulating from exponential random graph models
- network - a base package to create, store, modify and plot the data in network objects. Within the statnet framework, network is the class of objects in which social network data are stored. The network object class can represent a range of relational data types and supports arbitrary vertex, edge, and graph attributes. Data can be stored as network objects and then analyzed using statnet.
- sna - a recommended range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, p* modeling, random graph generation, and 2D/3D network visualization.
- netdata - an optional collection of network data sets derived mainly from other network analysis program formats (like Pajek's .net format). (Note: this now seems to be part of statnet or latentnet.)
- latentnet - a recommended collection of functions to fit, simulate from, and plot latent space and latent cluster models. Latentnet is a package to fit and evaluate latent position and cluster models for statistical networks. Networks should be stored as network objects. Hoff, Raftery and Handcock (2002) suggested an approach to modeling networks based on positing the existence of an unobserved latent space of characteristics of the actors. Relationships form as a function of distances between these characteristics as well as functions of observed dyadic-level covariates. In latentnet, social distances are represented in a Euclidean space. The package also includes the extension of the latent position model that allows for clustering of the positions, developed in Handcock, Raftery and Tantrum (2007). It can compute maximum likelihood estimates and Bayesian posterior distributions for the parameters, and it computes four types of point estimates for the coefficients and positions: the maximum likelihood estimate, the posterior mean, the posterior mode, and the estimator that minimizes Kullback-Leibler divergence from the posterior. You can assess the goodness-of-fit of the model via posterior predictive checks. It also makes it possible to simulate networks from a latent position model.
- dnet - an optional collection of functions to fit, simulate from, and plot models for skewed count distributions.
- netperm - an optional collection of functions to fit the network comparison model described in Butts, C., "Permutation Models for Relational Data" (IMBS TR 05-02, 2005).
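
To make the exponential random graph idea above concrete, here is a minimal sketch that enumerates every undirected graph on a few nodes and assigns each a probability proportional to exp(theta · s(G)), where s(G) is a vector of graph statistics. It does not use or reproduce statnet's ergm or simulate functions; the choice of statistics (edge count and triangle count), the parameter values, and the names `graph_stats` and `ergm_distribution` are all assumptions made for illustration.

```python
from itertools import combinations, product
from math import exp

def graph_stats(edges, n):
    """Return (number of edges, number of triangles) for an undirected graph.

    `edges` is a collection of pairs (i, j) with i < j over nodes 0..n-1.
    """
    edge_set = set(edges)
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set
    )
    return (len(edge_set), triangles)

def ergm_distribution(n, theta):
    """Exact probabilities P(G) = exp(theta . s(G)) / kappa for all graphs on n nodes.

    kappa sums exp(theta . s(G)) over all 2^(n choose 2) graphs, so this
    brute-force normalization is only feasible for very small n.
    """
    dyads = list(combinations(range(n), 2))
    weights = {}
    for present in product([0, 1], repeat=len(dyads)):
        edges = tuple(d for d, on in zip(dyads, present) if on)
        stats = graph_stats(edges, n)
        weights[edges] = exp(sum(t * s for t, s in zip(theta, stats)))
    kappa = sum(weights.values())
    return {g: w / kappa for g, w in weights.items()}

# Hypothetical parameters: edges are penalized, triangles are rewarded.
dist = ergm_distribution(4, theta=(-1.0, 0.5))
most_likely = max(dist, key=dist.get)
print(len(dist), most_likely, dist[most_likely])
```

The number of graphs grows as 2 to the power of the number of dyads, so the normalizing constant cannot be computed this way for realistic networks; that is the reason fitting falls back on pseudo-likelihood or Markov chain Monte Carlo approximations.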

Tutorials: http://csde.washington.edu/statnet/tutorial.shtml and http://csde.washington.edu/statnet/statnet_tutorial.pdf

[1] Mark Handcock pdf slides (from ppt) on StatNet modeling

## Additional R packages for network data in statnet

(From StatNet:) Currently, three other R packages are available for analyzing social networks. They are not required to use the base statnet package, but they are recommended, since many users will find their functionality useful. They are:

- sna: the set of tools for social network analysis developed by Carter Butts.
- netperm: Permutation Models for Relational Data, developed by Carter Butts.
- dnet: for fitting probability distributions to count data.

[2] Permutation Models for Relational Data (2005)

Loading R packages

## Degree distributions

dnet (inside StatNet) is for fitting probability distributions to count data. The package is intended for use in modeling the degree distributions of statistical networks. It includes power-law models such as the Yule and Waring, as well as a broad range of models that have been proposed in the literature.

DRW> to Mark Handcock who programs dnet: Who is it that is writing your MLE code for degree distributions? Is that you? How far along is the project? What about documentation? Tutorial?

MH Response 19:00, 6 July 2007 (PDT): I am writing the code. It is mostly in R and uses a bit of FORTRAN for special functions. It has reached a plateau in terms of code development as I have written four papers on this topic and am moving on to other areas. It is open source code, though, and when I put it on CRAN it will be easy for folks to improve and extend it.

I intend to write a short paper for JOSS as a tutorial doing the examples in some of the earlier papers.

If the PMF (probability mass function, i.e., the probability function for discrete rather than continuous data, which may be cumulated) used here for degree distributions is the continuous version of the Pareto applied just to the discrete counts, I do this with the "dp" model in "dnet". It stands for "discrete Pareto/Zipf law". See "adpmle".

DRW>: an MLE for the Pareto II?

MH: What is the PMF for this?

DRW>: Shalizi wrote an MLE for the Pareto II, but it was designed for city size distributions (a single vector of input data, cities ranked by size) and not for degree distributions (which require two vectors, x for degree and y for frequency). It's at http://www.cscs.umich.edu/~crshalizi/research/tsallis-MLE/

It's not entirely satisfactory, as there is a bug that causes estimates of the scale parameters to be divided by a large number and also distorts the graph.

MH: "dnet" is easy to write for and I will code it up if I get a chance.

DRW>: Great, thanks, that will be a help to network researchers. I will ask Cosma for the pmf for Pareto II.

Cosma Shalizi>: 14:59, 10 July 2007 (PDT) The following is typeset in LaTeX at Cosma station

The pmf is proportional to the density for the continuous case, so

$$p(x) = C\,(1 + x/\sigma)^{-\theta-1}$$

(using the exponent $-\theta-1$ instead of $-\theta$ just so that it looks like the pdf of the continuous case). Now, assume that the range of $x$ goes from $k$ to infinity. To find the proportionality constant $C$, which depends on the parameters $\sigma$ and $\theta$, use the fact that probabilities must sum to one:

$$
\begin{aligned}
1 &= C \sum_{x=k}^{\infty} (1 + x/\sigma)^{-\theta-1} \\
  &= C \sum_{x=k}^{\infty} \left(\frac{\sigma + x}{\sigma}\right)^{-\theta-1} \\
  &= C \sum_{x=k}^{\infty} \left(\frac{\sigma}{\sigma + x}\right)^{\theta+1} \\
  &= C \sigma^{\theta+1} \sum_{x=k}^{\infty} \frac{1}{(\sigma + x)^{\theta+1}} \\
  &= C \sigma^{\theta+1} \sum_{y=1}^{\infty} \frac{1}{(\sigma + y + k - 1)^{\theta+1}} \\
  &= C \sigma^{\theta+1} \zeta(\theta+1, \sigma+k-1)
\end{aligned}
$$

where $\zeta$ is the Hurwitz zeta function (which generalizes the Riemann zeta function), written here with the sum starting at $y = 1$, that is, $\zeta(s, q) = \sum_{y=1}^{\infty} (q + y)^{-s}$. In the more common convention, where the sum starts at $0$, the same quantity is written $\zeta(\theta+1, \sigma+k)$. So

$$p(x) = \frac{(\sigma + x)^{-\theta-1}}{\zeta(\theta+1, \sigma+k-1)}$$

Setting up the estimating equations is possible here but it involves taking derivatives of the zeta function, which are not noticeably well-behaved, and my experiments suggest that direct numerical optimization is actually more stable.
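
A minimal sketch of the direct numerical optimization mentioned above, for degree data given as two vectors (degrees and their frequencies). This is not dnet's code or Shalizi's code. The normalizing sum over x from k to infinity of (σ + x)^(-θ-1) is evaluated as a Hurwitz zeta in the convention where the sum starts at n = 0, i.e. ζ(θ+1, σ+k). The Euler-Maclaurin tail approximation, the simple compass search over log σ and log θ, the search bounds, the function names, and the example data are all assumptions made for illustration.

```python
from math import exp, log

def log_hurwitz_zeta(s, q, n_terms=50):
    """log of zeta(s, q) = sum_{n>=0} (n + q)^(-s), for s > 1 and q > 0.

    The first n_terms terms are summed directly and the remainder is
    approximated by the leading Euler-Maclaurin terms. The factor q^(-s)
    is pulled out so that large s or q does not underflow the whole sum.
    """
    head = sum((1.0 + n / q) ** (-s) for n in range(n_terms))
    a = n_terms + q
    ratio = (a / q) ** (-s)
    tail = ratio * (a / (s - 1.0) + 0.5 + s / (12.0 * a))
    return -s * log(q) + log(head + tail)

def log_pmf(x, sigma, theta, k):
    """log p(x) = -(theta + 1) log(sigma + x) - log zeta(theta + 1, sigma + k), x >= k."""
    return -(theta + 1.0) * log(sigma + x) - log_hurwitz_zeta(theta + 1.0, sigma + k)

def log_likelihood(degrees, counts, sigma, theta, k):
    """Log-likelihood of frequency data: counts[i] nodes have degree degrees[i]."""
    log_norm = log_hurwitz_zeta(theta + 1.0, sigma + k)
    return sum(c * (-(theta + 1.0) * log(sigma + x) - log_norm)
               for x, c in zip(degrees, counts))

def fit_pareto2(degrees, counts, k=None, start=(1.0, 1.0), max_iter=500):
    """Maximize the log-likelihood by compass search over (log sigma, log theta)."""
    k = min(degrees) if k is None else k

    def ll(u, v):
        return log_likelihood(degrees, counts, exp(u), exp(v), k)

    u, v = log(start[0]), log(start[1])
    best, step = ll(u, v), 1.0
    for _ in range(max_iter):
        if step < 1e-6:
            break
        improved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cu, cv = u + du, v + dv
            if abs(cu) > 20.0 or abs(cv) > 20.0:
                continue  # keep both parameters in a numerically safe range
            cand = ll(cu, cv)
            if cand > best:
                u, v, best, improved = cu, cv, cand, True
        if not improved:
            step /= 2.0  # shrink the search pattern once no move helps
    return exp(u), exp(v), best

# Made-up degree data: x = degree, y = number of nodes with that degree.
degrees = [1, 2, 3, 4, 5, 8, 13]
counts = [50, 20, 10, 6, 4, 2, 1]
sigma_hat, theta_hat, ll_hat = fit_pareto2(degrees, counts)
print(sigma_hat, theta_hat, ll_hat)
```

Searching on the log scale keeps σ and θ positive without explicit constraints. The likelihood surface can be nearly flat along a ridge where σ and θ grow together, so the iteration cap and bounds matter, and a result that sits at a bound should be treated with suspicion.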

## Kolmogorov-Smirnov test

The K-S test is often used to compare distributions.
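
As a small illustration, the two-sample K-S statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic, not a p-value; the function name `ks_statistic` and the sample values are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        # bisect_right counts the observations <= x, giving the empirical CDF.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

# Made-up samples for illustration.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

The classical significance levels for this statistic assume continuous data, so for discrete data such as degree counts they should be read as approximate.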

## Network visualization

Network visualization bibliography

## Other

Cohesion of Simplicial Complexes

See also QuikStart R for comparative research

Scatter analysis http://www-personal.umich.edu/~ladamic/papers/infoscatter/InformationScatter.pdf

http://www.benkler.org/Benkler_Wealth_Of_Networks_Chapter_7.pdf