Proof of the pudding-Networks and q-exponentials
From InterSciWiki
In a series of projects under Testing the model: q exponential, a group of network and complexity researchers (Doug White, starting in 2004; Laurent Tambayong, 2006; Cosma Shalizi, early 2007; Aaron Clauset, mid 2007; and Haifeng Du), aided initially by Constantino Tsallis and Ernesto Borges, started to investigate ways to estimate the parameters of the q-exponential distribution, both for large- and small-sample continuous distributions (like city sizes) and for discrete network degree distributions. The social-circles network model of 2005 (also with Scott D. White and J. Doyne Farmer) was stimulated by discussions with Tsallis and by the Soares et al. (2005) paper, which he read as a preprint but which simulated networks as tree structures with hubs.
At Doug's request, Cosma derived an MLE for the continuous case. We could not find an MLE derivation for the discrete case (involving small integers, like degree distributions).
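As a rough illustration of what a continuous-case fit involves, the sketch below estimates the Pareto II form by profile maximum likelihood and converts the result to q and κ. It is not Cosma's derivation or Aaron's Matlab code. The function names, the search bounds, the use of a golden-section search, and the assumed survival function P(X >= x) = (1 + x/sigma)^(-theta) with q = 1 + 1/theta and kappa = sigma/theta are all assumptions made for this example.

```python
import math
import random


def profile_loglik(sigma, xs):
    """Log-likelihood at scale sigma, with theta set to its conditional MLE."""
    n = len(xs)
    s = sum(math.log1p(x / sigma) for x in xs)
    theta = n / s  # maximizes the likelihood for this fixed sigma
    loglik = n * math.log(theta) - n * math.log(sigma) - (theta + 1.0) * s
    return loglik, theta


def fit_qexp_mle(xs, iters=60):
    """Golden-section search over log(sigma); returns (q, kappa, theta, sigma).

    Assumes the profile likelihood has a single interior maximum.
    """
    mean = sum(xs) / len(xs)
    a, b = math.log(mean * 1e-3), math.log(mean * 1e3)  # assumed search bounds
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc = profile_loglik(math.exp(c), xs)[0]
    fd = profile_loglik(math.exp(d), xs)[0]
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_loglik(math.exp(c), xs)[0]
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_loglik(math.exp(d), xs)[0]
    sigma = math.exp((a + b) / 2.0)
    theta = profile_loglik(sigma, xs)[1]
    return 1.0 + 1.0 / theta, sigma / theta, theta, sigma


def sample_qexp(n, theta, sigma, rng):
    """Inverse-transform samples from the assumed Pareto II survival function."""
    return [sigma * ((1.0 - rng.random()) ** (-1.0 / theta) - 1.0) for _ in range(n)]
```

For example, calling fit_qexp_mle on sample_qexp(5000, 2.0, 3.0, random.Random(0)) should return q and kappa both near 1.5, since theta = 2 and sigma = 3 correspond to q = 1.5 and kappa = 1.5 under the assumed parameterization.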
In 2007 Aaron had written Matlab code for fitting power laws (Clauset, Shalizi, and Newman 2007). After we started fitting q-exponentials and comparing them to power laws, Aaron wrote Matlab code for the q-exponential fits in December 2007.
We (Laurent Tambayong, who did the initial fitting, Aaron, and I, with Cosma as coauthor) then used Aaron's code to fit the q-exponential parameters to city sizes and other continuous distributions in the Clauset, Shalizi, and Newman (2007) paper, and we produced a joint draft of a new paper.
Next, we were joined by Haifeng Du and his student Zhongshan Yue in the analysis of discrete distributions, using 14 200-person networks (same people, seven networks) to fit degree distributions using Aaron's argument, although we know that these distributions do not have a known density function. In doing so, the discrete data were fit to the same normalized form that was used in Cosma's paper for Pareto II (which has a parameter transformation to q and κ that is part of Aaron's Matlab program).
Haifeng sent the 14 networks and the program to Zhongshan Yue in Xi'an because Aaron's program needs a lot of time to run (it took 4 days for all 14 networks). The analysis showed quite good KS probabilities for these networks, in some cases much better than for power laws, though in other cases the power law fit better than the q-exponential. The outdegree distributions are more often better fit by power laws than by q-exponentials, which makes sense as activity differentials, while the indegree distributions are better fit by q-exponentials.
I will eventually put these tables on this wiki site. We now also have a comparison between whole-network and ego-network degree distributions done by Method 2. WikiSysop 13:13, 20 February 2008 (PST)
Method 2: Next, we investigated Ernesto Borges' and Tsallis's q-semilog fitting technique. Here Haifeng computed the normalized CCDF of these 14 degree distributions using Cosma's (2007) technique for continuous data. We call this the q-semilog plot: ln_q X, with X the cumulative distribution, is plotted on the y axis and x on the x axis, with neither logged in the conventional sense. When q gives a good fit, the data plot as a straight line for the best-fitting q, and the slope of that line gives κ. We have this working perfectly and it MIGHT provide an unbiased estimate independent of sample size, but we don't know that for discrete distributions yet.
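The q-semilog idea can be sketched as follows, using the standard q-logarithm ln_q(y) = (y^(1-q) - 1)/(1-q). This is a minimal illustration and not the procedure Haifeng or Borges and Tsallis actually used: the function names, the grid scan over q, the choice of R^2 as the straightness criterion, and the assumed survival form P(X >= x) = [1 - (1-q)x/kappa]^(1/(1-q)) (under which the fitted slope equals -1/kappa) are all assumptions made for this example.

```python
import math
from collections import Counter


def ln_q(y, q):
    """q-logarithm; reduces to the natural log as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(y)
    return (y ** (1.0 - q) - 1.0) / (1.0 - q)


def empirical_ccdf(data):
    """Distinct sorted values x and the fraction of observations >= x."""
    n = len(data)
    counts = Counter(data)
    xs, ccdf, remaining = [], [], n
    for x in sorted(counts):
        xs.append(x)
        ccdf.append(remaining / n)
        remaining -= counts[x]
    return xs, ccdf


def line_fit(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 0.0
    return slope, my - slope * mx, r2


def q_semilog_scan(xs, ccdf, q_grid):
    """Pick the q whose ln_q(CCDF)-versus-x plot is straightest (highest R^2)."""
    best = None
    for q in q_grid:
        ys = [ln_q(p, q) for p in ccdf]
        slope, intercept, r2 = line_fit(xs, ys)
        if best is None or r2 > best[2]:
            best = (q, slope, r2)
    q, slope, r2 = best
    return q, -1.0 / slope, r2  # kappa = -1/slope under the assumed form
```

A call such as q_semilog_scan(*empirical_ccdf(degrees), [1.0 + 0.01 * i for i in range(1, 100)]) returns the selected q, the implied κ, and the R^2 of the straight-line fit, where degrees is a hypothetical list of observed degrees.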
D. J. B. Soares, C. Tsallis, A. M. Mariz, and L. R. da Silva, Preferential attachment growth model and nonextensive statistical mechanics, Europhys. Lett. 70, 70 (2005).
Doug asks:
- Is it time to think about SMALL SAMPLE CORRECTION for
- 1) continuous case?
- 2) discrete case?
Aaron: If I recall correctly, at some point last year Cosma and I discussed small-sample corrections for the power-law MLEs. In the continuous case, the error falls off like 1/n, which is already pretty small for n=50 (see Fig 4 in our paper). That being said, I think we didn't conclude the discussion because we differed slightly on the precise mathematical form the correction factor should take. Also, the discrete MLEs don't seem to be as sensitive as the continuous ones (see Fig 6b in our paper), so things might be fine there.
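To make the 1/n behaviour concrete for the continuous power-law case, here is a small sketch. It is not the correction Aaron and Cosma discussed, whose form was left open; it only uses the fact that, for data drawn exactly from a continuous power law with known xmin, the sum of ln(x/xmin) is gamma distributed, so the MLE alpha_hat = 1 + n / sum(ln(x/xmin)) has expected bias (alpha - 1)/(n - 1). The function names and the rescaling by (n - 1)/n are assumptions made for this example.

```python
import math
import random


def powerlaw_mle(xs, xmin):
    """Continuous power-law MLE: alpha_hat = 1 + n / sum(ln(x / xmin))."""
    return 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)


def expected_bias(alpha, n):
    """E[alpha_hat] - alpha = (alpha - 1) / (n - 1), valid for n > 1."""
    return (alpha - 1.0) / (n - 1.0)


def debiased_mle(xs, xmin):
    """One possible correction: replace n by n - 1 in the numerator."""
    return 1.0 + (len(xs) - 1.0) / sum(math.log(x / xmin) for x in xs)


def sample_powerlaw(n, alpha, xmin, rng):
    """Inverse-transform samples from p(x) proportional to x^(-alpha), x >= xmin."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

With alpha = 2.5 and n = 50, expected_bias gives 1.5/49, or about 0.03, which matches the remark that the error is already small at that sample size. Whether anything similar holds for the q-exponential or for discrete data is not addressed by this sketch.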
Doug to Laurent, Haifeng:
- This is good news generally for our degree distribution analyses
- Also good for our city size dynamics, though the smaller-sized samples will have some biases
- Thing to see is whether the q-semilog method will give better estimates there
