\documentclass[twocolumn,pre,superscriptaddress]{revtex4}
\usepackage{dcolumn}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
%\usepackage{hyperref}
\newcommand{\e}{{\rm e}}
\newcommand{\dx}{{\rm d}x}
\newcommand{\dy}{{\rm d}y}
\newcommand{\etal}{{\it{}et~al.}}
\newcommand{\xmin}{x_{\mathrm{min}}}
\newcommand{\ntail}{n_{\mathrm{tail}}}
\begin{document}
\title{$q$-Exponential Distributions in Empirical Data}
\author{Laurent Tambayong}
\affiliation{Institute for Mathematical Behavioral Sciences, University of California, Irvine CA 92697 USA}
\author{Aaron Clauset}
\affiliation{Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501 USA}
\author{Cosma R. Shalizi}
\affiliation{Statistics Department, Carnegie Mellon University, Pittsburgh PA 15213 USA}
\affiliation{Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501 USA}
\author{Douglas R. White}
\affiliation{Institute for Mathematical Behavioral Sciences, University of California, Irvine CA 92697 USA}
\affiliation{Anthropology Department, University of California, Irvine CA 92697 USA}
\affiliation{Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501 USA}
\begin{abstract}
Heavy-tailed distributions, such as the power law, continue to receive a
strong degree of interest throughout the sciences. Here we develop and test
novel statistical methods for estimating and testing the $q$-exponential
distribution (equivalently, the unshifted Pareto II distribution), which also
exhibits a heavy tail, in empirical data, and for comparing it against other
simple models. We then consider several data sets recently
analyzed with comparable tools for the strict power-law distribution, and
find that the $q$-exponential provides a significantly better fit for six
readily-available data sets, covering the sales of books, the area of forest
fires, the frequencies of surnames, the frequencies of bird sightings, the
extent of blackouts, and the populations of cities.
\end{abstract}
\maketitle
% --- introduction
%\section{Introduction}
%\label{sec:intro}
Scientists have long been interested in using probability distributions to
model the frequencies with which various events occur in data, assessing the
fit between model and data, and using the fitted distributions as clues as to
the underlying data-generating mechanisms. For example, since Tsallis
\cite{tsallis:1988} introduced $q$-exponential distributions as a possible
generalization of Boltzmann-Gibbs distributions to situations with
non-extensive, long-range interactions, there have been over 2000 publications
on the subject [[add URL]], a large fraction of which involved fitting those
distributions to data. Surprisingly, estimating and comparing distributions
remains something many scientists do poorly, despite the age, ubiquity, and
importance of the problem, and despite the vast statistical literature
\cite{wasserman} providing guidance on the practice. The purpose of this paper
is two-fold. First, we
show that $q$-exponentials, correctly estimated and tested, are at least
competitive with more familiar families of distributions as descriptions of
certain kinds of real-world data. Second, we provide an example of valid
procedure in fitting and comparing distributions, with some explanation of {\em
why} such methods should be preferred to common but fallacious techniques.
There should be no doubt that many familiar practices in this domain are
fallacious. Some of the most widespread of these errors include:
\begin{enumerate}
\item Inappropriate use of least-squares curve-fitting or regression techniques
to fit distributions, despite the absence of nearly all the conditions needed
for regression to work \cite{berk}, up to and including a fundamental
difference between the intrinsic geometry of probability distributions
\cite{kass:vos:1997,kulhavy:1996} and the geometry of vectors in Euclidean
space;
\item Preferring the model whose central tendency most closely matches
observations, regardless of the magnitude of the fluctuations the model
predicts around those central tendencies, or the possible presence of {\em
systematic} trends in the discrepancy between prediction and observation,
and using some measure of the size of the residuals as an index of fit (most
commonly $R^2$, the ratio of the variance predicted by the model to the total
observed variance);
\item Substituting the rejection of a trivial null model for a direct
assessment of the adequacy of a substantive model, a debate called the
significance-test controversy \cite{meehl:1967} that has never been fully
laid to rest.
\end{enumerate}
While the significance-test controversy is largely confined to the social and
applied sciences, the former two errors are commonly made even by
mathematically sophisticated physicists. Among physicists, for example, it is
a nearly universal belief that fitting a straight line to a log-log plot, by
minimizing the sum of the squared residuals, is a reliable way to estimate a
power law, and that the $R^2$ of the resulting regression provides a good test
of the fit. Clauset \etal, in their Appendix A.2, show just how severely
unreliable this approach can be \cite{clauset:etal:2007}, even though it is in
common use in both the social and the physical sciences. To give just one
example, hundreds if not thousands of papers have been published claiming a fit
between a power law and the frequency distribution of the degrees of nodes in
networks on exactly such a basis. These claims may be right, but, because the
methods used are unreliable, these publications provide little reason for
thinking so.
Clauset \etal, in \cite{clauset:etal:2007}, described objective,
demonstrably-reliable methods for estimating and validating distributional
models, with particular reference to power laws, also known as type-I Pareto
distributions, compared to other models of heavy-tailed or highly right-skewed
data. Here we follow their procedures and extend their results to type-II
Pareto distributions, now widely known as $q$-exponentials, describing the
maximum likelihood estimate of the parameters, goodness of fit tests, and tests
against specific competing distributions. We apply these methods to several
readily-available data sets (populations of cities, sales of books, forest fire
sizes, blackouts, bird species sightings, frequencies of surnames), for all of
which \cite{clauset:etal:2007} found at least moderate support for a power law
distribution of the right or upper tail. In all six cases, we
find that $q$-exponentials provide a superior fit.\footnote{Computer code
implementing many of the analysis methods described in this
paper can be found online at \\
%\url{http://www.santafe.edu/~aaronc/qexponentials/}.
{\tt http://www.santafe.edu/$\sim$aaronc/qexponentials/}.}
\section{The $q$-exponential is a subfamily of the Pareto II distribution}
Tsallis \cite{tsallis:1988} introduced a new entropy measure, called
$q$-entropy, which, unlike the usual Boltzmann-Gibbs-Shannon entropy, is
non-extensive, with the parameter $q$ measuring the departure from extensivity;
as $q\rightarrow 1$, the $q$-entropy converges to the ordinary entropy. Just
as maximizing the usual entropy under expectation-value constraints leads to
exponential distributions, constrained maximization of the $q$-entropy leads to
its own family of distributions, called the $q$-exponentials. Tsallis argued,
on theoretical grounds, that the $q$-entropy is the appropriate generalization
of equilibrium statistical mechanics to systems with long-range interactions,
and that such maximization provided a general explanation of the ubiquity of
distributions with power-law tails in empirical data. (We do not enter here
into the intricate debate on these theoretical claims; as will become clear, it
is logically independent of our concerns.)
The functional form of $q$-exponentials, however, has a much older history. A
simple transformation of variables shows that the $q$-exponential distribution
is a subfamily of the Pareto II distribution, which has been part of the
statistical literature since at least 1952
\cite{mcguire:etal:1952,silcock:1954,harris:1968}, where it arises from
mixtures of (ordinary) exponential distributions with fluctuating
characteristic scales. For clarity, we briefly review the definition of the
$q$-exponential distribution and its connection with the Pareto II, which was
first shown by Shalizi \cite{shalizi:2007}.
The $q$-exponential and its inverse, the $q$-logarithm \cite{tsallis:2004},
can be derived in several ways, including via the differential equation
$\dy / \dx = y^{q}$ with $y(0)=1$. The solution, defined as the
$q$-exponential function, is
% AC: should generalize this formula to include a \kappa in order to agree with the comp. CDF below
\begin{align}
\e_{q}^{x} = [1+(1-q)x]^{1/(1-q)} \enspace ,
\label{eq:qexp:defn}
\end{align}
when \mbox{$1 + (1-q)x > 0$}, and \mbox{$\e_{q}^{x} = 0$} otherwise. By
continuity, we take \mbox{$\e_{1}^{x} = \e^{x}$}, which of course can be
extended to all values of $x$.
The inverse of the $q$-exponential is the $q$-logarithm:
\begin{align}
\ln_{q} x = \frac{x^{1-q}-1}{1-q} \enspace ,
\label{eq:qlog:defn}
\end{align}
where \mbox{$\ln_{1} x = \ln x$}. A short calculation verifies that the
\mbox{$q$-exponential} and the \mbox{$q$-logarithm} are indeed inverse
functions, i.e., \mbox{$\ln_{q}(\e_{q}^{x}) = x$}.
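These two definitions are easy to check numerically. The following Python sketch (the function names are ours, for illustration only) encodes Eqs.~(\ref{eq:qexp:defn}) and (\ref{eq:qlog:defn}), including the convention that $\e_q^x = 0$ outside the support:

```python
import math

def q_exp(x, q):
    """The q-exponential: [1 + (1-q)x]^(1/(1-q)), and 0 outside the support."""
    if q == 1.0:
        return math.exp(x)  # the q -> 1 limit is the ordinary exponential
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0

def q_log(x, q):
    """The q-logarithm: (x^(1-q) - 1)/(1-q), inverse of q_exp on its support."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)
```

A round trip such as `q_log(q_exp(0.7, 1.5), 1.5)` recovers $0.7$ to machine precision, confirming the inverse relationship.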
Combining the $q$-exponential function with the appropriate normalization
factor, we obtain the $q$-exponential probability density $p_{q,\kappa}(x)$, a
generalization of the Boltzmann distribution,
\begin{align}
p_{q,\kappa}(x) & = \frac{2-q}{\kappa}\left[1-(1-q)x/\kappa\right]^{1/(1-q)} \enspace ,
\label{eq:qexp:pdf}
\end{align}
where $\kappa$ is a scalar coefficient of the same dimensions as $x$. The role
of $\kappa$ is two-fold: on the one hand it ensures that only a dimensionless
ratio $x/\kappa$ enters into the probability distribution, and on the other it
sets a characteristic scale for the cross-over from a relatively constant
density (when $x \ll \kappa$) to one that asymptotes to a power-law tail (when
$x \gg \kappa$), with exponent $1/(1-q)$.
The complementary cumulative distribution function (cdf) of this distribution
is also a \mbox{$q$-exponential} \cite{tsallis:2004}, with suitably
transformed parameters; we write it in the form
\begin{align}
P_{q,\kappa}(X \geq x) & = \left[1 - \frac{1-q}{\kappa}x\right]^{\frac{1}{1-q}} \enspace .
\label{eq:qexp:cdf}
\end{align}
This form, however, is a special case of the Pareto II distribution
\begin{align}
P_{\theta,\sigma}(X\geq x) = [1+(x-\mu)/\sigma]^{-\theta} \enspace ,
\label{eq:pareto:cdf}
\end{align}
where $\theta$ is the exponent of the power-law tail of the distribution, $\mu$
is the shift or location parameter, and $\sigma$ is the scale parameter that
controls the width of the distribution's body. By letting \mbox{$\theta =
-1/(1-q)$}, \mbox{$\sigma = \kappa \theta$} and $\mu = 0$ in the Pareto II
distribution (Eq.~\ref{eq:pareto:cdf}), we recover the \mbox{$q$-exponential}
(Eq.~\ref{eq:qexp:cdf}).\footnote{The ordinary power law, or type-I Pareto, is
another special case of the Pareto II, where $\mu = \sigma$.}
This tight relationship thus allows us to use the previously derived maximum
likelihood estimators for a Pareto II distribution as estimators of the
parameters for the $q$-exponential \cite{shalizi:2007}.
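The substitution is easily verified numerically. The following Python sketch (ours, purely illustrative) checks that Eq.~(\ref{eq:qexp:cdf}) and Eq.~(\ref{eq:pareto:cdf}) coincide under \mbox{$\theta = -1/(1-q)$}, \mbox{$\sigma = \kappa\theta$}, $\mu = 0$:

```python
def qexp_ccdf(x, q, kappa):
    # Eq. (qexp:cdf): P(X >= x) = [1 - (1-q) x / kappa]^(1/(1-q))
    return (1.0 - (1.0 - q) * x / kappa) ** (1.0 / (1.0 - q))

def pareto2_ccdf(x, theta, sigma, mu=0.0):
    # Eq. (pareto:cdf): P(X >= x) = [1 + (x - mu)/sigma]^(-theta)
    return (1.0 + (x - mu) / sigma) ** (-theta)

def q_to_pareto2(q, kappa):
    # the substitution in the text: theta = -1/(1-q), sigma = kappa * theta
    theta = -1.0 / (1.0 - q)
    return theta, kappa * theta

q, kappa = 1.5, 2.0
theta, sigma = q_to_pareto2(q, kappa)  # gives theta = 2, sigma = 4
```

For these parameters the two complementary CDFs agree at every $x$, as the algebra requires.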
\section{Fitting a $q$-exponential to empirical data}
\label{sec:fitting}
The method of maximum likelihood estimation picks parameters that maximize the
probability of the observed data under the chosen model; this probability is
the likelihood. The main reason to do so is a body of results from probability
theory which guarantee that, under reasonable but not automatic conditions,
maximum likelihood estimates (MLEs) are reliable in two senses. The first,
known as ``consistency'', is that the MLE converges (almost surely) on the true
parameter values. The second, known as ``asymptotic efficiency'', is that MLEs
are asymptotically unbiased and have the smallest standard errors compatible
with consistency \cite{wasserman}. The method is justified by these
reliability properties, not by the abstract appeal of maximization, and there
are situations where the necessary conditions fail and the MLE is unreliable
(though even then it can often be corrected to a reliable estimator
\cite{efron:1982}).
As noted by Shalizi \cite{shalizi:2007}, the maximum likelihood estimators for
the Pareto II distribution have been known for some time, with the possible
exception of the adjustments necessary for their use with censored data, since
the Pareto II model can hold only for values that exceed some threshold
$\xmin$. For completeness, we now briefly summarize the derivation of the MLEs
for the uncensored $q$-exponential distribution. For simplicity, in our use of
maximum likelihood in this paper we assume that the data are independent and
identically distributed, i.e., all observations are statistically independent
draws from a common, albeit unknown,
distribution.\footnote{Dependent and heterogeneous data can be accommodated
within the maximum likelihood framework, provided one models the dependence
and heterogeneity. Wiuf \etal
\cite{Wiuf-et-al-likelihood-for-growing-networks} is a recent example of
doing so for a model of network growth.}
For a $q$-exponential distribution with parameters $\theta$, $\sigma$, the log-likelihood function is
\begin{align}
\ell = -n \log \sigma + n \log \theta - (\theta+1) \sum_{i=1}^{n} \log\left(1 + x_{i}/\sigma\right) \enspace .
\label{eq:pareto:logL}
\end{align}
Calculating the first derivatives of $\ell$ with respect to $\theta$ and $\sigma$, setting them equal to zero, and solving for the appropriate parameter yields MLEs for each parameter. Thus,
\begin{align}
\frac{\partial \ell}{\partial \theta} & = \frac{n}{\theta} - \sum_{i=1}^{n} \log(1+x_{i}/\sigma) \nonumber \\
\hat{\theta} & = n \left[ \sum_{i=1}^{n} \log(1+x_{i}/\sigma)\right]^{-1} \enspace ,
\label{eq:pareto:theta:hat}
\end{align}
and
\begin{align}
\frac{\partial \ell}{\partial \sigma} & = -\frac{n}{\sigma} + \frac{\theta+1}{\sigma^{2}} \sum_{i=1}^{n} \frac{x_{i}}{1+x_{i}/\sigma} \nonumber \\
\hat{\sigma} & = \frac{\theta+1}{n}\sum_{i=1}^{n}\frac{x_{i}}{1+x_{i}/\hat{\sigma}}
\label{eq:pareto:sigma:hat}
\end{align}
Equations \eqref{eq:pareto:theta:hat} and \eqref{eq:pareto:sigma:hat} thus give
implicit estimators for $\theta$ and $\sigma$, and can be simultaneously solved
numerically with real data.
In many applications, including the data sets under consideration here, we are
interested only in observations that exceed some threshold value $\xmin$. This
can either be because smaller values are simply not available, due to known
aspects of the observational procedure (censoring), or because the data are
actually produced by multiple mechanisms, and we expect the $q$-exponential to
hold only above this threshold (tail-fitting), in a region we may conveniently
call the ``$q$-tail''. (Note that this regime is distinct from the asymptotic
one, $x \gg \sigma$, where the distribution approaches a pure power law.) In
either case, the estimators must be functions of the threshold $\xmin$.
Starting from the appropriate likelihood function for such thresholded data, a
calculation similar to the one given above \cite{shalizi:2007} yields
\begin{align}
\hat{\theta}_{C} & = n\left[\sum_{i=1}^{n} \log \frac{1+x_{i}/\sigma}{1+\xmin/\sigma} \right]^{-1} \\
\hat{\sigma}_{C} & = -\theta \frac{\xmin}{1+\xmin/\hat{\sigma}_{C}} + \frac{\theta+1}{n} \sum_{i=1}^{n}\frac{x_{i}}{1+x_{i}/\hat{\sigma}_{C}} \enspace .
\label{eq:pareto:censored}
\end{align}
As above, these equations can be solved numerically to obtain estimates from
real data. These equations assume that the value of $\xmin$ is known. This is
appropriate for censoring, but not (except very rarely) for tail-fitting. In
particular, in none of our empirical data sets do we have prior or independent
knowledge of the right $\xmin$, so it must also be estimated. Because $\xmin$
controls the number of observations in the likelihood calculation, we cannot
estimate $\xmin$ by maximum likelihood. Clauset \etal \cite{clauset:etal:2007}
proposed a different method for estimating $\xmin$, and demonstrated its
accuracy for fitting power-law distributions. As we will use this method to
estimate the tail threshold, we now briefly explain the technique.
The key idea is simple: minimize the distance between the fitted model
distribution and the empirical data. To do so, one must choose a
measure of distance {\em between probability distributions}, rather than using
something like the Euclidean distance between data points. While there are
many such distances, this technique employs the Kolmogorov-Smirnov distance for
univariate distributions, which is simply the maximum difference between the
CDFs \cite{wasserman}:
\begin{equation}
D_{KS} = \max_{x \geq \xmin}{|P(x) - \hat{F}(x)|}
\end{equation}
where $P(x)$ is the CDF for the $q$-exponential fitted to points $x \geq
\xmin$, and $\hat{F}(x)$ is the empirical CDF of those data points. The
estimate of $\xmin$ is then just the value $\widehat{\xmin}$ at which
$D_{KS}$ is minimized.
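One way to implement this scan (a coarse Python sketch under our own conventions, not the authors' published code) exploits the fact that, conditional on $x \geq \xmin$, the excess $y = x - \xmin$ again follows an unshifted Pareto II with the same $\theta$ and scale $\sigma + \xmin$, so each candidate threshold can be fitted with the ordinary estimators:

```python
import math, random

def fit_tail(ys):
    # coarse MLE for the unshifted Pareto II: profile out theta and
    # scan a grid in log(sigma) (an illustration, not production code)
    n = len(ys)
    def ll(u):
        s = math.exp(u)
        th = n / sum(math.log1p(y / s) for y in ys)
        return n * math.log(th) - n * u - n - n / th, s, th
    best = max(ll(-5.0 + 0.1 * i) for i in range(150))
    return best[2], best[1]                  # (theta, sigma)

def ks_distance(ys, theta, sigma):
    # KS distance between fitted and empirical complementary CDFs
    ys, n, d = sorted(ys), len(ys), 0.0
    for i, y in enumerate(ys):
        model = (1.0 + y / sigma) ** (-theta)
        d = max(d, abs(model - (n - i) / n), abs(model - (n - i - 1) / n))
    return d

def choose_xmin(xs, candidates, min_tail=50):
    # for each candidate threshold, fit y = x - xmin (scale sigma + xmin)
    # and keep the KS minimizer
    best = None
    for xm in candidates:
        ys = [x - xm for x in xs if x >= xm]
        if len(ys) < min_tail:
            continue
        th, sp = fit_tail(ys)
        d = ks_distance(ys, th, sp)
        if best is None or d < best[0]:
            best = (d, xm, th, sp - xm)      # report sigma = sigma' - xmin
    return best

# demo: uniform "body" below 5, Pareto II tail (theta = 2, scale 3) above 5
random.seed(7)
xs = ([5.0 * random.random() for _ in range(2000)]
      + [5.0 + 3.0 * ((1.0 - random.random()) ** (-0.5) - 1.0)
         for _ in range(2000)])
best_d, xmin_hat, theta_hat, sigma_hat = choose_xmin(xs, [1.0, 3.0, 5.0, 7.0])
```

On this synthetic mixture the scan correctly rejects thresholds that cut into the non-Pareto body, since the fitted tail model cannot absorb the body's shape and the KS distance rises there.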
[[It would be nice to have a simulation example showing that this is reliable
for picking the threshold for a Pareto-II tail, as in the power laws paper. ---
CRS]]
\subsection{Estimation for Discrete Data}
Maximum likelihood estimation for the discrete case is essentially similar to
the continuous case, only complicated by the fact that the likelihood involves
a special function, rather than having an analytical closed form. Our
presentation is correspondingly brief.
We begin with the unthresholded case. Here the probability mass function is
proportional to a $q$-exponential function,
\begin{equation}
p(x) = \frac{{(1+x/\sigma)}^{-(\theta+1)}}{C(\theta,\sigma)} ~,
\end{equation}
and so to find the MLEs $\hat{\theta}, \hat{\sigma}$ we must compute the
normalizing factor $C(\theta,\sigma)$, i.e., find the partition function. This
is clearly just the infinite sum
\begin{eqnarray*}
C(\theta,\sigma) &= &\sum_{x=0}^{\infty}{{(1+x/\sigma)}^{-(\theta+1)}}\\
& = & \sigma^{\theta+1} \sum_{x=0}^{\infty}{(\sigma+x)}^{-(\theta+1)}\\
& = & \sigma^{\theta+1} \zeta(\theta+1,\sigma)
\end{eqnarray*}
where $\zeta(\theta,\sigma)$ is the Hurwitz zeta function [[cite --- it's not
in Abramowitz and Stegun! CRS]]. The log likelihood is therefore
\begin{equation}
\ell = - n(\theta+1) \log{\sigma} - n\log{\zeta(\theta+1,\sigma)} - (\theta+1)\sum_{i=1}^{n}{\log\left(1+x_i/\sigma\right)}
\label{eq:discrete:qexp:loglike}
\end{equation}
Taking derivatives yields equations analogous to
Eqs.~\eqref{eq:pareto:theta:hat}--\eqref{eq:pareto:sigma:hat}. Rather than attempting
to solve these directly, however, which would involve taking derivatives of
the Hurwitz zeta function, it is simpler to maximize the log likelihood from
Eq.\ \ref{eq:discrete:qexp:loglike} numerically.
In the case of thresholding, the likelihood takes the same form, except that
the second argument of the zeta function changes from $\sigma$ to
$\sigma+\xmin$. The threshold $\xmin$ itself can be estimated just as for
continuous data.
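A minimal numerical sketch of this likelihood (ours; the truncation level of the Euler-Maclaurin approximation to the Hurwitz zeta is an assumption chosen for illustration) is:

```python
import math

def hurwitz_zeta(s, a, terms=1000):
    """zeta(s, a) = sum_{k>=0} (a + k)^(-s) for s > 1, computed as a
    truncated sum plus leading Euler-Maclaurin tail corrections."""
    total = sum((a + k) ** (-s) for k in range(terms))
    b = a + terms
    return (total + b ** (1.0 - s) / (s - 1.0) + 0.5 * b ** (-s)
            + s * b ** (-s - 1.0) / 12.0)

def discrete_loglike(xs, theta, sigma):
    """Discrete q-exponential log-likelihood:
    p(x) = (1 + x/sigma)^-(theta+1) / [sigma^(theta+1) zeta(theta+1, sigma)]."""
    n = len(xs)
    return (-n * (theta + 1.0) * math.log(sigma)
            - n * math.log(hurwitz_zeta(theta + 1.0, sigma))
            - (theta + 1.0) * sum(math.log1p(x / sigma) for x in xs))
```

This function can then be handed to any generic numerical maximizer over $(\theta,\sigma)$; for thresholded data one would pass $\sigma + \xmin$ as the second zeta argument, as described above.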
%\begin{table*}
%\setlength{\tabcolsep}{6pt}
%\begin{tabular}{l|cc|dc|dc|dc|dc|l}
%& power law & $q$-exp. & \multicolumn{2}{c|}{\text{log-normal}} & \multicolumn{2}{c|}{\text{exponential}} & \multicolumn{2}{c|}{\text{stretched exp.}} & \multicolumn{2}{c|}{\text{power law + cut-off}} & support for \\
%data set & $p$ & $p$ & \text{LR} & $p$ & \text{LR} & $p$ & \text{LR} & $p$ & \text{LR} & $p$ & power law \\
%\hline
%birds & {\bf 0.55} & {\bf 0.99} & -0.850 & 0.40 & 1.87 & {\bf 0.06} & -0.882 & 0.38 & -1.24 & 0.12 & moderate \\
%blackouts & {\bf 0.62} & {\bf 0.97} & -0.412 & 0.68 & 1.21 & 0.23 & -0.417 & 0.68 & -0.382 & 0.38 & moderate \\
%book sales & {\bf 0.66} & {\bf 0.88} & -0.267 & 0.79 & 2.70 & {\bf 0.01} & 3.885 & {\bf 0.00} & -0.140 & 0.60 & moderate \\
%cities & {\bf 0.76} & {\bf 0.94} & -0.090 & 0.93 & 3.65 & {\bf 0.00} & 0.204 & 0.84 & -0.123 & 0.62 & moderate \\
%fires & 0.05 & {\bf 0.56} & -1.78 & {\bf 0.08} & 4.00 & {\bf 0.00} & -1.82 & {\bf 0.07} & -5.02 & {\bf 0.00} & with cut-off \\
%surnames & {\bf 0.20} & {\bf 0.66} & -0.836 & 0.40 & 2.89 & {\bf 0.00} & -0.844 & 0.40 & -1.36 & {\bf 0.10} & with cut-off \\
%\end{tabular}
%\caption{Results of comparisons.}
%\label{table:1}
%\end{table*}
\begin{table*}
\setlength{\tabcolsep}{4.5pt}
\begin{tabular}{l|cccc|ccccc|lc|c}
& \multicolumn{4}{c|}{\text{power law}} & \multicolumn{5}{c|}{\text{$q$-exponential}} & \multicolumn{2}{c|}{\text{comparison}} & support for \\
data set & $\alpha$ & $\xmin$ & $\ntail$ & $p$ & $\theta$ & $\sigma$ & $\xmin$ & $\ntail$ & $p$ & {\rm LR} & $p$ & $q${\rm -exponential} \\
& & ($\times10^{3}$) & & & & ($\times10^{3}$) & ($\times10^{3}$) & & & & & \\
\hline
bird species sightings & 2.1(2) & 6.68 & 66 & {\bf 0.55} & 1.42 & 3.47 & 0.937 & 209 & {\bf 0.99} & & & \\
blackout sizes & 2.3(3) & 230.00 & 59 & {\bf 0.62} & 1.82 & 235.92 & 75.00 & 115 & {\bf 0.97} & & & \\
sales of books & 3.7(3) & 2\,400.00 & 139 & {\bf 0.66} & 4.52 & 2\,550.70 & 1\,005.38 & 612 & {\bf 0.88} & & & \\
population of cities & 2.37(8) & 52.46 & 580 & {\bf 0.76} & 1.53 & 14.81 & 13.91 & 2085 & {\bf 0.94} & & & \\
forest fire size (acres) & 2.2(3) & 6.32 & 521 & 0.05 & 1.38 & 2.45 & 2.20 & 1230 & {\bf 0.56} & & & \\
freq.\ of surnames & 2.5(2) & 111.92 & 239 & {\bf 0.20} & 1.94 & 65.07 & 79.59 & 343 & {\bf 0.66} & & &
\end{tabular}
\caption{Comparison of power-law and $q$-exponential fits for six of the data
sets analyzed in \cite{clauset:etal:2007}; for brevity we omit uncertainty
estimates from the bootstrap procedure. Recall that $\xmin$ is the threshold
value above which the distribution assumes a parametric form, and $\ntail$ is
the number of samples in this upper tail.}
\label{table:results}
\end{table*}
% birds sigma = 3472.8798 theta = 1.4176 xmin = 937 D = 0.0224 ntail = 209
% blackouts sigma = 235919.8207 theta = 1.8192 xmin = 75000 D = 0.0339 ntail = 115
% books sigma = 2550703.2016 theta = 4.5175 xmin = 1005381 D = 0.0183 ntail = 612
% cities sigma = 14809.1896 theta = 1.5339 xmin = 13908 D = 0.0090 ntail = 2085
% fires sigma = 2445.9405 theta = 1.3841 xmin = 2200 D = 0.0161 ntail = 1230
% surnames sigma = 65072.5488 theta = 1.9383 xmin = 79587.2 D = 0.0283 ntail = 343
% --- Test the plausibility of the fit
\section{Testing the $q$-exponential's plausibility}
\label{sec:test}
Our application of $q$-exponentials to describing empirical distributions has
three phases. First, we estimate the $q$-exponential's parameters $\kappa$,
$q$ and $\xmin$ as described in Section \ref{sec:fitting}. Second, given these
estimates, we may calculate a standard $p$-value for the statistical
significance of the fit using a semiparametric bootstrap approach --- one which
is parametric for the tail, but nonparametric for the rest of the distribution.
Third, assuming the $q$-exponential is not ruled out as ill-fitting, we turn to
model selection tests which directly compare it to other plausible
distributions. The latter two phases follow the lines suggested in
\cite{clauset:etal:2007}, to which we refer the reader for some details of the
general theory.
We begin with the goodness-of-fit test. Suppose that our observed data set has
$n$ observations in total, of which $\ntail$ are points in the
$q$-tail, where $x\geq \xmin$. We generate a new data set with $n$
observations as follows. With probability $\ntail/n$ we generate a
random number $x_i$ drawn from a $q$-exponential with parameters
$\hat{\kappa}$,$\hat{q}$ and $x\geq \xmin$. Otherwise, with probability
$1-\ntail/n$, we select one element uniformly at random from among the
elements of the observed data set that have $x<\xmin$ and set $x_i$ equal to
that element. Repeating the process for all $i=1\ldots n$ we generate a
complete synthetic data set that follows a perfect $q$-exponential above
$\xmin$, matching our model's parametric hypothesis for the $q$-tail, but has
the same (non-$q$-exponential) distribution as the observed data below the
$q$-tail, where our model is agnostic.
Thus, it is straightforward to test the hypothesis that an observed data set is
drawn from a $q$-exponential distribution. The steps are as follows:
\begin{enumerate}
\item Determine the best fit of the $q$-exponential to the data, according to
the methods described in Section \ref{sec:fitting}.
\item Calculate the KS statistic for the goodness-of-fit of the best-fit
$q$-exponential to the data.
\item Generate a large number of synthetic data sets using the procedure above,
fit each according to the methods described in Section \ref{sec:fitting}, and
calculate the KS statistic for each fit.
\item Calculate the $p$-value as the fraction of the KS statistics for the
synthetic data sets whose value exceeds the KS statistic for the real data.
\item If the $p$-value is small then the $q$-exponential distribution
can be ruled out with high confidence.
\end{enumerate}
Empirically, we find that generating about $2500$ synthetic data sets produces
an uncertainty in the estimated $p$-value at a level of about $\pm 0.01$ (see
\cite{clauset:etal:2007}).
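The replicate-generation step of this procedure can be sketched as follows (Python, with our own illustrative names; the per-replicate refit, which would reuse the estimators of Section \ref{sec:fitting}, is omitted for brevity):

```python
import random

def sample_qtail(theta, sigma, xmin):
    # inverse transform for the conditional tail law
    # P(X >= x | X >= xmin) = [(sigma + x)/(sigma + xmin)]^(-theta)
    u = 1.0 - random.random()                 # u in (0, 1]
    return (sigma + xmin) * u ** (-1.0 / theta) - sigma

def synthetic_dataset(data, theta, sigma, xmin):
    """One semiparametric bootstrap replicate: parametric q-exponential
    above xmin, resampled empirical body below it."""
    body = [x for x in data if x < xmin]
    p_tail = 1.0 - len(body) / len(data)
    out = []
    for _ in range(len(data)):
        if random.random() < p_tail or not body:
            out.append(sample_qtail(theta, sigma, xmin))
        else:
            out.append(random.choice(body))
    return out

# demo with a toy data set and assumed fitted parameters
random.seed(2)
observed = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
replicate = synthetic_dataset(observed, theta=2.0, sigma=1.0, xmin=3.0)
```

Each replicate is then refitted from scratch and its KS statistic recorded; the $p$-value is the fraction of replicates whose statistic exceeds the observed one, as in the enumerated steps above.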
It is important to be clear on the logic of this procedure \cite{mayo:cox}. No
limited model can be expected to perfectly predict actual measurements, nor is
it clear that this is even desirable. It has long been recognized, however,
that one important advantage of stochastic models is that they predict not just
central tendencies, but also the distribution of deviations from or
fluctuations around those tendencies \cite{haavelmo,guttorp}. One can
therefore use the stochastic model itself to assess whether the observed
deviations are, themselves, within the range predicted by the model. This is
what the goodness-of-fit test achieves. Seen from this angle, it is perhaps
clearer that while it can, with some confidence, {\em rule out} a model which
fails to fit, a high goodness-of-fit $p$-value at best shows that a model has
avoided {\em one} kind of error, not that it is correct. The latter inference
would be justified only if we knew that no {\em other} model could produce data
which got such a high $p$-value. This asymmetry is fundamental to statistical
inference, and confusion on this point is at the heart of many of the problems
which arise in fitting statistical models \cite{mayo:1996}.
Because general goodness-of-fit tests like the Kolmogorov-Smirnov test are
sensitive to all kinds of departures from the hypothesis of a $q$-exponential
tail, they are not especially sensitive to departures in any {\em particular}
direction. As a third step, we therefore directly compare the $q$-exponential
to particular alternative models. There are a number of model selection
criteria and tests which could be employed \cite{handcock:jones:2004}; here, as
in \cite{clauset:etal:2007}, we use Vuong's normalized likelihood ratio test \cite{vuong},
which we now briefly explain.
Consider two fixed distributions, represented by their probability densities
$p$ and $q$, neither of which correctly represents the distribution $g$ from
which the data were actually drawn. The log-likelihood ratio at a point $x_i$
is simply $L_i = \log{p(x_i)/q(x_i)}$. Assuming independent samples,
then, $R_n = \frac{1}{n}\sum_{i=1}^{n}{L_i}$ is an average of independent,
identically-distributed random variables. As $n$ grows, by the central
limit theorem $\sqrt{n} R_n$ approaches a Gaussian distribution. The mean of this distribution is $\sqrt{n}\langle L\rangle$, where
\begin{eqnarray}
\langle L \rangle & = & \int{\left(\log{\frac{p(x)}{q(x)}}\right) g(x) dx}\\
& = & \int{\left(\log{\frac{p(x)}{g(x)}} - \log{\frac{q(x)}{g(x)}}\right) g(x) dx}\\
& = & D(g\|q) - D(g\|p)
\end{eqnarray}
$D(g\|p)$ and $D(g\|q)$ being, respectively, the relative entropies of $p$ and
$q$ from $g$. Thus, unless $p$ and $q$ are equally close,
information-theoretically, to $g$, the mean of $\sqrt{n} R_n$ will diverge to
either positive or negative infinity. The limiting variance of $\sqrt{n} R_n$ is
\begin{equation}
\omega^2 = \int{{\left(\log{\frac{p(x)}{q(x)}}\right)}^2 g(x) dx} - {\langle L \rangle}^2
\end{equation}
Thus, a simple test of whether $p$ or $q$ is closer to $g$ is provided by
first calculating $R_n$, and then the test statistic
\begin{equation}
Z_n = \sqrt{n} \frac{R_n}{\sqrt{\frac{1}{n-1}{\sum_{i=1}^{n}{{\left(\log{\frac{p(x_i)}{q(x_i)}} - R_n \right)}^2}}}}
\end{equation}
If $D(g\|p) = D(g\|q)$, the distribution of $Z_n$ will approach a standard
Gaussian, with mean 0 and variance 1, for large $n$. Notice that no knowledge
of the true, data-generating distribution $g$ is required to calculate $Z_n$.
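Given the pointwise log-likelihood ratios under the two fitted models, $Z_n$ and its two-sided $p$-value are straightforward to compute; a minimal sketch (ours, for illustration):

```python
import math, random

def vuong_test(log_ratios):
    """Vuong's normalized likelihood-ratio statistic Z_n, with a two-sided
    p-value from its standard-Gaussian limit.  Positive Z_n favors the model
    in the numerator of the ratios, negative Z_n the denominator."""
    n = len(log_ratios)
    r_n = sum(log_ratios) / n
    s2 = sum((l - r_n) ** 2 for l in log_ratios) / (n - 1)  # sample variance
    z = math.sqrt(n) * r_n / math.sqrt(s2)
    p = math.erfc(abs(z) / math.sqrt(2.0))    # = 2 * (1 - Phi(|z|))
    return z, p

# toy check: p = N(0,1) density, q = N(1,1) density, data drawn from N(0,1),
# so L_i = log p(x_i)/q(x_i) = 0.5 - x_i and the test should favor p
random.seed(3)
log_ratios = [0.5 - random.gauss(0.0, 1.0) for _ in range(1000)]
z_n, p_val = vuong_test(log_ratios)
```

In the toy example the statistic comes out large and positive with a vanishing $p$-value, correctly identifying the true model.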
The above analysis is for {\em fixed} alternative distributions $p$ and $q$.
The cases of interest to us, however, are the ones where we have two families
of distributions, say $\mathcal{P}$ and $\mathcal{Q}$, but do not know which
distribution, from each family, best matches $g$. What Vuong showed is that
the above analysis remains valid if, for $p$ and $q$, we substitute the maximum
likelihood estimates from $\mathcal{P}$ and $\mathcal{Q}$, provided that (1)
the maximum likelihood estimates are converging on, respectively, the
distributions $p^* \in \mathcal{P}$ and $q^* \in \mathcal{Q}$ which best
approximate $g$, and (2) $p^* \neq q^*$. The latter condition may fail when
either $\mathcal{P}$ or $\mathcal{Q}$ is strictly nested within the other
family.\footnote{There are some additional technical regularity conditions
required for Vuong's test to be valid. It is straightforward, but tedious,
to verify that these conditions hold for the models of interest here, and we
omit the algebra in the interest of space.}
Applying Vuong's test requires only a knowledge of the likelihood functions of
the two models in question, and the ability to calculate the CDF of the
standard Gaussian. Once again, the logic is that one would have to be very
unlucky to get a small $p$-value when the two models are, in fact, equally
good. If one model is better, the probability that the Vuong test will select
it goes to 1 as $n$ grows. A large $p$-value may be due to bad luck, to an
insufficient quantity of data, or to both models being equally far from the
truth.
[[CRS to also look up encompassing tests, and see if we cannot use the fact
that both Pareto I and q-exponentials are sub-types of the general Pareto II]]
% --- Empirical results
\section{Empirical results}
Here we reanalyze six data sets from different domains, all previously studied
in \cite{clauset:etal:2007}. Four of these --- the population sizes of
U.S. cities in 2000; the number of customers affected by North American power
blackouts; the sale quantities of best-selling U.S. books; and the number of
sightings of North American birds by species --- provided moderate support for
pure power laws. Another two --- the areas burned by forest fires, and the
frequencies of surnames --- provided moderate support for power laws with
exponential cut-offs.\footnote{There were two other data sets in
\cite{clauset:etal:2007} which had moderate support for pure power laws,
namely the intensities of wars and the number of adherents of religions;
these were eliminated on visual inspection as fitting neither power laws nor
$q$-exponentials. [[CRS: What? Why eliminate on visual inspection if they
passed the tests?]] Another four data sets --- magnitudes of solar flares;
magnitudes of earthquakes; frequencies of Web hits; Web degree distribution
--- supported power laws with exponential cut-offs, but were eliminated as
either irregular (the former two), or clearly better fit by cut-off power
laws than by $q$-exponentials.
\textbf{[[CRS: I am not at all sure I agree with these exclusion criteria. It seems
to me important that (1) we show that our methods can tell when things are
{\em not} $q$-exponentials, or even close to $q$-exponentials, and that (2)
we present a fair picture of how common $q$-exponentials actually are.
Saying ``six out of six distributions tested were $q$-exponentials'' is not
actually very impressive if we rigged the test beforehand!]]}}
Using the methods described above, we fitted and tested the $q$-exponential
distribution with these data, comparing the results to the previously found
power-law models. Table \ref{table:results} summarizes the two kinds of models
(giving the estimated parameters, $\ntail$, and the calculated
$p$-value), and the results of our model comparison. Figures \ref{fig:fits}a--f
show the corresponding fits for both the power-law and $q$-exponential
distributions. In every case, the $q$-exponential both fitted a larger portion
of the data and yielded a larger $p$-value; the deviation from a
$q$-exponential is well within the range expected from ordinary sampling
fluctuations (i.e., $p>0.1$).
% ---
\begin{figure*}[t]
\begin{center}
\includegraphics[scale=0.55]{qexp_fits_plus_shift}
\end{center}
\caption{(color online) The cumulative distribution functions Pr$(X\geq x)$
with their maximum likelihood $q$-exponential fits (solid lines), and for
comparison the power-law fits from \cite{clauset:etal:2007} (dashed lines,
offset slightly for legibility), for our six empirical data sets. Details of
these fits are given in Table \ref{table:results}. (a) The frequency of
sightings of bird species in the United States [[dates]]. (b) The number of
customers affected by electrical blackouts in the United States [[dates]].
(c) Sales volume of bestselling books in the United States [[dates]]. (d)
Population of cities in the United States in 2000. (e) Number of acres
burned in California forest fires [[dates]]. (f) Frequency of surnames in
the United States [[date]].}
\label{fig:fits}
\end{figure*}
% ---
[[Add some of the continuous data sets where the q-exponential is not so
hot back in; estimate]]
[[Add discrete data sets; estimate]]
[[Perform and discuss Vuong tests]]
[[Add tests vs. log-normal and power law with cut-off ?]]
[[Comment on test results]]
One noteworthy aspect of our findings concerns $\xmin$ and its interpretation.
As remarked above, when fitting exclusively to the upper part of the
distribution, $\xmin$ represents a characteristic scale separating two distinct
regimes, with distinct data-generating mechanisms. For the distribution of
city sizes, this has a possible theoretical meaning, which has not, to the best
of our knowledge, been previously recognized. The data contain {\em all} U.S.
population aggregations (as of the year 2000), but there is a distinction to be
drawn anthropologically between rural settlements, which lack a diverse
division of labor supporting bilateral exchange, and cities proper, in which a
complex division of labor supports complex forms of exchange
\cite{white:tambayong:kejzar}. Cities are archaeologically distinguishable
from other settlements, and their minimum size archaeologically begins at
4,000--5,000 people \cite{white:tambayong:kejzar}. If contemporary American
cities have an $\xmin$ of 14,000, this suggests a significant threshold at
which urbanity begins, perhaps only about three times greater than the
threshold of urbanity 5,500 years ago.
\section{Conclusions}
%Clauset, Shalizi, and Newman \cite{clauset:etal:2007} provided an all-important review of what is needed to fit theoretical models to empirical distributions, illustrated with a focus on fitting power laws in relation to other plausible univariate distributions. They showed the fundamental importance of likelihood methods, from maximal likelihood (MLE) estimation of parameters to the use of Kolmogorov-Smirnov (KS) tests using synthetic data to generate the scatter of theoretical distributions consistent with estimated parameters for a given model. They provide a package of R programs that include estimating MLE parameters for each of four pure distributions (power law, exponential, stretched exponential, and log-normal) and one nested (power-law plus exponential cut-offs). The also provide a package of Matlab programs that generates the synthetic distributions for KS tests for each of these distributions, given particular parameters, and for estimating the likelihood ratio and its variance for comparisons of goodness-of-fit for pairs of competing models. Their fits and comparisons of fit to the power law for 24 data sets shows that log-normal distributions are strong competitors in five cases (all for continuous variables: fires, HTTP, earthquakes, web hits, web links), exponentials or stretched exponentials are strong competitors in all but one of the 24, and power-law plus cut-offs for ten of the 24. These findings give extreme caution against inappropriate fitting of power laws by $r^{2}$ fit, as is most commonly done.
%Our findings complement and reinforce those of Clauset et al., adding the q-exponentials to the set of models that should be compared as an alternative to power laws. We do not yet have a way of finding MLE solutions for parameter estimates of the $q$-exponentials for discrete distributions (DRW - can the one-parameter MLE be modified to suit this need?). Our contribution for continuous distributions is to show that four of the six continuous distributions classified by Clauset et al. as having moderate support for power-law fit have a much higher fit to the q-exponential. Of the two remaining data sets numbers by religion have a $p=0.42$ to power law with a slightly better fit for a stretched exponential (and a visual cut-off to a fat-tail), while intensity of wars oddly wobbles around a power law with p=0.20 KS fit over three orders of magnitude. Over all six, the $q$-exponential strongly outperforms power law fit. Further work, however, would be required for likelihood ratio tests of $q$-exponential fit versus that of the stretched exponential, exponential, or lognormal.
We have seen that for some heavy-tailed data sets, the $q$-exponential provides
an accurate description of the tail of the data, one which is superior to that
of a pure power law, or type-I Pareto model. This provides reason to hope that
this family will also describe other heavy-tailed or highly skewed data sets,
and should at least be considered along with the Pareto I, the log-normal, the
stretched exponential, etc. Such fits, and comparisons among these
alternatives, can be done by using the objective and reliable methods we have
described. We do not pretend that those are the {\em only} reliable methods
for these problems, though they are fairly straightforward and definitely
superior to existing practice with $q$-exponentials. The performance of these
methods on small samples can be improved computationally via parametric
bootstrap estimates of bias \cite{wasserman,shalizi:2007}, and perhaps
analytically, via expansion methods \cite{field:ronchetti}.
At a deeper level, it has long been recognized that what is important in
fitting stochastic models is not merely getting numerical values for the
parameters, but rather what those parameters can tell us about the mechanisms
which produced the phenomena in question \cite{haavelmo,ijiri:simon}. Accurate
estimation of distributions becomes more than a mere computational exercise
only when its results inform such genuinely scientific inferences --- but the
validity of those inferences rests on the accuracy of the preliminary steps.
We have tried to indicate how such inferences may develop in our discussion
above of the meaning of the $q$-tail threshold for cities, and what it may tell
us about the evolution of the urban system \cite{white:tambayong:kejzar}.
We have explained how to fit $q$-exponential distributions to the tails of
empirical data sets, and how to validly assess the goodness of these fits and
compare them to those obtained from other models. This lays the foundation for
the scientific investigation of when models predicting $q$-exponential
distributions are accurate representations of the actual data-generating
mechanisms in the natural and social worlds around us.
% ---------- ACKNOWLEDGMENTS ----------
{\acknowledgments This work was supported in part by the Santa Fe Institute
(AC) and by the Social Dynamics and Complexity group of the Institute for
Mathematical Behavioral Sciences (DRW, LT). }
% ---------- BIBLIOGRAPHY ----------
\begin{thebibliography}{99}
\bibitem{berk} R. A. Berk. {\em Regression Analysis: A Constructive Critique.}
Sage Publications (2004).
\bibitem{clauset:etal:2007}
A. Clauset, C. R. Shalizi and M. E. J. Newman. Power-law distributions in empirical data. E-print (2007) {\tt arxiv:0706.1062}.
\bibitem{efron:1982}
B. Efron. Maximum likelihood and decision theory. {\em Annals of Statistics}
{\bf 10}, 340--356 (1982).
\bibitem{field:ronchetti}
C. A. Field and E. Ronchetti. {\em Small Sample Asymptotics.} Institute
of Mathematical Statistics (1990).
\bibitem{guttorp}
P. Guttorp. {\em Stochastic Modeling of Scientific Data.} Chapman and Hall (1995).
\bibitem{handcock:jones:2004} M. S. Handcock and J. H. Jones. Likelihood-based
inference for stochastic models of sexual network formation. {\em
Theoretical Population Biology} {\bf 65}, 413--422 (2004).
\bibitem{harris:1968}
C. M. Harris. The Pareto distribution as a queue service discipline. {\em Operations Research} {\bf 16}, 307--313 (1968).
\bibitem{haavelmo}
T. Haavelmo. The Probability Approach in Econometrics. {\em Econometrica} {\bf 12} (supplement), iii--115 (1944).
\bibitem{ijiri:simon}
Y. Ijiri and H. A. Simon. {\em Skew Distributions and the Sizes of Business
Firms.} North-Holland (1977).
\bibitem{kass:vos:1997}
R. E. Kass and P. W. Vos. {\em Geometrical Foundations of Asymptotic
Inference.} John Wiley (1997).
\bibitem{kulhavy:1996}
R. Kulhav{\'y}. {\em Recursive Nonlinear Estimation: A Geometric Approach.}
Springer-Verlag (1996).
\bibitem{mayo:1996}
D. G. Mayo. {\em Error and the Growth of Experimental Knowledge.} University
of Chicago Press (1996).
\bibitem{mayo:cox}
D. G. Mayo and D. R. Cox. Frequentist statistics as a theory of inductive
inference. In J. Rojo (ed.), {\em Optimality: The Second E. L. Lehmann Symposium}, 77--97. Institute of Mathematical Statistics (2006). E-print {\tt arxiv:math.ST/0610846}.
\bibitem{mcguire:etal:1952}
B. A. McGuire, E. S. Pearson and A. H. A. Wynn. The time intervals between industrial accidents. {\em Biometrika} {\bf 39}, 168 (1952).
\bibitem{meehl:1967}
P. Meehl. Theory testing in psychology and physics: A methodological paradox. {\em Philosophy of Science} {\bf 34}, 103--115 (1967). Reprinted in D. E. Morrison and R. E. Henkel, eds., {\em The Significance Test Controversy}. Aldine (2006).
\bibitem{shalizi:2007}
C. R. Shalizi. Maximum likelihood estimation for $q$-exponential distributions. E-print (2007) {\tt arxiv:math.ST/0701854}.
\bibitem{silcock:1954}
H. Silcock. The phenomenon of labour turnover. {\em Journal of the Royal Statistical Society A} {\bf 117}, 429--440 (1954).
%\bibitem{thurner:etal:2007}
%S. Thurner, F. Kyriakopoulos and C. Tsallis. Unified model for network dynamics exhibiting nonextensive statistics. {\em Phys. Rev. E} {\bf 76}, 036111 (2007).
\bibitem{tsallis:1988}
C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics. {\em J. Stat. Phys.} {\bf 52}, 479--487 (1988).
\bibitem{tsallis:2004}
C. Tsallis. Nonextensive statistical mechanics: Construction and physical interpretation. In M. Gell-Mann and C. Tsallis, eds., {\em Nonextensive Entropy: Interdisciplinary Applications}. Oxford University Press, 1--53 (2004).
\bibitem{vuong}
Q. H. Vuong. Likelihood ratio tests for model selection and non-nested
hypotheses. {\em Econometrica} {\bf 57}, 307--333 (1989).
\bibitem{wasserman}
L. Wasserman. {\em All of Statistics: A Concise Course in Statistical Inference.} Springer-Verlag (2003).
\bibitem{white:tambayong:kejzar}
D. R. White, L. Tambayong and N. Kej\v{z}ar. Oscillatory dynamics of
city-size distributions in world historical systems. In G. Modelski,
T. Devezas and W. Thompson, eds., {\em Globalization as Evolutionary Process:
Modeling, Simulating, and Forecasting Global Change}, 190--225. Routledge
(2008).
\bibitem{Wiuf-et-al-likelihood-for-growing-networks} C. Wiuf, M. Brameier,
O. Hagberg and M. Stumpf, A likelihood approach to analysis of network
data. {\em Proc. Natl. Acad. Sci.} (USA) {\bf 103}, 7566--7570 (2006).
\end{thebibliography}
\end{document}
\item Use of the Kolmogorov-Smirnov (KS) test~\cite{chakravarti:etal:1967} to estimate goodness-of-fit for a model with MLE parameters (using synthetic data), and to estimate the lower bound $\xmin$ for power-law or $q$-exponential behavior (or the lower bound on fit for other distributions) giving the minimum KS statistic $D$, which measures the distance between distributions~\cite{clauset:etal:2007}. (Unlike in a null significance test, KS probability values close to 1 indicate good fits; the null hypothesis of no difference is rejected if $p$ is small, e.g., $p<0.05$ means the distributions differ.) Here the KS test measures the distance between the empirical data and synthetic data. Unlike with the commonly used $r^{2}$ test, the theoretical values do not form a continuous curve but variable ``jiggled'' lines that only approximate, in their general form, the theoretical curve of a model. Graphs of the synthetic data for a given model and set parameters can be quite surprising to a novice used to fitting empirical data points to a single continuous theoretical curve. Because the $x$ values of the synthetic data are generated randomly, no two ``jiggled'' lines are identical. They form a distribution of possible outcomes for a given theoretical distribution, a characteristic shared with ``bootstrap'' estimates of variance in outcomes from model probabilities. Thus, for example, ``even if a data set is drawn from a perfect power-law distribution, the fit between that data set and the true distribution will on average be poorer than the fit to the best-fit [e.g., $r^{2}$] distribution, because of statistical fluctuations'' (p.~11).
\item Likelihood ratio tests of the competing fits of different models. Clauset \etal{} do so only for power laws compared against other distributions (exponential, stretched exponential, lognormal, power law with exponential cut-off)~\cite{clauset:etal:2007}.
\begin{align}
\label{eq:full:discrete}
p(X=x) = x^{-\alpha} / \zeta(\alpha) \enspace ,
\end{align}
where $\zeta$ is the Riemann zeta function and serves as a normalization factor; the continuous version, also called a Pareto distribution, is
\begin{align}
\label{eq:full:continuous}
p(x \leq X < x+\dx) = (\alpha-1)\,x^{-\alpha}\,\dx \enspace ,
\end{align}
for $x \geq 1$.
In real-world systems, the scaling behavior often holds only for $x\geq \xmin$, and we must use slightly modified forms of equations (\ref{eq:full:discrete}) and (\ref{eq:full:continuous}):
\begin{align}
\label{eq:general:ppwerlaw}
p(X=x) & = x^{-\alpha} / \zeta(\alpha,\xmin) \enspace , \,{\rm and} \\
p(x \leq X < x+\dx) & = \frac{\alpha-1}{\xmin}\,\left( \frac{x}{\xmin}\right)^{-\alpha} \dx \enspace ,
\end{align}
respectively, where $\zeta(\alpha,\xmin)$ is the generalized, or incomplete, zeta function.
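Both normalizations are easy to check numerically. A sketch in Python (the function names are ours), using SciPy's generalized zeta for the discrete case:

```python
import numpy as np
from scipy.special import zeta
from scipy.integrate import quad

def discrete_powerlaw_pmf(x, alpha, xmin=1):
    """p(X = x) = x^(-alpha) / zeta(alpha, xmin), where scipy's
    zeta(s, q) is the generalized (Hurwitz) zeta function."""
    return x ** -alpha / zeta(alpha, xmin)

def continuous_powerlaw_pdf(x, alpha, xmin=1.0):
    """p(x) = ((alpha - 1)/xmin) * (x/xmin)^(-alpha) for x >= xmin."""
    return (alpha - 1.0) / xmin * (x / xmin) ** -alpha

# Both should be normalized over [xmin, infinity):
alpha, xmin = 2.5, 5
pmf_total = discrete_powerlaw_pmf(np.arange(xmin, 200000), alpha, xmin).sum()
pdf_total, _ = quad(continuous_powerlaw_pdf, xmin, np.inf, args=(alpha, xmin))
# pmf_total and pdf_total are both ~ 1
```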
\subsubsection{Testing the MLE with simulated data}
The best way to test the accuracy of an MLE is to generate large synthetic samples from the model at various parameter values, and check that the MLE recovers those values with the expected sampling variances. For the $q$-exponential, Clauset's QPVA.m Matlab code includes a way of generating synthetic data from the Pareto II, whose parameters, as formulated in Equation 2 of Shalizi~\cite{shalizi:2007}, are algebraic equivalents of the $q$-exponential parameters.
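For instance, synthetic $q$-exponential data (for $1<q<2$) can be drawn by inverse-transform sampling through the Pareto II form. A Python sketch, under the standard parameter correspondence between the two families (the function name is ours):

```python
import numpy as np

def qexp_sample(n, q, kappa, rng=None):
    """Inverse-transform sampling for the q-exponential with 1 < q < 2,
    via its Pareto II (Lomax) form: survival S(x) = (1 + x/sigma)^(-theta),
    with sigma = kappa/(q - 1) and theta = (2 - q)/(q - 1)."""
    rng = rng or np.random.default_rng()
    sigma = kappa / (q - 1.0)
    theta = (2.0 - q) / (q - 1.0)
    u = rng.uniform(size=n)
    return sigma * (u ** (-1.0 / theta) - 1.0)   # S^(-1)(u)

# The empirical tail should track the theoretical survival function:
rng = np.random.default_rng(1)
x = qexp_sample(200000, q=1.5, kappa=2.0, rng=rng)
sigma, theta = 4.0, 1.0            # implied by q = 1.5, kappa = 2
emp_tail = (x > 5.0).mean()        # ~ (1 + 5/4)^(-1) ~ 0.44
```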
\subsection{Estimation with the MLE}
The MLE is found by choosing the model parameter value(s) to maximize the likelihood of the empirical data, or equivalently its logarithm, the sum of the log-densities of the observations at those parameter value(s). The maximum can also be found analytically by setting the derivative of the log-likelihood with respect to the parameter(s) to zero.
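As a concrete illustration (ours, using the continuous pure power law rather than the $q$-exponential, since its MLE has a simple closed form), the numerical maximization and the zero-derivative solution agree:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def powerlaw_loglike(alpha, x, xmin):
    """Log-likelihood of the continuous power law
    p(x) = ((alpha - 1)/xmin) * (x/xmin)^(-alpha): a sum of log-densities."""
    lx = np.log(np.asarray(x) / xmin)
    return len(lx) * np.log((alpha - 1.0) / xmin) - alpha * lx.sum()

# Synthetic power-law data with alpha = 2.5, xmin = 1
rng = np.random.default_rng(2)
xmin = 1.0
x = xmin * rng.uniform(size=50000) ** (-1.0 / 1.5)

# Numerical maximization of the log-likelihood...
res = minimize_scalar(lambda a: -powerlaw_loglike(a, x, xmin),
                      bounds=(1.01, 10.0), method="bounded")

# ...agrees with the closed form obtained by setting dL/d(alpha) = 0:
alpha_closed = 1.0 + len(x) / np.log(x / xmin).sum()
```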
\subsubsection{Small sample correction for the MLE}
For small samples, parameter estimates can be corrected using the results of accuracy tests at those sample sizes (e.g., Fig.~6 of \cite{clauset:etal:2007} for power laws; Fig.~1 of \cite{shalizi:2007} for $q$-exponentials). Error-bound estimates from asymptotic approximations should be avoided. Instead, parameter estimates, standard errors, and confidence limits can be obtained by parametric bootstrapping (\cite{wasserman}, section 9.11), generating many ``bootstrap'' samples of random numbers from the model density function $p_{\pi}(x)$.
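A minimal parametric-bootstrap sketch (again ours, using the continuous power law for brevity): draw many samples from the fitted model, refit each, and read off the spread and the bias of the estimates:

```python
import numpy as np

def alpha_mle(x, xmin):
    """Closed-form MLE for the continuous power-law exponent."""
    return 1.0 + len(x) / np.log(np.asarray(x) / xmin).sum()

def bootstrap_se(alpha_hat, n, xmin, reps=500, rng=None):
    """Parametric bootstrap: resample from the fitted model, refit,
    and take the spread of the refitted estimates as the standard
    error; the mean of the replicates minus alpha_hat estimates the
    small-sample bias."""
    rng = rng or np.random.default_rng()
    refits = []
    for _ in range(reps):
        u = rng.uniform(size=n)
        xb = xmin * u ** (-1.0 / (alpha_hat - 1.0))  # draw from fitted model
        refits.append(alpha_mle(xb, xmin))
    refits = np.asarray(refits)
    return refits.std(ddof=1), refits.mean() - alpha_hat

se, bias = bootstrap_se(alpha_hat=2.5, n=100, xmin=1.0,
                        rng=np.random.default_rng(3))
```

For $n=100$ and $\hat\alpha=2.5$ the bootstrap standard error comes out close to the asymptotic value $(\hat\alpha-1)/\sqrt{n} = 0.15$, with a small positive bias, matching the small-sample behavior described above.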
\subsection{Testing the KS fit for a model with MLE parameters}
The likelihood framework allows tests of parameter values by comparing the empirical data (via KS) to same-sized samples of synthetic data generated with the estimated model parameters, returning $D$ and $p$ for each comparison. The synthetic data are generated by drawing $x$ randomly (rather than at uniform intervals) and returning $y(x)$ for the function under investigation. Because the intervals between the rank-ordered values of $x$ are not uniform, each synthetic data set differs, and $D$ and $p$ from the KS test vary across comparisons with the empirical data. The appropriate measure of fit is therefore to generate point data from the fitted model (with the same number of points as the actual data), repeat this enough times for the distribution of $D$ to converge, and calculate $p$ as the fraction of synthetic $D$ values at least as large as the empirical one, with best fit as $p$ approaches 1.
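This procedure can be sketched compactly. For brevity the sketch below (function name ours) holds the fitted parameters fixed for the synthetic draws, whereas the full procedure of \cite{clauset:etal:2007} refits each synthetic sample before computing its $D$:

```python
import numpy as np

def gof_pvalue(x, alpha, xmin, reps=300, rng=None):
    """Goodness-of-fit p-value for a continuous power-law tail: the
    fraction of same-sized synthetic samples from the fitted model
    whose KS distance to the model is at least the empirical one.
    A large p means the fit is plausible.  Simplification: the
    parameters are held fixed rather than refitted per sample."""
    rng = rng or np.random.default_rng()

    def D(sample):
        s = np.sort(sample)
        cdf_model = 1.0 - (s / xmin) ** (1.0 - alpha)
        cdf_emp = np.arange(1, len(s) + 1) / len(s)
        return np.abs(cdf_emp - cdf_model).max()

    d_emp, n = D(x), len(x)
    d_syn = []
    for _ in range(reps):
        u = rng.uniform(size=n)
        d_syn.append(D(xmin * u ** (-1.0 / (alpha - 1.0))))
    return np.mean(np.asarray(d_syn) >= d_emp)

# Misspecified example: exponential data forced into a power-law fit
rng = np.random.default_rng(4)
x = 1.0 + rng.exponential(size=1000)         # NOT power-law data
alpha_hat = 1.0 + len(x) / np.log(x).sum()   # power-law MLE anyway
p = gof_pvalue(x, alpha_hat, 1.0, rng=rng)   # p comes out essentially zero
```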
Power laws, like most distributions, have a lower cutoff $\xmin$: the value of $x$ below which the distribution begins to lose its fit to the data. The Kolmogorov-Smirnov (KS) test can be used to estimate the lower bound $\xmin$ of best fit~\cite{clauset:etal:2007}, again by comparison with synthetic data. This approach can be applied to other distributions as well.
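A sketch of the cutoff scan for the continuous power law (function names ours; the same logic carries over to any tail model with an MLE):

```python
import numpy as np

def ks_D(tail, alpha, xmin):
    """KS distance between the empirical CDF of the tail (x >= xmin)
    and the fitted power-law CDF F(x) = 1 - (x/xmin)^(1 - alpha)."""
    s = np.sort(tail)
    cdf_model = 1.0 - (s / xmin) ** (1.0 - alpha)
    cdf_emp = np.arange(1, len(s) + 1) / len(s)
    return np.abs(cdf_emp - cdf_model).max()

def choose_xmin(x, candidates):
    """For each candidate cutoff, refit the tail by maximum likelihood
    and compute D; keep the cutoff minimizing D (clauset:etal:2007)."""
    best = None
    for xm in candidates:
        tail = x[x >= xm]
        if len(tail) < 50:              # too few points to fit reliably
            continue
        alpha = 1.0 + len(tail) / np.log(tail / xm).sum()
        D = ks_D(tail, alpha, xm)
        if best is None or D < best[0]:
            best = (D, xm, alpha)
    return best                          # (D, xmin, alpha)

# Demo: a power-law tail (alpha = 2.5) above 10, grafted onto a
# non-power-law body; the scan should reject the low cutoffs
rng = np.random.default_rng(5)
body = rng.uniform(1.0, 10.0, size=5000)
tail = 10.0 * rng.uniform(size=5000) ** (-1.0 / 1.5)
D, xmin_hat, alpha_hat = choose_xmin(np.concatenate([body, tail]),
                                     candidates=[1, 2, 5, 10, 20])
```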
\subsection{Comparing different models for the same empirical data by calculating the likelihood ratio}
The likelihood ratio (LR) addresses the question of which of two models better fits the data. Each model's likelihood is evaluated on the empirical data at its estimated parameters. The model with the higher likelihood is the better fit; the comparison is normally given as the logarithm $\mathcal{R}$ of the ratio, with the significance of differences computed from the standard deviation of $\mathcal{R}$, as reviewed by Clauset \etal~\cite{clauset:etal:2007}. By excluding nested-model comparisons (the power law with exponential cut-off), non-nested LR comparisons can be normalized.
\subsection{The KS fit for $q$-exponential models with MLE parameters}
Using Shalizi's $q_{W}$ estimates, Clauset wrote a new Matlab program, QPVA.m, to complement the Matlab program package used in Clauset \etal~\cite{clauset:etal:2007}, now allowing $q$-exponential (Pareto II) fitted models to be compared to Pareto models for a given distribution. What is novel about this procedure is that it validated the discovery, by White, Tambayong and Kej\v{z}ar~\cite{white:tambayong:kejzar}, of an $\xmin$ for $q$-exponentials. QPVA($x$, $x_{\min'}$) takes data $x$ and computes the $q$-exponential fit with a lower cutoff $x_{\min'}$, along with the corresponding $p$-value for the Kolmogorov-Smirnov (KS) test. It iterates over values of $x_{\min'}$ to obtain the value $\xmin$ giving the highest KS probability of fit. With $x(0)$ the lower limit of the observed data series, and $\min(x(0),\xmin) = \min(x)$, if $\min(x) \geq 1000$, QPVA uses the continuous approximation, which is reliable in this regime. The fitting procedure works as follows: (1) for each candidate $x_{\min'}$, we estimate the $q$-exponential parameters by maximum likelihood (Shalizi's MLE) and calculate the Kolmogorov-Smirnov goodness-of-fit statistic $D$; (2) we then select as our estimate $\xmin$ the value of $x_{\min'}$ that gives the minimum $D$. (With the data we tested, $x(0) < \xmin$ in each data series, so the optimal $\xmin$ represented an empirical finding pertinent to fit.)