Evaluating results
From InterSciWiki
Hi Laurent,
Right, higher p-values indicate that the deviations from the q-exp model are within what you'd expect for statistical fluctuations. A low p-value indicates systematic deviations or correlation effects that cannot be explained by mere statistical fluctuations. Visual closeness is normally a *less* reliable way than a p-value to decide if deviations can be explained by fluctuations, or, if you like, to decide if a fit is actually a good one.
The best way to test whether the program I wrote is accurate is to feed it data for which you already know the answer. If it's correct, then when you feed it q-exponentially distributed data, the p-values should be uniformly distributed over the unit interval, and should therefore average about 0.5. Here's a snippet of code that does precisely this calculation:
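rand('state',sum(100*clock));   % seed the generator from the clock
kof = zeros(2500,1);            % one p-value per replication
N = 1000;                       % sample size per replication
xmin = 1;                       % lower cutoff of the tail
tic;
for ik=1:length(kof)
    % draw the model parameters at random for each replication
    sigma = 1+5*rand(1);
    theta = 1+2*rand(1);
    % inverse-transform sample from the q-exponential tail above xmin
    x = (xmin+sigma).*(1-rand(N,1)).^(-1/theta) - sigma;
    [p,gof] = qpva(x,xmin,'fixed','silent');
    kof(ik)=p;
    % print p, the running mean of p (with its standard error), gof, elapsed minutes
    fprintf('[%i]\tp = %6.4f [%4.2f (%4.2f)]\tgof = %6.4f\t[%4.2fm]\n', ...
        ik,p,mean(kof(1:ik)),std(kof(1:ik))./sqrt(ik),gof,toc/60);
end;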
And here's the resulting histogram of the p-values; the average is p = 0.49 ± 0.1, which agrees well with the expected value of 0.5 and supports the conclusion that qpva is doing the calculation correctly.
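The same calibration idea can be sketched without qpva. The Python sketch below is an illustration under stated assumptions, not a port of qpva: it draws from the q-exponential tail with the same inverse-transform formula as the snippet above, but replaces qpva with a plain one-sample Kolmogorov-Smirnov test against the true, fully specified CDF (asymptotic series with the usual small-sample correction). The function names are hypothetical. Because the parameters are known rather than fitted, the p-values should again be roughly uniform on the unit interval.

```python
import math
import random


def sample_qexp_tail(n, xmin, sigma, theta, rng):
    """Inverse-transform draws with P(X > x) = ((x + sigma) / (xmin + sigma)) ** -theta."""
    return [(xmin + sigma) * (1.0 - rng.random()) ** (-1.0 / theta) - sigma
            for _ in range(n)]


def tail_cdf(x, xmin, sigma, theta):
    """CDF of the same conditional distribution, valid for x >= xmin."""
    return 1.0 - ((x + sigma) / (xmin + sigma)) ** (-theta)


def ks_pvalue(data, cdf):
    """One-sample KS p-value against a fully specified CDF (asymptotic series)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # largest gap between the empirical step function and the model CDF
        d = max(d, f - i / n, (i + 1) / n - f)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def calibration_pvalues(reps=500, n=200, xmin=1.0, seed=12345):
    """Repeat the experiment with random parameters and collect the p-values."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        sigma = 1.0 + 5.0 * rng.random()
        theta = 1.0 + 2.0 * rng.random()
        x = sample_qexp_tail(n, xmin, sigma, theta, rng)
        pvals.append(ks_pvalue(x, lambda v: tail_cdf(v, xmin, sigma, theta)))
    return pvals


pvals = calibration_pvalues()
mean_p = sum(pvals) / len(pvals)
print(f"mean p = {mean_p:.3f} over {len(pvals)} replications")
```

A mean far from 0.5, or a histogram of pvals that piles up near 0 or 1, would point to a problem in either the sampler or the test.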
Estimation - Nataša Kejžar
From natasa.kejzar@fdv.uni-lj.si Tue Jan 8 01:43:52 2008
Date: Tue, 8 Jan 2008 10:36:41 +0100 (CET)
From: natasa.kejzar@fdv.uni-lj.si
To: Doug White <drwhite@uci.edu>
Subject: Re: http://intersci.ss.uci.edu/wiki/index.php/Estimating_Tsallis_q
Dear Doug.
> You also used a Newton-Raphson algorithm for estimating q, correct? Could you say a word or two about its advantages, and any flaws you see in the wiki page above? http://intersci.ss.uci.edu/wiki/index.php/Estimating_Tsallis_q
No, I did not use the Newton-Raphson algorithm. I used the standard "nls" routine from the statistical program R with its default algorithm for fitting nonlinear least squares, which is the Gauss-Newton algorithm. I didn't program the estimation myself.
The advantage is certainly that I didn't have to program anything. The method is the one most commonly used for nonlinear least squares fits to data. I've recently been attending lectures on regression analysis in medicine, and the lecturer says that they usually use ordinary linear regression (if possible). They don't complicate things with nonlinear models. I have to say that I haven't yet looked deeper into that subject.
If at all possible, MLE estimates are by far the best, and better than nonlinear least squares estimates, because they are more general and have nice asymptotic properties (the estimates are asymptotically normally distributed with known variance, and asymptotically unbiased)...
> could you check this site for me? I am trying to compare the MLE method of Cosma Shalizi using Pareto II with the one we used from Tsallis and Borges.
Tsallis and Borges estimation: they search for the q for which the ordinary linear regression fit of the q-logarithm equation is best.
MLE is far superior, because of the nice properties above and the fact that the values of q and kappa cannot fall outside the allowed range. Second, T-B estimation overemphasizes very rare events (i.e. the fitting of very small probabilities) and therefore also gives poorer fits than the nls approach.
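To make the Tsallis-Borges idea concrete, here is a minimal Python sketch under stated assumptions. It assumes the quantity being linearized is the survival function S(x) = [1 + (q-1) x/kappa]^(-1/(q-1)), so that ln_q S(x) = -x/kappa is a straight line in x at the right q, and it assumes "best" means the largest R² over a grid of q values. The exact procedure on the wiki page may differ, and the function names are hypothetical.

```python
import math


def ln_q(y, q):
    """q-logarithm (y**(1-q) - 1) / (1 - q); reduces to log(y) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(y)
    return (y ** (1.0 - q) - 1.0) / (1.0 - q)


def r_squared(xs, zs):
    """Squared correlation of zs against xs, i.e. R^2 of a simple linear fit."""
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    return sxz * sxz / (sxx * szz)


def scan_q(xs, ys, q_grid):
    """Return the grid value of q whose q-log transform of ys is most linear in xs."""
    best_q, best_r2 = None, -1.0
    for q in q_grid:
        r2 = r_squared(xs, [ln_q(y, q) for y in ys])
        if r2 > best_r2:
            best_q, best_r2 = q, r2
    return best_q, best_r2


# Noise-free demonstration: exact survival values for an assumed q = 1.5, kappa = 2.
q_true, kappa = 1.5, 2.0
xs = [0.5 * i for i in range(1, 41)]
ys = [(1.0 + (q_true - 1.0) * x / kappa) ** (-1.0 / (q_true - 1.0)) for x in xs]
q_grid = [1.01 + 0.01 * k for k in range(99)]  # 1.01, 1.02, ..., 1.99
q_hat, r2 = scan_q(xs, ys, q_grid)
print(f"q_hat = {q_hat:.2f}, R^2 = {r2:.6f}")
```

With real data, ys would be an empirical survival curve. For q > 1 the q-logarithm stretches the smallest probabilities the most, which is one way to see why a fit of this kind leans heavily on rare events.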
About the site (MY OPINIONS ONLY):
Section: q exponential and q logarithm
- q logarithm is not described (only in the next section)
Section: Estimating q with maximal likelihood estimation (MLE)
- You start talking about Pareto II (which has 2 PARAMETERS), but in the next line you use only 1 parameter (Equation (6), which you substitute into (3)). As far as I know, this cannot be correct. You would first have to explain the GENERAL SOLUTION (including the parameter kappa, and then sigma, if following Shalizi's paper) and only then equate theta to sigma to get Eq.(7).
In that case, Eq.(8) would be computed a bit differently: you would have to keep in mind that theta = sigma (so sigma is "known" and should be substituted into the equation), and only then take the derivative with respect to theta (the "unknown").
Eq.(9) should come at the beginning of this section (before the explanation of the general solution; the section "Comparison to the Shalizi MLE by Pareto II" should also go there). Eq.(9) also contains kappa, which has not been mentioned before. This kappa is (when estimating with the q-logarithm) the y(0). I'm not sure about the slope equation. (I don't know how to derive it...)
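The point about substituting theta = sigma before differentiating can be written out as a short worked derivation. This is a sketch that assumes the Pareto II form P(X > x) = (1 + x/sigma)^(-theta) for x >= 0; the wiki's Eq.(6)-(8) are not reproduced here and may use different notation.

```latex
% Assumed Pareto II density and general two-parameter log-likelihood
p(x \mid \theta, \sigma) = \frac{\theta}{\sigma}\left(1 + \frac{x}{\sigma}\right)^{-\theta - 1}

\ell(\theta, \sigma) = n \ln\theta - n \ln\sigma
    - (\theta + 1) \sum_{i=1}^{n} \ln\left(1 + \frac{x_i}{\sigma}\right)

% Substitute \sigma = \theta first; the two logarithmic terms cancel
\ell(\theta) = -(\theta + 1) \sum_{i=1}^{n} \ln\left(1 + \frac{x_i}{\theta}\right)

% Only then differentiate with respect to the single unknown \theta
\frac{d\ell}{d\theta} = -\sum_{i=1}^{n} \ln\left(1 + \frac{x_i}{\theta}\right)
    + \frac{\theta + 1}{\theta} \sum_{i=1}^{n} \frac{x_i}{\theta + x_i} = 0
```

Differentiating the general log-likelihood with respect to theta while holding sigma fixed, and setting sigma = theta only afterwards, would miss the second sum, because the dependence of the logarithm on theta through sigma is then ignored.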
Section: Cumulative probability distribution (CDF):
I don't see the connection between this and the sections above; the notation is very different. The CDF is also given in Shalizi's paper, so maybe you should take that one and write about the range of q.
Best, Natasa
