Multiple working hypotheses, models, and statistics
From InterSciWiki
Models are of little use unless you have plausible alternatives and multiple working hypotheses, so that you can test how well each fits your data and which of any pair of alternatives has the better relative fit. To do so you need to rely on the laws of probability: study them, see how they apply to your modeling problems, and seek out the methods that have come along with computationally intensive data analysis.
These are the inferential, not just the descriptive, statistical methods. If you have only a pet idea, or one you got from looking at descriptive statistics, it is not very likely that you have the right model, nor will you learn much from your statistical analysis.
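As a rough illustration of comparing a pair of working hypotheses by relative fit, the sketch below fits a constant-only model and a straight-line model to the same data and compares them with the Akaike information criterion (AIC). AIC is only one of several possible criteria and is brought in here as an assumption; the data and the function names are hypothetical.

```python
import math

def rss_constant(y):
    """Residual sum of squares for the model y = mean(y)."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def rss_linear(x, y):
    """Residual sum of squares for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts estimated parameters."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical data, for illustration only.
x = [0, 1, 2, 3, 4, 5]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
n = len(y)

aic_constant = aic(rss_constant(y), n, k=2)  # mean + error variance
aic_linear = aic(rss_linear(x, y), n, k=3)   # intercept + slope + error variance

# The model with the lower AIC has the better relative fit.
print("constant:", aic_constant, "linear:", aic_linear)
```

Only the difference between the two AIC values is meaningful. Note that this comparison still assumes independent cases, which is exactly the assumption questioned in the rest of this page.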
Significance tests are almost useless. First, they are intended only to test the null hypothesis, and not whether the statistic or model you have chosen actually fits the data. Second, they assume that the cases exist in separate worlds where they do not interact with one another, but only with locally independent variables. The idea that “independence of cases” is a matter of sampling independence (where your sampling of individual cases does not affect the probability of others being chosen) is preposterous, and wrong.
In observational studies at any scale, cases are usually not independent but influence one another through networks of various sorts. If these influences produce clusters of cases that become more similar than they would be without the influences between them, or from common sources of influence, then there is greater de facto variance than with independence of cases. For significance tests:
- Significance is exaggerated (p-values come out lower than they should), and you are more likely to get spurious results.
- When you compare results from different tests of the same relationships, you are more likely to underestimate the likelihood that the outcomes are similar, i.e., to spuriously reject replication of results.
If these influences produce clusters of cases that become LESS similar, the two points above are reversed:
- You are more likely to spuriously reject a single test.
- You are more likely to spuriously accept replication across tests.
In either case, to the extent that such influences exist, you need to reduce or downgrade your effective sample size in order to interpret a significance test properly.
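One common way to put a number on this downgrade, for the case where cases within clusters are positively similar and clusters are of roughly equal size, is Kish's design effect, 1 + (m - 1) * rho, where m is the cluster size and rho is the intraclass correlation. This formula is an assumption brought in for illustration, not something stated above, and the function name is hypothetical.

```python
def effective_sample_size(n, cluster_size, rho):
    """Approximate effective sample size under Kish's design effect.

    Assumes equal-sized clusters and a non-negative intraclass
    correlation rho; both are simplifying assumptions.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only covers 0 <= rho <= 1")
    design_effect = 1.0 + (cluster_size - 1) * rho
    return n / design_effect

# 100 cases in clusters of 10 with intraclass correlation 0.2
print(effective_sample_size(100, 10, 0.2))
```

With these hypothetical numbers the design effect is 2.8, so 100 observed cases carry roughly the information of 36 independent ones.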
Estimates of these effects can be made through network autocorrelation tests: regress one variable on another (or on multiple others), then measure the correlations between the error terms for pairs of cases that are related through the network. These are standard statistical tests. Alternatively, the network relations can be incorporated into the regression model itself, with terms that transform independent variables into variables of network-mediated influence.
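A minimal sketch of the first approach follows, assuming a single predictor, a symmetric 0/1 network matrix, and Moran's I as the measure of correlation between the error terms of network-related cases. The choice of Moran's I, the permutation test, the toy data, and all function names are assumptions made for illustration. Shuffling residuals is itself only an approximation, since regression residuals are not exactly exchangeable.

```python
import random

def ols_residuals(x, y):
    """Residuals from the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def morans_i(values, weights):
    """Moran's I of `values` for network weights[i][j] (0 on the diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Two-sided p-value from randomly reassigning values to network positions."""
    rng = random.Random(seed)
    observed = abs(morans_i(values, weights))
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def chain_network(n):
    """Toy network: case i is tied to case i + 1."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w

# Hypothetical data: six cases linked in a chain.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 2.4, 3.6, 3.4, 4.5, 5.4]
network = chain_network(6)

residuals = ols_residuals(x, y)
print("Moran's I of residuals:", morans_i(residuals, network))
print("permutation p-value:", permutation_p_value(residuals, network))
```

A clearly positive value suggests that network neighbors have similar errors, and a clearly negative value that they have dissimilar errors. With so few cases the permutation p-value is only indicative.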
