# Actual Math.

Yay! I get to do some real math. I totally feel like a real engineer and a real scientist this morning. I have a huge amount of work to do over the next few days on my little research policy paper. It was, as I wrote, well reviewed. But one of the reviewers complained that my simulated data was normally distributed, and that I should instead use lognormal data. But that mucks up my statistics.

Normal data is great. Most stuff is normal in the real world, and it’s easy to hypothesis test it. Two samples of data. Are the means the same, or not? Easy peasy. Lognormal data isn’t so easy, because the same tests don’t work. For testing normal data, you can use either the Z test or Student’s t test. But for lognormal data, I’m not aware of any standard test, and certainly nothing that’s already built for me.
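For the normal case, the "are the means the same?" question really is only a few lines. Here is a minimal sketch of a two-sided, two-sample Z test in plain Python. The function name `two_sample_z_test` is made up for illustration, and it plugs in sample variances where the textbook Z test assumes known population variances, so treat it as a large-sample approximation only.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_z_test(a, b):
    """Two-sided z test for equal means (hypothetical helper).

    Assumption: sample variances stand in for the population variances,
    which is only reasonable when both samples are fairly large.
    """
    # Standard error of the difference between the two sample means.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

A small p-value suggests the means differ. For small samples, Student’s t test is the safer choice.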

So I did what any lost engineer would do. I went to Prof. Google (after asking on Twitter, to crickets). And I found this:

Which is exactly what I need[1]. So now, I get to redo all my data, implement the above in Excel, and rerun all my tests. Should be glorious. As long as my result holds under the new distribution. We shall see.

But this is how science is done. Assertion, rebuttal, research, revision, response. And it’s exciting to do it.

____________________________________

[1] Abdollahnezhad K, Babanezhad M, Jafari AA, Inference on Difference of Means of two Log-Normal Distributions A Generalized Approach, Journal of Statistical and Econometric Methods, vol.1, no.2, 2012, 125-131

but data _are_!

In all seriousness, yes, after initial grumbling, in almost all cases, responding to reviewers gets me a better paper in the end.

SAS is out of favor, but it’s very easy in PROC GLIMMIX to analyze data where the residuals have all sorts of distributions. You just specify what that distribution is.

I was wondering whether you applied a transformation to your data. The log transformation is a fairly strong one for skewed data. Of course, a t-test on the logs compares geometric means, not the (usual) arithmetic means.
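To see why the logs give you geometric means, here is a small sketch using only the standard library. The helper name `log_scale_comparison` is hypothetical. It shows that the difference of mean logs, once exponentiated, is the ratio of geometric means rather than the ratio of arithmetic means.

```python
import math
from statistics import geometric_mean, mean


def log_scale_comparison(a, b):
    """Compare two positive samples on the log scale (hypothetical helper).

    Returns the difference of the mean logs and its back-transform,
    which equals geometric_mean(a) / geometric_mean(b).
    """
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    diff = mean(log_a) - mean(log_b)
    return diff, math.exp(diff)


if __name__ == "__main__":
    a = [1.0, 10.0, 100.0]
    b = [2.0, 2.0, 2.0]
    _, ratio = log_scale_comparison(a, b)
    print("back-transformed ratio:", ratio)
    print("geometric mean ratio:  ", geometric_mean(a) / geometric_mean(b))
    print("arithmetic mean ratio: ", mean(a) / mean(b))
```

So a t-test on the logs is testing whether that ratio is 1, which is a different hypothesis from equal arithmetic means whenever the two groups have different spread on the log scale.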

Once transformed, you can then assess whether the transformed data is normal by using histograms, QQ-plots and tests for normality. The t-test is particularly sensitive to deviations from normality in the form of skewness, and therefore a test for normality that is directed towards skew alternatives would be preferable. Pearson’s sample skewness $\dfrac{n^{-1}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(n^{-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{3/2}}$ is a suitable test statistic in this case.
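Pearson’s sample skewness is the third central moment divided by the 3/2 power of the second central moment, so it is short enough to program directly. A minimal sketch follows. The function names are made up, and the z-score uses the large-sample approximation that the skewness of normal data has variance of roughly 6/n, so it is a rough screen rather than an exact test.

```python
import math


def sample_skewness(xs):
    """Pearson's sample skewness: m3 / m2**1.5, using 1/n moments."""
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum((x - xbar) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - xbar) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def skewness_z_score(xs):
    """Approximate z-score for skewness under normality.

    Assumption: large n, where Var(skewness) is approximately 6 / n.
    """
    return sample_skewness(xs) * math.sqrt(len(xs) / 6.0)
```

Run it on the logged data. A z-score far from zero suggests the log transformation has not removed the skew.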

I have used the Box-Cox procedure for choosing an appropriate transformation, rather than just going to a logarithmic one. Ecological data are notoriously non-normal and other procedures like the bootstrap analogue of the t-test are often used. It does not require the assumption of normality and is a test about the untransformed means (and not about anything else).
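For anyone curious what a bootstrap analogue of the t-test looks like, here is a sketch of one common variant: shift both samples to a shared mean so the null hypothesis holds, resample with replacement, and compare a Welch-style t statistic against its bootstrap distribution. The names `welch_t` and `bootstrap_mean_test`, the default of 2000 resamples, and the fixed seed are all illustrative choices, not anything prescribed above.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Studentized difference of means with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Degenerate resample (all values identical in both groups).
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def bootstrap_mean_test(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap test of equal arithmetic means (a sketch).

    Makes no normality assumption and works on the untransformed data.
    """
    rng = random.Random(seed)
    t_obs = welch_t(a, b)
    # Shift each sample to the pooled mean so the null hypothesis is true.
    pooled = mean(list(a) + list(b))
    mean_a, mean_b = mean(a), mean(b)
    a0 = [x - mean_a + pooled for x in a]
    b0 = [x - mean_b + pooled for x in b]
    hits = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))  # resample with replacement
        rb = rng.choices(b0, k=len(b0))
        if abs(welch_t(ra, rb)) >= abs(t_obs):
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return t_obs, (hits + 1) / (n_boot + 1)
```

The returned p-value is the share of resampled statistics at least as extreme as the observed one. More resamples give a finer-grained p-value at the cost of run time.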

The formula may not render cleanly here, but you can find Pearson’s skewness test in most stat packages, or program it yourself.

I really should do a K-S test, and leave it at that. But I’ve actually gone through and decided to re-present the normal data, with a justification. We’ll see what the reviewers say.