# What should you do when data are not normally distributed and variances are not homogeneous?

If a measurement variable does not fit a normal distribution or has greatly different standard deviations in different groups, you should try a data transformation.

There are an infinite number of transformations you could use, but it is better to use a transformation that other researchers in your field commonly use, such as the square-root transformation for count data or the log transformation for size data.

Log transformation. This consists of taking the log of each observation. You can use either base-10 logs (LOG in a spreadsheet, LOG10 in SAS) or base-e logs, also known as natural logs (LN in a spreadsheet, LOG in SAS). It makes no difference to a statistical test whether you use base-10 logs or natural logs, because they differ only by a constant factor: the natural log of a number is 2.303… × its base-10 log (equivalently, the base-10 log is 0.4343… × the natural log). You should still specify which log you used when you write up the results, because the choice affects quantities such as the slope and intercept of a regression. I prefer base-10 logs, because you can look at them and see the magnitude of the original number: log(1) = 0, log(10) = 1, log(100) = 2, and so on.
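A minimal sketch of why the base does not matter, assuming Welch's two-sample t statistic as the test and two small made-up groups (the helper `welch_t` and the data are hypothetical, for illustration only). Switching from base-10 to natural logs multiplies every transformed value by ln(10) ≈ 2.303, which scales both the difference in means and the standard error by the same factor, so the statistic comes out the same.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (hypothetical helper; no p-value)."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Made-up measurements for two groups (illustrative values only).
group_a = [12.0, 15.5, 30.2, 8.7, 22.1]
group_b = [40.3, 55.0, 120.8, 33.9, 70.4]

# The same data transformed with base-10 logs and with natural logs.
t_log10 = welch_t([math.log10(x) for x in group_a],
                  [math.log10(x) for x in group_b])
t_ln = welch_t([math.log(x) for x in group_a],
               [math.log(x) for x in group_b])

# ln(x) = ln(10) * log10(x), and ln(10) is about 2.303.
print(round(math.log(10), 3))        # 2.303
print(math.isclose(t_log10, t_ln))   # True
```

The same reasoning applies to any test statistic that is unchanged when every observation is multiplied by the same positive constant.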

The back transformation is to raise 10 or e to the power of the number. If the mean of your base-10 log-transformed data is 1.43, the back-transformed mean is $10^{1.43} = 26.9$ (in a spreadsheet, "=10^1.43"). If the mean of your base-e log-transformed data is 3.65, the back-transformed mean is $e^{3.65} = 38.5$ (in a spreadsheet, "=EXP(3.65)"). If you have zeros or negative numbers, you can't take the log; you should add a constant to each number to make them all positive and non-zero. If you have count data and some of the counts are zero, the convention is to add 0.5 to each number.
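A minimal Python sketch of these back transformations, assuming made-up data and a hypothetical helper name. It reproduces the two worked numbers from the paragraph, and shows that back-transforming the mean of the base-10 logs gives the geometric mean of the original values rather than the arithmetic mean. For counts with zeros it adds 0.5 before taking the log, following the convention described there; subtracting the same 0.5 after raising 10 to the power undoes that shift.

```python
import math
import statistics

# The worked examples from the text.
print(round(10 ** 1.43, 1))      # 26.9
print(round(math.exp(3.65), 1))  # 38.5


def back_transformed_log10_mean(values):
    """Take base-10 logs, average them, and back-transform with 10**mean."""
    mean_log = statistics.mean([math.log10(v) for v in values])
    return 10 ** mean_log


sizes = [2.0, 8.0, 32.0]  # hypothetical positive measurements
# The back-transformed mean equals the geometric mean (about 8.0 here,
# up to floating-point rounding), not the arithmetic mean (14.0).
print(back_transformed_log10_mean(sizes))

# Hypothetical counts that include zeros: add 0.5 before taking logs.
counts = [0, 3, 7, 12, 0, 5]
logged = [math.log10(c + 0.5) for c in counts]
# Back-transform each value: raise 10 to the power, then remove the 0.5.
recovered = [10 ** x - 0.5 for x in logged]
```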

Square-root transformation. This consists of taking the square root of each observation. The back transformation is to square the number. If you have negative numbers, you can't take the square root; you should add a constant to each number to make them all positive.
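A minimal sketch of the square-root transformation and its back transformation, assuming a shift constant for data that contain negative numbers (the helper names, the constant of 2.0, and the data are all made up for illustration). Whatever constant is added before taking square roots has to be subtracted again after squaring. The sketch only rejects values that are still negative after the shift, since the square root of zero is defined.

```python
import math


def sqrt_transform(values, constant=0.0):
    """Add `constant` to each value, then take the square root."""
    shifted = [v + constant for v in values]
    if min(shifted) < 0:
        raise ValueError("constant is too small: some shifted values are negative")
    return [math.sqrt(v) for v in shifted]


def sqrt_back_transform(values, constant=0.0):
    """Square each value, then subtract the constant that was added."""
    return [v * v - constant for v in values]


data = [-2.0, 0.5, 3.0, 7.25]  # hypothetical data with a negative value
transformed = sqrt_transform(data, constant=2.0)
restored = sqrt_back_transform(transformed, constant=2.0)
print(transformed[0])  # 0.0, since -2.0 + 2.0 = 0
```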

People often use the square-root transformation when the variable is a count of something, such as bacterial colonies per petri dish, blood cells going through a capillary per minute, mutations per generation, etc.
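One common rationale for this, assuming the counts behave roughly like Poisson data (variance about equal to the mean), is that groups with larger counts also have larger standard deviations, and taking square roots pulls those standard deviations closer together. A minimal sketch with made-up colony counts for two hypothetical groups of petri dishes; `sd_ratio` is a hypothetical helper that compares the larger sample standard deviation with the smaller one.

```python
import math
import statistics

# Made-up colony counts per petri dish for two hypothetical groups.
low_counts = [2, 5, 3, 6, 4]
high_counts = [38, 52, 45, 61, 30]


def sd_ratio(a, b):
    """Ratio of the larger to the smaller sample standard deviation."""
    sd_a, sd_b = statistics.stdev(a), statistics.stdev(b)
    return max(sd_a, sd_b) / min(sd_a, sd_b)


ratio_raw = sd_ratio(low_counts, high_counts)
ratio_sqrt = sd_ratio([math.sqrt(c) for c in low_counts],
                      [math.sqrt(c) for c in high_counts])
print(f"SD ratio before: {ratio_raw:.1f}, after sqrt: {ratio_sqrt:.1f}")
```

With these illustrative numbers the ratio of standard deviations shrinks noticeably after the transformation, which is the kind of change the opening paragraph is after when groups have greatly different standard deviations.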

http://blog.sciencenet.cn/blog-1817482-1119094.html


GMT+8, 2020-8-12 03:35