Smicroorganism's personal blog http://blog.sciencenet.cn/u/Smicroorganism


What should you do when data do not satisfy homogeneity of variance and normality?

4320 views | 2018-6-15 10:57 | Personal category: Statistics | System category: Research notes

I found a guide to data transformations for biological data and took some notes. The original page is here: http://www.biostathandbook.com/transformation.html

If a measurement variable does not fit a normal distribution or has greatly different standard deviations in different groups, you should try a data transformation.

There are an infinite number of transformations you could use, but it is better to use a transformation that other researchers commonly use in your field, such as the square-root transformation for count data or the log transformation for size data.

Log transformation. This consists of taking the log of each observation. You can use either base-10 logs (LOG in a spreadsheet, LOG10 in SAS) or base-e logs, also known as natural logs (LN in a spreadsheet, LOG in SAS). It makes no difference for a statistical test whether you use base-10 logs or natural logs, because they differ by a constant factor; the natural log of a number is just 2.303… × the base-10 log of the number. You should specify which log you're using when you write up the results, as it will affect things like the slope and intercept in a regression. I prefer base-10 logs, because it's possible to look at them and see the magnitude of the original number: log(1)=0, log(10)=1, log(100)=2, etc.
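As a quick check of the "constant factor" point above, here is a minimal Python sketch. The measurements are hypothetical values made up for illustration, and the z-score comparison is just one way to show that a scale-invariant quantity does not depend on the base of the log.

```python
import math
import statistics

# Hypothetical positive measurements (illustration only, not from the handbook).
values = [1.0, 10.0, 100.0, 2.5, 38.5]

log10_values = [math.log10(v) for v in values]
ln_values = [math.log(v) for v in values]

# ln(x) = ln(10) * log10(x), and ln(10) is about 2.303.
factor = math.log(10)
rescaled = [factor * t for t in log10_values]


def z_scores(xs):
    """Standardize a list: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


# Standardized values are the same whichever base is used, which is why
# tests built on such quantities give the same answer for either log.
z10 = z_scores(log10_values)
ze = z_scores(ln_values)

print(round(factor, 3))
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(z10, ze)))
```

The slope and intercept of a regression on the transformed values would still differ by that factor, so the base should be reported.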

The back transformation is to raise 10 or e to the power of the number; if the mean of your base-10 log-transformed data is 1.43, the back transformed mean is 10^1.43 = 26.9 (in a spreadsheet, "=10^1.43"). If the mean of your base-e log-transformed data is 3.65, the back transformed mean is e^3.65 = 38.5 (in a spreadsheet, "=EXP(3.65)"). If you have zeros or negative numbers, you can't take the log; you should add a constant to each number to make them positive and non-zero. If you have count data, and some of the counts are zero, the convention is to add 0.5 to each number.
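The back transformation and the add-0.5 convention can be sketched in Python as follows. The function names and the counts are hypothetical. Subtracting the constant again after back-transforming is my own assumption; the handbook excerpt above only says to add it.

```python
import math


def log10_transform(values, constant=0.0):
    """Add a constant to every value, then take base-10 logs."""
    shifted = [v + constant for v in values]
    if min(shifted) <= 0:
        raise ValueError("all values must be positive after adding the constant")
    return [math.log10(s) for s in shifted]


def back_transformed_mean(log_values, constant=0.0):
    """Raise 10 to the mean of the logs; subtracting the constant is an assumption."""
    mean_log = sum(log_values) / len(log_values)
    return 10 ** mean_log - constant


# The two worked numbers from the text.
print(round(10 ** 1.43, 1))       # 26.9
print(round(math.exp(3.65), 1))   # 38.5

# Hypothetical counts containing zeros, so 0.5 is added before the log.
counts = [0, 3, 12, 7, 0, 25]
logged = log10_transform(counts, constant=0.5)
bt_mean = back_transformed_mean(logged, constant=0.5)
print(bt_mean, sum(counts) / len(counts))
```

The back-transformed mean is a geometric-type mean, so it will generally be smaller than the arithmetic mean of the raw counts.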

Square-root transformation. This consists of taking the square root of each observation. The back transformation is to square the number. If you have negative numbers, you can't take the square root; you should add a constant to each number to make them all positive.

People often use the square-root transformation when the variable is a count of something, such as bacterial colonies per petri dish, blood cells going through a capillary per minute, mutations per generation, etc.
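A minimal Python sketch of the square-root transformation on count data. The two groups of colony counts are hypothetical numbers chosen so the square roots are whole numbers; they are not from the handbook.

```python
import math
import statistics

# Hypothetical colony counts per dish in two groups (illustration only).
low_group = [1, 4, 4, 9, 1]
high_group = [81, 100, 121, 64, 100]


def sqrt_transform(values):
    """Take the square root of each observation (values must be >= 0)."""
    if min(values) < 0:
        raise ValueError("add a constant first so that no value is negative")
    return [math.sqrt(v) for v in values]


sqrt_low = sqrt_transform(low_group)
sqrt_high = sqrt_transform(high_group)

# Ratio of the group standard deviations before and after the transformation.
sd_ratio_raw = statistics.stdev(high_group) / statistics.stdev(low_group)
sd_ratio_sqrt = statistics.stdev(sqrt_high) / statistics.stdev(sqrt_low)
print(sd_ratio_raw, sd_ratio_sqrt)

# Back transformation: square the mean of the transformed values.
back_transformed_low = statistics.mean(sqrt_low) ** 2
print(back_transformed_low, statistics.mean(low_group))
```

In this made-up example the standard deviations of the two groups are much closer after the transformation, which is the situation where the transformed data are a better fit for tests that assume equal variances.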

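The transformations we use most often are the log and square-root transformations; I have seen the square-root transformation applied to microbial gene abundance data. I feel that graduate programs should add a course on research methodology and statistics, and that professors should design the curriculum carefully. The courses should form a coherent system; speaking as a PhD student, my own college does not yet do enough here. That said, PhD students should improve their ability to learn on their own and to solve problems, and there is more than one way to solve a problem.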



http://blog.sciencenet.cn/blog-1817482-1119094.html
