OLS: ordinary least squares
from scipy import stats
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import matplotlib.pyplot as plt
dat1 (the full dataset, read from 7.1.csv in the loading step further down)
index | Variety | rep | y |
---|---|---|---|
0 | A | b1 | 15.3 |
1 | B | b1 | 18.0 |
2 | C | b1 | 16.6 |
3 | D | b1 | 16.4 |
4 | E | b1 | 13.7 |
5 | F | b1 | 17.0 |
6 | A | b2 | 14.9 |
7 | B | b2 | 17.6 |
8 | C | b2 | 17.8 |
9 | D | b2 | 17.3 |
10 | E | b2 | 13.6 |
11 | F | b2 | 17.6 |
12 | A | b3 | 16.2 |
13 | B | b3 | 18.6 |
14 | C | b3 | 17.6 |
15 | D | b3 | 17.3 |
16 | E | b3 | 13.9 |
17 | F | b3 | 18.2 |
18 | A | b4 | 16.2 |
19 | B | b4 | 18.3 |
20 | C | b4 | 17.8 |
21 | D | b4 | 17.8 |
22 | E | b4 | 14.0 |
23 | F | b4 | 17.5 |
Data description
There are six varieties (A, B, C, D, E, F), each with yield data from four replicates (b1 to b4).
Variety: variety
rep: replicate
y: yield
dat1 = pd.read_csv("7.1.csv")
dat1.head()
index | Variety | rep | y |
---|---|---|---|
0 | A | b1 | 15.3 |
1 | B | b1 | 18.0 |
2 | C | b1 | 16.6 |
3 | D | b1 | 16.4 |
4 | E | b1 | 13.7 |
Here Variety is the factor under study, and the model is fitted by ordinary least squares (OLS).
model = ols('y ~ C(Variety)',dat1).fit()
anovat = anova_lm(model)
print(anovat)
term | df | sum_sq | mean_sq | F | PR(>F) |
---|---|---|---|---|---|
C(Variety) | 5.0 | 52.378333 | 10.475667 | 40.334118 | 3.662157e-09 |
Residual | 18.0 | 4.675000 | 0.259722 | NaN | NaN |
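The sums of squares, mean squares, and F value that anova_lm reports for this one-factor model can be reproduced by hand. The sketch below is not part of the original post: it uses plain Python and the 24 yield values transcribed from the dat1 table, and all variable names are illustrative.

```python
# Yield data transcribed from the dat1 table (4 replicates per variety)
yields = {
    "A": [15.3, 14.9, 16.2, 16.2],
    "B": [18.0, 17.6, 18.6, 18.3],
    "C": [16.6, 17.8, 17.6, 17.8],
    "D": [16.4, 17.3, 17.3, 17.8],
    "E": [13.7, 13.6, 13.9, 14.0],
    "F": [17.0, 17.6, 18.2, 17.5],
}

n_total = sum(len(ys) for ys in yields.values())
grand_mean = sum(sum(ys) for ys in yields.values()) / n_total

# Between-variety sum of squares: replicates * (variety mean - grand mean)^2
ss_variety = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in yields.values()
)
# Residual sum of squares: squared deviation of each plot from its variety mean
ss_residual = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in yields.values() for y in ys
)

df_variety = len(yields) - 1          # number of varieties minus 1
df_residual = n_total - len(yields)   # observations minus number of varieties

ms_variety = ss_variety / df_variety
ms_residual = ss_residual / df_residual
f_value = ms_variety / ms_residual    # F = MS(Variety) / MS(Residual)

print(df_variety, round(ss_variety, 6), round(ms_variety, 6), round(f_value, 4))
print(df_residual, round(ss_residual, 6), round(ms_residual, 6))
```

The printed values should agree with the df, sum_sq, mean_sq, and F columns of the anova_lm output for C(Variety) and Residual.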
The results show that the F-test for differences among varieties is highly significant (P = 3.66e-9).
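A significant F-test does not say which varieties differ from each other. The post imports pairwise_tukeyhsd but never calls it, so the following is only a possible follow-up step, not the author's analysis: a minimal sketch of Tukey's HSD on the same yield values, transcribed from the dat1 table. The variable names are illustrative.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Yield data transcribed from the dat1 table (4 replicates per variety)
yields = {
    "A": [15.3, 14.9, 16.2, 16.2],
    "B": [18.0, 17.6, 18.6, 18.3],
    "C": [16.6, 17.8, 17.6, 17.8],
    "D": [16.4, 17.3, 17.3, 17.8],
    "E": [13.7, 13.6, 13.9, 14.0],
    "F": [17.0, 17.6, 18.2, 17.5],
}

# Flatten into one array of responses and one array of group labels
values = np.array([y for ys in yields.values() for y in ys])
labels = np.array([name for name, ys in yields.items() for _ in ys])

# All pairwise comparisons among the six varieties at the 5% level
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```

With six varieties the summary lists 15 pairwise comparisons; the reject column marks the pairs whose mean difference is significant at the chosen alpha.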
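Two-factor ANOVA is an analysis of variance with two treatment factors. The data below have two factors, location (loc) and cultivar (cul), and the observed value is y.
dat2 = pd.read_csv("7.2.csv")
dat2.head()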
index | loc | cul | y |
---|---|---|---|
0 | Ann | BH93 | 4.460 |
1 | Ari | BH93 | 4.417 |
2 | Aug | BH93 | 4.669 |
3 | Cas | BH93 | 4.732 |
4 | Del | BH93 | 4.390 |
The model is y ~ loc + cul
formula = 'y ~ loc + cul'
anova_results = anova_lm(ols(formula, dat2).fit())
print(anova_results)
term | df | sum_sq | mean_sq | F | PR(>F) |
---|---|---|---|---|---|
loc | 17.0 | 22.671174 | 1.333598 | 9.087496 | 2.327448e-15 |
cul | 8.0 | 114.536224 | 14.317028 | 97.560054 | 1.611882e-52 |
Residual | 136.0 | 19.958126 | 0.146751 | NaN | NaN |
The results show that both the location (loc) and cultivar (cul) effects are highly significant.
Two-factor ANOVA with interaction uses the model y ~ A*B, which is equivalent to y ~ A + B + A:B
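The equivalence of the two formula spellings can be checked directly. The sketch below fits both forms and compares the resulting ANOVA tables. The data frame toy is hypothetical (a 2 x 2 layout with 2 replicates per cell, made up for illustration) and is not the contents of 7.3.csv.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical toy data: 2 levels of A x 2 levels of B, 2 replicates per cell
toy = pd.DataFrame({
    "A": ["A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2"],
    "B": ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"],
    "y": [27.0, 29.0, 30.0, 31.0, 26.0, 25.0, 33.0, 35.0],
})

# Shorthand: A*B expands to main effects plus the interaction
short_form = anova_lm(ols("y ~ C(A) * C(B)", toy).fit())
# Explicit: main effects and interaction written out
long_form = anova_lm(ols("y ~ C(A) + C(B) + C(A):C(B)", toy).fit())

same_terms = list(short_form.index) == list(long_form.index)
same_values = np.allclose(short_form["sum_sq"], long_form["sum_sq"])
print(same_terms, same_values)
```

Both fits contain the same terms in the same order, so either spelling can be used; the explicit form makes it easier to drop the interaction later.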
dat3 = pd.read_csv("7.3.csv")
dat3.head()
index | A | B | y |
---|---|---|---|
0 | A1 | b1 | 27 |
1 | A1 | b1 | 29 |
2 | A1 | b1 | 26 |
3 | A1 | b1 | 26 |
4 | A2 | b1 | 30 |
formula = 'y ~ C(A) + C(B) + C(A):C(B)'
anova_results = anova_lm(ols(formula, dat3).fit())
print(anova_results)
term | df | sum_sq | mean_sq | F | PR(>F) |
---|---|---|---|---|---|
C(A) | 2.0 | 315.833333 | 157.916667 | 129.204545 | 2.247182e-19 |
C(B) | 4.0 | 207.166667 | 51.791667 | 42.375000 | 1.032420e-14 |
C(A):C(B) | 8.0 | 50.333333 | 6.291667 | 5.147727 | 1.375790e-04 |
Residual | 45.0 | 55.000000 | 1.222222 | NaN | NaN |
The results show that factor A, factor B, and the A:B interaction are all highly significant.