Pipeline(): can be thought of as a wrapper that packages several functions (processing steps) into a single estimator;
# Chain standardization, PCA, and logistic regression into a single estimator
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe_lr = Pipeline([('scl', StandardScaler()),
                    ('pca', PCA(n_components=2)),
                    ('clf', LogisticRegression(random_state=1))])
pipe_lr.fit(X_train, y_train)   # X_train/y_train come from a prior train/test split
print('Test Accuracy: %.3f' % pipe_lr.score(X_test, y_test))
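The snippet above assumes that X_train, X_test, y_train and y_test already exist. A minimal setup sketch (my assumption; the book works with the Breast Cancer Wisconsin data, here loaded via scikit-learn's built-in copy rather than the UCI download) could look like this:

# Assumed setup sketch: produce the X_train/X_test/y_train/y_test used above
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # 569 samples, 30 features, binary labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)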
Pipeline | holdout cross-validation | k-fold cross-validation
Cross-validation:
Holdout cross-validation:
Three datasets are used: a training set, a validation set, and a test set;
The training and validation sets are used for parameter tuning (searching for the best hyperparameter values);
The test set is used to estimate the generalization error;
Drawback: the performance estimate is sensitive to how the data are partitioned into training and validation sets, and it also depends on which samples fall into each subset (see the sketch below);
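To make the three-way holdout split concrete, here is a small sketch (assumed variable names; train_test_split is simply applied twice):

# Sketch of a three-way holdout split (illustrative only)
from sklearn.model_selection import train_test_split

# first hold out the test set, then split the remainder into training/validation
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=1)
# X_tr/y_tr: fit candidate models; X_val/y_val: tune hyperparameters;
# X_test/y_test: used once at the end to estimate the generalization error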
k-fold cross-validation:
Randomly partition the training data into k folds without replacement; use k-1 folds for training and the remaining fold for testing;
Repeat this process k times to obtain k models and k corresponding performance estimates;
Average the k estimates to obtain the model's mean performance;
Leave-one-out cross-validation (LOO): a special case of k-fold cross-validation where k equals the number of training samples (a minimal sketch follows the k-fold code below);
Stratified k-fold cross-validation: an improvement over standard k-fold that preserves the class proportions in each fold;
#***********************************************************************************
# k-fold cross-validation with scikit-learn's cross_val_score
# (sklearn.cross_validation has been removed; use sklearn.model_selection instead)
import numpy as np
from sklearn.model_selection import cross_val_score

scores = cross_val_score(estimator=pipe_lr, X=X_train, y=y_train, cv=10, n_jobs=1)
print('CV accuracy scores: %s' % scores)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
#***********************************************************************************
# Stratified k-fold cross-validation with the pipeline
from sklearn.model_selection import StratifiedKFold

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = []
for k, (train, test) in enumerate(kfold.split(X_train, y_train)):
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)
    print('Fold: %s, Class dist.: %s, Acc: %.3f'
          % (k + 1, np.bincount(y_train[train]), score))
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
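For the LOO special case mentioned above, a minimal sketch (my assumption: scikit-learn's LeaveOneOut splitter passed to cross_val_score; note that it is expensive on large datasets):

# Sketch: leave-one-out cross-validation (k equals the number of training samples)
from sklearn.model_selection import LeaveOneOut, cross_val_score

loo_scores = cross_val_score(estimator=pipe_lr, X=X_train, y=y_train,
                             cv=LeaveOneOut(), n_jobs=1)
print('LOO accuracy: %.3f' % np.mean(loo_scores))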
# Reference: "Python Machine Learning" by Sebastian Raschka, China Machine Press.