# To Lend or Not to Lend: How Can Python and Machine Learning Help You Decide?

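• short_emp: short-term employment (employed for less than one year)
• emp_length_num: number of years employed
• home_ownership: housing status (own, mortgage, rent)
• dti: debt-to-income ratio (loan burden relative to income)
• purpose: purpose of the loan
• term: loan term
• last_delinq_none: whether the applicant has a record of delinquency
• last_major_derog_none: whether the applicant has a record of payments more than 90 days overdue
• revol_util: revolving utilization, i.e. the share of the credit limit currently drawn
• total_rec_late_fee: total late fees
• safe_loans: whether the loan is safe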

"Lao Zhang, have you eaten yet?"

……

```shell
# PIL is installed through the Pillow package; "pip install PIL" does not work
pip install -U pillow
```

```shell
jupyter notebook
```

Jupyter Notebook is now running correctly, so we can start writing the actual code.

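```python
import pandas as pd

# Load the loan data into df. The loading step is not shown in this text,
# so "loans.csv" is a hypothetical placeholder file name.
df = pd.read_csv("loans.csv")
```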

```python
df.shape
```

```
(46508, 13)
```

```python
# Features: every column except the target
X = df.drop('safe_loans', axis=1)
# Target: whether the loan is safe
y = df.safe_loans
```

```python
X.shape
```

```
(46508, 12)
```

```python
y.shape
```

```
(46508,)
```

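```python
from collections import defaultdict

from sklearn.preprocessing import LabelEncoder

# One LabelEncoder per column, created on first access and keyed by column name
d = defaultdict(LabelEncoder)

# Encode every column of X as integer codes, fitting a separate encoder for each column
X_trans = X.apply(lambda x: d[x.name].fit_transform(x))
X_trans.head()
```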
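To make the encoding step concrete, here is a minimal pure-Python sketch of what each per-column encoder does. `SimpleLabelEncoder` is a hypothetical stand-in, not sklearn's class; it assumes, as `LabelEncoder` does, that the sorted distinct values are numbered 0 to n-1. The toy column values are made up for illustration.

```python
from collections import defaultdict


class SimpleLabelEncoder:
    """Hypothetical stand-in mimicking LabelEncoder: sorted distinct values -> 0..n-1."""

    def fit_transform(self, values):
        # Learn the sorted set of distinct values as the classes
        self.classes_ = sorted(set(values))
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return [self._index[v] for v in values]

    def inverse_transform(self, codes):
        # Map integer codes back to the original labels
        return [self.classes_[c] for c in codes]


# One encoder per column, created lazily the first time a column name is looked up
encoders = defaultdict(SimpleLabelEncoder)

# Hypothetical toy columns, not taken from the loan dataset
columns = {
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
    "term": ["36 months", "60 months", "36 months", "36 months"],
}
encoded = {name: encoders[name].fit_transform(vals) for name, vals in columns.items()}

print(encoded["home_ownership"])  # [2, 1, 0, 2]
print(encoders["home_ownership"].inverse_transform([0, 2]))  # ['MORTGAGE', 'RENT']
```

Because the codes follow sorted order, numeric columns such as `dti` keep their ordering after encoding, so tree splits on them stay meaningful; for text columns such as `purpose` the alphabetical numbering is arbitrary. Keeping the fitted encoders in `d` also lets you decode values later with `d[column].inverse_transform(...)`.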

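```python
# sklearn.cross_validation has been removed; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# With the default test_size=0.25, a quarter of the records is held out for testing;
# random_state=1 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X_trans, y, random_state=1)
```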

```python
X_train.shape
```

```
(34881, 12)
```

```python
X_test.shape
```

```
(11627, 12)
```
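The shapes above follow from `train_test_split`'s default `test_size=0.25` when no `train_size` is given. Here is a small sketch of the arithmetic, assuming the test count is rounded up and the training set receives the remainder; `split_sizes` is a hypothetical helper, not part of sklearn.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Hypothetical helper: size a float test_size split (test rounded up, train gets the rest)."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(46508))  # (34881, 11627)
```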

```python
from sklearn import tree

# Limit the tree to 3 levels so it stays small and easy to read
clf = tree.DecisionTreeClassifier(max_depth=3)
clf = clf.fit(X_train, y_train)
```
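`DecisionTreeClassifier` measures node purity with the Gini impurity by default (`criterion="gini"`), which is also the `gini` value printed in each node of the exported tree. The sketch below shows, in simplified pure Python, how one split on a single numeric feature can be chosen: try the midpoints between consecutive distinct values and keep the one with the lowest weighted child impurity. It illustrates the CART idea and is not sklearn's actual implementation; `gini`, `best_threshold` and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(xs, ys):
    """Return (threshold, weighted child impurity) of the best split on one feature."""
    values = sorted(set(xs))
    best = (None, gini(ys))  # no split unless some threshold lowers the impurity
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical toy feature values and labels
print(best_threshold([1, 2, 3, 4], ["safe", "safe", "not safe", "safe"]))  # (2.5, 0.25)
```

A full tree repeats this search over every feature at every node and recurses into the two children; `max_depth=3` stops that recursion after three levels.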

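```python
from subprocess import check_call

from IPython.display import Image as PImage
from PIL import Image, ImageDraw
from sklearn import tree

# Export the fitted tree in Graphviz .dot format
with open("safe-loans.dot", "w") as dot_file:
    tree.export_graphviz(
        clf,
        out_file=dot_file,
        max_depth=3,
        impurity=True,                      # show the Gini impurity of each node
        feature_names=list(X_train),        # column names of the training features
        class_names=['not safe', 'safe'],   # labels for the sorted class values
        rounded=True,
        filled=True,
    )

# Render the .dot file to PNG; this requires the Graphviz "dot" executable on the PATH
check_call(['dot', '-Tpng', 'safe-loans.dot', '-o', 'safe-loans.png'])

# Open the image with Pillow and save a copy (draw could be used to annotate it first)
img = Image.open("safe-loans.png")
draw = ImageDraw.Draw(img)
img.save('output.png')

# Display the image inline in Jupyter
PImage("output.png")
```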

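```python
# Take the test record at position 1; double brackets keep a one-row DataFrame with column names
test_rec = X_test.iloc[[1]]
clf.predict(test_rec)
```

```
array([1])
```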

```python
y_test.iloc[1]
```

```
1
```
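Calling `predict` on a record walks the fitted tree from the root down to a leaf. The sketch below imitates that walk on a tiny hand-built tree. The tree, its thresholds and its leaf labels are hypothetical (only the feature names `term` and `dti` come from the field list), and the record is assumed to hold already-encoded integer values. As in sklearn's trees, a record goes left when its value is less than or equal to the node's threshold.

```python
def predict_one(node, record):
    """Walk from the root to a leaf: left if record[feature] <= threshold, else right."""
    while "leaf" not in node:
        branch = "left" if record[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


# Hypothetical depth-2 tree over label-encoded features
toy_tree = {
    "feature": "term", "threshold": 0.5,
    "left": {
        "feature": "dti", "threshold": 10.5,
        "left": {"leaf": "safe"},
        "right": {"leaf": "not safe"},
    },
    "right": {"leaf": "not safe"},
}

print(predict_one(toy_tree, {"term": 0, "dti": 3}))  # safe
print(predict_one(toy_tree, {"term": 1, "dti": 3}))  # not safe
```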

```python
from sklearn.metrics import accuracy_score

# Share of test records whose predicted label matches the true label
accuracy_score(y_test, clf.predict(X_test))
```

```
0.61615205986066912
```
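An accuracy of about 0.62 is easier to judge next to a trivial baseline. The sketch below computes accuracy by hand (the fraction of matching labels, which is what `accuracy_score` returns by default) and the accuracy of always predicting the most common training label. The helper names and toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))


# Hypothetical toy labels
y_true = ["safe", "not safe", "not safe", "safe", "not safe"]
y_pred = ["safe", "not safe", "safe", "safe", "not safe"]
print(accuracy(y_true, y_pred))  # 0.8
print(majority_baseline(["safe", "safe", "not safe"], y_true))  # 0.4
```

With the real data, `majority_baseline(list(y_train), list(y_test))` would show how much the depth-3 tree improves on simply guessing the majority class.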

http://blog.sciencenet.cn/blog-377709-1063178.html


GMT+8, 2018-5-27 03:17