(I) Writing file paths
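filepath="C:\\Users\\python_study\\test.csv" or 'C:\\Users\\python_study\\test.csv' (single and double quotes both work; every backslash must be escaped as \\)
or filepath=r"C:\Users\Desktop\python_study\test.csv" (raw string: backslashes are taken literally)
or filepath="C:/Users/Desktop/python_study/test.csv" (forward slashes, which Windows also accepts)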
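A minimal sketch (standard library only) checking the three spellings above. The escaped and raw literals build the identical string; the forward-slash version is a different string that names the same Windows path. pathlib.PureWindowsPath is used here only to normalise separators for comparison; it never touches the disk, so this runs on any OS. Note that a plain "C:\Users\..." without the r prefix is a SyntaxError, because \U starts a Unicode escape.

```python
from pathlib import PureWindowsPath

# Three ways to spell the same Windows path in Python source code
p_escaped = "C:\\Users\\Desktop\\python_study\\test.csv"  # each backslash escaped
p_raw = r"C:\Users\Desktop\python_study\test.csv"        # raw string, no escaping needed
p_slash = "C:/Users/Desktop/python_study/test.csv"       # forward slashes

print(p_escaped == p_raw)    # True: both literals build the same string
print(p_escaped == p_slash)  # False: the characters differ...

# ...but as Windows paths they are equal, since "/" and "\" are both separators there
print(PureWindowsPath(p_escaped) == PureWindowsPath(p_slash))  # True
```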
(II) pandas functions
(1)pd.read_csv('data.csv',encoding = "utf-8",header = 0,names = range(0,50),index_col=0,keep_default_na=False)
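header=0 takes the first row of the file as the column labels. It is effectively the default: header defaults to 'infer', which behaves like header=0 when names is not given, so it can usually be omitted. header=None states that the file has no header row; read_csv then numbers the columns automatically (0, 1, 2, ...) unless you supply the column names yourself.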
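The difference between header=0 and header=None shows up clearly on a tiny in-memory CSV. In this sketch io.StringIO stands in for a real file, and the data is made up for illustration:

```python
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"  # hypothetical two-column file with a header row

# header=0: the first line becomes the column labels
with_header = pd.read_csv(io.StringIO(csv_text), header=0)
print(with_header.columns.tolist())  # ['a', 'b']
print(len(with_header))              # 2 data rows

# header=None: every line is data and the columns are numbered automatically
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.columns.tolist())    # [0, 1]
print(len(no_header))                # 3 rows: the old header line is now row 0
```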
encoding="utf-8" decodes the file as UTF-8.
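The encoding argument matters when a file was not saved as UTF-8. As an assumed scenario (CSV files exported on Chinese-locale Windows are often GBK), this sketch builds GBK-encoded bytes in memory. Reading them with encoding="gbk" works, while encoding="utf-8" raises a UnicodeDecodeError:

```python
import io

import pandas as pd

# Hypothetical file content with Chinese column names, saved in GBK
raw_bytes = "城市,人口\n北京,2189\n".encode("gbk")

df = pd.read_csv(io.BytesIO(raw_bytes), encoding="gbk")
print(df.columns.tolist())  # ['城市', '人口']

# The GBK byte sequences for these characters are not valid UTF-8, so this fails
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)  # True
```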
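names=range(0,50) uses 0, 1, ..., 49 as the column names. Combined with header=0, the file's original first row is read as a header and then replaced by these names.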
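Here is a sketch of how names interacts with header, using a made-up three-column CSV. In it, list(range(0, 3)) plays the role of range(0, 50) for a file that has only three columns:

```python
import io

import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6\n"  # hypothetical file with a header row

# header=0 + names: the header row "a,b,c" is read, then replaced by the given names
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=list(range(0, 3)))
print(replaced.columns.tolist())  # [0, 1, 2]
print(len(replaced))              # 2

# header=None + names: nothing is discarded, so "a,b,c" becomes a data row
kept = pd.read_csv(io.StringIO(csv_text), header=None, names=list(range(0, 3)))
print(len(kept))                  # 3
```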
index_col=0 uses the first column of the data (position 0) as the row index instead of the default 0, 1, 2, ... index.
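This minimal sketch, built on made-up data, shows what index_col=0 changes. The first column moves out of the data and becomes the row labels, so rows can then be looked up by it with .loc:

```python
import io

import pandas as pd

csv_text = "user_id,score\n1001,90\n1002,85\n"  # hypothetical data

df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.index.tolist())      # [0, 1]: default integer index

df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.index.name)          # user_id
print(df_indexed.columns.tolist())    # ['score']
print(df_indexed.loc[1002, "score"])  # 85
```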
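keep_default_na=False stops pandas from converting its default missing-value markers (strings such as 'NA', 'NaN' and 'null') to NaN. In this dataset missing entries are written as the text null, so they are kept as the string 'null'.
Without keep_default_na=False, those entries load as NaN. Because NaN is a float, a column such as coupon_id (values like 11002) that would otherwise hold integers becomes float. To test for NaN you can use off_train.date != off_train.date: NaN is the only value that is not equal to itself, so True means NaN and False means a non-missing value (off_train.date.isna() is the more readable equivalent).
With keep_default_na=False, coupon_id and similar fields are read with dtype object, which you can simply think of as strings, and missing entries stay as 'null'. You can then test for them with off_train.date == 'null'.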
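Both behaviours can be compared side by side on a small made-up sample that writes missing values as the literal text null, like the coupon data. This is only a sketch. Note that newer pandas versions may report a dedicated string dtype rather than object for the text columns:

```python
import io

import pandas as pd

# Hypothetical sample in the style of the coupon data
csv_text = (
    "user_id,coupon_id,date\n"
    "1439408,null,20160217\n"
    "1439408,11002,null\n"
)

# Default: "null" is a default NA marker, so it becomes NaN and forces float columns
default = pd.read_csv(io.StringIO(csv_text))
print(default["coupon_id"].dtype)                     # float64
print((default["date"] != default["date"]).tolist())  # [False, True]: NaN != NaN
print(default["date"].isna().tolist())                # [False, True]: same test, clearer

# keep_default_na=False: values stay as strings ("11002", "null")
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(raw["coupon_id"].tolist())                      # ['null', '11002']
print((raw["date"] == "null").tolist())               # [False, True]
```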
More details: https://www.jianshu.com/p/9c12fb248ccc
Example code:
import pandas as pd
import numpy as np
from datetime import date
import datetime as dt
import os
# Source data directory
DataPath = r'D:\Desktop\XGBoost\Data\data_origin'
# Directory for the preprocessed data
FeaturePath = r'D:\Desktop\XGBoost\Data\data_preprocessed'
off_train = pd.read_csv(os.path.join(DataPath,'ccf_offline_stage1_train.csv'),header=0,keep_default_na=False)
off_train.columns=['user_id','merchant_id','coupon_id','discount_rate','distance','date_received','date']
off_test = pd.read_csv(os.path.join(DataPath,'ccf_offline_stage1_test_revised.csv'),header=0,keep_default_na=False)
off_test.columns = ['user_id','merchant_id','coupon_id','discount_rate','distance','date_received']
print(off_train.info())
print(off_train.head(5)) # print the first five rows and compare them with the raw file to check the data was read correctly; this check is important!
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1754884 entries, 0 to 1754883
Data columns (total 7 columns):
user_id int64
merchant_id int64
coupon_id object
discount_rate object
distance object
date_received object
date object
dtypes: int64(2), object(5)
memory usage: 93.7+ MB
None
user_id merchant_id coupon_id discount_rate distance date_received \
0 1439408 2632 null null 0 null
1 1439408 4663 11002 150:20 1 20160528
2 1439408 2632 8591 20:1 0 20160217
3 1439408 2632 1078 20:1 0 20160319
4 1439408 2632 8591 20:1 0 20160613
date
0 20160217
1 null
2 null
3 null
4 null
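If the competition files are not at hand, the same loading steps can be tried on an in-memory copy of the five rows printed above. The sample's header line is a hypothetical placeholder, because the original file's header is not shown here; it is discarded anyway when the columns are renamed. In this five-row sample, distance holds only digits, so pandas parses it as int64. In the full file it is object, presumably because some rows there are not numeric.

```python
import io

import pandas as pd

# The five rows printed above; the header line is a hypothetical placeholder
sample = (
    "h1,h2,h3,h4,h5,h6,h7\n"
    "1439408,2632,null,null,0,null,20160217\n"
    "1439408,4663,11002,150:20,1,20160528,null\n"
    "1439408,2632,8591,20:1,0,20160217,null\n"
    "1439408,2632,1078,20:1,0,20160319,null\n"
    "1439408,2632,8591,20:1,0,20160613,null\n"
)

off_train = pd.read_csv(io.StringIO(sample), header=0, keep_default_na=False)
off_train.columns = ["user_id", "merchant_id", "coupon_id", "discount_rate",
                     "distance", "date_received", "date"]

print(off_train.dtypes)                  # user_id, merchant_id, distance: int64; the rest hold strings
print(off_train.head(5))                 # compare against the raw rows
print((off_train.date == "null").sum())  # 4 rows whose date is 'null'
```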
Sharing bit by bit, for the benefit of us all! Add oil!