|
模拟数据:
df = pd.DataFrame({'region':['北','北','北','南','南','北','北'], 'province':['山东','陕西','辽宁','浙江','浙江','山东','陕西'], 'gdp-p': [8.74,3.28,2.9,7.77,7.77,8.74,3.28], 'city': ['济南','西安','沈阳','杭州','宁波','青岛','咸阳'], 'gdp-c': [1.2,1.15,0.77,1.88,1.57,1.49,0.28]}) |
1.转换为列表
目的:生成北方5万亿级省份列表 p_north_5_club = df[df['5_club-p-north'] == 'Y']['province'].tolist() |
2.转换为字典
2-1 目的:生成城市和其对应gdp的字典 city_gdp = {} city_gdp.update(dict(zip(df['city'], df['gdp-c']))) |
2-2 目的:生成城市和其对应gdp的字典 city_gdp = df.set_index('city')['gdp-c'].to_dict() |
3.删除某些列
目的:删除region和5_club-p-north列 df.drop(columns=['region','5_club-p-north'], inplace=True) 说明 :Inplace:直接在原来df上修改 |
4.列名重命名
目的:将region重命名区域,city重命名市 columnName_new = {'region':'区域', 'city':'市'} df.rename(columns=columnName_new,inplace=True) |
5.处理某一列
目的:将gdp-p小数点后保留一位 def decimal_point(value): return round(value,1) df['gdp-p'] = df['gdp-p'].apply(decimal_point) |
6.增加新列
6-1 df.loc[:,'新列名'] = df.apply(lambda x:func(x),axis=1) 目的:新增一列trillionClub-c<城市万亿级俱乐部> def trillionClub(df): result = '' if df['gdp-c'] >= 1: result = 'Yes' else: result = 'No' return result df.loc[:,'trillionClub-c'] = df.apply(lambda x:trillionClub(x),axis=1) |
6-2 目的:新增一列5_club-p-north<北方省份5万亿俱乐部> def merge_dfColumns(x,y): result = '' if x == '北' and y >=5: result = 'Y' else: result = 'N' return result df['5_club-p-north']=list(map(lambda x,y: merge_dfColumns(x,y), df['region'], df['gdp-p'])) |
7.排序
.sort_values() 目的:按照region分组,各省市按照gdp降序排序 df = df.sort_values(by=['region','gdp-p','province','gdp-c'],ascending= [True,False,True,False]) 说明 :ascending: False降序;True升序 |
8.组内筛选并排序
目的:万亿级俱乐部城市,在省内排名<从1开始>;非万亿级俱乐部城市排名为0。列名为1_club-c_order def group_order_city(df,city_groupOrder): # groupOrder = '' if df['city'] in city_groupOrder.keys(): groupOrder = city_groupOrder[df['city']] else: groupOrder = '0' return groupOrder city_groupOrder = {} df_GroupOrder = df.groupby(['province','1_club-c']) for name,group in df_GroupOrder: if name[1] == 'Yes': group.index = group.reset_index(drop=True).index+1 # index从1开始 group.reset_index(inplace=True) # 将index为一列 city_groupOrder.update(dict(zip(group['city'], group['index']))) else: pass df.loc[:,'1_club-c_order'] = df.apply(lambda x:group_order_city(x,city_groupOrder),axis=1) |
9.将结果写入一个xlsx的不同sheet
writer = pd.ExcelWriter(‘test.xlsx’) df1.to_excel(writer, sheet_name='df1',index=False) df2.to_excel(writer, sheet_name='df2',index=False) writer.close() 说明 : index=False : 索引号不写入 |
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-12-27 05:24
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社