read_csv(filepath, usecols)
e.g., subnames_no_MPR = pd.read_csv(filepath, delimiter=',', usecols=['Subject ID', 'Age'])
usecols: Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0].
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
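A minimal sketch of the usecols behaviour described above. The CSV text, the extra 'Sex' column, and the variable names are made up for illustration; an in-memory string stands in for the real file at filepath.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file at `filepath`.
csv_text = "Subject ID,Age,Sex\nS01,34,F\nS02,41,M\n"

# Select columns by name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['Subject ID', 'Age'])

# Select the same columns by position, listed in reverse order.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 0])

# Element order in usecols is ignored, so both frames keep the file's column order.
print(by_name.columns.tolist())
print(by_position.columns.tolist())
```

Because the order of usecols is ignored, reordering columns has to be done after reading, for example by indexing the resulting DataFrame with the desired list of column names.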
DataFrame.drop_duplicates(subset, keep)
e.g., subs_MPR = subnames_MPR.drop_duplicates(subset=['Subject ID', 'Age'], keep='first')
subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
keep : {‘first’, ‘last’, False}, default ‘first’
first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
http://pandas.pydata.org/pandas-docs/version/0.17/generated/pandas.DataFrame.drop_duplicates.html
https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas
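The three keep options can be compared side by side on a small table. This is only a sketch: the DataFrame contents and the 'Scan' column are invented for illustration and are not taken from the original data.

```python
import pandas as pd

# Hypothetical data: S01 and S03 each appear twice with the same age.
subnames = pd.DataFrame({
    'Subject ID': ['S01', 'S01', 'S02', 'S03', 'S03'],
    'Age': [34, 34, 41, 29, 29],
    'Scan': ['a', 'b', 'c', 'd', 'e'],
})

# Only these columns are compared when looking for duplicates.
cols = ['Subject ID', 'Age']

keep_first = subnames.drop_duplicates(subset=cols, keep='first')
keep_last = subnames.drop_duplicates(subset=cols, keep='last')
keep_none = subnames.drop_duplicates(subset=cols, keep=False)

print(keep_first['Scan'].tolist())  # scans kept with keep='first'
print(keep_last['Scan'].tolist())   # scans kept with keep='last'
print(keep_none['Scan'].tolist())   # scans kept with keep=False
```

Note that drop_duplicates returns a new DataFrame and keeps the original row labels, so a call to reset_index(drop=True) may be useful afterwards if a clean 0..n-1 index is wanted.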