Computing GC Content
Problem
The GC content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC content.
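The definition above can be checked with a few lines of Python. This is a minimal sketch; the helper names `gc_content` and `reverse_complement` are chosen here for illustration and are not part of the Rosalind problem statement.

```python
def gc_content(dna):
    """Return the percentage of symbols in dna that are 'G' or 'C'."""
    if not dna:
        raise ValueError("GC content is undefined for an empty string")
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def reverse_complement(dna):
    """Swap A<->T and C<->G, then read the string backwards."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing turns every 'G' into a 'C' and vice versa, so the number of G/C symbols does not change, and reversing does not change it either. That is why both calls print the same value.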
DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with '>' indicates the label of the next string.
In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
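Because one DNA string may span several lines, a reader has to join every line that follows a '>' label until the next label appears. The following is a minimal sketch of that idea; the function name `parse_fasta` and the toy input are assumptions made for illustration, not Rosalind data.

```python
def parse_fasta(text):
    """Map each label (without the leading '>') to its full DNA string."""
    records = {}
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):
            label = line[1:]          # a new record starts here
            records[label] = ""
        elif label is not None:
            records[label] += line    # continuation of the current string
    return records


# Toy input (hypothetical, not taken from the Rosalind dataset).
toy = ">Rosalind_0001\nACGT\nGGCC\n>Rosalind_0002\nTTAA\n"
print(parse_fasta(toy))
# {'Rosalind_0001': 'ACGTGGCC', 'Rosalind_0002': 'TTAA'}
```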
Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC content, followed by the GC content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
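One way to read the 0.001 allowance is as an absolute-error check: an answer is accepted if it differs from the expected value by at most 0.001. The sketch below assumes that reading; `within_tolerance` is a hypothetical helper name, and Rosalind's actual checker may work differently.

```python
import math


def within_tolerance(answer, expected, tol=0.001):
    """True if answer differs from expected by at most tol (absolute error)."""
    return math.isclose(answer, expected, rel_tol=0.0, abs_tol=tol)


print(within_tolerance(60.92, 60.919540))  # True
print(within_tolerance(60.9, 60.919540))   # False
```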
Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
Sample Output
Rosalind_0808
60.919540
I solved the problem above with the following code:
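#!/usr/bin/env python3
# Read the FASTA records, joining sequences that span several lines.
seqs = {}
seq_id = None
with open("data.txt") as f:
    for line in f:
        line = line.strip()
        if line.startswith(">"):
            seq_id = line[1:]  # drop the leading '>'
            seqs[seq_id] = ""
        elif seq_id is not None:
            seqs[seq_id] += line

# Compute the GC content of each sequence as a percentage.
gc = {}
for seq_id, seq in seqs.items():
    count = 0  # reset the counter for each sequence
    for base in seq:
        if base == "G" or base == "C":
            count += 1
    gc[seq_id] = 100.0 * count / len(seq)

# Sort the dictionary by value with a lambda (anonymous function): the
# parameter comes before the colon and the returned value after it.
# sorted() receives the list of key-value pairs from gc.items(), and key
# selects what to compare. The default reverse=False sorts in ascending
# order, so the highest GC content is the last entry.
ranked = sorted(gc.items(), key=lambda item: item[1])
print(ranked[-1][0])
print("%.6f" % ranked[-1][1])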
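The script above sorts every entry and then takes the last one. Since only the largest value is needed, `max()` with a `key` function is a possible shortcut. The sketch below is an alternative, not the method used above; it checks the idea against the sample dataset, with each record's lines already joined, and `gc_percent` is a helper name chosen here for illustration.

```python
# Sequences from the sample dataset, with each record's lines joined.
sample = {
    "Rosalind_6404": "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC"
                     "TCCCACTAATAATTCTGAGG",
    "Rosalind_5959": "CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT"
                     "ATATCCATTTGTCAGCAGACACGC",
    "Rosalind_0808": "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC"
                     "TGGGAACCTGCGGGCAGTAGGTGGAAT",
}


def gc_percent(seq):
    """GC content of seq as a percentage."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# max() walks the IDs once and keeps the one whose sequence scores highest.
best_id = max(sample, key=lambda seq_id: gc_percent(sample[seq_id]))
best_gc = gc_percent(sample[best_id])
print(best_id)           # Rosalind_0808
print("%.6f" % best_gc)  # 60.919540
```

This prints the same two lines as the Sample Output.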