TickingClock's personal blog http://blog.sciencenet.cn/u/TickingClock


Rosalind 15 - Finding a Motif in DNA

4923 views | 2017-11-10 08:56 | Category: Python Learning | System category: Research Notes

Bioinformatics Stronghold - SUBS: Finding a Motif in DNA


Combing Through the Haystack


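Identifying DNA segments shared between two species is highly informative, because such a shared segment may serve a similar function in both organisms.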


A motif is defined as such a commonly shared DNA segment. A common task in molecular biology is to search an organism's genome for a known motif.


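Genomes, however, are full of DNA segments that occur many times, called repeats, which makes identifying a target motif harder. These repeats occur far more often than random chance would predict, which indicates that genomes are not random.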


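The most common repeat in the human genome is the Alu repeat. Each copy is about 300 bp long, and it recurs about a million times in every human genome. However, no evidence has yet shown that Alu repeats benefit humans; they appear instead to be parasitic, and when a new Alu repeat is inserted into a genome, it frequently causes genetic disorders.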

The human chromosomes stained with a probe for Alu elements, shown in green.



Problem


Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).


The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
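Python counts indices from 0, so a Rosalind position is the Python index plus one. A minimal sketch of that convention, using the example string above (the helper name `positions` is made up for illustration and is not part of the problem statement):

```python
def positions(s, symbol):
    """Return the 1-based positions of every occurrence of symbol in s."""
    # enumerate(..., start=1) counts from 1, matching Rosalind's convention
    return [i for i, ch in enumerate(s, start=1) if ch == symbol]


print(positions("AUGCUUCAGAAAGGUCUUACG", "U"))  # [2, 5, 6, 15, 17, 18]
```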


A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2 : 5] = "UGCU".
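Note that this notation is 1-based and includes both end positions, whereas a Python slice is 0-based and excludes its end index. A small sketch of the conversion, assuming we wrap it in a helper (the name `rosalind_substring` is hypothetical):

```python
def rosalind_substring(s, j, k):
    """Return s[j:k] in Rosalind's notation (1-based, both ends inclusive)."""
    # Only the start needs shifting: Rosalind s[j:k] == Python s[j-1:k],
    # because Python's exclusive end cancels the 1-based offset at k.
    return s[j - 1:k]


print(rosalind_substring("AUGCUUCAGAAAGGUCUUACG", 2, 5))  # UGCU
```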


The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).


Given: Two DNA strings s and t (each of length at most 1 kbp).


Return: All locations of t as a substring of s.


Sample Dataset


GATATATGCATATACTT
ATAT


Sample Output


2 4 10


Solution


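>>> s = 'GATATATGCATATACTT'
>>> t = 'ATAT'
>>> # + 1 so that the last possible start position is checked too
>>> for i in range(len(s) - len(t) + 1):
...     a = i                # 0-based start of the window
...     b = len(t) + a       # end of the window (exclusive)
...     if s[a:b] == t:
...         print(a + 1)     # report the 1-based location
...
2
4
10
>>>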
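The loop above compares a window at every start position by hand. As an alternative, here is a sketch built on Python's `str.find`, which returns -1 when nothing more is found; the function name `find_locations` is an assumption for illustration, not something from Rosalind. The search restarts one character after each hit rather than after the whole match, so overlapping occurrences such as positions 2 and 4 in the sample are both reported.

```python
def find_locations(s, t):
    """Return the 1-based start positions of every occurrence of t in s."""
    locations = []
    start = s.find(t)
    while start != -1:
        locations.append(start + 1)  # convert 0-based index to 1-based
        # advance by one (not len(t)) so overlapping matches are found
        start = s.find(t, start + 1)
    return locations


# Print the locations space-separated, as in the Sample Output
print(*find_locations("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```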


Over


Rosalind is a platform for learning bioinformatics and programming through problem solving. Take a tour to get the hang of how Rosalind works.


P.S. Feel free to follow the WeChat official account: Plant_Frontiers




https://blog.sciencenet.cn/blog-3158122-1084606.html
