
Rosalind 15 - Finding a Motif in DNA

4988 reads | 2017-11-10 08:56 | Personal category: Python Learning | System category: Research Notes

Bioinformatics Stronghold - SUBS: Finding a Motif in DNA

Combing Through the Haystack

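Finding the same stretch of DNA in two species is highly significant, because a segment shared by both may have a similar function in each organism.

A motif is defined as such a commonly shared DNA segment. A common goal of molecular biology is to search an organism's genome for a known motif.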

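However, genomes are full of DNA segments that occur many times, called repeats, and they make a target motif harder to identify. Repeats occur far more often than they would by chance, which indicates that genomes are not random.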
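To make "more often than by chance" concrete, here is a rough sketch under an explicit assumption: a random genome in which every base is drawn independently and uniformly from A, C, G and T. Under that model, a fixed k-mer is expected to occur exactly (N - k + 1) / 4^k times in a sequence of length N (one chance per start position, each matching with probability 1/4^k). The genome size of about 3.1 Gbp and k = 300 are illustrative assumptions, and real repeat copies are similar rather than identical, so this only indicates the order of magnitude.

```python
import math

def expected_exact_hits(genome_len: int, k: int) -> float:
    """Expected exact occurrences of one fixed k-mer in a uniform,
    independent random DNA sequence of length genome_len (assumed model)."""
    windows = genome_len - k + 1      # number of possible start positions
    return windows / 4 ** k           # each window matches with probability 1/4^k

def log10_expected_exact_hits(genome_len: int, k: int) -> float:
    """Same quantity on a log10 scale: easier to read, and it avoids
    float underflow for much larger k."""
    return math.log10(genome_len - k + 1) - k * math.log10(4)

# A 4-mer in a 1 kbp string: a few hits are expected by chance alone.
print(round(expected_exact_hits(1000, 4), 2))                  # 3.89
# A 300-mer in an assumed ~3.1 Gbp genome: essentially zero expected hits.
print(round(log10_expected_exact_hits(3_100_000_000, 300), 1))  # -171.1
```

An expected count of roughly 10^-171 exact copies, against the observed scale of repeats described below, is the kind of gap that marks a genome as non-random.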

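The most common repeat in the human genome is the Alu repeat: each copy is about 300 bp long, and it recurs roughly one million times in every human genome. There is no evidence yet that Alu repeats benefit humans; they appear instead to be parasitic, and the insertion of a new Alu repeat into the genome often causes genetic disorders.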

The human chromosomes stained with a probe for Alu elements, shown in green.

Problem

Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).
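In Python, the contiguity requirement is exactly what the built-in `in` operator checks for strings. The sketch below contrasts it with a non-contiguous subsequence test, which is a different notion; the helper names `is_substring` and `is_subsequence` are hypothetical, introduced only for illustration.

```python
def is_substring(t: str, s: str) -> bool:
    """True if t occurs in s as one contiguous run of symbols."""
    return t in s  # str containment only matches contiguous runs

def is_subsequence(t: str, s: str) -> bool:
    """True if t's symbols appear in s in order, gaps allowed (NOT a substring test)."""
    it = iter(s)
    return all(ch in it for ch in t)  # each `in` consumes the iterator up to the match

s = "GATATATGCATATACTT"
print(is_substring("ATAT", s))   # True: contiguous
print(is_substring("GAC", s))    # False: G, A, C never appear side by side
print(is_subsequence("GAC", s))  # True: G...A...C with gaps in between
print(is_substring(s + "A", s))  # False: a t longer than s can never be a substring
```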

The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
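Rosalind counts positions from 1, whereas Python indexes strings from 0. A minimal sketch of the definition above therefore adds 1 to each index produced by `enumerate`; it reproduces the example list of 'U' positions. The function name `positions` is hypothetical.

```python
from typing import List

def positions(symbol: str, s: str) -> List[int]:
    """1-based positions of every occurrence of `symbol` in s (Rosalind convention)."""
    return [i + 1 for i, ch in enumerate(s) if ch == symbol]

print(positions("U", "AUGCUUCAGAAAGGUCUUACG"))  # [2, 5, 6, 15, 17, 18]
```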

A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2 : 5] = "UGCU".
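Note that this s[j:k] notation is 1-based and includes both ends, unlike a Python slice, which is 0-based and excludes its end. Under that reading, Rosalind's s[j:k] corresponds to Python's `s[j - 1:k]`. The helper name `rosalind_slice` below is hypothetical.

```python
def rosalind_slice(s: str, j: int, k: int) -> str:
    """Return s[j:k] in Rosalind notation: 1-based, both ends inclusive."""
    # Shift the start to 0-based; Python's exclusive end already covers position k.
    return s[j - 1:k]

s = "AUGCUUCAGAAAGGUCUUACG"
print(rosalind_slice(s, 2, 5))  # UGCU
print(s[2:5])                   # GCU  (plain Python slice: indices 2, 3, 4)
```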

The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).
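Occurrences may overlap: in the sample, the matches starting at 2 and 4 share the bases at positions 4 and 5. Convenient string built-ins can therefore mislead. `str.count` counts only non-overlapping matches, and a single `str.find` returns only the first 0-based index. The sketch below makes the difference visible.

```python
s = "GATATATGCATATACTT"
t = "ATAT"

print(s.count(t))     # 2: count() resumes after each match, so overlaps are lost
print(s.find(t) + 1)  # 2: find() reports only the first match (0-based, so +1)

# Testing every start position also catches the overlapping match at 4.
print([i + 1 for i in range(len(s) - len(t) + 1) if s.startswith(t, i)])  # [2, 4, 10]
```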

Given: Two DNA strings s and t (each of length at most 1 kbp).

Return: All locations of t as a substring of s.

Sample Dataset

GATATATGCATATACTT

ATAT

Sample Output

2 4 10
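A minimal end-to-end sketch, assuming the dataset arrives as plain text with s on the first non-blank line and t on the next, as in the sample. It returns the answer in the space-separated format of the Sample Output. The function names `motif_locations` and `solve` are hypothetical.

```python
from typing import List

def motif_locations(s: str, t: str) -> List[int]:
    """All 1-based start positions of t in s, overlapping matches included."""
    m = len(t)
    return [i + 1 for i in range(len(s) - m + 1) if s[i:i + m] == t]

def solve(dataset: str) -> str:
    """Parse s and t from the dataset text and format the answer line."""
    s, t = [line.strip() for line in dataset.splitlines() if line.strip()][:2]
    return " ".join(str(p) for p in motif_locations(s, t))

sample = """GATATATGCATATACTT

ATAT
"""
print(solve(sample))  # 2 4 10
```

For a downloaded dataset, something like `print(solve(open("rosalind_subs.txt").read()))` should work; that file name is only a placeholder. Skipping blank lines lets the same parser accept the sample as it is laid out in this post.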

Solution

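>>> s = 'GATATATGCATATACTT'
>>> t = 'ATAT'
>>> for i in range(len(s) - len(t) + 1):  # "+ 1" so the last possible window is checked too
...     a = i               # window start (0-based, inclusive)
...     b = a + len(t)      # window end (0-based, exclusive)
...     if s[a:b] == t:
...         print(a + 1)    # report 1-based positions, as Rosalind expects
...
2
4
10
>>>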
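For comparison, here is an alternative sketch (a different technique from the sliding-window loop above) that calls `str.find` repeatedly, restarting one position after each hit so that overlapping matches are not skipped. Either approach is fast enough for strings of at most 1 kbp; the name `find_all` is hypothetical.

```python
from typing import List

def find_all(s: str, t: str) -> List[int]:
    """1-based start positions of t in s, found with repeated str.find calls."""
    hits = []
    start = s.find(t)
    while start != -1:
        hits.append(start + 1)        # convert the 0-based index to Rosalind's 1-based
        start = s.find(t, start + 1)  # resume one step later so overlaps are kept
    return hits

print(*find_all("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```

The standard-library `re` module can do the same with a zero-width lookahead, e.g. `[m.start() + 1 for m in re.finditer("(?=" + re.escape(t) + ")", s)]`; the lookahead consumes no characters, so overlapping matches are reported.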

Over

Rosalind is a platform for learning bioinformatics and programming through problem solving.

(P.S. Feel free to follow the WeChat official account Plant_Frontiers.)

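To repost this article, please contact the original author for permission and state that it comes from Hao Zhaodong's (郝兆东) ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-3158122-1084606.html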


TickingClock's personal blog: http://blog.sciencenet.cn/u/TickingClock