Bioinformatics Stronghold - SUBS: Finding a Motif in DNA
Combing Through the Haystack
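Identifying DNA segments shared between two species is highly informative, because a shared segment may serve a similar function in both organisms.
A motif is defined as such a shared DNA segment. A common task in molecular biology is to search an organism's genome for a known motif.
However, genomes are full of repeated DNA segments (repeats), which makes identifying a target motif difficult. These repeats occur far more often than chance would predict, which indicates that genomes are not random.
The most common repeat in the human genome is the Alu repeat: each copy is about 300 bp long, and it recurs roughly one million times in every human genome. However, no evidence has yet shown that Alu repeats benefit humans. They appear instead to be parasitic: when a new Alu repeat is inserted into the genome, it frequently causes genetic abnormalities.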
The human chromosomes stained with a probe for Alu elements, shown in green.
Problem
Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).
The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2 : 5] = "UGCU".
The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).
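The notation above is 1-based and inclusive at both ends, whereas Python indexing is 0-based and its slices exclude the end index. The following minimal sketch shows one way to convert between the two. The helper names `symbol_at`, `substring`, and `positions` are hypothetical and are introduced here only for illustration.

```python
def symbol_at(s: str, i: int) -> str:
    """Return s[i] in the problem's 1-based notation."""
    return s[i - 1]


def substring(s: str, j: int, k: int) -> str:
    """Return s[j:k] in the problem's 1-based, inclusive notation."""
    # Shift the start down by one; the exclusive Python end index
    # already corresponds to the inclusive 1-based end position k.
    return s[j - 1:k]


def positions(s: str, symbol: str) -> list:
    """Return the 1-based positions of every occurrence of symbol in s."""
    return [i for i, c in enumerate(s, start=1) if c == symbol]


rna = "AUGCUUCAGAAAGGUCUUACG"
print(substring(rna, 2, 5))  # UGCU
print(positions(rna, "U"))   # [2, 5, 6, 15, 17, 18]
```

Both printed values match the examples given in the problem statement.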
Given: Two DNA strings s and t (each of length at most 1 kbp).
Return: All locations of t as a substring of s.
Sample Dataset
Sample Output
2 4 10
Solution
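One possible approach, sketched here in Python, calls `str.find` repeatedly and restarts each search one symbol after the previous match, so that overlapping occurrences are also reported. The function name `find_locations` is hypothetical. The input strings are also an assumption: the Sample Dataset is not reproduced above, so they were chosen to be consistent with the Sample Output.

```python
def find_locations(s: str, t: str) -> list:
    """Return every 1-based location of t in s, including overlapping ones."""
    if not t:
        # An empty pattern has no meaningful location.
        return []
    locations = []
    start = s.find(t)  # 0-based index of the first match, or -1
    while start != -1:
        locations.append(start + 1)   # convert to a 1-based position
        start = s.find(t, start + 1)  # advance by one to allow overlaps
    return locations


# Assumed input, chosen to be consistent with the Sample Output.
s = "GATATATGCATATACTT"
t = "ATAT"
print(*find_locations(s, t))  # 2 4 10
```

Advancing by a single symbol after each match matters here: the occurrences at positions 2 and 4 overlap, and a non-overlapping scan such as `str.count` would miss the second one.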