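Miller spent many years working on computational techniques for understanding the behavior of computer programs that use floating-point arithmetic. In 1987 he changed his research direction completely, choosing bioinformatics as his new field. He says: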
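"My reason for wanting a complete change was simple: it would bring new and greater challenges and excitement into my life. Bioinformatics attracted me because I knew almost nothing about the field, and because at that time nobody else did either."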
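The catalyst for this change was his friendship with Gene Myers, who was already working in the new area. The field was not yet called "bioinformatics"; Miller was switching to a field that had no name. He loved working at the frontier of a field where he might do something useful for humanity.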
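"The change was difficult for me because I knew nothing about biology or statistics. It took many years before I really understood biology. I am now on the faculty of a biology department, so in some sense I made the transition successfully. (Unfortunately, I am still basically ignorant of statistics.) On the other hand, the change was also easy, because so little was known in the field. I read a few papers by Mike Waterman and David Sankoff, and then I was off and running."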
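Miller entered the field with two skills that proved very useful, his mathematical training and his programming experience, along with a couple of ideas that helped focus his research at first. The first idea he brought to the field was that an optimal alignment of two sequences can be computed in space proportional to the length of the longer sequence. Computing the score of an optimal alignment in that much space is straightforward, but how to produce an alignment with that score is much less obvious. A very clever linear-space alignment algorithm had been discovered by Dan Hirschberg around 1975. Miller's other idea was that when two sequences are very similar and alignments are scored simply, an optimal alignment can be computed much faster than by dynamic programming, using a greedy algorithm. This idea was discovered independently in the mid-1980s by Gene Myers (with some prodding from Miller) and by Esko Ukkonen. Miller hoped that these two ideas, or variants of them, would get him started in the new field; he had "solutions in search of biological problems" rather than "biological problems in search of solutions". Indeed, this is a common way for scientists trained in a quantitative field to enter bioinformatics.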
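During his first decade in bioinformatics, Miller coauthored a few papers on linear-space alignment methods. Finding a niche for greedy algorithms took longer, but for comparing very similar DNA sequences, particularly when their differences stem from sequencing errors rather than evolutionary mutations, these algorithms are quite useful, and they deserve wider recognition in the bioinformatics community than they currently have.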
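Miller's most successful bioinformatics projects have involved ideas other than the ones he brought with him to the field. His best-known project was the collaboration that developed the BLAST program, in which David Lipman's insights steered the project in the right direction. However, it was Miller's work on methods for comparing long DNA sequences that brought him closer to biology and made his algorithms a household name among teams of scientists analyzing mammalian and other whole-genome sequences. Miller chose this theme as his "Holy Grail" around 1989 and has stuck with it ever since. At the start, there were only two people in the world brave or "foolish" enough to publicly advocate sequencing the mouse genome and comparing it with the human genome: Miller and his long-term collaborator, the biologist Ross Hardison. They occasionally went so far as to promote sequencing several additional mammals. Nowadays everyone agrees that sequencing the mouse, rat, chimpanzee, dog, and other species was inevitable, but perhaps Miller's many years of work on programs for comparing genome sequences made the inevitable happen sooner.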
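What worked best for Miller was to envision an advance in bioinformatics that would drive new biological discoveries, namely that methods for comparing complete mammalian genome sequences would lead to a better understanding of evolution and gene regulation, and then to do everything he could think of to make it happen. This included developing algorithms that could easily align the longest sequences he could find, and helping Ross Hardison verify experimentally that these alignments are useful for studying gene regulation. When Miller and Hardison decided to show how alignments and data from biological experiments could be linked through a database, they learned about databases. When they wanted to set up a network server to align DNA sequences, they learned about network servers. When nobody in his lab was available to write the software the group needed, Miller wrote it himself. When inventing and analyzing a new sequence algorithm seemed important, he worked on it. The methods changed, but the biological motivation stayed the same.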
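Miller has been more successful pursuing "a biological problem in search of solutions" than the other way around. His colleague David Haussler has had a similar experience: his considerable achievements in bringing hidden Markov models and other machine-learning methods to bioinformatics have recently been eclipsed by his landmark success with the Human Genome Browser, which has directly helped a far wider community of scientists.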
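"The most exciting point so far in my career is today, with a new vertebrate genome sequence coming my way every year. Some day I hope to look back with pride at my best achievement in bioinformatics, but perhaps it hasn't happened yet."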
Original text:
Webb Miller (born 1943 in Washington State) is
professor in the Departments of Biology and of Computer Science and Engineering
at Pennsylvania State University. He holds a PhD in mathematics from the
University of Washington. He is a
pioneer and a leader in the area of DNA and protein sequence comparison, and in
comparing whole genomes in particular.
For a number of years Miller worked on
computational techniques for understanding the behavior of computer programs
that use floating-point arithmetic. In 1987 he completely changed his research focus, after
picking bioinformatics as his new field. He says:
"My reason for wanting a
complete change was simply to bring more adventure and excitement into my life.
Bioinformatics was attractive because I had no idea what the field was all
about, and because neither did anyone else at that time."
The catalyst was his
friendship with Gene Myers, who was already working in the new area. It
wasn't even called "bioinformatics" then; Miller was switching to a
field without a name. He loved the frontier spirit of the emerging discipline
and the possibility of doing something useful for mankind.
"The change was difficult
for me because I was completely ignorant of biology and statistics. It took a
number of years before I really started to understand biology. I'm now on the
faculty of a biology department, so in some sense I successfully made the
transition. (Unfortunately, I'm still basically ignorant of statistics.) In
another respect, the change was easy because there was so little already known
about the field. I read a few papers by Mike Waterman and David Sankoff, and
was off and running."
Miller came to the new field armed with two skills that proved very
useful, and with a couple of ideas that helped focus his research initially.
The skills were his mathematical training and his experience writing computer
programs. The first idea that he brought to the field was that an optimal
alignment between two sequences can be computed in space proportional to the
length of the longer sequence. It is straightforward to compute the score of an
optimal alignment in that amount of space, but it is much less obvious how to
produce an alignment with that score. A very clever linear-space alignment
algorithm had been discovered by Dan Hirschberg around 1975. The other idea was
that when two sequences are very similar and when alignments are scored rather
simply, an optimal alignment can be computed much more quickly than by dynamic
programming, using a greedy algorithm.
That idea was discovered independently by Gene Myers (with some prodding
from Miller) and Esko Ukkonen in the mid-1980s. Miller hoped that these two
ideas, or variants of them, would get him started in the new field; he had "solutions in search of biological problems" rather than "biological
problems in search of solutions". Indeed, this is a common mode of entry into
bioinformatics for scientists trained in a quantitative field.
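Miller's first idea, that an optimal alignment can be computed in space proportional to the length of the longer sequence, can be illustrated with a minimal Python sketch of Hirschberg-style divide and conquer. It is not Miller's or Hirschberg's own code, and it assumes a linear gap penalty with hypothetical scores (MATCH = 1, MISMATCH = -1, GAP = -2) chosen only for illustration. `nw_last_row` shows the "straightforward" part: the optimal score needs only two rows of the dynamic-programming matrix. `hirschberg` shows the less obvious part: it runs that score computation forward on one half of the first sequence and backward on the other half to find where an optimal alignment crosses the middle, then recurses on the two pieces.

```python
# Sketch of linear-space global alignment (Hirschberg-style divide and conquer).
# Assumed scoring, for illustration only: linear gap penalty, hypothetical values.
MATCH, MISMATCH, GAP = 1, -1, -2


def sub(x: str, y: str) -> int:
    """Score for aligning character x with character y."""
    return MATCH if x == y else MISMATCH


def nw_last_row(a: str, b: str) -> list[int]:
    """Optimal global-alignment scores of a against every prefix of b.

    Only two rows of the dynamic-programming matrix are kept, so the extra
    space is proportional to len(b) rather than len(a) * len(b).
    """
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                         prev[j] + GAP,                          # a[i-1] against a gap
                         cur[j - 1] + GAP)                       # b[j-1] against a gap
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> tuple[str, str]:
    """Return an optimal global alignment of a and b as two gapped strings."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair a[0] with its best partner in b, or gap everything.
        j = max(range(len(b)), key=lambda k: sub(a[0], b[k]))
        if sub(a[0], b[j]) + GAP * (len(b) - 1) >= GAP * (len(b) + 1):
            return "-" * j + a + "-" * (len(b) - j - 1), b
        return a + "-" * len(b), "-" + b
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b)               # scores of a[:mid] vs b[:j]
    right = nw_last_row(a[mid:][::-1], b[::-1])  # scores of a[mid:] vs b[j:]
    n = len(b)
    # Column where an optimal alignment crosses the middle row of a.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    top_a, top_b = hirschberg(a[:mid], b[:split])
    bot_a, bot_b = hirschberg(a[mid:], b[split:])
    return top_a + bot_a, top_b + bot_b


def alignment_score(x: str, y: str) -> int:
    """Score a finished alignment column by column."""
    return sum(GAP if "-" in (p, q) else sub(p, q) for p, q in zip(x, y))


if __name__ == "__main__":
    s, t = "GATTACA", "GCATGCT"
    x, y = hirschberg(s, t)
    print(x)
    print(y)
    print(alignment_score(x, y) == nw_last_row(s, t)[-1])  # True
```

The recursion only ever holds a couple of score rows at a time, which is the point of the technique; the price is roughly twice the arithmetic of a single full-matrix pass.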
During his first decade in
bioinformatics, Miller coauthored a few papers about linear-space alignment
methods. Finding a niche for greedy algorithms took longer, but for comparing
very similar DNA sequences, particularly when the difference between them is
due to sequencing errors rather than evolutionary mutations, they are quite
useful; they deserve wider recognition in the bioinformatics community than they
now have.
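The greedy approach for very similar sequences can be illustrated with the simplest scoring scheme, unit-cost edit distance (each substitution, insertion, or deletion costs 1). The Python sketch below is an assumed, simplified version of the diagonal "furthest-reaching" formulation associated with Ukkonen and Myers, not the code used in Miller's programs, and the function names are hypothetical. For each edit count d it records, on every diagonal k = j - i, the furthest row reachable with at most d edits, sliding along runs of matching characters at no cost. When two sequences differ in only D places, the work grows roughly with D times the sequence length instead of with the product of the two lengths. It returns only the distance, not the alignment.

```python
def slide(a: str, b: str, i: int, k: int) -> int:
    """Advance along diagonal k (j = i + k) while characters match; return the new row."""
    while i < len(a) and i + k < len(b) and a[i] == b[i + k]:
        i += 1
    return i


def greedy_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via furthest-reaching diagonals.

    far[k] holds the largest row i such that a[:i] and b[:i + k] can be
    aligned with at most d edits. The loop stops as soon as the diagonal
    ending at cell (len(a), len(b)) reaches the last row, so very similar
    sequences (small d) finish after only a few rounds.
    """
    n, m = len(a), len(b)
    target = m - n                  # diagonal that ends at cell (n, m)
    far = {0: slide(a, b, 0, 0)}    # d = 0: free matches from (0, 0)
    d = 0
    while far.get(target, -1) < n:
        d += 1
        nxt = {}
        for k in range(-min(d, n), min(d, m) + 1):
            options = []
            if k in far:
                options.append(far[k] + 1)      # substitution: stay on diagonal k
            if k + 1 in far:
                options.append(far[k + 1] + 1)  # delete a[i]: step down from diagonal k + 1
            if k - 1 in far:
                options.append(far[k - 1])      # insert b[j]: step right from diagonal k - 1
            if options:
                i = min(max(options), n, m - k)  # stay inside the matrix
                nxt[k] = slide(a, b, i, k)
        far = nxt
    return d


if __name__ == "__main__":
    print(greedy_edit_distance("kitten", "sitting"))  # 3
    # Two reads that differ by a single substitution, e.g. a sequencing error:
    print(greedy_edit_distance("ACGTACGTTAGC", "ACGTACCTTAGC"))  # 1
```

In the second example the first round slides six matching characters for free, and one substitution then lets the main diagonal slide to the end, so the answer is found after only two rounds.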
The
most successful of Miller's bioinformatics projects have involved ideas other
than the ones he brought with him to the field. His most widely known project
was the collaboration to develop the BLAST program, where it was David
Lipman's insights that drove the project in the right direction. However, it
is Miller's work on comparison methods for long DNA sequences that brought him
closer to biology and made Miller's algorithms a household name among teams
of scientists analyzing mammalian and other whole-genome sequences. Miller
picked this theme as his Holy Grail around 1989, and he has stuck with it ever
since. When he started, there were only two people in the world brave or
foolish enough to publicly advocate sequencing the mouse genome and comparing
it with the human genome: Miller and his long-term collaborator, the biologist
Ross Hardison. They occasionally went so far as to tout the sequencing of
several additional mammals.
Nowadays, it looks to everyone like the genome sequencing of mouse, rat,
chimpanzee, dog, and so on, was inevitable, but perhaps Miller's many years of
working on programs to compare genome sequences made the inevitable happen
sooner.
What
worked best for Miller was to envision an advance in bioinformatics that would
foster new biological discoveries – namely, that development of methods to
compare complete mammalian genome sequences would lead to a better
understanding of evolution and of gene regulation – and to do everything he
could think of to make it happen. This included developing algorithms that
would easily align the longest sequences he could find, and helping Ross
Hardison to verify experimentally that these alignments are useful for studying
gene regulation. When Miller and Hardison decided to show how alignments and
data from biological experiments could be linked through a database, they
learned about databases. When they wanted to set up a network server to align
DNA sequences, they learned about network servers. When nobody in his lab was
available to write software that they needed, Miller wrote it himself. When
inventing and analyzing a new algorithm seemed important, he worked on it. The
methods changed but the biological motivation remained constant.
Miller has been more
successful pursuing "a biological problem in search of solutions" than the
other way around. His colleague, David Haussler, has had somewhat the same
experience; his considerable achievements bringing hidden Markov models and
other machine learning techniques to bioinformatics have recently been eclipsed
by his monumental success with the Human Genome Browser, which has directly
helped a far wider community of scientists.
"The most
exciting point so far in my career is today, with a new vertebrate genome
sequence coming my way every year. Some day, I hope to look back with pride at
my best achievement in bioinformatics, but perhaps it hasn't happened yet."