TickingClock's personal blog http://blog.sciencenet.cn/u/TickingClock


Rosalind 13 - Mendel's First Law

Read 4888 times | 2017-10-31 10:33 | Personal category: Python Learning | System category: Research Notes

Bioinformatics Stronghold - IPRB: Mendel's First Law


Introduction to Mendelian Inheritance


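The modern laws of inheritance were first described by Gregor Mendel in 1865. The prevailing hereditary model at the time, called blending inheritance, held that an organism must exhibit a blend of its parents' traits. This model clearly fails both empirically and statistically: empirically, many people are taller than both of their parents; statistically, blending would average the parents' traits in every generation and so steadily reduce variation in the population.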


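Working with pea plants, Mendel concluded that traits are not continuous but are instead controlled by discrete building blocks, which he called factors. He further proposed that each factor comes in different forms, called alleles.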


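Mendel proposed that every organism carries a pair of alleles for each factor; this is Mendel's first law, also known as the law of segregation. If an individual's two alleles for a given factor are the same, the individual is homozygous for that factor; if they differ, it is heterozygous. The first law tells us that an organism randomly passes one of its two alleles for a factor to each offspring, so the offspring inherits one allele from each parent.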


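Mendel also held that each factor has exactly two possible alleles, one dominant and one recessive. An organism needs only one copy of the dominant allele to display the dominant phenotype, whereas it must be homozygous recessive (both alleles recessive) to display the recessive phenotype.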


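The dominant allele of a factor is written with a capital letter (A) and the recessive allele with a lowercase letter (a). Because an organism can carry a recessive allele without displaying the recessive phenotype, we define an organism's genotype as its precise genetic makeup and its phenotype as the observable expression of its traits.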


The possible ways an individual can inherit alleles from its two parents can be laid out in a Punnett square.


A Punnett square representing the possible outcomes of crossing a heterozygous organism (Yy) with a homozygous recessive organism (yy); here, the dominant allele Y corresponds to yellow pea pods, and the recessive allele y corresponds to green pea pods.
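A Punnett square can also be enumerated in a few lines of code. The following is a minimal sketch, assuming each genotype is written as a two-character string such as "Yy"; the function name `punnett_square` is illustrative and not part of the original post.

```python
from collections import Counter
from itertools import product

def punnett_square(parent1, parent2):
    """Count offspring genotypes from crossing two two-allele genotypes."""
    # Each parent passes on one of its two alleles with equal probability,
    # so the four cells of the square are equally likely.
    cells = Counter()
    for a, b in product(parent1, parent2):
        # Sort so the uppercase (dominant) allele is written first: "yY" -> "Yy".
        cells["".join(sorted(a + b))] += 1
    return cells

# The cross from the caption: heterozygous Yy with homozygous recessive yy.
print(dict(punnett_square("Yy", "yy")))  # {'Yy': 2, 'yy': 2}
```

Two of the four cells are Yy and two are yy, so half of the offspring are expected to show the dominant phenotype.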


Problem


Probability is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a random variable, which is simply a variable that can take a number of different distinct outcomes depending on the result of an underlying random process.


For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let X represent the random variable corresponding to the color of a drawn ball, then the probability of each of the two outcomes is given by Pr(X = red) = 3/5 and Pr(X = blue) = 2/5.


Random variables can be combined to yield new random variables. Returning to the ball example, let Y model the color of a second ball drawn from the bag (without replacing the first ball). The probability of Y being red depends on whether the first ball was red or blue. To represent all outcomes of X and Y, we therefore use a probability tree diagram. This branching diagram represents all possible individual probabilities for X and Y, with outcomes at the endpoints ("leaves") of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree; see Figure 2 for an illustrative example.


An event is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let A be the event "Y is blue." Pr(A) is equal to the sum of the probabilities of two different outcomes: Pr(X = blue and Y = blue) + Pr(X = red and Y = blue), or 1/10 + 3/10 = 2/5 (see Figure 2 above).
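The probability tree for the ball example can be built programmatically. This is a rough sketch using exact fractions; the helper name `two_draw_tree` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction

def two_draw_tree(red, blue):
    """Return {(first, second): probability} for two draws without replacement."""
    counts = {"red": red, "blue": blue}
    total = red + blue
    tree = {}
    for first, n_first in counts.items():
        p_first = Fraction(n_first, total)
        for second, n_second in counts.items():
            # One ball of the first colour is already out of the bag.
            remaining = n_second - (1 if second == first else 0)
            tree[(first, second)] = p_first * Fraction(remaining, total - 1)
    return tree

tree = two_draw_tree(3, 2)
# Event A = "Y is blue": sum the leaves whose second draw is blue.
pr_a = sum(p for (first, second), p in tree.items() if second == "blue")
print(pr_a)  # 2/5
```

Each leaf is the product of the probabilities along its path, and the event probability is the sum over the matching leaves.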


Given: Three positive integers k, m, and n, representing a population containing k + m + n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.


Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.


Sample Dataset


2 2 2


Sample Output


0.78333


Solution


The most brute-force approach is to enumerate all nine ordered pairings (first pick, then second pick):


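First pick from k, second pick from k: Pr1 = k/(k+m+n) * (k-1)/(k+m+n-1)
First pick from k, second pick from m: Pr2 = k/(k+m+n) * m/(k+m+n-1)
First pick from k, second pick from n: Pr3 = k/(k+m+n) * n/(k+m+n-1)
First pick from m, second pick from k: Pr4 = m/(k+m+n) * k/(k+m+n-1)
First pick from m, second pick from m: Pr5 = m/(k+m+n) * (m-1)/(k+m+n-1)
First pick from m, second pick from n: Pr6 = m/(k+m+n) * n/(k+m+n-1)
First pick from n, second pick from k: Pr7 = n/(k+m+n) * k/(k+m+n-1)
First pick from n, second pick from m: Pr8 = n/(k+m+n) * m/(k+m+n-1)
First pick from n, second pick from n: Pr9 = n/(k+m+n) * (n-1)/(k+m+n-1)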
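As a quick sanity check on the nine terms above, the sketch below evaluates them with exact fractions for the sample dataset and confirms that they sum to 1. The function name `pair_probabilities` and the group labels "k", "m", "n" are illustrative choices, not part of the original post.

```python
from fractions import Fraction
from itertools import product

def pair_probabilities(k, m, n):
    """Probability of each ordered (first, second) pick without replacement."""
    counts = {"k": k, "m": m, "n": n}
    total = k + m + n
    probs = {}
    for first, second in product(counts, repeat=2):
        # If both picks come from the same group, one member is already taken.
        left = counts[second] - (1 if first == second else 0)
        probs[(first, second)] = Fraction(counts[first], total) * Fraction(left, total - 1)
    return probs

probs = pair_probabilities(2, 2, 2)
for pair, p in probs.items():
    print(pair, p)
print("sum =", sum(probs.values()))  # sum = 1
```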


Here k denotes AA, m denotes Aa, and n denotes aa, so the nine pairings produce the following offspring genotypes:


Pr1:AA
Pr2:1/2 AA + 1/2 Aa
Pr3:Aa
Pr4:1/2 AA + 1/2 Aa
Pr5:1/4 AA + 2/4 Aa + 1/4 aa
Pr6:1/2 Aa + 1/2 aa
Pr7:Aa
Pr8:1/2 Aa + 1/2 aa
Pr9:aa
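The offspring table can be cross-checked by enumerating allele combinations for every parental pairing. The following sketch assumes genotypes are written as the strings "AA", "Aa", and "aa"; `dominant_fraction` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def dominant_fraction(parent1, parent2):
    """Fraction of offspring carrying at least one dominant allele 'A'."""
    # Each parent passes on either of its alleles with probability 1/2,
    # so the four allele combinations are equally likely.
    offspring = [a + b for a, b in product(parent1, parent2)]
    return Fraction(sum("A" in child for child in offspring), len(offspring))

genotypes = ["AA", "Aa", "aa"]
for p1, p2 in product(genotypes, repeat=2):
    print(p1, "x", p2, "->", dominant_fraction(p1, p2))
```

Only the pairings Aa x Aa (3/4), Aa x aa and aa x Aa (1/2 each), and aa x aa (0) give a fraction below 1, which is why only Pr5, Pr6, Pr8, and Pr9 need special weights.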


So the probability that the offspring displays the dominant phenotype is:


Pr = Pr1 + Pr2 + Pr3 + Pr4 + 3/4 * Pr5 + 1/2 * Pr6 + Pr7 + 1/2 * Pr8


Or, equivalently, by subtracting the probability of a homozygous recessive offspring from 1:


Pr = 1 - 1/4 * Pr5 - 1/2 * Pr6 - 1/2 * Pr8 - Pr9


Code:


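>>> k, m, n = 2, 2, 2
>>> t = float(k + m + n)  # total population size, as a float so division is never truncated
>>> # 1 minus the probability of an aa offspring (pairings Pr5, Pr6, Pr8, Pr9)
>>> Pr = 1 - 0.25 * m/t * (m-1)/(t-1) - 0.5 * m/t * n/(t-1) - 0.5 * n/t * m/(t-1) - n/t * (n-1)/(t-1)
>>> print(round(Pr, 5))
0.78333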
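The closed-form answer can also be checked empirically. Below is a rough Monte Carlo sketch, assuming random mating as described in the problem; the function name `simulate`, the trial count, and the seed are arbitrary choices for illustration and are not part of the original solution.

```python
import random

def simulate(k, m, n, trials=100_000, seed=0):
    """Estimate the dominant-phenotype probability by simulated random mating."""
    rng = random.Random(seed)
    population = ["AA"] * k + ["Aa"] * m + ["aa"] * n
    dominant = 0
    for _ in range(trials):
        # Pick two distinct organisms, then one random allele from each parent.
        parent1, parent2 = rng.sample(population, 2)
        child = rng.choice(parent1) + rng.choice(parent2)
        dominant += "A" in child
    return dominant / trials

print(round(simulate(2, 2, 2), 3))
```

With this many trials the estimate should land close to the sample output of 0.78333, though the exact digits depend on the random seed.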


Over


Rosalind is a platform for learning bioinformatics and programming through problem solving. Take a tour to get the hang of how Rosalind works.


P.S. You are welcome to follow the WeChat public account: Plant_Frontiers




https://blog.sciencenet.cn/blog-3158122-1083146.html
