With the help of high-throughput NGS technologies, it is now very convenient to clone genes using strategies similar to map-based cloning.
This is an example in Arabidopsis, which is very simple and does not need much statistical analysis.
(I will introduce some more complex analyses soon.)
Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing
Josh T. Cuperus, Taiowa A. Montgomery, Noah Fahlgren, Russell T. Burke, Tiffany Townsend, Christopher M. Sullivan, and James C. Carrington
http://www.pnas.org/content/107/1/466.full?sid=8962a78f-c95b-4fbd-8ca6-b71e82071008
MASS.
The MASS package contains scripts to run CASHX, SOAP and
MAQ, and is available for download (http://jcclab.science.oregonstate.edu/MASS).
In addition to the MASS mapping and
alignment tools, the MASS package contains the entire pipeline
used to identify the mir390a-1 mutation. It includes programs for
creating plots of SNP enrichment, alignment with MAQ, and
filtering of SNPs. MASS is designed to take any indicated read
length and to create an appropriate database of sequences
centered on a SNP nucleotide, forcing each read to align across
the SNP site. The MASS pipeline filters the SNP data set (cns.snp)
from the MAQ output. Using Illumina quality scores, data
are filtered based on the following criteria: consensus base is a
true base; a phred-like quality score of 43; a minimum read
depth of 5; a maximum read depth of 50; and no second-best
base call. The phred-like quality score is based on Illumina
quality scores. In part, these filtering values are based on 12×
coverage; quality scores and read depth may be adjusted based
on coverage, read length, and quality of reads.
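The filtering criteria above can be sketched in a few lines of Python. This is only my own illustration, not the MASS code itself. It assumes the cns.snp file is whitespace-separated with the columns in the order chromosome, position, reference base, consensus base, consensus quality, read depth, (three columns not used here), second-best call. It also assumes that "N" in the second-best column means there is no second-best call, and that the quality score of 43 is meant as a minimum. The names `SnpCall`, `parse_cns_snp` and `passes_filters` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

# "True" bases, as opposed to IUPAC ambiguity codes such as R or Y.
TRUE_BASES = {"A", "C", "G", "T"}


class SnpCall(NamedTuple):
    chrom: str
    pos: int
    ref: str
    cons: str         # consensus base (may be an ambiguity code)
    qual: int         # phred-like consensus quality
    depth: int        # read depth at the site
    second_best: str  # second-best call; "N" is ASSUMED to mean "none"


def parse_cns_snp(lines: Iterable[str]) -> Iterator[SnpCall]:
    """Parse SNP lines using the ASSUMED column order described above."""
    for line in lines:
        f = line.split()
        if len(f) < 10:
            continue  # skip blank or truncated lines
        yield SnpCall(f[0], int(f[1]), f[2].upper(), f[3].upper(),
                      int(f[4]), int(f[5]), f[9].upper())


def passes_filters(s: SnpCall, min_qual: int = 43,
                   min_depth: int = 5, max_depth: int = 50) -> bool:
    """Apply the four criteria quoted in the text to one SNP call."""
    return (s.cons in TRUE_BASES                     # consensus is a true base
            and s.qual >= min_qual                   # quality 43, read as a minimum
            and min_depth <= s.depth <= max_depth    # depth between 5 and 50
            and s.second_best == "N")                # no second-best base call


if __name__ == "__main__":
    demo = ["chr2 100 G A 50 12 1.00 63 40 N 0 N",
            "chr2 200 G R 60 12 1.00 63 40 G 10 N"]
    for call in parse_cns_snp(demo):
        print(call.pos, passes_filters(call))
```

The thresholds are keyword arguments because, as the text says, they were chosen for about 12× coverage and may need adjusting for other data sets.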
Analysis approach
Reads from the bulk segregant population were mapped to
the Arabidopsis (Col-0, TAIR8) genome using Cache Assisted
Hash Search using XOR logic (CASHX) (6), resulting in ∼12×
average coverage for perfect-match reads (1.6 GB). Using
143,508 available SNPs (6), a database of 71-bp sequences centered
on each SNP (Col-0 vs. Ler) was created. When 71mers
overlapped, they were joined into one larger database entry. Illumina
1G reads were aligned to entries in the database using
CASHX. Reads that hit Col-0 or Ler SNPs were summed in
100,000-bp windows, using a 20,000-bp scroll, and ratios were
calculated. These ratios were plotted using R and visualized (Fig.
2B). Illumina reads that aligned with up to two mismatches to
ChrII:15800000–17320000 were parsed using Short Oligonucleotide
Analysis Package (SOAP) (7). Using the MAQ program
easyrun (8), 967,616 sequences (with their Illumina-based quality
scores) that mapped with two mismatches or less to the 1.5-Mb
interval were assembled. An A-to-G difference at genome coordinate
ChrII:16766679 was detected, but this was due to a
bona fide difference between the reference and initially mutagenized
genome. Four of the G-to-A mutations were sequenced
using the Sanger method and confirmed as post-EMS specific in
the 52b2 mutant.
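Two steps of the approach above are easy to sketch in Python: building the 71-bp SNP-centered database entries (joining those that overlap), and summing Col-0 and Ler read hits in 100,000-bp windows with a 20,000-bp scroll. This is my own minimal sketch, not the MASS implementation. The function names are hypothetical, coordinates are assumed to be 1-based, and each hit is assumed to be recorded as the position of the SNP that the read covered. The text does not say which ratio was plotted, so the Col-0 fraction of all hits is used here as an assumption.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def snp_entries(snp_positions: Sequence[int],
                flank: int = 35) -> List[Tuple[int, int]]:
    """Return (start, end) intervals, 1-based inclusive, centered on each SNP.

    flank=35 gives 35 + 1 + 35 = 71 bp. Overlapping intervals are joined
    into one larger entry.
    """
    entries: List[Tuple[int, int]] = []
    for pos in sorted(snp_positions):
        start, end = max(1, pos - flank), pos + flank
        if entries and start <= entries[-1][1]:
            # Overlaps the previous entry: extend it instead of adding a new one.
            entries[-1] = (entries[-1][0], max(entries[-1][1], end))
        else:
            entries.append((start, end))
    return entries


def sliding_ratios(col_hits: Sequence[int], ler_hits: Sequence[int],
                   chrom_length: int, window: int = 100_000,
                   step: int = 20_000
                   ) -> List[Tuple[int, int, int, Optional[float]]]:
    """Count Col-0 and Ler hits per sliding window.

    Returns (window_start, col_count, ler_count, col_fraction) tuples.
    col_fraction is None when a window has no hits at all. Windows near
    the chromosome end may extend past chrom_length.
    """
    col, ler = sorted(col_hits), sorted(ler_hits)
    out = []
    for start in range(1, chrom_length + 1, step):
        end = start + window  # exclusive upper bound of the window
        n_col = bisect_left(col, end) - bisect_left(col, start)
        n_ler = bisect_left(ler, end) - bisect_left(ler, start)
        total = n_col + n_ler
        out.append((start, n_col, n_ler, n_col / total if total else None))
    return out


if __name__ == "__main__":
    print(snp_entries([100, 120, 300]))
    for row in sliding_ratios([5, 15, 25], [12], chrom_length=40,
                              window=20, step=10):
        print(row)
```

In a bulk segregant population, a region where one parental allele strongly dominates the window counts is the candidate interval, and the resulting rows could be written to a table and plotted in R as in the paper.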