|||
The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences. It is a text format for storing sequence data in a series of tab delimited ASCII columns and is commonly used in next-generation sequencing data processing. It is the (non-binary) human-readable version of the BAM format and contains information about the read and the aligned position in the genome. It was developed by Heng Li in Richard Durbins group and others, their paper is here.
After a header section the alignment section describes all results of the aligned read data. The format is best explained with an example line:
Code
1:497:R:-272+13M17D24M 113 1 497 37 37M 15 100338662 0 CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG 0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>> XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37 |
The tags are optional and might vary between alignment programs. Shown are examples from BWA. Important for filtering are usually the tags X0:i (numbers of genome alignments of this read) and XM:i (number of mismatches in alignment).
The read name (at least from Illumina machines) are constructed as:
example:
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-12-2 19:32
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社