I. NCBI36 / hg18
1. human_b36
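Chromosome names in this reference carry no "chr" prefix. It was formerly used by the 1000 Genomes Project and includes the EBV type 1 sequence (NC_007605) but no ALT contigs (alternate loci). It is now retired; female and male versions are listed below.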
(1) human_b36_female
https://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_female.fa.gz
ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_female.fa.gz
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_female.fa.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_female.fa.gz
(2) human_b36_male
https://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_male.fa.gz
ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_male.fa.gz
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_male.fa.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_male.fa.gz
2. Ensembl release 54
Homo_sapiens.NCBI36.54.dna.toplevel. Chromosome names in this reference carry no "chr" prefix.
https://ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz
ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz
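Whether sequence names carry the "chr" prefix is the most common source of mismatches between a reference and downstream files, so it is worth checking after any download. The following is a minimal sketch, not part of the original list: the function names and the three return labels are hypothetical, and it assumes that the presence of a sequence called "chr1" or "1" is enough to identify the naming style.

```python
import gzip


def open_fasta(path):
    """Open a FASTA file as text, whether plain or gzip/bgzip-compressed."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def sequence_names(path):
    """Return the first word of every '>' header line, in file order."""
    names = []
    with open_fasta(path) as fh:
        for line in fh:
            if line.startswith(">"):
                names.append(line[1:].split()[0])
    return names


def naming_style(names):
    """Classify names as 'chr' (UCSC style), 'no-chr', or 'other' (heuristic)."""
    names = set(names)
    if "chr1" in names:
        return "chr"
    if "1" in names:
        return "no-chr"
    return "other"
```

For example, `naming_style(sequence_names("human_b36_male.fa.gz"))` would be expected to report the no-prefix style described above; accession-named references fall under "other".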
II. GRCh37 / hg19
1. human_g1k_v37 (alias: hs37-1kg)
human_g1k_v37 is the base reference of the GRCh37 family listed here; its chromosome names carry no "chr" prefix.
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
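2. Homo_sapiens_assembly19 (alias: hs37)
The GRCh37-like reference used by the Broad Institute. It sits between human_g1k_v37 and hs37d5: compared with human_g1k_v37 it adds the EBV sequence NC_007605, but it lacks the concatenated decoy sequence of hs37d5.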
https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta
s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta
3. hs37d5
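This reference extends human_g1k_v37 with the Broad Institute's concatenated decoy sequence named hs37d5 (drawn from HuRef, BAC or plasmid clones, and NA12878; it can improve alignment accuracy) and with the human herpesvirus 4 type 1 sequence (NC_007605). It is also the reference currently used by Dante Labs for whole-genome sequencing. Chromosome names carry no "chr" prefix.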
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
https://www.yfish.org/static/hs37d5.7z
www.yfish.org/static/hs37d5.7z
4. hg19
(1) The reference currently used by YSEQ for whole-genome sequencing. It uses the standard rCRS mitochondrial sequence (16,569 bp):
https://genomes.yseq.net/WGS/ref/hg19/hg19.zip
genomes.yseq.net/WGS/ref/hg19/hg19.zip
(2) The original UCSC version. It uses the older Yoruba-derived mitochondrial sequence (16,571 bp) and is not recommended for general use:
https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
5. Ensembl release 75
(1) Homo_sapiens.GRCh37.75.dna.primary_assembly. Chromosome names carry no "chr" prefix, and the sequence names (SN) largely match those of human_g1k_v37.
https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
(2) Homo_sapiens.GRCh37.75.dna.toplevel. Chromosome names carry no "chr" prefix.
https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz
ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz
6. build37_used_by_cg
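This reference is hosted by the 1000 Genomes Project and uses UCSC-style names ("chr" + number). It contains only the sequence placed on the primary chromosomes, with no unplaced sequence, so it is not recommended for general use.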
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz
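The two hg19 variants listed earlier in this section differ in their mitochondrial sequence: 16,569 bp for rCRS versus 16,571 bp for the older UCSC chrM. A quick way to tell which one a given file contains is to measure the mitochondrial record. The sketch below is illustrative only; the function names and the list of candidate mitochondrial names are assumptions, and length alone is a heuristic rather than proof of sequence identity.

```python
def sequence_lengths(lines):
    """Map each FASTA record name to its sequence length.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Lengths quoted in the hg19 entries of this section.
MITO_BY_LENGTH = {
    16569: "rCRS",
    16571: "older Yoruba-derived sequence (UCSC hg19 chrM)",
}


def classify_mito(lengths):
    """Return (record name, label) for the mitochondrial record, if present."""
    for name in ("chrM", "MT", "chrMT", "M"):  # assumed candidate names
        if name in lengths:
            return name, MITO_BY_LENGTH.get(lengths[name], "unknown")
    return None, "no mitochondrial sequence found"
```

Passing an open text handle for the FASTA file to `sequence_lengths` and the result to `classify_mito` gives the label without loading the whole genome into memory.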
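III. GRCh38 / hg38
1. GCA_000001405.15_GRCh38_no_alt_analysis_set (alias: hs38)
Chromosome names in this reference carry the "chr" prefix. Compared with GCA_000001405.15_GRCh38_full_analysis_set, it omits the ALT contigs (alternate loci), which can interfere with read mappers; compared with the GRCh38 primary assembly, it adds the EBV sequence as a decoy. This makes it a better default choice, and it is also the reference currently used by Nebula for whole-genome sequencing.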
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
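2. GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set (alias: hs38d1)
Compared with GCA_000001405.15_GRCh38_no_alt_analysis_set, this reference adds the hs38d1 decoy sequences that Harvard Medical School submitted to NCBI (including scaffolds not incorporated into the human genome assembly and whole-genome shotgun sequences isolated from 254 public SGDP samples).
(1) The version from NCBI: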
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz
(2) From another source; it contains several additional viral sequences compared with the NCBI version:
https://www.yfish.org/static/hs38d1.7z
www.yfish.org/static/hs38d1.7z
3. GCA_000001405.15_GRCh38_full_analysis_set (alias: hs38a)
Compared with UCSC hg38, this reference adds the EBV sequence (chrEBV); compared with GCA_000001405.15_GRCh38_no_alt_analysis_set, it adds the ALT contigs.
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
4. GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set
Compared with GCA_000001405.15_GRCh38_full_analysis_set, this reference adds the hs38d1 decoy sequences; compared with GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set, it adds the ALT contigs.
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz
https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna
ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna
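5. GRCh38_full_analysis_set_plus_decoy_hla (aliases: hs38DH, GRCh38DH)
Compared with GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set, this reference adds a large number of HLA typing sequences; compared with GCA_000001405.15_GRCh38_no_alt_analysis_set, it adds the ALT contigs, the hs38d1 decoy sequences, and the HLA typing sequences. It is also used as the reference for CRAM data from ancient DNA (aDNA). At the Broad Institute it is known as Homo_sapiens_assembly38.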
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
6. hg38
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
https://genomes.yseq.net/WGS/ref/hg38/hg38.fa
genomes.yseq.net/WGS/ref/hg38/hg38.fa
7. Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC
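This reference extends GCA_000001405.15_GRCh38_no_alt_analysis_set with the ERCC sequences, i.e. the External RNA Controls Consortium spike-in controls used as standards in RNA-seq.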
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGDP_transcriptome/working/HGDP_transcriptome_GRCh38/reference/Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC.fasta
ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGDP_transcriptome/working/HGDP_transcriptome_GRCh38/reference/Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC.fasta
8. Homo_sapiens_assembly38 (aliases: hs38DH, GRCh38DH)
The hg38-like reference used by the Broad Institute; its base sequence is essentially identical to GRCh38_full_analysis_set_plus_decoy_hla.
https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta
storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta
9. Ensembl release 106
(1) Homo_sapiens.GRCh38.dna.primary_assembly. Its sequence names (SN) do not include the EBV sequence, and chromosome names carry no "chr" prefix; otherwise it is largely consistent with GCA_000001405.15_GRCh38_no_alt_analysis_set.
https://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
(2) Homo_sapiens.GRCh38.dna.toplevel. Chromosome names carry no "chr" prefix.
https://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
10. hs38s
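This reference extends GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set with the 22_KI270879v1_alt sequence, which contains the GSTT1 gene. The "chr" prefix is dropped from the chromosome names used in the hs38d1 reference, and the mitochondrion is named MT. It is also the reference genome used by the sequencing provider Sequencing.com.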
https://api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgR0QualUlHx53-0U/root/content
api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgR0QualUlHx53-0U/root/content
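Several GRCh38 entries above differ only in how sequences are named, for instance hs38s, which drops the "chr" prefix and calls the mitochondrion MT. The sketch below shows one way to translate between the two styles. It is an assumption-laden illustration: the function names are hypothetical, and only the primary chromosomes (1 to 22, X, Y) and the mitochondrion are converted, because the naming of ALT, random, and decoy contigs varies between distributions and cannot be derived by simply adding or removing a prefix.

```python
import re

# Primary chromosomes only: chr1..chr22 (any 1-2 digit number), chrX, chrY.
_UCSC_PRIMARY = re.compile(r"chr([0-9]{1,2}|X|Y)")
_PLAIN_PRIMARY = re.compile(r"[0-9]{1,2}|X|Y")


def ucsc_to_plain(name):
    """'chr1' -> '1', 'chrM' -> 'MT'; other names are returned unchanged."""
    if name == "chrM":
        return "MT"
    match = _UCSC_PRIMARY.fullmatch(name)
    return match.group(1) if match else name


def plain_to_ucsc(name):
    """'1' -> 'chr1', 'MT' -> 'chrM'; other names are returned unchanged."""
    if name == "MT":
        return "chrM"
    if _PLAIN_PRIMARY.fullmatch(name):
        return "chr" + name
    return name
```

Leaving unrecognized names untouched is a deliberate choice here: a silent wrong rename of a contig is harder to detect later than a name that obviously did not change.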
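IV. T2T-CHM13
Unlike the other references, the assembly from the Telomere-to-Telomere (T2T) Consortium is complete from telomere to telomere and fills the gaps left by earlier assemblies. T2T is still experimental, however, and may contain problems such as errors at individual sites; consult the primary literature for details. The paper describing the complete T2T-CHM13 human reference was published in Science in April 2022.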
1. CHM13_v1.1
The CHM13 T2T v1.1 reference with chromosomes named "chr" + number, plus the mitochondrion; it has no Y chromosome.
https://processing.open-genomes.org/reference/CP086569.1-CHM13/CHM13_v1.1.fa
processing.open-genomes.org/reference/CP086569.1-CHM13/CHM13_v1.1.fa
https://processing.open-genomes.org/reference/CM034974.1-CHM13/CHM13_v1.1.fa
processing.open-genomes.org/reference/CM034974.1-CHM13/CHM13_v1.1.fa
https://processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa
processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa
2. CP086569.1-CHM13_v1.1
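This reference extends CHM13 T2T v1.1 with a Y chromosome from the Ashkenazi Jewish sample NA24385, whose paternal haplogroup is J1-ZS2712. The autosomes, X chromosome, and mitochondrion carry the "chr" prefix; the Y chromosome is named CP086569.1.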
https://processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa
processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa
3. T2T-v2.0
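The complete official CHM13 T2T v2.0 reference genome. Its Y chromosome is the second version of the NA24385 sequence (CP086569.2), and chromosome and mitochondrial names carry the "chr" prefix.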
https://processing.open-genomes.org/reference/CP086569.2-CHM13/T2T-v2.0.fa
processing.open-genomes.org/reference/CP086569.2-CHM13/T2T-v2.0.fa
4. CM034974.1-CHM13_v1.1
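This unofficial reference extends CHM13 T2T v1.1 with the Y chromosome of sample HG01243, whose paternal haplogroup is R1b-DF27. The autosomes, X chromosome, and mitochondrion carry the "chr" prefix; the Y chromosome is named CM034974.1 and is closer to the GRCh38 Y chromosome.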
https://processing.open-genomes.org/reference/CM034974.1-CHM13/CM034974.1-CHM13_v1.1.fa
processing.open-genomes.org/reference/CM034974.1-CHM13/CM034974.1-CHM13_v1.1.fa
5. T2T-CHM13v2.0 (Genome Informatics Section version)
(1) CHM13v2.0
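The T2T-CHM13v2.0 reference itself. The pseudoautosomal regions are present on both chrX and chrY, and sequence names have been converted to UCSC style ("chr" + number).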
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
(2) CHM13v2.0_noY
This reference has no Y chromosome, i.e. it corresponds to T2T-CHM13v1.1.
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_noY.fa.gz
s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_noY.fa.gz
(3) CHM13v2.0_maskedY
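In this reference, the pseudoautosomal regions (PARs) on the Y chromosome, i.e. the regions homologous to the X chromosome, are hard-masked with long runs of the letter "N".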
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY.fa.gz
s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY.fa.gz
(4) CHM13v2.0_maskedY_rCRS
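In this reference, the pseudoautosomal regions (PARs) on the Y chromosome, i.e. the regions homologous to the X chromosome, are hard-masked with long runs of the letter "N". In addition, the mitochondrial sequence is replaced with the rCRS sequence NC_012920.1 / J01415.2 (rCRS is also used in GRCh37, GRCh38, and hg38).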
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY_rCRS.fa.gz
s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY_rCRS.fa.gz
6. hs1
The T2T-CHM13v2.0 reference with UCSC-style names ("chr" + chromosome or mitochondrion identifier), which is comparatively easy to read.
https://hgdownload.cse.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz
hgdownload.cse.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz
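The maskedY variants listed in this section rely on hard masking: the bases in a region are overwritten with "N" so that reads cannot align there, while all downstream coordinates stay unchanged. A minimal sketch of the operation follows. The function name is hypothetical, intervals are assumed to be 0-based and half-open, and the coordinates in the usage note are made up for illustration, since the list above does not give the actual PAR coordinates.

```python
def hard_mask(sequence, intervals):
    """Return `sequence` with every (start, end) interval replaced by 'N'.

    Intervals are 0-based, half-open: (2, 5) masks positions 2, 3 and 4.
    The sequence length is preserved, so coordinates remain valid.
    """
    chars = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"interval ({start}, {end}) is out of range")
        chars[start:end] = "N" * (end - start)
    return "".join(chars)
```

For example, `hard_mask("ACGTACGT", [(2, 5)])` yields a sequence of the same length in which only positions 2 to 4 have become N. Soft masking (lower-casing bases) is a different convention and is not what these files use according to the descriptions above.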
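V. Decoy sequences
1. hs37d5cs
The concatenated decoy sequence of hs37d5, drawn from HuRef, BAC or plasmid clones, and NA12878. It is a single record whose sequence name (SN) is "hs37d5", with the component sequences joined by runs of N. This decoy sequence is included in the main hs37d5 reference.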
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5cs.fa.gz
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5cs.fa.gz
2. hs37d5ss
The non-concatenated decoy sequences of hs37d5, in which every sequence is a separate record, similar to the hs38d1 decoy sequences.
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz
3. GCA_000786075.2_hs38d1_genomic
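The several thousand separate, non-concatenated hs38d1 decoy sequences, including scaffolds not incorporated into the human genome assembly and whole-genome shotgun sequences isolated from 254 public SGDP samples. Their names carry neither the "chr" prefix nor the "decoy" suffix.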
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/786/075/GCA_000786075.2_hs38d1/GCA_000786075.2_hs38d1_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/786/075/GCA_000786075.2_hs38d1/GCA_000786075.2_hs38d1_genomic.fna.gz
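4. GRCh38_full_analysis_set_plus_decoy_hla-extra (alias: hs38DH-extra)
Extends the hs38d1 decoy sequences with HLA typing sequences, which act much like ALT contigs (alternate loci). Here the decoy names do carry the "chr" prefix and the "decoy" suffix. These sequences are included in the main hs38DH reference.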
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa
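5. EBVt1 (aliases: NC_007605, HHV-4)
NC_007605.1, the human herpesvirus 4 type 1 (Epstein-Barr virus) sequence. It is not part of the human genome, but including it can improve the accuracy of whole-genome results, especially for saliva samples.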
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz
ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz
ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz
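The difference between the concatenated (hs37d5cs) and non-concatenated (hs37d5ss) decoy files in this section comes down to whether the component sequences are joined into one record with N spacers. The sketch below illustrates that relationship. It is not the procedure used to build hs37d5: the function name is hypothetical, and the spacer length is an arbitrary parameter, since the source only says the pieces are joined by runs of N of some length.

```python
def concatenate_decoys(records, spacer_length=100):
    """Join (name, sequence) records into one sequence separated by N runs.

    Returns the joined sequence and a dict mapping each original name to
    its 0-based, half-open (start, end) interval in the joined sequence.
    """
    parts = []
    offsets = {}
    position = 0
    for index, (name, sequence) in enumerate(records):
        if index > 0:
            parts.append("N" * spacer_length)  # spacer between components
            position += spacer_length
        offsets[name] = (position, position + len(sequence))
        parts.append(sequence)
        position += len(sequence)
    return "".join(parts), offsets
```

Keeping the offsets makes it possible to map an alignment on the single concatenated record back to the individual decoy it came from, which is the information the separate-record form carries directly in its sequence names.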
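VI. Other
The references below are rarely used and are therefore grouped together. For example, GCA_000001405.28_GRCh38.p13_genomic names chromosome 2 by its GenBank accession "CM000664.1" rather than the usual "2" or "chr2"; GCF_000001405.39_GRCh38.p13_genomic names chromosome 2 by its RefSeq accession "NC_000002.12" and uses "NT_187361.1" for "chr1_KI270706v1_random". This group also covers reference genomes outside the NCBI GRC and UCSC series, such as "CHM13" and "NA12878_prelim". Only some of the links are listed below: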
1. hg38_CP086569
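In this hybrid reference, the autosomes (1 to 22), X chromosome, and mitochondrion use the hg38 sequence, while the Y chromosome uses the T2T CP086569.1 sequence. hg38 sequences that are not placed on the primary chromosomes are excluded.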
https://ybrowse.org/gbrowse2/gff/CP086569.1/hg38_CP086569.fasta
ybrowse.org/gbrowse2/gff/CP086569.1/hg38_CP086569.fasta
2. NeandertalizedReference
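A "Neandertalized" Homo sapiens reference genome. The non-mitochondrial sequences have the same lengths as in hs37d5, but the reference bases have been changed to match those of the Neandertal, an archaic human. The mitochondrial sequence has a different length (17569), and the Enterobacteria phage phiX sequence has been added.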
https://cdna.eva.mpg.de/neandertal/Hohlenstein-Stadel/NeandertalizedReference.fa
cdna.eva.mpg.de/neandertal/Hohlenstein-Stadel/NeandertalizedReference.fa
3. HG01243_v3
https://api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgT_cFUVNNMz6QoTX/root/content
api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgT_cFUVNNMz6QoTX/root/content
4. NCBI category: GCF (RefSeq)
(1) GCF_000001405.25_GRCh37.p13_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz
(2) GCF_000001405.40_GRCh38.p14_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz
(3) GCF_009914755.1_T2T-CHM13v2.0_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz
(4) CHM1_1.1
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/306/695/GCF_000306695.2_CHM1_1.1/GCF_000306695.2_CHM1_1.1_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/306/695/GCF_000306695.2_CHM1_1.1/GCF_000306695.2_CHM1_1.1_genomic.fna.gz
(5) HuRef
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/125/GCF_000002125.1_HuRef/GCF_000002125.1_HuRef_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/125/GCF_000002125.1_HuRef/GCF_000002125.1_HuRef_genomic.fna.gz
5. NCBI category: GCA (GenBank)
(1) GCA_009914755.4_T2T-CHM13v2.0_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.4_T2T-CHM13v2.0/GCA_009914755.4_T2T-CHM13v2.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.4_T2T-CHM13v2.0/GCA_009914755.4_T2T-CHM13v2.0_genomic.fna.gz
(2) GCA_009914755.3_T2T-CHM13v1.1_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.3_T2T-CHM13v1.1/GCA_009914755.3_T2T-CHM13v1.1_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.3_T2T-CHM13v1.1/GCA_009914755.3_T2T-CHM13v1.1_genomic.fna.gz
(3) GCA_009914755.2_T2T-CHM13v1.0_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.2_T2T-CHM13v1.0/GCA_009914755.2_T2T-CHM13v1.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.2_T2T-CHM13v1.0/GCA_009914755.2_T2T-CHM13v1.0_genomic.fna.gz
(4) GCA_000001405.29_GRCh38.p14_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz
(5) GCA_000001405.15_GRCh38_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz
(6) CHM1_1.1
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/306/695/GCA_000306695.2_CHM1_1.1/GCA_000306695.2_CHM1_1.1_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/306/695/GCA_000306695.2_CHM1_1.1/GCA_000306695.2_CHM1_1.1_genomic.fna.gz
(7) HuRef
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/125/GCA_000002125.2_HuRef/GCA_000002125.2_HuRef_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/125/GCA_000002125.2_HuRef/GCA_000002125.2_HuRef_genomic.fna.gz
(8) NA12878_prelim_3.0
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/077/035/GCA_002077035.3_NA12878_prelim_3.0/GCA_002077035.3_NA12878_prelim_3.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/077/035/GCA_002077035.3_NA12878_prelim_3.0/GCA_002077035.3_NA12878_prelim_3.0_genomic.fna.gz
(9) NA19240_prelim_3.0
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/524/155/GCA_001524155.4_NA19240_prelim_3.0/GCA_001524155.4_NA19240_prelim_3.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/524/155/GCA_001524155.4_NA19240_prelim_3.0/GCA_001524155.4_NA19240_prelim_3.0_genomic.fna.gz
(10) HG00514_prelim_3.0
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/180/035/GCA_002180035.3_HG00514_prelim_3.0/GCA_002180035.3_HG00514_prelim_3.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/180/035/GCA_002180035.3_HG00514_prelim_3.0/GCA_002180035.3_HG00514_prelim_3.0_genomic.fna.gz
(11) HG00733_prelim_1.0
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/208/065/GCA_002208065.1_HG00733_prelim_1.0/GCA_002208065.1_HG00733_prelim_1.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/208/065/GCA_002208065.1_HG00733_prelim_1.0/GCA_002208065.1_HG00733_prelim_1.0_genomic.fna.gz
(12) YH_2.0 (YanHuang)
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/004/845/GCA_000004845.2_YH_2.0/GCA_000004845.2_YH_2.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/004/845/GCA_000004845.2_YH_2.0/GCA_000004845.2_YH_2.0_genomic.fna.gz
(13) KOREF1.0
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/712/695/GCA_001712695.1_KOREF1.0/GCA_001712695.1_KOREF1.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/712/695/GCA_001712695.1_KOREF1.0/GCA_001712695.1_KOREF1.0_genomic.fna.gz
(14) GCA_018873775.2_hg01243.v3.0_genomic
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/018/873/775/GCA_018873775.2_hg01243.v3.0/GCA_018873775.2_hg01243.v3.0_genomic.fna.gz
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/018/873/775/GCA_018873775.2_hg01243.v3.0/GCA_018873775.2_hg01243.v3.0_genomic.fna.gz
…………
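The NCBI links in this section follow a regular layout: the nine digits of the accession are split into three directory levels, followed by a folder named after the accession and the assembly name. The sketch below rebuilds a download URL from those two pieces. The function name is hypothetical, and the pattern is inferred only from the URLs listed above, so it is an assumption that it holds for every assembly; the assembly name must be supplied exactly as it appears on the server.

```python
import re


def ncbi_genomic_fasta_url(accession, assembly_name):
    """Build the genomes/all URL for an accession such as 'GCA_000001405.15'.

    The layout is inferred from the links in this list, not from an
    official specification.
    """
    match = re.fullmatch(r"(GC[AF])_(\d{9})\.\d+", accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    prefix, digits = match.groups()
    # '000001405' -> '000/001/405'
    triplets = "/".join(digits[i:i + 3] for i in range(0, 9, 3))
    stem = f"{accession}_{assembly_name}"
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{prefix}/{triplets}/{stem}/{stem}_genomic.fna.gz"
    )
```

Calling `ncbi_genomic_fasta_url("GCA_000001405.15", "GRCh38")` reproduces the GCA_000001405.15_GRCh38_genomic link given in the GCA list above.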
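[Notes]
1. The Ensembl references listed here are only the final releases for NCBI36 and GRCh37 and the latest release for GRCh38 at the time of writing. To download other Ensembl releases, browse the following directory: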
https://ftp.ensembl.org/pub/
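2. The following reference genomes have also served as primary references for 1000 Genomes Project WGS data; the last three are recommended for general use:
human_b36 (retired)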
human_g1k_v37
hs37d5
hs38 (GCA_000001405.15_GRCh38_no_alt_analysis_set; the official EBI link has been removed)
hs38DH (GRCh38_full_analysis_set_plus_decoy_hla)
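3. A FASTA reference file can be used either uncompressed or as a bgzip-compressed .gz file.
4. Less common FASTA references can also be downloaded by browsing the directory below level by level to the matching GCA or GCF accession (to look up an accession, enter Homo sapiens in the Source Organism field of the NCBI Genome Remapping Service):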
https://ftp.ncbi.nlm.nih.gov/genomes/all/
https://www.ncbi.nlm.nih.gov/genome/tools/remap
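5. The complete hg18 reference has been removed from the official site, so the only remaining hg18 sequences are the per-chromosome files linked below, which can be concatenated manually into a complete reference: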
Index of /goldenPath/hg18/chromosomes
hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/
6. The Illumina website also hosts some reference genome files. Locate Homo sapiens on the page, then download and use what you need:
iGenomes
support.illumina.com.cn/sequencing/sequencing_software/igenome.html
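Note 5 mentions stitching the per-chromosome hg18 files into one reference. A minimal sketch of that step follows, under stated assumptions: the function names are hypothetical, the inputs are assumed to be gzip-compressed FASTA files that have already been downloaded (one per chromosome, named like chr1.fa.gz), and the chromosome order shown is only one common choice. The output is written as plain FASTA, which per note 3 can be used directly or compressed with bgzip afterwards.

```python
import gzip


def primary_chromosome_names():
    """chr1..chr22, chrX, chrY, chrM in a conventional (assumed) order."""
    return [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def concatenate_gz_fastas(input_paths, output_path):
    """Write the records of several .fa.gz files into one plain FASTA file.

    Files are appended in the order given. A trailing newline is added
    where missing so the next header always starts on its own line.
    """
    with open(output_path, "wt") as out:
        for path in input_paths:
            with gzip.open(path, "rt") as src:
                for line in src:
                    out.write(line if line.endswith("\n") else line + "\n")
```

With the files in the current directory, the input list could be built as `[name + ".fa.gz" for name in primary_chromosome_names()]`; any additional files in the hg18 directory would need to be appended explicitly if they are wanted.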