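I do this in R, which requires the biomaRt package. For an explanation of what biomaRt is, see The biomaRt user’s guide.pdf. Below is the procedure I followed; I will come back later to sort out some of the related concepts.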
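Here is a minimal sketch of the installation step. It assumes a recent R release, where Bioconductor packages such as biomaRt are installed through the CRAN package BiocManager. The helper name ensure_bioc_package is hypothetical and does not come from any package:

```r
# Hypothetical helper: install a Bioconductor package only if it is missing,
# then report whether it can be loaded.
ensure_bioc_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # BiocManager (on CRAN) is the installer for Bioconductor packages.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  }
  # TRUE if the package can now be loaded, FALSE otherwise.
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

You only need to run ensure_bioc_package("biomaRt") once. After that, library(biomaRt) in step 2 loads the package normally.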
1 Start R:
2 Load biomaRt:
>library(biomaRt)
3 Run listMarts() to list all available databases (marts) with their descriptions:
>listMarts()
Output (partial):
biomart
1 ensembl
2 snp
3 regulation
4 vega
5 fungi_mart_26
6 fungi_variations_26
7 metazoa_mart_26
8 metazoa_variations_26
9 plants_mart_26
10 plants_variations_26
11 protists_mart_26
12 protists_variations_26
13 msd
14 cg_mart_02
15 WS220
16 parasite_mart
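listMarts() returns a data frame with `biomart` and `version` columns. Instead of scanning the list by eye, you can filter it with a regular expression. The sketch below is minimal: find_marts is a hypothetical helper, and the example input reuses the mart names printed above rather than a live query:

```r
# Hypothetical helper: keep the rows of a listMarts()-style data frame
# whose `biomart` column matches a regular expression.
find_marts <- function(marts, pattern) {
  marts[grepl(pattern, marts$biomart), , drop = FALSE]
}

# Mart names copied from the partial listMarts() output above.
marts <- data.frame(
  biomart = c("ensembl", "snp", "regulation", "vega",
              "fungi_mart_26", "fungi_variations_26",
              "metazoa_mart_26", "metazoa_variations_26",
              "plants_mart_26", "plants_variations_26",
              "protists_mart_26", "protists_variations_26",
              "msd", "cg_mart_02", "WS220", "parasite_mart"),
  stringsAsFactors = FALSE
)

# All variation marts; the plant one is plants_variations_26.
find_marts(marts, "_variations_")
```

With a live connection, the input would be listMarts() itself, for example find_marts(listMarts(), "variations"). Which marts appear depends on the server that biomaRt queries.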
The plant variation (SNP) database is item 10, plants_variations_26. This mart contains several datasets, so the next command lists them:
>listDatasets(useMart("plants_variations_26"))
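Note: the double quotes in this command are required. Without them, R treats plants_variations_26 as the name of a variable and reports an error because no such object exists. The command returns: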
dataset
1 athaliana_eg_snp
2 athaliana_eg_structvar
3 osativa_eg_snp
4 sbicolor_eg_structvar
5 sbicolor_eg_snp
6 vvinifera_eg_snp
7 oglaberrima_eg_snp
8 hvulgare_eg_snp
9 oindica_eg_snp
10 slycopersicum_eg_snp
11 bdistachyon_eg_snp
12 taestivum_eg_snp
13 zmays_eg_snp
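There are 13 variation datasets here: 11 SNP datasets and 2 structural-variation datasets (athaliana_eg_structvar and sbicolor_eg_structvar). The same output also includes a description column for each dataset (the notes in parentheses at the end of some lines are my own additions):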
description
1 Arabidopsis thaliana variations (TAIR10 (2010-09-TAIR10))
2 Arabidopsis thaliana structural Variations (TAIR10 (2010-09-TAIR10))
3 Oryza sativa Japonica variations (IRGSP-1.0 (IRGSP-1.0))
4 Sorghum bicolor structural Variations (Sorbi1 (2007-12-JGI))(sorghum)
5 Sorghum bicolor variations (Sorbi1 (2007-12-JGI))
6 Vitis vinifera variations (IGGP_12x (2012-07-CRIBI))(grape)
7 Oryza glaberrima variations (AGI1.1 (2011-05-AGI))
8 Hordeum vulgare variations (IBSC_1.0 (IBSC_1.0))(barley)
9 Oryza sativa Indica variations (ASM465v1 (2010-07-BGI))
10 Solanum lycopersicum variations (SL2.40 (ITAG2.3))(tomato)
11 Brachypodium distachyon variations (v1.0 (2010-02-Brachy1.2))(purple false brome, a grass related to wheat)
12 Triticum aestivum variations (IWGSC2 (2.2))(bread wheat)
13 Zea mays variations (AGPv3 (5b))(maize)
and a version column giving the reference assembly version:
version
1 TAIR10 (2010-09-TAIR10)
2 TAIR10 (2010-09-TAIR10)
3 IRGSP-1.0 (IRGSP-1.0)
4 Sorbi1 (2007-12-JGI)
5 Sorbi1 (2007-12-JGI)
6 IGGP_12x (2012-07-CRIBI)
7 AGI1.1 (2011-05-AGI)
8 IBSC_1.0 (IBSC_1.0)
9 ASM465v1 (2010-07-BGI)
10 SL2.40 (ITAG2.3)
11 v1.0 (2010-02-Brachy1.2)
12 IWGSC2 (2.2)
13 AGPv3 (5b)
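In this listing, every dataset name follows a `<species>_eg_<type>` pattern. Splitting the names on that pattern separates the 11 SNP datasets from the 2 structural-variation ones. The sketch below assumes the naming pattern holds. split_datasets is a hypothetical helper, and the input is the dataset column printed above; in a live session it would be listDatasets(useMart("plants_variations_26"))$dataset:

```r
# Hypothetical helper: split names like "sbicolor_eg_snp" into species and type.
split_datasets <- function(datasets) {
  parts <- strsplit(datasets, "_eg_", fixed = TRUE)
  data.frame(
    dataset = datasets,
    species = vapply(parts, function(p) p[1], character(1)),
    type    = vapply(parts, function(p) p[2], character(1)),
    stringsAsFactors = FALSE
  )
}

# Dataset names copied from the listDatasets() output above.
datasets <- c("athaliana_eg_snp", "athaliana_eg_structvar", "osativa_eg_snp",
              "sbicolor_eg_structvar", "sbicolor_eg_snp", "vvinifera_eg_snp",
              "oglaberrima_eg_snp", "hvulgare_eg_snp", "oindica_eg_snp",
              "slycopersicum_eg_snp", "bdistachyon_eg_snp", "taestivum_eg_snp",
              "zmays_eg_snp")

ds <- split_datasets(datasets)
# Count datasets per type: 11 "snp" and 2 "structvar".
table(ds$type)
```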
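Now suppose I am interested in the sorghum structural-variation dataset (sbicolor_eg_structvar).
First, create a variable that holds a connection to this dataset: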
> mart_sbicolor_eg_structvar <- useMart(biomart = "plants_variations_26", dataset = "sbicolor_eg_structvar")
Once connected, typing mart_sbicolor_eg_structvar should print something like this:
> mart_sbicolor_eg_structvar
Object of class 'Mart':
Using the plants_variations_26 BioMart database
Using the sbicolor_eg_structvar dataset
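The mart name carries a release number (`_26`), and the server that useMart() contacts by default in this post may no longer serve it. Making the host explicit can therefore help. The sketch below is minimal and rests on several assumptions. connect_sv_mart is a hypothetical helper. It only forwards arguments to biomaRt's useMart() and passes `host` through only when you give one:

```r
# Hypothetical helper: connect to a BioMart dataset, optionally on an explicit host.
connect_sv_mart <- function(biomart = "plants_variations_26",
                            dataset = "sbicolor_eg_structvar",
                            host = NULL) {
  args <- list(biomart = biomart, dataset = dataset)
  # Only pass `host` when given, so biomaRt's own default is used otherwise.
  if (!is.null(host)) args$host <- host
  do.call(biomaRt::useMart, args)
}
```

For example, connect_sv_mart(biomart = "plants_variations", host = "https://plants.ensembl.org") assumes two things: that the Ensembl Plants BioMart lives at that address, and that its variation mart is now named without the release suffix. Check both with listMarts(host = "https://plants.ensembl.org") before relying on them.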
Naturally, I want to know what this dataset can provide:
>listAttributes(mart_sbicolor_eg_structvar)
Output:
name description
1 dgva_study_accession DGVa study accession
2 study_description Study description
3 sv_variant_type Variant type
4 sv_accession Structural variation name
5 description Structural variation description
6 strain_name Strain name
7 strain_description Strain description
8 source_name Source name
9 external_reference Pubmed ID
10 validation_status Validation status
11 chr_name Chromosome name
12 chrom_start Sequence region start
13 chrom_end Sequence region end
14 seq_region_strand Strand
15 inner_start Inner start
16 inner_end Inner end
17 outer_start Outer start
18 outer_end Outer end
19 set_name SV Set name
20 set_description SV Set description
21 variation_name_20116 SSV accession
22 class_name SSV variant type
23 ssv_strain_name SSV strain name
24 sample_name SSV sample name
25 clinical_significance Clinical significance
26 seq_region_name_20116 SSV chromosome name
27 seq_region_start_20116 SSV sequence region start
28 seq_region_end_20116 SSV sequence region end
29 seq_region_strand_20116 SSV strand
30 inner_start_20116 SSV inner start
31 inner_end_20116 SSV inner end
32 outer_start_20116 SSV outer start
33 outer_end_20116 SSV outer end
.............................................................
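Once you know the attribute names, getBM() does the actual download. The sketch below is minimal: fetch_sv and summarise_sv are hypothetical helpers, and the attributes come from the listAttributes() table above. No filters are applied, so the whole dataset is requested. The SV length is assumed to be chrom_end - chrom_start + 1, i.e. 1-based inclusive coordinates:

```r
# Hypothetical helper: download a few structural-variation columns.
# `mart` is a Mart object such as mart_sbicolor_eg_structvar.
fetch_sv <- function(mart) {
  biomaRt::getBM(
    attributes = c("sv_accession", "sv_variant_type",
                   "chr_name", "chrom_start", "chrom_end"),
    mart = mart
  )
}

# Hypothetical helper: add a length column, then count variants and
# take the median length per variant type.
summarise_sv <- function(sv) {
  sv$length <- sv$chrom_end - sv$chrom_start + 1
  by_type <- split(sv$length, sv$sv_variant_type)
  data.frame(
    sv_variant_type = names(by_type),
    n = unname(vapply(by_type, length, integer(1))),
    median_length = unname(vapply(by_type, stats::median, numeric(1))),
    stringsAsFactors = FALSE
  )
}
```

Typical use would be summarise_sv(fetch_sv(mart_sbicolor_eg_structvar)). To restrict the query (for example to one chromosome), getBM() also accepts `filters` and `values`. List the valid filter names for this dataset with listFilters(mart_sbicolor_eg_structvar) rather than guessing them.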
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社 (China Science Daily)