jiyanbio1983's personal blog http://blog.sciencenet.cn/u/jiyanbio1983


"How to Master Biological Big Data" series: quick searching with the ArrayExpress website

8829 views 2017-7-16 22:19 | Personal category: Bioinformatics | System category: Research notes

ArrayExpress is a database of functional genomics data (http://www.ebi.ac.uk/arrayexpress/). It stores data from high-throughput functional genomics experiments and provides these data to the research community for reuse. The objective of this post is to show how to search and retrieve data from the ArrayExpress website effectively.



1. What types of data are stored in ArrayExpress?

These are the experiment types currently deposited in ArrayExpress.



2. Programmatic access and retrieval by constructing URLs

The main object in ArrayExpress is the experiment.

  • An experiment usually groups several assays belonging to one study or publication.

  • Each experiment contains metadata describing the biological specimen and experimental procedures, as well as resulting data files.
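
One way to picture this object model is as a small record type. The sketch below is only an illustration of the description above: the class name, the field names and the placeholder accession are assumptions made for this example, not the ArrayExpress schema.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical in-memory view of one ArrayExpress experiment."""
    accession: str  # placeholder identifier, not a real accession
    # Assays grouped under one study or publication.
    assays: list[str] = field(default_factory=list)
    # Metadata describing the biological specimen.
    sample_attributes: dict[str, str] = field(default_factory=dict)
    # Metadata describing the experimental procedures.
    protocols: list[str] = field(default_factory=list)
    # Resulting data files.
    data_files: list[str] = field(default_factory=list)


exp = Experiment(accession="E-XXXX-1")
exp.assays += ["assay 1", "assay 2"]
exp.sample_attributes["organism part"] = "colon"
print(len(exp.assays), exp.sample_attributes)
```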

Experiments can be searched by keyword and by specific fields (e.g. sample attributes or experiment types).

2.1 Keyword searches for experiments

Keyword searches across all fields of experiments, and of the files linked to them, use URLs of the following form:
1. https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments?keywords=prostate+AND+breast
2. https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments?keywords=prostate+breast (same as above)
3. https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments?keywords=prostate+OR+breast
4. https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments?keywords=prostate+NOT+breast
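
The four URL forms above can also be generated programmatically. The following is a minimal Python sketch: the function name `keyword_search_url` and its parameters are illustrative rather than part of ArrayExpress, and the endpoint is copied from the URLs in this post, so it may have changed since the post was written.

```python
from urllib.parse import quote

# Endpoint copied from the example URLs in this post (assumed, not verified).
BASE_URL = "https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments"


def keyword_search_url(terms, operator="AND"):
    """Build a keyword-search URL joining `terms` with AND, OR or NOT."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    if not terms:
        raise ValueError("at least one term is required")
    # Percent-encode each term, then join with '+OPERATOR+' as in the post.
    encoded = [quote(term, safe="") for term in terms]
    return f"{BASE_URL}?keywords=" + f"+{operator}+".join(encoded)


for op in ("AND", "OR", "NOT"):
    print(keyword_search_url(["prostate", "breast"], op))
```

The sketch only builds the URL string; fetching it is left to whichever HTTP client you prefer.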

A few points to note when using keyword search:

  • Accession number and keyword searches are case insensitive

  • Use an asterisk * as a multiple-character wildcard, e.g. keywords=colo* will search for colon, colorectal, color, etc.

  • Use a question mark ? as a single-character wildcard, e.g. keywords=te?t will search for text and test

  • Phrases of more than one word must be entered in quotes, e.g. keywords="growth condition"

  • More than one keyword can be searched for using the '+' sign, e.g. keywords=lung+cancer
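
The wildcard and phrase rules above can be tried out locally before building a query. The sketch below is an approximation under stated assumptions: it uses Python's `fnmatch` module to imitate the `*` and `?` semantics, which may differ in detail from the server-side matching, and the helper names `matches` and `keyword_param` are made up for this example.

```python
from fnmatch import fnmatchcase
from urllib.parse import quote


def matches(pattern, word):
    """Case-insensitive match: * is any run of characters, ? is exactly one."""
    return fnmatchcase(word.lower(), pattern.lower())


def keyword_param(keyword):
    """Return a 'keywords=...' fragment, quoting multi-word phrases."""
    if " " in keyword:
        keyword = f'"{keyword}"'  # phrases must be wrapped in double quotes
    # Keep the wildcard characters literal; encode quotes and spaces.
    return "keywords=" + quote(keyword, safe="*?")


for word in ("colon", "colorectal", "color", "cold"):
    print(word, matches("colo*", word))
print(keyword_param("growth condition"))
```

Here the quoted phrase is encoded with %22 and %20, the same encoding used in the field-search URL in section 3.2.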

2.2 Key fields for searching


3. Case Studies



What experiment data types are frequently used to study stem cells?

What experiment study designs are frequently used to study stem cells?

What cell types are frequently used to study stem cells?

What cell lines are frequently used to study stem cells?

What organism parts are frequently used to study stem cells?

What antibodies are frequently used to study stem cells?

What disease states are frequently studied in stem-cell-related projects?
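
Questions of this kind come down to counting how often each value of a field occurs across the experiments returned by a search. The sketch below shows one way to do that counting in Python. It is not the method used for this post: the XML fragment is a made-up stand-in with placeholder accessions, and the element names (`experiment`, `experimenttype`) are assumptions that should be checked against a real response.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical response fragment: structure and element names are assumed.
SAMPLE_XML = """
<experiments>
  <experiment>
    <accession>E-XXXX-1</accession>
    <experimenttype>RNA-seq of coding RNA</experimenttype>
  </experiment>
  <experiment>
    <accession>E-XXXX-2</accession>
    <experimenttype>transcription profiling by array</experimenttype>
  </experiment>
  <experiment>
    <accession>E-XXXX-3</accession>
    <experimenttype>transcription profiling by array</experimenttype>
  </experiment>
</experiments>
"""


def count_values(xml_text, tag):
    """Count in how many experiments each text value of `tag` appears."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for experiment in root.iter("experiment"):
        # Use a set so a value repeated inside one experiment counts once.
        values = {
            el.text.strip()
            for el in experiment.iter(tag)
            if el.text and el.text.strip()
        }
        counts.update(values)
    return counts


for value, n in count_values(SAMPLE_XML, "experimenttype").most_common():
    print(n, value)
```

The same function could be pointed at another element name to tabulate cell types, cell lines or disease states, provided the real response exposes them under a predictable tag.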



3.2 Find experiments whose samples have survival-time records in their metadata

Search on particular fields, species="homo sapiens" and sac=survival, which gives the URL:

https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments?species=%22homo%20sapiens%22&sac=survival
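
A field-search URL like the one above can be assembled from a dictionary of field names and values. This is a minimal sketch: `field_search_url` is a hypothetical helper, the endpoint is copied from this post, and only the two fields used here (`species`, `sac`) are taken from the source.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example URL in this post (assumed, not verified).
BASE_URL = "https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments"


def field_search_url(**fields):
    """Build a field-search URL; multi-word values are wrapped in quotes."""
    params = {
        name: f'"{value}"' if " " in value else value
        for name, value in fields.items()
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


print(field_search_url(species="homo sapiens", sac="survival"))
```

With these two arguments the function reproduces the URL shown above, including the %22 and %20 escapes around the species name.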

What experiment data types are frequently used in experiments whose samples have patient survival records?



What diseases are studied by these experiments?



4. Conclusions and References
This post shows that ArrayExpress search is a powerful tool for finding public datasets. I strongly suggest using it to find high-quality public data with which to cross-validate your research findings.
To master this tool, you will need to read further material.











https://blog.sciencenet.cn/blog-3291578-1066688.html
