Chenfiona's personal blog http://blog.sciencenet.cn/u/Chenfiona

[Repost] [Resource Sharing] 186 Public Datasets in 9 Categories

Viewed 1944 times | 2019-8-9 09:16 | Personal category: Latest News | System category: Blog News | Source: Repost

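This week IJAC shares public datasets in nine categories: complex networks, computer networks, data challenges, image processing, machine learning, natural language, neuroscience, social networks, and transportation. Most are free to use. Click a dataset name to view its details. More practical tools, latest news, and invited articles are posted every week!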


Complex Networks

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) AMiner Citation Network Dataset

2) CrossRef DOI URLs

3) DIMACS Road Networks Collection

4) NBER Patent Citations

5) NIST complex networks data collection

6) Network Repository with Interactive Exploratory Analysis Tools

7) Protein-protein interaction network

8) PyPI and Maven Dependency Network

9) Scopus Citation Database

10) Small Network Data

11) Stanford GraphBase

12) Stanford Large Network Dataset Collection

13) The Laboratory for Web Algorithmics (UNIMI)

14) UCI Network Data Repository

15) UFL sparse matrix collection


Computer Networks

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) 3.5B Web Pages from CommonCrawl 2012

2) 53.5B Web clicks of 100K users in Indiana Univ.

3) CAIDA Internet Datasets

4) CRAWDAD Wireless datasets from Dartmouth Univ.

5) ClueWeb09 - 1B web pages

6) ClueWeb12 - 733M web pages

7) CommonCrawl Web Data over 7 years

8) Criteo click-through data

9) Internet-Wide Scan Data Repository

10) OONI: Open Observatory of Network Interference - Internet censorship data

11) Open Mobile Data by MobiPerf

12) The Peer-to-Peer Trace Archive - Real-world measurements play a key role [...]

13) Rapid7 Sonar Internet Scans

14) UCSD Network Telescope, IPv4 /8 net


Data Challenges

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) Bruteforce Database

2) Challenges in Machine Learning

3) CrowdANALYTIX dataX

4) DrivenData Competitions for Social Good

5) ICWSM Data Challenge (since 2009)

6) KDD Cup by Tencent 2012

7) Kaggle Competition Data

8) Localytics Data Visualization Challenge

9) Netflix Prize

10) Space Apps Challenge

11) Telecom Italia Big Data Challenge

12) TravisTorrent Dataset - MSR'2017 Mining Challenge

13) TunedIT - Data mining & machine learning data sets, algorithms, challenges

14) Yelp Dataset Challenge


Image Processing

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) 10k US Adult Faces Database

2) 2GB of Photos of Cats

3) Adience Unfiltered faces for gender and age classification

4) Affective Image Classification

5) Animals with attributes

6) CADDY Underwater Stereo-Vision Dataset of divers' hand gestures - [...]

7) Caltech Pedestrian Detection Benchmark

8) Chars74K dataset - Character Recognition in Natural Images (both English [...]

9) Danbooru Tagged Anime Illustration Dataset - A large-scale anime image [...]

10) Face Recognition Benchmark

11) Flickr: 32 Class Brand Logos

12) GDXray - X-ray images for X-ray testing and Computer Vision

13) HumanEva Dataset - The HumanEva-I dataset contains 7 calibrated video [...]

14) ImageNet (in WordNet hierarchy)

15) Indoor Scene Recognition

16) International Affective Picture System, UFL

17) KITTI Vision Benchmark Suite

18) Labeled Information Library of Alexandria - Biology and Conservation - [...]

19) MNIST database of handwritten digits, 70,000 examples (60,000 training, 10,000 test)

20) Massive Visual Memory Stimuli, MIT

21) Open Images From Google - Pictures with segmentation masks for 2.8 [...]

22) Stanford Dogs Dataset

23) The Action Similarity Labeling (ASLAN) Challenge

24) The Oxford-IIIT Pet Dataset

25) Violent-Flows - Crowd Violence / Non-violence Database and benchmark

26) Visual genome

27) YouTube Faces Database


Machine Learning

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) All-Age-Faces Dataset - Contains 13,322 Asian face images distributed [...]

2) Context-aware data sets from five domains

3) Delve Datasets for classification and regression

4) Discogs Monthly Data

5) IMDb Database

6) Keel Repository for classification, regression and time series

7) Labeled Faces in the Wild (LFW)

8) Lending Club Loan Data

9) Million Song Dataset

10) More Song Datasets

11) MovieLens Data Sets

12) New Yorker caption contest ratings

13) RDataMining - "R and Data Mining" ebook data

14) Registered Meteorites on Earth

15) Restaurants Health Score Data in San Francisco

16) UCI Machine Learning Repository

17) Yahoo! Ratings and Classification Data

18) YouTube-BoundingBoxes

19) YouTube-8M

20) eBay Online Auctions (2012)


Natural Language

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) Automatic Keyphrase Extraction

2) Blizzard Challenge Speech - The speech + text data comes from [...]

3) Blogger Corpus

4) CLiPS Stylometry Investigation Corpus

5) ClueWeb09 FACC

6) ClueWeb12 FACC

7) DBpedia - 4.58M things with 583M facts

8) Flickr Personal Taxonomies

9) Freebase of people, places, and things

10) German Political Speeches Corpus - Collection of political speeches from [...]

11) Google Books Ngrams (2.2TB)

12) Google MC-AFP - Generated based on the public available Gigaword dataset [...]

13) Google Web 5gram (1TB, 2006)

14) Gutenberg eBooks List

15) Hansards text chunks of Canadian Parliament

16) LJ Speech - Speech dataset consisting of 13,100 short audio clips of a [...]

17) Microsoft MAchine Reading COmprehension Dataset (or MS MARCO)

18) Machine Comprehension Test (MCTest) of text from Microsoft Research

19) Machine Translation of European languages

20) Making Sense of Microposts 2016 - Named Entity rEcognition and Linking

21) Multi-Domain Sentiment Dataset (version 2.0)

22) Noisy speech database for training speech enhancement algorithms and TTS [...]

23) Open Multilingual Wordnet

24) POS/NER/Chunk annotated data

25) Personae Corpus

26) SMS Spam Collection in English

27) SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles)

28) Stanford Question Answering Dataset (SQuAD)

29) USENET postings corpus of 2005~2011

30) Universal Dependencies

31) Webhose - News/Blogs in multiple languages

32) Wikidata - Wikipedia databases

33) Wikipedia Links data - 40 Million Entities in Context

34) WordNet databases and tools

35) WorldTree Corpus of Explanation Graphs for Elementary Science Questions - [...]


Neuroscience

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) Brain Catalogue

2) Brainomics

3) Collaborative Research in Computational Neuroscience (CRCNS)

4) FCP-INDI

5) Human Connectome Project

6) NDAR

7) NIMH Data Archive

8) NeuroData

9) NeuroMorpho - NeuroMorpho.Org is a centrally curated inventory of [...]

10) Neuroelectro

11) OASIS

12) OpenNEURO

13) OpenfMRI

14) Study Forrest


Social Networks

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) 72 hours #gamergate Twitter Scrape

2) Ancestry.com Forum Dataset over 10 years

3) CMU Enron Email of 150 users

4) Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape

5) EDRM Enron EMail of 151 users, hosted on S3

6) Facebook Data Scrape (2005)

7) Facebook Social Networks from LAW (since 2007)

8) Foursquare from UMN/Sarwat (2013)

9) GitHub Collaboration Archive

10) Google Scholar citation relations

11) High-Resolution Contact Networks from Wearable Sensors

12) Indie Map: social graph and crawl of top IndieWeb sites

13) Network Twitter Data

14) Reddit Comments

15) Skytrax' Air Travel Reviews Dataset

16) Social Twitter Data

17) SourceForge.net Research Data

18) Twitter Data for Online Reputation Management

19) Twitter Data for Sentiment Analysis

20) Twitter Graph of entire Twitter site

21) UNIMI/LAW Social Network Datasets

22) United States Congress Twitter Data - Daily datasets with tweets of 1100+ [...]

23) Yahoo! Graph and Social Data

24) YouTube Video Social Graph in 2007, 2008


Transportation

(Click a dataset name to view details; some datasets can only be accessed from an overseas IP address.)


1) Airlines OD Data 1987-2008

2) Ford GoBike Data (formerly Bay Area Bike Share Data)

3) Bike Share Systems (BSS) collection

4) Dutch Traffic Information

5) GeoLife GPS Trajectory from Microsoft Research

6) German train system by Deutsche Bahn

7) Hubway Million Rides in MA

8) Montreal BIXI Bike Share

9) NYC Taxi Trip Data 2009-

10) NYC Taxi Trip Data 2013 (FOIA/FOILed)

11) NYC Uber trip data April 2014 to September 2014

12) Open Traffic collection

13) OpenFlights - airport, airline and route data

14) Philadelphia Bike Share Stations (JSON)

15) Plane Crash Database, since 1920

16) RITA Airline On-Time Performance data

17) RITA/BTS transport data collection (TranStat)

18) Toronto Bike Share Stations (JSON and GBFS files)

19) Transport for London (TFL)

20) Travel Tracker Survey (TTS) for Chicago

21) U.S. Bureau of Transportation Statistics (BTS)

22) U.S. Domestic Flights 1990 to 2009

23) U.S. Freight Analysis Framework since 2007



For more content, follow us:

1) IJAC official websites:

http://link.springer.com/journal/11633

http://www.ijac.net

2) Linkedin: Int. J. of Automation and Computing

3) Sina Weibo: IJAC-国际自动化与计算杂志

4) Twitter: IJAC_Journal

5) Facebook: ijac journal

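If you have any comments or suggestions about the journal or its articles, feel free to leave a message or contact the editor privately.

Editor of this post: 欧梨成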



http://blog.sciencenet.cn/blog-749317-1192980.html
