1. Climate monitoring datasets: http://cdiac.ornl.gov/ftp/ndp026b
2. Several useful websites for downloading test datasets
Data for MATLAB hackers (Handwritten Digits, Faces, Text)
http://www.cs.toronto.edu/~roweis/data.html
3. UCI KDD Archive (assorted datasets)
http://kdd.ics.uci.edu/summary.task.type.html
http://kdd.ics.uci.edu/summary.data.type.html
4. Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn/MLRepository.htm
5. Sample databases
http://kdd.ics.uci.edu/
Manually classified WWW pages:
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
6. CMU World Wide Knowledge Base (Web->KB) project (classified web pages, relational data describing pages and hyperlinks)
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
7. Artificial intelligence and machine learning
http://duch-links.wikispaces.com/
8. Text classification (the datasets used with rainbow)
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
9. Statlib: statistics software libraries
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
http://lib.stat.cmu.edu/datasets/
http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=viewdownload&cid=2
10. Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
11. Financial and medical data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
12. Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/
13. KDnuggets links to assorted datasets:
http://www.kdnuggets.com/datasets/index.html
14. Intelligent analysis and information systems (Germany)
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
15. IBM intelligent information
http://www-958.ibm.com/software/data/cognos/manyeyes/datasets
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
16. Frequent Set Counting
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/datasets.php
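Frequent set counting means finding the itemsets that occur in at least a minimum number of transactions. The sketch below is a brute-force support counter (no Apriori pruning), assuming FIMI-style input where each line is one transaction of whitespace-separated integer item ids; the function name frequent_itemsets is hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(lines, min_support, max_size=2):
    """Count every itemset of size 1..max_size and keep those with
    support >= min_support. Brute force: fine for small data only.

    Assumed input format: one transaction per line, items as
    whitespace-separated integers (FIMI-style).
    """
    counts = Counter()
    for line in lines:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(int(tok) for tok in line.split()))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical transactions in the assumed format.
sample = ["1 2 3", "1 2", "2 3", "1 2 4"]
print(frequent_itemsets(sample, min_support=3))  # {(1,): 3, (2,): 4, (1, 2): 3}
```

Real frequent-set miners avoid enumerating every combination; this version only illustrates what "support" counts.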
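17. Rating datasets
MovieLens movie-rating data
Description: it comprises the following three datasets:
a. 100,000 ratings of 1,682 movies by 943 users
b. 1 million ratings of 3,900 movies by 6,040 users
c. 10 million ratings of 10,681 movies by 71,567 users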
http://www.grouplens.org/
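A minimal sketch of loading such rating data and measuring how sparse it is, assuming the 100K release's u.data layout of tab-separated user id, item id, rating and Unix timestamp; the helper names parse_ratings and summarize are hypothetical.

```python
from collections import namedtuple

# Assumed layout of the 100K release's u.data file:
# user id <TAB> item id <TAB> rating <TAB> unix timestamp
Rating = namedtuple("Rating", ["user_id", "item_id", "rating", "timestamp"])


def parse_ratings(lines):
    """Parse tab-separated rating lines, skipping blank lines."""
    ratings = []
    for line in lines:
        if not line.strip():
            continue
        user, item, rating, ts = line.strip().split("\t")
        ratings.append(Rating(int(user), int(item), int(rating), int(ts)))
    return ratings


def summarize(ratings):
    """Return (#users, #items, #ratings, density of the user-item matrix)."""
    users = {r.user_id for r in ratings}
    items = {r.item_id for r in ratings}
    density = len(ratings) / (len(users) * len(items))
    return len(users), len(items), len(ratings), density


# Hypothetical sample lines in the assumed format.
sample = ["1\t10\t4\t874965758", "1\t20\t3\t874965800", "2\t10\t5\t875000000"]
print(summarize(parse_ratings(sample)))  # (2, 2, 3, 0.75)
```

Applied to the figures quoted for the 100K set, the density is 100,000 / (943 × 1,682) ≈ 0.063, so only about 6% of user-movie pairs carry a rating.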
Book-Crossing book-rating data
Description: 1,149,780 ratings of 271,379 books by 278,858 users. Cai-Nicolas Ziegler collected the data by crawling the Book-Crossing community over four weeks in August and September 2004.
http://www.informatik.uni-freiburg.de/~cziegler/BX/
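A sketch of reading Book-Crossing style ratings with Python's csv module. It assumes, without confirmation from this list, that the ratings file is semicolon-delimited with double-quoted fields and the header "User-ID";"ISBN";"Book-Rating", and that a rating of 0 marks an implicit interaction rather than a score; split_ratings is a hypothetical name.

```python
import csv
import io


def split_ratings(text):
    """Split Book-Crossing style ratings into explicit (1-10) and implicit (0).

    Assumption: semicolon-delimited, double-quoted fields with the header
    "User-ID";"ISBN";"Book-Rating", where a rating of 0 marks an implicit
    interaction rather than a score.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    explicit, implicit = [], []
    for row in reader:
        record = (row["User-ID"], row["ISBN"], int(row["Book-Rating"]))
        if record[2] == 0:
            implicit.append(record)
        else:
            explicit.append(record)
    return explicit, implicit


# Hypothetical rows in the assumed format.
sample = '"User-ID";"ISBN";"Book-Rating"\n"1";"0000000001";"0"\n"2";"0000000002";"7"\n'
explicit, implicit = split_ratings(sample)
print(len(explicit), len(implicit))  # 1 1
```

ISBNs are kept as strings because they can end in "X". When reading the actual file from disk, a single-byte encoding such as latin-1 may be needed; that is also an assumption.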
Jester Joke Data Set (joke ratings)
A recommender-system dataset released by Ken Goldberg of UC Berkeley. It contains 4.1 million continuous ratings of 100 jokes from 73,496 users.
http://www.ieor.berkeley.edu/~goldberg/jester-data/
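Because the Jester ratings are continuous and most users rate only some jokes, a missing value has to be skipped when aggregating. The sketch below assumes each row is [number of jokes rated, r1, r2, ...], ratings on a -10.00 to +10.00 scale, and 99 as the "not rated" sentinel, with the rows already loaded into Python lists; joke_means is a hypothetical name.

```python
NOT_RATED = 99  # assumed sentinel for "not rated" in the Jester files


def joke_means(rows):
    """Mean rating per joke, ignoring the sentinel; None for jokes nobody rated.

    Each row is assumed to be [number_of_jokes_rated, r1, r2, ...] with
    ratings on a continuous -10.00 .. +10.00 scale.
    """
    n_jokes = len(rows[0]) - 1
    sums = [0.0] * n_jokes
    counts = [0] * n_jokes
    for row in rows:
        # Skip column 0 (the per-user count) and accumulate real ratings only.
        for j, r in enumerate(row[1:]):
            if r != NOT_RATED:
                sums[j] += r
                counts[j] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical rows with 3 jokes instead of 100.
rows = [[2, 5.0, -3.0, 99], [1, 99, 1.0, 99]]
print(joke_means(rows))  # [5.0, -1.0, None]
```

Treating 99 as an ordinary number would pull every mean far above the +10 ceiling, which is why the sentinel check matters.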
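Netflix dataset
Also a movie-rating dataset: 480,189 users, 17,770 movies and 100,480,507 ratings. By comparison, the 1-million-rating MovieLens set is about two orders of magnitude smaller. MovieLens will probably be gradually displaced by the Netflix data, a natural consequence of datasets growing over time.
Note: all four of the above are user-rating datasets.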
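The "orders of magnitude" comparison depends on which MovieLens release is meant. A short calculation using the rating counts quoted above (nominal MovieLens sizes) makes this explicit; orders_of_magnitude_gap is a hypothetical helper that takes the base-10 log of the size ratio.

```python
import math

NETFLIX_RATINGS = 100_480_507
# Nominal MovieLens sizes as listed in section 17.
MOVIELENS_RATINGS = {"100K": 100_000, "1M": 1_000_000, "10M": 10_000_000}


def orders_of_magnitude_gap(smaller, larger=NETFLIX_RATINGS):
    """log10 of the size ratio: how many orders of magnitude apart two counts are."""
    return math.log10(larger / smaller)


for name, n in MOVIELENS_RATINGS.items():
    print(f"MovieLens {name}: {orders_of_magnitude_gap(n):.2f}")
# MovieLens 100K: 3.00
# MovieLens 1M: 2.00
# MovieLens 10M: 1.00
```

So the gap is two orders of magnitude against the 1M release, but three against 100K and only one against 10M.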
18. GPS trajectory data
GeoLife GPS Trajectories
http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/default.aspx
GPS Trajectories with transportation mode labels
http://research.microsoft.com/apps/pubs/?id=141896
Movebank animal tracking data
http://www.movebank.org/
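A common first step with GPS trajectories is measuring distance travelled. This sketch uses the haversine great-circle formula on a spherical Earth (radius 6371 km). The parse_plt helper assumes a GeoLife-style text file with 6 header lines followed by comma-separated rows whose first two fields are latitude and longitude; that layout, and all function names here, are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def parse_plt(lines, header_lines=6):
    """Extract (lat, lon) pairs, assuming fixed header lines then CSV rows."""
    points = []
    for line in lines[header_lines:]:
        fields = line.strip().split(",")
        if len(fields) >= 2:
            points.append((float(fields[0]), float(fields[1])))
    return points


def trajectory_length_km(points):
    """Sum of haversine distances between consecutive points."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# Hypothetical file contents: 6 placeholder header lines, then 2 fixes.
lines = ["h1", "h2", "h3", "h4", "h5", "h6", "0.0,0.0,0", "0.0,1.0,0"]
print(round(trajectory_length_km(parse_plt(lines)), 1))  # 111.2
```

One degree of longitude at the equator comes out to about 111.2 km, a quick sanity check on the formula.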
19. Mobile phone, Wi-Fi and Bluetooth data
A Community Resource for Archiving Wireless Data At Dartmouth
http://crawdad.cs.dartmouth.edu/
crowdflow: mobile phone and Wi-Fi traces
http://crowdflow.net/
20. OpenStreetMap Data
planet.openstreetmap.org or http://metro.teczno.com/
21. OpenPaths: uploaded data + API
https://openpaths.cc/
22. FOURSQUARE
23. GeoTime
http://www.geotime.com/GeoTime(s)/January-2012/Cupid-Strikes-Again--Time-Series---GIS--Together-a.aspx
24. 數據堂 (Datatang)
http://www.datatang.com/
25. http://www.kdnuggets.com/datasets/
26. http://appsrv.cse.cuhk.edu.hk/~kdd/data_collection.html
IBM Almaden Research Center Data Mining Projects
Data Sets:
· Synthetic Data Generation Code for Associations and Sequential Patterns
· Synthetic Data Generation Code for Classification
· "Dense" Data-Sets (apriori binary format, 3.2Mb)
· Enron Email Data Set
Demos:
· General Visualizations for Associations
· Visualization Demo: Market Basket Analysis
IBM Intelligent Miner:
· IBM Intelligent Miner for Data
· Video and image clips from IBM Data Mining T.V. Ad
IBM Data Mining Resources:
· Business Intelligence Solutions: our colleagues offering data mining consultancy and services.
· Data Abstraction Research Group: our colleagues at the IBM Thomas J. Watson Research Center. Our colleagues in France.
· Data Mining: Extending the Information Warehouse Framework: IBM white paper on data mining.
The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html
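The Reuters-21578 collection is distributed as SGML files rather than ready-made tables. A regex-based sketch of pulling out the train/test split, topic labels and body text follows; it assumes each document is wrapped in a REUTERS element carrying a LEWISSPLIT attribute, with topics as D elements inside TOPICS and the text inside BODY. It is not a full SGML parser, and parse_reuters is a hypothetical name.

```python
import re

# Assumed document structure; attribute values contain no '>' characters.
DOC_RE = re.compile(r"<REUTERS\b(.*?)>(.*?)</REUTERS>", re.S)
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.S)
D_RE = re.compile(r"<D>(.*?)</D>")
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.S)
SPLIT_RE = re.compile(r'LEWISSPLIT="(\w+)"')


def parse_reuters(sgml):
    """Return one dict per document with its split, topic labels and body."""
    docs = []
    for attrs, inner in DOC_RE.findall(sgml):
        topics_block = TOPICS_RE.search(inner)
        topics = D_RE.findall(topics_block.group(1)) if topics_block else []
        body = BODY_RE.search(inner)
        split = SPLIT_RE.search(attrs)
        docs.append({
            "split": split.group(1) if split else None,
            "topics": topics,
            "body": body.group(1).strip() if body else "",
        })
    return docs


# Hypothetical document in the assumed format.
sample = (
    '<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" NEWID="1">\n'
    "<TOPICS><D>cocoa</D></TOPICS>\n"
    "<TEXT><TITLE>T</TITLE><BODY>Cocoa prices rose.</BODY></TEXT>\n"
    "</REUTERS>"
)
print(parse_reuters(sample))
# [{'split': 'TRAIN', 'topics': ['cocoa'], 'body': 'Cocoa prices rose.'}]
```

Entities such as &lt; are left undecoded here; html.unescape could be applied to the body if needed.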
Websites on data mining for funds:
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
Related:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.phys.uni.torun.pl/~duch/software.html
WEKA:
1. A jar file containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jar file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jar file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
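The WEKA dataset collections are in WEKA's ARFF text format: an @relation line, @attribute declarations, then comma-separated rows after @data, with % starting a comment. A minimal reader sketch follows; it handles only dense rows with unquoted names and values (sparse rows and quoted strings containing commas or spaces are not supported), and parse_arff is a hypothetical name.

```python
def parse_arff(text):
    """Minimal ARFF reader: returns (relation, attribute names, rows).

    Handles @relation, @attribute and @data lines, '%' comments and
    comma-separated dense rows only. Quoted names/values and sparse
    rows are NOT supported.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()  # ARFF keywords are case-insensitive
        if not in_data:
            if lower.startswith("@relation"):
                relation = line.split(None, 1)[1]
            elif lower.startswith("@attribute"):
                attributes.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
        else:
            rows.append([v.strip() for v in line.split(",")])
    return relation, attributes, rows


# Hypothetical ARFF content.
text = (
    "% comment\n@relation iris\n@attribute sepallength numeric\n"
    "@attribute class {Iris-setosa,Iris-versicolor}\n@data\n5.1,Iris-setosa\n"
)
print(parse_arff(text))
# ('iris', ['sepallength', 'class'], [['5.1', 'Iris-setosa']])
```

A .jar file is a ZIP archive, so Python's zipfile module can list and read the .arff members without unpacking the jar by hand.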
Data mining competitions and datasets
2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.
ILP 2005 Challenge, on the prediction of functional classes of genes.
KDD Cup 2005, on classifying internet user search queries, deadline July 8.
Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: how data mining can assess the risk of payment losses and reduce it.
KDD Cup 2004, focusing on data mining for several performance criteria using datasets from bioinformatics and quantum physics.
InfoVis 2004 Contest, The History of InfoVis.
DATA MINING CUP 2004 (Chemnitz, Germany), for students.
InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.
KDD CUP 2003
http://www.cs.cornell.edu/projects/kddcup/index.html
KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs.
DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.
KDD Cup 2002, focusing on data mining in molecular biology.
Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.