Datasets (Part 2)

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Some useful sites for downloading test datasets

   Data for MATLAB hackers (Handwritten Digits, Faces, Text)

   http://www.cs.toronto.edu/~roweis/data.html

3. UCI KDD Archive (datasets of many kinds)

   http://kdd.ics.uci.edu/summary.task.type.html

   http://kdd.ics.uci.edu/summary.data.type.html

4. Machine learning datasets collected by UCI

   ftp://pami.sjtu.edu.cn/

   http://www.ics.uci.edu/~mlearn//MLRepository.htm

5. Sample databases

   http://kdd.ics.uci.edu/

   WWW-pages were manually classified

   http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/

6. CMU World Wide Knowledge Base (Web->KB) project (classified web pages, relational data describing pages and hyperlinks)

   http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

7. Artificial intelligence and machine learning

   http://duch-links.wikispaces.com/

8. Text classification: the datasets used with rainbow

   http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

9. StatLib: a library of programs for mathematical statistics

   http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm

   http://lib.stat.cmu.edu/

   http://lib.stat.cmu.edu/datasets/

   http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=viewdownload&cid=2

10. Cancer genes:

   http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

11. Financial and medical data:

   http://lisp.vse.cz/pkdd99/Challenge/chall.htm

12. Sites with time series data

   http://www.stat.wisc.edu/~reinsel/bjr-data/  

13. KDnuggets: links to datasets of all kinds:

   http://www.kdnuggets.com/datasets/index.html

14. German intelligent analysis and information systems

   http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html  

   http://dctc.sjtu.edu.cn/adaptive/datasets/  

   http://fimi.cs.helsinki.fi/data/  

15. IBM intelligent information

   http://www-958.ibm.com/software/data/cognos/manyeyes/datasets

   http://www.almaden.ibm.com/software/quest/Resources/index.shtml

16. Frequent Set Counting

   http://miles.cnuce.cnr.it/~palmeri/datam/DCI/datasets.php

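17. Rating datasets

   MovieLens movie rating data

   Basic description: three datasets are included:

   a. 100,000 ratings of 1,682 movies by 943 users

   b. 1 million ratings of 3,900 movies by 6,040 users

   c. 10 million ratings of 10,681 movies by 71,567 users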

   http://www.grouplens.org/  
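
   The user, movie, and rating counts quoted for the three MovieLens sets give a quick feel for how sparse each rating matrix is. The sketch below is only an illustration: it uses the rounded figures from the description, and the names `MOVIELENS` and `density` are made up for this example rather than taken from any GroupLens tool.

```python
# Rounded figures from the description: (users, movies, ratings).
# These are the approximate counts quoted in the text, not exact file counts.
MOVIELENS = {
    "100K": (943, 1682, 100_000),
    "1M": (6040, 3900, 1_000_000),
    "10M": (71567, 10681, 10_000_000),
}


def density(users: int, items: int, ratings: int) -> float:
    """Return the fraction of the user-by-item matrix that holds a rating."""
    return ratings / (users * items)


if __name__ == "__main__":
    for name, (users, items, ratings) in MOVIELENS.items():
        share = density(users, items, ratings)
        print(f"MovieLens {name}: {share:.2%} of user-movie pairs are rated")
```

   With these figures the larger sets are sparser than the smaller ones, which is worth keeping in mind when choosing one for testing a recommender.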

 

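   Book-Crossing book rating data

   Basic description: contains 1,149,780 ratings of 271,379 books by 278,858 users. Cai-Nicolas Ziegler crawled the dataset from the Book-Crossing community over four weeks in August and September 2004.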

   http://www.informatik.uni-freiburg.de/~cziegler/BX/

 

   Jester Joke Data Set (joke ratings)

   A dataset for recommender systems released by Ken Goldberg of UC Berkeley. It contains 4.1 million continuous ratings of 100 jokes from 73,496 users.

   http://www.ieor.berkeley.edu/~goldberg/jester-data/

 

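   Netflix dataset

   Also a movie rating dataset: 480,189 users, 17,770 movies, and 100,480,507 rating records. By comparison, the MovieLens data is about two orders of magnitude smaller (measured against its 1-million-rating set). Netflix data will probably take over MovieLens's position gradually, as a natural consequence of progress.

   Note: all four of the above are user rating datasets.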
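
   The size gap between the Netflix data and MovieLens depends on which MovieLens set is used for the comparison. As a rough check, the sketch below takes the base-10 logarithm of the ratio of rating counts. The MovieLens counts are the rounded figures from the description, and the function name `orders_of_magnitude` is only illustrative.

```python
import math

NETFLIX_RATINGS = 100_480_507

# Rounded rating counts for the three MovieLens sets described earlier.
MOVIELENS_RATINGS = {"100K": 100_000, "1M": 1_000_000, "10M": 10_000_000}


def orders_of_magnitude(larger: int, smaller: int) -> float:
    """Return log10 of the ratio between two positive counts."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    for name, count in MOVIELENS_RATINGS.items():
        gap = orders_of_magnitude(NETFLIX_RATINGS, count)
        print(f"Netflix vs MovieLens {name}: about {gap:.1f} orders of magnitude")
```

   Under these assumptions the gap is roughly three, two, and one orders of magnitude for the 100K, 1M, and 10M sets respectively.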

18. GPS trajectory data

   GeoLife GPS Trajectories

   http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/default.aspx  

 

   GPS Trajectories with transportation mode labels

   http://research.microsoft.com/apps/pubs/?id=141896

 

   Movebank animal trajectories

   http://www.movebank.org/

19. Mobile phone, Wi-Fi, and Bluetooth data

A Community Resource for Archiving Wireless Data At Dartmouth

   http://crawdad.cs.dartmouth.edu/

   CrowdFlow: mobile phone and Wi-Fi trajectories

   http://crowdflow.net/

20. OpenStreetMap Data

   planet.openstreetmap.org or http://metro.teczno.com/

21. OpenPaths: data upload and API

   https://openpaths.cc/  

22. FOURSQUARE

23. GeoTime

   http://www.geotime.com/GeoTime(s)/January-2012/Cupid-Strikes-Again--Time-Series---GIS--Together-a.aspx  

24. Datatang (數據堂)

   http://www.datatang.com/

25. http://www.kdnuggets.com/datasets/

26. http://appsrv.cse.cuhk.edu.hk/~kdd/data_collection.html

IBM Almaden Research Center Data Mining Projects

Data Sets:

·         Synthetic Data Generation Code for Associations and Sequential Patterns

·         Synthetic Data Generation Code for Classification

·         "Dense" Data-Sets (apriori binary format, 3.2Mb)

·         Enron Email Data Set

Demos:

·         General Visualizations for Associations

·         Visualization Demo: Market Basket Analysis

 

IBM Intelligent Miner:

 

·         IBM Intelligent Miner for Data

·         Video and image clips from IBM Data Mining T.V. Ad

IBM Data Mining Resources:

·         Business Intelligence Solutions   Our colleagues offering data mining consultancy and services.

·         Data Abstraction Research Group   Our colleagues in IBM Thomas J. Watson Research Center.   Our colleagues in France.

·         Data Mining: Extending the Information Warehouse Framework   IBM White Paper on Data Mining.

The Reuters dataset can be found at the following address:

   http://www.research.att.com/~lewis/reuters21578.html

Sites about data mining for investment funds

   http://www.gotofund.com/index.asp

   http://lans.ece.utexas.edu/~strehl/

Reuters dataset

   http://www.research.att.com/~lewis/reuters21578.html

   http://www-2.cs.cmu.edu/webkb

   http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf

Related:

   http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar

   http://www.phys.uni.torun.pl/~duch/software.html

WEKA:

   http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar  

1. A jarfile containing 37 classification problems, originally obtained from the UCI repository

   http://prdownloads.sourceforge.net/weka/datasets-UCI.jar

2. A jarfile containing 37 regression problems, obtained from various sources

   http://prdownloads.sourceforge.net/weka/datasets-numeric.jar

3. A jarfile containing 30 regression datasets collected by Luis Torgo

   http://prdownloads.sourceforge.net/weka/regression-datasets.jar  
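
A jar file is an ordinary zip archive, so the contents of a downloaded WEKA jar can be listed with Python's standard `zipfile` module. The sketch below is a minimal illustration, not WEKA tooling: it assumes the datasets inside are stored as `.arff` files, and the helper names `list_jar_entries` and `make_demo_jar` are invented for the example. It builds a tiny archive in memory so that it runs without downloading anything.

```python
import io
import zipfile


def list_jar_entries(jar_bytes: bytes, suffix: str = ".arff") -> list:
    """Return the sorted names of archive members that end in `suffix`."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(suffix)
        )


def make_demo_jar() -> bytes:
    """Build a small in-memory archive that stands in for a downloaded jar."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member names, chosen only for this demonstration.
        archive.writestr("demo/toy.arff", "@relation toy\n@attribute x numeric\n@data\n1\n")
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_jar_entries(make_demo_jar()):
        print(name)
```

For a real download, read the jar from disk with `open(path, "rb").read()` and pass the bytes to `list_jar_entries`.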

Data mining competitions and their datasets

  • 2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.

  • ILP 2005 Challenge, on the prediction of functional classes of genes.

  • KDD Cup 2005, on classifying internet user search queries, deadline July 8.

  • Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: How data mining can ascertain the risk of loss of payments and reduce this risk.

  • KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.

  •  InfoVis 2004 Contest, The History of InfoVis.

  • DATA MINING CUP 2004 (Chemnitz, Germany), for students.

  • InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.

  • KDD CUP 2003

  •  http://www.cs.cornell.edu/projects/kddcup/index.html

  •  KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs.

  • DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.

  •  KDD Cup 2002, focus on data mining in molecular biology.

  •  Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.
