Reposted from: http://www.cnblogs.com/zzjhn/p/3855566.html
(1) Hadoop installation and deployment
1. Deploying Hadoop under Cygwin on Windows:
http://lib.open-open.com/view/1333428291655
http://blog.csdn.net/ruby97/article/details/7423088
http://blog.csdn.net/savechina/article/details/5656937
2. Hadoop pseudo-distributed installation:
http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/
3. Hadoop fully distributed installation tutorials:
http://hi.baidu.com/leejun_2005/item/367da95bd69f4e0ce6c4a581
http://www.cnblogs.com/flyoung2008/archive/2011/12/09/2281400.html
http://blog.sina.com.cn/s/blog_62186b4601012acs.html
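About the error when Eclipse cannot connect:
"Map/Reduce location status updater". org/codehaus/jackson/map/JsonMappingException
After some searching, the cause turned out to be that the Hadoop Eclipse plugin is missing some JAR packages.
After patching the plugin as described in this article, it ran successfully:
http://hi.baidu.com/wangyucao1989/blog/item/279cef87c4b37c34c75cc315.html
If you have already installed the official plugin and find that it cannot connect, first delete that JAR from Eclipse and restart Eclipse (to avoid caching),
then drop in the new JAR and restart Eclipse again.
Connecting from Eclipse on Windows to Hadoop on Linux and running MapReduce jobs: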
http://superlxw1234.iteye.com/blog/1583164
http://rdc.taobao.com/team/top/tag/hadoop-hive-%E5%8D%81%E5%88%86%E9%92%9F%E6%95%99%E7%A8%8B/
ssh-keygen -t dsa -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
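The two commands above set up passwordless SSH to the local account, which the Hadoop start scripts rely on. Below is a minimal sketch of the same steps as a re-runnable shell function. It rests on a few assumptions that are not in the original: the helper name setup_passwordless_ssh is hypothetical, an RSA key is used in place of DSA because newer OpenSSH releases no longer accept DSA keys by default, and -N '' is added so that the key gets an empty passphrase without prompting.

```shell
#!/bin/sh
# Sketch only: create a passphrase-less key pair and authorize it for
# logins to this same account. setup_passwordless_ssh is a hypothetical
# helper name; its optional argument is the SSH directory (default ~/.ssh).
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"   # assumption: RSA instead of the original DSA

    # sshd ignores keys when the directory is writable by others.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Generate the key only once; -N '' sets an empty passphrase.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N '' -f "$key" || return 1
    fi

    # Append the public key only if it is not already authorized,
    # so running the function twice does not duplicate the entry.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF -- "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling setup_passwordless_ssh with no argument works on ~/.ssh; afterwards, "ssh localhost" should log in without asking for a password if the local sshd allows public-key authentication.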
http://blogread.cn/it/article/6103?f=wb
(2) Hive
1. Log statistics in practice with Hive:
http://www.csdn.net/article/2010-11-28/282620
2. Hive example: the ten most common CSDN passwords
http://my.oschina.net/leejun2005/blog/81662
3. Official Hive tutorial:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
4. Hive Talk (4): Hive QL
http://www.alidata.org/archives/581 # JOIN
http://wenku.baidu.com/view/242260c489eb172ded63b709.html
5. Five tips for writing good Hive programs
http://www.alidata.org/archives/622 # sorting
6. Introduction to Hive, the Hadoop data warehouse tool (Baidu)
http://wenku.baidu.com/view/90dad7659b6648d7c1c7460e.html
7. Hive sharing session (Taobao)
http://wenku.baidu.com/view/4e4a801ca76e58fafab003b1.html
8. Introduction to Hive (Meilishuo)
http://wenku.baidu.com/view/0f252121a5e9856a56126025.html
9. Hive study notes (Alibaba)
http://wenku.baidu.com/view/233308340b4c2e3f5727632a.html
10. Hive: a petabyte-scale data warehouse using Hadoop (paper)
http://wenku.baidu.com/view/b5aebfe9998fcc22bcd10d8a.html
11. Hive: SQL for Hadoop (An Essential Tool for Hadoop-based Data Warehouses)
http://polyglotprogramming.com/papers/Hive-SQLforHadoop.pdf
12. Programming Hive
http://www.itpub.net/thread-1724707-1-1.html
13. Hive Talk (6): Hive extension features:
File Format, SerDe, Map/Reduce scripts (Transform), UDF, UDAF
http://www.alidata.org/archives/604
14. Summary of data skew in Hive
http://www.alidata.org/archives/2109
15. Querying complex JSON data with Hive
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
https://github.com/rcongiu/Hive-JSON-Serde
16. Hive SQL optimization notes compiled by a colleague
http://hbase.iteye.com/blog/1488745
http://superlxw1234.iteye.com/blog/1564456
17. Querying the Hive data warehouse from Python via the Thrift interface
http://slaytanic.blog.51cto.com/2057708/734106
18. Querying the Hive data warehouse from PHP via the Thrift interface (plus an introduction to phpHiveAdmin)
http://slaytanic.blog.51cto.com/2057708/766230
http://slaytanic.blog.51cto.com/2057708/818721
http://slaytanic.blog.51cto.com/2057708/1071263
19. Notes on Hive SQL usage and data loading
http://slaytanic.blog.51cto.com/2057708/782175
20. Hive optimization: controlling the number of map and reduce tasks in a Hive job
http://superlxw1234.iteye.com/blog/1582880
21. Some practical Hive tips
http://superlxw1234.iteye.com/blog/1565774
22. Data warehouse data models: extreme storage with history zipper tables
http://superlxw1234.iteye.com/blog/1567320
23. Programming Hive reading notes
http://www.gemini5201314.net/hadoop/programing-hive%E8%AF%BB%E4%B9%A6%E7%AC%94%E8%AE%B0.html
(3) Pig
1. Pig in practice
http://www.cnblogs.com/xuqiang/archive/2011/06/06/2073601.html
2. Official Pig tutorial
3. Collection of Apache Pig tutorials in Chinese
http://www.codelast.com/?p=4550
4. Programming Pig
http://ofps.oreilly.com/titles/9781449302641/index.html
5. PigFly: design of a unified Hadoop data analysis platform (Taobao)
http://www.docin.com/p-344188827.html
http://coderplay.iteye.com/blog/1233865
6. Process a Million Songs with Apache Pig (Cloudera)
http://blog.cloudera.com/blog/2012/08/process-a-million-songs-with-apache-pig/
7. Pig Latin: A Not-So-Foreign Language for Data Processing (Stanford University paper)
http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf
8. Lecture 09: Parallel Databases, Big Data, Map/Reduce, Pig-Latin
http://www.cs.washington.edu/education/courses/csep544/11au/lectures/lecture09-parallel-db.pdf
9. Pig Queries Parsing JSON on Amazons Elastic Map Reduce Using S3 Data
https://github.com/a-b/elephant-bird/tree/master/javadoc
10. Pig cookbook: performance tuning
http://pig.apache.org/docs/r0.7.0/cookbook.html
http://pig.apache.org/docs/r0.10.0/perf.html#Replicated-Joins
11. Using Pig streaming:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec
http://www.slideshare.net/charmalloc/hadoop-streaming-tutorial-with-python
(4) Hadoop internals and programming
1. A few small details when using Hadoop
http://blog.csdn.net/needle2/article/details/6182515
2. Understanding the processes and concepts behind MapReduce in Hadoop (browse the index for more):
http://hi.baidu.com/shirdrn/item/085a5518be8bfa797b5f25aa
3. Official Hadoop 0.18 documentation (Chinese edition)
http://hadoop.apache.org/docs/r0.20.0/cn/commands_manual.html
4. IBM developerWorks: Distributed parallel programming with Hadoop, parts 1 to 3
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/index.html
https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/
5. Introduction to Hadoop, an open-source distributed computing framework
http://www.infoq.com/cn/articles/hadoop-intro
6. Hadoop basic workflow and application development (Java)
http://www.infoq.com/cn/articles/hadoop-process-develop
7. Hadoop source code analysis
http://caibinbupt.iteye.com/?page=2
8. Analysis of Hadoop data flow and job submission
http://www.cnblogs.com/spork/category/226077.html
9. Ten best practices for Hadoop administrators
http://www.infoq.com/cn/articles/hadoop-ten-best-practice
10. Hadoop and Hive source code analysis and usage notes
11. Using and configuring the Hadoop Capacity Scheduler (as opposed to the default FIFO queue scheduler)
http://www.cnblogs.com/ggjucheng/archive/2012/07/25/2608817.html
12. A brief look at scheduling strategies in Hadoop
http://www.ibm.com/developerworks/cn/opensource/os-hadoop-scheduling/index.html
http://dongxicheng.org/mapreduce/hadoop-schedulers/
Analysis of the Hadoop-0.20.2 Fair Scheduler algorithm
http://dongxicheng.org/mapreduce/hadoop-fair-scheduler/
Analysis of the Hadoop Capacity Scheduler algorithm
http://dongxicheng.org/mapreduce/hadoop-capacity-scheduler/
Introduction to resource-aware schedulers in Hadoop
http://my.oschina.net/leejun2005/blog/96113
13. Hadoop job tuning parameters and how they work
http://blog.sina.com.cn/s/blog_ae33b83901015cm9.html
14. A fairly comprehensive Hadoop source code analysis
http://hbase.iteye.com/blog/1024737
15. How to write MapReduce programs on Hadoop
http://dongxicheng.org/mapreduce/writing-hadoop-programes/
16. Hadoop study notes (2): data flow from map to reduce
http://www.cnblogs.com/beanmoon/archive/2012/12/08/2805636.html
17. Managing jobs through the Hadoop API
http://blog.csdn.net/dajuezhao/article/details/6591058
18. InputFormat demystified: a powerful tool for controlling MapReduce task execution
http://www.infoq.com/cn/articles/HadoopInputFormat-map-reduce
19. Hadoop MapReduce development best practices (part 1)
http://www.infoq.com/cn/articles/MapReduce-Best-Practice-1
20. Hadoop example: second-degree connections and friend recommendation
http://my.oschina.net/u/176897/blog/99761
21. Exploring big data analytics and Hadoop
http://www.ibm.com/developerworks/cn/training/kp/os-kp-hadoop/index.html
(5) Data warehousing
1. Data warehouse fundamentals training
http://wenku.baidu.com/view/c788400cba1aa8114431d95b.html
http://wenku.baidu.com/view/412b09e96294dd88d0d26bff.html
2. Data warehouse ODS basics
http://wenku.baidu.com/view/bb3e6263caaedd3383c4d3bf.html
3. HBDW-PM: data warehouse fundamentals
http://wenku.baidu.com/view/e25bd14769eae009581bec5d.html
(6) Oozie workflows
1. Introduction to Oozie
http://www.infoq.com/cn/articles/introductionOozie
2. Learning Oozie by example
http://www.infoq.com/cn/articles/oozieexample
3. Extending Oozie
http://www.infoq.com/cn/articles/ExtendingOozie
4. Oozie installation, configuration, and troubleshooting examples
http://guoyunsky.iteye.com/category/187923
5. Oozie summary
(7) HBase
1. Official HBase reference guide
http://hbase.apache.org/book.html
2. Introduction to HBase technology
http://www.searchtb.com/2011/01/understanding-hbase.html
3. Getting started with HBase, part 2: Java examples for working with HBase
http://www.javabloger.com/article/apache-hbase-shell-and-java-api-html.html
4. HBase basic concepts and common HBase shell commands
http://www.cnblogs.com/flying5/archive/2011/09/15/2178064.html
5. Introduction to HBase
http://blog.csdn.net/leeqing2011/article/details/7608261
6. Official HBase documentation (Chinese edition)
http://www.yankay.com/wp-content/hbase/book.html
7. Summary of HBase performance optimization methods
http://blog.linezing.com/2012/03/hbase-performance-optimization
8. HBase system architecture and data structures
http://blog.csdn.net/a221133/article/details/6894717
9. [Translation] HBase storage architecture
http://www.spnguru.com/2010/07/%E7%BF%BB%E8%AF%91-hbase%E5%AD%98%E5%82%A8%E6%9E%B6%E6%9E%84/
10. Overview of HBase storage file formats
http://forchenyun.iteye.com/blog/828549
11. Introduction to HBase, Hive and Pig (Kent State University)
http://www.cs.kent.edu/~jin/Cloud12Spring/HbaseHivePig.pptx
12. Example of calling HBase from Python
http://hbase.iteye.com/blog/1178063
13. Summary of HBase usage and optimization at Taobao
http://walkoven.com/hbase%20optimization%20and%20apply%20summary%20in%20taobao.pdf
14. HBase pseudo-distributed installation guide:
http://my.oschina.net/leejun2005/blog/91952
15. Bucket Cache: a solution for CMS, GC fragmentation, and large caches in HBase
http://zjushch.iteye.com/blog/1751387
Note: the author is from Alibaba. Read performance reportedly improves by an order of magnitude, and the patch has been accepted by the HBase community.
16. Some HBase tips
http://www.blogjava.net/changedi/archive/2012/12/28/393577.html
(8) Flume
1. Flume log collection: principles and practice
http://www.cnblogs.com/oubo/archive/2012/05/25/2517751.html
2. Setting up and debugging Flume
http://log.medcl.net/item/2012/03/flume-build-process/
(9) Sqoop
http://blog.csdn.net/leeqing2011/article/details/7630690?utm_source=weibolife
2. Sqoop examples
http://baiyunl.iteye.com/blog/964254
3. Transferring data between HDFS and an RDBMS with Sqoop
http://www.linuxidc.com/Linux/2011-10/45080.htm
4. Sqoop User Guide (v1.4.2)
http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html?utm_source=weibolife#_introduction
5. Exchanging data between MySQL and HDFS with Sqoop
6. MySQL <-> Sqoop <-> HDFS data exchange experiment
http://leonarding.blog.51cto.com/6045525/1092764
(10) ZooKeeper
1. ZooKeeper Administrator's Guide
http://zookeeper.apache.org/doc/r3.4.3/zookeeperAdmin.html
2. Quick ZooKeeper setup
http://nileader.blog.51cto.com/1381108/795230
3. ZooKeeper administrator's guide: deploying and managing ZooKeeper
http://blogread.cn/it/article/5917?f=sinat
(11) NoSQL
1. Redis resource roundup
http://blog.nosqlfan.com/html/3537.html
2. MongoDB resource roundup
http://blog.nosqlfan.com/html/3548.html
3. Notes on NoSQL databases
http://sebug.net/paper/databases/nosql/Nosql.html
4. Getting started with Redis series
http://www.cnblogs.com/xhan/archive/2011/02/08/1949867.html
5. Redis lessons learned
http://www.programmer.com.cn/14577/
Appendix: my Baidu Space (many posts were lost because of Baidu's upgrade incident):