A Collection of Learning Resources for Hadoop, Pig, Hive, and NoSQL

Reposted from: http://www.cnblogs.com/zzjhn/p/3855566.html

(1) Hadoop installation and deployment

1. Deploying Hadoop on Windows under Cygwin:

http://lib.open-open.com/view/1333428291655

http://blog.csdn.net/ruby97/article/details/7423088

http://blog.csdn.net/savechina/article/details/5656937

2. Hadoop pseudo-distributed installation:

http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/

3. Hadoop fully distributed installation tutorial:

http://hi.baidu.com/leejun_2005/item/367da95bd69f4e0ce6c4a581

4. Configuring an Eclipse-based Hadoop application development environment

http://www.cnblogs.com/flyoung2008/archive/2011/12/09/2281400.html

http://blog.sina.com.cn/s/blog_62186b4601012acs.html

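If Eclipse cannot connect and reports this error:

"Map/Reduce location status updater". org/codehaus/jackson/map/JsonMappingException

After some searching, the cause turned out to be that the Hadoop Eclipse plugin is missing some JAR packages.

After repackaging the plugin as described in the following article, it ran successfully: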

http://hi.baidu.com/wangyucao1989/blog/item/279cef87c4b37c34c75cc315.html

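If you have already installed the official plugin and find that it cannot connect, first delete that JAR from Eclipse and restart Eclipse (to clear its cache),

then put the new JAR in place and restart Eclipse again.

Connecting from Eclipse on Windows to Hadoop on Linux and running MapReduce jobs: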

http://superlxw1234.iteye.com/blog/1583164

5. A fifteen-minute tutorial on installing Hadoop and Hive on a single server

http://rdc.taobao.com/team/top/tag/hadoop-hive-%E5%8D%81%E5%88%86%E9%92%9F%E6%95%99%E7%A8%8B/

ssh-keygen -t dsa -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

http://blogread.cn/it/article/6103?f=wb
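
The two commands above set up passwordless SSH to the local machine, which a single-node Hadoop install relies on to start its daemons. Below is a minimal sketch of the same idea wrapped in a re-runnable function. The function name `setup_passwordless_ssh` and its directory argument are illustrative assumptions, not part of the linked tutorials, and it swaps in an RSA key because recent OpenSSH releases disable DSA keys by default.

```shell
# Sketch: create a passphrase-less key and authorize it for logins to this host.
# The function name and its argument are hypothetical, chosen for illustration.
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"      # target directory, defaults to ~/.ssh
    key="$ssh_dir/id_rsa"           # RSA instead of the DSA key used above

    # sshd refuses keys kept in directories with loose permissions.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Generate the key only if it does not exist yet (empty passphrase).
    [ -f "$key" ] || ssh-keygen -q -t rsa -N "" -f "$key" || return 1

    # Append the public key once, so re-running does not add duplicates.
    touch "$ssh_dir/authorized_keys"
    grep -qxF "$(cat "$key.pub")" "$ssh_dir/authorized_keys" \
        || cat "$key.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling `setup_passwordless_ssh` with no argument works on `~/.ssh`; afterwards `ssh localhost` should log in without asking for a password, provided an SSH server is running on the machine.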

(2) Hive

1. Log statistics with Hive in practice:

http://www.csdn.net/article/2010-11-28/282620

2. Hive example: the ten most commonly used CSDN passwords

http://my.oschina.net/leejun2005/blog/81662

3. Official Hive tutorial:

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

4. Notes on Hive (4): Hive QL

http://www.alidata.org/archives/581   # JOIN

http://wenku.baidu.com/view/242260c489eb172ded63b709.html

5. Five tips for writing good Hive programs

http://www.alidata.org/archives/622  # sorting

6. Hive, the Hadoop data warehouse tool: an introduction (Baidu)

http://wenku.baidu.com/view/90dad7659b6648d7c1c7460e.html

7. Hive sharing session (Taobao)

http://wenku.baidu.com/view/4e4a801ca76e58fafab003b1.html

8. Introduction to Hive (Meilishuo)

http://wenku.baidu.com/view/0f252121a5e9856a56126025.html

9. Hive study notes (Alibaba)

http://wenku.baidu.com/view/233308340b4c2e3f5727632a.html

10. Hive: A Petabyte Scale Data Warehouse Using Hadoop (paper)

http://wenku.baidu.com/view/b5aebfe9998fcc22bcd10d8a.html

11. Hive: SQL for Hadoop (An Essential Tool for Hadoop-based Data Warehouses)

http://polyglotprogramming.com/papers/Hive-SQLforHadoop.pdf

12. Programming Hive

http://www.itpub.net/thread-1724707-1-1.html

13. Notes on Hive (6): Hive extension features:

File Format, SerDe, Map/Reduce scripts (Transform), UDF, UDAF

http://www.alidata.org/archives/604

14. Summary of data skew in Hive

http://www.alidata.org/archives/2109

15. Querying complex JSON data with Hive

http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/

https://github.com/rcongiu/Hive-JSON-Serde

16. Hive SQL optimization, summarized by a colleague

http://hbase.iteye.com/blog/1488745

http://superlxw1234.iteye.com/blog/1564456

17. Querying the Hive data warehouse from Python through the Thrift interface

http://slaytanic.blog.51cto.com/2057708/734106

18. Querying the Hive data warehouse from PHP through the Thrift interface (with an introduction to phpHiveAdmin)

http://slaytanic.blog.51cto.com/2057708/766230

http://slaytanic.blog.51cto.com/2057708/818721

http://slaytanic.blog.51cto.com/2057708/1071263

19. Notes on using Hive SQL and loading data

http://slaytanic.blog.51cto.com/2057708/782175

20. Hive optimization: controlling the number of map and reduce tasks in a Hive job

http://superlxw1234.iteye.com/blog/1582880

21. Some handy Hive tricks

http://superlxw1234.iteye.com/blog/1565774

22. Data warehouse data modeling: extreme storage and the historical zipper table

http://superlxw1234.iteye.com/blog/1567320

23. Reading notes on Programming Hive

http://www.gemini5201314.net/hadoop/programing-hive%E8%AF%BB%E4%B9%A6%E7%AC%94%E8%AE%B0.html

(3) Pig

1. Pig in practice

http://www.cnblogs.com/xuqiang/archive/2011/06/06/2073601.html

2. Official Pig tutorial

http://pig.apache.org/

3. A collection of Apache Pig tutorials in Chinese

http://www.codelast.com/?p=4550

4. Programming Pig

http://ofps.oreilly.com/titles/9781449302641/index.html

http://www.google.com.hk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCcQFjAA&url=http%3A%2F%2Fbigdata.googlecode.com%2Ffiles%2FOreilly.Programming.Pig.Sep.2011.pdf&ei=DLGDUNbcI4aTiQfus4HADQ&usg=AFQjCNGzTHIYcc2GuU6ko0TgIKm3UN9T5Q&sig2=2DZtn3yP4KVqro7xt_qAOA

5. PigFly: design of a unified Hadoop data analysis platform (Taobao)

http://www.docin.com/p-344188827.html

http://coderplay.iteye.com/blog/1233865

6. Process a Million Songs with Apache Pig (Cloudera)

http://blog.cloudera.com/blog/2012/08/process-a-million-songs-with-apache-pig/

7. Pig Latin: A Not-So-Foreign Language for Data Processing (Stanford University paper)

http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf

8. Lecture 09: Parallel Databases, Big Data, Map/Reduce, Pig-Latin

http://www.cs.washington.edu/education/courses/csep544/11au/lectures/lecture09-parallel-db.pdf

9. Pig Queries Parsing JSON on Amazons Elastic Map Reduce Using S3 Data

http://eric.lubow.org/2011/hadoop/pig-queries-parsing-json-on-amazons-elastic-map-reduce-using-s3-data/

https://github.com/a-b/elephant-bird/tree/master/javadoc

10. Pig cookbook: performance tuning

http://pig.apache.org/docs/r0.7.0/cookbook.html

http://pig.apache.org/docs/r0.10.0/perf.html#Replicated-Joins

11. Using Pig streaming:

http://wiki.apache.org/pig/PigStreamingFunctionalSpec

http://www.slideshare.net/charmalloc/hadoop-streaming-tutorial-with-python

(4) Hadoop internals and programming

1. A few small details to watch when using Hadoop

http://blog.csdn.net/needle2/article/details/6182515

2. Understanding the processes and concepts behind map-reduce in Hadoop (browse the blog's index for more):

http://hi.baidu.com/shirdrn/item/085a5518be8bfa797b5f25aa

3. Official Hadoop 0.18 documentation, Chinese edition

http://hadoop.apache.org/docs/r0.20.0/cn/commands_manual.html

4. IBM developerWorks: Distributed parallel programming with Hadoop series, parts 1 to 3

http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/

http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/index.html

https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/

5. Introduction to Hadoop, an open-source distributed computing framework

http://www.infoq.com/cn/articles/hadoop-intro

6. Hadoop basic workflow and application development (Java)

http://www.infoq.com/cn/articles/hadoop-process-develop

7. Hadoop source code analysis

http://caibinbupt.iteye.com/?page=2

8. Analysis of Hadoop data flow and job submission

http://www.cnblogs.com/spork/category/226077.html

9. Ten best practices for Hadoop administrators

http://www.infoq.com/cn/articles/hadoop-ten-best-practice

10. Hadoop and Hive source code analysis and usage notes

http://www.oratea.net/?cat=7#

11. Using and configuring the Hadoop Capacity Scheduler (as opposed to the default FIFO queue scheduler)

http://www.cnblogs.com/ggjucheng/archive/2012/07/25/2608817.html

12. A brief analysis of scheduling strategies in Hadoop

http://www.ibm.com/developerworks/cn/opensource/os-hadoop-scheduling/index.html

http://dongxicheng.org/mapreduce/hadoop-schedulers/

Analysis of the Hadoop-0.20.2 Fair Scheduler algorithm

http://dongxicheng.org/mapreduce/hadoop-fair-scheduler/

Analysis of the Hadoop Capacity Scheduler algorithm

http://dongxicheng.org/mapreduce/hadoop-capacity-scheduler/

Introduction to resource-aware schedulers in Hadoop

http://my.oschina.net/leejun2005/blog/96113

13. Hadoop job tuning parameters and the principles behind them

http://blog.sina.com.cn/s/blog_ae33b83901015cm9.html

14. A fairly complete Hadoop source code analysis

http://hbase.iteye.com/blog/1024737

15. How to write MapReduce programs on Hadoop

http://dongxicheng.org/mapreduce/writing-hadoop-programes/

16. Hadoop study notes (2): data flow from map to reduce

http://www.cnblogs.com/beanmoon/archive/2012/12/08/2805636.html

17. Managing jobs through the Hadoop API

http://blog.csdn.net/dajuezhao/article/details/6591058

18. Demystifying InputFormat: the key to controlling MapReduce task execution

http://www.infoq.com/cn/articles/HadoopInputFormat-map-reduce

19. Hadoop MapReduce development best practices (part 1)

http://www.infoq.com/cn/articles/MapReduce-Best-Practice-1

20. Hadoop example: second-degree connections and friend recommendation

http://my.oschina.net/u/176897/blog/99761

21. Exploring big data analytics and Hadoop

http://www.ibm.com/developerworks/cn/training/kp/os-kp-hadoop/index.html

(5) Data warehousing

1. Data warehouse fundamentals training

http://wenku.baidu.com/view/c788400cba1aa8114431d95b.html

http://wenku.baidu.com/view/412b09e96294dd88d0d26bff.html

2. Data warehouse ODS basics

http://wenku.baidu.com/view/bb3e6263caaedd3383c4d3bf.html

3. HBDW-PM: data warehouse fundamentals

http://wenku.baidu.com/view/e25bd14769eae009581bec5d.html

(6) Oozie workflows

1. Introduction to Oozie

http://www.infoq.com/cn/articles/introductionOozie

2. Learning Oozie by example

http://www.infoq.com/cn/articles/oozieexample

3. Extending Oozie

http://www.infoq.com/cn/articles/ExtendingOozie

4. Oozie installation, configuration, and troubleshooting examples

http://guoyunsky.iteye.com/category/187923

5. Oozie summary

http://dirlt.com/oozie.html

(7) HBase

1. Official HBase guide

http://hbase.apache.org/book.html

2. Introduction to HBase technology

http://www.searchtb.com/2011/01/understanding-hbase.html

3. Getting started with HBase, part 2: examples of working with HBase from Java

http://www.javabloger.com/article/apache-hbase-shell-and-java-api-html.html

4. HBase basic concepts and common HBase shell commands

http://www.cnblogs.com/flying5/archive/2011/09/15/2178064.html

5. Introduction to HBase

http://blog.csdn.net/leeqing2011/article/details/7608261

6. Official HBase documentation (Chinese edition)

http://www.yankay.com/wp-content/hbase/book.html

7. Summary of HBase performance optimization methods

http://blog.linezing.com/2012/03/hbase-performance-optimization

8. HBase system architecture and data structures

http://blog.csdn.net/a221133/article/details/6894717

9. [Translation] HBase storage architecture

http://www.spnguru.com/2010/07/%E7%BF%BB%E8%AF%91-hbase%E5%AD%98%E5%82%A8%E6%9E%B6%E6%9E%84/

10. Overview of HBase storage file formats

http://forchenyun.iteye.com/blog/828549

11. Introduction to HBase, Hive, and Pig (Kent State University)

http://www.cs.kent.edu/~jin/Cloud12Spring/HbaseHivePig.pptx

12. Example of calling HBase from Python

http://hbase.iteye.com/blog/1178063

13. HBase at Taobao: applications and optimization summary

http://walkoven.com/hbase%20optimization%20and%20apply%20summary%20in%20taobao.pdf

14. HBase pseudo-distributed installation guide:

http://my.oschina.net/leejun2005/blog/91952

15. A solution for CMS, GC fragmentation, and large caches in HBase: Bucket Cache

http://zjushch.iteye.com/blog/1751387   

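Note: the author is from Alibaba. Read performance reportedly improves by an order of magnitude, and the patch has been accepted by the HBase community.

16. Some HBase tips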

http://www.blogjava.net/changedi/archive/2012/12/28/393577.html

(8) Flume

1. Flume log collection: principles and practice

http://www.cnblogs.com/oubo/archive/2012/05/25/2517751.html

2. Setting up and debugging Flume

http://log.medcl.net/item/2012/03/flume-build-process/

(9) Sqoop

1. Sqoop installation, configuration, and basic usage

http://blog.csdn.net/leeqing2011/article/details/7630690?utm_source=weibolife

2. Sqoop examples

http://baiyunl.iteye.com/blog/964254

3. Moving data between HDFS and an RDBMS with Sqoop

http://www.linuxidc.com/Linux/2011-10/45080.htm

4. Sqoop User Guide (v1.4.2)

http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html?utm_source=weibolife#_introduction

5. Exchanging data between MySQL and HDFS with Sqoop

http://abloz.com/2012/07/19/data-between-the-mysql-and-hdfs-system-of-mutual-conductance-using-sqoop.html

6. MySQL <-> Sqoop <-> HDFS data exchange experiment

http://leonarding.blog.51cto.com/6045525/1092764

(10) ZooKeeper

1. ZooKeeper Administrator's Guide

http://zookeeper.apache.org/doc/r3.4.3/zookeeperAdmin.html

2. Quick ZooKeeper setup

http://nileader.blog.51cto.com/1381108/795230

3. ZooKeeper Administrator's Guide: deploying and managing ZooKeeper

http://blogread.cn/it/article/5917?f=sinat

(11) NoSQL

1. Redis resource roundup

http://blog.nosqlfan.com/html/3537.html

2. MongoDB resource roundup

http://blog.nosqlfan.com/html/3548.html

3. Notes on NoSQL databases

http://sebug.net/paper/databases/nosql/Nosql.html

4. Getting started with Redis series

http://www.cnblogs.com/xhan/archive/2011/02/08/1949867.html

5. Redis lessons learned

http://www.programmer.com.cn/14577/

Appendix: my Baidu Space (many posts were lost because of Baidu's botched upgrade):

1. http://203.208.46.148/#q=site:baidu.com+hadoop+leejun_2005&hl=zh-CN&newwindow=1&prmd=imvns&ei=J1dwUKyBOcmsiAff9IHwAw&start=10&sa=N&bav=on.2,or.r_gc.r_pw.&fp=2ba1f2c2b0790967&biw=1366&bih=643

Reposted from: http://my.oschina.net/leejun2005/blog/81771
