Step 2: Virtual machine environment setup
Step 3: User accounts
Step 4: Installing and configuring the Java environment
Step 5: ZooKeeper installation and configuration
Step 8: Sqoop installation and deployment (Sqoop 1)
Step 9: Hive installation and deployment
I set up ESXi on another new server, deployed 5 virtual machines on it, and manage them with vSphere Client. (Note: if the CD/DVD drive keeps showing "connecting" when selected, restart the client.)
Here I chose Cloudera's CDH distribution: it has fewer issues, and matching versions of all the components can be downloaded together, which avoids all sorts of compatibility problems. Download address
System configuration
All the related software goes under /opt, and the environment variables are set in each component's own configuration files (they could also be set centrally in ~/.bashrc).
Environment variables
Configuration files
192.168.0.155 NameNode1
192.168.0.156 NameNode2
192.168.0.157 DataNode1
192.168.0.158 DataNode2
192.168.0.159 DataNode3
127.0.0.1 localhost #this entry is required
Node configuration diagram
For modular management later on, I plan to create separate users for hadoop, hbase, hive, and so on.
Since creating the users and configuring their permissions is exactly the same on all 5 machines, we would otherwise have to either type the commands on each of the five machines or configure one machine and copy the files over, both of which are tedious.
I use Xshell, so with [Alt + t, k] or [Tools] -> [Send Key Input To All Sessions], a command typed in one session is executed in every open session, as if typing on all 5 machines at once.
su                               # switch to the root user
useradd -m hadoop -s /bin/bash   # create the hbase, hive, zookeeper, and sqoop users the same way
passwd hadoop                    # set the user's password
visudo                           # grant the user sudo rights: jump to line 98 (:98) and add an entry for hadoop there
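For reference, the entry added near line 98 (next to the stock root entry in a CentOS sudoers file; the exact line number does not matter) is typically of this form:
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL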
Next, install SSH and configure passwordless SSH login.
First, update the system packages:
yum upgrade
Generate the local public/private key pair:
cd ~/.ssh/                        # if this directory does not exist, run mkdir ~/.ssh first
ssh-keygen -t rsa                 # press Enter through all the prompts
cat id_rsa.pub >> authorized_keys # add the public key to the server
chmod 600 ./authorized_keys       # fix the file permissions
---------------------------------- If you are running as a non-root user, the following step is required ----------------------------------
chmod 700 ~/.ssh   # fix the directory permissions: a directory created by mkdir defaults to 775 and must be changed to 700; a .ssh directory created by running ssh localhost also works
The steps above only set up passwordless SSH to the local machine, whereas our login relationships look like this:
So the public key still has to be handed to each of the other nodes separately, for example:
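A minimal sketch, assuming the hadoop user and the hostnames from the hosts file above; run it from each node that needs to log in to the others:
ssh-copy-id hadoop@NameNode2    # run on NameNode1 as hadoop; repeat for the remaining nodes
ssh-copy-id hadoop@DataNode1
ssh-copy-id hadoop@DataNode2
ssh-copy-id hadoop@DataNode3
ssh hadoop@NameNode2 hostname   # verify: should print the hostname without asking for a password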
Install Java with yum (on every virtual machine):
sudo yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
Default install path: /usr/lib/jvm/java-1.7.0-openjdk
Then save the JAVA_HOME variable in /etc/environment:
sudo vi /etc/environment
The content is as follows:
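The original does not show the file body; based on the default install path above it presumably needs just this one line:
JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk"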
mv conf/zoo_sample.cfg conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/data/zookeeper
dataLogDir=/home/hadoop/logs/zookeeper
clientPort=2181
server.0=NameNode1:2888:3888
server.1=NameNode2:2888:3888
server.2=DataNode1:2888:3888
server.3=DataNode2:2888:3888
server.4=DataNode3:2888:3888
echo 1 > /home/hadoop/data/zookeeper/myid
# because the number in front of NameNode2 in zoo.cfg is 1, write 1 here
# on DataNode3 you would need to write 4 instead
Note: both directories (dataDir and dataLogDir) must be created, otherwise ZooKeeper fails with [ERROR [main:QuorumPeerMain@86] - Invalid config, exiting abnormally].
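A one-liner to create both directories, matching the dataDir and dataLogDir values above (run it on every node, or once via the Xshell send-to-all-sessions trick):
mkdir -p /home/hadoop/data/zookeeper /home/hadoop/logs/zookeeper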
# sudo yum install ntpdate   # install ntpdate first if it is not already present
sudo ntpdate time.nist.gov
bin/zkServer.sh start
bin/zkServer.sh status
Create a folder named software under /opt and change its ownership:
cd /opt
sudo mkdir software
sudo chown -R hadoop:hadoop software
All the big-data related software then goes into this folder.
export SOFTWARE_HOME=/opt/software
export HADOOP_HOME=/opt/software/hadoop-2.5.0-cdh5.3.6
export HADOOP_PID_DIR=$SOFTWARE_HOME/data/hadoop/pid
export HADOOP_LOG_DIR=$SOFTWARE_HOME/logs/hadoop
export YARN_LOG_DIR=$SOFTWARE_HOME/logs/yarn
export YARN_PID_DIR=$SOFTWARE_HOME/data/yarn
export HADOOP_MAPRED_LOG_DIR=$SOFTWARE_HOME/logs/mapred
export HADOOP_MAPRED_PID_DIR=$SOFTWARE_HOME/data/mapred
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sardoop</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.users</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>4230</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.5.0-cdh5.3.6/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>NameNode1,NameNode2,DataNode1,DataNode2,DataNode3</value>
  </property>
</configuration>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>sardoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.sardoop</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.sardoop.nn1</name>
    <value>NameNode1:9820</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.sardoop.nn2</name>
    <value>NameNode2:9820</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.sardoop.nn1</name>
    <value>NameNode1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.sardoop.nn2</name>
    <value>NameNode2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://DataNode1:8485;DataNode2:8485;DataNode3:8485/sardoop</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.sardoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/software/hadoop-2.5.0-cdh5.3.6/tmp/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>4096</value>
  </property>
  <!-- The file:// prefix is required here; otherwise you get the warning
       "should be specified as a URI in configuration files." and the DataNode will not start -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hdfsdata/namenode,file:///home/hadoop/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hdfsdata/datanode,file:///home/hadoop/data/hdfs/datanode</value>
  </property>
</configuration>
DataNode1
DataNode2
DataNode3
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>NameNode1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>NameNode2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarnha</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>NameNode1,NameNode2,DataNode1,DataNode2,DataNode3</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>NameNode2:9180</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
  </property>
</configuration>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>NameNode1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>NameNode1:19888</value>
  </property>
</configuration>
bin/hdfs dfsadmin -safemode leave
Check HDFS:
bin/hdfs fsck / -files -blocks
NameNode2
# These are the three main settings to change
export HBASE_PID_DIR=${HOME}/data/hbase
export HBASE_MANAGES_ZK=false
export HBASE_LOG_DIR=${HOME}/logs/hbase
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <!-- This should really use the nameservice, but with it the IP resolution was incorrect,
         so for now it is set to the hostname; note that it must point at the currently Active NameNode -->
    <!-- If HBase is to be made HA, this must eventually be changed to the nameservice,
         otherwise the HBase config has to be edited by hand whenever the active NameNode changes -->
    <value>hdfs://NameNode1:9820/hbase</value>
    <!--<value>hdfs://sardoop/hbase</value>-->
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>NameNode1,NameNode2,DataNode1,DataNode2,DataNode3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/data/zookeeper</value>
  </property>
</configuration>
NameNode2
DataNode1
DataNode2
DataNode3
Note: sometimes starting HBase fails with [org.apache.hadoop.hbase.TableExistsException: hbase:namespace]
or some [Znode already exists] related error; this is usually because information from a previous HBase deployment still exists under its ZooKeeper directory.
Solution:
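The original does not spell the fix out; the usual approach (an assumption here, consistent with the cause described above) is to delete the stale /hbase znode with the ZooKeeper CLI while HBase is stopped, then start HBase again:
/opt/software/zookeeper-3.4.5-cdh5.3.6/bin/zkCli.sh -server NameNode1:2181
# inside the zkCli prompt:
rmr /hbase
quit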
Sometimes when accessing HBase from Java, no error is reported but the call never gets a response.
Solution:
Add the hosts entries for the HBase nodes to the hosts file of the machine the program runs on.
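For instance, a client outside the cluster needs the same mappings as the cluster's hosts file shown earlier (if the client runs on Windows, the file is C:\Windows\System32\drivers\etc\hosts):
192.168.0.155 NameNode1
192.168.0.156 NameNode2
192.168.0.157 DataNode1
192.168.0.158 DataNode2
192.168.0.159 DataNode3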
With Sqoop we can import and export data between an RDBMS and Hadoop-ecosystem products such as HDFS, Hive, and HBase (import only, for HBase).
During an import we can specify the number of mappers and even the compression method. There are currently two major versions, Sqoop 1 and Sqoop 2, which differ considerably (see the feature-support comparison between Sqoop 1 and Sqoop 2).
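As a hedged illustration of those two options, a sketch of an import that sets the mapper count and output compression; the connection string reuses the MySQL example further down, the table name and codec are placeholders, and Snappy requires the native libraries to be present:
bin/sqoop import --connect 'jdbc:mysql://192.168.0.154:3306/Test' --username sa --password 123 \
  --table Cities --split-by Id \
  --num-mappers 4 \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --target-dir /input/Cities_snappy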
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/software/hadoop-2.5.0-cdh5.3.6
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/software/hadoop-2.5.0-cdh5.3.6
#set the path to where bin/hbase is available
export HBASE_HOME=/opt/software/hbase-0.98.6-cdh5.3.6
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/software/hive-0.13.1-cdh5.3.6
#Set the path to where the zookeeper config dir is (only needed if you run a separate ZooKeeper cluster)
export ZOOCFGDIR=/opt/software/zookeeper-3.4.5-cdh5.3.6/
cp mysql-connector-java-5.1.40-bin.jar /opt/software/sqoop-1.4.5-cdh5.3.6/lib/
# Copy it to the Hadoop directory on every virtual machine
cp mysql-connector-java-5.1.40-bin.jar /opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp mysql-connector-java-5.1.40-bin.jar hadoop@NameNode2:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp mysql-connector-java-5.1.40-bin.jar hadoop@DataNode1:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp mysql-connector-java-5.1.40-bin.jar hadoop@DataNode2:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp mysql-connector-java-5.1.40-bin.jar hadoop@DataNode3:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
cp sqljdbc4.jar /opt/software/sqoop-1.4.5-cdh5.3.6/lib/
cp sqljdbc4.jar /opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp sqljdbc4.jar hadoop@NameNode2:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp sqljdbc4.jar hadoop@DataNode1:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp sqljdbc4.jar hadoop@DataNode2:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
scp sqljdbc4.jar hadoop@DataNode3:/opt/software/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/
bin/sqoop help
** List the SQL Server databases
bin/sqoop list-databases --connect 'jdbc:sqlserver://192.168.0.154:1433;username=sa;password=123'
** List the tables in a database
bin/sqoop list-tables --connect 'jdbc:mysql://192.168.0.154:3306/Test' --username sa --password 123
** Import a table directly into HBase
bin/sqoop import --connect 'jdbc:sqlserver://192.168.0.154:1433;username=sa;password=123;database=Test' --table Cities --split-by Id \
--hbase-table sqoop_Cities --column-family c --hbase-create-table --hbase-row-key Id
** Import using a SQL query (when the query form is used, $CONDITIONS must be appended to the SQL)
bin/sqoop import --connect 'jdbc:sqlserver://192.168.0.154:1433;username=sa;password=123;database=Test'\
--query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE a.id > 10 AND $CONDITIONS' -m 1 \
--split-by Id --hbase-table sqoop_Cities --column-family c --hbase-create-table --hbase-row-key Id
** Import into HDFS (since this goes through mappers, the target directory must not already exist)
./sqoop import --connect 'jdbc:sqlserver://192.168.0.154:1433;username=sa;password=123;database=Test' --table Cities --target-dir /input/Cities
** Export from HDFS to MySQL
bin/sqoop export --connect jdbc:mysql://NameNode1:3306/test --username root --password 123 \
--table loghour --m 2 --export-dir /tmp/loghour/ --input-fields-terminated-by '\t'
Note:
create database hive;
grant all on hive.* to 'hive'@'%' identified by 'hive';
flush privileges;
# Optionally create a symlink for convenience (not required)
ln -s hive-0.13.1-cdh5.3.6 hive
hive-default.xml.template --> hive-site.xml
hive-log4j.properties.template --> hive-log4j.properties
hive-exec-log4j.properties.template --> hive-exec-log4j.properties
hive-env.sh.template --> hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/software/hadoop-2.5.0-cdh5.3.6/
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/software/hive-0.13.1-cdh5.3.6/conf/
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/opt/software/hive-0.13.1-cdh5.3.6/lib/
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://NameNode1:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123</value>
  <description>password to use against metastore database</description>
</property>
<!-- For remote connections -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://127.0.0.1:9083</value>
</property>
<!-- Needed when starting hiveserver2; the default value has a bug, fixed in later releases -->
<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000ms</value>
  <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
</property>
<!-- Extra jars that Hive needs -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/software/hive/lib/hive-hbase-handler-0.13.1-cdh5.3.6.jar,file:///opt/software/hive/lib/zookeeper-3.4.5-cdh5.3.6.jar,file:///opt/software/hive/lib/hbase-client-0.98.6-cdh5.3.6.jar</value>
</property>
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse   # -p also creates the missing parent directories
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
bin/hive --service metastore &   # the trailing & keeps the service running in the background; without it, the service stops when the ssh connection it was started from is closed
# after it starts, press any key to return to the shell prompt, then type exit to leave (exit via the command rather than closing the client directly)
# bin/hive --service hiveserver & # this service is only needed for calls through the Java API; skip this command if you do not have that requirement
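A quick sanity check (assuming netstat is installed) that the metastore is listening on the Thrift port configured in hive-site.xml:
netstat -tlnp | grep 9083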
其餘:
①hive命令的調用有3種方式:
./hive –f ./hive-script.sql
./hive -e 'select * from table'
② Creating a UDF
package com.sarnath.jobtask.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Maps an epoch-seconds timestamp to the 5-minute interval it falls in.
 *
 * @author Zhanglei 2016-12-02
 */
@Description(name = "get5minTimeZone", value = "FUNC<time> - cast string to 5min timezone?")
public class get5minTimeZone extends UDF {
    public long evaluate(long time) {
        long minTotal = time / 60;       // total number of minutes
        long timeValue = minTotal * 60;  // timestamp truncated to the minute
        // truncated timestamp + 5 minutes - remainder within the current 5-minute window
        return timeValue + 5 * 60 - timeValue % (5 * 60);
    }
}
# if the function already exists, drop it first:
# drop function get5minTimeZone;
create function get5minTimeZone as 'com.sarnath.jobtask.hive.udf.get5minTimeZone' using jar 'hdfs:///user/hadoop/hiveUDF/jobtask.jar';
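A hypothetical usage example; the table and bigint column (access_log, log_time) are placeholders, not from the original:
select get5minTimeZone(log_time) as timeslot, count(*) as total
from access_log
group by get5minTimeZone(log_time);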
③ Related commands
create table logdetail(proxyip string,origin string) row format delimited fields terminated by ';';
create table logdetail(proxyip string ,origin string) partitioned by(logdate string);
show partitions tablename;
load data [local] inpath '/opt/software/log/log2016_12_15.log' overwrite into table logdetail;
CREATE TABLE log(time string,total int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:total") TBLPROPERTIES ("hbase.table.name" = "hbaseloghour");
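A sketch of how data flows through this mapping, assuming the partitioned logdetail table above (the aggregation itself is only illustrative):
-- populate the HBase-backed table from Hive; the rows land in the 'hbaseloghour' HBase table
INSERT OVERWRITE TABLE log
SELECT logdate, cast(count(*) as int) FROM logdetail GROUP BY logdate;
-- they can then be read back from the hbase shell with: scan 'hbaseloghour'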
Appendix:
① Batch scripts (run from NameNode1 as the current node)
Script to delete old data before re-formatting:
echo --remove hdfs data
rm -rf /opt/hdfsdata/datanode/*
rm -rf /opt/hdfsdata/namenode/*
rm -rf /home/hadoop/data/hdfs/namenode/*
rm -rf /home/hadoop/data/hdfs/datanode/*
ssh NameNode2 'rm -rf /opt/hdfsdata/datanode/*'
ssh NameNode2 'rm -rf /opt/hdfsdata/namenode/*'
ssh NameNode2 'rm -rf /home/hadoop/data/hdfs/namenode/*'
ssh NameNode2 'rm -rf /home/hadoop/data/hdfs/datanode/*'
ssh DataNode1 'rm -rf /opt/hdfsdata/datanode/*'
ssh DataNode1 'rm -rf /opt/hdfsdata/namenode/*'
ssh DataNode1 'rm -rf /home/hadoop/data/hdfs/namenode/*'
ssh DataNode1 'rm -rf /home/hadoop/data/hdfs/datanode/*'
ssh DataNode2 'rm -rf /opt/hdfsdata/datanode/*'
ssh DataNode2 'rm -rf /opt/hdfsdata/namenode/*'
ssh DataNode2 'rm -rf /home/hadoop/data/hdfs/namenode/*'
ssh DataNode2 'rm -rf /home/hadoop/data/hdfs/datanode/*'
ssh DataNode3 'rm -rf /opt/hdfsdata/datanode/*'
ssh DataNode3 'rm -rf /opt/hdfsdata/namenode/*'
ssh DataNode3 'rm -rf /home/hadoop/data/hdfs/namenode/*'
ssh DataNode3 'rm -rf /home/hadoop/data/hdfs/datanode/*'
echo --remove zookeeper data
rm -rf ~/data/zookeeper/version-2/*
rm -rf ~/data/zookeeper/zookeeper_server.pid
ssh NameNode2 'rm -rf ~/data/zookeeper/version-2/*'
ssh NameNode2 'rm -rf ~/data/zookeeper/zookeeper_server.pid'
ssh DataNode1 'rm -rf ~/data/zookeeper/version-2/*'
ssh DataNode1 'rm -rf ~/data/zookeeper/zookeeper_server.pid'
ssh DataNode2 'rm -rf ~/data/zookeeper/version-2/*'
ssh DataNode2 'rm -rf ~/data/zookeeper/zookeeper_server.pid'
ssh DataNode3 'rm -rf ~/data/zookeeper/version-2/*'
ssh DataNode3 'rm -rf ~/data/zookeeper/zookeeper_server.pid'
echo --remove hadoop logs
rm -rf /opt/software/hadoop-2.5.0-cdh5.3.6/tmp
rm -rf /home/hadoop/logs/hadoop
ssh NameNode2 'rm -rf /opt/software/hadoop-2.5.0-cdh5.3.6/tmp'
ssh NameNode2 'rm -rf /home/hadoop/logs/hadoop'
ssh DataNode1 'rm -rf /opt/software/hadoop-2.5.0-cdh5.3.6/tmp'
ssh DataNode1 'rm -rf /home/hadoop/logs/hadoop'
ssh DataNode2 'rm -rf /opt/software/hadoop-2.5.0-cdh5.3.6/tmp'
ssh DataNode2 'rm -rf /home/hadoop/logs/hadoop'
ssh DataNode3 'rm -rf /opt/software/hadoop-2.5.0-cdh5.3.6/tmp'
ssh DataNode3 'rm -rf /home/hadoop/logs/hadoop'
echo --remove hbase logs
rm -rf ~/logs/hbase/*
ssh NameNode2 'rm -rf ~/logs/hbase/*'
ssh DataNode1 'rm -rf ~/logs/hbase/*'
ssh DataNode2 'rm -rf ~/logs/hbase/*'
ssh DataNode3 'rm -rf ~/logs/hbase/*'
Script for the startup sequence:
echo --start zookeeper
/opt/software/zookeeper-3.4.5-cdh5.3.6/bin/zkServer.sh start
ssh NameNode2 '/opt/software/zookeeper-3.4.5-cdh5.3.6/bin/zkServer.sh start'
ssh DataNode1 '/opt/software/zookeeper-3.4.5-cdh5.3.6/bin/zkServer.sh start'
ssh DataNode2 '/opt/software/zookeeper-3.4.5-cdh5.3.6/bin/zkServer.sh start'
ssh DataNode3 '/opt/software/zookeeper-3.4.5-cdh5.3.6/bin/zkServer.sh start'
echo --start journalnodes cluster
ssh DataNode1 '/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start journalnode'
ssh DataNode2 '/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start journalnode'
ssh DataNode3 '/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start journalnode'
echo --format one namenode
/opt/software/hadoop-2.5.0-cdh5.3.6/bin/hdfs namenode -format
/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start namenode
echo --format another namenode
ssh NameNode2 '/opt/software/hadoop-2.5.0-cdh5.3.6/bin/hdfs namenode -bootstrapStandby'
sleep 10
ssh NameNode2 '/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start namenode'
sleep 10
#echo --start all datanodes
/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemons.sh start datanode
echo --zookeeper init
/opt/software/hadoop-2.5.0-cdh5.3.6/bin/hdfs zkfc -formatZK
echo --start hdfs
/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/start-dfs.sh
echo --start yarn
/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/start-yarn.sh
ssh NameNode2 '/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start resourcemanager'
/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/mr-jobhistory-daemon.sh start historyserver
/opt/software/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start proxyserver
③ Working with HBase from MapReduce
By default, working with HBase from a MapReduce job produces various java.lang.NoClassDefFoundError errors, because the required jars are not on the classpath. Solution:
The path given in the official HBase documentation is wrong; putting the jars under lib does not help.
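A common workaround (an assumption here, not stated in the original) is to put the HBase classpath on HADOOP_CLASSPATH before submitting the job, and/or to call TableMapReduceUtil.addDependencyJars() in the job driver; the jar and class names below are placeholders:
export HADOOP_CLASSPATH=$(/opt/software/hbase-0.98.6-cdh5.3.6/bin/hbase classpath):$HADOOP_CLASSPATH
/opt/software/hadoop-2.5.0-cdh5.3.6/bin/hadoop jar myjob.jar com.example.MyHBaseJob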
④ HDFS-related commands
$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes   # refresh the node list
hadoop dfs -count -q <dir>                       # show the size and quota of a directory
hadoop dfs -du <dir>                             # show the sizes of the subdirectories under a directory