Table of contents
Chapter 1: Hadoop cluster setup
Cluster role distribution across the machines
1. Passwordless SSH login
3. Hadoop cluster setup
3. Configuring high availability (ResourceManager + YARN)
5. Starting Hadoop
1. Unable to stop NodeManager and ResourceManager
Unable to stop NodeManager and ResourceManager when stopping or restarting the cluster
Cluster role distribution

| Host  | NameNode | DFSZKFailoverController | Zookeeper | DataNode | JournalNode | ResourceManager | Hbase         | Spark | Hive | Mysql |
| sp-01 |          |                         |           | √        | √           |                 | HMaster       | √     | √    |       |
| sp-02 |          |                         |           | √        | √           |                 | HRegionServer | √     |      | √     |
| sp-03 |          |                         |           | √        | √           | √               | HRegionServer | √     |      |       |
| sp-04 |          |                         | √         | √        | √           | √               |               | √     |      |       |
| sp-05 | √        | √                       | √         | √        | √           |                 |               |       |      |       |
| sp-06 | √        | √                       | √         |          |             |                 |               |       |      |       |
| Host  | Original IP     | Corresponding IP |
| sp-01 | 192.168.101.121 | 192.168.10.111   |
| sp-02 | 192.168.101.122 | 192.168.10.112   |
| sp-03 | 192.168.101.123 | 192.168.10.113   |
| sp-04 | 192.168.101.124 | 192.168.10.114   |
| sp-05 | 192.168.101.125 | 192.168.10.115   |
| sp-06 | 192.168.101.126 | 192.168.10.116   |
1. Passwordless SSH login
Main steps (master node, sp-01):
1. Generate a key pair on the master node: ssh-keygen -t rsa -P ""
2. Enter the key folder: cd .ssh (run ls -a to list the files if needed)
3. Append the generated public key id_rsa.pub to authorized_keys: cat id_rsa.pub >> authorized_keys
Slave node configuration
1. Generate a key pair in the same way (ssh-keygen -t rsa -P ""); then have sp-02, sp-03, sp-04, sp-05 and sp-06 each send their id_rsa.pub to sp-01 so it can be appended to sp-01's authorized_keys:
scp id_rsa.pub sp-01:/home/hadoop/.ssh/id_rsa.pub.s1
(Note: name the copy id_rsa.pub.sN per host as appropriate; s1 is used as the example below.)
2. On the master node (sp-01), run: cat id_rsa.pub.s1 >> authorized_keys
3. Finally, copy the authorized_keys file, which now contains the keys of all nodes, to the .ssh directory of sp-02, sp-03, sp-04, sp-05 and sp-06:
scp authorized_keys sp-02:/home/hadoop/.ssh/
Test: ssh <hostname>, e.g. ssh sp-02
After the steps above, passwordless login between hosts may still fail.
Solution:
1. chmod 600 /home/hadoop/.ssh/authorized_keys
2. chmod 700 /home/hadoop/.ssh/
3. service sshd restart (note: this requires appropriate user privileges)
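The whole exchange can also be done per host with ssh-copy-id, which appends the key and fixes the permissions in one step (a condensed sketch, assuming the hadoop user exists on every host and sp-01 is used as the collection point):
# on every node, as the hadoop user
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# push the key to sp-01 (run from each node)
ssh-copy-id hadoop@sp-01
# on sp-01, distribute the collected authorized_keys to the other nodes
for h in sp-02 sp-03 sp-04 sp-05 sp-06; do scp ~/.ssh/authorized_keys $h:/home/hadoop/.ssh/; done
# verify from any node
ssh sp-02 hostname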
3. Hadoop cluster setup
1. Download Hadoop from: http://mirror.bit.edu.cn/apache/hadoop/common/
Version used: hadoop-2.6.0.tar.gz
2. Upload hadoop-2.6.0.tar.gz to the /home/hadoop/hadoopInstallFile folder on the cluster
and extract it to /home/hadoop/hadoopInstallPath/:
tar -zxvf /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz -C /home/hadoop/hadoopInstallPath/
3. Enter the Hadoop configuration directory:
cd hadoop-2.6.0/etc/hadoop/
4. Edit hadoop-env.sh (vim hadoop-env.sh):
export JAVA_HOME=/usr/java/jdk1.8.0_73
5. Edit core-site.xml (vim core-site.xml):
<property>
<name>fs.defaultFS</name>
<value>hdfs://sp-06:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop</value>
</property>
6. Edit hdfs-site.xml (vim hdfs-site.xml):
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>sp-05:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>sp-05:50091</value>
</property>
7. Create the masters file (vim masters).
8. Edit the slaves file (vim slaves) and list the DataNode hosts:
sp-01
sp-02
sp-03
sp-04
sp-05
9. Configure the Hadoop environment variables (vim .bash_profile):
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
10. Reload the profile so the changes take effect immediately:
source .bash_profile
11. Run the hdfs command to check that it is found on the PATH.
12. Copy hadoop-2.6.0.tar.gz to every machine in the cluster:
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-02:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-03:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-04:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-05:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-06:/home/hadoop/hadoopInstallFile/
13. On sp-02, sp-03, sp-04, sp-05 and sp-06, extract the archive: tar -zxvf /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz -C /home/hadoop/hadoopInstallPath/
14. Make the configuration files under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/ on sp-02, sp-03, sp-04, sp-05 and sp-06
identical to those under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/ on sp-01:
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-02:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-03:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-04:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-05:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-06:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
15. Unify the Hadoop environment variables on every machine:
vim .bash_profile
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source .bash_profile
16. Format the NameNode:
hdfs namenode -format
17. Start the cluster from the master node sp-06:
start-dfs.sh
18. Access the cluster through the web UI:
NameNode (host + port):
http://sp-06:50070/
SecondaryNameNode (host + port):
http://sp-05:50090/
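Before stopping HDFS again in the next step, a quick functional check can be run from the command line (a sketch; the paths and test file are arbitrary):
jps                                    # NameNode on sp-06, SecondaryNameNode on sp-05, DataNode on the slave nodes
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/
hdfs dfs -ls /user/hadoop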
19. Stop HDFS:
stop-dfs.sh
3. Configuring high availability (ResourceManager + YARN)
3.1 Edit core-site.xml (vim core-site.xml):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://tztd</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>sp-06:2181,sp-05:2181,sp-04:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopTmp</value>
</property>
</configuration>
3.2 Edit hdfs-site.xml (vim hdfs-site.xml):
<configuration>
<!-- The HDFS nameservice is tztd; it must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>tztd</value>
</property>
<!-- The tztd nameservice has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.tztd</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.tztd.nn1</name>
<value>sp-06:8020</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.tztd.nn2</name>
<value>sp-05:8020</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.tztd.nn1</name>
<value>sp-06:50070</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.tztd.nn2</name>
<value>sp-05:50070</value>
</property>
<!-- Where the NameNode edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://sp-05:8485;sp-03:8485;sp-04:8485/tztd</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.tztd</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- The sshfence fencing method requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/hadoopTmp/data</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
3.3 Rename mapred-site.xml.template to mapred-site.xml and edit it (vim mapred-site.xml):
mv mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address; default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
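The JobHistory server is not started by start-dfs.sh or start-yarn.sh; if the history web UI on port 19888 is wanted, it can be started separately on whichever node should host it (a sketch):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# and stopped again with
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver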
3.4 Edit yarn-site.xml (vim yarn-site.xml):
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA; the default is false -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<!-- ResourceManager instances -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>sp-03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>sp-04</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>sp-06:2181,sp-05:2181,sp-04:2181</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log location -->
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/hadoopTmp/logs</value>
</property>
</configuration>
Note: without these two log settings, YARN writes its logs under /tmp; in this installation the account is not root and does not have permission to write to folders under the root filesystem, so the location has to be set explicitly.
The command to view a job's log is yarn logs -applicationId <applicationId>, for example:
yarn logs -applicationId application_1477992062510_0015
3.5 Sync the updated files to the /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/ directory on every machine in the cluster:
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-05:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-04:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-03:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-02:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-01:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
4. ZooKeeper installation
Extract the archive: tar zxvf /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz -C /home/hadoop/hadoopInstallPath/
4.1 Enter the configuration directory: cd /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/
Copy the sample configuration: cp zoo_sample.cfg zoo.cfg
Edit it: vim zoo.cfg
dataDir=/home/hadoop/hadoopTmp/zookeeper
server.1=sp-06:2888:3888
server.2=sp-05:2888:3888
server.3=sp-04:2888:3888
4.2 On sp-06, sp-05 and sp-04, create a myid file under dataDir containing 1, 2 and 3 respectively (see the sketch below).
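A minimal sketch of step 4.2 (the number written into myid must match the server.N entry for that host in zoo.cfg):
# on sp-06
mkdir -p /home/hadoop/hadoopTmp/zookeeper && echo 1 > /home/hadoop/hadoopTmp/zookeeper/myid
# on sp-05
mkdir -p /home/hadoop/hadoopTmp/zookeeper && echo 2 > /home/hadoop/hadoopTmp/zookeeper/myid
# on sp-04
mkdir -p /home/hadoop/hadoopTmp/zookeeper && echo 3 > /home/hadoop/hadoopTmp/zookeeper/myid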
4.3 Send /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz from sp-06 to the /home/hadoop/hadoopInstallFile/ directory on sp-05 and sp-04, then extract and install it there:
scp /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz sp-04:/home/hadoop/hadoopInstallFile/
tar zxvf /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz -C /home/hadoop/hadoopInstallPath/
4.4 Send the configuration files under /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/ on sp-06 to the same directory on sp-05 and sp-04:
scp /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/* sp-05:/home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/
scp /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/* sp-04:/home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/
4.5 Before starting ZooKeeper, clean up the data left over from the non-HA setup:
4.5.1 On sp-06 and sp-05, delete the data, name and namesecondary folders under /home/hadoop/hadoopTmp/dfs.
4.5.2 On sp-06 and sp-05, delete everything under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/logs/.
4.5.3 On sp-01, sp-02, sp-03 and sp-04, delete everything under /home/hadoop/hadoopTmp/dfs/data/.
4.5.4 On sp-01, sp-02, sp-03 and sp-04, delete everything under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/logs/.
4.6 Start ZooKeeper on sp-06, sp-05 and sp-04:
cd /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/bin
./zkServer.sh start
Note: the stop command is ./zkServer.sh stop
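Whether the quorum formed correctly can be checked on each of the three nodes (a sketch; one node should report Mode: leader, the other two Mode: follower):
./zkServer.sh status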
5. Starting Hadoop
5.1 Start the JournalNodes on sp-03, sp-04 and sp-05:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin/
./hadoop-daemon.sh start journalnode
5.2 Format HDFS on one of the NameNodes (sp-06 or sp-05): hdfs namenode -format
5.3 Copy the freshly formatted metadata to the other NameNode:
5.3.1 Start the NameNode that was just formatted: ./hadoop-daemon.sh start namenode
5.3.2 On the NameNode that has not been formatted, run: hdfs namenode -bootstrapStandby
5.3.3 Copy /home/hadoop/hadoopTmp/dfs/name/current from the formatted NameNode into /home/hadoop/hadoopTmp/dfs/ on the other machine:
scp -r /home/hadoop/hadoopTmp/dfs/name/current/* sp-05:/home/hadoop/hadoopTmp/dfs/
5.3.4 Start the second NameNode: ./hadoop-daemon.sh start namenode
5.4 Initialize ZKFC on one of the NameNodes: hdfs zkfc -formatZK
5.5 Stop the daemons started above: stop-dfs.sh
5.6 Start everything: start-dfs.sh
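Once everything is up, the HA state of the two NameNodes can be checked (a sketch; nn1 and nn2 are the IDs defined in hdfs-site.xml above, and one should be active while the other is standby):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2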
5.7 If some DataNodes do not come up with the rest of the cluster:
Cause: the NameNode was formatted two or more times.
Check whether the clusterID in /home/hadoop/hadoopTmp/dfs/data/current/VERSION on the DataNode
matches the one on the NameNode.
NameNode path: /home/hadoop/hadoopTmp/dfs/name/current/VERSION
5.8 Start the ResourceManager on sp-04 and on sp-03:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin
./yarn-daemon.sh start resourcemanager
Web UIs: http://sp-04:8088/
http://sp-03:8088/
5.9 Start a NodeManager on every node:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin
./yarn-daemon.sh start nodemanager
Setup complete.
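A quick way to confirm that each host runs the roles from the distribution table is jps on every node, plus the HA state of the two ResourceManagers (a sketch; rm1 and rm2 are the IDs from yarn-site.xml above):
jps                                  # e.g. on sp-06: NameNode, DFSZKFailoverController, QuorumPeerMain
yarn rmadmin -getServiceState rm1    # active or standby
yarn rmadmin -getServiceState rm2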
Test job source code:
import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
    public static class WordCountMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // split the line on spaces; words holds the individual tokens
            String[] words = value.toString().split(" ");
            for (String str : words) {
                word.set(str);
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            // sum the counts rather than just counting the values, so the job
            // also stays correct if a combiner is configured later
            for (IntWritable val : values) {
                total += val.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run the job:
hadoop jar WC.jar /a.txt /output
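The input file has to exist on HDFS before the job is submitted, and the result can be read back afterwards (a sketch; a.txt is any local text file, and the class name is only needed if the jar's manifest does not set a Main-Class):
hdfs dfs -put a.txt /a.txt
hadoop jar WC.jar WordCount /a.txt /output
hdfs dfs -cat /output/part-r-00000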
Execution output and result: (screenshots not reproduced here)
Test complete.
7.1 Download the hadoop 2.6.0 Eclipse plugin
Download: https://github.com/winghc/hadoop2x-eclipse-plugin
7.2 Pick a directory and extract hadoop-2.6.0.tar.gz into it; D:\hadoop\hadoop-2.6.0 is used here (referred to as $HADOOP_HOME below).
7.3 Add the environment variables:
HADOOP_HOME=D:\hadoop\hadoop-2.6.0
HADOOP_PREFIX=D:\hadoop\hadoop-2.6.0
HADOOP_BIN_PATH=%HADOOP_HOME%\bin
Add %HADOOP_HOME%\bin to PATH.
7.4 Download the 64-bit Windows helper binaries for hadoop 2.6 (hadoop.dll, winutils.exe):
http://files.cnblogs.com/files/yjmyzz/hadoop2.6%28x64%29V0.2.zip
Copy winutils.exe into $HADOOP_HOME\bin and hadoop.dll into %windir%\system32 (this mainly prevents the plugin from throwing odd errors such as null reference exceptions).
7.5 Configure the hadoop-eclipse-plugin
7.5.1 Put the downloaded hadoop-eclipse-plugin-2.6.0 jar into Eclipse's plugins folder (other plugin versions also work).
Start Eclipse, then Window -> Show View -> Other.
7.5.2 Window -> Preferences -> Hadoop Map/Reduce and point it at the local Hadoop root directory (i.e. $HADOOP_HOME).
7.5.3 Then, in the Map/Reduce Locations panel, click the small elephant icon
and add a Location.
The HDFS file tree should now be visible.
Right-clicking a file and choosing delete usually fails the first time with a long message that boils down to insufficient permissions: the Windows login user is not the user running Hadoop in the virtual machines. There are several fixes, e.g. creating a hadoop administrator account on Windows and logging in with it before starting Eclipse, but that is tedious. The simplest workaround:
Add the following to hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Then, on the cluster, run: hadoop dfsadmin -safemode leave
(This permissions workaround is only for test environments.)
7.6 Create the WordCount example project
7.6.1 WordCount example code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WCJob {
    public static void main(String[] args) {
        // user that submits the job
        System.setProperty("HADOOP_USER_NAME", "root");
        // reads the configuration files found on the classpath
        Configuration conf = new Configuration();
        // 1. settings for running against the cluster from the local machine
        conf.set("fs.defaultFS", "hdfs://sp-06:8020");
        conf.set("yarn.resourcemanager.hostname", "sp-04");
        try {
            // do not create the Job with new Job(...); that constructor is deprecated, use getInstance instead
            Job job = Job.getInstance(conf);
            // program entry point
            job.setJarByClass(WCJob.class);
            // job name, shown in the ResourceManager web UI
            job.setJobName("wc job");
            // mapper class
            job.setMapperClass(WCMapper.class);
            // reducer class
            job.setReducerClass(WCReducer.class);
            // key/value types emitted by the mapper
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // input path: if a directory is given, every file under it is read;
            // here it points to a single file on HDFS, so only that file is read
            FileInputFormat.addInputPath(job, new Path("/user/wc/input/a.txt"));
            Path output = new Path("/user/wc/output");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(output)) {
                fs.delete(output, true);
            }
            // output path
            FileOutputFormat.setOutputPath(job, output);
            // submit the job and wait for it to finish
            boolean flag = job.waitForCompletion(true);
            if (flag) {
                System.out.println("job success~~");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // map() is called by the framework once for every line of input
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] strs = StringUtils.split(line, ' ');
        for (String s : strs) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text text, Iterable<IntWritable> iterable,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable i : iterable) {
            sum += i.get();
        }
        context.write(text, new IntWritable(sum));
    }
}
7.6.2 Upload the input file (its contents are shown below):
hello hadoop
hello word
hello nihao
hello scala
hello spark
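The file has to be uploaded to the exact path hard-coded in WCJob (a sketch; wc.txt is the local copy of the lines above):
hdfs dfs -mkdir -p /user/wc/input
hdfs dfs -put wc.txt /user/wc/input/a.txt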
The run configuration must be set as required (screenshot not reproduced here).
Run the main program with Run on Hadoop.
Note: the output path must not already exist.
Result: (screenshot not reproduced here)
7.7 Testing the connection from IntelliJ IDEA 15.0.5
The code above also runs directly from IDEA; however, there is no plugin comparable to hadoop-eclipse-plugin, so after every test run the output directory has to be deleted on the cluster.
This completes the Hadoop setup and connectivity testing.
HBase download: http://apache.fayea.com/hbase/hbase-1.0.3/
Version used: hbase-1.0.3-bin.tar.gz
2. Upload the archive to /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz on sp-01
and extract it into /home/hadoop/hadoopInstallPath/:
tar -zxvf /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz -C /home/hadoop/hadoopInstallPath/
3.1 Enter the configuration directory:
cd /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
Edit hbase-env.sh (vim hbase-env.sh):
export JAVA_HOME=/usr/java/jdk1.8.0_73
Edit hbase-site.xml (vim hbase-site.xml):
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://tztd/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>sp-06,sp-05,sp-04</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hadoopTmp/zookeeper</value>
</property>
</configuration>
Edit regionservers (vim regionservers) and list the RegionServer hosts (sp-02 and sp-03 in this cluster).
Create the backup_master file (vim backup_master) if a standby HMaster is wanted.
Copy the hdfs-site.xml file into HBase's conf directory:
cp -a /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/hdfs-site.xml /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
3.2 Copy the archive to sp-02 and sp-03:
scp /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz sp-02:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz sp-03:/home/hadoop/hadoopInstallFile/
Extract it on each machine:
tar zxvf /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz -C /home/hadoop/hadoopInstallPath/
3.3 Send the configured files under /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/ on sp-01
to the same directory on sp-02 and sp-03:
scp /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/* sp-03:/home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
scp /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/* sp-02:/home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
3.4 Unify the environment variables on sp-01, sp-02 and sp-03:
vim ~/.bash_profile
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export HBASE_HOME=/home/hadoop/hadoopInstallPath/hbase-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
Start HBase and open its shell:
cd /home/hadoop/hadoopInstallPath/hbase-1.0.3
./bin/start-hbase.sh
./bin/hbase shell
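A small smoke test inside the HBase shell (a sketch; the table name is arbitrary):
status                       # number of live RegionServers
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
disable 'test'
drop 'test'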
Spark download: http://spark.apache.org/downloads.html
Version used: spark-1.6.0-bin-hadoop2.6.tgz
2.1 Extract the archive: tar zxvf /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz -C /home/hadoop/hadoopInstallPath/
2.2 Enter the configuration directory: cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
2.3 Copy and rename the template:
cp spark-env.sh.template spark-env.sh
Edit it: vim spark-env.sh
The main settings are (see the sketch after this list):
the Java environment variable;
the address of the Spark master host (Spark's standalone cluster needs a master for resource scheduling);
the Spark master port (7077 by default);
the number of CPU cores per worker;
the number of worker instances (one worker is one process; this sets the number of instances per machine);
the amount of memory each worker may use on its machine.
The remaining five settings are there to run Spark on YARN; once jobs are scheduled through YARN, the standalone Spark daemons do not need to be started at all
(in YARN mode there is no need to run start-all.sh).
Note:
The configuration file itself is fine, but the values must be adjusted to the memory actually available on each machine; otherwise jobs finish without errors yet never produce a result, especially in yarn-client mode.
Check the available memory with: cat /proc/meminfo | grep MemTotal
Do not set the Spark memory too high.
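A sketch of what spark-env.sh might look like for this layout; every value below is an assumption and has to be adapted to the actual machines (sp-04 as the standalone master, deliberately small worker sizes):
export JAVA_HOME=/usr/java/jdk1.8.0_73
export SPARK_MASTER_IP=sp-04
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
# lets Spark pick up the HDFS/YARN configuration when submitting with --master yarn
export HADOOP_CONF_DIR=/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop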
2.4 Copy and rename the slaves file and list the worker hosts in it:
cp slaves.template slaves
vim slaves
2.5 Copy the archive to the nodes that will run Spark:
scp /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz sp-01:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz sp-02:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz sp-04:/home/hadoop/hadoopInstallFile/
Extract the file on each machine:
tar zxvf /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz -C /home/hadoop/hadoopInstallPath/
2.6 Make the conf directory (/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf) identical on every Spark node:
scp -r /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/* sp-01:/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
scp -r /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/* sp-02:/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
scp -r /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/* sp-04:/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
The start command must be run on the master node:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin
./start-all.sh
Notes: (1) Run the start command on the master node, otherwise the master service may not come up (verified on this cluster: only the worker services started).
(2) The start command has the same name as Hadoop's, so do not put Spark's sbin directory on the PATH; just cd into sbin and run it there to avoid command name clashes.
Client access (web UI):
http://sp-04:8080
The Spark cluster is now set up and running.
Stop command:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin
./stop-all.sh
To open an interactive shell:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/bin/
./spark-shell
YARN test, cluster mode:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 1G --num-executors 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
No result is printed to the console in this mode;
check the job in the ResourceManager UI at http://sp-04:8088/cluster
YARN test, client mode:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
Local testing complete.
Spark high-availability configuration
HA mainly matters for standalone mode; in production everything runs on YARN, so strictly speaking this is not needed. It was configured anyway in case it is useful later (it does not have to be enabled).
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf
vim spark-env.sh
Change:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=sp-04:2181,sp-05:2181,sp-06:2181"
Sync this change to every Spark machine.
sp-04 is already acting as the master node.
To add another master node, for example sp-03, make the corresponding change on sp-03;
only that one machine needs to be changed.
Start the services:
(1) Start the ZooKeeper cluster:
./zkServer.sh start
(2) Start the Spark cluster (this command cannot start the standby master, which has to be started separately):
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin/
./start-all.sh
(3) Start the standby master on sp-03:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin/
./start-master.sh
Check the web UI of both masters.
Test Spark HA (e.g. stop the active master and check that the standby takes over).
HA setup complete.
Stop the cluster (shut down standalone mode):
[hadoop@sp-03 sbin]$ ./stop-all.sh
(Hive version used: apache-hive-1.1.1-bin.tar.gz)
Hive is installed on the first node (IP: 192.168.101.121); MySQL is installed on the second node (IP: 192.168.101.199).
Put the downloaded archive into the /home/hadoop/hadoopInstallFile directory.
2.1 Extract it into /home/hadoop/hadoopInstallPath:
[hadoop@sp-01 hadoopInstallFile]$ tar -zxvf apache-hive-1.1.1-bin.tar.gz -C /home/hadoop/hadoopInstallPath
2.2 Enter the conf directory under the installation directory:
[hadoop@sp-01 hadoopInstallFile]$ cd apache-hive-1.1.1-bin/conf
// configure hive-site.xml
[hadoop@sp-01 conf]$ cp hive-default.xml.template hive-site.xml
// configure hive-conf.sh; this file does not exist initially and simply needs to be created
[hadoop@sp-01 conf]$ vim hive-conf.sh
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export HIVE_CONF_DIR=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin/conf
// configure hive-env.sh
[hadoop@sp-01 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@sp-01 conf]$ vim hive-env.sh
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export HADOOP_USER_CLASSPATH_FIRST=true
export HIVE_CONF_DIR=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin/conf
2.3 Enter the bin directory under the installation directory:
[hadoop@sp-01 bin]$ vim hive-config.sh
export JAVA_HOME=/usr/java/jdk1.8.0_73
export HIVE_HOME=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
*******************************************************************************
Starting Hive fails with an error.
After replacing every ${system:java.io.tmpdir} in hive-site.xml with /home/hadoop/tmp/, it fails again with:
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to create log directory /home/hive/tmp/${system:user.name}
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
*******************************************************************************
Replace every value in hive-site.xml that contains ${system:java.io.tmpdir} with /home/hadoop/tmp/ (inspection shows the value appears in two different forms, so two substitutions are needed to replace them all):
[hadoop@sp-01 conf]$ vim hive-site.xml
Substitution commands:
:%s/${system:java.io.tmpdir}\/${hive.session.id}_resources/\/home\/hadoop\/hadoopInstallPath\/tmp/g   # the result of this substitution can be seen around line 57 of hive-site.xml
:%s/${system:java.io.tmpdir}\/${system:user.name}/\/home\/hadoop\/hadoopInstallPath\/tmp/g   # the result of this substitution can be seen around line 2721 of hive-site.xml
After all substitutions are done, restart Hive:
[hadoop@sp-01 bin]$ ./hive
Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-1.1.1-bin/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-1.1.1-bin/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
Hive starts successfully.
Note: Hive stores its data on HDFS under /user/hive/warehouse; this directory must not be deleted.
2.4 Add the directory containing the hive command to the PATH.
Edit the profile:
[hadoop@sp-01 ~]$ vim .bash_profile
export HIVE_HOME=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin
From now on, Hive can be started simply by typing "hive".
Cause: Hive uses the embedded Derby database as its metastore by default, and Derby does not support concurrent access by multiple users; the fix is to switch the metastore to MySQL.
Solution:
Install MySQL on the second node using yum; this must be done as root:
[root@sp-02 hadoopInstallPath]# yum -y install mysql*
After running the command above, starting the MySQL service fails:
[root@sp-02 hadoopInstallPath]# service mysqld start   (the error follows)
Cause: the MySQL database installation script had not been run.
Check the startup log: cat /var/log/mysqld.log
Fix:
[root@sp-02 hadoopInstallPath]# mysql_install_db   # after this command finishes, start MySQL again
Log in to MySQL:
[root@sp-02 ~]# mysql
mysql> create user 'hive' identified by 'mysql';
mysql> grant all on *.* to hive@'sp-02' identified by 'unioncast.cn';
mysql> flush privileges;
mysql> quit
Bye
Edit the Hive configuration file (vim hive-site.xml):
[hadoop@sp-01 ~]$ cd /home/hadoop/apache-hive-1.1.1-bin/conf
[hadoop@sp-01 conf]$ vim hive-site.xml
Change the following entries in the configuration file:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://sp-02/hive?characterEncoding=UTF-8</value> <!-- hostname (an IP also works) of the node where MySQL is installed; "hive" is the database created in MySQL below -->
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value> <!-- connect as the hive user -->
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mysql</value> <!-- the hive user's password is mysql -->
<description>password to use against metastore database</description>
</property>
In MySQL, create the hive database:
mysql> create database hive;
mysql> quit;
Bye
Log in to the Hive warehouse, create a table, and check that it shows up in the MySQL metastore:
hive> create table aa(id int);
mysql> show databases;
mysql> use hive;
mysql> show tables;
mysql> select * from TBLS;
Empty set (0.00 sec)
mysql> select * from TBLS;
The connection works.
Note: while doing this, copy mysql-connector-java-5.1.5-bin.jar into /home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin/lib (the jar can be downloaded from the internet).
Unable to stop NodeManager and ResourceManager when stopping or restarting the cluster
The root cause:
the yarn-daemon.sh script writes its pid files under /tmp; the current user does not have permission to create them there, and even when they are created they are likely to be removed by the periodic /tmp cleanup, so the pid directory has to be moved.
The files to modify are under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin:
edit both yarn-daemon.sh and hadoop-daemon.sh (remember to stop the daemons before editing; if they will not stop, use kill <pid> or kill -9 <pid>, looking the pid up with jps).
vim yarn-daemon.sh
YARN_PID_DIR=/home/hadoop/hadoopTmp/Tmp
Create the Tmp folder under /home/hadoop/hadoopTmp/:
mkdir Tmp
Then make the same change in hadoop-daemon.sh (this script reads HADOOP_PID_DIR rather than YARN_PID_DIR):
vim hadoop-daemon.sh
HADOOP_PID_DIR=/home/hadoop/hadoopTmp/Tmp
Sync the modified scripts to every machine in the cluster,
then restart the Hadoop cluster:
start-all.sh
Start YARN:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin
./start-yarn.sh
The pid files are now generated under the directory specified above.
If the system shows two standby NameNodes and no active one, check whether the DFSZKFailoverController is running. If it is not, start it manually from /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin:
./hadoop-daemon.sh start zkfc
For the DFSZKFailoverController to come up automatically, re-initialize its ZooKeeper state on one of the NameNodes: hdfs zkfc -formatZK
Startup warning: util.NativeCodeLoader: Unable to load native-hadoop library for your platform
Reference article:
http://www.secdoctor.com/html/yyjs/31101.html
Download for the 2.6.0 replacement native package: http://akamai.bintray.com/73/731a49d122fd009679c50222a9a5a4926d1a26b6?__gda__=exp=1477999546~hmac=c3edc3b5d46ee1b544147165797e33a37df0c6034060f646c374e29ec78cda8d&response-content-disposition=attachment%3Bfilename%3D%22hadoop-native-64-2.6.0.tar%22&response-content-type=application%2Foctet-stream&requestInfo=U2FsdGVkX18Z4F9qc-WicETGP2g0HHM8YwPr_ZaSw7nIT1_inzlCCD3rV4WS71l5CcwKOa9r6oe1-mVB08RN0TeQxPBlIKMP7jsZd0DbLsj3S4dNFIsREUEmR6lcDKaNai_TEy8ToFbAR3GSenbD1A
The root cause is that the bundled native library is incompatible with the platform and needs to be replaced.
This write-up still has many shortcomings; corrections are very welcome. Feel free to repost it, but please credit the source. Typing all this up was not easy, thanks for your understanding! http://www.cnblogs.com/baierfa/p/6689022.html