Hadoop
Setting up the Ubuntu environment
192.168.1.64 HNClient
192.168.1.65 HNName
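On SUSE and Ubuntu, the Backspace key does not delete text in vi.
To delete a character, press ESC and then x.
To insert text, press i.
To open a new line below the current line, press o.
On HNClient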
norman@HNClient:~$ sudo vi /etc/hostname
HNClient
norman@HNClient:~$ sudo apt-get install openssh-server
norman@HNClient:~$ sudo vi /etc/hosts
192.168.1.64 HNClient
192.168.1.65 HNName
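Both machines need the same two entries in /etc/hosts. As a minimal sketch, the entries could be appended by a small script instead of an editor. The function name add_host_entry is hypothetical, and the demonstration writes to a scratch file rather than the real /etc/hosts.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless that exact line is already present,
# so running the script twice does not create duplicates.
add_host_entry() {
    hosts_file=$1
    entry="$2 $3"
    # -x matches the whole line, -F treats the entry as a fixed string
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || printf '%s\n' "$entry" >> "$hosts_file"
}

# Demonstration against a scratch file
demo_hosts=$(mktemp)
add_host_entry "$demo_hosts" 192.168.1.64 HNClient
add_host_entry "$demo_hosts" 192.168.1.65 HNName
add_host_entry "$demo_hosts" 192.168.1.65 HNName   # second call changes nothing
cat "$demo_hosts"
```

To apply this to the real file, the function would have to run with root privileges, because /etc/hosts is owned by root.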
norman@HNClient:~$ ssh-keygen (press Enter at each prompt below to accept the defaults)
Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:rj3kM5OeqxceqGP6DcofXa+hZFReLQmKqksqoYL+YH4 norman@HNClient
The key's randomart image is:
+---[RSA 2048]----+
| . |
| . . . o |
| . . . + . |
| . o . . |
| . ..S |
|.. o.o+. |
|+= o.++o+. |
|Xo.E+ +X+ |
|oo.=+ |
+----[SHA256]-----+
norman@HNClient:~$ ssh localhost (ssh localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:14:08 2018 from 192.168.1.65
norman@HNClient:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
norman@HNClient:~$ ssh localhost (ssh localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:18:02 2018 from 127.0.0.1
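Appending the public key with >> adds a duplicate line every time the command is repeated. The following sketch shows one way to make that step repeatable and to set the permissions sshd expects. The function name authorize_key and the placeholder key are assumptions for illustration, and the demonstration uses a scratch directory instead of the real ~/.ssh.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTH_FILE
# Adds the public key to AUTH_FILE only if it is not already listed.
authorize_key() {
    pub_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    key=$(cat "$pub_file")
    grep -qxF "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    # With sshd's default StrictModes setting, keys are ignored when the
    # directory or the file is writable by group or others.
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}

# Demonstration with a placeholder key (not a real key)
scratch=$(mktemp -d)
printf '%s\n' 'ssh-rsa AAAAexample norman@HNClient' > "$scratch/id_rsa.pub"
authorize_key "$scratch/id_rsa.pub" "$scratch/.ssh/authorized_keys"
authorize_key "$scratch/id_rsa.pub" "$scratch/.ssh/authorized_keys"
grep -c . "$scratch/.ssh/authorized_keys"   # prints 1
```

On the real machine the call would be authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys.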
norman@HNClient:~$ ssh HNName (ssh HNName still asks for a password)
norman@hnname's password:
norman@HNClient:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@HNName
norman@HNClient:~$ ssh HNName (ssh HNName now logs in without a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
254 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:23:21 2018 from 192.168.1.64
norman@HNName:~$
On HNName
norman@HNName:~$ sudo vi /etc/hosts
192.168.1.64 HNClient
192.168.1.65 HNName
norman@HNName:~$ ssh-keygen (press Enter at each prompt below to accept the defaults)
Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:YXrPGdhKYkPsAroDlIZJ4sYdbrpHyvaMQccMV3GJn9I norman@HNName
The key's randomart image is:
+---[RSA 2048]----+
|.. . oo.. |
|.+ oo.. |
|oO.= = + |
|+.B. + E + |
|oo =. B S o |
|+.= o = + o |
|o . . + |
|..* |
| . o |
+----[SHA256]-----+
norman@HNName:~$ ssh localhost (ssh localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 22:55:29 2018 from 127.0.0.1
norman@HNName:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
norman@HNName:~$ ssh localhost (ssh localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
254 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:00:28 2018 from 127.0.0.1
norman@HNName:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@hnclient
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/norman/.ssh/id_rsa.pub"
The authenticity of host 'hnclient (192.168.1.64)' can't be established.
ECDSA key fingerprint is SHA256:w5dwBrXor00JfFtpGXc0G/+deJJwmAxKmjXE32InhgA.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
norman@hnclient's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'norman@hnclient'"
and check to make sure that only the key(s) you wanted were added.
norman@HNName:~$ ssh hnclient (ssh hnclient now logs in without a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:05:13 2018 from 192.168.1.58
norman@HNClient:~$ exit
norman@HNName:~$ sudo apt-get install openjdk-7-jdk
[sudo] password for norman:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package openjdk-7-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'openjdk-7-jdk' has no installation candidate
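This happens because the default Ubuntu 16.04 package sources no longer include OpenJDK 7, so the repository has to be added manually, as follows:
norman@HNName:~$ sudo add-apt-repository ppa:openjdk-r/ppa (adds the OpenJDK PPA as a package source; add-apt-repository ppa:xxx/ppa fetches the named personal package archive, adds it to the current apt sources, and imports its public key automatically)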
norman@HNName:~$ sudo apt-get update
norman@HNName:~$ sudo apt-get install openjdk-7-jdk
norman@HNName:~$ java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-3)
OpenJDK Client VM (build 24.95-b01, mixed mode, sharing)
norman@HNName:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNName:~$ dir /usr/local/hadoop
bin hadoop-ant-1.2.0.jar hadoop-tools-1.2.0.jar NOTICE.txt
build.xml hadoop-client-1.2.0.jar ivy README.txt
c++ hadoop-core-1.2.0.jar ivy.xml sbin
CHANGES.txt hadoop-examples-1.2.0.jar lib share
conf hadoop-minicluster-1.2.0.jar libexec src
contrib hadoop-test-1.2.0.jar LICENSE.txt webapps
norman@HNName:~$ sudo vi $HOME/.bashrc (append the following at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
norman@HNName:~$ exec bash
norman@HNName:~$ echo $PATH
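Rather than reading the whole PATH by eye, a short check can confirm that the Hadoop bin directory was picked up after reloading the shell. This is a sketch; the function name path_contains is hypothetical, and the fallback value for HADOOP_PREFIX simply mirrors the export above.

```shell
#!/bin/sh
# path_contains DIR [PATH_STRING]
# Succeeds when DIR is one of the colon-separated entries of PATH_STRING
# (defaults to the current $PATH).
path_contains() {
    path_value=${2:-$PATH}
    case ":$path_value:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

HADOOP_PREFIX=${HADOOP_PREFIX:-/usr/local/hadoop}
if path_contains "$HADOOP_PREFIX/bin"; then
    echo "$HADOOP_PREFIX/bin is on PATH"
else
    echo "$HADOOP_PREFIX/bin is NOT on PATH; check ~/.bashrc"
fi
```

Wrapping both sides in colons makes the match exact, so /usr/local/hadoop does not count as a match for /usr/local/hadoop/bin.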
norman@HNName:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
( The java implementation to use. Required.)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
( Extra Java runtime options. Empty by default. Set here to prefer the IPv4 stack, which disables IPv6 for Hadoop)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
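The JAVA_HOME value depends on where the JDK package was installed. As a sketch, it can be derived from the resolved location of the java binary instead of being typed by hand. The function name java_home_for is hypothetical, and the sample path is an assumption about a typical OpenJDK 7 layout on 32-bit Ubuntu.

```shell
#!/bin/sh
# java_home_for JAVA_BINARY_PATH
# Strips the trailing bin/java (and jre/, if present) from the resolved
# path of the java launcher to obtain a JAVA_HOME candidate.
java_home_for() {
    home=${1%/bin/java}   # drop the trailing /bin/java
    home=${home%/jre}     # OpenJDK 7 packages keep the launcher under jre/bin
    printf '%s\n' "$home"
}

# On the target machine the input would come from:
#   readlink -f "$(command -v java)"
java_home_for /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java
# prints /usr/lib/jvm/java-7-openjdk-i386
```

The directory name printed this way may differ from the one used above if the package also provides an alias such as java-1.7.0-openjdk-i386; either works as long as it contains bin/java.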
Installing Apache Hadoop (Single Node)
norman@HNName:~$ sudo vi /usr/local/hadoop/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HNName:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
norman@HNName:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>HNName:10002</value>
</property>
</configuration>
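The same two files are needed on every machine that talks to the cluster, so they can also be generated by a script. The sketch below writes both files with here-documents. The function name write_site_configs is hypothetical, and the demonstration writes to a scratch directory rather than /usr/local/hadoop/conf.

```shell
#!/bin/sh
# write_site_configs CONF_DIR NAMENODE_HOST
# Overwrites core-site.xml and mapred-site.xml in CONF_DIR with the
# settings used in this walkthrough (ports 10001 and 10002).
write_site_configs() {
    conf_dir=$1
    host=$2
    cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$host:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF
    cat > "$conf_dir/mapred-site.xml" <<EOF
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$host:10002</value>
  </property>
</configuration>
EOF
}

# Demonstration in a scratch directory
conf=$(mktemp -d)
write_site_configs "$conf" HNName
grep '<value>' "$conf/core-site.xml" "$conf/mapred-site.xml"
```

Because the function replaces the files completely, any other properties already present in them would be lost.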
norman@HNName:~$ sudo mkdir /usr/local/hadoop/tmp
norman@HNName:~$ sudo chown norman /usr/local/hadoop/tmp
norman@HNName:~$ hadoop namenode -format (the following line indicates success)
18/11/01 19:07:36 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
norman@HNName:~$ hadoop-daemons.sh start namenode (fails with the following errors)
localhost: mkdir: cannot create directory '/usr/local/hadoop/libexec/../logs': Permission denied
localhost: chown: cannot access '/usr/local/hadoop/libexec/../logs': No such file or directory
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 137: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: head: cannot open '/usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out' for reading: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 147: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 148: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
norman@HNName:~$ ll /usr/local
total 44
drwxr-xr-x 11 root root 4096 Nov 1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28 2018 ../
drwxr-xr-x 2 root root 4096 Feb 28 2018 bin/
drwxr-xr-x 2 root root 4096 Feb 28 2018 etc/
drwxr-xr-x 2 root root 4096 Feb 28 2018 games/
drwxr-xr-x 15 root root 4096 Nov 1 20:05 hadoop/
drwxr-xr-x 2 root root 4096 Feb 28 2018 include/
drwxr-xr-x 4 root root 4096 Feb 28 2018 lib/
lrwxrwxrwx 1 root root 9 Jul 26 23:29 man -> share/man/
drwxr-xr-x 2 root root 4096 Feb 28 2018 sbin/
drwxr-xr-x 8 root root 4096 Feb 28 2018 share/
drwxr-xr-x 2 root root 4096 Feb 28 2018 src/
norman@HNName:~$ sudo chown norman /usr/local/hadoop
norman@HNName:~$ ll /usr/local
total 44
drwxr-xr-x 11 root root 4096 Nov 1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28 2018 ../
drwxr-xr-x 2 root root 4096 Feb 28 2018 bin/
drwxr-xr-x 2 root root 4096 Feb 28 2018 etc/
drwxr-xr-x 2 root root 4096 Feb 28 2018 games/
drwxr-xr-x 15 norman root 4096 Nov 1 20:05 hadoop/
drwxr-xr-x 2 root root 4096 Feb 28 2018 include/
drwxr-xr-x 4 root root 4096 Feb 28 2018 lib/
lrwxrwxrwx 1 root root 9 Jul 26 23:29 man -> share/man/
drwxr-xr-x 2 root root 4096 Feb 28 2018 sbin/
drwxr-xr-x 8 root root 4096 Feb 28 2018 share/
drwxr-xr-x 2 root root 4096 Feb 28 2018 src/
norman@HNName:~$ hadoop-daemons.sh start namenode
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
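The first start failed because /usr/local/hadoop was copied with sudo and therefore belonged to root, so the daemon script could not create its logs directory. A check like the following sketch could catch that before starting anything. The function name check_writable is hypothetical.

```shell
#!/bin/sh
# check_writable DIR...
# Reports, for each directory, whether the current user can write to it.
# Returns non-zero if any directory is missing or not writable.
check_writable() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable by $(id -un): $dir"
            status=1
        fi
    done
    return $status
}

# Demonstration: one directory that exists and one that does not
scratch=$(mktemp -d)
check_writable "$scratch" "$scratch/logs" || echo "fix ownership before starting the daemons"
```

On the machine above the call would be check_writable /usr/local/hadoop /usr/local/hadoop/tmp, run as the user that starts Hadoop.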
norman@HNName:~$ start-all.sh
norman@HNName:~$ jps
23297 DataNode
23610 TaskTracker
23484 JobTracker
23739 Jps
23102 NameNode
23416 SecondaryNameNode
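The jps listing should show the five Hadoop 1.x daemons seen above. As a sketch, the listing can be checked by a script that prints whichever daemon is absent. The function name missing_daemons is hypothetical, and the sample input is copied from the output above so the block runs without a cluster.

```shell
#!/bin/sh
# missing_daemons
# Reads jps output on stdin and prints each expected daemon that is absent.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # jps prints "<pid> <name>"; compare the name field exactly so that
        # SecondaryNameNode is not mistaken for NameNode
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

sample='23297 DataNode
23610 TaskTracker
23484 JobTracker
23739 Jps
23102 NameNode
23416 SecondaryNameNode'

missing=$(printf '%s\n' "$sample" | missing_daemons)
if [ -z "$missing" ]; then
    echo "all five daemons are running"
else
    echo "missing: $missing"
fi
```

On a live node the input would be jps | missing_daemons.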
norman@HNName:~$ dir /usr/local/hadoop/bin
hadoop hadoop-daemon.sh rcc start-all.sh start-dfs.sh start-mapred.sh stop-balancer.sh stop-jobhistoryserver.sh task-controller
hadoop-config.sh hadoop-daemons.sh slaves.sh start-balancer.sh start-jobhistoryserver.sh stop-all.sh stop-dfs.sh stop-mapred.sh
http://192.168.1.65:50070/dfshealth.jsp
http://192.168.1.65:50030/jobtracker.jsp
http://192.168.1.65:50060/tasktracker.jsp
Managing HDFS
http://www.gutenberg.org/files/2600/2600-0.txt (download the text file)
Copy the page contents into war_and_peace.txt
https://www.ncdc.noaa.gov/orders/qclcd/ (download any of the data sets)
For example QCLCD201701.zip and QCLCD201702.zip, then extract 201701hourly.txt and 201702hourly.txt from them
On HNClient
Put war_and_peace.txt in /home/norman/data/book
Put 201701hourly.txt and 201702hourly.txt in /home/norman/data/weather
norman@HNClient:~$ sudo mkdir -p /home/norman/data/book
norman@HNClient:~$ sudo mkdir -p /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/book
norman@HNClient:~$ sudo add-apt-repository ppa:openjdk-r/ppa
norman@HNClient:~$ sudo apt-get update
norman@HNClient:~$ sudo apt-get install openjdk-7-jdk
norman@HNClient:~$ java -version
norman@HNClient:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNClient:~$ sudo vi $HOME/.bashrc (append the following at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
norman@HNClient:~$ exec bash
norman@HNClient:~$ echo $PATH
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
(The java implementation to use. Required.)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
( Extra Java runtime options. Empty by default. Set here to prefer the IPv4 stack, which disables IPv6 for Hadoop)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HNName:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>HNName:10002</value>
</property>
</configuration>
norman@HNClient:~$ hadoop fs -mkdir test
norman@HNClient:~$ hadoop fs -ls
Found 1 items
drwxr-xr-x - norman supergroup 0 2018-11-02 01:17 /user/norman/test
norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/small
norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/big
Open http://192.168.1.65:50070 in a browser
http://192.168.1.65:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/
norman@HNClient:~$ hadoop fs -rmr test (test deletion)
Deleted hdfs://HNName:10001/user/norman/test
norman@HNClient:~$ hadoop fs -moveFromLocal /home/norman/data/book/war_and_peace.txt hdfs://hnname:10001/data/small/war_and_peace.txt
The file is now visible in HDFS.
norman@HNClient:~$ hadoop fs -copyToLocal hdfs://hnname:10001/data/small/war_and_peace.txt /home/norman/data/book/war_and_peace.bak.txt (test copying back to the local file system)
norman@HNClient:~$ hadoop fs -put /home/norman/data/weather hdfs://hnname:10001/data/big
The weather data is now visible in HDFS.
norman@HNClient:~$ hadoop dfsadmin -report
Configured Capacity: 19033165824 (17.73 GB)
Present Capacity: 13114503168 (12.21 GB)
DFS Remaining: 12005150720 (11.18 GB)
DFS Used: 1109352448 (1.03 GB)
DFS Used%: 8.46%
Under replicated blocks: 19
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.1.65:50010
Decommission Status : Normal
Configured Capacity: 19033165824 (17.73 GB)
DFS Used: 1109352448 (1.03 GB)
Non DFS Used: 5918662656 (5.51 GB)
DFS Remaining: 12005150720(11.18 GB)
DFS Used%: 5.83%
DFS Remaining%: 63.07%
Last contact: Fri Nov 02 01:49:43 GMT-08:00 2018
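The report shows two different "DFS Used%" values. The figures are consistent with the cluster-wide value being measured against Present Capacity (Configured Capacity minus Non DFS Used) and the per-datanode value against Configured Capacity. The sketch below reproduces the percentages from the byte counts above; the helper name pct is hypothetical.

```shell
#!/bin/sh
# pct PART WHOLE
# Prints PART as a percentage of WHOLE with two decimals.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f%%\n", 100 * part / whole }'
}

# Present Capacity = Configured Capacity - Non DFS Used
echo $(( 19033165824 - 5918662656 ))   # prints 13114503168

pct 1109352448 13114503168    # cluster DFS Used%:      8.46%
pct 1109352448 19033165824    # datanode DFS Used%:     5.83%
pct 12005150720 19033165824   # datanode DFS Remaining%: 63.07%
```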
norman@HNClient:~$ hadoop dfsadmin -safemode enter (safe mode is needed when upgrading)
Safe mode is ON
norman@HNClient:~$ hadoop dfsadmin -safemode leave
Safe mode is OFF
On HNName
norman@HNName:~$ hadoop fsck -blocks
Status: HEALTHY
Total size: 1100586452 B
Total dirs: 13
Total files: 4
Total blocks (validated): 19 (avg. block size 57925602 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 38 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Nov 02 01:54:46 GMT-08:00 2018 in 1049 milliseconds
The filesystem under path '/' is HEALTHY
norman@HNName:~$ hadoop fsck /data/big
Status: HEALTHY
Total size: 1097339705 B
Total dirs: 2
Total files: 2
Total blocks (validated): 17 (avg. block size 64549394 B)
Minimally replicated blocks: 17 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 17 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 34 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Nov 02 19:33:55 GMT-08:00 2018 in 14 milliseconds
The filesystem under path '/data/big' is HEALTHY
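Every block is reported as under-replicated because the default replication factor is 3 while only one datanode exists, so each block has one replica and is missing two. The sketch below reproduces the "Missing replicas" lines from the block counts. The function name missing_replicas is hypothetical, and measuring the percentage against the replicas actually present is an inference that matches the 200 % shown in both reports.

```shell
#!/bin/sh
# missing_replicas BLOCKS TARGET_REPLICATION PRESENT_REPLICATION
# Prints the number of missing replicas and that number as a percentage
# of the replicas that currently exist.
missing_replicas() {
    blocks=$1
    target=$2
    present=$3
    missing=$(( blocks * (target - present) ))
    percent=$(( missing * 100 / (blocks * present) ))
    echo "$missing ($percent %)"
}

missing_replicas 19 3 1   # whole file system: prints 38 (200 %)
missing_replicas 17 3 1   # /data/big:         prints 34 (200 %)
```

On a single-datanode setup like this one the warning is expected. The file system is still reported as HEALTHY because every block has at least one replica.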