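Things to watch out for during installation:
1. the firewall (the nodes must be able to reach each other's service ports)
2. SSH (the start scripts log in from master to the other nodes, so passwordless login must work)
3. SELinux
4. the Java version (newer than 1.7, and not OpenJDK)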
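A minimal Python sketch of item 4, assuming the usual `java -version` banner (`java version "1.8.0_151"` for Oracle JDK, `openjdk version "..."` for OpenJDK). `java_ok` and `check_installed_java` are hypothetical helpers; note that `java -version` prints to stderr, not stdout.

```python
import re
import subprocess


def java_ok(version_output: str) -> bool:
    """True if the `java -version` text shows a non-OpenJDK JVM newer than 1.7."""
    match = re.search(r'version "(\d[^"]*)"', version_output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    numbers = [int(n) for n in re.findall(r"\d+", match.group(1))]
    # Java 8 and older report "1.<feature>..."; Java 9+ report "<feature>...".
    feature = numbers[1] if numbers[0] == 1 and len(numbers) > 1 else numbers[0]
    is_openjdk = "openjdk" in version_output.lower()
    return feature > 7 and not is_openjdk


def check_installed_java() -> bool:
    """Hypothetical helper: run `java -version` and apply java_ok to its stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_ok(result.stderr)
```

Run `check_installed_java()` on every node, since each node uses its own JVM.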
core-site.xml

```xml
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
```
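These properties sit inside the file's `<configuration>` element. If you prefer to generate the file rather than edit it by hand, a standard-library Python sketch could look like this. `render_configuration` is a hypothetical helper; 131072 bytes is a 128 KiB buffer.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def render_configuration(props: Dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


CORE_SITE = {
    "io.file.buffer.size": "131072",       # 128 KiB I/O buffer
    "fs.defaultFS": "hdfs://master:9000",  # NameNode address used by clients
}

if __name__ == "__main__":
    print(render_configuration(CORE_SITE))
```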
hdfs-site.xml

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hadoop/namenode</value>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master:9001</value>
</property>
```
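268435456 bytes is 256 MiB (256 × 1024 × 1024). The sketch below, with a hypothetical `hdfs_footprint` helper, shows how `dfs.blocksize` and `dfs.replication` determine how many blocks a file is split into and how much raw DataNode space it uses. HDFS does not pad the last block, so raw usage is simply size × replication.

```python
from typing import Tuple

BLOCK_SIZE = 268435456  # dfs.blocksize: 256 * 1024 * 1024 bytes = 256 MiB
REPLICATION = 1         # dfs.replication


def hdfs_footprint(file_size: int, block_size: int = BLOCK_SIZE,
                   replication: int = REPLICATION) -> Tuple[int, int]:
    """Return (block count, raw bytes stored on DataNodes) for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    blocks = -(-file_size // block_size)  # ceiling division; an empty file has no blocks
    # The last block only occupies the bytes it actually holds.
    return blocks, file_size * replication
```

For example, a 1 GiB file becomes 4 blocks, and with replication 1 every block exists on exactly one DataNode.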
yarn-site.xml

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
```
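Hadoop does not reject unknown property names, so a misspelled aux-service key is silently ignored. A rough Python heuristic to catch that kind of typo could look like this. `load_properties` and `suspicious_aux_class_keys` are hypothetical helpers, and the "contains `services`, ends in `.class`" filter is an assumption, not a Hadoop rule.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def load_properties(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def suspicious_aux_class_keys(props: Dict[str, str]) -> List[str]:
    """List NodeManager '*services*.class' keys that match no declared aux service."""
    services = [s.strip() for s in props.get("yarn.nodemanager.aux-services", "").split(",")
                if s.strip()]
    # The expected key is yarn.nodemanager.aux-services.<service name>.class.
    expected = {f"yarn.nodemanager.aux-services.{s}.class" for s in services}
    return sorted(
        key for key in props
        if key.startswith("yarn.nodemanager.") and "services" in key
        and key.endswith(".class") and key not in expected
    )
```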
Other things to check: the slaves file (one worker hostname per line), and JAVA_HOME in the *-env.sh scripts.
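A small Python sketch for checking both items. `read_hosts` follows the format Hadoop's scripts expect (one hostname per line, `#` comments and blank lines ignored). HBase's regionservers file also uses one hostname per line. `java_home_from_env` returns the last uncommented `export JAVA_HOME=...` value. If that value is still `${JAVA_HOME}`, it depends on the environment of the SSH session that starts the daemons, and an absolute path is the usual fix. Both helpers are hypothetical.

```python
import re
from typing import List, Optional


def read_hosts(text: str) -> List[str]:
    """Parse a slaves-style file: one hostname per line, '#' comments allowed."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def java_home_from_env(script: str) -> Optional[str]:
    """Return the value of the last uncommented `export JAVA_HOME=...` line, or None."""
    value = None
    for line in script.splitlines():
        match = re.match(r"\s*export\s+JAVA_HOME=(\S+)", line)
        if match:
            value = match.group(1).strip("\"'")
    return value
```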
HBase: unpack it and edit the configuration. Note: it is best not to use HBase's bundled ZooKeeper. Run a standalone ZooKeeper ensemble instead and set `HBASE_MANAGES_ZK=false` in hbase-env.sh.

hbase-site.xml

```xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/opt/zookeeper/var/data</value>
</property>
```
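Two quick sanity checks in Python, sketched under the assumption that HBase should live on the same HDFS as core-site.xml's `fs.defaultFS`. `hbase.rootdir` should then name the same NameNode host and port. ZooKeeper keeps working only while a majority of the ensemble is up, so the three hosts in `hbase.zookeeper.quorum` tolerate one failure. `same_namenode` and `zk_failures_tolerated` are hypothetical helpers.

```python
from urllib.parse import urlparse


def same_namenode(hbase_rootdir: str, fs_default_fs: str) -> bool:
    """True if hbase.rootdir uses the same scheme, host and port as fs.defaultFS."""
    root, fs = urlparse(hbase_rootdir), urlparse(fs_default_fs)
    return (root.scheme, root.hostname, root.port) == (fs.scheme, fs.hostname, fs.port)


def zk_failures_tolerated(quorum: str) -> int:
    """Servers that may fail while a majority of the listed ZooKeeper ensemble stays up."""
    n = len([host for host in quorum.split(",") if host.strip()])
    return (n - 1) // 2
```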
Also set JAVA_HOME (in hbase-env.sh) and list the RegionServer hosts in the regionservers file.
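#### Build

Download Nutch 2.3.1 from the official site and unpack it. Then edit ivy/ivy.xml and globally change the Hadoop version from 2.5.2 to 2.7.2, for example with this vi command:

```vim
:%s/2\.5\.2/2.7.2/g
```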
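The same global replacement in Python, as a hedged sketch. In a vi pattern an unescaped `.` matches any character, whereas `str.replace` is literal, so a string like `2-5-2` is left alone. Like the vi command, it replaces every 2.5.2 in the file, so check afterwards that only the Hadoop dependencies changed. `bump_version` and `bump_ivy_file` are hypothetical helpers, and the file is assumed to be UTF-8.

```python
from pathlib import Path

OLD_VERSION, NEW_VERSION = "2.5.2", "2.7.2"


def bump_version(text: str, old: str = OLD_VERSION, new: str = NEW_VERSION) -> str:
    """Replace every literal occurrence of old with new (no wildcards involved)."""
    return text.replace(old, new)


def bump_ivy_file(path: str = "ivy/ivy.xml") -> None:
    """Rewrite the ivy file in place, assuming UTF-8 encoding."""
    file = Path(path)
    file.write_text(bump_version(file.read_text(encoding="utf-8")), encoding="utf-8")
```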
Uncomment the gora-hbase dependency:

```xml
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
```
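If you script the build, the uncommenting step can be sketched in Python. This assumes the dependency is commented out on a single line as `<!-- <dependency ... /> -->`. If your copy of ivy.xml spreads the comment over several lines, edit it by hand. The pattern deliberately avoids `[^>]*`, because `conf="*->default"` itself contains a `>`. `uncomment_gora_hbase` is a hypothetical helper.

```python
import re

# One-line comment wrapping the gora-hbase dependency; '.' does not cross newlines.
_COMMENTED_GORA_HBASE = re.compile(
    r'<!--\s*(<dependency\b.*?\bname="gora-hbase".*?/>)\s*-->'
)


def uncomment_gora_hbase(ivy_xml: str) -> str:
    """Strip the <!-- --> wrapper around the gora-hbase dependency only."""
    return _COMMENTED_GORA_HBASE.sub(r"\1", ivy_xml)
```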
Run `ant runtime` to build it (the result is placed under runtime/local and runtime/deploy).
Solr: just unpack it (version 4.6).
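Pitfalls I ran into:

The crawl script's usage is `crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>`. Passing solrUrl so that the index is built directly caused strange problems, and the error log showed a NullPointerException. Workaround: leave out solrUrl when crawling,

```shell
crawl <seedDir> <crawlID> <numberOfRounds>
```

and then build the index as a separate second step:

```shell
nutch solrindex <solrUrl> <crawlID> -reindex
```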
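A small Python sketch that assembles the two commands in the order given above, so indexing only runs after the crawl succeeded. It mirrors the argument order exactly as written in this note. `crawl_then_index` and `run_steps` are hypothetical helpers, and `bin_dir` (for example runtime/local/bin after `ant runtime`), the seed directory, crawl ID and Solr URL are placeholders.

```python
import subprocess
from typing import List


def crawl_then_index(bin_dir: str, seed_dir: str, crawl_id: str,
                     rounds: int, solr_url: str) -> List[List[str]]:
    """Build argv lists for the workaround: crawl without solrUrl, then index."""
    crawl = [f"{bin_dir}/crawl", seed_dir, crawl_id, str(rounds)]
    index = [f"{bin_dir}/nutch", "solrindex", solr_url, crawl_id, "-reindex"]
    return [crawl, index]


def run_steps(commands: List[List[str]]) -> None:
    """Run each command in order; check=True stops before indexing if the crawl fails."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run_steps(crawl_then_index("runtime/local/bin", "urls", "webpage", 2, "http://localhost:8983/solr"))`. 8983 is Solr's default port, but adjust the URL to your installation.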
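Error `No IndexWriters activated - check your configuration`: fix it by editing nutch-site.xml and adding a plugin.includes property with the value below, which activates the indexer-solr plugin:

```xml
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```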
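`plugin.includes` is a regular expression over plugin ids. The Python sketch below assumes, as in Nutch's plugin repository, that each id must match the whole expression, which is what Java's `String.matches` and Python's `re.fullmatch` do. That is why `protocol-httpclient` is not activated by `protocol-http`. `active_plugins` is a hypothetical helper, and the id list in the test is only sample data.

```python
import re
from typing import Iterable, List

PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    "|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def active_plugins(plugin_ids: Iterable[str], includes: str = PLUGIN_INCLUDES) -> List[str]:
    """Return the plugin ids that match the whole includes regex."""
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]
```

If no `indexer-*` id survives this filter, no IndexWriter is active, which is the situation the error message reports.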