Much of this material was covered in earlier posts and is not repeated here. For details, see the previous post in this series: http://www.javashuo.com/article/p-smnfzqth-be.html
Create and edit the configuration file conf/spark.conf:
cp conf/spark.conf.template conf/spark.conf
Following https://github.com/Intel-bigdata/HiBench/blob/master/docs/run-sparkbench.md, set the properties to the values below:
# Spark home
hibench.spark.home /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/spark

# Spark master
# standalone mode: spark://xxx:7077
# YARN mode: yarn-client
hibench.spark.master yarn-client
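To double-check what was actually written to conf/spark.conf, the file can be read back as plain key/value pairs. The following is a minimal sketch, assuming each property sits on its own line as a key followed by whitespace and a value, with `#` starting a comment line. The function name `parse_hibench_conf` and the `SAMPLE` text are illustrative only; this is not HiBench's own loader and it handles nothing beyond plain key/value lines.

```python
def parse_hibench_conf(text):
    """Parse 'key<whitespace>value' lines, skipping blank lines and # comments.

    Illustrative helper only; not part of HiBench.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


# Sample text mirroring the two properties set in this post.
SAMPLE = """\
# Spark home
hibench.spark.home      /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/spark

# Spark master
# standalone mode: spark://xxx:7077
# YARN mode: yarn-client
hibench.spark.master    yarn-client
"""

if __name__ == "__main__":
    for key, value in parse_hibench_conf(SAMPLE).items():
        print(key, "=", value)
```

To check a real file, pass the contents of conf/spark.conf (for example `open("conf/spark.conf").read()`) instead of `SAMPLE`.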
Run the data preparation script:
bin/workloads/micro/wordcount/prepare/prepare.sh
Output:
[root@node1 prepare]# ./prepare.sh
patching args=
Parsing conf: /home/cf/app/HiBench-master/conf/hadoop.conf
Parsing conf: /home/cf/app/HiBench-master/conf/hibench.conf
Parsing conf: /home/cf/app/HiBench-master/conf/spark.conf
Parsing conf: /home/cf/app/HiBench-master/conf/workloads/micro/wordcount.conf
probe sleep jar: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/../../jars/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.2-tests.jar
start HadoopPrepareWordcount bench
hdfs rm -r: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -rm -r -skipTrash hdfs://node1:8020/HiBench/Wordcount/Input
Deleted hdfs://node1:8020/HiBench/Wordcount/Input
Submit MapReduce Job: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn jar /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/../../jars/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=32000 -D mapreduce.randomtextwriter.bytespermap=4000 -D mapreduce.job.maps=8 -D mapreduce.job.reduces=8 hdfs://node1:8020/HiBench/Wordcount/Input
The job took 12 seconds.
finish HadoopPrepareWordcount bench
Run the Spark workload script:
bin/workloads/micro/wordcount/spark/run.sh
Output:
[root@node1 spark]# ./run.sh
patching args=
Parsing conf: /home/cf/app/HiBench-master/conf/hadoop.conf
Parsing conf: /home/cf/app/HiBench-master/conf/hibench.conf
Parsing conf: /home/cf/app/HiBench-master/conf/spark.conf
Parsing conf: /home/cf/app/HiBench-master/conf/workloads/micro/wordcount.conf
probe sleep jar: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/../../jars/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.2-tests.jar
start ScalaSparkWordcount bench
hdfs rm -r: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -rm -r -skipTrash hdfs://node1:8020/HiBench/Wordcount/Output
Deleted hdfs://node1:8020/HiBench/Wordcount/Output
hdfs du -s: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -du -s hdfs://node1:8020/HiBench/Wordcount/Input
Export env: SPARKBENCH_PROPERTIES_FILES=/home/cf/app/HiBench-master/report/wordcount/spark/conf/sparkbench/sparkbench.conf
Export env: HADOOP_CONF_DIR=/etc/hadoop/conf.cloudera.yarn
Submit Spark job: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/spark/bin/spark-submit --properties-file /home/cf/app/HiBench-master/report/wordcount/spark/conf/sparkbench/spark.conf --class com.intel.hibench.sparkbench.micro.ScalaWordCount --master yarn-client --num-executors 2 --executor-cores 4 --executor-memory 4g /home/cf/app/HiBench-master/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar hdfs://node1:8020/HiBench/Wordcount/Input hdfs://node1:8020/HiBench/Wordcount/Output
19/06/04 20:23:34 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
finish ScalaSparkWordcount bench
Check report/hibench.report:
Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
HadoopWordcount      2019-06-04  16:59:04  37055            20.226       1832                 610
ScalaSparkWordcount  2019-06-04  20:23:34  36072            16.255       2219                 739
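The throughput columns in the report can be reproduced from the other columns, which makes the numbers easier to interpret. The sketch below assumes that throughput is the input size divided by the duration, truncated to an integer, and that throughput per node divides that figure by the number of nodes. The node count of 3 is an assumption inferred from the figures themselves (1832 / 3 truncates to 610), not something stated in this post, and the function names are illustrative.

```python
import math

# Rows copied from report/hibench.report:
# (type, input bytes, duration in seconds, reported throughput, reported throughput/node)
REPORT_ROWS = [
    ("HadoopWordcount", 37055, 20.226, 1832, 610),
    ("ScalaSparkWordcount", 36072, 16.255, 2219, 739),
]

ASSUMED_NODES = 3  # assumption: inferred from the figures, not stated in the post


def derived_throughput(input_bytes, duration_s, nodes=ASSUMED_NODES):
    """Return (bytes/s, bytes/s per node), both truncated to integers."""
    total = math.floor(input_bytes / duration_s)
    return total, total // nodes


def check_rows(rows=REPORT_ROWS):
    """True if every row's reported throughput matches the derived values."""
    return all(
        derived_throughput(size, duration) == (tp, tp_node)
        for _, size, duration, tp, tp_node in rows
    )


if __name__ == "__main__":
    for name, size, duration, tp, tp_node in REPORT_ROWS:
        print(name, derived_throughput(size, duration), "reported:", (tp, tp_node))
    hadoop_s, spark_s = REPORT_ROWS[0][2], REPORT_ROWS[1][2]
    print("duration ratio Hadoop/Spark:", round(hadoop_s / spark_s, 2))
```

Under these assumptions both rows match the reported values, and the Spark run finished in roughly 80% of the Hadoop run's time on this input.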
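The report/wordcount/spark directory contains several files: monitor.log is the raw log; bench.log holds the scheduler.DAGScheduler and scheduler.TaskSetManager messages; monitor.html visualizes the system's performance data; and conf/wordcount.conf, conf/sparkbench/spark.conf, and conf/sparkbench/sparkbench.conf record the environment settings used for this run.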
monitor.html includes charts such as the Memory usage heatmap.
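According to the official documentation https://github.com/Intel-bigdata/HiBench/blob/master/docs/run-sparkbench.md, you can also change hibench.scale.profile to adjust the data scale of the test, change hibench.default.map.parallelism and hibench.default.shuffle.parallelism to adjust parallelism, and change hibench.yarn.executor.num, hibench.yarn.executor.cores, spark.executor.memory, and spark.driver.memory to control the number of Spark executors, the cores per executor, the executor memory, and the driver memory.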
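When trying several of these settings in a row, it can be convenient to rewrite the properties from a script rather than by hand. The following is a minimal sketch, assuming the same plain key/value line format as conf/spark.conf above. The helper `set_properties` is hypothetical and not part of HiBench, and the new values (`small`, `4`) are illustrative only. The starting values `2`, `4`, and `4g` correspond to the `--num-executors 2 --executor-cores 4 --executor-memory 4g` flags visible in the run log.

```python
def set_properties(conf_text, updates):
    """Return conf_text with each key in `updates` set to its new value.

    Existing 'key value' lines are rewritten in place; keys that are not
    present are appended at the end. Comments and blank lines are kept.
    Hypothetical helper, not part of HiBench.
    """
    seen = set()
    out = []
    for raw in conf_text.splitlines():
        stripped = raw.strip()
        if stripped and not stripped.startswith("#"):
            key = stripped.split(None, 1)[0]
            if key in updates:
                out.append(f"{key} {updates[key]}")
                seen.add(key)
                continue
        out.append(raw)
    for key, value in updates.items():
        if key not in seen:
            out.append(f"{key} {value}")
    return "\n".join(out) + "\n"


# Example input using property names from this post.
BEFORE = """\
# executor settings
hibench.yarn.executor.num 2
hibench.yarn.executor.cores 4
spark.executor.memory 4g
"""

if __name__ == "__main__":
    # Illustrative values only: more executors, plus a larger data scale.
    print(set_properties(BEFORE, {
        "hibench.yarn.executor.num": "4",
        "hibench.scale.profile": "small",
    }))
```

The function works on text and returns text, so the result can be inspected before it is written back to a file under conf/.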