HiBench Study Notes (3): Testing Spark with HiBench

Much of this material was covered in earlier posts in this series and is not repeated here; for details see http://www.javashuo.com/article/p-smnfzqth-be.html

Create and edit the configuration file conf/spark.conf:

cp conf/spark.conf.template conf/spark.conf

Following https://github.com/Intel-bigdata/HiBench/blob/master/docs/run-sparkbench.md, set the properties to the following values:

# Spark home
hibench.spark.home      /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/spark

# Spark master
#   standalone mode: spark://xxx:7077
#   YARN mode: yarn-client
hibench.spark.master    yarn-client
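A quick, hypothetical sanity check (the path is copied from the hibench.spark.home value above; adjust it to your own CDH parcel version) is to confirm that spark-submit actually exists under the configured Spark home:

```shell
# Path taken from the hibench.spark.home value above; adjust for your install
SPARK_HOME=/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/spark
if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "spark-submit found under $SPARK_HOME"
else
    echo "spark-submit not found; check hibench.spark.home"
fi
```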

Run the script:

 bin/workloads/micro/wordcount/prepare/prepare.sh

Output:

[root@node1 prepare]# ./prepare.sh
patching args=
Parsing conf: /home/cf/app/HiBench-master/conf/hadoop.conf
Parsing conf: /home/cf/app/HiBench-master/conf/hibench.conf
Parsing conf: /home/cf/app/HiBench-master/conf/spark.conf
Parsing conf: /home/cf/app/HiBench-master/conf/workloads/micro/wordcount.conf
probe sleep jar: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/../../jars/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.2-tests.jar
start HadoopPrepareWordcount bench
hdfs rm -r: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -rm -r -skipTrash hdfs://node1:8020/HiBench/Wordcount/Input
Deleted hdfs://node1:8020/HiBench/Wordcount/Input
Submit MapReduce Job: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn jar /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/../../jars/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=32000 -D mapreduce.randomtextwriter.bytespermap=4000 -D mapreduce.job.maps=8 -D mapreduce.job.reduces=8 hdfs://node1:8020/HiBench/Wordcount/Input
The job took 12 seconds.
finish HadoopPrepareWordcount bench
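The -D options on the submit line above determine the target input size: mapreduce.job.maps=8 maps, each writing mapreduce.randomtextwriter.bytespermap=4000 bytes, gives the 32000-byte total. (randomtextwriter treats totalbytes as a lower bound and finishes the record it is writing, which is presumably why the report later shows slightly more, 37055 bytes.)

```shell
# maps × bytes per map should reproduce mapreduce.randomtextwriter.totalbytes
maps=8
bytes_per_map=4000
echo $((maps * bytes_per_map))   # 32000
```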

Run the script:

bin/workloads/micro/wordcount/spark/run.sh

Output:

[root@node1 spark]# ./run.sh
patching args=
Parsing conf: /home/cf/app/HiBench-master/conf/hadoop.conf
Parsing conf: /home/cf/app/HiBench-master/conf/hibench.conf
Parsing conf: /home/cf/app/HiBench-master/conf/spark.conf
Parsing conf: /home/cf/app/HiBench-master/conf/workloads/micro/wordcount.conf
probe sleep jar: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/../../jars/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.2-tests.jar
start ScalaSparkWordcount bench
hdfs rm -r: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -rm -r -skipTrash hdfs://node1:8020/HiBench/Wordcount/Output
Deleted hdfs://node1:8020/HiBench/Wordcount/Output
hdfs du -s: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -du -s hdfs://node1:8020/HiBench/Wordcount/Input
Export env: SPARKBENCH_PROPERTIES_FILES=/home/cf/app/HiBench-master/report/wordcount/spark/conf/sparkbench/sparkbench.conf
Export env: HADOOP_CONF_DIR=/etc/hadoop/conf.cloudera.yarn
Submit Spark job: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/spark/bin/spark-submit  --properties-file /home/cf/app/HiBench-master/report/wordcount/spark/conf/sparkbench/spark.conf --class com.intel.hibench.sparkbench.micro.ScalaWordCount --master yarn-client --num-executors 2 --executor-cores 4 --executor-memory 4g /home/cf/app/HiBench-master/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar hdfs://node1:8020/HiBench/Wordcount/Input hdfs://node1:8020/HiBench/Wordcount/Output
19/06/04 20:23:34 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
finish ScalaSparkWordcount bench

View report/hibench.report:

Type         Date       Time     Input_data_size      Duration(s)          Throughput(bytes/s)  Throughput/node     
HadoopWordcount 2019-06-04 16:59:04 37055                20.226               1832                 610                 
ScalaSparkWordcount 2019-06-04 20:23:34 36072                16.255               2219                 739                 
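Throughput(bytes/s) in the report is simply Input_data_size divided by Duration(s), and Throughput/node appears to be that value divided again by the node count (1832 / 610 ≈ 3 nodes in this cluster). The two rows above can be recomputed with a one-liner (sample rows inlined here for illustration; normally you would point awk at report/hibench.report):

```shell
# Recompute Throughput(bytes/s) = Input_data_size / Duration(s) for each row
awk 'NR > 1 { printf "%s %d\n", $1, $4 / $5 }' <<'EOF'
Type Date Time Input_data_size Duration(s) Throughput(bytes/s) Throughput/node
HadoopWordcount 2019-06-04 16:59:04 37055 20.226 1832 610
ScalaSparkWordcount 2019-06-04 20:23:34 36072 16.255 2219 739
EOF
```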

There are several files under report/wordcount/spark: monitor.log is the raw log; bench.log contains the scheduler.DAGScheduler and scheduler.TaskSetManager messages; monitor.html visualizes the system's performance metrics; and conf/wordcount.conf, conf/sparkbench/spark.conf, and conf/sparkbench/sparkbench.conf record the environment variables used for this run.

monitor.html includes charts such as the memory usage heatmap.

According to the official documentation at https://github.com/Intel-bigdata/HiBench/blob/master/docs/run-sparkbench.md, you can also change hibench.scale.profile to adjust the scale of the test data, change hibench.default.map.parallelism and hibench.default.shuffle.parallelism to adjust parallelism, and change hibench.yarn.executor.num, hibench.yarn.executor.cores, spark.executor.memory, and spark.driver.memory to control the number of Spark executors, their cores and memory, and the driver's memory.
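As a hypothetical example of such tuning (the executor values below match the spark-submit line in the run log above; the profile, parallelism, and driver memory are illustrative placeholders, not values from this run):

```
# conf/hibench.conf (profiles: tiny, small, large, huge, gigantic, bigdata)
hibench.scale.profile                small
hibench.default.map.parallelism      16
hibench.default.shuffle.parallelism  16

# conf/spark.conf
hibench.yarn.executor.num    2
hibench.yarn.executor.cores  4
spark.executor.memory        4g
spark.driver.memory          2g
```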
