The previous post covered installing the Sqoop data-migration tool and walked through a simple import example. In this post we continue with more of Sqoop's day-to-day usage.
1. Sqoop data import
(1) Importing a relational table into Hive
./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
The first run fails:
[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 18:46:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 18:46:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 18:46:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 18:46:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 18:46:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 18:46:49 INFO tool.CodeGenTool: Beginning code generation
19/03/18 18:46:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:46:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:46:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/b0cd7f379424039f4df44ee2b703c3d0/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 18:46:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/b0cd7f379424039f4df44ee2b703c3d0/emp.jar
19/03/18 18:46:51 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 18:46:51 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 18:46:51 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 18:46:51 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 18:46:51 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 18:46:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 18:46:52 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 18:46:52 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 18:46:54 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 18:46:54 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 18:46:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0003
19/03/18 18:46:54 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0003
19/03/18 18:46:54 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0003/
19/03/18 18:46:54 INFO mapreduce.Job: Running job: job_1552898029697_0003
19/03/18 18:47:06 INFO mapreduce.Job: Job job_1552898029697_0003 running in uber mode : false
19/03/18 18:47:06 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 18:47:13 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 18:47:13 INFO mapreduce.Job: Job job_1552898029697_0003 completed successfully
19/03/18 18:47:13 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=206933
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=151
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3950
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3950
		Total vcore-milliseconds taken by all map tasks=3950
		Total megabyte-milliseconds taken by all map tasks=4044800
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=65
		CPU time spent (ms)=680
		Physical memory (bytes) snapshot=135651328
		Virtual memory (bytes) snapshot=1715556352
		Total committed heap usage (bytes)=42860544
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=151
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 21.0263 seconds (7.1815 bytes/sec)
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 18:47:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:47:13 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 18:47:13 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/18 18:47:13 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/03/18 18:47:13 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
	at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:190)
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
	... 12 more
Solution:
# Check whether the HiveConf class actually exists
[hadoop@centos-aaron-h1 lib]$ cd /home/hadoop/apps/apache-hive-1.2.2-bin/lib
[hadoop@centos-aaron-h1 lib]$ jar tf hive-common-1.2.2.jar | grep HiveConf.class
org/apache/hadoop/hive/conf/HiveConf.class
[hadoop@centos-aaron-h1 lib]$

The HiveConf class clearly exists in hive-common-1.2.2.jar; the runtime simply cannot find it on the classpath.
Fix the environment configuration by adding Hive's lib directory to HADOOP_CLASSPATH:
# Edit the environment variables and append the following
vi /etc/profile
export HADOOP_CLASSPATH=/home/hadoop/apps/hadoop-2.9.1/lib/*
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/*
# Reload the environment
source /etc/profile
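As an optional sanity check (paths match the layout above), dump the effective Hadoop classpath and make sure the Hive jars are now on it:

hadoop classpath | tr ':' '\n' | grep -i hive
# Expect entries under /home/hadoop/apps/apache-hive-1.2.2-bin/lib/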
Running the import again fails with a different error: the staging directory from the earlier emp import still exists on HDFS and has to be deleted.
[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 19:13:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 19:13:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 19:13:03 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 19:13:03 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 19:13:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 19:13:03 INFO tool.CodeGenTool: Beginning code generation
19/03/18 19:13:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:13:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:13:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/d1c8de7d06b0dc6c09379069fe10322a/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 19:13:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d1c8de7d06b0dc6c09379069fe10322a/emp.jar
19/03/18 19:13:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 19:13:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 19:13:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 19:13:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 19:13:07 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 19:13:08 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 19:13:08 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 19:13:08 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 19:13:09 ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://centos-aaron-h1:9000/user/hadoop/emp already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:279)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:145)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
	at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
	at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Solution:
hdfs dfs -rm -r /user/hadoop/emp
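Instead of deleting the staging directory by hand before every re-run, Sqoop can do it for you: the --delete-target-dir flag (used again in section (2) below) removes the directory before the import starts. A sketch of the same import with the flag added:

./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test \
--username root --password 123456 \
--table emp --hive-import --delete-target-dir --m 1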
Run it again; this time it succeeds:
[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 19:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 19:15:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 19:15:15 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 19:15:15 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 19:15:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 19:15:15 INFO tool.CodeGenTool: Beginning code generation
19/03/18 19:15:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/e3a407469bc365c026d8fabf4e264f38/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 19:15:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e3a407469bc365c026d8fabf4e264f38/emp.jar
19/03/18 19:15:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 19:15:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 19:15:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 19:15:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 19:15:17 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 19:15:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 19:15:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 19:15:19 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 19:15:20 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/18 19:15:20 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 19:15:20 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 19:15:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0004
19/03/18 19:15:21 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0004
19/03/18 19:15:21 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0004/
19/03/18 19:15:21 INFO mapreduce.Job: Running job: job_1552898029697_0004
19/03/18 19:15:28 INFO mapreduce.Job: Job job_1552898029697_0004 running in uber mode : false
19/03/18 19:15:28 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 19:15:34 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 19:15:34 INFO mapreduce.Job: Job job_1552898029697_0004 completed successfully
19/03/18 19:15:34 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=206933
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=151
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3734
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3734
		Total vcore-milliseconds taken by all map tasks=3734
		Total megabyte-milliseconds taken by all map tasks=3823616
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=59
		CPU time spent (ms)=540
		Physical memory (bytes) snapshot=129863680
		Virtual memory (bytes) snapshot=1715556352
		Total committed heap usage (bytes)=42860544
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=151
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 15.9212 seconds (9.4842 bytes/sec)
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 19:15:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:34 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 19:15:34 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 2.138 seconds
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=151]
OK
Time taken: 0.547 seconds
Check the result. The fields are joined by Hive's default \001 delimiter, which does not print, so the columns appear to run together:
hive> [hadoop@centos-aaron-h1 bin]$ hadoop fs -cat /user/hive/warehouse/emp/part-m-00000
1gopalmanager50000.00TP
2manishaProof reader50000.00TP
3khalilphp dev30000.00AC
4prasanthphp dev30000.00AC
5kranthiadmin20000.00TP
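The delimiters are really there; piping the file through cat -A makes them visible (a quick check, not part of the original run — each \001 byte shows up as ^A and each record end as $):

hadoop fs -cat /user/hive/warehouse/emp/part-m-00000 | cat -A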
(2) Specifying the row and column delimiters, --hive-import, overwrite mode, automatic creation of the Hive table, the target table name, and deletion of the intermediate staging directory
./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--table emp \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--hive-database mydb_test \
--hive-table emp
The run fails at the very end because the Hive database cannot be found.
Manually create the mydb_test database:
hive> create database mydb_test;
OK
Time taken: 0.678 seconds
hive>
Running again still fails with the Hive database not found, even though show databases confirms it exists.
Solution: copy hive-site.xml from hive/conf into Sqoop's conf directory. The database really does exist in Hive; the failure comes from Sqoop reading a stale (or missing) Hive configuration, which typically shows up when Sqoop runs on a different machine. On CDH the default paths are: copy /etc/hive/conf/hive-site.xml to /etc/sqoop/conf/hive-site.xml.
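On the cluster used in this post, the copy would look roughly like this (paths assume the Hive and Sqoop install locations shown earlier):

cp /home/hadoop/apps/apache-hive-1.2.2-bin/conf/hive-site.xml /home/hadoop/sqoop/conf/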
Run it again; this time it succeeds:
hive> [hadoop@centos-aaron-h1 bin]$ cd ~/sqoop/bin
[hadoop@centos-aaron-h1 bin]$ ./sqoop import \
> --connect jdbc:mysql://centos-aaron-03:3306/test \
> --username root \
> --password 123456 \
> --table emp \
> --fields-terminated-by "\t" \
> --lines-terminated-by "\n" \
> --hive-import \
> --hive-overwrite \
> --create-hive-table \
> --delete-target-dir \
> --hive-database mydb_test \
> --hive-table emp
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 20:49:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 20:49:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 20:49:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 20:49:59 INFO tool.CodeGenTool: Beginning code generation
19/03/18 20:50:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:00 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/7a157b339316952d30024e165d5db00d/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 20:50:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7a157b339316952d30024e165d5db00d/emp.jar
19/03/18 20:50:03 INFO tool.ImportTool: Destination directory emp deleted.
19/03/18 20:50:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 20:50:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 20:50:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 20:50:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 20:50:03 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 20:50:03 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 20:50:03 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 20:50:03 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 20:50:04 INFO mapreduce.JobSubmitter: number of splits:5
19/03/18 20:50:04 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 20:50:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0016
19/03/18 20:50:05 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0016
19/03/18 20:50:05 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0016/
19/03/18 20:50:05 INFO mapreduce.Job: Running job: job_1552898029697_0016
19/03/18 20:50:12 INFO mapreduce.Job: Job job_1552898029697_0016 running in uber mode : false
19/03/18 20:50:12 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 20:50:18 INFO mapreduce.Job: map 20% reduce 0%
19/03/18 20:50:21 INFO mapreduce.Job: map 40% reduce 0%
19/03/18 20:50:22 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 20:50:23 INFO mapreduce.Job: Job job_1552898029697_0016 completed successfully
19/03/18 20:50:23 INFO mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=1034665
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=491
		HDFS: Number of bytes written=151
		HDFS: Number of read operations=20
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=10
	Job Counters
		Killed map tasks=1
		Launched map tasks=5
		Other local map tasks=5
		Total time spent by all maps in occupied slots (ms)=32416
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=32416
		Total vcore-milliseconds taken by all map tasks=32416
		Total megabyte-milliseconds taken by all map tasks=33193984
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=491
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=1240
		CPU time spent (ms)=3190
		Physical memory (bytes) snapshot=660529152
		Virtual memory (bytes) snapshot=8577761280
		Total committed heap usage (bytes)=214302720
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=151
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 20.6001 seconds (7.3301 bytes/sec)
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 20:50:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:23 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 20:50:23 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 1.131 seconds
Loading data to table mydb_test.emp
Table mydb_test.emp stats: [numFiles=5, numRows=0, totalSize=151, rawDataSize=0]
OK
Time taken: 0.575 seconds
[hadoop@centos-aaron-h1 bin]$
Check the result data:
[hadoop@centos-aaron-h1 bin]$ hive
Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> show databases;
OK
default
mydb_test
wcc_log
Time taken: 0.664 seconds, Fetched: 3 row(s)
hive> use mydb_test;
OK
Time taken: 0.027 seconds
hive> show tables;
OK
emp
Time taken: 0.038 seconds, Fetched: 1 row(s)
hive> select * from emp;
OK
1	gopal	manager	50000.0	TP
2	manisha	Proof reader	50000.0	TP
3	khalil	php dev	30000.0	AC
4	prasanth	php dev	30000.0	AC
5	kranthi	admin	20000.0	TP
Time taken: 0.634 seconds, Fetched: 5 row(s)
hive>
The command above is equivalent to:
sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--table emp \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--hive-table mydb_test.emp \
--delete-target-dir
(3) Importing into a specified HDFS directory
When importing table data into HDFS with the Sqoop import tool, we can specify the target directory. The syntax of the target-directory option is:
--target-dir <new or exist directory in HDFS>
The following command imports the emp table data into the '/queryresult' directory.
./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--target-dir /queryresult \
--table emp --m 1
Execution output:
[hadoop@centos-aaron-h1 bin]$ ./sqoop import \
> --connect jdbc:mysql://centos-aaron-03:3306/test \
> --username root \
> --password 123456 \
> --target-dir /queryresult \
> --table emp --m 1
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 21:00:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 21:00:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 21:00:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 21:00:59 INFO tool.CodeGenTool: Beginning code generation
19/03/18 21:00:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 21:00:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 21:00:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/433dbe7d1d24f817e00a85bf0d78eb42/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 21:01:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/433dbe7d1d24f817e00a85bf0d78eb42/emp.jar
19/03/18 21:01:01 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 21:01:01 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 21:01:01 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 21:01:01 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 21:01:01 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 21:01:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 21:01:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 21:01:02 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 21:01:04 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 21:01:04 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 21:01:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0017
19/03/18 21:01:04 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0017
19/03/18 21:01:04 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0017/
19/03/18 21:01:04 INFO mapreduce.Job: Running job: job_1552898029697_0017
19/03/18 21:01:11 INFO mapreduce.Job: Job job_1552898029697_0017 running in uber mode : false
19/03/18 21:01:11 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 21:01:17 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 21:01:17 INFO mapreduce.Job: Job job_1552898029697_0017 completed successfully
19/03/18 21:01:17 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=206929
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=151
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3157
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3157
		Total vcore-milliseconds taken by all map tasks=3157
		Total megabyte-milliseconds taken by all map tasks=3232768
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=60
		CPU time spent (ms)=530
		Physical memory (bytes) snapshot=133115904
		Virtual memory (bytes) snapshot=1715552256
		Total committed heap usage (bytes)=42860544
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=151
19/03/18 21:01:17 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 14.555 seconds (10.3744 bytes/sec)
19/03/18 21:01:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Check the result data:
[hadoop@centos-aaron-h1 bin]$ hdfs dfs -ls /queryresult
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2019-03-18 21:01 /queryresult/_SUCCESS
-rw-r--r--   2 hadoop supergroup        151 2019-03-18 21:01 /queryresult/part-m-00000
[hadoop@centos-aaron-h1 bin]$ hdfs dfs -cat /queryresult/part-m-00000
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
[hadoop@centos-aaron-h1 bin]$
(4) Importing a subset of table data
With the Sqoop import tool we can import a subset of a table by supplying a "where" clause. Sqoop executes the corresponding SQL query on the database server and stores the result in the HDFS target directory.
The where-clause syntax is:
--where <condition>
The following command imports a subset of the emp table: the query retrieves the employee whose id is 3.
./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--where "id =3 " \
--target-dir /wherequery \
--table emp --m 1
The execution output is similar to the runs above.
(5) Free-form query import
./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--target-dir /wherequery2 \
--query 'select id,name,deg from emp WHERE id>2 and $CONDITIONS' \
--split-by id \
--fields-terminated-by '\t' \
--m 1
The execution output is similar to the runs above.
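Note the single quotes around the --query value: they stop the shell from expanding $CONDITIONS, which Sqoop itself replaces with each mapper's split predicate (with one mapper it degenerates to an always-true condition). That is also what makes parallel free-form imports work. A sketch of the same query with four mappers splitting on id (the /wherequery3 directory is our choice):

./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--target-dir /wherequery3 \
--query 'select id,name,deg from emp WHERE id>2 and $CONDITIONS' \
--split-by id \
--m 4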
(6) Incremental import
Incremental import is the technique of importing only the rows newly added to a table. It requires the 'incremental', 'check-column', and 'last-value' options.
The following syntax is used for the incremental options of the Sqoop import command:
--incremental <mode> --check-column <column name> --last-value <last check column value>
Assume the following row has just been added to the emp table:
6, satish p, grp des, 20000, GR
The following command performs an incremental import on the emp table:
./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--table emp --m 1 \
--target-dir /wherequery \
--incremental append \
--check-column id \
--last-value 5
The execution output is similar to the runs above.
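Tracking --last-value by hand is error-prone. A saved Sqoop job (see section 3 below) records the last value after each successful run and reuses it automatically on the next --exec. A sketch, with the job name emp_incr being our choice:

./sqoop job --create emp_incr -- import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root --password 123456 \
--table emp --m 1 \
--target-dir /wherequery \
--incremental append \
--check-column id \
--last-value 5
# each `sqoop job --exec emp_incr` run updates the stored last-value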
2. Sqoop data export
Export moves data from HDFS into an RDBMS. Before exporting, the target table must already exist in the target database. The default mode inserts the file's records into the table with INSERT statements; in update mode, Sqoop generates UPDATE statements instead.
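For update mode, --update-key names the column(s) used to match existing rows, and --update-mode chooses between updateonly (UPDATEs only) and allowinsert (update or insert). A sketch against the employee table created later in this section:

./sqoop export \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root --password 123456 \
--table employee \
--fields-terminated-by "," \
--export-dir /queryresult/part-m-00000 \
--update-key id \
--update-mode allowinsert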
Syntax:
The export command syntax is:
sqoop export (generic-args) (export-args)
Example:
The data sits in the /queryresult directory on HDFS, in the file part-m-00000. Its contents (hdfs dfs -cat /queryresult/part-m-00000) are:
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
(1) First, manually create the target table in MySQL
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| azkaban            |
| hive               |
| hivedb             |
| mysql              |
| performance_schema |
| test               |
| urldb              |
| web_log_wash       |
+--------------------+
9 rows in set (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> CREATE TABLE employee (
    -> id INT NOT NULL PRIMARY KEY,
    -> name VARCHAR(20),
    -> deg VARCHAR(20),
    -> salary INT,
    -> dept VARCHAR(10));
Query OK, 0 rows affected (0.02 sec)
(2) Then run the export command
./sqoop export \
--connect "jdbc:mysql://centos-aaron-03:3306/test?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password 123456 \
--table employee \
--fields-terminated-by "," \
--export-dir /queryresult/part-m-00000 \
--columns="id,name,deg,salary,dept"
The export fails.
The root cause: the data contains Chinese characters, and the table's character set does not support them.
The solution is as follows:
Export the table's data if needed, then drop the table and recreate it with the encoding specified as DEFAULT CHARSET=utf8.
It still fails. Analysis confirms that the data on HDFS (salary values such as 50000.00) does not match the table's INT column, so the salary column must be changed from INT to DECIMAL.
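Putting both fixes together, the corrected DDL would look roughly like this (run from the shell; the DECIMAL precision is our choice):

mysql -h centos-aaron-03 -uroot -p123456 test <<'SQL'
DROP TABLE IF EXISTS employee;
CREATE TABLE employee (
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(20),
  deg VARCHAR(20),
  salary DECIMAL(10,2),
  dept VARCHAR(10)) DEFAULT CHARSET=utf8;
SQL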
Run it again; it succeeds.
Verify the result:
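For example, a quick check from the shell (connection details as above):

mysql -h centos-aaron-03 -uroot -p123456 test -e 'SELECT * FROM employee;'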
3. Sqoop jobs
Note: a Sqoop job saves a predefined import or export task so that it can be re-run on demand as a named unit.
Syntax:
The syntax for creating a Sqoop job is:
$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
Creating a job (--create)
Here we create a job named myimportjob, which imports data from an RDBMS table into HDFS.
# This command creates a job that imports the emp table of the test database into HDFS
./sqoop job --create myimportjob -- import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --m 1
Listing jobs (--list)
The '--list' argument is used to verify the saved jobs. The following command lists the saved Sqoop jobs.
# It prints the list of saved jobs.
sqoop job --list
Inspecting a job (--show)
The '--show' argument inspects a particular job and its details. The following command and sample output inspect the job named myimportjob.
# It shows the tools and their options as used in the myimportjob job.
sqoop job --show myimportjob
[hadoop@centos-aaron-h1 bin]$ sqoop job --show myimportjob
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 22:46:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
Job: myimportjob
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
db.connect.string = jdbc:mysql://centos-aaron-03:3306/test
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
split.limit = null
hbase.create.table = false
mainframe.input.dataset.type = p
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = false
db.table = emp
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-hadoop/compile/e0ba9288d4916ac38fdbbe98737f9829
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false
[hadoop@centos-aaron-h1 bin]$
Executing a job (--exec)
The '--exec' option is used to run a saved job. The following command executes the saved job myimportjob.
sqoop job --exec myimportjob
# Normally it prints output like the following.
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
In this environment, though, the first execution failed:
Analysis showed the failure was caused by MySQL access permissions; the database grants need to be updated:
# 123456 is the database connection password
grant all privileges on *.* to root@'%' identified by '123456';
FLUSH PRIVILEGES;
Run the sqoop job again; it succeeds:
[hadoop@centos-aaron-h1 bin]$ sqoop job --exec myimportjob
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 23:02:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
19/03/18 23:02:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 23:02:11 INFO tool.CodeGenTool: Beginning code generation
19/03/18 23:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:02:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/ea795ab1037c940352cf3f7d5af2728f/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 23:02:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/ea795ab1037c940352cf3f7d5af2728f/emp.jar
19/03/18 23:02:13 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 23:02:13 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 23:02:13 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 23:02:13 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 23:02:13 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 23:02:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 23:02:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 23:02:14 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 23:02:16 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/18 23:02:16 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 23:02:16 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 23:02:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0030
19/03/18 23:02:17 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0030
19/03/18 23:02:17 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0030/
19/03/18 23:02:17 INFO mapreduce.Job: Running job: job_1552898029697_0030
19/03/18 23:02:24 INFO mapreduce.Job: Job job_1552898029697_0030 running in uber mode : false
19/03/18 23:02:24 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 23:02:30 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 23:02:30 INFO mapreduce.Job: Job job_1552898029697_0030 completed successfully
19/03/18 23:02:30 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=207365
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=180
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3466
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3466
		Total vcore-milliseconds taken by all map tasks=3466
		Total megabyte-milliseconds taken by all map tasks=3549184
	Map-Reduce Framework
		Map input records=6
		Map output records=6
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=63
		CPU time spent (ms)=590
		Physical memory (bytes) snapshot=132681728
		Virtual memory (bytes) snapshot=1715552256
		Total committed heap usage (bytes)=42860544
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=180
19/03/18 23:02:30 INFO mapreduce.ImportJobBase: Transferred 180 bytes in 15.5112 seconds (11.6045 bytes/sec)
19/03/18 23:02:30 INFO mapreduce.ImportJobBase: Retrieved 6 records.
[hadoop@centos-aaron-h1 bin]$
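The Enter password: prompt appears on every --exec because saved jobs do not store the password by default. For unattended runs, one option is --password-file, which reads the password from a protected file (the path and job name below are our choice; note echo -n, so no trailing newline ends up in the file):

echo -n '123456' > /home/hadoop/.sqoop.pwd
hdfs dfs -put /home/hadoop/.sqoop.pwd /user/hadoop/.sqoop.pwd
hdfs dfs -chmod 400 /user/hadoop/.sqoop.pwd
# recreate the job with --password-file instead of --password
./sqoop job --create myimportjob2 -- import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root --password-file /user/hadoop/.sqoop.pwd \
--table emp --m 1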
4. How Sqoop works
Overview: Sqoop works by translating each import or export command into a MapReduce program; whenever it receives a command, it generates a MapReduce job. With Sqoop's code-generation tool you can easily inspect the Java code Sqoop generates, and use it as a starting point for deeper customization.
Code customization:
The syntax of the Sqoop code-generation command is:
$ sqoop-codegen (generic-args) (codegen-args)
Example: generate the Java code for the emp table in the test database.
The following command generates the code:
sqoop codegen --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp -bindir .
If the command executes successfully, it produces output like the following:
[hadoop@centos-aaron-h1 bin]$ sqoop codegen --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp -bindir .
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 23:21:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 23:21:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 23:21:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 23:21:24 INFO tool.CodeGenTool: Beginning code generation
19/03/18 23:21:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:21:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:21:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: ./emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 23:21:26 INFO orm.CompilationManager: Writing jar file: ./emp.jar
[hadoop@centos-aaron-h1 bin]$ ll
Verify: list the files in the output directory; you should see the generated emp.java, emp.class, and emp.jar.
If you want to customize the import or export in depth, modify the generated code above.
Final words: that is all for this post. If you found it useful, please give it a like; if you are interested in the blogger's other server and big-data articles, follow the blog, and feel free to reach out any time.