With Sqoop installed, this post walks through the basic operations.
sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password Oracle123
sqoop list-tables \
--connect jdbc:mysql://localhost:3306/hive \
--username root \
--password Oracle123
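Both listing commands (and the imports below) assume the MySQL JDBC driver jar has already been copied into Sqoop's lib directory; if it has not, copy it in first (the connector file name/version here is an assumption):

cp mysql-connector-java-5.1.47.jar $SQOOP_HOME/lib/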
[hadoop@hd1 conf]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password Oracle123 \
> --table sqoop1 \
> --delete-target-dir \
> -m 1
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
	at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
	at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
	at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
	at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
It looks like a jar is missing; after adding the json jar, rerun the import.
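A minimal sketch of that fix, assuming the missing org.json classes are taken from the stock json artifact on Maven Central (the version, URL, and $SQOOP_HOME layout are assumptions):

# download an org.json jar and drop it onto Sqoop's classpath
wget https://repo1.maven.org/maven2/org/json/json/20140107/json-20140107.jar
cp json-20140107.jar $SQOOP_HOME/lib/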
40899430803_0003/libjars/parquet-hadoop-1.5.0-cdh5.7.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1595)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3287)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
The error says the block could not meet the minReplication (=1) requirement, so either the DataNodes are dead or the NameNode cannot communicate with them properly.
There are generally two fixes (a quick check sketch follows the list):
1. Check the firewall settings between the NN and the DNs, verify the network is reachable, and confirm the DataNode process is actually running on each DN node.
2. If the NN has been reformatted, its version/clusterID may no longer match what the DNs recorded; in that case the NN needs to be reformatted again (and the DN data directories cleared so the IDs match).
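A quick check sketch covering both causes (the systemctl command assumes CentOS 7, and the data-directory paths are assumptions; substitute your dfs.namenode.name.dir / dfs.datanode.data.dir values):

# 1) firewall / connectivity / DataNode liveness
systemctl stop firewalld        # on the NN and every DN
hdfs dfsadmin -report           # all DNs should be listed as live
# 2) clusterID consistency after an NN reformat
cat /data/dfs/nn/current/VERSION    # clusterID on the NN
cat /data/dfs/dn/current/VERSION    # must match on every DN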
After disabling the firewall between the NN and the DNs, I reran the job:
430803_0004_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1540918072803 found 1540908325730
Note: System times on machines may be out of sync. Check system time and time zones.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
. Failing the application.
18/10/30 21:55:26 INFO mapreduce.Job: Counters: 0
18/10/30 21:55:26 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/10/30 21:55:26 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 211.1476 seconds (0 bytes/sec)
18/10/30 21:55:26 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/10/30 21:55:26 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/10/30 21:55:26 ERROR tool.ImportTool: Error during import: Import job failed!
The error message says the system clocks on the cluster machines are out of sync, so the container token had already expired by the time the container was launched.
Set the server time on every node:
date -s "2018-10-30 18:22:10" && hwclock --systohc
hwclock --show
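If the nodes can reach an NTP server, syncing them against it is less error-prone than setting the time by hand; a sketch, assuming ntpdate is installed and the server below is reachable:

# run on every node in the cluster
ntpdate ntp.aliyun.com && hwclock --systohc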
After fixing the clocks, I reran the job:
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
	at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
感受是驅動的問題,換了一個驅動仍是報這個錯誤,我決定換一個MySQL版本(mysql-5.6.23)。
[root@hd1 mysql]# vi /etc/my.cnf
[mysqld]
wait_timeout=86400
basedir=/home/mysql/mysql-5.6.23
datadir=/home/mysql/mysql-5.6.23/data
socket=/tmp/mysql.sock
log_error=/home/mysql/mysql-5.6.23/mysql.err
user=mysql
[mysql]
socket=/tmp/mysql.sock
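Then restart mysqld so the new settings take effect; with this tarball layout the bundled init script can be used (the exact invocation is an assumption):

/home/mysql/mysql-5.6.23/support-files/mysql.server restart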
[mysql@hd1 support-files]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
ERROR 1046 (3D000): No database selected
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)
The job still failed after all that. Completely puzzled, I finally changed localhost in the JDBC URL to the concrete IP address:
--connect jdbc:mysql://192.168.83.11:3306/sqoop
This time, surprisingly, it succeeded:
18/10/31 13:22:16 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=137094
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=4
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5935
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=5935
		Total vcore-seconds taken by all map tasks=5935
		Total megabyte-seconds taken by all map tasks=6077440
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=139
		CPU time spent (ms)=990
		Physical memory (bytes) snapshot=107266048
		Virtual memory (bytes) snapshot=2714398720
		Total committed heap usage (bytes)=22544384
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=4
18/10/31 13:22:16 INFO mapreduce.ImportJobBase: Transferred 4 bytes in 24.3133 seconds (0.1645 bytes/sec)
18/10/31 13:22:16 INFO mapreduce.ImportJobBase: Retrieved 1 records.
Why did this happen?
I found the following post on the official Hadoop site:
You get a ConnectionRefused Exception when there is a machine at the address specified, but there is no program listening on the specific TCP port the client is using -and there is no firewall in the way silently dropping TCP connection requests. If you do not know what a TCP connection request is, please consult the specification.
Unless there is a configuration error at either end, a common cause for this is the Hadoop service isn't running.
This stack trace is very common when the cluster is being shut down -because at that point Hadoop services are being torn down across the cluster, which is visible to those services and applications which haven't been shut down themselves. Seeing this error message during cluster shutdown is not anything to worry about.
If the application or cluster is not working, and this message appears in the log, then it is more serious.
The exception text declares both the hostname and the port to which the connection failed. The port can be used to identify the service. For example, port 9000 is the HDFS port. Consult the Ambari port reference, and/or those of the supplier of your Hadoop management tools.
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
Check the port the client is trying to talk to using matches that the server is offering a service on. The netstat command is useful there.
On the server, try a telnet localhost <port> to see if the port is open there.
On the client, try a telnet <server> <port> to see if the port is accessible remotely.
Please do not file bug reports related to your problem, as they will be closed as Invalid.
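Applied to this case: the Sqoop map tasks run on the YARN worker nodes, and on those nodes jdbc:mysql://localhost points at the worker itself, where nothing listens on 3306, hence the communications failure; with the concrete IP, every node reaches the real MySQL host. A quick way to verify connectivity and privileges from a worker node (the GRANT is an illustrative assumption; in production restrict '%' to specific hosts):

telnet 192.168.83.11 3306
mysql -h 192.168.83.11 -u root -p -e "select 1;"
# make sure root may connect from remote hosts (MySQL 5.6 syntax)
mysql -u root -p -e "GRANT ALL ON sqoop.* TO 'root'@'%' IDENTIFIED BY 'Oracle123'; FLUSH PRIVILEGES;"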
sqoop create-hive-table \
--connect jdbc:mysql://hd1:3306/hive \
--table TBLS \
--username root \
--password Oracle123 \
--hive-table TBLS
18/10/31 17:16:02 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/10/31 17:16:02 ERROR tool.CreateHiveTableTool: Encountered IOException running create table job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
	at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
	at org.apache.sqoop.tool.CreateHiveTableTool.run(CreateHiveTableTool.java:58)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:259)
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
	... 11 more
Fix: add HADOOP_CLASSPATH to the user's environment variables:
export HADOOP_CLASSPATH=$HIVE_HOME/lib/*
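To make this survive new logins, append it to the user's profile (assuming a bash login shell and that HIVE_HOME is already set there):

echo 'export HADOOP_CLASSPATH=$HIVE_HOME/lib/*' >> ~/.bash_profile
source ~/.bash_profile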