1. Note: if you copy the command from Windows straight into Linux, fix the characters first (the -- dashes and the like often get mangled).
sqoop-list-databases --connect jdbc:mysql://122.206.79.212:3306/ --username root -P
First list the databases to see what is there. Oddly, only the databases that show up in this query can be imported.
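As an optional sanity check you can also list the tables before importing; a minimal sketch, assuming the dating database used in the import step below:

sqoop list-tables --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P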
2. Import into HDFS
sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root --password 123456 --table t_rec_top --driver com.mysql.jdbc.Driver
This specifies the database, the port, the username, the password and the table; the --driver option is not strictly required. Since no HDFS destination is given, the data must go to a default location.
From the output you can see there are only map tasks and no reduce task.
Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/03/15 11:05:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/03/15 11:05:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/03/15 11:05:12 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/03/15 11:05:12 INFO manager.SqlManager: Using default fetchSize of 1000
17/03/15 11:05:12 INFO tool.CodeGenTool: Beginning code generation
17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/mapreduce
Note: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/03/15 11:05:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.jar
17/03/15 11:05:18 INFO mapreduce.ImportJobBase: Beginning import of t_rec_top
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/15 11:05:19 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/03/15 11:05:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/03/15 11:05:21 INFO client.RMProxy: Connecting to ResourceManager at CentOSMaster/192.168.58.180:8032
17/03/15 11:05:28 INFO db.DBInputFormat: Using read commited transaction isolation
17/03/15 11:05:28 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM t_rec_top
17/03/15 11:05:28 INFO mapreduce.JobSubmitter: number of splits:1
17/03/15 11:05:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489547007191_0001
17/03/15 11:05:30 INFO impl.YarnClientImpl: Submitted application application_1489547007191_0001
17/03/15 11:05:31 INFO mapreduce.Job: The url to track the job: http://CentOSMaster:8088/proxy/application_1489547007191_0001/
17/03/15 11:05:31 INFO mapreduce.Job: Running job: job_1489547007191_0001
17/03/15 11:05:48 INFO mapreduce.Job: Job job_1489547007191_0001 running in uber mode : false
17/03/15 11:05:48 INFO mapreduce.Job: map 0% reduce 0%
17/03/15 11:06:06 INFO mapreduce.Job: map 100% reduce 0%
17/03/15 11:06:07 INFO mapreduce.Job: Job job_1489547007191_0001 completed successfully
17/03/15 11:06:07 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=127058
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=99
		HDFS: Number of bytes written=21
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13150
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=13150
		Total vcore-milliseconds taken by all map tasks=13150
		Total megabyte-milliseconds taken by all map tasks=13465600
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=99
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=183
		CPU time spent (ms)=1200
		Physical memory (bytes) snapshot=107761664
		Virtual memory (bytes) snapshot=2069635072
		Total committed heap usage (bytes)=30474240
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=21
17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Transferred 21 bytes in 46.7701 seconds (0.449 bytes/sec)
17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Retrieved 1 records.
A directory is created under /user/<username>; the t_rec_top directory inside it holds our data, though without a header row. The file names end in "m", showing that the job finished with map tasks only.
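To see that default location, a minimal sketch, assuming the HDFS user is hxsyl so the data lands under /user/hxsyl/t_rec_top:

hdfs dfs -ls /user/hxsyl/t_rec_top                    (shows _SUCCESS and the part-m-00000 file)
hdfs dfs -cat /user/hxsyl/t_rec_top/part-m-00000      (comma-separated rows, no header)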
The wc00 directory comes from an earlier WordCount run on a configuration file (yarn-site.xml, by the look of it); its output is each word from the file followed by its occurrence count, for example:

"AS 1   <property> 15   </property> 15   <name>yarn.resourcemanager.hostname</name> 1   <value>CentOSMaster</value> 1   the 15   of 11   …
--target-dir /path puts the data under the given path; -m sets the number of mappers.
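For instance, a sketch of the same import with an explicit destination and two mappers (the directory name here is just an example):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_rec_top --target-dir /user/hxsyl/t_rec_top2 -m 2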
Opening the file on HDFS shows the default field separator is a comma. --fields-terminated-by '\t' changes that; note that on an import this option controls the delimiter of the files written to HDFS, while the delimiter of the original input data (relevant when Sqoop has to parse records, e.g. on export) is set with --input-fields-terminated-by instead.
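A sketch with a tab delimiter (again with an example target directory), plus a quick check of the result:

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_rec_top --target-dir /user/hxsyl/t_rec_top_tab --fields-terminated-by '\t' -m 1
hdfs dfs -cat /user/hxsyl/t_rec_top_tab/part-m-00000      (fields now separated by tabs)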
--columns 'id,account,income' imports only the listed columns.
Only rows matching the condition are imported: --where "id > 2 and id < 9"
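Combined, a hedged example importing only a few columns of the matching rows (the table name t_detail and the target directory are assumptions for illustration):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_detail --columns 'id,account,income' --where "id > 2 and id < 9" --target-dir /user/hxsyl/t_detail_part -m 1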
To import from multiple tables, or from an arbitrary query, use --query "select * from t_detail where id > 5 and $CONDITIONS"; the $CONDITIONS token is mandatory.
But if -m is greater than 1, Sqoop also needs to know how to divide the rows among the mappers, so specify a split column with --split-by t_detail.id. $CONDITIONS is where Sqoop injects the per-mapper boundary conditions derived from that column, which is how the data gets partitioned.
Single quotes around the query are recommended; with double quotes the $ has to be escaped. Options starting with -- are the long form, a single - is the short form.
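Putting it together, a sketch of a free-form query import (the table t_detail and the target directory are examples; --target-dir is required when using --query):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --query 'select * from t_detail where id > 5 and $CONDITIONS' --split-by t_detail.id --target-dir /user/hxsyl/t_detail_query -m 2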
The key difference between single and double quotes is that double quotes still expand variables, while inside single quotes everything is plain characters with no special meaning. An example: suppose you define a variable name=VBird and now want to define myname so that it displays "VBird its me". How?

[root@linux ~]# name=VBird
[root@linux ~]# echo $name
VBird
[root@linux ~]# myname="$name its me"
[root@linux ~]# echo $myname
VBird its me
[root@linux ~]# myname='$name its me'
[root@linux ~]# echo $myname
$name its me

See? With single quotes, $name loses its variable content and is shown as literal text. Be careful with this!
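Applied to the Sqoop query above, this is why single quotes are safer; with double quotes the shell would expand $CONDITIONS itself, so the $ must be escaped:

--query 'select * from t_detail where id > 5 and $CONDITIONS'       (single quotes: $CONDITIONS reaches Sqoop as-is)
--query "select * from t_detail where id > 5 and \$CONDITIONS"      (double quotes: the $ must be escaped)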