Importing Data from MySQL into a Greenplum Cluster

To move data from MySQL into Greenplum, follow the steps below.

1: Export the MySQL table to an external file

Using schema_name.table_name as an example:

select 
 product_id, number, name, english_name, purchase_name, system_name, bar_code, category_one, category_two, category_three,
 parent_id, parent_number, brand_id, supplier_id, price, ad_word, give_integral, shelf_life, FROM_UNIXTIME(shelve_date), product_area, country,
 sale_unit, specification, weight, length, width, height, storage_conditions, storage, model, refuse_notes, status, is_promote, 
 is_gift, is_book, is_outgoing, is_presale, is_fragile, is_have, is_cod, is_return, is_oos, is_seasonal, is_multicity, is_package, is_show, click,
 favorite, min_purchase_unit, in_price, refer_in_price, mwaverage_price, is_unique_number, is_batch_number, qs_proportion, shelf_life_proportion, box_specification,
 max_unsalable, advent_shelves, pro_warning, FROM_UNIXTIME(add_time), operator_id, FROM_UNIXTIME(audit_time), remark, price_type, new_tag, product_type, business_model, is_sell, return_policy,
 package, inventory, merchant_number, modified_time, now()
 from schema_name.table_name  INTO OUTFILE '/tmp/table_name.txt';

 

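When exporting, watch out for type conversions. For this table the main issue is that several time values are stored in MySQL as INT (Unix timestamps), so they are converted with FROM_UNIXTIME before being written out. The Greenplum table also has one extra timestamp column (dw_modified_time), which is filled here with the current time via now(). Note that INTO OUTFILE writes the file on the MySQL server host. Export the data in the format shown above.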
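
Before copying the file anywhere, it can help to confirm that every exported line has the number of fields the target table expects. The following is a minimal sketch, not part of the original procedure: `count_bad_rows` is a hypothetical helper name, and it assumes the default INTO OUTFILE format (tab-separated fields, one row per line). A text value that itself contains a tab or a newline would also be counted, so a non-zero result is a prompt to inspect the file rather than proof that it is broken.

```shell
# Count lines whose tab-separated field count differs from the expected one.
# Usage: count_bad_rows FILE EXPECTED_FIELDS
count_bad_rows() {
    awk -F '\t' -v want="$2" 'NF != want { bad++ } END { print bad + 0 }' "$1"
}

# Example with the file from the export above; the SELECT list has 75 columns:
# count_bad_rows /tmp/table_name.txt 75
```

A result of 0 means every line split into exactly the expected number of fields.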

2: Copy the file to the Greenplum server and create the external table

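First copy the file into the directory that serves external table data. This is straightforward and any copy method will do. Then create the external table: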

create external  TABLE  schema_name.table_name_ext( product_id int,
number varchar(10),
name varchar(100),
english_name varchar(100),
purchase_name varchar(100),
system_name varchar(100),
bar_code varchar(255),
category_one int,
category_two int,
category_three int,
parent_id int,
parent_number int,
brand_id int,
supplier_id int,
price int,
ad_word varchar(100),
give_integral int,
shelf_life int,
shelve_date timestamp without time zone,
product_area int,
country int,
sale_unit varchar(20),
specification varchar(255),
weight decimal(10,2) ,
length int,
width int,
height int,
storage_conditions varchar(255),
storage smallint,
model varchar(20),
refuse_notes varchar(255),
status smallint,
is_promote smallint,
is_gift smallint,
is_book smallint,
is_outgoing smallint,
is_presale int,
is_fragile smallint,
is_have smallint,
is_cod smallint,
is_return smallint,
is_oos smallint,
is_seasonal smallint,
is_multicity smallint,
is_package smallint,
is_show smallint,
click int,
favorite int,
min_purchase_unit int,
in_price int,
refer_in_price int,
mwaverage_price int,
is_unique_number int,
is_batch_number int,
qs_proportion int,
shelf_life_proportion DOUBLE PRECISION,
box_specification varchar(50),
max_unsalable int,
advent_shelves int,
pro_warning int,
add_time timestamp without time zone,
operator_id int,
audit_time timestamp without time zone,
remark varchar(255),
price_type smallint,
new_tag int,
product_type int,
business_model smallint,
is_sell smallint,
return_policy smallint,
package varchar(200),
inventory varchar(200),
merchant_number int,
modified_time timestamp without time zone,
dw_modified_time timestamp without time zone
)  location(
'gpfdist://172.16.16.34:9888/table_name.txt' ) 
FORMAT
'TEXT' SEGMENT REJECT LIMIT 1000000 rows ;
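
The location must be given as 'gpfdist://172.16.16.34:9888/table_name.txt': the IP address and port of the host running gpfdist, followed by the name of the external file. The file has to sit in the directory that gpfdist serves. Here gpfdist is started with gpfdist -d /tmp -p 9888, so the exported file must be copied into /tmp. Beyond that, just make sure the columns match the exported file.

Then run a query against the external table as a check. As long as the columns line up, there is normally no problem.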
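
The pieces described above can be put together roughly as follows. This is a sketch under assumptions rather than the author's exact commands: `gpfdist_url` and `serve_and_preview` are hypothetical helper names, `mydb` is a placeholder database name, and `gpfdist` and `psql` are assumed to be on the PATH of the host where the file was copied.

```shell
# Build the gpfdist URL that the external table's LOCATION clause must match.
# Usage: gpfdist_url HOST PORT FILE
gpfdist_url() {
    printf 'gpfdist://%s:%s/%s\n' "$1" "$2" "$3"
}

# Start gpfdist serving DIR on PORT in the background, then preview the external table.
# Usage: serve_and_preview DIR PORT DBNAME   (DBNAME is a placeholder)
serve_and_preview() {
    gpfdist -d "$1" -p "$2" &
    sleep 1   # give gpfdist a moment to start listening
    psql -d "$3" -c 'select * from schema_name.table_name_ext limit 10;'
}

# Example, using the values from this article:
# gpfdist_url 172.16.16.34 9888 table_name.txt
# serve_and_preview /tmp 9888 mydb
```

If the preview returns rows with values in the expected columns, the external table definition and the file agree.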

3: Load the data into the regular Greenplum table

First create the regular table:

create table schema_name.table_name ( product_id int,
number varchar(10),
name varchar(100),
english_name varchar(100),
purchase_name varchar(100),
system_name varchar(100),
bar_code varchar(255),
category_one int,
category_two int,
category_three int,
parent_id int,
parent_number int,
brand_id int,
supplier_id int,
price int,
ad_word varchar(100),
give_integral int,
shelf_life int,
shelve_date timestamp without time zone,
product_area int,
country int,
sale_unit varchar(20),
specification varchar(255),
weight decimal(10,2) ,
length int,
width int,
height int,
storage_conditions varchar(255),
storage smallint,
model varchar(20),
refuse_notes varchar(255),
status smallint,
is_promote smallint,
is_gift smallint,
is_book smallint,
is_outgoing smallint,
is_presale int,
is_fragile smallint,
is_have smallint,
is_cod smallint,
is_return smallint,
is_oos smallint,
is_seasonal smallint,
is_multicity smallint,
is_package smallint,
is_show smallint,
click int,
favorite int,
min_purchase_unit int,
in_price int,
refer_in_price int,
mwaverage_price int,
is_unique_number int,
is_batch_number int,
qs_proportion int,
shelf_life_proportion DOUBLE PRECISION,
box_specification varchar(50),
max_unsalable int,
advent_shelves int,
pro_warning int,
add_time timestamp without time zone,
operator_id int,
audit_time timestamp without time zone,
remark varchar(255),
price_type smallint,
new_tag int,
product_type int,
business_model smallint,
is_sell smallint,
return_policy smallint,
package varchar(200),
inventory varchar(200),
merchant_number int,
modified_time timestamp without time zone,
dw_modified_time timestamp without time zone
) distributed by(product_id);

Then load the data:

insert into schema_name.table_name
select * from schema_name.table_name_ext;

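This loads the external table's data into the internal table, where the rows are spread across the segments according to the distribution key product_id. Note that the structure of schema_name.table_name must match that of schema_name.table_name_ext.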
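
To check the result of the load, one option is to look at how many rows landed on each segment and compare the total with the line count of the exported file. The sketch below is illustrative only: `rows_per_segment_sql` is a hypothetical helper that prints the query, and `mydb` is a placeholder database name. It relies on gp_segment_id, the Greenplum system column that identifies the segment holding each row.

```shell
# Print SQL that counts rows per segment for TABLE.
# Usage: rows_per_segment_sql TABLE
rows_per_segment_sql() {
    printf 'select gp_segment_id, count(*) from %s group by gp_segment_id order by gp_segment_id;\n' "$1"
}

# Example (mydb is a placeholder database name):
# psql -d mydb -c "$(rows_per_segment_sql schema_name.table_name)"
# psql -d mydb -At -c 'select count(*) from schema_name.table_name;'
# wc -l < /tmp/table_name.txt
```

Roughly equal per-segment counts suggest the distribution key is working well. Because the external table is declared with SEGMENT REJECT LIMIT, malformed rows are likely to be skipped rather than abort the load, so a table count lower than the file's line count is worth investigating.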
