Tables and Data in PostgreSQL Databases

Date: 2019-11-12

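Take care of your data, and your database will take care of you. A tidy database answers queries faster and causes fewer application errors. Being woken in the middle of the night to fix a data problem is no fun. So join the blogger 章郎蟲 for a look at tables and data in PostgreSQL.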

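1. Choosing good names for database objects

The simplest way to help other people understand a database quickly is to give every database object a meaningful name. For detailed guidelines, see page 96 of PostgreSQL-9-Admin-Cookbook.

In PostgreSQL, the standard naming format for indexes and related objects is {tablename}_{columnname(s)}_{suffix}. The suffix is one of pkey, key, excl, idx, or seq, which stand for a primary key constraint, a unique constraint, an exclusion constraint, any other kind of index, and a sequence, respectively. A primary key name leaves out the column part, as in newcust_pkey.

A PostgreSQL table can have several triggers at the same time. A trigger name can include the action it responds to, such as update or delete. A useful naming convention for triggers is {tablename}_{actionname}_{after|before}__trig.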
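The naming patterns above can be sketched as a pair of small helpers. This is only an illustration: the function names, the `kind` labels, and the `orders` table are hypothetical, and the helpers just assemble strings; they do not talk to PostgreSQL.

```python
# Hypothetical helpers that assemble names following the conventions above.
SUFFIXES = {
    "primary_key": "pkey",
    "unique": "key",
    "exclusion": "excl",
    "index": "idx",
    "sequence": "seq",
}


def object_name(table, columns, kind):
    """Build {tablename}_{columnname(s)}_{suffix}; primary keys omit the columns."""
    suffix = SUFFIXES[kind]
    if kind == "primary_key":
        return f"{table}_{suffix}"
    return "_".join([table, *columns, suffix])


def trigger_name(table, action, timing):
    """Build {tablename}_{actionname}_{after|before}__trig (double underscore)."""
    if timing not in ("after", "before"):
        raise ValueError("timing must be 'after' or 'before'")
    return f"{table}_{action}_{timing}__trig"


print(object_name("newcust", ["customerid"], "unique"))       # newcust_customerid_key
print(object_name("newcust", ["customerid"], "primary_key"))  # newcust_pkey
print(trigger_name("orders", "update", "before"))             # orders_update_before__trig
```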

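2. Handling objects with quoted names

The first time I saw this title I really could not work out what it meant (my English is poor), but the following example should make it clear.

First, create an object with a quoted name: CREATE TABLE "MyCust" AS SELECT * FROM cust;

Then try the following queries. Both fail with the same error.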

postgres=# SELECT count(*) FROM mycust;
ERROR: relation "mycust" does not exist
LINE 1: SELECT count(*) FROM mycust;

postgres=# SELECT count(*) FROM MyCust;
ERROR: relation "mycust" does not exist
LINE 1: SELECT count(*) FROM MyCust;

This one, however, works.

postgres=# SELECT count(*) FROM "MyCust";
count
-------
(1 row)

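As this example shows, if an object was created with a quoted name that contains uppercase letters, the name must also be quoted whenever it is queried. Unquoted object names in PostgreSQL are case-insensitive because they are folded to lowercase, so "SELECT * FROM mycust;", "SELECT * FROM MYCUST;", and "SELECT * FROM MyCust;" all mean the same thing.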
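The lookup behavior in this example can be modeled in a few lines. The sketch below is a simplified model, not PostgreSQL's actual parser: `fold_identifier` is a hypothetical helper that lowercases unquoted identifiers and keeps double-quoted ones exactly as written.

```python
def fold_identifier(identifier):
    """Return the name that would be looked up for an identifier as typed.

    Simplified model: unquoted identifiers fold to lowercase, double-quoted
    identifiers keep their exact case (a doubled quote inside is unescaped).
    """
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


# The table from the example, created with CREATE TABLE "MyCust" ...
existing_tables = {"MyCust"}

for typed in ["mycust", "MyCust", "MYCUST", '"MyCust"']:
    looked_up = fold_identifier(typed)
    print(f"{typed!r:12} -> {looked_up!r:10} found={looked_up in existing_tables}")
```

Only the last form, with the quotes, finds the table; the three unquoted spellings all collapse to mycust.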

3. Enforcing same name, same column definition

Here are two fairly complex SQL queries.

Columns:

We can identify columns that are defined in different ways in different tables using a query against the catalog.

SELECT table_schema, table_name, column_name,
       data_type
       || coalesce(' ' || text(character_maximum_length), '')
       || coalesce(' ' || text(numeric_precision), '')
       || coalesce(',' || text(numeric_scale), '') AS data_type
FROM information_schema.columns
WHERE column_name IN (
    SELECT column_name
    FROM (
        SELECT column_name, data_type, character_maximum_length,
               numeric_precision, numeric_scale
        FROM information_schema.columns
        WHERE table_schema = 'public'
        GROUP BY column_name, data_type, character_maximum_length,
                 numeric_precision, numeric_scale
    ) derived
    GROUP BY column_name
    HAVING count(*) > 1
)
AND table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY column_name;

Tables:

The following query looks for all tables of the same name (and hence in different schemas) that have different definitions.

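SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name IN (
    SELECT table_name
    FROM (
        -- one definition string per table; string_agg needs an explicit delimiter
        SELECT table_schema, table_name,
               string_agg(' ' || column_name || ' ' || data_type, ','
                          ORDER BY column_name) AS definition
        FROM information_schema.columns
        GROUP BY table_schema, table_name
    ) def
    GROUP BY table_name
    -- keep only table names whose definitions differ between schemas
    HAVING count(DISTINCT definition) > 1
)
ORDER BY table_name, table_schema, column_name;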

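4. Identifying and removing duplicates

A relational database is supposed to identify every data item uniquely, but for one reason or another duplicates can still turn up in the data.

In this example, the customerid column contains a duplicated value.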

postgres=# SELECT * FROM cust;

 customerid | firstname | lastname | age
------------+-----------+----------+-----
          1 | Philip    | Marlowe  |  38
          2 | Richard   | Hannay   |  42
          3 | Holly     | Martins  |  25
          4 | Harry     | Palmer   |  36
          4 | Mark      | Hall     |  47
(5 rows)

The following statement finds the duplicated rows.

SELECT * FROM cust WHERE customerid IN (SELECT customerid FROM cust GROUP BY customerid HAVING count(*) > 1);

Once the duplicates have been found, they can be updated or deleted.
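To try the duplicate-finding query without a PostgreSQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in and behaves differently from PostgreSQL in many respects; the added ORDER BY and the cleanup step at the end are assumptions for illustration, since the article leaves the choice of which row to fix open.

```python
import sqlite3

# Stand-in database: SQLite in memory, NOT PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust (customerid INTEGER, firstname TEXT, lastname TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?, ?)",
    [
        (1, "Philip", "Marlowe", 38),
        (2, "Richard", "Hannay", 42),
        (3, "Holly", "Martins", 25),
        (4, "Harry", "Palmer", 36),
        (4, "Mark", "Hall", 47),
    ],
)

# The article's query, with an ORDER BY added so the output order is stable.
duplicates = conn.execute(
    "SELECT * FROM cust WHERE customerid IN "
    "(SELECT customerid FROM cust GROUP BY customerid HAVING count(*) > 1) "
    "ORDER BY firstname"
).fetchall()
print(duplicates)  # [(4, 'Harry', 'Palmer', 36), (4, 'Mark', 'Hall', 47)]

# Hypothetical cleanup: delete the row judged to be wrong, picked by its other columns.
conn.execute("DELETE FROM cust WHERE customerid = 4 AND firstname = 'Mark'")
remaining = conn.execute("SELECT count(*) FROM cust").fetchone()[0]
print(remaining)  # 4
```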

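5. Preventing duplicate rows

As section 4 showed, duplicate data items can creep into a database. If a column must never contain duplicates, we can put a uniqueness restriction on it when defining the table. There are several ways to do this.

1. Create a primary key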

ALTER TABLE newcust ADD PRIMARY KEY(customerid);

This creates a new index named newcust_pkey.

2. Create a unique constraint

ALTER TABLE newcust ADD UNIQUE(customerid);

This creates a new index named newcust_customerid_key.

3. Create a unique index

CREATE UNIQUE INDEX ON newcust (customerid);

This creates a new index named newcust_customerid_idx.
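The effect of a unique index is easy to see in a quick experiment. The sketch below again uses in-memory SQLite from Python as a stand-in for PostgreSQL (an assumption made so the example runs anywhere). SQLite requires an explicit index name, so the name PostgreSQL would generate is spelled out, and the two-column table layout is hypothetical.

```python
import sqlite3

# Stand-in database: SQLite in memory, NOT PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newcust (customerid INTEGER, firstname TEXT)")
conn.execute("CREATE UNIQUE INDEX newcust_customerid_idx ON newcust (customerid)")
conn.execute("INSERT INTO newcust VALUES (4, 'Harry')")

try:
    # Second row with the same customerid: the unique index should refuse it.
    conn.execute("INSERT INTO newcust VALUES (4, 'Mark')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("duplicate rejected:", exc)

row_count = conn.execute("SELECT count(*) FROM newcust").fetchone()[0]
print(row_count)  # 1
```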

6. Finding a unique key for a set of data

Even without tools, a unique key can often be found quickly, for example by looking at column names and foreign keys. Here we use the optimizer statistics that PostgreSQL provides.

postgresql=# analyze article ;

ANALYZE

postgresql=# select attname,n_distinct from pg_stats where schemaname = 'public' AND tablename = 'article' ;

attname | n_distinct

--------------+------------

rply_cnt | 564

read_cnt | 930

url_hash | -1

hash_plain | -1

title_hash | -1

guid | -1

neg_pos | 1

match_code | -0.937369

tm_spider | -0.389967

aid | -1

style | 3

oaid | 1102

fid | 6

bid | 67

cid | 2

tid | 3

url | -1

tm_post | -0.915479

tm_last_rply | 0

author | 49

title | -0.95474

content | 0

ab_content | -0.924905

tm_update | -0.685363

stage | 1

rply_cut | 473

read_cut | 814

src | 1

rfid | 5

labels | 172

kwds | 0

like_cnt | 186

(32 rows)

If n_distinct equals -1, the column was unique within the rows that were examined. If several columns show -1, some judgment is called for. (We would then need to use our judgment to decide whether one or both of those columns are unique by chance, or as part of the design of the database that created them.)
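Reading the n_distinct column takes a little care: a positive value is an estimated count of distinct values, while a negative value is the negated ratio of distinct values to rows, which is why -1 signals a unique column. The sketch below encodes that rule; the helper names are hypothetical, the sample values are a handful copied from the output above, and the row count of 10,000 is a made-up figure.

```python
def estimated_distinct(n_distinct, row_count):
    """Convert a pg_stats.n_distinct value into an estimated distinct count.

    Positive values are already a count; negative values are the negated
    fraction of rows that are distinct.
    """
    if n_distinct < 0:
        return -n_distinct * row_count
    return n_distinct


def unique_candidates(stats):
    """Return the columns whose n_distinct is exactly -1."""
    return [column for column, n_distinct in stats.items() if n_distinct == -1]


# A few values from the output above; 10_000 rows is an assumed table size.
stats = {"aid": -1, "guid": -1, "url": -1, "author": 49, "title": -0.95474}

print(unique_candidates(stats))          # ['aid', 'guid', 'url']
print(estimated_distinct(-0.5, 10_000))  # 5000.0
print(estimated_distinct(49, 10_000))    # 49
```

The candidates still need the judgment call described above, because the statistics come from a sample and say nothing about intent.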

7. Generating test data

Generate a sequence of numbers

zhangnq=# select * from generate_series(1,5) ;

 generate_series
-----------------
               1
               2
               3
               4
               5
(5 rows)

Generate dates

zhangnq=# SELECT date(generate_series(now(), now() + '1 week', '1 day'));
    date
------------
 2014-02-25
 2014-02-26
 2014-02-27
 2014-02-28
 2014-03-01
 2014-03-02
 2014-03-03
 2014-03-04
(8 rows)

Random integer

zhangnq=# select (random()*(2*10^9))::integer ;
   int4
-----------
 958536259
(1 row)

Random bigint

zhangnq=# select (random()*(9*10^18))::bigint ;
        int8
---------------------
 6527764440514147328
(1 row)

Random numeric value

zhangnq=# select (random()*100.)::numeric(4,2);
 numeric
---------
   39.97
(1 row)

Repeated string of random length, at most 40 characters

zhangnq=# select repeat('1',(random()*40)::integer) ;
  repeat
-----------
 111111111
(1 row)

Substring of random length

zhangnq=# select substr('abcdefghijklmnopqrstuvwxyz',1, (random()*26)::integer) ;
   substr
------------
 abcdefghij
(1 row)

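Pick a random string from a list. The index is built with floor() so that it always lands on 1, 2, or 3; rounding 1+random()*3 could produce 4, which is past the end of the array.

zhangnq=# select (ARRAY['one','two','three'])[floor(random()*3)::integer + 1] ;
 array
-------
 two
(1 row)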

Generate a table of random data

zhangnq=# SELECT generate_series(1,10) as key ,(random()*100.)::numeric(4,2) ,repeat('1',(random()*25)::integer);

8. Randomly sampling data

Export a random sample

pg_dump --exclude-table=MyBigTable > db.dmp

pg_dump --table=MyBigTable --schema-only > mybigtable.schema

psql -c '\copy (SELECT * FROM MyBigTable WHERE random() < 0.01) to mybigtable.dat'

Import the random sample

psql -f db.dmp

psql -f mybigtable.schema

psql -c '\copy mybigtable from mybigtable.dat'

Overall, my advice is to avoid sampling if you can, or at least to cut down on sampling large tables.
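One reason for caution is that a filter like WHERE random() < 0.01 keeps each row independently, so the sample size is only approximately 1% and changes from run to run. The Python sketch below mimics that filter on a hypothetical list of 100,000 row ids; the function name and the seed parameter are assumptions added for the illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`,
    mirroring a WHERE random() < fraction filter."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(range(100_000))  # hypothetical stand-in for the rows of MyBigTable
sample = bernoulli_sample(rows, 0.01, seed=42)

# Roughly 1000 rows; the exact size depends on the random draws.
print(len(sample))
```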

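9. Loading data from a spreadsheet

Most small datasets today live in spreadsheets, so loading data from a spreadsheet is something many database administrators have to deal with.

Before it can be imported into the database, the spreadsheet needs to meet the following conditions:

1. Each field occupies exactly one column

2. Each record occupies exactly one row

3. All the data is on a single worksheet

4. Optionally, the first row holds column descriptions or titles

Of course, many spreadsheets contain formulas, totals, macros, images, and so on, so before starting you need to convert them into the simplest tabular form that meets the conditions above. Then save the spreadsheet in CSV format and upload it to the server.

Start importing the data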

postgres=# \copy sample FROM sample.csv CSV HEADER

postgres=# SELECT * FROM sample;

Or: psql -c '\copy sample FROM sample.csv CSV HEADER'
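Before running the import, it can help to confirm that the exported CSV really is a plain rectangle of data. The sketch below is one possible pre-flight check using Python's standard csv module; `check_csv_shape` is a hypothetical helper, and the two sample strings stand in for the contents of sample.csv.

```python
import csv
import io


def check_csv_shape(text):
    """Return (header, problems), where problems lists the line numbers of
    data rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = [
        reader.line_num
        for row in reader
        if len(row) != len(header)
    ]
    return header, problems


# Hypothetical file contents: one clean export, one with a stray extra field.
good = "key,value\n1,a\n2,b\n"
bad = "key,value\n1,a\n2,b,extra\n"

print(check_csv_shape(good))  # (['key', 'value'], [])
print(check_csv_shape(bad))   # (['key', 'value'], [3])
```

A file that reports no problems still has to match the column types of the target table, which this check does not attempt to verify.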

10. Loading data from flat files

This section is mainly about using pgloader, which I will cover in a later post.
