An Analysis of PostgreSQL's Basic Data Types

Date: 2019-11-17


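Earlier I worked on a database migration tool developed at my company, and I also wrote up an analysis of the page layout. Here I go a step further and record an analysis of the database's data types.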

I. The system catalog pg_type

All PostgreSQL data types are recorded in the system catalog pg_type.
The structure of pg_type is shown below (this is the definition from the source file pg_type.h):

CATALOG(pg_type,1247) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71) BKI_SCHEMA_MACRO
{
	NameData	typname;		/* type name */
	Oid			typnamespace;	/* OID of namespace containing this type */
	Oid			typowner;		/* type owner */
	int16		typlen;
	bool		typbyval;
	char		typtype;
	char		typcategory;	/* arbitrary type classification */
	bool		typispreferred; /* is type "preferred" within its category? */
	bool		typisdefined;
	char		typdelim;		/* delimiter for arrays of this type */
	Oid			typrelid;		/* 0 if not a composite type */
	Oid			typelem;
	Oid			typarray;
	regproc		typinput;		/* text format (required) */
	regproc		typoutput;
	regproc		typreceive;		/* binary format (optional) */
	regproc		typsend;
	regproc		typmodin;
	regproc		typmodout;
	regproc		typanalyze;
	char		typalign;
	char		typstorage;
	bool		typnotnull;
	Oid			typbasetype;
	int32		typtypmod;
	int32		typndims;
	Oid			typcollation;
#ifdef CATALOG_VARLEN			/* variable-length fields start here */
	pg_node_tree typdefaultbin;
	text		typdefault;
	aclitem		typacl[1];
#endif
} FormData_pg_type;

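The meaning of each pg_type field is described briefly below.

    typname, typnamespace, typowner: the meaning of these three fields is clear from their names.
    typlen: the length of the type. For a fixed-length type it is the size in bytes; for a variable-length type it is -1. For example, int4 (also written int or integer) has typlen 4 and occupies 4 bytes, while varchar has typlen -1.
    typbyval: whether internal routines pass a value of this type by value or by reference. A type whose length is not 1, 2, 4, or 8 bytes can only be passed by reference, so typbyval must be false for it. typbyval may also be false even when the length would allow passing by value. Older documentation cites float4 as an example of this; recent releases normally pass float4 by value.
    typtype: b for a base type, c for a composite type (for example, a table's row type), d for a domain, e for an enum type, p for a pseudo-type, and r for a range type.
This post mainly analyzes base types.
    typcategory: a classification of data types. int2, int4, and int8 all have typcategory N. The categories are listed in the table below: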

Code	Category
A	Array types
B	Boolean types
C	Composite types
D	Date/time types
E	Enum types
G	Geometric types
I	Network address types
N	Numeric types
P	Pseudo-types
R	Range types
S	String types
T	Timespan types
U	User-defined types
V	Bit-string types
X	unknown type

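    typispreferred: works together with typcategory and indicates whether the type is the preferred one within its category.
    typisdefined: a precondition for the type being usable. It indicates whether the type is actually defined; if it is false the type is only a placeholder (a shell) and cannot be used at all. (If you set typisdefined to false for int4 and then create a table with an int4 column, the statement fails immediately with "type integer is only a shell".)
    typdelim: the character that separates two values of this type when array input is parsed. Note that the delimiter is associated with the array element data type, not with the array data type.
    typrelid: for a composite type (see typtype), this points to the pg_class row that defines the corresponding table. For a free-standing composite type the pg_class entry does not really represent a table, but it is still needed to find the type's pg_attribute entries. Zero for non-composite types.
    typelem: if not 0, it identifies another row in pg_type. The current type can then be subscripted like an array, yielding values of type typelem. A "true" array type is variable-length (typlen = -1), but some fixed-length (typlen > 0) types also have a nonzero typelem (name and point, for example). If a fixed-length type has a typelem, its internal representation must be some number of values of the typelem data type, with no other data. Variable-length array types have a header defined by the array subroutines.
    typarray: the OID of the array type that has this type as its element.
    typinput, typoutput: the input and output functions for the text format. The input function takes the external representation supplied by the client (normally a string) and converts it into the internal form the database stores and works with; the output function performs the reverse conversion.
    typreceive, typsend: input and output conversion functions for the binary format (optional).
    typmodin, typmodout: type modifier input and output functions, used by types that accept a modifier, such as varchar(n), time(p), and timestamp(p). These fields are related to atttypmod in the system catalog pg_attribute.
    typanalyze: a custom ANALYZE function, or 0 if the standard function is used.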
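    typalign: the alignment required when storing a value of this type. It applies to storage on disk as well as to most representations of the value inside PostgreSQL. When several values are stored consecutively, as in a complete row on disk, padding is inserted before a value of this type so that it begins on the required boundary. The alignment reference is the beginning of the first datum in the sequence.
Possible values are:
                c = char alignment, i.e. no alignment needed
                s = short alignment (2 bytes on most machines)
                i = int alignment (4 bytes on most machines)
                d = double alignment (8 bytes on many machines, but not all)
    typstorage: for a variable-length type (one with typlen = -1), tells whether the type is prepared to handle toasted values and what the default storage strategy for attributes of this type is. Possible values are:
                p: the value is always stored plain
                e: the value can be stored in a "secondary" relation
                m: the value can be stored compressed inline
                x: the value can be stored compressed inline or in a "secondary" table
Note that m columns can also be moved out to secondary storage, but only as a last resort (e and x columns are moved first).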
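    typnotnull: represents a NOT NULL constraint on a type. Currently used for domains only.
    typbasetype: if this is a domain (see typtype), identifies the type the domain is based on. Zero if this type is not a domain.
    typtypmod: domains use typtypmod to record the typmod to be applied to their base type (-1 if the base type does not use a typmod). -1 if this type is not a domain.
    typndims: for a domain over an array, typndims is the number of array dimensions (that is, typbasetype is an array type; the domain's typelem matches the base type's typelem). Zero for types other than domains over arrays.
    typcollation: the collation of the type. Zero if the type does not support collations. A base type that supports collations has DEFAULT_COLLATION_OID here. A domain over a collatable type can have some other collation OID, if one was specified for the domain.
    typdefaultbin: if not NULL, the nodeToString() representation of a default expression for the type. Currently used for domains only.
    typdefault: NULL if the type has no associated default value. If typdefaultbin is not NULL, typdefault must contain a human-readable version of the default expression represented by typdefaultbin. If typdefaultbin is NULL and typdefault is not, then typdefault is the external representation of the type's default value, which can be fed to the type's input converter to produce a constant.
    typacl[1]: the access privileges users hold on the type, written in the following notation: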

rolename=xxxx -- privileges granted to a role
        =xxxx -- privileges granted to PUBLIC

            r -- SELECT ("read")
            w -- UPDATE ("write")
            a -- INSERT ("append")
            d -- DELETE
            D -- TRUNCATE
            x -- REFERENCES
            t -- TRIGGER
            X -- EXECUTE
            U -- USAGE
            C -- CREATE
            c -- CONNECT
            T -- TEMPORARY
      arwdDxt -- ALL PRIVILEGES (for tables, varies for other objects)
            * -- grant option for preceding privilege

        /yyyy -- role that granted this privilege
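
The way typlen and typalign work together can be sketched in a few lines of Python. This is only an illustration, not PostgreSQL code: the function name column_offsets and the ALIGN_BYTES table are my own, and the byte widths are the ones the list above gives for most machines. The sketch computes where each fixed-length column of a row starts once padding is taken into account.

```python
# Alignment in bytes for each typalign code (values assumed for a typical
# 64-bit machine, following the list of typalign codes above).
ALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}


def column_offsets(columns):
    """Return the starting offset of each fixed-length column.

    `columns` is a list of (typlen, typalign) pairs. Before each value the
    running offset is rounded up to that column's alignment boundary.
    """
    offsets = []
    offset = 0
    for typlen, typalign in columns:
        align = ALIGN_BYTES[typalign]
        offset = (offset + align - 1) // align * align  # pad up to boundary
        offsets.append(offset)
        offset += typlen
    return offsets


# int2, int4, int8: two padding bytes follow the int2.
print(column_offsets([(2, "s"), (4, "i"), (8, "d")]))
# float4, float8: four padding bytes follow the float4.
print(column_offsets([(4, "i"), (8, "d")]))
```

The padding this predicts is visible in the hexdumps later in the post, where runs of zero bytes sit between a short column and a wider one.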

That completes the introduction to the system catalog pg_type. The rest of this post analyzes the base data types one by one.

II. Types in detail

1. Numeric types

(1) Integer types

First come the integer types int2, int4 (equivalent to integer), and int8.
They are summarized in the following table:

PostgreSQL type	Size (bytes)	C/C++ type	Java type	Range
int2 (smallint)	2	short int	short	-32,768 to 32,767
int4 (int, integer)	4	int	int	-2,147,483,648 to 2,147,483,647
int8 (bigint)	8	long long int	long	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Integers are stored in the database's data files in binary form. A small experiment shows this:

highgo=# create table aa(a1 int2, a2 int4, a3 int8);
CREATE TABLE
highgo=# insert into aa values (204,56797,2863311530);
INSERT 0 1
highgo=# checkpoint ;
CHECKPOINT

Inspect the binary file with hexdump (hexadecimal output):

[root@localhost 12943]# hexdump 16385
0000000 0000 0000 1420 018a 0000 0000 001c 1fd8
0000010 2000 2004 0000 0000 9fd8 0050 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fd0 0000 0000 0000 0000 069b 0000 0000 0000
0001fe0 0000 0000 0000 0000 0001 0003 0800 0018
0001ff0 00cc 0000 dddd 0000 aaaa aaaa 0000 0000
0002000

cc, dddd, and aaaaaaaa are exactly the three numbers I inserted: 204, 56797, and 2863311530.
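
The same reading can be done mechanically. The sketch below assumes a little-endian machine (as the dump suggests) and uses a helper of my own, words_to_bytes, to undo the 16-bit word grouping that plain hexdump applies; it then unpacks the last 16 bytes of the page as int2, padding, int4, int8.

```python
import struct


def words_to_bytes(line):
    """Turn hexdump's 16-bit words back into bytes in file order.

    Plain `hexdump` prints each pair of bytes as one little-endian word,
    so the word "00cc" stands for the bytes cc 00.
    """
    return b"".join(bytes.fromhex(word)[::-1] for word in line.split())


# Data area of the tuple, copied from offset 0x1ff0 of the dump above.
row = words_to_bytes("00cc 0000 dddd 0000 aaaa aaaa 0000 0000")

# int2, two padding bytes, int4, int8, all little-endian.
a1, a2, a3 = struct.unpack("<h2xiq", row)
print(a1, a2, a3)
```

The two padding bytes after the int2 are there because the int4 column must start on a 4-byte boundary.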

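(2) Floating-point types

float4 and float8: these two types differ somewhat. First, their sizes and precision:

float4 (real)	4	float	float	6 decimal digits of precision
float8 (double precision)	8	double	double	15 decimal digits of precision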

In the source they are defined as:

typedef float float4;
typedef double float8;

They are stored the same way as in C/C++. Here is an example:

postgres=# create table floatdouble(f1 float4, d1 float8);
CREATE TABLE
postgres=# insert into floatdouble values (12345, 12345);
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT

Now look at the data stored in the file (shown in hexadecimal):

[root@localhost 12814]# hexdump 16399
0000000 0000 0000 9bc0 0188 0000 0000 001c 1fd8
0000010 2000 2004 0000 0000 9fd8 0050 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fd0 0000 0000 0000 0000 06b9 0000 0000 0000
0001fe0 0000 0000 0000 0000 0001 0002 0800 0018
0001ff0 e400 4640 0000 0000 0000 0000 1c80 40c8
0002000

12345 is stored as e400 4640 in the float4 column and as 0000 0000 1c80 40c8 in the float8 column.

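Now a brief look at float. Its storage layout is:

32 bits in total, i.e. 4 bytes, numbered 31, 30, 29, ..., 0 from the most significant bit to the least significant. Bit 31 is the sign bit: 1 means the number is negative, 0 means it is not. Bits 30-23 (8 bits) are the exponent. Bits 22-0 (23 bits) are the fraction (mantissa).

Following the IEEE floating-point representation, let's convert the float value 12345 to hexadecimal step by step. The number is positive, so the sign bit is 0. Next, 12345 in binary is 11000000111001. Move the binary point to the left until only the leading 1 remains in front of it, which gives 1.1000000111001 * 2^13. Every normalized binary number starts with a 1, so that bit does not need to be stored, which is why the 23 stored bits give an effective precision of 24 bits. Now the exponent: it has 8 bits, so it can hold 0 to 255. To cover negative exponents as well, it is stored with a bias of 127 added, so here it is 13 + 127 = 140, i.e. 10001100. With all the pieces in place, the float representation of 12345 is 01000110010000001110010000000000, or 4640e400 in hexadecimal. The machine is little-endian, so the bytes land in the file as 00 e4 40 46, and hexdump, which prints 16-bit words, shows them as e400 4640.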

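double is stored as follows:

Both the exponent and the fraction are longer than in float (11 and 52 bits instead of 8 and 23), so the calculation is the same as above, except that the exponent bias is now 1023 and the fraction is padded with more zeros.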
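
The hand calculation above can be checked with a short Python sketch. The helper ieee_bits is my own and deliberately minimal: it only handles positive integers small enough to be represented exactly, which is all the example needs. Its results are compared with what the standard struct module produces.

```python
import struct


def ieee_bits(value, exp_bits, frac_bits):
    """Encode a small positive integer by hand as exponent | fraction.

    The sign bit is 0, so it does not appear in the result.
    """
    exponent = value.bit_length() - 1       # 13 for 12345
    fraction = value - (1 << exponent)      # drop the implicit leading 1
    fraction <<= frac_bits - exponent       # left-align in the fraction field
    bias = (1 << (exp_bits - 1)) - 1        # 127 for float, 1023 for double
    return ((exponent + bias) << frac_bits) | fraction


float_bits = ieee_bits(12345, 8, 23)
double_bits = ieee_bits(12345, 11, 52)
print(f"{float_bits:08x}")
print(f"{double_bits:016x}")

# The same values as produced by the C library, most significant byte first.
print(struct.pack(">f", 12345.0).hex())
print(struct.pack(">d", 12345.0).hex())
```

Packing with "<f" and "<d" instead gives the little-endian byte order that actually appears in the data file.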

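Note: PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact numeric types. Here p specifies the minimum acceptable precision in binary digits. PostgreSQL accepts float(1) to float(24) as selecting the real type, and float(25) to float(53) as selecting double precision. Values of p outside the allowed range cause an error. float with no precision specified is taken to mean double precision.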

(3) Numeric

The remaining numeric type is numeric (decimal). It is the most complex of the numeric types. It is implemented as a struct, defined in the source as:

typedef int16 NumericDigit;

struct NumericShort
{
	uint16		n_header;		/* Sign + display scale + weight */
	NumericDigit n_data[1];		/* Digits */
};

struct NumericLong
{
	uint16		n_sign_dscale;	/* Sign + display scale */
	int16		n_weight;		/* Weight of 1st digit	*/
	NumericDigit n_data[1];		/* Digits */
};

union NumericChoice
{
	uint16		n_header;		/* Header word */
	struct NumericLong n_long;	/* Long form (4-byte header) */
	struct NumericShort n_short;	/* Short form (2-byte header) */
};

struct NumericData
{
	int32		vl_len_;		/* varlena header (do not touch directly!) */
	union NumericChoice choice; /* choice of format */
};

Because a union is used here, we can rewrite the structs the way they actually appear in memory:

struct NumericShort_memory
{
	int32		vl_len_;
	uint16		n_header;		
	NumericDigit n_data[1];		
};

struct NumericLong_memory
{
	int32		vl_len_;
	uint16		n_sign_dscale;
	int16		n_weight;		
	NumericDigit n_data[1];	
};

There is one more important struct, which serves as the intermediate form when converting between char* and numeric:

typedef struct NumericVar
{
	int			ndigits;		/* # of digits in digits[] - can be 0! */
	int			weight;			/* weight of first digit */
	int			sign;			/* NUMERIC_POS, NUMERIC_NEG, or NUMERIC_NAN */
	int			dscale;			/* display scale */
	NumericDigit *buf;			/* start of palloc'd space for digits[] */
	NumericDigit *digits;		/* base-NBASE digits */
} NumericVar;

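So numeric is built from four structs, which is rather complex, and essentially everything is stored in arrays. Its range is up to 131072 digits before the decimal point and up to 16383 digits after it.
Start with NumericVar, the first step in turning input into a numeric. Taking '12345.678' as an example, the general process is as follows (the specific functions may be covered in a later post). The database first reads the string '12345.678' and converts it into a NumericVar. The digits are held in buf and digits: buf is the start of the palloc'd space, and digits points to the first digit actually in use inside it. '12345.678' is held as 0000 0001 2345 6780, where each group of four decimal digits is one array element and the leading 0000 is spare room in buf ahead of digits. ndigits is the number of elements in the digits array, here 3. weight is the weight of the first digit, i.e. the number of array elements taken up by the integer part minus one; when there is an integer part, weight = (number of integer digits + 4 - 1) / 4 - 1, which is 1 here. sign marks the number as positive or negative. dscale is the number of digits after the decimal point, here 3.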
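
The conversion just described can be sketched as follows. This is not the code from numeric.c; to_numeric_var is a hypothetical helper of mine that only handles plain decimal strings (no exponent, no NaN) and returns the sign, weight, dscale, and digits that a NumericVar would hold.

```python
DEC_DIGITS = 4  # decimal digits packed into one array element


def to_numeric_var(text):
    """Split a decimal string into (negative, weight, dscale, digits)."""
    negative = text.startswith("-")
    int_part, _, frac_part = text.lstrip("+-").partition(".")
    dscale = len(frac_part)

    # Pad so that both sides of the decimal point fill whole groups of four.
    int_groups = -(-len(int_part) // DEC_DIGITS)
    frac_groups = -(-len(frac_part) // DEC_DIGITS)
    int_part = int_part.rjust(int_groups * DEC_DIGITS, "0")
    frac_part = frac_part.ljust(frac_groups * DEC_DIGITS, "0")
    weight = int_groups - 1

    padded = int_part + frac_part
    digits = [int(padded[i:i + DEC_DIGITS])
              for i in range(0, len(padded), DEC_DIGITS)]

    # Leading and trailing zero groups are not kept.
    while digits and digits[0] == 0:
        digits.pop(0)
        weight -= 1
    while digits and digits[-1] == 0:
        digits.pop()
    if not digits:
        weight = 0
    return negative, weight, dscale, digits


print(to_numeric_var("12345.678"))
print(to_numeric_var("12345.123456789"))
```

ndigits is simply the length of the returned digits list.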

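Now for NumericData. The structs are described in the order given above.

NumericShort is the format the database uses to store small values. n_header is a header word computed from the sign, the format flags (NUMERIC_SIGN_MASK, NUMERIC_POS, NUMERIC_NEG, NUMERIC_SHORT, NUMERIC_NAN), the display scale, and the weight. n_data is the same as digits in NumericVar.

The header is computed as follows: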

result->choice.n_short.n_header =
	(sign == NUMERIC_NEG ? (NUMERIC_SHORT | NUMERIC_SHORT_SIGN_MASK)
	    : NUMERIC_SHORT)
	    | (var->dscale << NUMERIC_SHORT_DSCALE_SHIFT)
	    | (weight < 0 ? NUMERIC_SHORT_WEIGHT_SIGN_MASK : 0)
	    | (weight & NUMERIC_SHORT_WEIGHT_MASK);

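NumericLong is the format the database uses to store large values. n_sign_dscale is a header word computed from the sign, the format flags (NUMERIC_SIGN_MASK, NUMERIC_POS, NUMERIC_NEG, NUMERIC_SHORT, NUMERIC_NAN), and the display scale. n_weight is the same as weight in NumericVar, and n_data is the same as digits in NumericVar.

Its header is computed as follows: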

result->choice.n_long.n_sign_dscale =
	sign | (var->dscale & NUMERIC_DSCALE_MASK);
result->choice.n_long.n_weight = weight;

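NumericChoice is a union, so its members refer to the same block of storage. Finally there is the overall NumericData, whose vl_len_ is computed from the number of bytes the value occupies; the calculation is shown further below.

In Java, the value can be read with getBigDecimal.

Now look at the physical storage: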

postgres=# create table numerictest (n1 numeric);
CREATE TABLE
postgres=# select pg_relation_filepath('numerictest');
 pg_relation_filepath 
----------------------
 base/12892/16390
(1 row)

postgres=# insert into numerictest values (123),(1234),(12345),(12345.678),(12345.6789),(12345.678901),(12345.123456789);
INSERT 0 7
postgres=# checkpoint ;
CHECKPOINT

[root@localhost 12892]# hexdump 16390
0000000 0000 0000 91b0 0173 0000 0000 0038 1ee0
0000010 2000 2004 0000 0000 9fe0 003a 9fc0 003a
0000020 9fa0 003e 9f78 0042 9f50 0042 9f28 0046
0000030 9f00 004a 9ee0 0036 0000 0000 0000 0000
0000040 0000 0000 0000 0000 0000 0000 0000 0000
*
0001ee0 06a0 0000 0000 0000 0000 0000 0000 0000
0001ef0 0008 0001 0802 0018 0007 0080 0000 0000
0001f00 069f 0000 0000 0000 0000 0000 0000 0000
0001f10 0007 0001 0802 0018 811b 0184 2900 d209
0001f20 2e04 2816 0023 0000 069f 0000 0000 0000
0001f30 0000 0000 0000 0000 0006 0001 0802 0018
0001f40 0117 0183 2900 8509 641a 0000 0000 0000
0001f50 069f 0000 0000 0000 0000 0000 0000 0000
0001f60 0005 0001 0802 0018 0113 0182 2900 8509
0001f70 001a 0000 0000 0000 069f 0000 0000 0000
0001f80 0000 0000 0000 0000 0004 0001 0802 0018
0001f90 8113 0181 2900 7c09 001a 0000 0000 0000
0001fa0 069f 0000 0000 0000 0000 0000 0000 0000
0001fb0 0003 0001 0802 0018 010f 0180 2900 0009
0001fc0 069f 0000 0000 0000 0000 0000 0000 0000
0001fd0 0002 0001 0802 0018 000b d280 0004 0000
0001fe0 069f 0000 0000 0000 0000 0000 0000 0000
0001ff0 0001 0001 0802 0018 000b 7b80 0000 0000
0002000

The following table looks at the values in detail (only the short format is covered):

Value	ndigits	digits	Hex	Header	Stored in file
123	1	0123	007b	0x8000	000b 7b80 0000
1234	1	1234	04d2	0x8000	000b d280 0004
12345	2	0001 2345	0001 0929	0x8001	010f 0180 2900 0009
12345.678	3	0001 2345 6780	0001 0929 1a7c	0x8181	8113 0181 2900 7c09 001a
12345.6789	3	0001 2345 6789	0001 0929 1a85	0x8201	0113 0182 2900 8509 001a 0000
12345.678901	4	0001 2345 6789 0100	0001 0929 1a85 0064	0x8301	0117 0183 2900 8509 641a 0000 0000
12345.123456789	5	0001 2345 1234 5678 9000	0001 0929 04d2 162e 2328	0x8481	811b 0184 2900 d209 2e04 2816 0023 0000
0	0	(none)	(none)	0x8000	0007 0080
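
The Header column can be reproduced from the n_header expression shown earlier. In the sketch below the mask and shift values are the ones I understand numeric.c to define; treat them as assumptions and check them against your own source tree. short_header and short_payload are illustrative names of mine, and the byte order assumes a little-endian machine.

```python
import struct

# Assumed values of the numeric.c constants used in the n_header expression.
NUMERIC_SHORT = 0x8000
NUMERIC_SHORT_SIGN_MASK = 0x2000
NUMERIC_SHORT_DSCALE_SHIFT = 7
NUMERIC_SHORT_WEIGHT_SIGN_MASK = 0x0040
NUMERIC_SHORT_WEIGHT_MASK = 0x003F


def short_header(weight, dscale, negative=False):
    """Compute n_header for the short format."""
    header = NUMERIC_SHORT
    if negative:
        header |= NUMERIC_SHORT_SIGN_MASK
    header |= dscale << NUMERIC_SHORT_DSCALE_SHIFT
    if weight < 0:
        header |= NUMERIC_SHORT_WEIGHT_SIGN_MASK
    header |= weight & NUMERIC_SHORT_WEIGHT_MASK
    return header


def short_payload(weight, dscale, digits, negative=False):
    """n_header followed by the digits, each a little-endian 16-bit value."""
    header = short_header(weight, dscale, negative)
    return struct.pack(f"<H{len(digits)}h", header, *digits)


print(hex(short_header(1, 3)))                        # 12345.678
print(short_payload(1, 3, [1, 2345, 6780]).hex(" "))  # bytes in file order
```

The payload bytes are what follows the one-byte length header of each value in the dump, once hexdump's word swapping is undone.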

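Note: the Hex column is obtained by converting each integer stored in digits. For example, 12345 is held in the digits array as 0001, 2345, which converts to 0001 0929 in hexadecimal.
Likewise a number with a fractional part such as 12345.678 is held in the array as 0001, 2345, 6780, which converts to 0001 0929 1a7c.
The first byte of each stored value (it looks like the second byte in the dump, because hexdump prints 16-bit words with their bytes swapped) is derived from the data length vl_len_, which is computed as follows.

In memory the length is normally set like this: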

Short:
len = NUMERIC_HDRSZ_SHORT + n * sizeof(NumericDigit);
Long:
len = NUMERIC_HDRSZ + n * sizeof(NumericDigit);

SET_VARSIZE(result, len);
#define SET_VARSIZE(PTR, len)			SET_VARSIZE_4B(PTR, len)
#define SET_VARSIZE_4B(PTR,len) \
	(((varattrib_4b *) (PTR))->va_4byte.va_header = (((uint32) (len)) << 2))

When the database writes the value to the data file, the header changes. The relevant code is:

else if (VARLENA_ATT_IS_PACKABLE(att[i]) &&
	VARATT_CAN_MAKE_SHORT(val))
{
    /* convert to short varlena -- no alignment */
    data_length = VARATT_CONVERTED_SHORT_SIZE(val);
    SET_VARSIZE_SHORT(data, data_length);
    memcpy(data + 1, VARDATA(val), data_length - 1);
}
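
What SET_VARSIZE_SHORT leaves in the file can be illustrated with a short sketch. It assumes a little-endian build, where, as far as I know, the one-byte header is the total length shifted left by one bit with the lowest bit set; the two function names are mine.

```python
def short_varlena_header(total_len):
    """One-byte header for a value of total_len bytes, header included."""
    if not 1 <= total_len <= 0x7F:
        raise ValueError("length does not fit in a 1-byte header")
    return (total_len << 1) | 0x01


def short_varlena_len(header):
    """Total length (header included) encoded in a one-byte header."""
    if header & 0x01 == 0:
        raise ValueError("not a 1-byte varlena header")
    return header >> 1


# numeric 123: 1 header byte + 2 bytes n_header + one 2-byte digit = 5 bytes
print(hex(short_varlena_header(5)))
# numeric 12345.678 begins with the byte 0x13 in the dump above
print(short_varlena_len(0x13))
```

This accounts for the leading 0x0b, 0x0f, 0x13, 0x17, and 0x1b bytes of the numeric values dumped earlier.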

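Note: the scale of a numeric is the number of digits in the fractional part, and the precision is the total number of significant digits, that is, the number of digits on both sides of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.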

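2. Monetary type

money is counted among the numeric types, but it is not purely numeric, since it also accepts input such as '$1000.00'. Neither C/C++ nor Java has a directly corresponding numeric type. Its range is -92233720368547758.08 to +92233720368547758.07; internally it is an int8 holding the amount multiplied by 100. It is stored in the data file as follows: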

postgres=# create table moneytable(m1 money);
CREATE TABLE
postgres=# insert into moneytable values ('$1')
;
INSERT 0 1
postgres=# select * from moneytable ;
  m1   
-------
 $1.00
(1 row)

postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('moneytable');
 pg_relation_filepath 
----------------------
 base/12814/16467
(1 row)

postgres=# insert into moneytable values ('2')
;
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# insert into moneytable values ('100')
;
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT

[root@localhost 12814]# hexdump 16467
0000000 0000 0000 68e0 019e 0000 0000 0024 1fa0
0000010 2000 2004 0000 0000 9fe0 0040 9fc0 0040
0000020 9fa0 0040 0000 0000 0000 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fa0 06eb 0000 0000 0000 0000 0000 0000 0000
0001fb0 0003 0001 0800 0018 2710 0000 0000 0000
0001fc0 06ea 0000 0000 0000 0000 0000 0000 0000
0001fd0 0002 0001 0800 0018 00c8 0000 0000 0000
0001fe0 06e9 0000 0000 0000 0000 0000 0000 0000
0001ff0 0001 0001 0900 0018 0064 0000 0000 0000
0002000

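Each value is stored as 100 times the original: 0x64 = 100, 0xc8 = 200, and 0x2710 = 10000. This is why two decimal places can be represented.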
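
Reading such a value back is then just a matter of undoing the scaling. The sketch below assumes a little-endian machine and two fractional digits, as in the session above; decode_money is an illustrative name of mine, not a PostgreSQL function.

```python
import struct
from decimal import Decimal


def decode_money(raw, frac_digits=2):
    """Interpret 8 little-endian bytes as a money amount."""
    (scaled,) = struct.unpack("<q", raw)
    return Decimal(scaled).scaleb(-frac_digits)


# The three stored values from the dump, in file byte order.
for raw_hex in ("6400000000000000", "c800000000000000", "1027000000000000"):
    print(decode_money(bytes.fromhex(raw_hex)))
```

Decimal is used instead of float so that the two fractional digits survive exactly.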

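3. Character types

The character types are: char, char(n), bpchar, bpchar(n), character(n), varchar, varchar(n), character varying(n), text, name, and cstring.

(1) General character types

char, char(n), character(n), bpchar, and bpchar(n) are all the same type (they are all aliases of bpchar) and use the same input and output functions.

varchar, varchar(n), and character varying(n) are all the same type (they are all aliases of varchar) and use the same input and output functions.

text is not an SQL-standard type. It and all the types above, apart from the single-byte char, use the same struct: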

typedef struct varlena bytea;
typedef struct varlena text;
typedef struct varlena BpChar;	/* blank-padded char, ie SQL char(n) */
typedef struct varlena VarChar; /* var-length char, ie SQL varchar(n) */

struct varlena
{
	char		vl_len_[4];		/* Do not touch this field directly! */
	char		vl_dat[1];
};

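One more type deserves mention here: cstring, which corresponds to char* in C. It cannot be used as the type of a table column. It is closely related to text.

textin is defined like this: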

Datum
textin(PG_FUNCTION_ARGS)
{
	char	   *inputText = PG_GETARG_CSTRING(0);

	PG_RETURN_TEXT_P(cstring_to_text(inputText));
}

text *
cstring_to_text(const char *s)
{
	return cstring_to_text_with_len(s, strlen(s));
}

text *
cstring_to_text_with_len(const char *s, int len)
{
	text	   *result = (text *) palloc(len + VARHDRSZ);

	SET_VARSIZE(result, len + VARHDRSZ);
	memcpy(VARDATA(result), s, len);

	return result;
}

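So for text, the processing merely adds a length header to the cstring. The other types involve more processing.

For bpchar with a declared length, the input function checks the input: if it is longer than the declared length, the database aborts the request with an error; if it is shorter, the function pads it with spaces up to the declared length.
The varchar input function does not pad, but when a length is declared, input that exceeds it likewise raises an error.
text needs no length declaration, and its storage is almost unlimited.

These types do have limits, however: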

if (*tl > MaxAttrSize)
    ereport(ERROR,
    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
        errmsg("length for type %s cannot exceed %d",
            typename, MaxAttrSize)));

#define MaxAttrSize		(10 * 1024 * 1024)

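The limit here is 10 * 1024 * 1024 = 10,485,760, i.e. about 10 MB rather than 10 GB, and it applies to the length n that can be declared for char(n) and varchar(n). In addition, the database itself has overall size limits: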

Maximum size for a database? unlimited (32 TB databases exist)
Maximum size for a table? 32 TB
Maximum size for a row? 400 GB
Maximum size for a field? 1 GB
Maximum number of rows in a table? unlimited
Maximum number of columns in a table? 250-1600 depending on column types
Maximum number of indexes on a table? unlimited

So the largest value a single field can currently hold is 1 GB.

Now for the format in the data file.

Create the table test:

postgres=# create table test(t1 char, t2 char(10), t3 varchar, t4 varchar(10), t5 bpchar, t6 text);
CREATE TABLE
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('test');
 pg_relation_filepath 
----------------------
 base/12892/16490
(1 row)

Insert some values:

postgres=# insert into test values ('a','a','a','a','a','a');
INSERT 0 1
postgres=# insert into test values ('b','b','b','b','b','b');
INSERT 0 1
postgres=# insert into test values ('a','aa','aa','aa','aa','aa');
INSERT 0 1
postgres=# insert into test values ('b','bb','bb','bb','bb','bb');
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select * from test;
 t1 |     t2     | t3 | t4 | t5 | t6 
----+------------+----+----+----+----
 a  | a          | a  | a  | a  | a
 b  | b          | b  | b  | b  | b
 a  | aa         | aa | aa | aa | aa
 b  | bb         | bb | bb | bb | bb
(4 rows)

Look at the data file:

[root@localhost 12892]# hexdump 16490
0000000 0000 0000 ab48 0189 0000 0000 0028 1f30
0000010 2000 2004 0000 0000 9fd0 005a 9fa0 005a
0000020 9f68 0062 9f30 0062 0000 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
*
0001f30 06de 0000 0000 0000 0000 0000 0000 0000
0001f40 0004 0006 0802 0018 6205 6217 2062 2020
0001f50 2020 2020 0720 6262 6207 0762 6262 6207
0001f60 0062 0000 0000 0000 06dd 0000 0000 0000
0001f70 0000 0000 0000 0000 0003 0006 0802 0018
0001f80 6105 6117 2061 2020 2020 2020 0720 6161
0001f90 6107 0761 6161 6107 0061 0000 0000 0000
0001fa0 06dc 0000 0000 0000 0000 0000 0000 0000
0001fb0 0002 0006 0802 0018 6205 6217 2020 2020
0001fc0 2020 2020 0520 0562 0562 0562 0062 0000
0001fd0 06db 0000 0000 0000 0000 0000 0000 0000
0001fe0 0001 0006 0802 0018 6105 6117 2020 2020
0001ff0 2020 2020 0520 0561 0561 0561 0061 0000
0002000

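Column type	Text	Bytes in file	Text	Bytes in file	Text	Bytes in file	Text	Bytes in file
char	a	05 61	b	05 62	a	05 61	b	05 62
char(10)	a	17 61 20 20 20 20 20 20 20 20 20	b	17 62 20 20 20 20 20 20 20 20 20	aa	17 61 61 20 20 20 20 20 20 20 20	bb	17 62 62 20 20 20 20 20 20 20 20
varchar	a	05 61	b	05 62	aa	07 61 61	bb	07 62 62
varchar(10)	a	05 61	b	05 62	aa	07 61 61	bb	07 62 62
bpchar	a	05 61	b	05 62	aa	07 61 61	bb	07 62 62
text	a	05 61	b	05 62	aa	07 61 61	bb	07 62 62

The bytes are listed in file order; hexdump shows them as byte-swapped 16-bit words, so 05 61 appears in the dump as 6105. All of these values went through SET_VARSIZE_SHORT: the length is held in a 1-byte header and computed from there. On this little-endian machine the header byte is (length << 1) | 1, where the length includes the header byte itself, so 0x05 means 2 bytes, 0x07 means 3 bytes, and 0x17 means 11 bytes.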
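
A whole row of such values can be split apart with a few lines of Python. This is a sketch under the same assumptions as before (little-endian build, every value short enough for a 1-byte header, no alignment padding between the values); split_short_varlenas is a name of my own.

```python
def split_short_varlenas(data):
    """Return the payload of each 1-byte-header value in a tuple's data area."""
    values = []
    pos = 0
    # A set low bit marks a 1-byte header; a zero byte is trailing padding.
    while pos < len(data) and data[pos] & 0x01:
        total = data[pos] >> 1              # length including the header byte
        values.append(data[pos + 1:pos + total])
        pos += total
    return values


# Third row of the table test ('a', 'aa', 'aa', 'aa', 'aa', 'aa'), file order.
row = bytes.fromhex(
    "05 61"
    "17 61 61 20 20 20 20 20 20 20 20"
    "07 61 61 07 61 61 07 61 61 07 61 61"
    "00"
)
print(split_short_varlenas(row))
```

The char(10) column comes back with its eight padding spaces, while the varchar, bpchar, and text columns hold only the two characters that were inserted.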

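Note also that once a value reaches a certain length the database compresses it, through the TOAST mechanism. It uses an LZ-family compression algorithm, which is lossless; the implementation is reached through the function toast_compress_datum. Put simply, LZ compression is a string-matching based algorithm. For the details of LZ compression see the relevant literature; they are not covered here.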

(2) name

name: a base type with no directly corresponding type in C/C++. It is defined in the source as:

typedef struct nameData
{
	char		data[NAMEDATALEN];
} NameData;
typedef NameData *Name;

It is stored in the data file as follows:

postgres=# create table nametable(n1 name);
CREATE TABLE
postgres=# insert into nametable values ('liu');
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('nametable');
 pg_relation_filepath 
----------------------
 base/12814/16461
(1 row)

[root@localhost 12814]# hexdump 16461
0000000 0000 0000 5528 019b 0000 0000 001c 1fa8
0000010 2000 2004 0000 0000 9fa8 00b0 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fa0 0000 0000 0000 0000 06de 0000 0000 0000
0001fb0 0000 0000 0000 0000 0001 0001 0800 0018
0001fc0 696c 0075 0000 0000 0000 0000 0000 0000
0001fd0 0000 0000 0000 0000 0000 0000 0000 0000
*
0002000

liu = 6c 69 75 (hexadecimal); the rest of the fixed-length NAMEDATALEN-byte field is filled with zeros.

4. Date/time types

The following table summarizes the date/time types the database supports:

Name	Storage size (bytes)	Description	Low value	High value	Resolution
timestamp [ (p) ] [ without time zone ]	8	both date and time	4713 BC	294276 AD	1 microsecond / 14 digits
timestamp [ (p) ] with time zone	8	both date and time, with time zone	4713 BC	294276 AD	1 microsecond / 14 digits
date	4	date only	4713 BC	5874897 AD	1 day
time [ (p) ] [ without time zone ]	8	time of day only	00:00:00	24:00:00	1 microsecond / 14 digits
time [ (p) ] with time zone	12	time of day only, with time zone	00:00:00+1459	24:00:00-1459	1 microsecond / 14 digits
interval [ fields ] [ (p) ]	16	time interval	-178000000 years	178000000 years	1 microsecond / 14 digits

(1) date

Start with the date type. Its definition is very simple:

typedef int32 DateADT;

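PostgreSQL does its date arithmetic in Julian days (Julian day, JD), which count from January 1, 4713 BC; the reasons for that choice are not explored here. The value actually stored is the Julian day number minus that of 2000-01-01, i.e. the number of days since 2000-01-01.

So a date is really just an integer. It can represent 'yyyy-mm-dd' thanks to the type's input and output functions. The input function reads a string in the format 'yyyy-mm-dd', 'yyyy:mm:dd', or 'yyyy.mm.dd', runs a series of calculations to obtain a 32-bit number, and that number is written to the data file. For example, '2012-12-08' is stored in the database as 4725.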

postgres=# create table datetest(d1 date);
CREATE TABLE
postgres=# insert into datetest values ('2012-12-8');
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('datetest');
 pg_relation_filepath 
----------------------
 base/12892/16499
(1 row)

[root@localhost 12892]# hexdump 16499
0000000 0000 0000 5380 018c 0000 0000 001c 1fe0
0000010 2000 2004 0000 0000 9fe0 0038 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fe0 06e4 0000 0000 0000 0000 0000 0000 0000
0001ff0 0001 0001 0800 0018 1275 0000 0000 0000
0002000

0x1275 is 4725.
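
The number 4725 is easy to reproduce: it is the count of days from 2000-01-01 to 2012-12-08. A minimal sketch with Python's datetime module follows; the function names are mine, and the byte order assumes a little-endian machine.

```python
from datetime import date, timedelta

POSTGRES_EPOCH = date(2000, 1, 1)  # day 0 of the stored value


def date_to_stored(d):
    """Days since 2000-01-01, as held in the 32-bit DateADT."""
    return (d - POSTGRES_EPOCH).days


def stored_to_date(days):
    """Inverse of date_to_stored."""
    return POSTGRES_EPOCH + timedelta(days=days)


print(date_to_stored(date(2012, 12, 8)))
# The four bytes 75 12 00 00 from the dump, read as a little-endian integer.
print(stored_to_date(int.from_bytes(bytes.fromhex("75120000"), "little")))
```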

(2) time and time with time zone

As with date, the time-of-day part of time and time with time zone is an integer. To carry the time zone there is an additional struct, TimeTzADT. The source is:

#ifdef HAVE_INT64_TIMESTAMP
typedef int64 TimeADT;
#else
typedef float8 TimeADT;
#endif

typedef struct
{
	TimeADT		time;			/* all time units other than months and years */
	int32		zone;			/* numeric time zone, in seconds */
} TimeTzADT;

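The time is stored as the number of seconds since midnight, multiplied by 1,000,000 so that six decimal places can be represented, i.e. as microseconds. For example, 10:10:10.000001 is stored as the number 36610000001.
The time zone is also stored in seconds, with the sign inverted: UTC+8 (+8:00:00) is stored as -28800, i.e. 0xFFFF8F80.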

postgres=# create table timeandtimetz(t1 time, t2 timetz);
CREATE TABLE
postgres=# insert into timeandtimetz values ('10:10:10.000001', '10:10:10.000001 +8:00:00');
the y is 2014 , the m is 4 , the d is 21
the century is 68
the julian1 is 2454943
the julian2 is 2456595
the julian3 is 2456769
the time_in time is 36610000001
the timetz_in time is 36610000001
the timetz_in tz is -28800
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('timeandtimetz');
 pg_relation_filepath 
----------------------
 base/12892/16508
(1 row)

[root@localhost 12892]# hexdump 16508
0000000 0000 0000 1308 018f 0000 0000 001c 1fd0
0000010 2000 2004 0000 0000 9fd0 0058 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fd0 06ec 0000 0000 0000 0000 0000 0000 0000
0001fe0 0001 0002 0800 0018 4481 8620 0008 0000
0001ff0 4481 8620 0008 0000 8f80 ffff 0000 0000
0002000
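
The last 24 bytes of this page can be decoded the same way as the integer example. The sketch assumes a little-endian machine and a build with HAVE_INT64_TIMESTAMP, so that TimeADT is a 64-bit integer; time_to_usecs is a helper name of my own.

```python
import struct


def time_to_usecs(hours, minutes, seconds, usecs=0):
    """Microseconds since midnight, as stored in TimeADT."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + usecs


# Tuple data from offsets 0x1fe8 to 0x1fff of the dump, in file byte order:
# t1 (time), then t2 (timetz: time, zone), then 4 bytes of padding.
data = bytes.fromhex(
    "8144208608000000"
    "8144208608000000"
    "808fffff"
    "00000000"
)
t1, t2_time, t2_zone = struct.unpack("<qqi4x", data)
print(t1, t2_time, t2_zone)
print(time_to_usecs(10, 10, 10, 1))
```

The zone value comes out as -28800 for an input of +8:00:00, matching the inverted sign described above.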

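(3) timestamp and timestamp with time zone

Both types contain a date and a time. The only difference is that timestamp with time zone takes a time zone into account: the zone given on input is used to convert the value to UTC before it is stored. They are defined as: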

typedef int64 Timestamp;
typedef int64 TimestampTz;

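Again a series of conversions and formulas turns a string of the form 'yyyy-mm-dd hh:mm:ss +/-hh:mm:ss' into a 64-bit integer: the number of microseconds since 2000-01-01 00:00:00. For example, '2013-1-1 20:00:00.000001' becomes 410385600000001, and '2013-1-1 20:00:00.000001 +8:00:00' becomes 410356800000001.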

postgres=# create table timesandtimestz(t1 timestamp(6), t2 timestamptz(6));
CREATE TABLE
postgres=# insert into timesandtimestz values ('2013-1-1 20:00:00.000001', '2013-1-1 20:00:00.000001 +8:00:00');
the y is 2013 , the m is 1 , the d is 1
the century is 68
the julian1 is 2454213
the julian2 is 2455865
the julian3 is 2456294
the y is 2013 , the m is 1 , the d is 1
the century is 68
the julian1 is 2454213
the julian2 is 2455865
the julian3 is 2456294
timestamp_in timestamp is 410385600000001
the y is 2013 , the m is 1 , the d is 1
the century is 68
the julian1 is 2454213
the julian2 is 2455865
the julian3 is 2456294
timestamptz_out timestamptz is 410356800000001
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('timesandtimestz');
 pg_relation_filepath 
----------------------
 base/12892/16528
(1 row)

[root@localhost 12892]# hexdump 16528
0000000 0000 0000 0488 0196 0000 0000 001c 1fd8
0000010 2000 2004 0000 0000 9fd8 0050 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fd0 0000 0000 0000 0000 06fc 0000 0000 0000
0001fe0 0000 0000 0000 0000 0001 0002 0800 0018
0001ff0 b001 57e8 753e 0001 9001 a34b 7537 0001
0002000
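
Both stored values can be recomputed as microseconds since 2000-01-01 00:00:00. The sketch below uses Python's datetime module; timestamp_to_stored is an illustrative name of mine, and the byte order again assumes a little-endian machine.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(2000, 1, 1)
EPOCH_UTC = datetime(2000, 1, 1, tzinfo=timezone.utc)
ONE_USEC = timedelta(microseconds=1)


def timestamp_to_stored(dt):
    """Microseconds since 2000-01-01; aware values are measured in UTC."""
    epoch = EPOCH if dt.tzinfo is None else EPOCH_UTC
    return (dt - epoch) // ONE_USEC


local = datetime(2013, 1, 1, 20, 0, 0, 1)
with_zone = local.replace(tzinfo=timezone(timedelta(hours=8)))
print(timestamp_to_stored(local))
print(timestamp_to_stored(with_zone))

# The last 16 bytes of the page, in file byte order.
t1, t2 = struct.unpack("<qq", bytes.fromhex("01b0e8573e750100" "01904ba337750100"))
print(t1, t2)
```

The two values differ by exactly eight hours' worth of microseconds, which shows that the time zone is applied on input and not stored alongside the value.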

(4) interval

interval, the time-interval type, is actually the most complex of all the date/time types.

typedef struct
{
	TimeOffset	time;			/* all time units other than days, months and
								 * years */
	int32		day;			/* days, after time for alignment */
	int32		month;			/* months and years, after time for alignment */
} Interval;

typedef int64 TimeOffset;

It is simply a composite struct: the time part, the days, and the months are kept in separate fields.
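
To make the three fields concrete, here is a sketch of how an interval such as '1 year 2 months 3 days 04:05:06' would be laid out according to the struct above. No dump of an interval is shown in this post, so this is derived from the struct definition alone; it assumes a little-endian machine and a 64-bit TimeOffset counted in microseconds, and pack_interval is a hypothetical helper.

```python
import struct

USECS_PER_SEC = 1_000_000


def pack_interval(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Pack an interval as time (int64), day (int32), month (int32)."""
    month = years * 12 + months          # years are folded into months
    time = ((hours * 60 + minutes) * 60 + seconds) * USECS_PER_SEC
    return struct.pack("<qii", time, days, month)


raw = pack_interval(years=1, months=2, days=3, hours=4, minutes=5, seconds=6)
print(len(raw))
print(struct.unpack("<qii", raw))
```

Days and months are kept apart from the time part because their length in microseconds is not fixed.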

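Note: the date/time types accept other input formats as well, which are not listed here one by one. The process is broadly the same: the date or time is converted to a number and stored.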

5. Object identifier type

oid: a base type occupying 4 bytes. Oid is defined as:

typedef unsigned int Oid;

It is stored as a plain 4-byte unsigned binary integer:

postgres=# create table oidt(o1 oid);
CREATE TABLE
postgres=# insert into oidt values (1);
INSERT 0 1
postgres=# insert into oidt values (2);
INSERT 0 1
postgres=# insert into oidt values (1);
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('oidt');
 pg_relation_filepath 
----------------------
 base/12892/16522
(1 row)

[root@localhost 12892]# hexdump 16522
0000000 0000 0000 cf08 0193 0000 0000 0024 1fa0
0000010 2000 2004 0000 0000 9fe0 0038 9fc0 0038
0000020 9fa0 0038 0000 0000 0000 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fa0 06f8 0000 0000 0000 0000 0000 0000 0000
0001fb0 0003 0001 0800 0018 0001 0000 0000 0000
0001fc0 06f7 0000 0000 0000 0000 0000 0000 0000
0001fd0 0002 0001 0800 0018 0002 0000 0000 0000
0001fe0 06f6 0000 0000 0000 0000 0000 0000 0000
0001ff0 0001 0001 0800 0018 0001 0000 0000 0000
0002000

6. Boolean type

bool: a base type occupying 1 byte. false and true are represented as 0 and 1.

postgres=# create table boolt(b1 bool);
CREATE TABLE
postgres=# insert into boolt values ('t'),('f');
INSERT 0 2
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('boolt');
 pg_relation_filepath 
----------------------
 base/12892/16525
(1 row)


7. Binary type

bytea, the binary type, uses the same struct as text and the other character types, and is subject to the same database limits.

typedef struct varlena bytea;

postgres=# create table byteat(b1 bytea);
CREATE TABLE
postgres=# insert into byteat values ('ab');
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT
postgres=# select pg_relation_filepath('byteat');
 pg_relation_filepath 
----------------------
 base/12892/16516
(1 row)

postgres=# insert into byteat values ('abcde');
the data_length is 6
INSERT 0 1
postgres=# checkpoint ;
CHECKPOINT

[root@localhost 12892]# hexdump 16516
0000000 0000 0000 b558 0192 0000 0000 0020 1fc0
0000010 2000 2004 0000 0000 9fe0 0036 9fc0 003c
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0001fc0 06f4 0000 0000 0000 0000 0000 0000 0000
0001fd0 0002 0001 0802 0018 610d 6362 6564 0000
0001fe0 06f3 0000 0000 0000 0000 0000 0000 0000
0001ff0 0001 0001 0802 0018 6107 0062 0000 0000
0002000