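When I previously deployed the company's sample BI project, I noticed that none of the database tables had primary keys or foreign keys defined. I had always assumed this was because it was a mock project without strict requirements. Only today did I learn that a data warehouse is designed without primary and foreign key constraints in the first place. These constraints should be taken care of in the ETL code, which guarantees that all data satisfying the constraints of the source system can flow into the data warehouse.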
Primary Key is there ... but enforcing the primary key constraint at the database level is not required.

If you think about it, a unique key or primary key is technically a key that uniquely identifies each row, and it can be composed of more than one attribute of that entity. In a fact table, the foreign keys flowing in from the dimension tables together already act as a compound primary key. These foreign-key combinations can uniquely identify each record in the fact table, so this foreign-key combination is the primary key of the fact table.

Why not a Surrogate Key then?

If you wanted, you could define a surrogate key for the fact table. But what purpose would it serve? You are never going to retrieve a single record from the fact table by its surrogate key (use indexes instead), nor are you going to use that surrogate key to join the fact table with other tables. Such a surrogate key would be a complete waste of space in the database.

Enforcing Database Constraints

When you define this conceptual primary key at the database level, the database has to ensure that the constraint is not violated by any DML operation performed on the table. Ensuring this is an overhead for your database. It might be insignificant for an OLTP system, but for a large OLAP system where data is loaded in batches, it may incur a significant performance penalty. Besides, why would you want the database to enforce the constraints when you can ensure the same integrity during the data loading phase itself (typically through your ETL code)?
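As a rough illustration of what "ensuring integrity in the ETL code" could look like, here is a minimal Python sketch. The function name `validate_fact_rows`, the column names, and the sample data are all hypothetical and not taken from the quoted text. It rejects rows whose dimension keys are unknown (the foreign key check) and rows that repeat a dimension-key combination (the composite primary key check) before anything is loaded.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


def validate_fact_rows(
    rows: Iterable[Row],
    dimension_keys: Dict[str, Set[object]],
) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Split rows into (accepted, rejected) before loading.

    dimension_keys maps each foreign-key column of the fact table to the
    set of keys that exist in the corresponding dimension table.
    """
    key_columns = sorted(dimension_keys)
    seen = set()      # composite keys already accepted in this batch
    accepted = []
    rejected = []
    for row in rows:
        # Foreign key check: every key must exist in its dimension.
        missing = [c for c in key_columns if row.get(c) not in dimension_keys[c]]
        if missing:
            rejected.append((row, "unknown dimension key: " + ", ".join(missing)))
            continue
        # Primary key check: the combination of dimension keys must be unique.
        composite = tuple(row[c] for c in key_columns)
        if composite in seen:
            rejected.append((row, "duplicate composite key"))
            continue
        seen.add(composite)
        accepted.append(row)
    return accepted, rejected


# Hypothetical sample data for illustration only.
dims = {"date_id": {20240101, 20240102}, "product_id": {1, 2}}
batch = [
    {"date_id": 20240101, "product_id": 1, "amount": 9.5},
    {"date_id": 20240101, "product_id": 1, "amount": 3.0},  # repeats the key
    {"date_id": 20240103, "product_id": 2, "amount": 1.0},  # date not in dimension
]
accepted, rejected = validate_fact_rows(batch, dims)
```

Only the accepted rows would then be bulk-loaded. What to do with the rejected rows (log them, park them in an error table, fail the batch) is a design decision the quoted text does not cover.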
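No primary key is needed, because the combination of a record's dimensions already uniquely identifies that record.
No surrogate key is needed, because we retrieve data through indexes and never use a surrogate key to join with other tables.
No extra constraints are needed, because our priority is that all cleansed data can flow into the warehouse. Ensuring that the data is complete and consistent is the job of the ETL code.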
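The three points above can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in for a real warehouse engine here, and the table `fact_sales`, its columns, and the sample rows are assumptions made for illustration. The fact table is created without any primary key, surrogate key, or foreign key, a plain (non-unique) index supports retrieval, and uniqueness of the dimension combination is checked by a query rather than enforced by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Fact table with no PRIMARY KEY, no surrogate key, no FOREIGN KEY clauses.
conn.execute(
    """
    CREATE TABLE fact_sales (
        date_id    INTEGER,
        product_id INTEGER,
        store_id   INTEGER,
        amount     REAL
    )
    """
)

# Batch load of rows that the ETL step has already cleansed (hypothetical data).
rows = [
    (20240101, 1, 10, 9.5),
    (20240101, 1, 11, 3.0),
    (20240102, 2, 10, 1.0),
]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)

# A non-unique index for retrieval, created after the load.
conn.execute(
    "CREATE INDEX idx_fact_sales_date_product ON fact_sales (date_id, product_id)"
)
conn.commit()

# Typical analytical query that filters on dimension keys.
total = conn.execute(
    "SELECT SUM(amount) FROM fact_sales WHERE date_id = ? AND product_id = ?",
    (20240101, 1),
).fetchone()[0]

# Post-load audit: the dimension combination should still be unique.
duplicates = conn.execute(
    """
    SELECT date_id, product_id, store_id, COUNT(*)
    FROM fact_sales
    GROUP BY date_id, product_id, store_id
    HAVING COUNT(*) > 1
    """
).fetchall()
```

An empty `duplicates` list means the conceptual primary key holds even though the database never enforced it.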