CRC

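Yesterday a colleague gave a tech talk on Redis and mentioned that a CRC algorithm is used to help map a key to its hash slot across Redis shards. To fill this gap in my own knowledge, I copied the background and definition of CRC straight from Wikipedia and worked through them passage by passage, with my own notes after each one.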

Cyclic redundancy check (CRC)
From Wikipedia, the free encyclopedia

------------


/* Purpose of the algorithm */
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. 
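A CRC is an error-detecting code, typically used in networks and storage devices to detect accidental changes to the raw data (that is, the check code is used to decide whether the original data has been corrupted).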

------------


Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. 
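Each data block entering the system gets a short check value attached. How is that check value obtained? -> It is the remainder of a polynomial division performed on the block's contents.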

------------


On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption. CRCs can be used for error correction (see bitfilters).[1]
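When the data is retrieved, the same polynomial-division calculation is repeated; if the check values do not match, corrective action can be taken against the corruption. Besides detecting whether the original data was damaged, CRCs can also be used for error correction.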

------------


CRCs are so called because the check (data verification) value is a redundancy (it expands the message without adding information) and the algorithm is based on cyclic codes. CRCs are popular because they are simple to implement in binary hardware, easy to analyze mathematically, and particularly good at detecting common errors caused by noise in transmission channels. Because the check value has a fixed length, the function that generates it is occasionally used as a hash function.
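CRCs get their name because the check value is redundant with respect to the original data block (it lengthens the message without adding information) and because the algorithm is based on cyclic codes. Why are CRCs so popular? -> 1. They are simple to implement in binary hardware and easy to analyze mathematically; 2. They are particularly good at detecting common errors caused by noise (what noise? -> noise picked up while the message travels over a transmission channel). Because the check value has a fixed length, the function that generates it is sometimes also used as a hash function, for example when Redis determines which hash slot a key belongs to.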
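
To make the Redis remark concrete, here is a minimal sketch of mapping a key to a hash slot. It assumes the Redis Cluster convention of CRC16 (the XMODEM variant) taken modulo 16384, and it uses Python's `binascii.crc_hqx` with an initial value of 0 as that CRC16. The helper name `key_hash_slot` is my own, and hash tags (`{...}`) are ignored for brevity.

```python
import binascii

SLOT_COUNT = 16384  # assumed: number of hash slots in Redis Cluster


def key_hash_slot(key: bytes) -> int:
    """Hypothetical helper: CRC16 (XMODEM) of the key, modulo the slot count."""
    # binascii.crc_hqx uses the polynomial x^16 + x^12 + x^5 + 1 (0x1021);
    # starting from an initial value of 0 gives the XMODEM variant.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    for key in (b"user:1000", b"user:1001", b"123456789"):
        print(key.decode(), "->", key_hash_slot(key))
```

The point is only that a fixed-length check value spreads keys over a fixed range of slots; no error detection is involved in this use.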

------------


The CRC was invented by W. Wesley Peterson in 1961; the 32-bit CRC function of Ethernet and many other standards is the work of several researchers and was published in 1975.

------------


/* Introduction to CRC */
Introduction
CRCs are based on the theory of cyclic error-correcting codes. The use of systematic cyclic codes, which encode messages by adding a fixed-length check value, for the purpose of error detection in communication networks, was first proposed by W. Wesley Peterson in 1961.[2] Cyclic codes are not only simple to implement but have the benefit of being particularly well suited for the detection of burst errors: contiguous sequences of erroneous data symbols in messages. This is important because burst errors are common transmission errors in many communication channels, including magnetic and optical storage devices. Typically an n-bit CRC applied to a data block of arbitrary length will detect any single error burst not longer than n bits and the fraction of all longer error bursts that it will detect is (1 − 2^−n).
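CRCs are built on the theory of cyclic error-correcting codes. Systematic cyclic codes append a fixed-length check value to the original data block (the message) in order to detect errors in network communication. Cyclic codes are not only easy to implement but also well suited to detecting burst errors. What is a burst error? -> A contiguous run of erroneous data symbols in the message. Detecting bursts matters because they are a common kind of transmission error on many communication channels (including magnetic and optical storage devices). Typically, an n-bit CRC applied to a data block of arbitrary length detects every single burst no longer than n bits, and it detects a fraction 1 − 2^−n of all longer bursts.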
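
The burst-error claim can be checked by brute force on a small case. The sketch below assumes an 8-bit CRC with the generator x^8 + x^2 + x + 1 (0x07 with the top bit dropped), a zero initial value, and a 4-byte sample message; all of these are my own choices for illustration. Only the message bits are corrupted here, not the check value.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise 8-bit CRC, MSB first, initial value 0 (illustrative parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def count_undetected_bursts(message: bytes, max_burst_bits: int = 8) -> int:
    """Count bursts of at most max_burst_bits bits that leave the CRC unchanged."""
    total_bits = len(message) * 8
    original = int.from_bytes(message, "big")
    good = crc8(message)
    undetected = 0
    # Every non-zero pattern below 2**max_burst_bits is a burst that fits
    # inside a window of max_burst_bits bits.
    for pattern in range(1, 1 << max_burst_bits):
        for shift in range(total_bits - max_burst_bits + 1):
            corrupted = original ^ (pattern << shift)
            if crc8(corrupted.to_bytes(len(message), "big")) == good:
                undetected += 1
    return undetected


if __name__ == "__main__":
    message = b"\x12\x34\x56\x78"
    print("undetected bursts of <= 8 bits:", count_undetected_bursts(message))
    # A 9-bit burst equal to the generator itself (0x107) is a longer burst
    # that slips through, because it is a multiple of the generator.
    sneaky = int.from_bytes(message, "big") ^ (0x107 << 3)
    print("9-bit burst detected:", crc8(sneaky.to_bytes(4, "big")) != crc8(message))
```

The second check shows why the guarantee stops at n bits: a longer burst can happen to be a multiple of the generator polynomial.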

------------


Specification of a CRC code requires definition of a so-called generator polynomial. This polynomial becomes the divisor in a polynomial long division, which takes the message as the dividend and in which the quotient is discarded and the remainder becomes the result. The important caveat is that the polynomial coefficients are calculated according to the arithmetic of a finite field, so the addition operation can always be performed bitwise-parallel (there is no carry between digits).
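Specifying a CRC requires defining a so-called generator polynomial. This polynomial is the divisor in a polynomial long division that takes the message as the dividend; the quotient is discarded and the remainder is the result (in other words, the check value is <message, i.e. the data block> mod <generator polynomial>). Note that the polynomial coefficients are computed with finite-field arithmetic, so addition can always be performed bitwise in parallel (there is no carry between digits).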
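
The long division described above can be sketched in a few lines. This is a minimal illustration, not an optimized implementation: the function name `crc_remainder` and the sample values (message `11010011101100`, generator `1011`, i.e. x^3 + x + 1) are chosen only for the example. Subtraction over GF(2) is XOR, so each division step is a shift and an XOR.

```python
def crc_remainder(message_bits: str, generator_bits: str) -> str:
    """Remainder of (message * x^n) divided by the generator over GF(2)."""
    n = len(generator_bits) - 1           # degree of the generator
    dividend = int(message_bits, 2) << n  # append n zero bits to the message
    divisor = int(generator_bits, 2)
    # Walk from the most significant message bit down to the least.
    for shift in range(len(message_bits) - 1, -1, -1):
        if dividend & (1 << (shift + n)):
            dividend ^= divisor << shift  # XOR is subtraction without carry
    return format(dividend, f"0{n}b")     # quotient is discarded


if __name__ == "__main__":
    print(crc_remainder("11010011101100", "1011"))
```

For these sample values the remainder is `100`, so the transmitted codeword would be `11010011101100` followed by `100`.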

------------


In practice, all commonly used CRCs employ the Galois field of two elements, GF(2). The two elements are usually called 0 and 1, comfortably matching computer architecture.
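In practice, all commonly used CRCs work over the Galois field with two elements, GF(2). The two elements are usually called 0 and 1, which maps neatly onto computer architecture.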
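
A small sketch of what GF(2) arithmetic looks like on a computer, assuming polynomials are stored as integers whose bits are the coefficients (a common convention, and my own choice here). Addition is XOR, multiplication of single elements is AND, and adding two polynomials is a bitwise XOR with no carries.

```python
def gf2_add(a: int, b: int) -> int:
    """Add two GF(2) elements (each 0 or 1): addition is XOR."""
    return a ^ b


def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) elements (each 0 or 1): multiplication is AND."""
    return a & b


def poly_add(p: int, q: int) -> int:
    """Add two polynomials over GF(2), coefficients packed into the bits of an int."""
    return p ^ q


if __name__ == "__main__":
    print(gf2_add(1, 1))                           # 1 + 1 wraps around to 0
    # (x^3 + x + 1) + (x^3 + x^2) = x^2 + x + 1
    print(format(poly_add(0b1011, 0b1100), "04b"))
```

Because 1 + 1 = 0, subtraction and addition are the same operation in this field.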

------------


A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple CRCs are possible, each with a different polynomial. Such a polynomial has highest degree n, which means it has n + 1 terms. In other words, the polynomial has a length of n + 1; its encoding requires n + 1 bits. Note that most polynomial specifications either drop the MSB or LSB, since they are always 1. The CRC and associated polynomial typically have a name of the form CRC-n-XXX as in the table below.
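A CRC whose check value is n bits long is called an n-bit CRC. For a given n there can be several CRCs, each with a different polynomial. Such a polynomial has degree n, which means it has n + 1 terms. In other words, the polynomial has length n + 1 and needs n + 1 bits to encode. Note that most polynomial specifications drop either the MSB (most significant bit) or the LSB (least significant bit), because those bits are always 1. A CRC and its polynomial usually have a name of the form CRC-n-XXX.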
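
As an illustration of the "n + 1 bits, with one end dropped" remark, the sketch below takes the 32-bit CRC polynomial used by Ethernet, written with all 33 coefficients as `0x104C11DB7`, and derives the two shortened encodings. The helper names are hypothetical.

```python
CRC32_FULL = 0x104C11DB7  # x^32 + x^26 + x^23 + ... + x + 1, all 33 coefficients


def drop_msb(poly: int, n: int) -> int:
    """Drop the leading x^n coefficient (always 1), keeping the low n bits."""
    return poly & ((1 << n) - 1)


def drop_lsb(poly: int) -> int:
    """Drop the trailing x^0 coefficient (always 1), keeping the high n bits."""
    return poly >> 1


if __name__ == "__main__":
    print(CRC32_FULL.bit_length())         # n + 1 = 33 bits
    print(hex(drop_msb(CRC32_FULL, 32)))   # MSB-dropped encoding
    print(hex(drop_lsb(CRC32_FULL)))       # LSB-dropped encoding
```

Either shortened form fits in a 32-bit word, which is why specifications rarely spell out the full 33-bit value.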

------------


The simplest error-detection system, the parity bit, is in fact a 1-bit CRC: it uses the generator polynomial x + 1 (two terms), and has the name CRC-1.
The simplest error-detection scheme, the parity bit, is in fact a 1-bit CRC: it uses the generator polynomial x + 1 (two terms) and is named CRC-1.
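
The parity claim is easy to verify exhaustively for short messages. The sketch below, with helper names of my own, divides every 6-bit message by x + 1 (bit string `11`) over GF(2) and compares the 1-bit remainder with the even-parity bit.

```python
from itertools import product


def gf2_remainder(message_bits: str, generator_bits: str) -> str:
    """Remainder of (message * x^n) divided by the generator over GF(2)."""
    n = len(generator_bits) - 1
    dividend = int(message_bits, 2) << n
    divisor = int(generator_bits, 2)
    for shift in range(len(message_bits) - 1, -1, -1):
        if dividend & (1 << (shift + n)):
            dividend ^= divisor << shift
    return format(dividend, f"0{n}b")


def even_parity_bit(message_bits: str) -> str:
    """Bit that makes the total number of 1s even."""
    return str(message_bits.count("1") % 2)


def parity_matches_crc1(length: int = 6) -> bool:
    """Check every message of the given length against CRC-1 (generator x + 1)."""
    return all(
        gf2_remainder("".join(bits), "11") == even_parity_bit("".join(bits))
        for bits in product("01", repeat=length)
    )


if __name__ == "__main__":
    print(parity_matches_crc1())
```

The reason it works: reducing a polynomial modulo x + 1 amounts to evaluating it at x = 1, which sums its coefficients modulo 2.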

------------

/* Applications of CRC */
Application
A CRC-enabled device calculates a short, fixed-length binary sequence, known as the check value or CRC, for each block of data to be sent or stored and appends it to the data, forming a codeword.
A CRC-enabled device computes a short, fixed-length binary sequence, called the check value or CRC, for every data block to be sent or stored, and appends it to that block to form a codeword -> (block of data) + (check value) = codeword
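
A minimal sketch of building a codeword, assuming CRC-32 from Python's `zlib` as the check function and a 4-byte little-endian check value appended to the block. The framing and the function names are my own choices, not something any particular device mandates.

```python
import struct
import zlib


def make_codeword(block: bytes) -> bytes:
    """(block of data) + (check value) = codeword."""
    check_value = zlib.crc32(block)                # 32-bit unsigned integer
    return block + struct.pack("<I", check_value)  # append as 4 bytes


def split_codeword(codeword: bytes) -> tuple[bytes, int]:
    """Separate a codeword back into its data block and check value."""
    block, tail = codeword[:-4], codeword[-4:]
    return block, struct.unpack("<I", tail)[0]


if __name__ == "__main__":
    codeword = make_codeword(b"hello")
    block, check_value = split_codeword(codeword)
    print(len(codeword), block, hex(check_value))
```

The codeword is always exactly 4 bytes longer than the block, whatever the block's size.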

------------


When a codeword is received or read, the device either compares its check value with one freshly calculated from the data block, or equivalently, performs a CRC on the whole codeword and compares the resulting check value with an expected residue constant.
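When a codeword is received or read, the device either compares the check value inside the codeword with one freshly computed from the data block, or uses an equivalent check: run the CRC over the whole codeword and compare the result with an expected residue constant.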
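
Both verification styles can be sketched with `zlib.crc32`. The assumptions here are mine: the check value is appended as 4 little-endian bytes, and the residue constant for this particular CRC-32 variant, as returned by `zlib.crc32` over a valid codeword, is `0x2144DF1C`. Other CRCs and other byte orders have different residues.

```python
import struct
import zlib

CRC32_RESIDUE = 0x2144DF1C  # assumed residue for zlib's CRC-32 with a little-endian tail


def verify_by_recompute(codeword: bytes) -> bool:
    """Recompute the CRC over the data block and compare with the stored value."""
    block, stored = codeword[:-4], struct.unpack("<I", codeword[-4:])[0]
    return zlib.crc32(block) == stored


def verify_by_residue(codeword: bytes) -> bool:
    """Run the CRC over the whole codeword and compare with the residue constant."""
    return zlib.crc32(codeword) == CRC32_RESIDUE


if __name__ == "__main__":
    block = b"some data"
    codeword = block + struct.pack("<I", zlib.crc32(block))
    corrupted = bytes([codeword[0] ^ 0x01]) + codeword[1:]  # flip one bit
    print(verify_by_recompute(codeword), verify_by_residue(codeword))
    print(verify_by_recompute(corrupted), verify_by_residue(corrupted))
```

The residue form is convenient in hardware because the receiver does not need to know where the data ends and the check value begins until the frame is complete.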

------------


If the CRC values do not match, then the block contains a data error.
If the CRC values do not match, the block contains a data error.

------------


The device may take corrective action, such as rereading the block or requesting that it be sent again. Otherwise, the data is assumed to be error-free (though, with some small probability, it may contain undetected errors; this is inherent in the nature of error-checking).[3]
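When the check fails, the device may take corrective action, such as rereading the block or asking the sender to retransmit it. When the check succeeds, the data is assumed to be error-free (although, with some small probability, it may still contain undetected errors).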

------------

/* Data integrity */
Data integrity
CRCs are specifically designed to protect against common types of errors on communication channels, where they can provide quick and reasonable assurance of the integrity of messages delivered. However, they are not suitable for protecting against intentional alteration of data.
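CRCs are designed to guard against common types of errors on communication channels, where they give quick and reasonable assurance of the integrity of delivered messages. However, they are not suitable for protecting against intentional alteration of data. What does that mean? -> They help ensure the data is correct, but they do not make it secure.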

------------


Firstly, as there is no authentication, an attacker can edit a message and recompute the CRC without the substitution being detected. When stored alongside the data, CRCs and cryptographic hash functions by themselves do not protect against intentional modification of data. Any application that requires protection against such attacks must use cryptographic authentication mechanisms, such as message authentication codes or digital signatures (which are commonly based on cryptographic hash functions).
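First, because there is no authentication, an attacker can edit the message and recompute the CRC (check value), and the substitution goes undetected. When the check value is stored alongside the data, CRCs and cryptographic hash functions by themselves do not prevent malicious tampering. Any application that needs protection against such attacks must use a cryptographic authentication mechanism, such as a message authentication code or a digital signature (digital signatures are commonly based on cryptographic hash functions).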
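
The difference between a CRC and a message authentication code can be sketched with the standard library. In this illustration the key, the messages, and the function names are all hypothetical; HMAC-SHA256 stands in for "a message authentication code". The attacker can recompute the CRC because the algorithm is public, but cannot produce a valid MAC without the secret key.

```python
import hashlib
import hmac
import zlib

SECRET_KEY = b"shared-secret"  # hypothetical key known only to sender and receiver


def crc_tag(message: bytes) -> bytes:
    """Unkeyed check value: anyone can compute it."""
    return zlib.crc32(message).to_bytes(4, "big")


def mac_tag(message: bytes, key: bytes) -> bytes:
    """Keyed tag: computing it requires the key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def receiver_accepts_crc(message: bytes, tag: bytes) -> bool:
    return crc_tag(message) == tag


def receiver_accepts_mac(message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(mac_tag(message, SECRET_KEY), tag)


def attacker_forge(new_message: bytes) -> tuple[bytes, bytes]:
    """The attacker recomputes the CRC, and can only guess at the MAC key."""
    return crc_tag(new_message), mac_tag(new_message, b"attacker-guess")


if __name__ == "__main__":
    forged = b"pay mallory 1000"
    forged_crc, forged_mac = attacker_forge(forged)
    print("CRC accepted:", receiver_accepts_crc(forged, forged_crc))
    print("MAC accepted:", receiver_accepts_mac(forged, forged_mac))
```

The forged message passes the CRC check and fails the MAC check, which is exactly the gap the passage describes.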

------------


Secondly, unlike cryptographic hash functions, CRC is an easily reversible function, which makes it unsuitable for use in digital signatures.[4]
Second, unlike cryptographic hash functions, a CRC is easily reversible, which makes it unsuitable for use in digital signatures.

------------


Thirdly, CRC is a linear function with a property that
![CRC](https://oscimg.oschina.net/oscnet/e2a8377c003a01f7649c6f6406a7726cf90.jpg "CRC")
as a result, even if the CRC is encrypted with a stream cipher that uses XOR as its combining operation (or mode of block cipher which effectively turns it into a stream cipher, such as OFB or CFB), both the message and the associated CRC can be manipulated without knowledge of the encryption key; this was one of the well-known design flaws of the Wired Equivalent Privacy (WEP) protocol.[5]
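Third, a CRC is a linear function with the particular property shown in the image above.
As a result, even if the CRC is encrypted with a stream cipher that combines by XOR, both the message and its associated CRC can be manipulated without knowing the encryption key; this was one of the well-known design flaws of the Wired Equivalent Privacy (WEP) protocol.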
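
The linearity problem can be demonstrated with a toy stream cipher. Everything in this sketch is an assumption made for illustration: the "cipher" is a plain XOR with a random keystream, the CRC is `zlib.crc32` appended as 4 little-endian bytes, and the messages are made up. It relies on the property that, for equal-length inputs, crc(x ⊕ y ⊕ z) = crc(x) ⊕ crc(y) ⊕ crc(z). This is not a model of the real WEP frame format.

```python
import os
import zlib


def crc(data: bytes) -> bytes:
    return zlib.crc32(data).to_bytes(4, "little")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(message: bytes, keystream: bytes) -> bytes:
    """Toy stream cipher: XOR (message + CRC) with the keystream."""
    return xor_bytes(message + crc(message), keystream)


def decrypt_and_check(ciphertext: bytes, keystream: bytes) -> tuple[bytes, bool]:
    frame = xor_bytes(ciphertext, keystream)
    message, tag = frame[:-4], frame[-4:]
    return message, crc(message) == tag


def tamper(ciphertext: bytes, delta: bytes) -> bytes:
    """Flip chosen message bits and patch the CRC, without the keystream.

    Uses crc(m ^ d) == crc(m) ^ crc(d) ^ crc(zeros) for equal-length inputs.
    """
    crc_patch = xor_bytes(crc(delta), crc(bytes(len(delta))))
    return xor_bytes(ciphertext, delta + crc_patch)


if __name__ == "__main__":
    original = b"pay alice 0100"
    wanted = b"pay alice 9100"
    keystream = os.urandom(len(original) + 4)  # never given to tamper()
    delta = xor_bytes(original, wanted)        # the bits the attacker flips
    forged = tamper(encrypt(original, keystream), delta)
    print(decrypt_and_check(forged, keystream))
```

The receiver decrypts the forged frame to the altered message and the CRC still verifies, even though `tamper` never saw the keystream.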