Figure: Effect of a typical checksum function (the Unix cksum utility)

A checksum is a small block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

The procedure which generates this checksum is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm usually outputs a significantly different value, even for small changes made to the input. This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.

Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. For instance, a function returning the start of a string can provide a hash appropriate for some applications but will never be a suitable checksum. Checksums are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems with these two specific design goals, see HMAC.

Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.

Algorithms

Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or (XOR) of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word consisting of n zeros, the receiver knows a transmission error occurred.

With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. Swapping two or more words will also go undetected. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n.

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant, too, detects any single-bit error. The modular sum is used in SAE J1708.

The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost of computing the checksum.

The idea of the fuzzy checksum was developed for detection of email spam by building up cooperative databases from multiple ISPs of email suspected to be spam. The content of such spam may often vary in its details, which would render normal checksumming ineffective. By contrast, a "fuzzy checksum" reduces the body text to its characteristic minimum, then generates a checksum in the usual manner.
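As an illustration of the longitudinal parity check described in this article, the following is a minimal Python sketch. It assumes 8-bit words (so the checksum is a single parity byte); the function names `parity_word`, `append_parity`, and `parity_ok` are hypothetical and chosen only for this example.

```python
from functools import reduce
from operator import xor


def parity_word(words) -> int:
    """XOR all words together. With 8-bit words (an assumption of this
    sketch) the result is a single parity byte."""
    return reduce(xor, words, 0)


def append_parity(message: bytes) -> bytes:
    """Return the message with its parity byte appended as an extra word."""
    return message + bytes([parity_word(message)])


def parity_ok(frame) -> bool:
    """XOR over message plus checksum is zero when no error is detected."""
    return parity_word(frame) == 0


frame = append_parity(b"checksum")

# One flipped bit: detected.
one_bit = bytearray(frame)
one_bit[0] ^= 0b00000100

# Two flipped bits at the same position in two distinct words: they cancel.
two_bits = bytearray(frame)
two_bits[0] ^= 0b00000100
two_bits[3] ^= 0b00000100

# Two swapped words: XOR does not depend on order, so this is missed too.
swapped = bytearray(frame)
swapped[0], swapped[1] = swapped[1], swapped[0]

print(parity_ok(frame))     # intact frame
print(parity_ok(one_bit))   # single-bit error
print(parity_ok(two_bits))  # two-bit error, same bit position
print(parity_ok(swapped))   # reordered words
```

Only the single-bit error makes the check fail; the two-bit and swapped frames pass, which matches the weaknesses listed for this checksum.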
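The sum-based variant (two's complement of the modular sum) can be sketched in the same way. This example again assumes 8-bit words, so overflow is discarded by masking with `0xFF`; the names `sum_complement`, `append_sum_complement`, and `sum_ok` are hypothetical.

```python
def sum_complement(message: bytes) -> int:
    """Two's complement of the byte sum, keeping only the low 8 bits."""
    total = sum(message) & 0xFF  # add words, discard overflow bits
    return (-total) & 0xFF       # two's complement within one 8-bit word


def append_sum_complement(message: bytes) -> bytes:
    """Return the message with the checksum appended as an extra word."""
    return message + bytes([sum_complement(message)])


def sum_ok(frame) -> bool:
    """Sum of all words including the checksum is zero modulo 256
    when no error is detected."""
    return sum(frame) & 0xFF == 0


sum_frame = append_sum_complement(b"checksum")

# One flipped bit changes the sum by a power of two below 256: detected.
sum_one_bit = bytearray(sum_frame)
sum_one_bit[2] ^= 0b00010000

# Reordering words does not change the sum: undetected.
sum_swapped = bytearray(sum_frame)
sum_swapped[0], sum_swapped[1] = sum_swapped[1], sum_swapped[0]

# Inserting an all-zero word does not change the sum either: undetected.
sum_zero_inserted = sum_frame[:2] + b"\x00" + sum_frame[2:]

print(sum_ok(sum_frame))
print(sum_ok(sum_one_bit))
print(sum_ok(sum_swapped))
print(sum_ok(sum_zero_inserted))
```

The last two cases are the order and zero-word errors that the simple checksums fail to catch.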
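To show how position-dependent checksums address the reordering weakness, the sketch below compares a hand-written Fletcher-16 (one common formulation, with both running sums taken modulo 255) against Adler-32 and CRC-32 from Python's standard `zlib` module. The helper name `fletcher16` and the sample inputs are assumptions made for this example.

```python
import zlib


def fletcher16(data: bytes) -> int:
    """Fletcher-16: sum1 tracks the word values, sum2 accumulates the
    running sum1, which makes the result depend on word positions."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


original = b"abcde"
reordered = b"bacde"  # first two words swapped

# A plain modular sum cannot tell the two messages apart.
print(sum(original) % 256 == sum(reordered) % 256)

# The position-dependent checksums produce different values.
print(hex(fletcher16(original)), hex(fletcher16(reordered)))
print(hex(zlib.adler32(original)), hex(zlib.adler32(reordered)))
print(hex(zlib.crc32(original)), hex(zlib.crc32(reordered)))
```

The extra running sum (or polynomial division, for CRCs) is the additional work that raises the cost of computing these checksums.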
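The fuzzy checksum idea can be illustrated with a toy sketch. The article does not specify how the body text is reduced to its "characteristic minimum", so the normalization below (lowercase, then keep letters only) is purely an assumption for illustration, as are the names `normalize` and `fuzzy_checksum`; real spam-detection systems use their own reduction schemes.

```python
import re
import zlib


def normalize(body: str) -> str:
    """Hypothetical reduction step: lowercase the text and drop everything
    that is not a letter (digits, punctuation, whitespace)."""
    return re.sub(r"[^a-z]+", "", body.lower())


def fuzzy_checksum(body: str) -> int:
    """Checksum the reduced text in the usual manner (CRC-32 here)."""
    return zlib.crc32(normalize(body).encode("utf-8"))


variant_a = "Buy CHEAP watches now!!! Order #48213"
variant_b = "buy cheap watches NOW... order #99177"

# The two variants differ in case, punctuation, and order number, yet
# they reduce to the same text and therefore share a fuzzy checksum.
print(normalize(variant_a))
print(fuzzy_checksum(variant_a) == fuzzy_checksum(variant_b))
```

A plain checksum over the raw bytes of the two variants would be computed over different inputs, which is why normal checksumming is ineffective for this kind of content.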