Hamming It Up
I started reading a chapter on error correcting codes last night. But as I skim through again today, it's pretty clear that none of it really got through to me. So let's work through an example. Couldn't hurt:
Let's run the (7,4) Hamming code end to end
The what? Shit, I remember enough from last night to know that I should understand what I just read. Let's step back and recap:
// basically this
class HammingCode {
private:
const int codeword_len;
const int message_len;
static int validate(int n, int k) {
if (n <= 0 || k < 0 || k >= n)
throw std::invalid_argument("0 <= k < n is a must");
const int r = n - k; // parity bits
if ((1u << r) < static_cast<unsigned>(n) + 1u)
throw std::invalid_argument("2^(n-k) must be >= n + 1");
return n;
}
public:
HammingCode(int n, int k): codeword_len(validate(n, k)), message_len(k) {};
};
Basically the first number tells us the length of the entire codeword (check and message bits). The second number (must be smaller than the first) is the length of the actual message.
And "must be smaller" is... weak. You can see a more rigorous constraint definition given above:
$$2^{n-k} \geq n + 1$$
Anyway, lots of preliminaries. How about the example:
Let's encode the message **1 0 1 1**. Drop those four bits into the data positions 3, 5, 6, 7:
position: 1 2 3 4 5 6 7
value: ? ? 1 ? 0 1 1
Oh f🙂ck off... what the hell are data positions?
Anything that isn't a power of two. Because those are the **check** positions (for reasons... we'll get back to that, I'm positive).
so 1, 2, 4, ..., etc being reserved, we get 3, 5, 7, ...,
What's the deal with powers of two?
They're just a single bit at some position:
1: 0 0 1
2: 0 1 0
4: 1 0 0
...
The "f🙂ck you" moment comes from how the checks are defined. And I'll spoil the suprise so we can talk through the explanation next: when a single bit gets corrupted, the checks "spell out" precisely which bit flipped.
Yeah, I've got some ques...
So checks follow the powers of two (1, 2, 4, ...), that single bit shifting up and down. They tie together every bit in the codeword sitting at a position that has the two's bit flipped. Example:
Check 1 (0 0 1) - gets pos1 (001), pos3(011), pos5(101), pos7 (111)
^
Where pos 1, 2, and 4 are the parity bits. By our rule, these appear in at most one check.
...dialtone
Ok, great. Yeah, so we now effectively have these flags telling us when something is wrong with their cohort of bits. We define 'wrong' like this:
Check 1: pos1 ⊕ pos3 ⊕ pos5 ⊕ pos7 = 0
The definition of XOR then implies:
pos1 = pos3 ⊕ pos5 ⊕ pos7
So we know how to compute our parity bit value for each cohort. Say:
pos1 = 1 ⊕ 0 ⊕ 1 => 0
We now have all the pieces needed for our checks. And remember this critical detail: parity bits serve a single check at most (powers of two - a single bit), but all other positions will be tracked by multiple checks: wherever the check 'index' overlaps with the position. pos5 will be tracked by checks 1 and 4, since 5 (1 0 1) is made up of 1 (0 0 1) and 4 (1 0 0). That means, when position 5 is corrupted, checks 1 and 4 will fail. And, crucially, what emerges when you string together the results of every check is the position itself (1 0 1) - 5.
This reminds me of a binary search. I don't know that it's really fair to characterize the process as such. But it appears that every failed check effectively halves the pool of possible corrupted bits (choosing between those with the nth bit set or not), before the culprit necessarily emerges from the narrowing.