This last hash function is also named after its inventor; it will generate up to 232 different hash values, but it requires more Pascal initialization code. In most other languages the initialization can be done statically in the program; so no initialization call, or activity is necessary.
In this hash like
pkphash the noise added to the argument comes from a randomized table. In pkphash it was a permuted array of characters, like the key in a character substitution cipher. For buzhash it is a set of 32-bit `random' aliases for each character. These aliases are made so that for every bit position exactly one half of the aliases have a one and the other half have a zero. These aliases are stored in the table buztbl, which is created in this format by the initialization process. Hashing is done by exclusive-oring together the random aliases for each character in the key. For each character each exclusive-or has exactly a 50% chance of inverting every bit. To make sure that repeated characters will add noise into a different spot in the developing hash value, h, it is rotated one bit for each character inserted.
This means that if keys commonly repeat after exactly 32 characters
the input set will not hash well.
When keys commonly repeat at exactly 32 characters,
one can correct this buzhash deficiency by either just
hashing the first 32 characters, or by adding a common
character, like blank, at the beginning of every key
(thus the repeat will happen after 33 characters).
The chance of running into this problem and having it affect you
is really small, so small it is worth forgetting.
Another more general solution is to switch alias tables after every
thirty-two characters to inject additional information.
But keys longer than thirty-two characters are so rare, that to
build this into the algorithm would slow every hash down for
such a small proportion of hash keys that it is not worth considering.