Prior Page     Next Page     This Chapter    Prior Chapter






RCS Hash

The first algorithm let us call the RCS hashing scheme, since it is one used in the Revision Control System, a Unix programming package, for keeping track of source code and its revisions. The RCS hashing scheme works only for short, less than 17 characters, ASCII strings. The RCS scheme can be written:


FUNCTION rcshash ( key: STRING, b: longint ): longint;
{ key is the string to hash
   b is number of hash values: [0..b-1] }
VAR
n, h, i: longint;
BEGIN
    n := LENGTH( key );
    h := 0;
    FOR i := 1 TO n DO
        h := BXOR( BSL( h, 2 ), ORD(key[i]) );



    h := BAND( h, $7fffffff );  /* get rid of random sign bit */
    IF ( b > 0 )
    THEN rcshash := h MOD b
    ELSE rcshash := h;

END;

For rcshash the integer b must be an odd number. Values of b evenly divisible by 2 will seldom have good hash distributions. The rcshash function does not do well on keys, longer than 16 characters. The developing hash value, h, is a 32 bit integer, every new character pushes data over two bits, so after 16 characters all data from the beginning of the key will just fall off the top end of the integer. Only the last 16 characters contribute to the hash value. For small ASCII character strings this hash function works fast and the code is small and compact.

So it is important to understand the output of a hash function in terms of the input. For RCS Hash, although it is fast the base hash value (the one used before the mod operation) is limited in value by the number of characters in the key. Until you have keys that have 16 characters in them, you do not get 31-bit hash values. In fact using this hash function for small words (under 5-chars) the base hash value never exceeds 16-bits.

As another example of how Pascal could be translated into Hermes machine code, here is the Hermes Assembly Language version of RCS Hash.


RCSHASH  PROC //(key: STRING, b: longint)
//               returns hash in R1



//               key is the null char terminated string
//               b is number of hash values: [0..b-1]

          ST   R15,SaveR15
          ST   R0,SaveR0
          ST   R2,SaveR2

          LA   R0,0             // clear R0
          LA   R1,0             // h := 0;
          LD   R2,4[R15]

NEXT:    LB   R0,0[R2]
          ADD  R0,=0=           // Set the condition code
          JZRO DONE             // Check for end of string
          ST   R0,tmp
          SHFT R1,2
          XOR  R1,tmp           // h:=BXOR(BSL(h,2),ORD(key[i]));
          JMP  NEXT

DONE:    AND  R1,=0x7fffffff=  // get rid of random sign bit

          L    R15,SaveR15
          L    R0,8[R15]
          ADD  R0,=0=           // Set Condition code
          JZRO EXIT

          L    R0,=0=
          DIV  R0,8[R15]        // rcshash := h MOD b

EXIT:    L    R0,SaveR0
          L    R2,SaveR2
          L    R15,SaveR15
          JSJ  8[R15]

tmp      CON 0
SaveR0   CON 0
SaveR2   CON 0
SaveR15  CON 0

          END  RCSHASH








Prior Page     Next Page     This Chapter    Prior Chapter


Copyright © 1995 Robert Uzgalis. All Rights Reserved.
Contact: buz@cs.aukuni.ac.nz