Java is strongly typed and as such it includes built-in hash functions which hash every base type and return a signed integer. By including the hash function as a part of the language API, Java took the responsibility for hashing away from the user. This puts much stronger requirements on the Java specification to define properties of the hash-function that programmers can use to do what they want.
Following is an evaluation of the current Java hash functions. There are three levels of criticism: language design, specification, and implementation. The design is implicit in the specification and implementation. I have used the December 1995 version of the Java specification. and I have used the Sun Java 1.0.1 Java development kit as ported to Linux as the basis of implementation criticism. But most of the criticism is directed not at the implementation but at the design and coincidentally the specification. The specification has many problems, and it is still evolving.
The current Java implementation of the string hash function
does not distribute hash values well. Here is an example of hash
values from some short strings.
|
These hash values should represent a random distribution drawn
from the integers. However instead they reflect a small,
tight, highly organized, portion of the integers. In
particular the hash values should not have any numeric
relationship with one another, nor should they reflect the
numeric relationship of the original keys. These hash
values represent a violation of both the first and second
criteria for library hash functions.
Using a better hash function, like the one in the appendix to
this paper, one might get something like the following:
|
Java uses 32-bit integers as the hash value range. Conceptually hash values from any class should be uniformly distributed across the largest integer domain available in the language. In Java this is
long
, which is specified as 64 bits.
No guidelines in Java are provided for creating hashing values from other hash values as a part of an abstract type. If I create a new type how do I build a hash value out of the hash values of its components? Clearly the components that should go into a hash value are the components which determine if one value of a type is equal to another value of the same type. But no guidance is given as how to combine hash values.
One last issue needs to be addressed. Java specifies a
computation more precisely than most languages so that across a
network with various implementations the same results are
obtained. For hashing this means that the Java language
specification or the application interface specification should
define a particular hashing method. However no hash method is
specified in any part of the language specification. This means
that the programmer can not depend on any behavior at all -- this
is clearly inadequate for programming anything.
Copyright © 1996 Robert Uzgalis,
All Rights Reserved.
Contact: Robert Uzgalis <buz@zis.com>
home
This paper was automatically translated from troff into html by
t2h.