|
|
There are two main programs
in this example. Both of them use the .h
files and the hash.cpp
file.
The main function for the first program is in hashmake.cpp. This program
builds a hash table from the data in the hash.txt
file. The hash table is placed in a file named hash.dat.
The second program has its main function in the hashread.cpp file. This
program allows the user to interactively look up data in the hash table. It
reads this data from the hash.dat
file, of course. Note that this example uses coalesced chaining (a linked list
inside the table with all items after the first in an overflow area of the
table).
Let's see how it works.
The file is organized as a binary file of
records.
Each record has an Info and a Next field.
The Next field will contain the record number
for the next record in the linked list (when we have to chain nodes together)
or a -1 as a fake NULL pointer.
Record 0 (the first record in the file) is
not used to hold node data at all. Instead, it holds in Info.KeyField the
number of items in the table. That way this count is stored permanently, along
with the file. When an empty table is first created, the entire file is filled
with empty nodes.
·
Thus, the special
record 0, followed by empty nodes 1 through MaxFile = 150 are written to the
file. These nodes are marked as empty by placing 0 in the KeyField locations.
(This assumes that 0 is not a normal KeyField value.)
·
The regular part
of the table includes nodes 1 through Prime = 101, and the overflow area is
from node 102 to node MaxFile = 150. A field of the HashTableClass object,
called OverflowIndex, is used to keep track of the next free record for
overflow use. (Thus OverflowIndex is initially set to 102. If it reaches 151,
the overflow area is full.)
The following is a drawing of a
HashtableClass object, showing the five data fields.
Note that the OpenMode field contains
the character r or w, indicating whether the table has been opened in read or
write mode. NodeSize is the number of bytes per node. Since this value is often
needed, it is convenient to store it in a field of the object. The DataFile
field is an fstream object containing information about the file where the
table's data is held. The details of an fstream object are beyond what we will
study here, but include information such as the current position in the file.
![[hash table]](hash_files/image003.jpg)
Note that when we open a hash table in write
mode, we really open the underlying data file, hash.dat for both input and output. We also
specify that it is a binary file. Depending on your compiler, you may need to
use ios::trunc
as was done here. This tells the program to truncate (erase) any existing file
by the same name when the open is carried out.
The data for this program is read from a text
file, hash.txt, that contains an integer (long) followed by a name (character
string), with these items on separate lines. To make it easy to read in the
strings, no spaces are used. Rather, underscores are used where spaces would normally
appear, as in Bill_Clinton, the name that goes with key value 7.
If you run the hashmake program from within the Visual C++
environment you need to be sure that the hash.txt file is in the project
directory for this program. If instead you want to run the program by double
clicking on the hashmake.exe
file in MyPrograms, then you want hash.txt to be in the same directory (usually
the Debug directory) that contains hashmake.exe.
The
hash function used is hash(key) =
(key % prime) + 1. The + 1 is there because record 0 is not used for table
data. Thus the hash function has to produce the integers from 1 to prime. When
inserting an item in the table, the item's key value is hashed to produce the
desired location. The record at that location is read and its Info.KeyField
examined to see if it is 0 (which indicates an unused record). If so, a record
containing the item is written to the file at this location. If, however, the
location produced by the hash function is in use, we insert the item at the
location given by OverflowIndex (unless OverflowIndex > MaxFile, which means
that the overflow area is full). The new record written to this location
contains item and a Next field containing the record number from the Next field
of the record at the location produced by the hash function. Finally, the
record at the location produced by the hash function is rewritten so as to
contain in its Next field OverflowIndex, the location of our newly inserted
node. OverflowIndex is then incremented, of course. Note that after writing the
above 2 records to the file, we have correctly linked the new record into the
linked list of records hashing to that same location, with the new record added
as the second record of the list.
The following drawing shows our hash table
after 4 items have been inserted. One item hashed to location 1 and was
inserted there. The other 3 items hashed to location 3, so that two of them had
to be placed in the overflow area. Note how these three are linked together in
a linked list via the Next field values. Also note how the number of items (4
in this case) is stored in the NumItems field in the object, as well as in node
0 in the file. (However, node 0 is only updated when we finish using the table.
This is so that we do not waste time repeatedly writing to node 0 just to
update that counter.)
![[hash table]](hash_files/image005.jpg)
Next, let's consider the second program in
this example. It allows us to interactively look up data from the hash table.
When a table is first opened for reading, the special record 0 is read in so
that the number of items in the table can be discovered by examining
Info.KeyField. To do a lookup of a key value, we first compute hash(key) and
read in the record with that number as its location. If the key in this record
matches, we are done. If not and the Next field of this record contains -1
(used as a fake NULL pointer), then this key is not in the table. Otherwise,
the Next field number is the location for the next record that hashes to this
location. We read this new record (which will be in the overflow area) and see
if the key matches the target key. If not, we check the Next field. If we have
-1 we report that the key cannot be found. Otherwise, we read the record whose
location is given by this Next field, check its key value, etc. The search ends
when we either find a matching key or get a -1 as the Next pointer.
Assignment.
1)
Modify this program so that it uses open addressing hash table using linear
probe. Determine how the file for the table should be organized for this new
method.
2)
Describe how you choose the initial hash function for the above?
3) Modify
your program which returns the number of collisions encountered when hashing
data[0], data[1], data[2], ...,data[n-1] into a hash table. and compare the
number of collisions of the two different functions.
4) What are the various advantages and disadvantages of
the various collision-resolution strategies?
Extra Credit: open addressing hash table using quadratic probe function