Indexing and hashing is used to speed up data retrieval in databases. Instead of scanning an entire file, indexes provide fast access paths to records using keys. When a file is modified, every index on the file must be updated.
Search Key
An attribute or set of attributes used to look up records.
Index File
A separate data structure that stores pairs (index records) of the form:
(search-key, pointer to record)
Much smaller than the original file.
Index-Sequential File
Ordered sequential file with a primary index.
B+ Tree Index File
Dynamic, self-balancing tree structure for indexing. An alternative to indexed-sequential files.
Automatically reorganizes after inserts/deletes (no file rebuild). Avoids block overflows. Keeps all keys in sorted order. Excellent for range and equality queries.
Slightly higher space and update overhead. Still, dominant indexing method in modern DBMS.
Index Evaluation Metrics
When comparing index types, measure:
- Access time: how quickly data is found.
- Insertion/deletion time: update performance.
- Space overhead: extra storage used by index.
- Supported access types: exact-match or range-based.
Dense Index
One index entry per search-key value. Points directly to the record. Faster lookups. More space. More maintenance cost.
Sparse Index
Indexes only some search-key values. Can only be used when data is physically ordered on key.
Less space. Easier maintenance. Slow lookups.
To find a key :
- Find the index entry with largest key .
- Scan sequentially in the file.
Ordered Index
Keys stored in sorted order.
Primary Index
Aka. clustering index. Order of index matches file order. Usually (but not always) based on the primary key. One per file. Sequential scan is efficient.
Secondary Index
Aka. non-clustering index. Based on non-ordering attribute(s). Can be many per file. Must be dense. Sequential scan is inefficient.
Hash Index
Keys mapped to buckets using a hash function.
Multilevel Index
When a sparse index is built from a index file. Named outer index and inner index respectively.
All indices must be updated during insert/delete.
Used when the inner index is too large to fit in memory.
Bitmap Index
A bitmap is an array of bits. Can only be used when the attribute has a small domain. One bitmap per attribute value.
Hashing
Static Hashing
Each record is stored in a bucket determined by a hash function:
- Each bucket is typically one disk block.
- Used for equality search (not range queries).
- Example:
h(DeptName) = sum(char codes) mod 10.
Bucket Overflow
Occurs when too few buckets or non-uniform data distribution.
Handling Overflow
- Overflow chaining: overflow buckets linked in a list.
- Called closed hashing.
- Open hashing (without overflow buckets) is unsuitable for DBMS use.
Hash Function Design
A good hash function:
- Distributes keys uniformly across buckets.
- Minimizes overflow chains.
- Avoids clustering of similar key values.
Poor hash functions map many keys to the same bucket → slow access.