Random Access Compression Engine
The patented IBM Random Access Compression Engine (RACE) inverts the traditional approach
to compression: it accepts variable-size chunks of input and produces fixed-size chunks of
output.
This method enables an efficient and consistent way to index the compressed data because it
is stored in fixed-size containers (Figure 9-28).
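Because every compressed container has the same size, locating the containers that hold a given logical offset reduces to a simple lookup. The following sketch illustrates the idea; the names (`CONTAINER_SIZE`, `ChunkIndex`) and the structure are illustrative assumptions, not the actual RACE implementation.

```python
# Illustrative sketch of indexing compressed data in fixed-size containers.
# CONTAINER_SIZE and ChunkIndex are hypothetical names, not IBM's design.
CONTAINER_SIZE = 512  # assumed size of each fixed-size output container, in bytes


class ChunkIndex:
    """Maps each variable-size input chunk to the fixed-size containers holding it."""

    def __init__(self):
        # Each entry: (input_offset, input_len, first_container, n_containers)
        self.entries = []
        self.next_container = 0

    def add(self, input_offset, input_len, compressed_len):
        """Record that a chunk compressed to compressed_len bytes."""
        n = -(-compressed_len // CONTAINER_SIZE)  # ceiling division
        self.entries.append((input_offset, input_len, self.next_container, n))
        self.next_container += n

    def locate(self, input_offset):
        """Return (first_container, n_containers) covering a logical offset."""
        for off, length, first, n in self.entries:
            if off <= input_offset < off + length:
                return first, n
        raise KeyError(input_offset)


idx = ChunkIndex()
idx.add(0, 4096, 900)      # 900 compressed bytes -> 2 containers
idx.add(4096, 8192, 1500)  # 1500 compressed bytes -> 3 containers
```

With this layout, a read at logical offset 5000 resolves directly to containers 2 through 4, without scanning variable-size compressed records.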
Figure 9-28 Random Access Compression
Location-based compression
Both compression utilities and traditional storage system compression work by finding
repetitions of bytes within the chunk that is being compressed. The compression ratio of a
chunk depends on how many repetitions can be detected within it, and the number of
repetitions depends on how closely the bytes stored in the chunk are related to each other.
The relation between bytes is driven by the format of the object. For example, an office
document might contain textual information and an embedded drawing.
Because the chunking of the file is arbitrary, the compressor has no knowledge of how the
data is laid out within the document. Therefore, a compressed chunk can contain a mixture of
textual information and part of the drawing. This mixture yields a lower compression ratio
because the different data types produce a suboptimal dictionary of repetitions: fewer
repetitions can be detected, because a repetition of bytes in a text object is unlikely to be
found in a drawing.
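This effect is easy to demonstrate with any dictionary-based compressor. The sketch below uses Python's `zlib` (an assumption for illustration; it is not the algorithm RACE uses) to compress a chunk of pure text and a chunk that mixes text with pseudo-random bytes standing in for drawing data.

```python
import random
import zlib

random.seed(0)  # deterministic "drawing" bytes for reproducibility

# A chunk of repetitive textual data.
text = b"the quick brown fox jumps over the lazy dog. " * 50

# Pseudo-random bytes stand in for embedded drawing data,
# which has few byte repetitions in common with the text.
drawing = bytes(random.randrange(256) for _ in range(len(text)))

# Same total size: one chunk of pure text, one chunk mixing both data types.
mixed = text[: len(text) // 2] + drawing[: len(text) // 2]

text_ratio = len(text) / len(zlib.compress(text))
mixed_ratio = len(mixed) / len(zlib.compress(mixed))
```

The pure-text chunk compresses far better than the mixed chunk of the same size, because the repetition dictionary built from the text portion is of little use against the drawing portion.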
This traditional approach to data compression is also called location-based compression:
the detection of data repetitions is based on the location of the data within the same chunk.
Chapter 9. Advanced features for storage efficiency