Vector Quantization In JaguarDB Vector Database
In vector databases, storing vectors is a resource-intensive task that demands a considerable amount of memory. Given the constraints of available memory, techniques that compress memory usage become essential. Among these techniques, vector quantization emerges as a valuable strategy for optimizing memory utilization.
Vector quantization serves as a potent tool for addressing the memory challenge associated with storing vectors. It involves the process of mapping high-dimensional vectors to a reduced set of representative vectors. By doing so, the memory requirements are substantially curtailed, as the focus shifts from storing individual vectors to merely preserving the essence of the data through a condensed set of representatives.
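The general idea can be sketched in a few lines of NumPy (a toy illustration under assumed data sizes, not JaguarDB's internal algorithm): a small codebook of representative vectors is chosen, and each original vector is then stored only as the index of its nearest representative.

```python
import numpy as np

# Toy codebook-style vector quantization (illustrative only; not JaguarDB internals).
# A small set of representative vectors (the codebook) stands in for the full data set:
# each original vector is stored only as the index of its nearest representative.

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)          # original data
codebook = vectors[rng.choice(len(vectors), 256, replace=False)]  # 256 representatives

# Assign each vector to its nearest codebook entry (squared Euclidean distance).
dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
codes = dists.argmin(axis=1).astype(np.uint8)                     # one byte per vector

print(vectors.nbytes, codes.nbytes + codebook.nbytes)             # 256000 vs. 66536 bytes
```

The memory saved comes at the cost of replacing each vector with an approximation, which is exactly the fidelity trade-off discussed next.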
This technique hinges on a trade-off between memory efficiency and data fidelity. While vector quantization does lead to a reduction in memory consumption, there exists a balance to be struck to ensure that the essential characteristics of the data are preserved without excessive loss. As AI applications continue to evolve and the need for efficient storage mechanisms grows, vector quantization stands as a valuable ally, offering a pragmatic solution to the intricate challenge of memory optimization in the context of storing vectors.
There are three quantization levels in JaguarDB: byte, short, and float. Quantizing input vectors yields efficient memory utilization within the system. Storing a float number requires 4 bytes, so using fewer bytes per vector component can yield substantial memory savings. When components are stored as 2-byte signed integers, memory savings reach 50%, while using only a single byte per component results in a 75% reduction in memory usage. These options are termed the “short quantization level” (2-byte signed integers) and the “byte quantization level” (a single byte). Input vectors are quantized at the level specified by the user during vector store creation, optimizing memory consumption while maintaining data integrity.
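The savings follow directly from the component width. A rough sketch of scalar quantization is shown below; it assumes components lie in a known range of [-1, 1] (typical of normalized embeddings), and JaguarDB's exact internal mapping may differ.

```python
import numpy as np

# Scalar quantization sketch: map float32 components to int16 ("short") or int8 ("byte").
# Assumes components fall in [-1.0, 1.0]; JaguarDB's internal mapping may differ.

def quantize(v: np.ndarray, bits: int) -> np.ndarray:
    scale = 2 ** (bits - 1) - 1                     # 127 for 8-bit, 32767 for 16-bit
    q = np.clip(np.round(v * scale), -scale, scale)
    return q.astype(np.int8 if bits == 8 else np.int16)

v = np.random.uniform(-1.0, 1.0, size=1024).astype(np.float32)

q8, q16 = quantize(v, 8), quantize(v, 16)
print(v.nbytes, q16.nbytes, q8.nbytes)              # 4096, 2048, 1024 bytes
# 2 bytes per component -> 50% savings; 1 byte per component -> 75% savings.
```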
With a byte (8-bit) quantization level, the number of quantized hypercubes in a 1024-dimensional hyperspace is 256^1024, which is already an enormous number, and the vector distribution would be sparse. With a short (16-bit) quantization level, the number of available hypercubes is even larger. In rare application scenarios, the vectors could be densely clustered. A 16-bit quantization may provide higher resolution for differentiating vectors than an 8-bit quantization. It is a trade-off between storage size and accuracy in searching nearest neighbors. A byte quantization level may work well for lower-dimensional vectors.
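One way to see the resolution side of this trade-off is to quantize and then reconstruct tightly clustered components and compare the round-trip error at each level (a small sketch under the same assumed [-1, 1] range as above):

```python
import numpy as np

# Resolution comparison: quantize, de-quantize, and measure the round-trip error.
# Densely clustered components benefit from the finer 16-bit grid.

v = np.random.normal(loc=0.5, scale=0.01, size=100_000).clip(-1, 1)  # tight cluster

for bits in (8, 16):
    scale = 2 ** (bits - 1) - 1
    restored = np.round(v * scale) / scale           # quantize, then reconstruct
    print(bits, np.abs(v - restored).max())          # max error is about 0.5 / scale
```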
During the creation of a vector store, the second argument of the “vector()” field description (the key definition) can include multiple instances of “DISTANCE_INPUT_QUANTIZATION”, for example “cosine_fraction_byte, hamming_whole_short”. This allows users to specify multiple distance types and quantization levels, although only a single input type is allowed for the same distance and quantization level. A distinct vector data store is maintained for each unique combination of the three types, ensuring that data is organized effectively by these parameters. The input type of the data is either fraction (decimal numbers only) or whole (original data without any conversion).
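To make the naming pattern concrete, the helper below splits such a specification into its three parts. It is purely illustrative, based on the example string above; it is not part of any JaguarDB client API.

```python
# Illustrative parser for the "DISTANCE_INPUT_QUANTIZATION" naming pattern described
# above (e.g. "cosine_fraction_byte, hamming_whole_short"). Not a JaguarDB API call;
# it only shows how each combination breaks down into its three parts.

def parse_keydef(spec: str) -> list[dict]:
    combos = []
    for item in spec.split(","):
        distance, input_type, quantization = item.strip().split("_")
        combos.append({
            "distance": distance,          # e.g. cosine, hamming
            "input": input_type,           # fraction (decimals only) or whole (as-is)
            "quantization": quantization,  # byte, short, or float
        })
    return combos

print(parse_keydef("cosine_fraction_byte, hamming_whole_short"))
# [{'distance': 'cosine', 'input': 'fraction', 'quantization': 'byte'},
#  {'distance': 'hamming', 'input': 'whole', 'quantization': 'short'}]
```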
Here is a list of supported distance metrics and quantization levels.
In a development sandbox environment, developers can choose different quantization levels for the same distance metric for testing purposes. In a production environment, only one quantization level is recommended because each keydef requires a separate instance of the vector index. With a selected distance metric and quantization level, either “fraction” or “whole” can be selected, but not both.