JaguarDB Vector Database Multimodal Search: Integrating Vector Search and Exact Search

Jonathan Yue, PhD
2 min read · Aug 24, 2023

Many applications need to query a dataset for records that both satisfy specific criteria and resemble a given data sample. This requires finding vectors that are close to a query vector and that also meet the stated conditions. JaguarDB handles this in a single step: by combining similarity search with selection criteria, it finds nearest neighbors that satisfy predefined qualifications. Users can select a subset of data records and have each one compared with a reference vector to obtain similarity rankings. Because both similarity and selection are taken into account, the approach reduces the chance of inaccurate matches, which makes it well suited to environments with stringent business requirements.

Combining similarity search with qualification-based selection also makes querying and comparison more efficient. Once the relevant data records have been selected, each one is evaluated against the given vector, producing a similarity ranking. The matched records therefore have the desired attributes and also a measurable degree of resemblance to the reference sample. This matters most in high-stakes scenarios where precision is paramount: by handling similarity and criterion-based filtering together, JaguarDB reduces the potential for inaccuracies and gives industries that demand precise data retrieval and analysis more confidence in their results.

The following similarity search statement is extended with a “where” clause that filters the nearest neighbors of the input query vector:

select 
similarity(v, 'QUERY_VECTOR',
'topk=K,type=DISTANCE_INPUT_QUANTIZATION')
from TABLE
where attribute1 = … and attribute2 = …;

For example:

select similarity(v, '0.1, 0.2, 0.3, 0.4, 0.5, 0.3, 0.1',
'topk=100,type=manhatten_fraction_float')
from vec1
where customer_region='NE' and marriage_status='single';
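
The template can also be filled in programmatically. The following is a minimal Python sketch that only assembles the statement text shown above. The helper name `build_similarity_query` and its parameters are hypothetical and are not part of any JaguarDB client API, and the sketch assumes that filter values are plain strings without quote characters.

```python
def build_similarity_query(table, column, query_vector, topk, dist_type, filters):
    """Assemble a similarity-search statement with an optional where clause.

    Hypothetical helper for illustration only: it produces text that
    follows the statement template in this article and sends nothing
    to a database.
    """
    # Render the query vector as a comma-separated list of numbers.
    vector_text = ", ".join(str(float(x)) for x in query_vector)

    conditions = []
    for name, value in filters.items():
        # Assumption: values are simple strings. Reject embedded quotes
        # rather than guess at the database's escaping rules.
        if "'" in value:
            raise ValueError(f"unsupported character in value for {name}")
        conditions.append(f"{name}='{value}'")

    statement = (
        f"select similarity({column}, '{vector_text}', "
        f"'topk={topk},type={dist_type}') from {table}"
    )
    if conditions:
        statement += " where " + " and ".join(conditions)
    return statement + ";"


query = build_similarity_query(
    table="vec1",
    column="v",
    query_vector=[0.1, 0.2, 0.3, 0.4, 0.5, 0.3, 0.1],
    topk=100,
    dist_type="manhatten_fraction_float",
    filters={"customer_region": "NE", "marriage_status": "single"},
)
print(query)
```

Keeping the vector, the `topk` value, the distance type, and the filter conditions as separate arguments makes it easy to rerun the same search with a larger K or a different set of criteria.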

In this example, topk=100 limits the similarity result to the 100 nearest neighbors, while the “where” clause separately determines which records qualify. The number of qualifying records may be smaller or larger than 100. If fewer than 100 records qualify, every one of them is returned with a similarity value. If more than 100 qualify, some of them may not have an associated similarity value. In that case, users are advised to increase K so that the similarity ranking covers more of the qualifying records.
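
The interplay between K and the size of the qualifying set can be illustrated with a small brute-force sketch in Python. It is not JaguarDB's implementation and makes no claim about the order in which the database evaluates the filter and the ranking; it only reproduces the two cases described above. The record layout, the function names, and the toy data are assumptions made for illustration.

```python
def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def filter_then_rank(records, query, k, predicate):
    """Return (scored, unscored) for the records that satisfy the predicate.

    scored   -- (id, distance) pairs for at most k nearest qualifying records
    unscored -- ids of qualifying records that fall outside the k limit
    """
    matching = [r for r in records if predicate(r)]
    ranked = sorted(matching, key=lambda r: manhattan(r["v"], query))
    scored = [(r["id"], manhattan(r["v"], query)) for r in ranked[:k]]
    unscored = [r["id"] for r in ranked[k:]]
    return scored, unscored


# Toy data (assumed for illustration): six records, four of them in region NE.
records = [
    {"id": 1, "v": [0, 0], "region": "NE"},
    {"id": 2, "v": [1, 0], "region": "SW"},
    {"id": 3, "v": [2, 0], "region": "NE"},
    {"id": 4, "v": [3, 0], "region": "SW"},
    {"id": 5, "v": [4, 0], "region": "NE"},
    {"id": 6, "v": [5, 0], "region": "NE"},
]


def in_ne(record):
    return record["region"] == "NE"


# K smaller than the qualifying set: one NE record is left without a value.
print(filter_then_rank(records, [0, 0], 3, in_ne))

# K at least as large as the qualifying set: every NE record gets a value.
print(filter_then_rank(records, [0, 0], 10, in_ne))
```

With K=3, four records qualify but only three receive a distance, so record 6 is left unscored. Raising K to 10 gives every qualifying record a value, which mirrors the recommendation to increase K.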

Jonathan Yue, PhD

Enthusiast of vector databases, AI, RAG, data science, consensus algorithms, and distributed systems. Initiator and developer of the JaguarDB vector database.