Unlock multimodal search at scale: Combine text & image power with Vertex AI

The way users search is evolving. When searching for a product, users might type in natural-sounding language or search with images. In return, they want tailored results that are specific to their query. To meet these demands, developers need robust multimodal search systems.

In this blog post, we’ll share a powerful approach to building a multimodal search engine using Google Cloud’s Vertex AI platform. We’ll combine the strengths of Vertex AI Search and vector search, using an ensemble method with weighted Reciprocal Rank Fusion (RRF).

Why using a combined approach matters

Think about how you search for products online. Suppose you want to run queries such as “homes with a large backyard” or “white marble countertops”. Some of this information might be stored in text, while other details might only be available in images. When you search for a product, you want the system to look through both modalities.

One approach might be to ask a large language model (LLM) to generate a text description of each image. But this can be cumbersome to manage over time and can add latency for your users. Instead, we can leverage image embeddings and combine the search results with text data in Vertex AI Search.

Google Cloud’s Vertex AI platform provides a comprehensive set of tools for building and deploying machine learning solutions, including powerful search capabilities.

Our ensemble approach: Text + image power

To create our multimodal search engine, we’ll use an ensemble approach that combines the strengths of Vertex AI Search and vector search for images:

  1. Text search with Vertex AI Search:
    • Index your product catalog data (names, descriptions, attributes) into a data store using Vertex AI Agent Builder.
    • When a user enters a text query, Vertex AI Search returns relevant products based on keyword matching, semantic understanding, and any custom ranking rules you’ve defined.
    • Vertex AI Search can also return facets, which can be used for further filtering.
    • You can even visualize how unstructured or complex documents are parsed and chunked.
  2. Image search with vector embeddings:
    • Generate image embeddings for your products using the multimodal embeddings API.
    • Store these embeddings in vector search.
    • When a user uploads an image or enters text, convert it to an embedding and query the vector database to find visually similar product images.
  3. Combining results with weighted RRF:
    • Reciprocal Rank Fusion (RRF): This method merges several ranked lists into one. Each item receives a score of 1 / (k + rank) from every list in which it appears, where k is a smoothing constant, and the scores are summed. Items that rank highly in more than one list rise to the top.
    • Weighted RRF: Assign a weight to each source’s contribution (the text ranking from Vertex AI Search and the image ranking from vector search). This allows you to adjust the importance of each modality (i.e., Vertex AI Search or vector search) in the final ranking.
    • Ensemble: Combine the text and image search results, re-rank them using the weighted RRF score, and present the blended list to the user.
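The fusion step can be sketched in a few lines of Python. This is a minimal illustration, not the exact implementation behind this post: the function name `weighted_rrf`, the product IDs, the weights 0.6 and 0.4, and the constant k = 60 (a commonly used default) are all assumptions chosen for the example. The two input lists stand in for the ranked results you would get back from Vertex AI Search and vector search.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    rankings: dict mapping a source name to a list of IDs, best first.
    weights:  dict mapping the same source names to non-negative weights.
    k:        smoothing constant (assumed default of 60).
    """
    scores = defaultdict(float)
    for source, ranked_ids in rankings.items():
        w = weights.get(source, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) for the items it contains.
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical product IDs standing in for real search responses.
text_hits = ["sku-1", "sku-2", "sku-3"]    # e.g. ranked results from Vertex AI Search
image_hits = ["sku-3", "sku-4", "sku-1"]   # e.g. ranked results from vector search

fused = weighted_rrf(
    {"text": text_hits, "image": image_hits},
    weights={"text": 0.6, "image": 0.4},
)
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks rather than raw scores, the text relevance scores and image similarity distances never need to be normalized onto a common scale before blending.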

To enhance the search experience, use Vertex AI Search’s faceting capabilities so users can filter the blended results.
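As a rough sketch of how a selected facet might be applied to a blended result list, consider the Python below. In practice the facet values and counts would come from the Vertex AI Search response; here the result records, the `color` and `category` fields, and the helper names `facet_counts` and `apply_facets` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical blended results, already ordered by fused rank.
blended = [
    {"id": "sku-1", "color": "white", "category": "countertop"},
    {"id": "sku-3", "color": "black", "category": "countertop"},
    {"id": "sku-2", "color": "white", "category": "tile"},
]


def facet_counts(results, field):
    """Count how many results carry each value of a facet field."""
    return Counter(r[field] for r in results if field in r)


def apply_facets(results, selected):
    """Keep results matching every selected facet value, preserving rank order."""
    return [
        r for r in results
        if all(r.get(field) == value for field, value in selected.items())
    ]


color_counts = facet_counts(blended, "color")
white_only = apply_facets(blended, {"color": "white"})
```

Filtering after fusion keeps the blended order intact, so a user who narrows by facet still sees the items ranked by the combined text and image signal.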

Why this approach works

This approach gives developers the best of both worlds by combining the rich features of Vertex AI Search (for example, the parsing pipeline) with the ability to directly use images as a query. It’s also flexible and customizable because you can adjust the weights in your RRF ensemble and tailor facets to your specific needs.

Above all, this approach gives your users what they need – the ability to search intuitively using text, images, or both, while offering dynamic filtering options for refined results.

Get started with multimodal search

By leveraging the power of Vertex AI and combining text and image search with a robust ensemble method, you can build a highly effective and engaging search experience for your users. Get started: 

  1. Explore Vertex AI: Dive into the documentation and explore the capabilities of Vertex AI Search and embedding generation.
  2. Experiment with embeddings: Test different image embedding models and fine-tune them on your data if needed.
  3. Implement weighted RRF: Design your scoring function and experiment with different weights to optimize your search results.
  4. Natural language query understanding: Use the built-in capabilities of Vertex AI Search to generate filters on structured data from the query, then apply the same filters to vector search.
  5. Filters in vector search: Apply filters to your image embeddings to give users further control.
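To make the weight experimentation in step 3 concrete, here is a small Python sketch of a weight sweep. It assumes you have a handful of labelled queries (ranked text hits, ranked image hits, and a set of known relevant IDs) and it scores each candidate weight by the mean reciprocal rank of the first relevant item. The helper names, the toy data, the candidate weights, and the choice of evaluation metric are all assumptions; substitute whatever relevance metric fits your catalog.

```python
def fuse(text_ids, image_ids, text_weight, k=60):
    """Weighted RRF over two ranked lists; the image weight is 1 - text_weight."""
    scores = {}
    for weight, ranked in ((text_weight, text_ids), (1.0 - text_weight, image_ids)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties are broken by ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank(ranked_ids, relevant):
    """Return 1 / position of the first relevant ID, or 0.0 if none appears."""
    for pos, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / pos
    return 0.0


# Hypothetical labelled queries: (text hits, image hits, relevant IDs).
queries = [
    (["a", "b", "c"], ["c", "d", "a"], {"c"}),
    (["e", "f", "g"], ["g", "f", "h"], {"g"}),
]


def mean_rr(text_weight):
    """Average reciprocal rank across the labelled queries for one weight."""
    total = sum(
        reciprocal_rank(fuse(text, image, text_weight), relevant)
        for text, image, relevant in queries
    )
    return total / len(queries)


candidates = [0.2, 0.5, 0.8]
best_weight = max(candidates, key=mean_rr)
```

On this toy data the image-heavy setting wins only because the relevant items happen to rank first in the image lists; on a real catalog the best weight depends on your queries and should be chosen on a held-out evaluation set.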
