Interactive search with Google Cloud and Elasticsearch

October 30, 2023

The life of a search query

In this architecture, a search query happens in two broad phases.

1. Initial query processing and enrichment

A user initiates the process by submitting a query or question on the search page.

This query, along with other relevant metadata like general geolocation, session information, or any other data deemed significant by the retailer, is then forwarded to Elastic Cloud.

Elastic Cloud uses this query and metadata to search the domain-specific customer data, gathering rich context information in the process:

Relevant real-time data from multiple internal company data sources (ERP, CRM, WMS, BigQuery, GCS, etc.) is continuously indexed into Elastic through its set of integrations, and related embeddings are created via your transformer model of choice or the built-in ELSER. Inference can be applied automatically to every data stream through ingest pipelines.
Embeddings are then stored as dense vectors in Elastic's vector database.
Using the search API, Elasticsearch searches its indices with both text search and vector search within a single call. At the same time, it generates an embedding from the user's query text.
The generated query embedding is compared, via vector search, to the dense vectors of previously ingested data using k-nearest neighbors (kNN).
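
To make the comparison step concrete, here is a minimal sketch of a brute-force kNN lookup using cosine similarity. It is an illustration only: the document IDs and three-dimensional vectors are made up, real embeddings have hundreds of dimensions, and Elasticsearch itself relies on an approximate index rather than scanning every vector as this toy does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def knn(query_vector, indexed_vectors, k):
    """Return the k (doc_id, similarity) pairs closest to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy catalog: doc_id -> embedding stored at ingest time.
catalog = {
    "running-shoes": [0.9, 0.1, 0.0],
    "trail-boots": [0.7, 0.3, 0.1],
    "coffee-maker": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's query text.
query = [1.0, 0.2, 0.0]
top = knn(query, catalog, k=2)
print([doc_id for doc_id, _ in top])
```

With these toy vectors, the two footwear items rank ahead of the coffee maker because their vectors point in nearly the same direction as the query.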

The output combines the semantic and text-search results, ranked with Reciprocal Rank Fusion (RRF) hybrid ranking.
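
As a rough illustration of how RRF merges two result lists, the sketch below scores each document as the sum of 1 / (rank_constant + rank) over every list in which it appears. The function name, the document IDs, and the choice of 60 as the rank constant are assumptions made for this example; in practice Elasticsearch performs the fusion server-side.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (rank_constant + rank)) across the
    lists that contain it, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
text_hits = ["doc-a", "doc-b", "doc-c"]    # from text search
vector_hits = ["doc-b", "doc-d", "doc-a"]  # from vector (kNN) search

fused = reciprocal_rank_fusion([text_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (doc-a and doc-b here) rise to the top, which is the point of hybrid ranking: agreement between text and semantic search is rewarded without having to normalize their raw scores against each other.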

2. Produce results using gen AI

The original query and the newly obtained rich context information are forwarded to Vertex AI Conversation. Conversational AI is a collection of conversational AI tools, solutions, and APIs for both designers and developers. In this design, we use Dialogflow CX in Vertex AI Conversation for the conversational AI and integrate with its API.

When using the Dialogflow CX API, your system needs to:

Build an agent.
Provide a user interface for end users.
Call the Dialogflow API for each conversational turn to send end-user input to the API.
Unless your agent responses are purely static (uncommon), host a webhook service to handle webhook-enabled fulfillment.
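
The webhook step can be sketched as a plain function that takes the parsed JSON body of a webhook request and returns the JSON body of the response. This is a framework-free sketch under several assumptions: the field names (fulfillmentInfo.tag, text, fulfillmentResponse, sessionInfo) follow the Dialogflow CX webhook JSON format as we understand it and should be checked against the API reference, while the "search-context" tag, the last_query parameter, and the lookup_context callable are hypothetical names invented for this example.

```python
def handle_webhook(request_body, lookup_context):
    """Build a webhook response dict from a webhook request dict.

    lookup_context is a hypothetical callable that takes the user's
    text and returns a list of context passages (for example, the
    results of a search against Elastic).
    """
    tag = request_body.get("fulfillmentInfo", {}).get("tag", "")
    user_text = request_body.get("text", "")

    if tag == "search-context":  # hypothetical tag configured on the agent
        passages = lookup_context(user_text)
        if passages:
            reply = " ".join(passages)
        else:
            reply = "Sorry, I could not find anything relevant."
    else:
        reply = "Sorry, I cannot handle that request yet."

    return {
        "fulfillmentResponse": {
            "messages": [{"text": {"text": [reply]}}],
        },
        # Hypothetical session parameter, kept for later turns.
        "sessionInfo": {"parameters": {"last_query": user_text}},
    }


# Usage with a stand-in lookup function instead of a real search call.
def fake_lookup(text):
    return ["Trail boots are in stock in sizes 40 to 46."]


request = {"fulfillmentInfo": {"tag": "search-context"}, "text": "trail boots"}
response = handle_webhook(request, fake_lookup)
print(response["fulfillmentResponse"]["messages"][0]["text"]["text"][0])
```

In a deployment, a function like this would sit behind an HTTPS endpoint that parses the incoming request and serializes the returned dict; keeping the logic in a pure function makes it easy to test without the agent.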

For more details on using the API, please refer to Dialogflow CX API Quickstart.

The Conversational AI module calls the endpoint of the LLM deployed in your Vertex AI tenant to generate the complete response in natural language, merging model knowledge with the private data provided by Elastic. This is achieved by:

Selecting your favorite model from Model Garden
If needed, fine-tuning it on your domain tasks
Deploying the model to an endpoint in your Google Cloud project
Consuming the endpoint from the Dialogflow workflows

To manage data access control and ensure privacy, Vertex AI uses IAM for resource access management. You can regulate access at either the project level or the resource level. For more details, please refer to the Google documentation and to the section below on this step.

Dialogflow makes the chatbot experience actionable with conversational responses, offering the user relevant actions depending on the context (for instance, placing an order or navigating to content).

The response is relayed back to the user.

Elastic Cloud: Build context from enterprise data

When using generative AI, the context window lets you pass additional, user-prompted, real-time, private data to the model at query time, alongside the question you are submitting. This lets users receive better answers, grounded not only in the public knowledge the LLM was trained on but also in the specific domain you provided. Gen AI's effectiveness depends heavily on input engineering, and good context markedly improves the quality of results.
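
A minimal sketch of this idea: retrieved passages are packed into the prompt next to the user's question until a size budget is reached. The function name, the prompt wording, and the character-based budget are all assumptions made for illustration; real context windows are measured in tokens, and the right prompt template depends on the model you deploy.

```python
def build_prompt(question, passages, max_context_chars=2000):
    """Combine retrieved passages and the question into one prompt.

    Passages are assumed to arrive best match first. They are added in
    order until the next one would exceed the character budget, which
    stands in for the model's token-based context limit.
    """
    selected = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_context_chars:
            break
        selected.append(passage)
        used += len(passage)

    context = "\n".join(f"- {passage}" for passage in selected)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical passages, as they might come back from a search.
passages = [
    "Trail boots are in stock in sizes 40 to 46.",
    "Orders placed before noon ship the same day.",
]
prompt = build_prompt("Can I get trail boots by tomorrow?", passages)
print(prompt)
```

Because the passages are ordered by relevance, truncating at the budget drops the least relevant context first.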

Once the user submits a question via your website search box, Elasticsearch digs into your internal knowledge base, searches for related content, and returns it for further processing by the awaiting generative model. Searching for information inside your business, across multiple diverse data sources, is what Elasticsearch is designed for.

The post Interactive search with Google Cloud and Elasticsearch appeared first on AIPressRoom.