Ask your documents: Document AI and PaLM2 for question answering

Documents, physical or digital, contain a goldmine of information—assuming all that data and content can actually be leveraged to help employees to do their jobs. Internal IT and content management teams have long sought to provide knowledge workers with the ability to interact with a document — or better yet, a corpus of documents — without needing to manually dig through them.

This goal went unfulfilled for years because, before generative AI models such as PaLM 2, technologies struggled to provide the contextual understanding required to answer questions across different document types. Today, however, developers can build an “Ask your documents” tool for employees by combining Google Cloud Document AI, text embedding models, and PaLM 2. In this post, we’ll show you how.

Why use Document AI and PaLM2 to build a document Q&A application

Document Question-Answering (Document Q&A) involves extracting information from a given document to answer questions posed in natural language. Use cases for this workflow span a wide variety of industries and domains. For example:

  • Lawyers and legal professionals can use Document Q&A to search through legal documents, statutes, and case law to find relevant information and precedents for their cases.

  • Students and educators can benefit from Document Q&A to better understand concepts in research papers, textbooks, and educational materials.

  • IT support teams can employ Document Q&A to help resolve technical issues by quickly finding information from technical documentation and troubleshooting guides.

Retrieval Augmented Generation (RAG) can help you generate more accurate and informative answers to questions by grounding responses in relevant information retrieved from a knowledge base, such as a vector store. For this task, Document AI OCR (optical character recognition) and PaLM 2 provide powerful capabilities.
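To make the grounding step concrete, here is a minimal sketch of the query-time half of RAG: rank stored passages by similarity to the question, then place the best matches in the prompt ahead of the question. Everything in it is an illustrative assumption rather than the post's implementation: `toy_embed` is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and the prompt wording is hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, toy_embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model by placing retrieved passages ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge base; a real one would hold text extracted from documents.
passages = [
    "The warranty covers hardware defects for 24 months from purchase.",
    "To reset the router, hold the reset button for ten seconds.",
    "Invoices are issued on the first business day of each month.",
]
question = "How do I reset the router?"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt is what would be sent to the language model. Because only the retrieved passage is included, the model's answer is tied to the document content rather than to whatever it memorized during training.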

The solution and architecture proposed in this post provide a serverless, scalable framework for implementing RAG. Here, we’ll focus on Q&A use cases for long documents.
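One way to picture the flow for a long document is as three pluggable stages: text extraction, embedding, and answer generation. The sketch below wires them together with plain Python callables so it runs without any cloud credentials. The class name `DocumentQA`, the paragraph-based chunking, the dot-product scoring, and the three `fake_*` stubs are all assumptions made for illustration; in a real deployment each stub would be replaced by a call to the corresponding managed service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class DocumentQA:
    """Wires three pluggable stages into one Document Q&A flow (illustrative only)."""
    extract_text: Callable[[bytes], str]  # stage 1: OCR / text extraction
    embed: Callable[[str], Vector]        # stage 2: text embedding
    generate: Callable[[str], str]        # stage 3: answer generation
    store: List[Tuple[str, Vector]] = field(default_factory=list)

    def ingest(self, file_bytes: bytes) -> int:
        """Extract text, split it into paragraph chunks, and store (chunk, vector) pairs."""
        text = self.extract_text(file_bytes)
        # Assumed chunking strategy: one chunk per blank-line-separated paragraph.
        chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
        self.store.extend((c, self.embed(c)) for c in chunks)
        return len(chunks)

    def ask(self, question: str, k: int = 1) -> str:
        """Retrieve the k best-matching chunks and pass them to the generator."""
        q = self.embed(question)

        def score(item: Tuple[str, Vector]) -> float:
            # Unnormalized dot product; a simplification of cosine similarity.
            return sum(x * y for x, y in zip(q, item[1]))

        best = sorted(self.store, key=score, reverse=True)[:k]
        context = "\n".join(chunk for chunk, _ in best)
        return self.generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


# --- Stubs standing in for the managed services (hypothetical) ---
VOCAB = ["refund", "shipping", "password"]


def fake_ocr(file_bytes: bytes) -> str:
    return file_bytes.decode("utf-8")


def fake_embed(text: str) -> Vector:
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def fake_generate(prompt: str) -> str:
    return prompt  # echo the prompt so we can inspect what the model would receive


qa = DocumentQA(fake_ocr, fake_embed, fake_generate)
doc = (
    b"Refund requests are accepted within 30 days.\n\n"
    b"Shipping takes 3 to 5 business days.\n\n"
    b"Password resets are sent by email."
)
n_chunks = qa.ingest(doc)
answer = qa.ask("How long does shipping take?")
print(n_chunks)
print(answer)
```

Keeping the stages behind simple function signatures means ingestion and question answering can be scaled or swapped independently, which is the property a serverless design relies on. Splitting the document into chunks matters for long documents because only the relevant portion, not the whole text, is placed in the prompt.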

High-level architecture

For the purpose of this post, we used Document AI, which provides high-quality, enterprise-ready AI document processing models. It’s a fully managed, scalable, and serverless solution capable of processing millions of documents without needing to spin up infrastructure.

More specifically, we used Enterprise Document OCR, a pre-trained model that extracts text and layout information from document files. We also used the textembedding-gecko model from Vertex AI to create text embeddings (vector representations of text). Lastly, we used PaLM 2, specifically the Vertex AI text-bison foundation model, to answer questions over the embedding data store. Below is a diagram of the serverless architecture for Document Q&A with Document AI and PaLM 2 generative AI foundation models: