image created by DALL-E

RAG Hello World for Vertex AI

Mark Ryan

--

Adapt existing LangChain tutorials to get a basic Vertex AI RAG setup

Since ChatGPT burst onto the world a year ago, Retrieval Augmented Generation (RAG) has become a common use case for getting the most out of LLMs. With RAG, you can constrain an LLM to answer questions from a defined set of documents. It's as easy as using LangChain to stitch together a solution that:

- chunks up the documents you want to ground the responses in
- generates embeddings for the chunks
- stores the chunks and their embeddings in a vector store
- uses the vector store to retrieve the chunks that match your input prompt
- uses those matching chunks to create the final prompt that is sent to the LLM

Simple, right? Well, maybe not so simple the first time.
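For orientation, here is a minimal sketch of that pipeline using the classic LangChain interfaces (the 0.0.x releases the tutorials were written against), OpenAI models, and a local Chroma vector store. The file name, model name, and chunking parameters are illustrative placeholders, not values from the tutorials.

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Chunk up the documents you want to ground the responses in.
docs = TextLoader("my_docs.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Generate embeddings for the chunks and store both in a vector store.
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Use the vector store to fetch chunks that match the input prompt,
# then fold those chunks into the final prompt sent to the LLM.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(),
)
print(qa_chain.run("What do these documents say about deployment?"))
```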

The good news is that the LangChain documentation includes two easy-to-follow tutorials that show you how to build exactly this kind of pipeline.

The bad news is that both of these tutorials assume you are using an OpenAI LLM. To create a Hello World RAG application with Vertex AI, I adapted these two tutorials. In this article, I describe how I adapted them to get a simple RAG example that uses Vertex AI rather than OpenAI for the LLM and embedding portions of the application.
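To give a flavor of the adaptation, the sketch below shows the kind of substitutions involved, assuming the classic LangChain Vertex AI integrations (which require the google-cloud-aiplatform package and authenticated Application Default Credentials). The model names shown are the Vertex AI defaults of that era.

```python
# Assumes `gcloud auth application-default login` has already been run.
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI

# Swap OpenAIEmbeddings for Vertex AI's embedding model
# (textembedding-gecko by default)...
embeddings = VertexAIEmbeddings()

# ...and ChatOpenAI for a Vertex AI text model.
llm = VertexAI(model_name="text-bison")

# Everything else in the pipeline stays the same: pass these objects to
# the same vector store and chain constructors as before.
```

This is exactly the point of LangChain's interchangeable LLM and embedding interfaces: the chunking, vector store, and chain code is provider-agnostic, so only the model objects need to change.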

--

Mark Ryan

Technical writing manager at Google. Opinions expressed are my own.