Adapt existing LangChain tutorials to get a basic Vertex AI RAG setup
Since ChatGPT burst onto the world a year ago, Retrieval Augmented Generation (RAG) has become a common technique for getting the most out of LLMs. With RAG, you can constrain LLMs to answer questions from a defined set of documents. It’s as easy as using LangChain to stitch together a solution that chunks up the documents you want to ground the responses in, generates embeddings for the chunks, stores the chunks and their embeddings in a vector store, uses the vector store to find the chunks that match your input prompt, and then uses those matching chunks to build a final prompt to send to the LLM. Simple, right? Well, maybe not so simple the first time.
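To make that flow concrete, here is a minimal sketch of such a pipeline in LangChain, roughly as the tutorials set it up. It is illustrative only: the WebBaseLoader, the Chroma vector store, the placeholder URL, and the particular packages (langchain-openai, langchain-community, langchain-text-splitters) are my assumptions, and exact import paths vary between LangChain versions.

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA

# 1. Load and chunk the documents that should ground the answers.
docs = WebBaseLoader("https://example.com/my-docs").load()  # placeholder URL
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# 2. Embed the chunks and store chunks + embeddings in a vector store.
vector_store = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 3. Retrieve the chunks that match the input prompt and have the LLM
#    answer from them.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vector_store.as_retriever(),
)
print(qa.invoke({"query": "What do these documents say about pricing?"}))
```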
The good news is that the LangChain documentation includes two easy-to-follow tutorials that show you how to:
The bad news is that both of these tutorials assume you are using an OpenAI LLM. To create a Hello World RAG application on Vertex AI, I adapted these two tutorials. In this article I describe how I adapted them to get a simple RAG example that uses Vertex AI rather than OpenAI for the LLM and embedding portions of the application.
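As a preview, the heart of the adaptation is swapping the OpenAI chat model and embedding classes for their Vertex AI counterparts; everything else in the pipeline (the splitter, the vector store, the retrieval chain) can stay the same. The sketch below assumes the langchain-openai and langchain-google-vertexai integration packages and the gemini-pro / textembedding-gecko model names; your LangChain version and model choices may differ.

```python
# The LangChain tutorials build the chain around OpenAI classes:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()

# The Vertex AI adaptation swaps in the equivalent classes from the
# langchain-google-vertexai package (an assumption; older LangChain
# versions expose them under langchain.chat_models / langchain.embeddings):
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings

llm = ChatVertexAI(model_name="gemini-pro")
embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")
```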