Integrating AWS Bedrock and MongoDB - Part 1 - RAG

Benjamin Wootton
Taylor Sainsbury

Introduction

In this series of articles, we aim to demonstrate two different ways of integrating AWS Bedrock with MongoDB Atlas, the cloud-hosted version of MongoDB.

Our aim is to explore and demonstrate how AI solutions such as chatbots, agents, and other automation processes could be built against MongoDB, and more generally against live transactional, OLTP-style databases.

In this first article, we will demonstrate how to create a knowledge base in AWS Bedrock and use retrieval augmented generation (RAG) to access unstructured data which is vectorised and hosted in MongoDB Atlas. Our aim is to ask natural language questions from a bespoke frontend, route them through AWS Bedrock, and use data stored in MongoDB to enhance the answers.

In the next article we will demonstrate how we can give an AWS Bedrock autonomous agent access to a MongoDB database as a tool using a similar pattern to that shown in this article.

What is RAG?

Retrieval Augmented Generation (RAG) is a pattern where we give a large language model (LLM) access to an external dataset which can be used at inference time.

By doing this, a foundation model (or generalised LLM) can be enhanced with data that is specific to your business without any additional fine-tuning or training. The LLM can then use this data for reasoning or as part of its response.

A typical example use case for RAG might be a customer support chatbot, where the LLM is given access to support documents which can be used when forming responses. A typical interaction might look like this:

User: "Please can you confirm your typical delivery times and which couriers you use?"
Chatbot: "Our orders are typically dispatched 24 hours after the order is placed and arrive with you within 2 business days. The only courier we use is DHL."

Why Bedrock and MongoDB?

We have been fans of AWS Bedrock since its early release, as discussed here in December 2023. Its ability to switch between models through a single API, its abstractions such as knowledge bases and agents, and the reliability and security of AWS make it a very appealing enterprise platform for building and deploying AI solutions in this rapidly evolving space.

We chose MongoDB for this example because it is a good example of a modern, cloud-hosted OLTP-style database of the kind that backs many applications. Rather than introducing a specialised vector database, we believe that many businesses could simply reuse their transactional database for vector search use cases, keeping their infrastructure simple.

Since Q2 2024, AWS Bedrock has supported MongoDB Atlas as a vector store for knowledge bases within the US regions. This means Atlas Vector Search can be used to store your vectorised data for RAG purposes and then connected directly to your AWS Bedrock powered applications. The integration works out of the box and is accessible via a point-and-click UI in the AWS Bedrock console.

Architecture

Our architecture involves storing documents on AWS S3 in PDF format. We will define an AWS Bedrock knowledge base which is responsible for vectorising these documents and storing the vectors in MongoDB Atlas. We will then run our frontend code on an EC2 instance, which will connect via the AWS Bedrock API. Any call to this API includes a knowledge base ID, which associates the LLM with the knowledge base. The architecture can be visualised as follows:

[Architecture diagram]
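Later in the walkthrough we will call the knowledge base from code. As a taste of what that looks like, here is a minimal sketch using boto3; the knowledge base ID and model ARN are placeholders you will obtain as you follow the steps below:

import boto3

# Bedrock knowledge bases are queried through the agent runtime API.
runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = runtime.retrieve_and_generate(
    input={"text": "What do customers say about the down jacket?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123456",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])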

Walkthrough

We will now walk through the process of configuring the RAG system using MongoDB with AWS Bedrock.

The dataset that we will use consists of a series of product reviews. You can download this zip file for a complete set of test data. Three example product reviews are shown below:

Trendy Oversized Sweater
"Obsessed with this oversized sweater! It’s incredibly cozy and looks so stylish with leggings or skinny jeans. The knit pattern is beautiful, and it hasn’t pilled after multiple washes. Definitely my go-to for chilly days."
Stretchy Yoga Pants
"I love these yoga pants so much I ordered two more pairs! The high waistband stays in place during workouts, and the fabric is stretchy yet supportive. They’re squat-proof and have a hidden pocket, which is such a bonus. Highly recommend for fitness enthusiasts!"
Warm Down Jacket
"This down jacket has kept me toasty in freezing temperatures! It’s lightweight but incredibly warm, and the water-resistant material is perfect for snowy days. The zippered pockets are a great touch for keeping essentials safe. Worth every penny!"

Configure a Database and Collection in MongoDB Atlas

From within your MongoDB cluster, create a database and input the desired database name and collection name. We will use a set of product reviews as our example dataset, so product_reviews would be a good name for the collection:

[Screenshot: creating the database and collection in MongoDB Atlas]
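If you prefer to script this step rather than use the Atlas UI, a minimal PyMongo sketch might look like this (the connection string is a placeholder, and bedrock_rag is an assumed database name):

from pymongo import MongoClient

# Placeholder connection string; use your own Atlas cluster's SRV URI.
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net/")

# Creating the collection also creates the database if it does not exist.
db = client["bedrock_rag"]
collection = db.create_collection("product_reviews")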

Configure an Atlas Vector Search Index

Once your database and collection have been created, navigate to Atlas Search and create a vector search index.

[Screenshot: creating the Atlas Vector Search index]

The vector search index is configured using JSON. Enter the following definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}

To explain this block: type should always be vector; path is the name of the field in which embeddings will be stored; numDimensions is the number of dimensions in each embedding vector and must match the output dimension of the embedding model you choose later; and similarity is the vector similarity function used to search for the top k nearest neighbours.
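The index can also be created programmatically. Below is a hedged sketch using PyMongo's Atlas Search index helpers, assuming a recent PyMongo version and an assumed index name of vector_index, which we will reuse when configuring Bedrock:

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net/")
collection = client["bedrock_rag"]["product_reviews"]

# Mirror the JSON definition above: a 1024-dimension cosine index
# on the "embedding" field.
index = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1024,
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",  # assumed index name; note it for the Bedrock setup
    type="vectorSearch",
)
collection.create_search_index(model=index)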

Upload a Set of Documents to S3

Download this zip file which contains a number of product reviews stored in PDF format. Create a bucket on S3 to hold the files and make sure to note the URI. Upload the supplied documents to the bucket.

Be aware of the region in which you are creating your S3 bucket. We recommend a US region such as us-east-1 as this is where the MongoDB integration is supported.
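As a sketch, the bucket creation and upload could be scripted with boto3 as follows (the bucket name and local folder are placeholders):

import boto3
from pathlib import Path

s3 = boto3.client("s3", region_name="us-east-1")

bucket = "my-bedrock-kb-reviews"  # placeholder bucket name
s3.create_bucket(Bucket=bucket)   # us-east-1 needs no LocationConstraint

# Upload every PDF from the unzipped review set.
for pdf in Path("product_reviews").glob("*.pdf"):
    s3.upload_file(str(pdf), bucket, pdf.name)

print(f"Data source URI: s3://{bucket}")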

Create an AWS Secrets Manager Secret

In order to connect to the MongoDB cluster, you will need to store credentials in the following format in Secrets Manager. Insert your MongoDB username and password as appropriate.

{
  "username": "<your-cluster-username>",
  "password": "<your-cluster-password>"
}

Save this as a secret in AWS Secrets Manager and note the secret ARN for future use.
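The same secret can be created with boto3; a minimal sketch (the secret name is a placeholder):

import json
import boto3

sm = boto3.client("secretsmanager", region_name="us-east-1")

secret = sm.create_secret(
    Name="mongodb-atlas-bedrock",  # placeholder secret name
    SecretString=json.dumps({
        "username": "<your-cluster-username>",
        "password": "<your-cluster-password>",
    }),
)

# Bedrock needs this ARN when the knowledge base is created.
print(secret["ARN"])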

Define the Knowledge Base in Bedrock

As previously mentioned, the MongoDB integration is only available to Bedrock in the US regions, so you will need to be working in a permitted region before creating your knowledge base.

Create an AWS Bedrock knowledge base using the AWS Console by navigating to Bedrock → Builder Tools → Knowledge Bases and hitting Create knowledge base. This will take you through a series of steps to create and configure your knowledge base.

[Screenshot: the Create knowledge base flow in the Bedrock console]

In Step 1 of the knowledge base configuration, you will input a name and description, set the data source type to Amazon S3, and assign an IAM role for Bedrock execution.

In Step 2, you will add the details of the data source. Use the S3 URI that you noted earlier.

In Step 3, you will select your desired embeddings model; you should have a choice between several Amazon and Cohere models. The model's vector dimension must match the numDimensions configured earlier in the Atlas Vector Search index.

Note that you might need to navigate to Foundation models → Base models within Bedrock to enable access to particular models.

In Step 4, you will choose a vector database to hold the embeddings; this is where we select MongoDB Atlas.

[Screenshot: selecting MongoDB Atlas as the vector database]

In order to configure your vector database from the AWS Console you will need to input the following fields, all of which you will have noted in previous steps:

  • Hostname
  • Database name
  • Collection name
  • Credentials secret ARN

In order to configure your Metadata Field Mapping you will need to input the following fields:

  • Vector search index name
  • Vector embedding field path
  • Text field path
  • Metadata field path

Once all the relevant fields have been populated, you will be prompted to review and create your knowledge base. It will be ready for use when its status is marked as 'Available'. (If you prefer to script this step, see the sketch after the note below.)

Note that you may have to amend the Network Access settings on your MongoDB Atlas cluster to allow connections from AWS; without network access, the knowledge base will fail to create.
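For those who would rather script the whole knowledge base creation, the following boto3 sketch shows roughly what it looks like, assuming placeholder ARNs and the names used earlier in this walkthrough; treat the exact configuration as indicative rather than definitive:

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

kb = bedrock_agent.create_knowledge_base(
    name="product-reviews-kb",
    roleArn="arn:aws:iam::123456789012:role/BedrockKbRole",  # placeholder role
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            # Assumed 1024-dimension embedding model, matching the Atlas index.
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                                 "amazon.titan-embed-text-v2:0",
        },
    },
    storageConfiguration={
        "type": "MONGO_DB_ATLAS",
        "mongoDbAtlasConfiguration": {
            "endpoint": "cluster0.example.mongodb.net",  # placeholder hostname
            "databaseName": "bedrock_rag",
            "collectionName": "product_reviews",
            "vectorIndexName": "vector_index",
            "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:"
                                    "123456789012:secret:mongodb-atlas-bedrock",  # placeholder
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)

print(kb["knowledgeBase"]["knowledgeBaseId"])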

Sync your Knowledge Base Data

Now that your knowledge base is created, you need to sync the data. This is where Bedrock orchestrates the vectorisation of the target data in S3 and indexes it in your configured Atlas Vector Search index. Find and select your data source within your knowledge base and hit Sync to run the process.

[Screenshot: syncing the data source]

The initial sync process can take a minute or two, even with a relatively small set of documents.
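The sync can also be triggered programmatically. A sketch, assuming you have the knowledge base ID and a data source ID (returned by the console, or by create_data_source if you scripted that step):

import time
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

kb_id = "KB123456"  # placeholder knowledge base ID
ds_id = "DS123456"  # placeholder data source ID

job = bedrock_agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
job_id = job["ingestionJob"]["ingestionJobId"]

# Poll until the ingestion (vectorise and index) completes.
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
    )["ingestionJob"]["status"]
    print(status)
    if status in ("COMPLETE", "FAILED"):
        break
    time.sleep(10)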

Test The Integration Via The AWS Console

We can now test the integration within the AWS Bedrock console to check that we have connectivity between Bedrock and MongoDB. In this instance, we asked for a recommendation for a coat based on the product reviews, and the recommendation comes back sourced from our product reviews.

[Screenshot: testing the knowledge base in the Bedrock console]
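The equivalent check can be run from code by retrieving raw chunks from the knowledge base; a minimal sketch (the knowledge base ID is a placeholder):

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = runtime.retrieve(
    knowledgeBaseId="KB123456",  # placeholder
    retrievalQuery={"text": "Which coat do customers recommend for cold weather?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)

# Each result is a review chunk pulled back from MongoDB Atlas.
for result in response["retrievalResults"]:
    print(result["content"]["text"])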

Exposing The Knowledge Base Through A UI

Our final step is to integrate the solution into a bespoke frontend, website, or application. In the example below, we interact with the system via a conversational UI, but this could be any type of user experience.
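As a minimal sketch of such a conversational UI, the loop below reuses the sessionId returned by Bedrock so that follow-up questions keep their context (the knowledge base ID and model ARN are placeholders):

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

config = {
    "type": "KNOWLEDGE_BASE",
    "knowledgeBaseConfiguration": {
        "knowledgeBaseId": "KB123456",  # placeholder
        "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                    "anthropic.claude-3-sonnet-20240229-v1:0",
    },
}

session_id = None
while True:
    question = input("You: ")
    kwargs = {"input": {"text": question}, "retrieveAndGenerateConfiguration": config}
    if session_id:
        kwargs["sessionId"] = session_id  # carry conversation context forward
    response = runtime.retrieve_and_generate(**kwargs)
    session_id = response["sessionId"]
    print("Bot:", response["output"]["text"])

This completes the RAG integration. In the next article, we will build on this pattern by giving an AWS Bedrock autonomous agent access to MongoDB as a tool.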
