This application uses OpenAI embeddings and MongoDB Atlas Vector Search to run semantic search over a movies dataset.
This application performs semantic searches, returning results that are contextually similar to the input query rather than relying solely on keyword matching.
The general workflow for this application is illustrated in the diagrams below:
For more details, check out these resources:
- Article: What are Vector Databases? - by MongoDB
- January 2025 - Slides: Using MongoDB Atlas Vector Search for AI semantic search - by Leonardo Gomes
- April 2025 - Slides: Smarter Movie Picks with MongoDB Atlas Vector Search - by Leonardo Gomes
- JavaScript/TypeScript: JavaScript is a versatile programming language commonly used for web development. TypeScript is a superset of JavaScript that adds static types, enhancing code quality and maintainability.
- React: A JavaScript library, created by Facebook, for building user interfaces (UI).
- Next.js: A React framework that enables server-side rendering and static site generation, providing a powerful toolset for building modern web applications.
- NestJS: A progressive Node.js framework for building efficient, reliable, and scalable server-side applications. It uses TypeScript and incorporates strong typing and modular architecture.
- REST API: A REST API is an application programming interface (API) that follows the design principles of the REST architectural style.
- OpenAI Embeddings API: An API provided by OpenAI to generate embeddings, which are numerical representations of text used for various AI applications, including semantic search and natural language processing.
- MongoDB Atlas: MongoDB Atlas simplifies cloud data hosting and management.
- MongoDB Atlas Vector Search: Use Atlas Vector Search to query data in Atlas based on semantic meaning, not just keywords. This enhances AI-powered applications with semantic, hybrid, and generative search, including RAG.
- MongoDB Driver for Node.js: The official MongoDB driver for Node.js, providing a native interface to interact with MongoDB databases from Node.js applications.
- MongoDB Compass: Compass is a free tool for querying, optimizing, and analyzing MongoDB data, offering insights and a drag-and-drop pipeline builder.
- Node.js version 23.6.1 or higher
- OpenAI account and API key
- MongoDB Atlas account
- Sign up or log in to your account at OpenAI.
- Go to API Keys.
- Use the "Create new secret key" button and copy the API key.
- Go to MongoDB Atlas.
- Create a new project (e.g., "Movies Flix") and use the "Create a Cluster" option:
- Select the Free tier (M0) and preload the sample dataset:
- Use the "Create Deployment" button and wait for the cluster to be created.
- Manage database access:
- Copy the database username and password.
- Navigate to SECURITY > Database Access to manage users.
- Navigate to SECURITY > Network Access to manage IP addresses (you may temporarily allow `0.0.0.0/0`, which admits any IP address, for development).
- Connect to your database using MongoDB Compass:
- Navigate to the `sample_mflix.movies` collection, use the "Open MongoDB shell" button on the top-right, and run this command to clean up part of the data:
db["movies"].deleteMany({ $or: [{ "runtime": { $exists: false } }, { "genres": { $exists: false } }, { "plot": { $exists: false } }, { "directors": { $exists: false } }, { "poster": { $exists: false } }, { "cast": { $exists: false } }, { "languages": { $exists: false } }] })
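The cleanup command above deletes every movie that is missing at least one of seven fields. As a minimal sketch, the same filter can be built in TypeScript from a list of field names, which makes it easier to review or reuse. The names `REQUIRED_FIELDS` and `buildMissingFieldsFilter` are illustrative and not part of this project.

```typescript
// Field names copied from the shell command above.
const REQUIRED_FIELDS = [
  "runtime", "genres", "plot", "directors", "poster", "cast", "languages",
] as const;

type MissingFieldClause = Record<string, { $exists: false }>;

// Builds { $or: [{ <field>: { $exists: false } }, ...] } for the given fields.
// A document matches when at least one of the fields is absent.
function buildMissingFieldsFilter(
  fields: readonly string[],
): { $or: MissingFieldClause[] } {
  return {
    $or: fields.map((field) => ({ [field]: { $exists: false as const } })),
  };
}

const filter = buildMissingFieldsFilter(REQUIRED_FIELDS);
console.log(JSON.stringify(filter, null, 2));
```

With the MongoDB Driver for Node.js, a filter like this could be passed to `countDocuments(filter)` first to see how many movies would be removed, and then to `deleteMany(filter)`.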
- In MongoDB Atlas, go to Atlas Search > Go to Atlas Search:
- Click "Create Search Index" > Atlas Vector Search JSON Editor > Next:
- Database and Collection: Select `sample_mflix` > `movies`
- Index Name: `vectorsearch`
- Use the following JSON definition and click "Next" > "Create Vector Search Index":
{ "fields": [ { "numDimensions": 1536, "path": "embeddings", "similarity": "cosine", "type": "vector" } ] }
In this definition, `numDimensions` is the number of dimensions produced by the text-embedding-ada-002 model, `path` is the field name in the database collection, and `similarity` is one of the similarity algorithms supported by MongoDB Atlas Vector Search.
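Because the index is declared with 1536 dimensions, a vector of any other length stored in `embeddings` will not match the index definition. The following sketch mirrors the definition as a TypeScript object and checks a candidate vector against it before it is written. The interface and the `matchesIndex` helper are assumptions made for illustration, not code from this repository.

```typescript
// Mirrors one entry of "fields" in the index definition above.
interface VectorIndexField {
  type: "vector";
  path: string;
  numDimensions: number;
  similarity: "cosine" | "euclidean" | "dotProduct";
}

const indexField: VectorIndexField = {
  type: "vector",
  path: "embeddings",
  numDimensions: 1536,
  similarity: "cosine",
};

// True only for an array of finite numbers whose length equals numDimensions.
function matchesIndex(value: unknown, field: VectorIndexField): value is number[] {
  return (
    Array.isArray(value) &&
    value.length === field.numDimensions &&
    value.every((x) => typeof x === "number" && Number.isFinite(x))
  );
}

const candidate = new Array<number>(1536).fill(0.1);
console.log(matchesIndex(candidate, indexField));
console.log(matchesIndex([0.1, 0.2, 0.3], indexField));
```

A check like this is cheap to run right after the embeddings call and catches a model or configuration mismatch early.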
- Run `cd server && cp -v .env.example .env`
- Update the environment variables in the `.env` file.
- If using nvm, run: `nvm use`
- Run `npm install`
- Run `npm run start:dev`
- Run `cd client && cp -v .env.example .env`
- Update the environment variables in the `.env` file.
- If using nvm, run: `nvm use`
- Run `npm install`
- Run `npm run dev`
To add embeddings data to the `sample_mflix.movies` collection, run:
curl -X POST http://localhost:3001/movies/createEmbeddings -H "Content-Type: application/json" -d '{}'
This command uses the OpenAI Embeddings API to populate the `embeddings` field for each movie. The embedding data input includes:
- Type
- Title
- Plot
- Genres
- Cast
- Directors
- Languages
- Runtime
- IMDb (rating and votes)
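The exact text the server sends to the embeddings API is not shown here, so the following is only a sketch of how the fields listed above might be combined into a single input string. The `MovieDoc` interface, the `buildEmbeddingInput` function, the "Label: value" layout, and the sample movie are all assumptions made for illustration.

```typescript
// Subset of a sample_mflix movie; every field is optional in this sketch.
interface MovieDoc {
  type?: string;
  title?: string;
  plot?: string;
  genres?: string[];
  cast?: string[];
  directors?: string[];
  languages?: string[];
  runtime?: number;
  imdb?: { rating?: number; votes?: number };
}

// Joins the listed fields into one "Label: value" line each, skipping empty ones.
function buildEmbeddingInput(movie: MovieDoc): string {
  const imdb = movie.imdb;
  const parts: Array<[string, string | undefined]> = [
    ["Type", movie.type],
    ["Title", movie.title],
    ["Plot", movie.plot],
    ["Genres", movie.genres?.join(", ")],
    ["Cast", movie.cast?.join(", ")],
    ["Directors", movie.directors?.join(", ")],
    ["Languages", movie.languages?.join(", ")],
    ["Runtime", movie.runtime !== undefined ? String(movie.runtime) : undefined],
    [
      "IMDb",
      imdb && imdb.rating !== undefined
        ? `rating ${imdb.rating}, ${imdb.votes ?? 0} votes`
        : undefined,
    ],
  ];
  return parts
    .filter(([, value]) => value !== undefined && value !== "")
    .map(([label, value]) => `${label}: ${value}`)
    .join("\n");
}

// Invented sample document, not taken from the dataset.
const sample: MovieDoc = {
  type: "movie",
  title: "Example Title",
  plot: "A dog saves a town.",
  genres: ["Family", "Adventure"],
  runtime: 95,
  imdb: { rating: 7.1, votes: 1200 },
};

console.log(buildEmbeddingInput(sample));
```

Putting descriptive fields such as plot and genres into the input is what lets a query like "dog hero movies" land near movies that never use those exact words.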
After running this command, check the new `embeddings` field using MongoDB Compass:
Check the status of your Atlas Search index:
- Go to http://localhost:3000:
- Enter a search query and hit the "Search" button. For example, searching for `dog hero movies` returns movies with descriptions containing `golden retriever`, `puppies`, etc., due to vector similarity:
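To make "vector similarity" concrete, here is a small sketch of cosine similarity, the measure selected in the index definition, ranking a few plots against a query. The 3-dimensional vectors and plot texts are invented for illustration; real embeddings from text-embedding-ada-002 have 1536 dimensions, and Atlas performs this comparison inside the index rather than in application code.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Values near 1 mean similar direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have the same non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Invented toy vectors; the query stands in for "dog hero movies".
const queryVector = [0.9, 0.1, 0.0];
const plots: Array<{ plot: string; embeddings: number[] }> = [
  { plot: "courtroom drama", embeddings: [0.0, 0.1, 0.9] },
  { plot: "golden retriever rescues a family", embeddings: [0.8, 0.2, 0.1] },
  { plot: "puppies find a home", embeddings: [0.7, 0.3, 0.0] },
];

// Score every plot against the query and sort from most to least similar.
const ranked = plots
  .map((p) => ({ plot: p.plot, score: cosineSimilarity(queryVector, p.embeddings) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked.map((r) => r.plot));
```

The two dog-related plots rank above the unrelated one even though none of them contains the words "dog" or "hero", which is the behavior the search example describes.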
- Input Query: The user inputs a search query, such as "dog hero movies".
- Generate Embeddings: The application uses the OpenAI Embeddings API to convert the input query into a vector representation.
- API Request:
{ "input": "dog hero movies", "model": "text-embedding-ada-002" }
- API Response:
{ "object": "list", "data": [ { "object": "embedding", "index": 0, "embedding": [ /* vector data */] } ], "model": "text-embedding-ada-002", "usage": { "prompt_tokens": 3, "total_tokens": 3 } }
- Vector Search: The generated vector is used to query the MongoDB Atlas Vector Search index.
- MongoDB Query:
[ { "$vectorSearch": { "queryVector": [/* vector data */], "path": "embeddings", "numCandidates": 100, "index": "vectorsearch", "limit": 100 } } ]
- Retrieve Results: MongoDB returns the most relevant documents based on vector similarity, which are then displayed to the user.
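The steps above can be sketched as three small TypeScript functions: one builds the embeddings request body, one reads the vector out of the response, and one builds the `$vectorSearch` pipeline. The function and interface names are hypothetical, and the network calls are left out so that the snippet runs on its own; the model name, index name, path, and limits are taken from the examples above.

```typescript
const EMBEDDING_MODEL = "text-embedding-ada-002";

// Only the parts of the embeddings response that this sketch reads.
interface EmbeddingResponse {
  data: Array<{ index: number; embedding: number[] }>;
}

interface VectorSearchStage {
  $vectorSearch: {
    queryVector: number[];
    path: string;
    numCandidates: number;
    index: string;
    limit: number;
  };
}

// Step 2: request body for the embeddings endpoint.
function buildEmbeddingRequest(input: string): { input: string; model: string } {
  return { input, model: EMBEDDING_MODEL };
}

// Step 2: pull the first embedding vector out of the response.
function extractEmbedding(response: EmbeddingResponse): number[] {
  const first = response.data[0];
  if (!first) throw new Error("embedding response contained no data");
  return first.embedding;
}

// Step 3: aggregation pipeline for the "vectorsearch" index.
function buildVectorSearchPipeline(
  queryVector: number[],
  numCandidates = 100,
  limit = 100,
): VectorSearchStage[] {
  // limit must not exceed numCandidates.
  if (limit > numCandidates) throw new Error("limit must be <= numCandidates");
  return [
    { $vectorSearch: { queryVector, path: "embeddings", numCandidates, index: "vectorsearch", limit } },
  ];
}

// Stand-in for a real API response; the vector values are placeholders.
const mockResponse: EmbeddingResponse = {
  data: [{ index: 0, embedding: [0.01, -0.02, 0.03] }],
};

const body = buildEmbeddingRequest("dog hero movies");
const pipeline = buildVectorSearchPipeline(extractEmbedding(mockResponse));
console.log(JSON.stringify(body), JSON.stringify(pipeline));
```

In a running server, the request body would be sent to the OpenAI embeddings endpoint with the API key, and the pipeline would be passed to `aggregate()` on the `movies` collection through the MongoDB Driver for Node.js. A `$project` stage with `{ $meta: "vectorSearchScore" }` could be appended to expose each result's score, though the query shown above does not include one.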