This project enables semantic search of movies using natural language queries. It leverages the OpenAI Embeddings API to generate vector representations of movie descriptions and MongoDB Atlas Vector Search to perform efficient similarity searches based on user input.

leogomesdev/moviesflix

Movies Flix

This application combines OpenAI embeddings with MongoDB Atlas Vector Search to run vector searches over movie data.

This application performs semantic searches, returning results that are contextually similar to the input query rather than relying solely on keyword matching.

Movies page with results

System Architecture

The general workflow for this application is illustrated in the diagrams below:

Searching Based on a Given Prompt:

System architecture for searching

Generating Embedding Vectors for Existing Data:

System architecture to add embeddings to the existing dataset

For more details, check out these resources:

  1. Article: What are Vector Databases? - by MongoDB
  2. January 2025 - Slides: Using MongoDB Atlas Vector Search for AI semantic search - by Leonardo Gomes
  3. April 2025 - Slides: Smarter Movie Picks with MongoDB Atlas Vector Search - by Leonardo Gomes

Technologies

  • JavaScript/TypeScript: JavaScript is a versatile programming language commonly used for web development. TypeScript is a superset of JavaScript that adds static types, enhancing code quality and maintainability.
  • React: A JavaScript library, created by Facebook, for building user interfaces (UI).
  • Next.js: A React framework that enables server-side rendering and static site generation, providing a powerful toolset for building modern web applications.
  • NestJS: A progressive Node.js framework for building efficient, reliable, and scalable server-side applications. It uses TypeScript and incorporates strong typing and modular architecture.
  • REST API: A REST API is an application programming interface (API) that follows the design principles of the REST architectural style.
  • OpenAI Embeddings API: An API provided by OpenAI to generate embeddings, which are numerical representations of text used for various AI applications, including semantic search and natural language processing.
  • MongoDB Atlas: MongoDB Atlas simplifies cloud data hosting and management.
  • MongoDB Atlas Vector Search: Use Atlas Vector Search to query data in Atlas based on semantic meaning, not just keywords. This enhances AI-powered applications with semantic, hybrid, and generative search, including RAG.
  • MongoDB Driver for Node.js: The official MongoDB driver for Node.js, providing a native interface to interact with MongoDB databases from Node.js applications.
  • MongoDB Compass: Compass is a free tool for querying, optimizing, and analyzing MongoDB data, offering insights and a drag-and-drop pipeline builder.

Requirements

  • Node.js version 23.6.1 or higher
  • OpenAI account and API key
  • MongoDB Atlas account

Setup Instructions

OpenAI Account Setup

  1. Sign up or log in to your account at OpenAI.
  2. Go to API Keys.
  3. Use the "Create new secret key" button and copy the API key.

MongoDB Cluster Setup

  1. Go to MongoDB Atlas.
  2. Create a new project (e.g., "Movies Flix") and use the "Create a Cluster" option.
  3. Select the Free tier (M0) and preload the sample dataset.
  4. Use the "Create Deployment" button and wait for the cluster to be created.
  5. Manage database access:
    1. Copy the database username and password.
    2. Navigate to SECURITY > Database Access to manage users.
    3. Navigate to SECURITY > Network Access to manage IP addresses (for development, you may temporarily allow 0.0.0.0/0, which permits access from any IP address).
  6. Connect to your database using MongoDB Compass: after the success message, use the "Connect" button.
  7. Navigate to the sample_mflix.movies collection, use the Open MongoDB shell button on the top-right, and run this command to clean up part of the data:
db["movies"].deleteMany({ $or: [{ "runtime": { $exists: false } }, { "genres": { $exists: false } }, { "plot": { $exists: false } }, { "directors": { $exists: false } }, { "poster": { $exists: false } }, { "cast": { $exists: false } }, { "languages": { $exists: false } }] })
  8. Drop other sample_* databases to save storage space.
  9. Refresh to check the number of documents.
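
The cleanup filter in step 7 can also be built programmatically. The following is a minimal sketch, not code from this repository: the helper name `buildMissingFieldsFilter` is hypothetical, and only the field names come from the command above.

```typescript
// Fields a movie document must have to be kept (taken from the cleanup command above).
const requiredFields = [
  "runtime",
  "genres",
  "plot",
  "directors",
  "poster",
  "cast",
  "languages",
] as const;

type MissingFieldClause = Record<string, { $exists: false }>;

// Hypothetical helper: builds a filter that matches any document
// lacking at least one of the given fields.
function buildMissingFieldsFilter(
  fields: readonly string[],
): { $or: MissingFieldClause[] } {
  return {
    $or: fields.map((field) => ({ [field]: { $exists: false as const } })),
  };
}

const cleanupFilter = buildMissingFieldsFilter(requiredFields);
console.log(JSON.stringify(cleanupFilter, null, 2));
```

With the MongoDB Driver for Node.js, a filter like this could be passed to `deleteMany` on the movies collection, which would have the same effect as the shell command.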

MongoDB Atlas Vector Search Index Setup

  1. In MongoDB Atlas, go to Atlas Search > Go to Atlas Search.
  2. Click "Create Search Index" > Atlas Vector Search JSON Editor > Next:
  3. Database and Collection: Select sample_mflix > movies
  4. Index Name: vectorsearch
  5. Use the following JSON definition and click "Next" > "Create Vector Search Index":
    {
      "fields": [
        {
          "numDimensions": 1536,
          "path": "embeddings",
          "similarity": "cosine",
          "type": "vector"
        }
      ]
    }
    • numDimensions: the number of dimensions produced by the text-embedding-ada-002 model.
    • path: the field name in the database collection.
    • similarity: one of the similarity functions supported by MongoDB Atlas Vector Search.
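
To give an intuition for the "cosine" similarity setting, here is a minimal sketch of the underlying formula. It is for illustration only: Atlas computes similarity internally, and the function name `cosineSimilarity` is an assumption, not part of this project.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// The result ranges from -1 (opposite direction) to 1 (same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [2, 0])); // 1 (same direction, different length)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```

Cosine similarity compares direction only, so two vectors that point the same way score 1 regardless of their lengths.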

Running the Application

Backend

  1. Run cd server && cp -v .env.example .env
  2. Update the environment variables in the .env file.
  3. If using nvm, run: nvm use
  4. Run npm install
  5. Run npm run start:dev

Frontend

  1. Run cd client && cp -v .env.example .env
  2. Update the environment variables in the .env file.
  3. If using nvm, run: nvm use
  4. Run npm install
  5. Run npm run dev

Creating Embeddings

To add embeddings data to the sample_mflix.movies collection, run:

curl -X POST http://localhost:3001/movies/createEmbeddings -H "Content-Type: application/json" -d '{}'

This command uses the OpenAI embeddings API to populate the embeddings field for each movie. The embedding data input includes:

  • Type
  • Title
  • Plot
  • Genres
  • Cast
  • Directors
  • Languages
  • Runtime
  • IMDb (rating and votes)
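
As a rough illustration of how these fields might be combined into a single text input, consider the sketch below. The exact format the backend uses is not documented here, so the `Movie` interface, the `buildEmbeddingInput` helper, the label layout, and the sample movie are all assumptions.

```typescript
// Assumed shape of a movie document, limited to the fields listed above.
interface Movie {
  type?: string;
  title?: string;
  plot?: string;
  genres?: string[];
  cast?: string[];
  directors?: string[];
  languages?: string[];
  runtime?: number;
  imdb?: { rating?: number; votes?: number };
}

// Hypothetical helper: joins the available fields into labeled lines,
// skipping any field that is missing or empty.
function buildEmbeddingInput(movie: Movie): string {
  const imdb = movie.imdb;
  const imdbText =
    imdb?.rating !== undefined
      ? `rating ${imdb.rating}, votes ${imdb.votes ?? "unknown"}`
      : undefined;

  const parts: Array<[string, string | undefined]> = [
    ["Type", movie.type],
    ["Title", movie.title],
    ["Plot", movie.plot],
    ["Genres", movie.genres?.join(", ")],
    ["Cast", movie.cast?.join(", ")],
    ["Directors", movie.directors?.join(", ")],
    ["Languages", movie.languages?.join(", ")],
    ["Runtime", movie.runtime !== undefined ? `${movie.runtime} minutes` : undefined],
    ["IMDb", imdbText],
  ];

  return parts
    .filter(([, value]) => value !== undefined && value !== "")
    .map(([label, value]) => `${label}: ${value}`)
    .join("\n");
}

// Made-up sample document, for illustration only.
console.log(
  buildEmbeddingInput({
    type: "movie",
    title: "Example Movie",
    genres: ["Family", "Adventure"],
    runtime: 95,
    imdb: { rating: 7.1, votes: 1200 },
  }),
);
```

Putting several fields into one input lets the embedding reflect more than the plot, so a query about a genre or an actor can still land near the right movies.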

After running this command, check the new embeddings field using MongoDB Compass.

Then check the status of your Atlas Vector Search index in MongoDB Atlas.

Using the Application

  1. Go to http://localhost:3000.
  2. Enter a search query and click the "Search" button. For example, searching for "dog hero movies" returns movies whose descriptions mention golden retrievers, puppies, etc., because their embeddings are close to the query's embedding.

How does it work?

  1. Input Query: The user inputs a search query, such as "dog hero movies".
  2. Generate Embeddings: The application uses the OpenAI Embeddings API to convert the input query into a vector representation.
    1. API Request:
    {
      "input": "dog hero movies",
      "model": "text-embedding-ada-002"
    }
    2. API Response:
    {
      "object": "list",
      "data": [
        {
          "object": "embedding",
          "index": 0,
          "embedding": [ /* vector data */]
        }
      ],
      "model": "text-embedding-ada-002",
      "usage": {
        "prompt_tokens": 3,
        "total_tokens": 3
      }
    }
  3. Vector Search: The generated vector is used to query the MongoDB Atlas Vector Search index.
    1. MongoDB Query:
    [
      {
        "$vectorSearch": {
          "queryVector": [/* vector data */],
          "path": "embeddings",
          "numCandidates": 100,
          "index": "vectorsearch",
          "limit": 100
        }
      }
    ]
  4. Retrieve Results: MongoDB returns the most relevant documents based on vector similarity, which are then displayed to the user.
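
The steps above can be sketched as a few small functions. This is a hedged outline rather than the project's actual code: the function names are hypothetical, and only the model name, index name, field path, and numeric values are taken from the examples above.

```typescript
const EMBEDDING_MODEL = "text-embedding-ada-002";
const EMBEDDING_DIMENSIONS = 1536; // matches numDimensions in the index definition
const NUM_CANDIDATES = 100;

// Step 2 (request): body sent to the OpenAI Embeddings API.
function buildEmbeddingRequest(query: string): { input: string; model: string } {
  return { input: query, model: EMBEDDING_MODEL };
}

// Minimal subset of the API response shown above.
interface EmbeddingResponse {
  data: Array<{ index: number; embedding: number[] }>;
}

// Step 2 (response): pull out the query vector and sanity-check its size.
function extractQueryVector(response: EmbeddingResponse): number[] {
  const first = response.data[0];
  if (!first) {
    throw new Error("Embeddings response contained no data");
  }
  if (first.embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${first.embedding.length}`);
  }
  return first.embedding;
}

// Step 3: aggregation pipeline for the "vectorsearch" index.
function buildVectorSearchPipeline(queryVector: number[], limit = 100) {
  // numCandidates may not be smaller than limit.
  if (limit < 1 || limit > NUM_CANDIDATES) {
    throw new Error(`limit must be between 1 and ${NUM_CANDIDATES}`);
  }
  return [
    {
      $vectorSearch: {
        queryVector,
        path: "embeddings",
        numCandidates: NUM_CANDIDATES,
        index: "vectorsearch",
        limit,
      },
    },
  ];
}

console.log(JSON.stringify(buildEmbeddingRequest("dog hero movies")));
```

With the MongoDB Driver for Node.js, the resulting pipeline could be passed to `aggregate` on the movies collection to retrieve the matching documents (step 4).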
