Add Data Sync Plugin for External Database Replication #95

Open: wants to merge 22 commits into base: main

@Kunal-Darekar Kunal-Darekar commented Mar 19, 2025

/Fixes #72
/claim #72

Overview

This PR adds a new Data Sync Plugin that enables StarbaseDB to replicate data from external databases to the internal SQLite database. Replication is one-way (external to internal) and is configured by table list, sync interval, batch size, and tracking column.

Features

  • Automatic Synchronization: Periodically pulls data based on configurable intervals
  • Multiple Database Support: Works with PostgreSQL, MySQL, MongoDB, Turso, and Cloudflare D1
  • Incremental Updates: Only fetches new or updated records since the last sync
  • Schema Detection: Automatically detects and creates matching table schemas
  • Selective Sync: Configure which tables to synchronize
  • Batch Processing: Processes data in configurable batch sizes
  • Error Handling: Comprehensive error handling with detailed logging
  • API Methods: Programmatic control for sync operations
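
The incremental and batched fetch described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the names `SyncOptions`, `buildIncrementalQuery`, and `nextWatermark` are hypothetical, and the `$1`-style placeholders assume a PostgreSQL source.

```typescript
// Hypothetical types for illustration; the plugin's real internals are not shown in this PR.
interface SyncOptions {
  table: string;
  trackColumn: string; // e.g. "created_at"
  batchSize: number;   // e.g. 100
}

interface Query {
  sql: string;
  params: unknown[];
}

// Quote an identifier so table and column names cannot inject SQL.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Build the query for the next batch: rows whose tracking column is
// strictly greater than the last synced value, oldest first.
function buildIncrementalQuery(opts: SyncOptions, lastSynced: string | null): Query {
  const table = quoteIdent(opts.table);
  const col = quoteIdent(opts.trackColumn);
  if (lastSynced === null) {
    // First run: no watermark yet, so start from the oldest rows.
    return {
      sql: `SELECT * FROM ${table} ORDER BY ${col} ASC LIMIT $1`,
      params: [opts.batchSize],
    };
  }
  return {
    sql: `SELECT * FROM ${table} WHERE ${col} > $1 ORDER BY ${col} ASC LIMIT $2`,
    params: [lastSynced, opts.batchSize],
  };
}

// Advance the watermark to the tracking value of the last row in the batch.
function nextWatermark(
  rows: Array<Record<string, unknown>>,
  trackColumn: string,
  previous: string | null,
): string | null {
  if (rows.length === 0) return previous;
  return String(rows[rows.length - 1][trackColumn]);
}
```

Two caveats with this approach: a strict `>` comparison can skip rows that share a tracking value across a batch boundary, and a tracking column such as `created_at` only detects newly inserted rows. Catching updates as well would need a column that changes on every write (for example an `updated_at` column, if the source tables have one).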

Implementation Details

  • Created a plugin architecture that integrates with StarbaseDB's core
  • Implemented database adapters for different external sources
  • Added metadata tracking for sync state management
  • Built comprehensive test suite with mocked database connections
  • Created detailed documentation with examples and troubleshooting guides
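
To make the adapter and sync-state ideas above concrete, here is a small sketch under stated assumptions. `SourceAdapter`, `InMemoryAdapter`, `SyncState`, and `syncTable` are hypothetical names chosen for illustration; the in-memory adapter stands in for a real database driver, and the `Map` stands in for whatever metadata table the plugin actually uses.

```typescript
type Row = Record<string, unknown>;

// Hypothetical adapter contract: one implementation per external database type.
interface SourceAdapter {
  fetchBatch(table: string, trackColumn: string, after: string | null, limit: number): Promise<Row[]>;
}

// Hypothetical per-table sync metadata.
interface SyncState {
  lastSynced: string | null;
  rowsSynced: number;
}

// Stand-in adapter backed by plain arrays, similar in spirit to a mocked connection.
class InMemoryAdapter implements SourceAdapter {
  private tables: Record<string, Row[]>;

  constructor(tables: Record<string, Row[]>) {
    this.tables = tables;
  }

  async fetchBatch(table: string, trackColumn: string, after: string | null, limit: number): Promise<Row[]> {
    const key = (r: Row): string => String(r[trackColumn]);
    return (this.tables[table] ?? [])
      .filter((r) => after === null || key(r) > after)
      .sort((a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0))
      .slice(0, limit);
  }
}

// Pull batches until the source has nothing newer than the stored watermark.
// The watermark is saved only after a batch has been written successfully.
async function syncTable(
  adapter: SourceAdapter,
  table: string,
  trackColumn: string,
  batchSize: number,
  state: Map<string, SyncState>,
  write: (table: string, rows: Row[]) => Promise<void>,
): Promise<number> {
  let current: SyncState = state.get(table) ?? { lastSynced: null, rowsSynced: 0 };
  let total = 0;
  for (;;) {
    const rows = await adapter.fetchBatch(table, trackColumn, current.lastSynced, batchSize);
    if (rows.length === 0) break;
    await write(table, rows);
    current = {
      lastSynced: String(rows[rows.length - 1][trackColumn]),
      rowsSynced: current.rowsSynced + rows.length,
    };
    state.set(table, current);
    total += rows.length;
  }
  return total;
}
```

Saving the watermark after each successful write means a failed batch is retried on the next run instead of being skipped, at the cost of possibly rewriting some rows, so the write step should be idempotent.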

Files Added/Modified

  • plugins/data_sync/index.ts - Main plugin implementation
  • plugins/data_sync/index.test.ts - Test suite
  • plugins/data_sync/readme.md - Documentation
  • plugins/data_sync/demo/setup.sql - Demo database setup
  • plugins/data_sync/demo/test.ts - Demo implementation
  • plugins/data_sync/demo/wrangler.toml - Configuration example
  • vitest.config.ts - Updated test configuration
  • package.json - Added test script
  • tsconfig.json - Updated TypeScript configuration

Testing

  • Unit tests for all core functionality
  • Integration tests with mocked database connections
  • Manual testing with PostgreSQL database

Setup and Configuration

Prerequisites

  • StarbaseDB instance
  • External database (PostgreSQL, MySQL, MongoDB, etc.)

Installation Steps

  1. Start a PostgreSQL instance:

    docker run --name postgres-db -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres
  2. Configure the plugin in wrangler.toml:

    [plugins.data_sync]
    sync_interval = 5  # 5 minutes
    tables = ["users", "products", "orders"]
    batch_size = 100
    track_column = "created_at"
    enabled = true
  3. Set up environment variables for database credentials:

    # PostgreSQL configuration
    EXTERNAL_DB_TYPE = "postgresql"
    EXTERNAL_DB_HOST = "localhost"
    EXTERNAL_DB_PORT = "5432"
    EXTERNAL_DB_USER = "postgres"
    EXTERNAL_DB_PASS = "postgres"
    EXTERNAL_DB_DATABASE = "starbase_demo"
    EXTERNAL_DB_DEFAULT_SCHEMA = "public"
  4. Initialize test database:

    psql -h localhost -U postgres -d postgres -c "CREATE DATABASE starbase_demo;"
    psql -h localhost -U postgres -d starbase_demo -f plugins/data_sync/demo/setup.sql
  5. Test synchronization:

    npm run test:data-sync
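
As an illustration of how the `[plugins.data_sync]` settings from step 2 might be validated once loaded, the sketch below normalizes the raw values into a typed object. The function `parseDataSyncConfig`, the `DataSyncConfig` shape, and the fallback defaults are assumptions made for this example, not the plugin's documented behavior.

```typescript
// Hypothetical normalized form of the [plugins.data_sync] settings.
interface DataSyncConfig {
  syncIntervalMs: number;
  tables: string[];
  batchSize: number;
  trackColumn: string;
  enabled: boolean;
}

function positiveInt(value: unknown, name: string): number {
  if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer`);
  }
  return value;
}

// Validate raw config values and apply defaults.
// The defaults (5 minutes, 100 rows, "created_at") mirror the example
// values in step 2 and are assumptions, not confirmed plugin defaults.
function parseDataSyncConfig(raw: Record<string, unknown>): DataSyncConfig {
  const tablesRaw = raw["tables"];
  if (
    !Array.isArray(tablesRaw) ||
    tablesRaw.length === 0 ||
    !tablesRaw.every((t) => typeof t === "string" && t.length > 0)
  ) {
    throw new Error("tables must be a non-empty list of table names");
  }
  const minutes = positiveInt(raw["sync_interval"] ?? 5, "sync_interval");
  const trackRaw = raw["track_column"];
  return {
    syncIntervalMs: minutes * 60 * 1000, // sync_interval is given in minutes
    tables: tablesRaw as string[],
    batchSize: positiveInt(raw["batch_size"] ?? 100, "batch_size"),
    trackColumn: typeof trackRaw === "string" && trackRaw.length > 0 ? trackRaw : "created_at",
    enabled: raw["enabled"] !== false,
  };
}
```

Failing fast on a bad `batch_size` or an empty `tables` list surfaces configuration mistakes at startup rather than during the first scheduled sync.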

Documentation

  • Added comprehensive readme with installation instructions
  • Included configuration examples for all supported databases
  • Added API documentation with examples
  • Included troubleshooting section with common issues and solutions

Future Improvements

  • Add support for more database types
  • Implement two-way synchronization
  • Add conflict resolution strategies
  • Improve performance for large datasets

- Implement chunked processing for large database dumps
- Add R2 storage integration for dump files
- Configure export timeouts and breathing intervals
- Add environment variables for dump configuration

Resolves outerbase#93
This commit adds a new Data Sync Plugin that enables StarbaseDB to replicate data from external databases. Features include:
- Automatic synchronization with configurable intervals
- Support for PostgreSQL, MySQL, MongoDB and other databases
- Incremental updates with change tracking
- Selective table synchronization
- Comprehensive error handling
- Complete test suite and documentation
Successfully merging this pull request may close these issues.

Replicate data from external source to internal source with a Plugin