zebutron/ce-rblx-scraper

Comprehensive Playwright scraper for Roblox Creator Exchange game intelligence and market analysis

Creator Exchange Roblox Scraper

An authenticated Playwright (headless Chromium) scraper for extracting game KPIs from the Roblox Creator Exchange platform.

Features

  • Authenticated Scraping: Supports both cookie and username/password authentication
  • Robust Error Handling: Built-in retry logic and comprehensive error reporting
  • Multiple Output Formats: JSON and CSV export capabilities
  • Performance Monitoring: Detailed metrics and logging
  • Rate Limiting: Respects platform limits to avoid blocking
  • Headless/Visible Mode: Run with or without browser UI for debugging

Installation

  1. Clone the repository:

git clone <repository-url>
cd ce-rblx-scraper

  2. Install dependencies:

npm install

  3. Install Playwright browsers:

npm run install-browsers

  4. Set up environment variables:

cp env.template .env
# Edit .env with your credentials

Configuration

Environment Variables

Create a .env file in the project root with the following variables:

# Authentication (choose one method)
ROBLOX_USERNAME=your_roblox_username
ROBLOX_PASSWORD=your_roblox_password
# OR
ROBLOX_COOKIE=your_roblox_roblosecurity_cookie

# Scraping Configuration
HEADLESS=true
TIMEOUT=30000
RETRY_ATTEMPTS=3
DELAY_BETWEEN_REQUESTS=2000

# Logging
LOG_LEVEL=info
LOG_FILE=logs/scraper.log

# Output
OUTPUT_FORMAT=json
OUTPUT_FILE=data/creator_exchange_data.json
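
The variables above might be normalized into a config object along these lines (a minimal sketch, not the actual src/config.js; defaults mirror the template values and are assumptions):

```javascript
// Sketch: normalize environment variables into a typed config object.
// Assumes dotenv (or similar) has already populated process.env.
function loadConfig(env = process.env) {
  return {
    headless: (env.HEADLESS ?? 'true') !== 'false',        // boolean flag
    timeout: parseInt(env.TIMEOUT ?? '30000', 10),          // ms
    retryAttempts: parseInt(env.RETRY_ATTEMPTS ?? '3', 10),
    delayBetweenRequests: parseInt(env.DELAY_BETWEEN_REQUESTS ?? '2000', 10),
    logLevel: env.LOG_LEVEL ?? 'info',
    outputFormat: env.OUTPUT_FORMAT ?? 'json',
  };
}
```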

Authentication Methods

Method 1: Cookie Authentication (Recommended)

  1. Log into Roblox in your browser
  2. Open Developer Tools (F12)
  3. Go to Application/Storage → Cookies → roblox.com
  4. Copy the value of the .ROBLOSECURITY cookie
  5. Set ROBLOX_COOKIE in your .env file

Method 2: Username/Password

Set both ROBLOX_USERNAME and ROBLOX_PASSWORD in your .env file.

Note: Cookie authentication is more reliable and bypasses 2FA/Captcha requirements.
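
The steps above come down to injecting the .ROBLOSECURITY cookie into the browser context before navigating. A minimal sketch (the helper name buildRobloxCookie is illustrative, not part of this codebase; the object fields follow what Playwright's `context.addCookies()` expects):

```javascript
// Hypothetical helper: builds the cookie object for Playwright's
// context.addCookies() from the .ROBLOSECURITY value in .env.
function buildRobloxCookie(value) {
  return {
    name: '.ROBLOSECURITY',
    value,
    domain: '.roblox.com', // leading dot covers all Roblox subdomains
    path: '/',
    httpOnly: true,
    secure: true,
  };
}

// Usage with Playwright (assumes `context` from browser.newContext()):
//   await context.addCookies([buildRobloxCookie(process.env.ROBLOX_COOKIE)]);
```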

Usage

Basic Usage

# Run full scrape
npm start

# Run test scrape (limited data)
npm test

# Run with visible browser (for debugging)
HEADLESS=false npm start

Programmatic Usage

import { ScraperApp } from './src/index.js';

const app = new ScraperApp();

try {
  const result = await app.run({
    testMode: false,
    maxPages: 5
  });
  
  console.log(`Scraped ${result.successfulRecords} games`);
  console.log(`Success rate: ${result.successRate.toFixed(2)}%`);
} catch (error) {
  console.error('Scraping failed:', error.message);
}

Data Structure

Game KPIs

The scraper extracts the following data for each game:

{
  "id": 123456789,
  "name": "Game Name",
  "creator": "Creator Name",
  "creatorId": 987654321,
  "url": "https://www.roblox.com/games/123456789/game-name",
  "thumbnail": "https://...",
  "metrics": {
    "rpv": 0.15,           // Revenue Per Visit
    "visits": 1500000,     // Total visits
    "likes": 45000,        // Thumbs up
    "dislikes": 2000,      // Thumbs down
    "favorites": 12000,    // Favorites count
    "rating": 4.2,         // Overall rating
    "playersOnline": 1500, // Current players
    "maxPlayers": 50,      // Max server size
    "likeRatio": 95.7,     // Calculated like percentage
    "engagementScore": 3.8 // Calculated engagement metric
  },
  "metadata": {
    "genre": "Adventure",
    "isSponsored": false,
    "isPremium": false,
    "created": "2023-01-15T10:00:00Z",
    "updated": "2023-12-01T15:30:00Z",
    "scrapedAt": "2023-12-15T09:45:00Z",
    "source": "creator-exchange"
  }
}
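
The derived likeRatio follows the standard likes / (likes + dislikes) formula, which reproduces the 95.7 in the sample above (45000 of 47000 votes). The engagementScore formula is not documented here, so no sketch is given for it.

```javascript
// Like ratio as a percentage, rounded to one decimal place.
// likeRatio(45000, 2000) reproduces the 95.7 in the sample record.
function likeRatio(likes, dislikes) {
  const total = likes + dislikes;
  if (total === 0) return 0; // avoid division by zero for unrated games
  return Math.round((likes / total) * 1000) / 10;
}
```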

Scrape Results

Complete scraping sessions are saved with metadata:

{
  "sessionId": "scrape_1703501234567_abc123",
  "startTime": "2023-12-15T09:30:00Z",
  "endTime": "2023-12-15T09:45:00Z",
  "duration": 900000,
  "url": "https://creatorexchange.io/sorts?sort=top_rpv",
  "summary": {
    "totalRecords": 150,
    "successfulRecords": 147,
    "errorCount": 3,
    "successRate": 98.0
  },
  "games": [...], // Array of game objects
  "errors": [...] // Array of error objects
}

Output Files

  • JSON: data/creator_exchange_data.json - Complete structured data
  • CSV: data/creator_exchange_data.csv - Tabular format for analysis
  • Logs: logs/scraper.log - Detailed operation logs

Error Handling

The scraper includes comprehensive error handling:

  • Retry Logic: Automatic retries with exponential backoff
  • Authentication Recovery: Automatic re-authentication on session expiry
  • Graceful Degradation: Continues scraping even if some games fail
  • Detailed Logging: All errors are logged with context
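
Retry with exponential backoff can be sketched as follows (a minimal illustration, not the project's actual implementation; the 3-attempt default mirrors RETRY_ATTEMPTS, and the base delay is an assumption):

```javascript
// Retry an async operation, doubling the delay after each failure.
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 1s, 2s, 4s, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts exhausted
}
```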

Rate Limiting

Built-in rate limiting prevents overwhelming the target servers:

  • Default: 30 requests per minute
  • Configurable delay between requests
  • Automatic backoff on rate limit detection
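
The default of 30 requests per minute works out to one request every 2000 ms (the DELAY_BETWEEN_REQUESTS default). A minimal enforcing sketch, not the project's actual limiter:

```javascript
// Enforce a minimum interval between requests.
class RateLimiter {
  constructor(minIntervalMs = 2000) { // 2000 ms = 30 requests/minute
    this.minIntervalMs = minIntervalMs;
    this.lastRequestAt = 0;
  }

  // Await this before each request; sleeps only if called too soon.
  async wait() {
    const elapsed = Date.now() - this.lastRequestAt;
    if (elapsed < this.minIntervalMs) {
      await new Promise((r) => setTimeout(r, this.minIntervalMs - elapsed));
    }
    this.lastRequestAt = Date.now();
  }
}
```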

Troubleshooting

Common Issues

  1. Authentication Failed

    • Verify your cookie/credentials are correct
    • Try using cookie authentication instead of username/password
    • Check if 2FA is enabled (use cookie method)
  2. No Games Found

    • The page structure may have changed
    • Check if you're accessing the correct URL
    • Run with HEADLESS=false to see what's happening
  3. Timeout Errors

    • Increase TIMEOUT value in .env
    • Check your internet connection
    • The target site may be slow
  4. Permission Errors

    • Ensure the data/ and logs/ directories are writable
    • Check file permissions

Debug Mode

Run with visible browser to debug issues:

HEADLESS=false LOG_LEVEL=debug npm start

Logs

Check the log file for detailed information:

tail -f logs/scraper.log

Development

Project Structure

src/
├── index.js      # Main entry point
├── config.js     # Configuration management
├── auth.js       # Authentication handling
├── scraper.js    # Core scraping logic
├── models.js     # Data models and utilities
├── logger.js     # Logging configuration
└── utils.js      # Utility functions

data/             # Output data files
logs/             # Log files

Adding New Features

  1. New KPI Fields: Update the GameKPI model in src/models.js
  2. Different URLs: Modify the extraction logic in src/scraper.js
  3. Export Formats: Add new methods to FileUtils in src/utils.js
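
As an illustration of point 3, a new export format boils down to flattening the nested game objects into rows. This is a hypothetical sketch, not the actual FileUtils code, and the column selection is an assumption:

```javascript
// Flatten game records into RFC 4180-style CSV (quoted fields,
// embedded quotes doubled).
function toCsv(games) {
  const headers = ['id', 'name', 'creator', 'visits', 'likes', 'dislikes'];
  const escape = (v) => `"${String(v ?? '').replace(/"/g, '""')}"`;
  const rows = games.map((g) =>
    [g.id, g.name, g.creator, g.metrics.visits, g.metrics.likes, g.metrics.dislikes]
      .map(escape)
      .join(',')
  );
  return [headers.join(','), ...rows].join('\n');
}
```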

Legal and Ethical Considerations

  • Respect robots.txt: Always check and respect the site's robots.txt
  • Rate Limiting: Don't overwhelm servers with requests
  • Terms of Service: Ensure compliance with platform terms
  • Data Usage: Use scraped data responsibly and ethically

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the logs for error details
  3. Open an issue with detailed information about the problem

Disclaimer: This tool is for educational and research purposes. Users are responsible for ensuring compliance with all applicable terms of service and legal requirements.
