An authenticated Playwright/Headless Chrome scraper for extracting game KPIs from the Roblox Creator Exchange platform.
- Authenticated Scraping: Supports both cookie and username/password authentication
- Robust Error Handling: Built-in retry logic and comprehensive error reporting
- Multiple Output Formats: JSON and CSV export capabilities
- Performance Monitoring: Detailed metrics and logging
- Rate Limiting: Respects platform limits to avoid blocking
- Headless/Visible Mode: Run with or without browser UI for debugging
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd ce-rblx-scraper
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Install Playwright browsers:

  ```bash
  npm run install-browsers
  ```

- Set up environment variables:

  ```bash
  cp env.template .env
  # Edit .env with your credentials
  ```
Create a `.env` file in the project root with the following variables:

```bash
# Authentication (choose one method)
ROBLOX_USERNAME=your_roblox_username
ROBLOX_PASSWORD=your_roblox_password
# OR
ROBLOX_COOKIE=your_roblox_roblosecurity_cookie

# Scraping Configuration
HEADLESS=true
TIMEOUT=30000
RETRY_ATTEMPTS=3
DELAY_BETWEEN_REQUESTS=2000

# Logging
LOG_LEVEL=info
LOG_FILE=logs/scraper.log

# Output
OUTPUT_FORMAT=json
OUTPUT_FILE=data/creator_exchange_data.json
```
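In practice these variables would usually be loaded with a library like dotenv, but the file format itself is simple. A minimal parsing sketch (illustration only, not the project's config loader), assuming the `KEY=value` lines and `#` comments shown above:

```javascript
// Minimal .env parser sketch -- a stand-in for the dotenv package,
// handling only the simple KEY=value format shown above.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    // Skip blank lines and comments
    if (!trimmed || trimmed.startsWith('#')) continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

const config = parseEnv(`
# Scraping Configuration
HEADLESS=true
TIMEOUT=30000
`);
// config.HEADLESS === 'true' (all values come back as strings)
```

Note that every value is a string; numeric settings such as `TIMEOUT` and boolean flags such as `HEADLESS` need to be cast before use.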
To obtain the cookie:

- Log into Roblox in your browser
- Open Developer Tools (F12)
- Go to Application/Storage → Cookies → roblox.com
- Copy the value of the `.ROBLOSECURITY` cookie
- Set `ROBLOX_COOKIE` in your `.env` file

Alternatively, set both `ROBLOX_USERNAME` and `ROBLOX_PASSWORD` in your `.env` file.
Note: Cookie authentication is more reliable and bypasses 2FA/Captcha requirements.
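Under the hood, cookie authentication amounts to injecting `.ROBLOSECURITY` into the browser context before navigation. A sketch of the cookie object in the shape Playwright's `context.addCookies()` expects (the helper name and defaults are illustrative, not part of this project):

```javascript
// Build a cookie object in the shape Playwright's context.addCookies()
// expects. Helper name and attribute defaults are illustrative only.
function buildRoblosecurityCookie(value) {
  return {
    name: '.ROBLOSECURITY',
    value,
    domain: '.roblox.com',
    path: '/',
    httpOnly: true,
    secure: true,
  };
}

// Usage with Playwright (sketch):
//   const context = await browser.newContext();
//   await context.addCookies([buildRoblosecurityCookie(process.env.ROBLOX_COOKIE)]);
```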
```bash
# Run full scrape
npm start

# Run test scrape (limited data)
npm test

# Run with visible browser (for debugging)
HEADLESS=false npm start
```
```javascript
// Top-level await requires an ES module context (e.g. "type": "module").
import { ScraperApp } from './src/index.js';

const app = new ScraperApp();

try {
  const result = await app.run({
    testMode: false,
    maxPages: 5
  });
  console.log(`Scraped ${result.successfulRecords} games`);
  console.log(`Success rate: ${result.successRate.toFixed(2)}%`);
} catch (error) {
  console.error('Scraping failed:', error.message);
}
```
The scraper extracts the following data for each game:
```jsonc
{
  "id": 123456789,
  "name": "Game Name",
  "creator": "Creator Name",
  "creatorId": 987654321,
  "url": "https://www.roblox.com/games/123456789/game-name",
  "thumbnail": "https://...",
  "metrics": {
    "rpv": 0.15,              // Revenue Per Visit
    "visits": 1500000,        // Total visits
    "likes": 45000,           // Thumbs up
    "dislikes": 2000,         // Thumbs down
    "favorites": 12000,       // Favorites count
    "rating": 4.2,            // Overall rating
    "playersOnline": 1500,    // Current players
    "maxPlayers": 50,         // Max server size
    "likeRatio": 95.7,        // Calculated like percentage
    "engagementScore": 3.8    // Calculated engagement metric
  },
  "metadata": {
    "genre": "Adventure",
    "isSponsored": false,
    "isPremium": false,
    "created": "2023-01-15T10:00:00Z",
    "updated": "2023-12-01T15:30:00Z",
    "scrapedAt": "2023-12-15T09:45:00Z",
    "source": "creator-exchange"
  }
}
```
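Of the two calculated fields, `likeRatio` follows directly from the raw vote counts; the exact weighting behind `engagementScore` is internal to the project, so it is not reproduced here. A sketch of the like-ratio calculation:

```javascript
// likeRatio: thumbs-up as a percentage of all votes cast.
function likeRatio(likes, dislikes) {
  const total = likes + dislikes;
  return total === 0 ? 0 : (likes / total) * 100;
}

likeRatio(45000, 2000); // ≈ 95.7, matching the sample record above
```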
Complete scraping sessions are saved with metadata:
```jsonc
{
  "sessionId": "scrape_1703501234567_abc123",
  "startTime": "2023-12-15T09:30:00Z",
  "endTime": "2023-12-15T09:45:00Z",
  "duration": 900000,
  "url": "https://creatorexchange.io/sorts?sort=top_rpv",
  "summary": {
    "totalRecords": 150,
    "successfulRecords": 147,
    "errorCount": 3,
    "successRate": 98.0
  },
  "games": [...],   // Array of game objects
  "errors": [...]   // Array of error objects
}
```
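The summary fields are derivable from the per-game outcomes; a sketch (the function name is illustrative, but the field names follow the session format above):

```javascript
// Derive the session summary block from the per-game results.
function summarize(games, errors) {
  const totalRecords = games.length + errors.length;
  return {
    totalRecords,
    successfulRecords: games.length,
    errorCount: errors.length,
    successRate: totalRecords === 0 ? 0 : (games.length / totalRecords) * 100,
  };
}
```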
- JSON: `data/creator_exchange_data.json` - Complete structured data
- CSV: `data/creator_exchange_data.csv` - Tabular format for analysis
- Logs: `logs/scraper.log` - Detailed operation logs
The scraper includes comprehensive error handling:
- Retry Logic: Automatic retries with exponential backoff
- Authentication Recovery: Automatic re-authentication on session expiry
- Graceful Degradation: Continues scraping even if some games fail
- Detailed Logging: All errors are logged with context
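The retry behaviour can be pictured as a small async wrapper. This is a sketch of the general technique, not the project's actual implementation; it waits `baseDelayMs * 2^attempt` between tries, which is what `RETRY_ATTEMPTS` and a base delay would map onto:

```javascript
// Retry an async operation with exponential backoff: wait
// baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ... between attempts.
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // all attempts exhausted
}
```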
Built-in rate limiting prevents overwhelming the target servers:
- Default: 30 requests per minute
- Configurable delay between requests
- Automatic backoff on rate limit detection
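The default of 30 requests per minute is where the `DELAY_BETWEEN_REQUESTS=2000` setting comes from; a sketch of the conversion:

```javascript
// Minimum delay between requests for a given per-minute budget.
function minDelayMs(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}

minDelayMs(30); // 2000 ms -- the default DELAY_BETWEEN_REQUESTS
```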
- Authentication Failed
  - Verify your cookie/credentials are correct
  - Try using cookie authentication instead of username/password
  - Check if 2FA is enabled (use the cookie method)
- No Games Found
  - The page structure may have changed
  - Check if you're accessing the correct URL
  - Run with `HEADLESS=false` to see what's happening
- Timeout Errors
  - Increase the `TIMEOUT` value in `.env`
  - Check your internet connection
  - The target site may be slow
- Permission Errors
  - Ensure the `data/` and `logs/` directories are writable
  - Check file permissions
Run with a visible browser to debug issues:

```bash
HEADLESS=false LOG_LEVEL=debug npm start
```

Check the log file for detailed information:

```bash
tail -f logs/scraper.log
```
```
src/
├── index.js      # Main entry point
├── config.js     # Configuration management
├── auth.js       # Authentication handling
├── scraper.js    # Core scraping logic
├── models.js     # Data models and utilities
├── logger.js     # Logging configuration
└── utils.js      # Utility functions
data/             # Output data files
logs/             # Log files
```
- New KPI Fields: Update the `GameKPI` model in `src/models.js`
- Different URLs: Modify the extraction logic in `src/scraper.js`
- Export Formats: Add new methods to `FileUtils` in `src/utils.js`
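As an illustration of adding an export format, here is a minimal CSV serializer. The standalone function below is a sketch; the actual `FileUtils` method signature in `src/utils.js` may differ:

```javascript
// Flatten records into CSV, quoting fields that contain
// commas, quotes, or newlines (per RFC 4180).
function toCsv(rows, columns) {
  const escape = (value) => {
    const s = String(value ?? '');
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = columns.join(',');
  const lines = rows.map((row) => columns.map((c) => escape(row[c])).join(','));
  return [header, ...lines].join('\n');
}

toCsv([{ id: 1, name: 'Obby, Deluxe' }], ['id', 'name']);
// 'id,name\n1,"Obby, Deluxe"'
```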
- Respect robots.txt: Always check and respect the site's robots.txt
- Rate Limiting: Don't overwhelm servers with requests
- Terms of Service: Ensure compliance with platform terms
- Data Usage: Use scraped data responsibly and ethically
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review the logs for error details
- Open an issue with detailed information about the problem
Disclaimer: This tool is for educational and research purposes. Users are responsible for ensuring compliance with all applicable terms of service and legal requirements.