A batch analytics platform with a 3-layer data engineering pipeline (Raw → Staging → Analytics) that analyzes trending GitHub repositories across 3 programming languages (Python, TypeScript/Next.js, and Go) plus the Render ecosystem. Leverages Render Workflows' distributed task execution to process data in parallel, storing results in a dimensional model for high-performance analytics.
- Multi-Language Analysis: Tracks Python, TypeScript/Next.js, Go, and Render ecosystem repositories
- 3-Layer Data Pipeline: Raw ingestion → Staging validation → Analytics dimensional model
- Parallel Processing: 4 concurrent workflow tasks using Render Workflows SDK
- Render Ecosystem Spotlight: Dedicated showcase for Render-deployed projects (identified by `language='render'`)
- Real-time Dashboard: Next.js 14 dashboard with analytics visualizations
- Hourly Updates: Automated cron job triggers workflow execution
Render Repos Identification: Repositories with render.yaml in their root directory are assigned language='render' (lowercase) in the database, allowing clean identification without a separate boolean flag. This simplifies queries and eliminates the need for maintaining dual identification logic.
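In practice this means Render projects can be selected with a single `WHERE language = 'render'` predicate. A minimal asyncpg sketch (the `dim_repositories` table and `is_current` flag appear in the schema below; the `repo_name` and `stars` column names are illustrative):

```python
import asyncio
import os

import asyncpg  # the workflows already use asyncpg for PostgreSQL


async def list_render_repos() -> list[asyncpg.Record]:
    """Select Render ecosystem repos via the language='render' convention."""
    conn = await asyncpg.connect(os.environ["DATABASE_URL"])
    try:
        # One predicate replaces a separate boolean flag
        return await conn.fetch(
            "SELECT repo_name, stars FROM dim_repositories "
            "WHERE language = 'render' AND is_current = TRUE "
            "ORDER BY stars DESC"
        )
    finally:
        await conn.close()


if __name__ == "__main__":
    for row in asyncio.run(list_render_repos()):
        print(row["repo_name"], row["stars"])
```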
```mermaid
graph TD
    A[Cron Job Hourly] --> B[Workflow Orchestrator]
    B --> C[Python Analyzer]
    B --> D[TypeScript Analyzer]
    B --> E[Go Analyzer]
    B --> F[Render Ecosystem]
    C --> G[Raw Layer JSONB]
    D --> G
    E --> G
    F --> G
    G --> H[Staging Layer Validated]
    H --> I[Analytics Layer Fact/Dim]
    I --> J[Next.js Dashboard]
```
Backend (Workflows)
- Python 3.11+
- Render Workflows SDK with `@task` decorators
- asyncpg for PostgreSQL
- aiohttp for async API calls
- GitHub REST API
Frontend (Dashboard)
- Next.js 14.2 (App Router)
- TypeScript
- Tailwind CSS
- Recharts for visualizations
- PostgreSQL client (`pg`)
Infrastructure
- Render Workflows (task execution)
- Render Cron Job (hourly trigger)
- Render Web Service (Next.js dashboard)
- Render PostgreSQL (data storage)
```
trender/
├── workflows/
│   ├── workflow.py            # Main workflow with @task decorators
│   ├── github_api.py          # Async GitHub API client
│   ├── connections.py         # Shared resource management
│   ├── render_detection.py    # Render usage detection
│   ├── etl/
│   │   └── extract.py         # Raw layer extraction
│   └── requirements.txt
├── trigger/
│   ├── trigger.py             # Cron trigger script
│   └── requirements.txt
├── dashboard/
│   ├── app/                   # Next.js App Router pages
│   ├── components/            # Reusable UI components
│   ├── lib/
│   │   ├── db.ts              # Database utilities
│   │   └── formatters.ts      # Data formatting helpers
│   └── package.json
├── database/
│   ├── schema/
│   │   ├── 01_raw_layer.sql
│   │   ├── 02_staging_layer.sql
│   │   ├── 03_analytics_layer.sql
│   │   └── 04_views.sql
│   └── init.sql
├── render.yaml
├── .env.example
└── README.md
```
If you've already completed the setup and just want to trigger a workflow run:
```bash
# Navigate to trigger directory
cd trigger

# Set environment variables
export RENDER_API_KEY=your_api_key
export RENDER_WORKFLOW_SLUG=trender-wf

# Install dependencies and run
pip install -r requirements.txt
python trigger.py
```
Or use the Render Dashboard: Workflows → trender-wf → Tasks → main_analysis_task → Run Task
- GitHub authentication (Personal Access Token or OAuth App - covered in step 2)
- Render account
- Node.js 18+ (for dashboard)
- Python 3.11+ (for workflows)
```bash
git clone <your-repo-url>
cd trender
```
Trender needs a GitHub access token to fetch repository data. You can choose between two authentication methods:
Best for: Individual developers, quick setup, local development
This is the simplest method - just create a token from GitHub settings.
```bash
cd workflows
pip install -r requirements.txt
python auth_setup.py
```
- Open https://github.com/settings/tokens/new in your browser
- Configure the token:
  - Note: `Trender Analytics Access`
  - Expiration: `No expiration` (or your preference)
  - Scopes:
    - ✓ `repo` (Full control of private repositories)
    - ✓ `read:org` (Read org and team membership)
- Click "Generate token"
- Copy the token (starts with `ghp_` or `github_pat_`)
- Paste it into the terminal when prompted
The script will verify your token and display:
```
✅ SUCCESS! Your GitHub access token (PAT):
============================================================
ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
============================================================

Add this to your .env file:
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Add the token to your .env file:
```
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
✅ Done! Skip to Step 3.
Best for: Team setups, production deployments, or flows that require user authorization
- Go to https://github.com/settings/developers
- Click "New OAuth App"
- Fill in the details:
  - Application name: `Trender Analytics`
  - Homepage URL: `http://localhost:3000`
  - Authorization callback URL: `http://localhost:8000/callback`
- Click "Register application"
- Note your Client ID (starts with `Ov23` or `Iv1.`)
- Click "Generate a new client secret" and save it
Add your OAuth credentials to your .env file:
```
GITHUB_CLIENT_ID=Ov23xxxxx_or_Iv1.xxxxx
GITHUB_CLIENT_SECRET=your_secret_here
```
Then run the auth setup script:
```bash
cd workflows
pip install -r requirements.txt
python auth_setup.py
```
Choose option [2] for OAuth, then:
- The script starts a local server on port 8000
- Your browser opens to GitHub's authorization page
- Click "Authorize" to approve
- The script exchanges the auth code for a token
- Your `GITHUB_ACCESS_TOKEN` is displayed
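Under the hood, step 4 is a standard GitHub OAuth code-for-token exchange against `https://github.com/login/oauth/access_token`. A minimal sketch using aiohttp (the actual auth_setup.py implementation may differ):

```python
import aiohttp


async def exchange_code_for_token(client_id: str, client_secret: str, code: str) -> str:
    """Swap the one-time authorization code for a GitHub access token."""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://github.com/login/oauth/access_token",
            data={"client_id": client_id, "client_secret": client_secret, "code": code},
            headers={"Accept": "application/json"},  # ask GitHub for JSON instead of form-encoded
        ) as resp:
            payload = await resp.json()
            return payload["access_token"]  # OAuth tokens start with gho_
```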
Add the token to your .env file:
```
GITHUB_ACCESS_TOKEN=gho_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
- ✅ Tokens don't expire (unless you set an expiration on the PAT)
- ✅ Never commit tokens to version control (`.env` is in `.gitignore`)
- ✅ Token scopes: `repo` and `read:org` only
- ✅ Revoke access anytime at https://github.com/settings/tokens

⚠️ Treat tokens like passwords
PAT Issues:
- Token doesn't start with `ghp_`: Classic tokens start with `ghp_`, fine-grained tokens with `github_pat_`
- API returns 401: Token may be expired or revoked. Generate a new one.
- Rate limit errors: Ensure token has proper scopes selected
OAuth Issues:
- Port 8000 in use: Run `lsof -ti:8000 | xargs kill -9`, then try again
- "Redirect URI mismatch": Ensure callback URL in OAuth app is exactly `http://localhost:8000/callback`
- Browser doesn't open: Manually visit the URL shown in the terminal
- "Bad verification code": Code expires quickly. Run `python auth_setup.py` again
Both Methods:
- Token verification fails: Check your internet connection
- Need to regenerate: Revoke the old token at https://github.com/settings/tokens and generate a new one
```bash
cp .env.example .env
# Edit .env with your credentials
```
Your .env file should now contain (from step 2):

If you used PAT (Option A):
```
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
If you used OAuth (Option B):
```
GITHUB_CLIENT_ID=Ov23xxxxx_or_Iv1.xxxxx
GITHUB_CLIENT_SECRET=your_secret_here
GITHUB_ACCESS_TOKEN=gho_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Other required variables (add as you complete the setup):
- `DATABASE_URL`: PostgreSQL connection string (from step 4)
- `RENDER_API_KEY`: Render API key (from https://dashboard.render.com/u/settings#api-keys)
- `RENDER_WORKFLOW_SLUG`: `trender-wf` (or your workflow slug from step 6)
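For reference, a fully populated .env for the PAT flow might look like this (all values are placeholders):

```
# GitHub (step 2)
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Database (step 4)
DATABASE_URL=postgresql://user:password@host:5432/trender

# Render (steps 5-6)
RENDER_API_KEY=rnd_xxxxxxxxxxxxxxxxxxxx
RENDER_WORKFLOW_SLUG=trender-wf
```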
- Go to Render Dashboard
- Create new PostgreSQL database:
  - Name: `trender-db`
  - Database Name: `trender`
  - Plan: `basic_256mb` (or higher for production)
- Note the connection string for `DATABASE_URL`
The `db_setup.sh` script provides a user-friendly way to initialize the database with connection checking, error handling, and colored output:
```bash
# Make the script executable (first time only)
chmod +x bin/db_setup.sh

# Run the setup script
# The script will automatically load DATABASE_URL from your .env file
./bin/db_setup.sh

# Or provide DATABASE_URL directly:
DATABASE_URL=YOUR_DATABASE_URL ./bin/db_setup.sh
```
Expected output:
```
🚀 Trender Database Setup
==========================
📄 Loading environment variables from .env file...
✓ Environment variables loaded
📁 Project root: /path/to/trender
🔍 Checking database connection...
✓ Database connection successful
📊 Initializing database schema...
Running: database/init.sql
Creating Raw Layer tables...
Creating Staging Layer tables...
Creating Analytics Layer tables...
Creating Analytics Views...
✅ Database setup completed successfully!
```
```bash
# Connect to your Render PostgreSQL instance and run the initialization script
cd database
psql $DATABASE_URL -f init.sql
```
If you prefer to run the schema files one at a time:
```bash
cd database
psql $DATABASE_URL -f schema/01_raw_layer.sql
psql $DATABASE_URL -f schema/02_staging_layer.sql
psql $DATABASE_URL -f schema/03_analytics_layer.sql
psql $DATABASE_URL -f schema/04_views.sql
```
Raw Layer:
- `raw_github_repos`: Stores complete GitHub API responses (JSONB format)
- `raw_repo_metrics`: Stores repository metrics (commits, issues, contributors)
Staging Layer:
- `stg_repos_validated`: Cleaned and validated repository data
- `stg_render_enrichment`: Render-specific metadata (service types, complexity, categories)
Analytics Layer:
- Dimension tables:
  - `dim_repositories`: Repository master data with SCD Type 2 history
  - `dim_languages`: Language metadata
  - `dim_render_services`: Render service type reference data (web, worker, cron, etc.)
- Fact tables:
  - `fact_repo_snapshots`: Daily snapshots of repo metrics and momentum scores
  - `fact_render_usage`: Render service adoption by repository
Views:
- `analytics_trending_repos_current`: Current top trending repos across all languages
- `analytics_render_showcase`: Render ecosystem showcase with enrichment
- `analytics_language_rankings`: Per-language rankings with Render adoption stats
- `analytics_render_services_adoption`: Service type adoption statistics
- `analytics_language_trends`: Language-level aggregated statistics
- `analytics_repo_history`: Historical trends for charting
Total: 9 tables + 6 views
Check that all tables were created successfully:
```bash
psql $DATABASE_URL -c "\dt"
```
You should see 9 tables across the raw, stg, dim, and fact prefixes.
If you're upgrading from an older version that had workflow execution tracking, run the cleanup script:
```bash
psql $DATABASE_URL -f database/cleanup_workflow_tracking.sql
```
This removes the unused `fact_workflow_executions` table and `analytics_workflow_performance` view.
- "DATABASE_URL not set": Ensure you have a
.envfile withDATABASE_URLor export it in your shell - "Could not connect to database": Verify your
DATABASE_URLis correct and the Render PostgreSQL instance is active - Permission denied: Make sure you're using the connection string with full admin privileges
- Tables already exist: Drop the database and recreate it, or use
DROP TABLE IF EXISTSstatements - "No such file or directory" errors: Make sure you're running from the correct directory (use the
db_setup.shscript to avoid this issue)
The render.yaml file defines:
- Web Service: Next.js dashboard (`trender-dashboard`)
- Workflow: Main analytics pipeline (`trender-wf`)
- Cron Job: Hourly workflow trigger (`trender-analyzer-cron`)
- Database: PostgreSQL instance (`trender-db`)
Deploy to Render:
- Push your code to GitHub
- In Render Dashboard, click "New +" → "Blueprint"
- Connect your GitHub repository
- Render will automatically detect and deploy all services from `render.yaml`
Or use the Render CLI:
```bash
render blueprint launch
```
After deploying via render.yaml, add your GitHub access token to the workflow service (`trender-wf`) in the Render Dashboard:
- Go to your `trender-wf` workflow in Render Dashboard
- Navigate to Environment tab
- Add:
  - `GITHUB_ACCESS_TOKEN`: The token you generated in step 2 (starts with `ghp_`, `gho_`, or `github_pat_`)
  - `DATABASE_URL`: Automatically connected from the database (no action needed)
Important: After adding the token, trigger a manual deploy:
- Click "Manual Deploy" → "Clear build cache & deploy"
- This ensures the environment variables are available to your workflow tasks
Note: You only need `GITHUB_ACCESS_TOKEN` in Render. If you used OAuth, you don't need to add `GITHUB_CLIENT_ID` or `GITHUB_CLIENT_SECRET` to Render.
There are three ways to trigger a workflow run to populate data:
The trigger/trigger.py script uses the Render SDK to trigger workflows programmatically:
```bash
cd trigger

# Install dependencies
pip install -r requirements.txt

# Set required environment variables
export RENDER_API_KEY=your_render_api_key
export RENDER_WORKFLOW_SLUG=trender-wf  # Your workflow slug from Render dashboard

# Run the trigger script
python trigger.py
```
Expected output:
```
Triggering task: trender-wf/main-analysis-task
✓ Workflow triggered successfully at 2026-01-23 12:00:00
Task Run ID: run_abc123xyz
Initial Status: running
```
- Go to Render Dashboard
- Navigate to Workflows section
- Select your `trender-wf` workflow
- Click on the "main-analysis-task" task
- Click "Run Task" button
- Monitor the task execution in real-time
If you have the Render CLI installed:
```bash
# Install Render CLI (if not already installed)
npm install -g @render-inc/cli

# Login to Render
render login

# Trigger the workflow
render workflows trigger trender-wf main-analysis-task
```
Check the workflow status:
- Via Dashboard: Go to Workflows → trender-wf → View recent runs
- Via Script: The trigger script outputs the Task Run ID
- Via Database: Query the `dim_repositories` table to see loaded data:
```bash
psql $DATABASE_URL -c "SELECT language, COUNT(*) as count FROM dim_repositories WHERE is_current = TRUE GROUP BY language;"
```
Expected workflow completion time: 10-20 seconds for ~150 repositories across 3 languages + Render ecosystem
- "RENDER_API_KEY not set": Export your API key from Render Settings
- "Task not found": Verify your workflow slug is
trender-wfand that the workflow is deployed - "Connection refused": Check that
DATABASE_URLis correct and the database is running - Workflow fails: Check the Render dashboard logs under Workflows → trender-wf → Logs for detailed error messages
- "GITHUB_ACCESS_TOKEN not set": Ensure you added the token to the workflow service environment variables (step 7)
Once the workflow completes, access your dashboard at:
https://trender-dashboard.onrender.com
You should see:
- Top trending repositories across Python, TypeScript, and Go
- Render ecosystem projects
- Momentum scores and analytics
- Historical trends
- Stores complete GitHub API responses
- Tables: `raw_github_repos`, `raw_repo_metrics`
- Purpose: Audit trail and reprocessing capability
- Cleaned and validated data
- Tables: `stg_repos_validated`, `stg_render_enrichment`
- Business rules applied
- Render enrichment data: service types, complexity scores, categories, blueprint indicators
- Dimensions: `dim_repositories`, `dim_languages`, `dim_render_services`
- Facts: `fact_repo_snapshots`, `fact_render_usage`
- Views: Pre-aggregated analytics for dashboard
- Render analytics: Service adoption metrics, complexity distributions, blueprint quality indicators
The workflow consists of 4 main tasks decorated with `@task`:
- `main_analysis_task`: Orchestrator that spawns parallel tasks and coordinates the ETL pipeline
- `fetch_language_repos`: Fetches and stores trending repos for Python, TypeScript, or Go
- `analyze_repo_batch`: Analyzes repos in batches of 10, enriching with detailed metrics
- `fetch_render_repos`: Fetches Render ecosystem repositories using multi-strategy search
The orchestrator runs 4 parallel tasks (3 languages + 1 Render ecosystem), then aggregates results through the ETL pipeline (Extract from staging → Calculate scores → Load to analytics).
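Stripped of Render Workflows SDK details, the fan-out/aggregate shape of `main_analysis_task` looks roughly like this plain-asyncio sketch (the function bodies are placeholders, not the real task logic):

```python
import asyncio


# Hypothetical stand-ins for the @task-decorated SDK functions
async def fetch_language_repos(language: str) -> list[dict]:
    # Real task: query the GitHub search API, store raw JSONB rows
    await asyncio.sleep(0)  # placeholder for I/O
    return [{"language": language}]


async def fetch_render_repos() -> list[dict]:
    # Real task: render.yaml code search + render-blueprints topic search
    await asyncio.sleep(0)
    return [{"language": "render"}]


async def main_analysis_task() -> None:
    # The three language fetches and the Render fetch run concurrently
    batches = await asyncio.gather(
        fetch_language_repos("python"),
        fetch_language_repos("typescript"),
        fetch_language_repos("go"),
        fetch_render_repos(),
    )
    repos = [repo for batch in batches for repo in batch]
    # Then: extract from staging → calculate momentum scores → load to analytics
    print(f"fetched {len(repos)} repo summaries")


if __name__ == "__main__":
    asyncio.run(main_analysis_task())
```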
The fetch_render_repos task uses a 2-strategy approach to discover Render projects:
- Code Search for render.yaml: Uses GitHub's code search API to find repositories with `render.yaml` in the root directory only. This ensures we only capture repos that are properly configured for Render deployment. Results are sorted by stars descending.
- Topic Search: Finds community repos tagged with the `render-blueprints` topic
This approach ensures accuracy (only repos with actual render.yaml files) and maximizes coverage of community projects. When a Render repo is found, the system:
- Fetches and parses the `render.yaml` file to extract service configurations
- Calculates complexity scores based on number and type of services
- Categorizes projects (official, community, blueprint)
- Stores enrichment data in the `stg_render_enrichment` table
- Populates `fact_render_usage` for service adoption analytics
- Momentum Score: Composite score combining:
  - 50% Normalized Stars: Stars normalized within dataset (general repos vs Render repos scored separately)
  - 50% Recency Score: Based on repository creation date:
    - 1.0 for repos ≤ 30 days old
    - 0.75 for repos 31-60 days old
    - 0.5 for repos 61-90 days old
    - 0.0 for repos > 90 days old
- Note: Activity metrics (commits, issues, contributors) are collected but not used in scoring
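As a sketch, the scoring rule above reduces to a few lines (the workflow's actual implementation may be structured differently):

```python
from datetime import datetime, timezone


def momentum_score(stars: int, max_stars: int, created_at: datetime) -> float:
    """50% stars normalized within the cohort + 50% recency tier.

    created_at must be timezone-aware (UTC).
    """
    normalized_stars = stars / max_stars if max_stars > 0 else 0.0

    age_days = (datetime.now(timezone.utc) - created_at).days
    if age_days <= 30:
        recency = 1.0
    elif age_days <= 60:
        recency = 0.75
    elif age_days <= 90:
        recency = 0.5
    else:
        recency = 0.0  # repos older than 90 days get no recency credit

    return 0.5 * normalized_stars + 0.5 * recency
```

Pass `max_stars` from the repo's own cohort (general vs Render) so the two groups are normalized separately, as described above.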
Workflows:
```bash
cd workflows
pip install -r requirements.txt
python workflow.py
```
Dashboard:
```bash
cd dashboard
npm install
npm run dev
# Access at http://localhost:3000
```
If you need to recreate or update the schema:
Option 1: Using the setup script (Recommended)
```bash
./bin/db_setup.sh
```
Option 2: Using psql directly
```bash
cd database
psql $DATABASE_URL -f init.sql
```
Option 3: Run individual schema files
```bash
cd database
psql $DATABASE_URL -f schema/01_raw_layer.sql
psql $DATABASE_URL -f schema/02_staging_layer.sql
psql $DATABASE_URL -f schema/03_analytics_layer.sql
psql $DATABASE_URL -f schema/04_views.sql
```
If upgrading from an older version, apply the cleanup migration:
```bash
psql $DATABASE_URL -f database/cleanup_workflow_tracking.sql
```
Technical:
- Process 150 repos across 3 languages + Render ecosystem in 10-20 seconds
- 4x parallel task execution (Python, TypeScript, Go, Render)
- 3-layer data pipeline with dimensional modeling (9 tables + 6 views)
- Accurate Render discovery using code search (only root-level render.yaml files)
Marketing:
- Showcase trending Render ecosystem projects (render.yaml repositories)
- Highlight momentum scores combining stars and recency
- Identify case study candidates with high engagement
- Track Render service adoption patterns (web, worker, cron, etc.)
MIT
Contributions welcome! Please open an issue or submit a pull request.