# Provider Model Synchronization Guide

This guide explains how to use the provider model synchronization system to automatically fetch and update model catalogs from provider APIs.

## Overview

The synchronization system consists of three main components:

1. **Provider API Configuration** (`models_api` in providers.json)
2. **Web UI Sync Button** (Manual sync per provider)
3. **Batch Sync Script** (Automated sync for all providers)

## Provider API Configuration

### Schema

Each provider can have a `models_api` configuration:

```json
{
  "id": "openrouter",
  "models_api": {
    "endpoints": [
      {
        "url": "https://openrouter.ai/api/v1/models",
        "endpoint_type": "CHAT_COMPLETIONS",
        "format": "OPENAI",
        "transformer": "openrouter"
      }
    ],
    "enabled": true,
    "update_frequency": "realtime",
    "last_synced": "2025-01-15T10:30:00.000Z"
  }
}
```

### Fields

- **`endpoints`**: Array of API endpoints to fetch models from
  - `url`: Full API endpoint URL
  - `endpoint_type`: Type of models (CHAT_COMPLETIONS, EMBEDDINGS, etc.)
  - `format`: API format (OPENAI, ANTHROPIC, GEMINI)
  - `transformer`: Optional custom transformer name (openrouter, aihubmix)
- **`enabled`**: Whether sync is enabled for this provider
- **`update_frequency`**: Suggested sync frequency
  - `realtime`: Aggregators that change frequently (OpenRouter, AIHubMix)
  - `daily`: Most official providers
  - `weekly`: Stable providers
  - `manual`: Manual sync only
- **`last_synced`**: ISO timestamp of last successful sync (auto-updated)

## Setup

### Environment Variables

Most providers require API keys to list their models. Configure your API keys:

1. **Copy the example file:**

   ```bash
   cd packages/catalog
   cp .env.example .env
   ```

2. **Edit `.env` and add your API keys:**

   ```bash
   # Official Providers
   OPENAI_API_KEY=sk-...
   GROQ_API_KEY=gsk_...
   TOGETHER_API_KEY=...

   # China Aggregators
   DEEPSEEK_API_KEY=...
   SILICON_API_KEY=...
   ```

3.
   **Keep `.env` secure:**
   - Never commit `.env` to git (already in `.gitignore`)
   - Use different keys for development and production
   - Rotate keys periodically

### API Key Format

Each provider has a corresponding environment variable:

| Provider ID | Environment Variable | Example Format |
|-------------|----------------------|----------------|
| openai | `OPENAI_API_KEY` | `sk-...` |
| groq | `GROQ_API_KEY` | `gsk_...` |
| deepseek | `DEEPSEEK_API_KEY` | `sk-...` |
| silicon | `SILICON_API_KEY` | `sk-...` |
| together | `TOGETHER_API_KEY` | `...` |
| mistral | `MISTRAL_API_KEY` | `...` |
| perplexity | `PERPLEXITY_API_KEY` | `pplx-...` |

See `.env.example` for the complete list.

## Usage

### Method 1: Web UI (Per Provider)

1. Open the provider management page (`/providers`)
2. Find a provider with `models_api` enabled
3. Click the **Sync** button in the Actions column
4. Wait for the sync to complete (a toast notification shows progress)
5. Review the statistics (models fetched, new models, overrides)

**Features:**
- Real-time progress feedback
- Detailed statistics
- Manual trigger control
- Per-provider sync

**Use Cases:**
- Testing new provider configurations
- Emergency updates for specific providers
- Validating API changes

### Method 2: Batch Sync Script (All Providers)

Run the batch sync script to sync all providers at once:

```bash
cd packages/catalog
npm run sync:all
```

**Features:**
- Syncs all providers with `models_api.enabled = true`
- Skips OpenRouter and AIHubMix (use their dedicated import scripts)
- Adds delays between providers to avoid rate limiting
- Comprehensive progress logging
- Summary statistics

**Use Cases:**
- Scheduled updates (cron jobs, CI/CD)
- Initial bulk import
- Regular maintenance updates

**Output Example:**

```
============================================================
Batch Provider Model Sync
============================================================

Loading data files...
Loaded:
  - 51 providers
  - 604 models
  - 120 overrides

Providers to sync: 49
Skipping: openrouter, aihubmix (authoritative sources)

API Keys Status:
  ✓ Found: 12
  ✗ Missing: 37

Providers without API keys (will likely fail):
  - cherryin (env: CHERRYIN_API_KEY)
  - silicon (env: SILICON_API_KEY)
  ...

To configure API keys:
  1. Copy .env.example to .env
  2. Fill in your API keys
  3. Re-run this script

[deepseek] Syncing models...
  - Fetching from https://api.deepseek.com/v1/models
  ✓ Fetched 3 models
  + Adding 1 new models to models.json
  + Generated 2 new overrides

...

============================================================
Sync Summary
============================================================
Total providers: 49
✓ Successful: 47
✗ Failed: 2

Statistics:
  - Total models fetched: 520
  - New models added: 45
  - Overrides generated: 178
  - Overrides merged: 12

✓ Batch sync completed
============================================================
```

## How It Works

### Data Flow

```
Provider API → Transformer → ModelConfig
                                  ↓
                      Compare with models.json
                                  ↓
              ┌───────────────────┴───────────────────┐
              ↓                                       ↓
          New Model                            Existing Model
              ↓                                       ↓
     Add to models.json                       Generate Override
                                                      ↓
                                             Merge with existing
                                                      ↓
                                           Save to overrides.json
```

### Override Generation

The system automatically generates overrides for **all models** supported by a provider, even if they are identical to the base model. This serves two purposes:

1. **Provider Support Tracking**: Marks which providers support which models
2. **Difference Recording**: Records any differences from the base model

**Override Types:**

1. **Empty Override** (identical models):

   ```json
   {
     "provider_id": "groq",
     "model_id": "llama-3.1-8b",
     "priority": 0
   }
   ```

   This marks that the provider supports the model with no differences.

2.
   **Override with Differences**:

   ```json
   {
     "provider_id": "provider-x",
     "model_id": "gpt-4",
     "priority": 0,
     "pricing": {
       "input": { "per_million_tokens": 5.0, "currency": "USD" },
       "output": { "per_million_tokens": 15.0, "currency": "USD" }
     },
     "limits": { "context_window": 32000 }
   }
   ```

**Priority System:**
- `priority < 100`: Auto-generated overrides (replaced on sync)
- `priority >= 100`: Manual overrides (preserved during sync)

### Merge Strategy

When syncing:

1. **New Models**: Added directly to `models.json`
2. **Existing Models with Differences**: Override created/updated in `overrides.json`
3. **Manual Overrides**: Preserved (priority >= 100)
4. **Auto Overrides**: Replaced with latest data (priority < 100)

## Transformers

### Built-in Transformers

1. **OpenAI-compatible** (default): Standard OpenAI API format
   - Used by most providers (deepseek, groq, together, etc.)
   - Handles `{ data: [...] }` responses
   - Basic capability inference

2. **OpenRouter**: Custom transformer for the OpenRouter aggregator
   - Normalizes model IDs to lowercase
   - Extracts the provider from the model ID format (`openai/gpt-4`)
   - Advanced capability inference from `supported_parameters`
   - Pricing conversion (per-token → per-million)

3. **AIHubMix**: Custom transformer for the AIHubMix aggregator
   - Normalizes model IDs to lowercase
   - Parses CSV fields (`types`, `features`, `input_modalities`)
   - Capability mapping (thinking → REASONING, etc.)
   - Provider extraction from the model ID

### Adding Custom Transformers

To add a custom transformer:

1. Create `src/utils/importers/{provider}/transformer.ts`
2. Implement the `ITransformer` interface
3. Update the sync endpoint to use your transformer
4.
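   Add the transformer name to the provider config (`models_api.endpoints[].transformer`)

Example (the `CustomModel` shape and the `ModelConfig` fields set in `transform` are assumptions; adapt them to the provider payload and your schema):

```typescript
import type { ModelConfig } from '../../../schemas'
import type { ITransformer } from '../base/base-transformer'

// Shape of one model entry in the provider's API response.
// Hypothetical: adjust the fields to the provider's actual payload.
interface CustomModel {
  id: string
  name?: string
}

export class CustomTransformer implements ITransformer {
  extractModels(response: any): CustomModel[] {
    // Pull the model list out of the raw response; OpenAI-compatible
    // APIs wrap it as { data: [...] }
    return Array.isArray(response?.data) ? response.data : []
  }

  transform(apiModel: CustomModel): ModelConfig {
    // Map provider fields onto the internal format, normalizing the ID
    // to lowercase like the built-in transformers do.
    // Assumption: ModelConfig has `id` and `name`; fill in the remaining
    // required fields from your schema and drop the cast.
    return {
      id: apiModel.id.toLowerCase(),
      name: apiModel.name ?? apiModel.id,
    } as ModelConfig
  }
}
```

## Best Practices

### 1. Authoritative Sources

OpenRouter and AIHubMix are treated as **authoritative sources** because they aggregate models from multiple providers and have custom transformers with advanced logic. For that reason they are skipped by the batch sync and should be imported with their dedicated scripts:

```bash
npm run import:openrouter
npm run import:aihubmix
```

### 2. Sync Frequency

Recommended sync frequencies:

| Provider Type | Frequency | Reason |
|---------------|-----------|--------|
| Aggregators | Daily | Models change frequently |
| Official APIs | Weekly | Stable, infrequent updates |
| Beta/Experimental | Manual | May have unstable APIs |

### 3. API Keys

Most providers require API keys for model listing:

**For Batch Script:**
- Configure keys in the `.env` file (see the Setup section above)
- The script automatically uses the appropriate key for each provider
- Missing keys trigger warnings but don't stop the sync

**For Web UI:**
- Currently uses the same `.env` file (server-side)
- Future enhancement: API key input field in the UI

### 4. Rate Limiting

The batch script includes:
- A 1-second delay between providers
- Error handling that continues past failures

Retry logic is not yet implemented (see Future Enhancements).

### 5. Manual Overrides

To create manual overrides that won't be replaced:

1. Set `priority >= 100` in `overrides.json`
2. Add a `reason` field documenting why the override is manual
3.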
   Such overrides are preserved during sync.

Example:

```json
{
  "provider_id": "custom-provider",
  "model_id": "special-model",
  "priority": 100,
  "reason": "Custom pricing negotiated with provider",
  "pricing": {
    "input": { "per_million_tokens": 1.0, "currency": "USD" },
    "output": { "per_million_tokens": 2.0, "currency": "USD" }
  }
}
```

## Troubleshooting

### Provider Sync Fails

1. Check that `models_api.enabled = true`
2. Verify that the API endpoint URL is accessible
3. Check whether the provider requires an API key and, if so, that it is set in `.env`
4. Review transformer compatibility

### Models Not Appearing

1. Check that model IDs are normalized to lowercase
2. Verify that the transformer extracts models correctly
3. Check console logs for transformation errors

### Overrides Not Generated

1. Verify that the model exists in the base `models.json`
2. Check whether differences actually exist (pricing, capabilities, etc.)
3. Review merge strategy settings

## Future Enhancements

- [ ] API key management in Web UI
- [ ] Scheduled sync (cron-style)
- [ ] Sync history and audit log
- [ ] Conflict resolution UI
- [ ] Retry logic with exponential backoff
- [ ] Webhook notifications
- [ ] Differential sync (only changed models)
- [ ] Provider-specific transformers registry
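As a starting point for the retry item above, a provider fetch could be wrapped in a retry helper with exponential backoff, complementing the existing 1-second delay between providers. This is a minimal sketch under assumptions, not code from the batch script: `withRetry`, `RetryOptions`, and the defaults of 3 retries starting at 1 second are hypothetical choices.

```typescript
// Hypothetical helper (not part of the current sync script): retries an
// async operation, doubling the wait after each failed attempt.
export interface RetryOptions {
  maxRetries: number // retries allowed after the first attempt
  baseDelayMs: number // wait before the first retry; doubles each time
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms))

export async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries, baseDelayMs }: RetryOptions = { maxRetries: 3, baseDelayMs: 1000 },
): Promise<T> {
  let lastError: unknown = undefined
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      lastError = error
      if (attempt === maxRetries) break // out of retries
      // Back off baseDelayMs * 2^attempt: 1s, 2s, 4s, ... with the defaults
      await sleep(baseDelayMs * 2 ** attempt)
    }
  }
  // Every attempt failed: surface the last error to the caller
  throw lastError
}
```

Because `fetch` resolves rather than rejects on HTTP error statuses such as 429, the wrapped operation should throw when `response.ok` is false; otherwise rate-limit responses would never trigger a retry.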