cherry-studio/packages/catalog/docs/SYNC_GUIDE.md
suyao 5b009769c3
2025-12-24 01:29:07 +08:00

Provider Model Synchronization Guide

This guide explains how to use the provider model synchronization system to automatically fetch and update model catalogs from provider APIs.

Overview

The synchronization system consists of three main components:

  1. Provider API Configuration (models_api in providers.json)
  2. Web UI Sync Button (Manual sync per provider)
  3. Batch Sync Script (Automated sync for all providers)

Provider API Configuration

Schema

Each provider can have a models_api configuration:

{
  "id": "openrouter",
  "models_api": {
    "endpoints": [
      {
        "url": "https://openrouter.ai/api/v1/models",
        "endpoint_type": "CHAT_COMPLETIONS",
        "format": "OPENAI",
        "transformer": "openrouter"
      }
    ],
    "enabled": true,
    "update_frequency": "realtime",
    "last_synced": "2025-01-15T10:30:00.000Z"
  }
}

Fields

  • endpoints: Array of API endpoints to fetch models from

    • url: Full API endpoint URL
    • endpoint_type: Type of models (CHAT_COMPLETIONS, EMBEDDINGS, etc.)
    • format: API format (OPENAI, ANTHROPIC, GEMINI)
    • transformer: Optional custom transformer name (openrouter, aihubmix)
  • enabled: Whether sync is enabled for this provider

  • update_frequency: Suggested sync frequency

    • realtime: Aggregators that change frequently (OpenRouter, AIHubMix)
    • daily: Most official providers
    • weekly: Stable providers
    • manual: Manual sync only
  • last_synced: ISO timestamp of last successful sync (auto-updated)
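
The fields above can be written down as types, which makes it easier to check a providers.json entry before syncing. This is a minimal sketch: the type names and the selectSyncableProviders helper are hypothetical and are not taken from the catalog package.

```typescript
// Hypothetical types mirroring the models_api fields described above.
type UpdateFrequency = 'realtime' | 'daily' | 'weekly' | 'manual'

interface ModelsApiEndpoint {
  url: string
  endpoint_type: string // e.g. CHAT_COMPLETIONS, EMBEDDINGS
  format: 'OPENAI' | 'ANTHROPIC' | 'GEMINI'
  transformer?: string // optional, e.g. "openrouter" or "aihubmix"
}

interface ModelsApiConfig {
  endpoints: ModelsApiEndpoint[]
  enabled: boolean
  update_frequency: UpdateFrequency
  last_synced?: string // ISO timestamp, set after a successful sync
}

interface ProviderEntry {
  id: string
  models_api?: ModelsApiConfig
}

// Returns the providers that have sync enabled and at least one endpoint.
function selectSyncableProviders(providers: ProviderEntry[]): ProviderEntry[] {
  return providers.filter((p) => {
    const api = p.models_api
    return api !== undefined && api.enabled && api.endpoints.length > 0
  })
}
```

Providers without a models_api block, or with enabled set to false, are simply left out of the result.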

Setup

Environment Variables

Most providers require API keys to list their models. Configure your API keys:

  1. Copy the example file:

    cd packages/catalog
    cp .env.example .env
    
  2. Edit .env and add your API keys:

    # Official Providers
    OPENAI_API_KEY=sk-...
    GROQ_API_KEY=gsk_...
    TOGETHER_API_KEY=...
    
    # China Aggregators
    DEEPSEEK_API_KEY=...
    SILICON_API_KEY=...
    
  3. Keep .env secure:

    • Never commit .env to git (already in .gitignore)
    • Use different keys for development and production
    • Rotate keys periodically

API Key Format

Each provider has a corresponding environment variable:

Provider ID   Environment Variable   Example Format
openai        OPENAI_API_KEY         sk-...
groq          GROQ_API_KEY           gsk_...
deepseek      DEEPSEEK_API_KEY       sk-...
silicon       SILICON_API_KEY        sk-...
together      TOGETHER_API_KEY       ...
mistral       MISTRAL_API_KEY        ...
perplexity    PERPLEXITY_API_KEY     pplx-...

See .env.example for the complete list.
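
Every row in the table follows the same pattern: the upper-cased provider ID plus _API_KEY. The sketch below assumes that pattern holds for all providers, and additionally assumes that non-alphanumeric characters in an ID become underscores; neither is confirmed by the source, so .env.example remains the authoritative list. The helper names apiKeyEnvName and lookupApiKey are hypothetical.

```typescript
// Derives the environment variable name for a provider ID.
// Assumption: upper-cased ID + "_API_KEY"; characters outside A-Z and 0-9
// are replaced with "_" (this second rule is a guess, not from the guide).
function apiKeyEnvName(providerId: string): string {
  return `${providerId.toUpperCase().replace(/[^A-Z0-9]/g, '_')}_API_KEY`
}

// Looks up the key in an environment map; returns undefined when missing or blank.
function lookupApiKey(
  providerId: string,
  env: Record<string, string | undefined>
): string | undefined {
  const value = env[apiKeyEnvName(providerId)]?.trim()
  return value ? value : undefined
}
```

In a Node.js script, process.env can be passed as the env argument once the .env file has been loaded.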

Usage

Method 1: Web UI (Per Provider)

  1. Open the provider management page (/providers)
  2. Find a provider with models_api enabled
  3. Click the Sync button in the Actions column
  4. Wait for the sync to complete (a toast notification shows progress)
  5. Review the statistics (fetched, new models, overrides)

Features:

  • Real-time progress feedback
  • Detailed statistics
  • Manual trigger control
  • Per-provider sync

Use Cases:

  • Testing new provider configurations
  • Emergency updates for specific providers
  • Validating API changes

Method 2: Batch Sync Script (All Providers)

Run the batch sync script to sync all providers at once:

cd packages/catalog
npm run sync:all

Features:

  • Syncs all providers with models_api.enabled = true
  • Skips OpenRouter and AIHubMix (use dedicated import scripts)
  • Adds delays to avoid rate limiting
  • Comprehensive progress logging
  • Summary statistics

Use Cases:

  • Scheduled updates (cron jobs, CI/CD)
  • Initial bulk import
  • Regular maintenance updates

Output Example:

============================================================
Batch Provider Model Sync
============================================================

Loading data files...

Loaded:
  - 51 providers
  - 604 models
  - 120 overrides

Providers to sync: 49
Skipping: openrouter, aihubmix (authoritative sources)

API Keys Status:
  ✓ Found: 12
  ✗ Missing: 37

Providers without API keys (will likely fail):
  - cherryin            (env: CHERRYIN_API_KEY)
  - silicon             (env: SILICON_API_KEY)
  ...

To configure API keys:
  1. Copy .env.example to .env
  2. Fill in your API keys
  3. Re-run this script

[deepseek] Syncing models...
  - Fetching from https://api.deepseek.com/v1/models
    ✓ Fetched 3 models
  + Adding 1 new models to models.json
  + Generated 2 new overrides

...

============================================================
Sync Summary
============================================================

Total providers: 49
  ✓ Successful: 47
  ✗ Failed: 2

Statistics:
  - Total models fetched: 520
  - New models added: 45
  - Overrides generated: 178
  - Overrides merged: 12

✓ Batch sync completed
============================================================

How It Works

Data Flow

Provider API → Transformer → ModelConfig
                                 ↓
                    Compare with models.json
                                 ↓
              ┌──────────────────┴─────────────────┐
              ↓                                     ↓
        New Model                            Existing Model
              ↓                                     ↓
    Add to models.json                    Generate Override
                                                    ↓
                                          Merge with existing
                                                    ↓
                                            Save to overrides.json
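
The branch in the diagram, new model versus existing model, could be sketched as below. The function name partitionByExistence and the FetchedModel shape are hypothetical, and comparing IDs case-insensitively is an assumption of this sketch rather than documented behaviour.

```typescript
interface FetchedModel {
  id: string
}

// Splits transformed models into those missing from models.json and those
// already present. IDs are compared case-insensitively (assumption).
function partitionByExistence<T extends FetchedModel>(
  fetched: T[],
  baseModels: FetchedModel[]
): { newModels: T[]; existingModels: T[] } {
  const known = new Set(baseModels.map((m) => m.id.toLowerCase()))
  const newModels: T[] = []
  const existingModels: T[] = []
  for (const model of fetched) {
    if (known.has(model.id.toLowerCase())) {
      existingModels.push(model) // goes on to override generation
    } else {
      newModels.push(model) // gets added to models.json
    }
  }
  return { newModels, existingModels }
}
```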

Override Generation

The system automatically generates overrides for all models supported by a provider, even if identical to the base model. This serves two purposes:

  1. Provider Support Tracking: Mark which providers support which models
  2. Difference Recording: Record any differences from the base model

Override Types:

  1. Empty Override (identical models):

    {
      "provider_id": "groq",
      "model_id": "llama-3.1-8b",
      "priority": 0
    }
    

    This marks that the provider supports the model with no differences.

  2. Override with Differences:

    {
      "provider_id": "provider-x",
      "model_id": "gpt-4",
      "priority": 0,
      "pricing": {
        "input": { "per_million_tokens": 5.0, "currency": "USD" },
        "output": { "per_million_tokens": 15.0, "currency": "USD" }
      },
      "limits": {
        "context_window": 32000
      }
    }
    

Priority System:

  • priority < 100: Auto-generated overrides (replaced on sync)
  • priority >= 100: Manual overrides (preserved during sync)
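
A rough sketch of how an auto-generated override could be built: fields are copied only when the provider's value differs from the base model, so identical models produce an empty override. The types and the buildAutoOverride name are hypothetical, and only pricing and limits are compared here to keep the example short.

```typescript
interface Pricing {
  input: { per_million_tokens: number; currency: string }
  output: { per_million_tokens: number; currency: string }
}

interface ModelSnapshot {
  id: string
  pricing?: Pricing
  limits?: { context_window?: number }
}

interface ProviderOverride {
  provider_id: string
  model_id: string
  priority: number
  pricing?: Pricing
  limits?: { context_window?: number }
}

// Structural comparison via JSON; adequate for a sketch, but sensitive to key order.
const sameJson = (a: unknown, b: unknown): boolean => JSON.stringify(a) === JSON.stringify(b)

// Builds an auto-generated override (priority 0, so it is replaced on the next sync).
function buildAutoOverride(
  providerId: string,
  base: ModelSnapshot,
  fromProvider: ModelSnapshot
): ProviderOverride {
  const override: ProviderOverride = { provider_id: providerId, model_id: base.id, priority: 0 }
  if (fromProvider.pricing && !sameJson(fromProvider.pricing, base.pricing)) {
    override.pricing = fromProvider.pricing
  }
  if (fromProvider.limits && !sameJson(fromProvider.limits, base.limits)) {
    override.limits = fromProvider.limits
  }
  return override
}
```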

Merge Strategy

When syncing:

  1. New Models: Added directly to models.json
  2. Existing Models: Override created/updated in overrides.json (an empty override if the model is identical to the base model)
  3. Manual Overrides: Preserved (priority >= 100)
  4. Auto Overrides: Replaced with latest data (priority < 100)
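
The priority rule in steps 3 and 4 could be sketched like this. The names mergeOverrideLists, OverrideRecord, and MANUAL_PRIORITY_THRESHOLD are hypothetical, and matching entries on a lower-cased provider/model key is an assumption of this sketch.

```typescript
interface OverrideRecord {
  provider_id: string
  model_id: string
  priority: number
  [field: string]: unknown
}

const MANUAL_PRIORITY_THRESHOLD = 100

// Case-insensitive key for one provider/model pair (assumption).
const overrideKey = (o: OverrideRecord): string => `${o.provider_id}::${o.model_id}`.toLowerCase()

// Merges freshly generated overrides into the existing list:
// manual entries (priority >= 100) are kept, auto entries are replaced,
// and overrides for previously unseen pairs are appended.
function mergeOverrideLists(
  existing: OverrideRecord[],
  generated: OverrideRecord[]
): OverrideRecord[] {
  const merged = new Map<string, OverrideRecord>()
  for (const o of existing) {
    merged.set(overrideKey(o), o)
  }
  for (const o of generated) {
    const current = merged.get(overrideKey(o))
    if (current && current.priority >= MANUAL_PRIORITY_THRESHOLD) continue
    merged.set(overrideKey(o), o)
  }
  return Array.from(merged.values())
}
```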

Transformers

Built-in Transformers

  1. OpenAI-compatible (default): Standard OpenAI API format

    • Used by most providers (deepseek, groq, together, etc.)
    • Handles { data: [...] } responses
    • Basic capability inference
  2. OpenRouter: Custom transformer for OpenRouter aggregator

    • Normalizes model IDs to lowercase
    • Extracts provider from model ID format (openai/gpt-4)
    • Advanced capability inference from supported_parameters
    • Pricing conversion (per-token → per-million)
  3. AIHubMix: Custom transformer for AIHubMix aggregator

    • Normalizes model IDs to lowercase
    • Parses CSV fields (types, features, input_modalities)
    • Capability mapping (thinking → REASONING, etc.)
    • Provider extraction from model ID
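
The aggregator-specific steps listed above (ID normalization, provider extraction, pricing conversion, CSV parsing) are small pure functions at heart. The helpers below are an illustrative sketch with hypothetical names, not the actual transformer code. Whether the real transformers keep or strip the provider prefix in the stored model ID is not stated in this guide.

```typescript
// Splits an aggregator-style ID such as "openai/gpt-4" into provider and model
// parts, lower-casing both. IDs without a slash yield no provider.
function splitAggregatorModelId(rawId: string): { provider?: string; modelId: string } {
  const normalized = rawId.trim().toLowerCase()
  const slash = normalized.indexOf('/')
  if (slash === -1) return { modelId: normalized }
  return { provider: normalized.slice(0, slash), modelId: normalized.slice(slash + 1) }
}

// Converts a per-token price to a per-million-token price.
// Rounding to 6 decimals hides floating-point noise.
function perTokenToPerMillion(perToken: number | string): number {
  return Number((Number(perToken) * 1_000_000).toFixed(6))
}

// Parses a comma-separated field (e.g. "text,image") into trimmed, non-empty entries.
function parseCsvField(value: string | undefined): string[] {
  return (value ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
}
```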

Adding Custom Transformers

To add a custom transformer:

  1. Create src/utils/importers/{provider}/transformer.ts
  2. Implement the ITransformer interface
  3. Update the sync endpoint to use your transformer
  4. Add the transformer name to the provider's models_api config (the transformer field)

Example:

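import type { ModelConfig } from '../../../schemas'
import type { ITransformer } from '../base/base-transformer'

// Raw shape of one model in the provider's response (hypothetical; adjust to the real payload)
interface CustomModel {
  id: string
}

export class CustomTransformer implements ITransformer<CustomModel> {
  // Extract models from the API response (assumes an OpenAI-style { data: [...] } body)
  extractModels(response: any): CustomModel[] {
    return Array.isArray(response?.data) ? response.data : []
  }

  // Transform to internal format
  transform(apiModel: CustomModel): ModelConfig {
    // Only the ID is mapped here; fill in the remaining ModelConfig fields your schema requires
    return { id: apiModel.id.toLowerCase() } as ModelConfig
  }
}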

Best Practices

1. Authoritative Sources

OpenRouter and AIHubMix are treated as authoritative sources because:

  • They aggregate models from multiple providers
  • They have custom transformers with advanced logic
  • They should be imported using dedicated scripts:
    npm run import:openrouter
    npm run import:aihubmix
    

2. Sync Frequency

Recommended sync frequencies:

Provider Type       Frequency   Reason
Aggregators         Daily       Models change frequently
Official APIs       Weekly      Stable, infrequent updates
Beta/Experimental   Manual      May have unstable APIs

3. API Keys

Most providers require API keys for model listing:

For Batch Script:

  • Configure in .env file (see Setup section above)
  • Script will automatically use the appropriate key for each provider
  • Missing keys will trigger warnings but won't stop the sync

For Web UI:

  • Currently uses the same .env file (server-side)
  • Future enhancement: API key input field in UI

4. Rate Limiting

The batch script includes:

  • 1-second delay between providers
  • Error handling to continue on failures

Retry logic is not implemented yet (see Future Enhancements).
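
The delay and continue-on-failure behaviour described in this section could be sketched as follows. This is not the batch script itself: syncSequentially, SyncOutcome, and the syncOne callback are hypothetical names, and the 1000 ms default simply mirrors the 1-second delay mentioned above.

```typescript
interface SyncOutcome {
  providerId: string
  ok: boolean
  error?: string
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Runs syncOne for each provider in sequence, waiting delayMs between providers.
// A failure is recorded and the loop moves on to the next provider.
async function syncSequentially(
  providerIds: string[],
  syncOne: (providerId: string) => Promise<void>,
  delayMs = 1000
): Promise<SyncOutcome[]> {
  const outcomes: SyncOutcome[] = []
  let isFirst = true
  for (const providerId of providerIds) {
    if (!isFirst) await sleep(delayMs) // no delay before the first provider
    isFirst = false
    try {
      await syncOne(providerId)
      outcomes.push({ providerId, ok: true })
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      outcomes.push({ providerId, ok: false, error: message })
    }
  }
  return outcomes
}
```

Counting the ok and failed entries in the returned array gives the kind of totals shown in the sync summary.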

5. Manual Overrides

To create manual overrides that won't be replaced:

  1. Set priority >= 100 in overrides.json
  2. Add reason field to document why it's manual
  3. These will be preserved during sync

Example:

{
  "provider_id": "custom-provider",
  "model_id": "special-model",
  "priority": 100,
  "reason": "Custom pricing negotiated with provider",
  "pricing": {
    "input": { "per_million_tokens": 1.0, "currency": "USD" },
    "output": { "per_million_tokens": 2.0, "currency": "USD" }
  }
}

Troubleshooting

Provider Sync Fails

  1. Check if models_api.enabled = true
  2. Verify API endpoint URL is accessible
  3. Check whether the provider requires an API key and whether the matching environment variable is set in .env
  4. Review transformer compatibility

Models Not Appearing

  1. Check if model IDs are normalized to lowercase
  2. Verify transformer is extracting models correctly
  3. Check console logs for transformation errors

Overrides Not Generated

  1. Verify model exists in base models.json
  2. Check if differences actually exist (pricing, capabilities, etc.)
  3. Review merge strategy settings

Future Enhancements

  • API key management in Web UI
  • Scheduled sync (cron-style)
  • Sync history and audit log
  • Conflict resolution UI
  • Retry logic with exponential backoff
  • Webhook notifications
  • Differential sync (only changed models)
  • Provider-specific transformers registry