feat: add utility functions for merging models and providers, including deep merge capabilities

- Implemented mergeObjects function to smartly merge objects, preserving existing values and allowing for configurable overwrite options.
- Added mergeModelsList and mergeProvidersList functions to handle merging of model and provider lists, respectively, with case-insensitive ID matching.
- Introduced preset merge strategies for common use cases.
- Created a new API route for syncing provider models, handling data import and merge operations.
- Developed ModelEditForm and ProviderEditForm components for editing model and provider details, respectively, with form validation and state management.
- Added UI components for labels, selects, and notifications to enhance user experience.

2025-12-24 01:29:07 +08:00


Provider Model Synchronization Guide

This guide explains how to use the provider model synchronization system to automatically fetch and update model catalogs from provider APIs.

Overview

The synchronization system consists of three main components:

- Provider API Configuration (models_api in providers.json)
- Web UI Sync Button (manual sync per provider)
- Batch Sync Script (automated sync for all providers)

Provider API Configuration

Schema

Each provider can have a models_api configuration:

```json
{
  "id": "openrouter",
  "models_api": {
    "endpoints": [
      {
        "url": "https://openrouter.ai/api/v1/models",
        "endpoint_type": "CHAT_COMPLETIONS",
        "format": "OPENAI",
        "transformer": "openrouter"
      }
    ],
    "enabled": true,
    "update_frequency": "realtime",
    "last_synced": "2025-01-15T10:30:00.000Z"
  }
}
```

Fields

- endpoints: Array of API endpoints to fetch models from
  - url: Full API endpoint URL
  - endpoint_type: Type of models (CHAT_COMPLETIONS, EMBEDDINGS, etc.)
  - format: API format (OPENAI, ANTHROPIC, GEMINI)
  - transformer: Optional custom transformer name (openrouter, aihubmix)
- enabled: Whether sync is enabled for this provider
- update_frequency: Suggested sync frequency
  - realtime: Aggregators that change frequently (OpenRouter, AIHubMix)
  - daily: Most official providers
  - weekly: Stable providers
  - manual: Manual sync only
- last_synced: ISO timestamp of the last successful sync (auto-updated)
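To make the field list concrete, here is a minimal sketch of the models_api block as TypeScript types, plus a small validator. The type and function names (ModelsApiConfig, validateModelsApi) are hypothetical and not taken from the catalog package; the allowed values come from the list above, and endpoint_type is left as a plain string because the guide's list ends in "etc.".

```typescript
// Hypothetical TypeScript mirror of the models_api block in providers.json.
type ApiFormat = 'OPENAI' | 'ANTHROPIC' | 'GEMINI'
type UpdateFrequency = 'realtime' | 'daily' | 'weekly' | 'manual'

interface ModelsApiEndpoint {
  url: string
  endpoint_type: string // e.g. "CHAT_COMPLETIONS", "EMBEDDINGS"
  format: ApiFormat
  transformer?: string // e.g. "openrouter", "aihubmix"
}

interface ModelsApiConfig {
  endpoints: ModelsApiEndpoint[]
  enabled: boolean
  update_frequency: UpdateFrequency
  last_synced?: string // ISO timestamp, written by the sync itself
}

const FORMATS: readonly string[] = ['OPENAI', 'ANTHROPIC', 'GEMINI']
const FREQUENCIES: readonly string[] = ['realtime', 'daily', 'weekly', 'manual']

// Returns a list of problems; an empty list means the block looks valid.
function validateModelsApi(cfg: ModelsApiConfig): string[] {
  const problems: string[] = []
  if (cfg.endpoints.length === 0) problems.push('endpoints must not be empty')
  cfg.endpoints.forEach((ep, i) => {
    if (!/^https?:\/\/\S+$/.test(ep.url)) problems.push(`endpoints[${i}].url is not an http(s) URL`)
    if (!FORMATS.includes(ep.format)) problems.push(`endpoints[${i}].format is unknown`)
  })
  if (!FREQUENCIES.includes(cfg.update_frequency)) problems.push('update_frequency is unknown')
  if (cfg.last_synced !== undefined && Number.isNaN(Date.parse(cfg.last_synced))) {
    problems.push('last_synced is not a parseable timestamp')
  }
  return problems
}
```

Running it over the openrouter example above should return an empty list; a check like this could run before a sync so a malformed entry fails early rather than mid-batch.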

Setup

Environment Variables

Most providers require API keys to list their models. Configure your API keys:

1. Copy the example file:

```bash
cd packages/catalog
cp .env.example .env
```

2. Edit .env and add your API keys:

```bash
# Official Providers
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
TOGETHER_API_KEY=...

# China Aggregators
DEEPSEEK_API_KEY=...
SILICON_API_KEY=...
```

Keep .env secure:
- Never commit .env to git (already in .gitignore)
- Use different keys for development and production
- Rotate keys periodically

API Key Format

Each provider has a corresponding environment variable:

| Provider ID | Environment Variable | Example Format |
|---|---|---|
| openai | `OPENAI_API_KEY` | `sk-...` |
| groq | `GROQ_API_KEY` | `gsk_...` |
| deepseek | `DEEPSEEK_API_KEY` | `sk-...` |
| silicon | `SILICON_API_KEY` | `sk-...` |
| together | `TOGETHER_API_KEY` | `...` |
| mistral | `MISTRAL_API_KEY` | `...` |
| perplexity | `PERPLEXITY_API_KEY` | `pplx-...` |

See .env.example for the complete list.
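Every row in the table follows the same pattern: the provider ID in upper case followed by _API_KEY. A minimal sketch of that lookup, assuming the pattern holds for all providers; the helper names are hypothetical, and mapping characters such as "-" to "_" is an assumption the table does not cover.

```typescript
// Hypothetical helper: derives the env variable name from a provider ID,
// assuming the <PROVIDER_ID>_API_KEY convention shown in the table.
// Replacing non-alphanumeric characters with "_" is an assumption.
function apiKeyEnvVar(providerId: string): string {
  return `${providerId.toUpperCase().replace(/[^A-Z0-9]/g, '_')}_API_KEY`
}

// Looks up a key in an environment map (for example process.env in Node).
// Empty or whitespace-only values are treated as missing.
function findApiKey(providerId: string, env: Record<string, string | undefined>): string | undefined {
  const value = env[apiKeyEnvVar(providerId)]
  return value && value.trim() !== '' ? value.trim() : undefined
}
```

In Node this would be called as findApiKey('deepseek', process.env); passing the environment in as a parameter keeps the helper easy to test.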

Usage

Method 1: Web UI (Per Provider)

1. Open the provider management page (/providers)
2. Find a provider with models_api enabled
3. Click the Sync button in the Actions column
4. Wait for the sync to complete (a toast notification shows progress)
5. Review the statistics (fetched, new models, overrides)

Features:

- Real-time progress feedback
- Detailed statistics
- Manual trigger control
- Per-provider sync

Use Cases:

- Testing new provider configurations
- Emergency updates for specific providers
- Validating API changes

Method 2: Batch Sync Script (All Providers)

Run the batch sync script to sync all providers at once:

```bash
cd packages/catalog
npm run sync:all
```

Features:

- Syncs all providers with models_api.enabled = true
- Skips OpenRouter and AIHubMix (use dedicated import scripts)
- Adds delays to avoid rate limiting
- Comprehensive progress logging
- Summary statistics

Use Cases:

- Scheduled updates (cron jobs, CI/CD)
- Initial bulk import
- Regular maintenance updates
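The selection step behind "Providers to sync", "Skipping" and "API Keys Status" in the output below can be sketched as a pure function. This is a hypothetical illustration, not the script's code: ProviderEntry and planBatchSync are invented names, and the key lookup assumes the PROVIDER_ID_API_KEY convention from the Setup section.

```typescript
// Hypothetical provider shape: only the fields this sketch needs.
interface ProviderEntry {
  id: string
  models_api?: { enabled: boolean }
}

// Sources skipped by the batch script in favour of dedicated import scripts.
const AUTHORITATIVE = new Set(['openrouter', 'aihubmix'])

interface SyncPlan {
  toSync: string[]
  skipped: string[]
  missingKeys: string[] // warned about, but still attempted
}

function planBatchSync(providers: ProviderEntry[], env: Record<string, string | undefined>): SyncPlan {
  const plan: SyncPlan = { toSync: [], skipped: [], missingKeys: [] }
  for (const p of providers) {
    if (!p.models_api?.enabled) continue // sync not configured or disabled
    if (AUTHORITATIVE.has(p.id)) {
      plan.skipped.push(p.id)
      continue
    }
    plan.toSync.push(p.id)
    // Assumed naming convention: <PROVIDER_ID>_API_KEY
    const envName = `${p.id.toUpperCase().replace(/[^A-Z0-9]/g, '_')}_API_KEY`
    if (!env[envName]) plan.missingKeys.push(p.id)
  }
  return plan
}
```

A provider with a missing key stays in toSync rather than being dropped, matching the guide's statement that missing keys produce warnings but do not stop the sync.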

Output Example:

```text
============================================================
Batch Provider Model Sync
============================================================

Loading data files...

Loaded:
  - 51 providers
  - 604 models
  - 120 overrides

Providers to sync: 49
Skipping: openrouter, aihubmix (authoritative sources)

API Keys Status:
  ✓ Found: 12
  ✗ Missing: 37

Providers without API keys (will likely fail):
  - cherryin            (env: CHERRYIN_API_KEY)
  - silicon             (env: SILICON_API_KEY)
  ...

To configure API keys:
  1. Copy .env.example to .env
  2. Fill in your API keys
  3. Re-run this script

[deepseek] Syncing models...
  - Fetching from https://api.deepseek.com/v1/models
    ✓ Fetched 3 models
  + Adding 1 new models to models.json
  + Generated 2 new overrides

...

============================================================
Sync Summary
============================================================

Total providers: 49
  ✓ Successful: 47
  ✗ Failed: 2

Statistics:
  - Total models fetched: 520
  - New models added: 45
  - Overrides generated: 178
  - Overrides merged: 12

✓ Batch sync completed
============================================================
```

How It Works

Data Flow

```text
Provider API → Transformer → ModelConfig
                                 ↓
                    Compare with models.json
                                 ↓
              ┌──────────────────┴─────────────────┐
              ↓                                     ↓
        New Model                            Existing Model
              ↓                                     ↓
    Add to models.json                    Generate Override
                                                    ↓
                                          Merge with existing
                                                    ↓
                                            Save to overrides.json
```
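The "Compare with models.json" branch in the diagram amounts to partitioning the fetched models by whether their ID already exists in the base catalog. A minimal sketch; partitionFetched is a hypothetical name, and matching IDs case-insensitively is an assumption, chosen here because the OpenRouter and AIHubMix transformers lower-case IDs.

```typescript
interface FetchedModel {
  id: string
}

interface Partition<M> {
  newModels: M[] // → add to models.json
  existing: Array<{ fetched: M; baseId: string }> // → generate override
}

// Splits fetched models into new and existing ones.
// Case-insensitive matching is an assumption of this sketch.
function partitionFetched<M extends FetchedModel>(fetched: M[], baseIds: string[]): Partition<M> {
  const index = new Map<string, string>()
  for (const id of baseIds) index.set(id.toLowerCase(), id)

  const result: Partition<M> = { newModels: [], existing: [] }
  for (const model of fetched) {
    const baseId = index.get(model.id.toLowerCase())
    if (baseId === undefined) result.newModels.push(model)
    else result.existing.push({ fetched: model, baseId }) // keep the catalog's spelling of the ID
  }
  return result
}
```

Returning the catalog's own baseId alongside each match means later override records reference the ID exactly as it appears in models.json.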

Override Generation

The system automatically generates overrides for all models supported by a provider, even if identical to the base model. This serves two purposes:

- Provider Support Tracking: Mark which providers support which models
- Difference Recording: Record any differences from the base model

Override Types:

Empty Override (identical models):
```json
{
  "provider_id": "groq",
  "model_id": "llama-3.1-8b",
  "priority": 0
}
```
This marks that the provider supports the model with no differences.

Override with Differences:

```json
{
  "provider_id": "provider-x",
  "model_id": "gpt-4",
  "priority": 0,
  "pricing": {
    "input": { "per_million_tokens": 5.0, "currency": "USD" },
    "output": { "per_million_tokens": 15.0, "currency": "USD" }
  },
  "limits": {
    "context_window": 32000
  }
}
```
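Both override types can come out of one function that copies only the fields differing from the base model, so identical models produce the empty form. This is a sketch, not the catalog's implementation: the types cover only the pricing and limits fields shown in the examples, buildOverride and deepEqual are hypothetical names, and comparing whole pricing/limits objects (rather than individual sub-fields) is a simplifying choice.

```typescript
interface Price {
  per_million_tokens: number
  currency: string
}

interface ModelFacts {
  pricing?: { input?: Price; output?: Price }
  limits?: { context_window?: number }
}

interface Override extends ModelFacts {
  provider_id: string
  model_id: string
  priority: number
}

// Structural equality for plain JSON-like values.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false
  const ra = a as Record<string, unknown>
  const rb = b as Record<string, unknown>
  const keys = Object.keys(ra)
  if (keys.length !== Object.keys(rb).length) return false
  return keys.every((k) => Object.prototype.hasOwnProperty.call(rb, k) && deepEqual(ra[k], rb[k]))
}

// Builds an auto-generated override (priority 0). Only sections that differ
// from the base model are copied, so identical data yields an empty override.
function buildOverride(providerId: string, modelId: string, base: ModelFacts, fetched: ModelFacts): Override {
  const override: Override = { provider_id: providerId, model_id: modelId, priority: 0 }
  if (fetched.pricing !== undefined && !deepEqual(fetched.pricing, base.pricing)) override.pricing = fetched.pricing
  if (fetched.limits !== undefined && !deepEqual(fetched.limits, base.limits)) override.limits = fetched.limits
  return override
}
```

Fields the provider does not report at all (undefined in fetched) are left to the base model instead of being recorded as differences.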

Priority System:

- priority < 100: Auto-generated overrides (replaced on sync)
- priority >= 100: Manual overrides (preserved during sync)

Merge Strategy

When syncing:

- New Models: Added directly to models.json
- Existing Models: Override created or updated in overrides.json (an empty override when nothing differs)
- Manual Overrides: Preserved (priority >= 100)
- Auto Overrides: Replaced with the latest data (priority < 100)
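The priority rules and merge strategy above can be sketched as a keyed merge in which a manual override (priority >= 100) always wins and an auto override is replaced wholesale. mergeOverrides is a hypothetical name; keying on the lower-cased provider and model IDs is an assumption, and the sketch keeps existing auto overrides that the provider no longer lists, since the guide does not say whether those are pruned.

```typescript
interface OverrideRecord {
  provider_id: string
  model_id: string
  priority: number
  [field: string]: unknown // pricing, limits, reason, ...
}

const MANUAL_PRIORITY = 100 // threshold from the priority rules above

// Assumed identity of an override: provider + model, case-insensitive.
const keyOf = (o: OverrideRecord): string => `${o.provider_id.toLowerCase()}::${o.model_id.toLowerCase()}`

// Merges freshly generated overrides into the existing overrides.json content.
function mergeOverrides(existing: OverrideRecord[], generated: OverrideRecord[]): OverrideRecord[] {
  const merged = new Map<string, OverrideRecord>()
  for (const o of existing) merged.set(keyOf(o), o)
  for (const o of generated) {
    const current = merged.get(keyOf(o))
    // Manual overrides are never touched by a sync.
    if (current !== undefined && current.priority >= MANUAL_PRIORITY) continue
    merged.set(keyOf(o), o) // new, or replaces an auto override
  }
  return Array.from(merged.values())
}
```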

Transformers

Built-in Transformers

- OpenAI-compatible (default): Standard OpenAI API format
  - Used by most providers (deepseek, groq, together, etc.)
  - Handles { data: [...] } responses
  - Basic capability inference
- OpenRouter: Custom transformer for the OpenRouter aggregator
  - Normalizes model IDs to lowercase
  - Extracts provider from model ID format (openai/gpt-4)
  - Advanced capability inference from supported_parameters
  - Pricing conversion (per-token → per-million)
- AIHubMix: Custom transformer for the AIHubMix aggregator
  - Normalizes model IDs to lowercase
  - Parses CSV fields (types, features, input_modalities)
  - Capability mapping (thinking → REASONING, etc.)
  - Provider extraction from model ID
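Several of the transformer steps listed above are small pure functions. The sketch below shows plausible versions of ID normalization, provider extraction, per-token to per-million pricing conversion and CSV feature mapping. All function names are hypothetical; it assumes per-token prices may arrive as decimal strings, and only the thinking → REASONING mapping is taken from this guide (the remaining mappings are left out rather than guessed).

```typescript
// Lower-cases an aggregator model ID and splits "openai/gpt-4" into provider and ID.
function parseAggregatorId(rawId: string): { id: string; provider?: string } {
  const id = rawId.trim().toLowerCase()
  const slash = id.indexOf('/')
  return slash > 0 ? { id, provider: id.slice(0, slash) } : { id }
}

// Per-token price (number or decimal string, assumed) → price per million tokens.
// toFixed(6) trims floating-point noise such as 4.999999999999999.
function perTokenToPerMillion(perToken: string | number): number {
  const value = typeof perToken === 'string' ? Number.parseFloat(perToken) : perToken
  if (!Number.isFinite(value)) throw new Error(`invalid price: ${perToken}`)
  return Number((value * 1e6).toFixed(6))
}

// Splits a comma-separated field such as "text, image" into normalized tokens.
function parseCsvField(field: string | undefined): string[] {
  return (field ?? '')
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0)
}

// Only "thinking → REASONING" comes from the guide; extend with the other mappings.
const FEATURE_TO_CAPABILITY: Record<string, string> = {
  thinking: 'REASONING',
}

function mapFeatures(features: string | undefined): string[] {
  const caps = new Set<string>()
  for (const f of parseCsvField(features)) {
    const cap = FEATURE_TO_CAPABILITY[f]
    if (cap !== undefined) caps.add(cap)
  }
  return Array.from(caps)
}
```

Unknown features are dropped silently in this sketch; a real transformer might prefer to log them so new upstream features get noticed.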

Adding Custom Transformers

To add a custom transformer:

1. Create src/utils/importers/{provider}/transformer.ts
2. Implement the ITransformer interface
3. Update the sync endpoint to use your transformer
4. Add the transformer name to the provider config

Example:

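```typescript
import type { ModelConfig } from '../../../schemas'
import type { ITransformer } from '../base/base-transformer'

// Raw model entry as returned by the provider's API (adjust to the real payload)
interface CustomModel {
  id: string
  name?: string
}

export class CustomTransformer implements ITransformer<CustomModel> {
  // Extract models from the API response
  // (this assumes an OpenAI-style { data: [...] } envelope)
  extractModels(response: any): CustomModel[] {
    return Array.isArray(response?.data) ? response.data : []
  }

  // Transform one API model to the internal ModelConfig format
  transform(apiModel: CustomModel): ModelConfig {
    // Map apiModel's fields onto ModelConfig here
    throw new Error(`CustomTransformer.transform not implemented for ${apiModel.id}`)
  }
}
```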
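Steps 3 and 4 connect a transformer name in the provider config to an implementation. One way the sync endpoint could resolve it is a name-keyed registry with the OpenAI-compatible transformer as the fallback. This is a hypothetical sketch: TransformerLike is a stand-in for ITransformer so the block compiles on its own, and registerTransformer / resolveTransformer are invented names.

```typescript
// Minimal stand-in for ITransformer so this sketch is self-contained.
interface TransformerLike {
  extractModels(response: unknown): unknown[]
}

// Default: OpenAI-compatible { data: [...] } responses.
const openAiCompatible: TransformerLike = {
  extractModels(response) {
    if (typeof response !== 'object' || response === null) return []
    const data = (response as { data?: unknown }).data
    return Array.isArray(data) ? data : []
  },
}

const registry = new Map<string, TransformerLike>()

function registerTransformer(name: string, transformer: TransformerLike): void {
  registry.set(name.toLowerCase(), transformer)
}

// Resolves an endpoint's optional "transformer" field; no name means the default.
function resolveTransformer(name?: string): TransformerLike {
  if (name === undefined) return openAiCompatible
  const transformer = registry.get(name.toLowerCase())
  if (transformer === undefined) throw new Error(`unknown transformer: ${name}`)
  return transformer
}
```

Throwing on an unknown name, rather than silently falling back to the default, surfaces typos in providers.json during the sync instead of producing wrongly parsed models.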

Best Practices

1. Authoritative Sources

OpenRouter and AIHubMix are treated as authoritative sources because:

- They aggregate models from multiple providers
- They have custom transformers with advanced logic

They should be imported using dedicated scripts:

```bash
npm run import:openrouter
npm run import:aihubmix
```

2. Sync Frequency

Recommended sync frequencies:

| Provider Type | Frequency | Reason |
|---|---|---|
| Aggregators | Daily | Models change frequently |
| Official APIs | Weekly | Stable, infrequent updates |
| Beta/Experimental | Manual | May have unstable APIs |
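A scheduler could turn update_frequency and last_synced into a "sync due?" decision. The intervals below are an assumption of this sketch (realtime means every run, daily 24 hours, weekly 7 days, manual never automatically); the guide only describes the frequencies as suggestions, and isSyncDue is a hypothetical name.

```typescript
type UpdateFrequency = 'realtime' | 'daily' | 'weekly' | 'manual'

const DAY_MS = 24 * 60 * 60 * 1000

// Assumed minimum interval between automatic syncs.
// null = never automatically (manual), 0 = on every run (realtime).
const MIN_INTERVAL_MS: Record<UpdateFrequency, number | null> = {
  realtime: 0,
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
  manual: null,
}

function isSyncDue(frequency: UpdateFrequency, lastSynced: string | undefined, now: Date = new Date()): boolean {
  const interval = MIN_INTERVAL_MS[frequency]
  if (interval === null) return false
  if (lastSynced === undefined) return true // never synced
  const last = Date.parse(lastSynced)
  if (Number.isNaN(last)) return true // unparseable timestamp: treat as never synced
  return now.getTime() - last >= interval
}
```

Passing now as a parameter keeps the function deterministic in tests and lets a cron job evaluate every provider against the same reference time.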

3. API Keys

Most providers require API keys for model listing:

For Batch Script:

- Configure keys in the .env file (see the Setup section above)
- The script automatically uses the appropriate key for each provider
- Missing keys trigger warnings but don't stop the sync

For Web UI:

- Currently uses the same .env file (read server-side)
- Future enhancement: an API key input field in the UI

4. Rate Limiting

The batch script includes:

- A 1-second delay between providers
- Error handling that continues with the next provider when one fails
- Retry logic is not yet implemented (planned as a future enhancement)
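The rate-limiting behaviour (a fixed delay between providers, continue on failure, then a summary) can be sketched as a sequential loop. runBatch and ProviderSyncResult are hypothetical names, syncOne stands in for whatever per-provider sync the script actually performs, and, like the script, the sketch has no retry logic.

```typescript
interface ProviderSyncResult {
  fetched: number
  newModels: number
  overridesGenerated: number
}

interface BatchSummary {
  successful: string[]
  failed: Array<{ id: string; error: string }>
  totalFetched: number
  totalNew: number
  totalOverrides: number
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

// syncOne is a hypothetical per-provider sync function supplied by the caller.
async function runBatch(
  providerIds: string[],
  syncOne: (id: string) => Promise<ProviderSyncResult>,
  delayMs = 1000,
): Promise<BatchSummary> {
  const summary: BatchSummary = { successful: [], failed: [], totalFetched: 0, totalNew: 0, totalOverrides: 0 }
  let first = true
  for (const id of providerIds) {
    if (!first) await sleep(delayMs) // space out requests between providers
    first = false
    try {
      const r = await syncOne(id)
      summary.successful.push(id)
      summary.totalFetched += r.fetched
      summary.totalNew += r.newModels
      summary.totalOverrides += r.overridesGenerated
    } catch (err) {
      // Record the failure and continue with the next provider.
      summary.failed.push({ id, error: err instanceof Error ? err.message : String(err) })
    }
  }
  return summary
}
```

Running providers sequentially keeps the delay meaningful; a parallel version would need a per-host limiter instead of a single sleep.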

5. Manual Overrides

To create manual overrides that won't be replaced:

1. Set priority >= 100 in overrides.json
2. Add a reason field documenting why the override is manual

Overrides marked this way are preserved during sync.