Add comprehensive technical documentation for the OCR system architecture, covering: - Layered architecture design - Provider system implementation - Data flow and type system - Configuration management - Error handling and security - Development guidelines The documentation was automatically generated based on code analysis and reflects the current implementation state.
8.8 KiB
Note
This technical documentation was automatically generated by Claude Code based on analysis of the current OCR implementation in the codebase. The content reflects the architecture as of the current branch state.
OCR Architecture
Overview
Cherry Studio's OCR (Optical Character Recognition) system is a modular, extensible architecture designed to support multiple OCR providers and file types. The architecture follows a layered approach with clear separation of concerns between data access, business logic, and provider implementations.
Architecture Layers
The OCR architecture follows a strict layered approach where each layer only communicates with adjacent layers:
1. API Layer
Location: src/main/data/api/handlers/
- IPC Bridge: Secure communication between main and renderer processes
- Type Safety: Zod schemas for request/response validation
- Error Handling: Centralized error propagation
- Entry Point: All OCR operations enter through this layer
2. OCR Service Layer
Location: src/main/services/ocr/
- OcrService: Main business logic orchestrator and central coordinator
- Provider Registry: Manages registered OCR providers
- Data Integration: Direct interaction with data layer for provider management
- Lifecycle Management: Handles provider initialization and disposal
- Validation: Ensures provider availability and data integrity
- Orchestration: Coordinates between providers and data services
3. Provider Services Layer
Location: src/main/services/ocr/builtin/
- Base Service: Abstract
OcrBaseServicedefines common interface - Data Independence: No direct database interactions, relies on injected data
- Built-in Providers:
TesseractService: Local Tesseract.js implementationSystemOcrService: Platform-specific system OCRPpocrService: PaddleOCR integrationOvOcrService: Intel OpenVINO (NPU) OCR
- Pure OCR Logic: Focus solely on OCR processing capabilities
4. Data Layer
Location: src/main/data/db/schemas/ocr/, src/main/data/repositories/
- Database Schema: Uses Drizzle ORM with SQLite database
- Repository Pattern:
OcrProviderRepositoryhandles all database operations - Provider Storage: Stores provider configurations in
ocr_providertable - JSON Configuration: Polymorphic
configfield stores provider-specific settings - Data Access: Exclusively accessed by OCR Service layer
5. Frontend Layer
Location: src/renderer/src/services/ocr/, src/renderer/src/hooks/ocr/
- Service Facade: Frontend OCR service for IPC communication
- React Hooks: Custom hooks for OCR operations and state management
- Configuration UI: Settings pages for provider configuration
Data Flow
graph TD
A[Frontend UI] --> B[OcrService Renderer]
B --> C[API Layer - IPC Handler]
C --> D[OCR Service Layer]
D --> E[Data Layer - Provider Repository]
D --> F[Provider Services Layer]
F --> G[OCR Processing]
G --> H[Result]
H --> F
F --> D
D --> C
C --> B
B --> A
style D fill:#e1f5fe
style F fill:#f3e5f5
style E fill:#e8f5e8
Provider System
Provider Registration
- Built-in Providers: Automatically registered on service initialization
- Custom Providers: Support for extensible provider system
- Configuration: Each provider has its own configuration schema
Provider Capabilities
interface OcrProviderCapabilityRecord {
image?: boolean // Image file OCR support
pdf?: boolean // PDF file OCR support (future)
}
Configuration Architecture
- Polymorphic Config: JSON-based configuration adapts to provider needs
- Type Safety: Zod schemas validate provider-specific configurations
- Runtime Validation: Configuration validation before OCR operations
Type System
Core Types
OcrProvider: Base provider interfaceOcrParams: OCR operation parametersOcrResult: Standardized OCR result formatSupportedOcrFile: File types supported for OCR
Business Types
OcrProviderBusiness: Domain-level provider representation- Operations: Create, Update, Replace, Delete operations
- Queries: List providers with filtering options
Provider-Specific Types
- TesseractConfig: Language selection, model paths
- SystemOcrConfig: Language preferences
- PaddleOCRConfig: API endpoints, authentication
- OpenVINOConfig: Device selection, model paths
Built-in Providers
Tesseract OCR
- Engine: Tesseract.js
- Languages: Multi-language support with automatic download
- Configuration: Language selection, cache management
- Performance: Worker pooling for concurrent processing
System OCR
- Windows: Windows Media Foundation OCR
- macOS: Vision framework OCR
- Linux: Platform-specific implementations
- Features: Native performance, system integration
PaddleOCR
- Deployment: Remote API integration
- Languages: Chinese, English, and mixed language support
- Configuration: API endpoints and authentication
Intel OpenVINO OCR
- Hardware: NPU acceleration support
- Performance: Optimized for Intel hardware
- Use Case: High-performance OCR scenarios
Configuration Management
Database Schema
CREATE TABLE ocr_provider (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
capabilities TEXT NOT NULL, -- JSON
config TEXT NOT NULL, -- JSON
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
Provider Defaults
- Initial Configuration: Defined in
packages/shared/config/ocr.ts - Migration System: Automatic provider initialization on startup
- User Customization: Runtime configuration updates
Error Handling
Error Categories
- Provider Errors: OCR engine failures, missing dependencies
- Configuration Errors: Invalid settings, missing parameters
- File Errors: Unsupported formats, corrupted files
- System Errors: Resource exhaustion, permissions
Error Propagation
- Logging: Centralized logging with context
- User Feedback: Translated error messages
- Recovery: Graceful fallback options
Performance Considerations
Resource Management
- Worker Disposal: Proper cleanup of OCR workers
- Memory Management: Limits on file sizes and concurrent operations
- Caching: Model and result caching where applicable
Optimization
- Lazy Loading: Providers initialized on demand
- Concurrent Processing: Multiple workers for parallel operations
- Hardware Acceleration: NPU and GPU support where available
Security
Input Validation
- File Type Checking: Strict validation of supported formats
- Size Limits: Protection against resource exhaustion
- Path Validation: Prevention of path traversal attacks
Configuration Security
- API Key Storage: Secure storage of sensitive configuration
- Validation: Runtime validation of configuration parameters
- Sandboxing: Isolated execution of OCR operations
Extension Points
Custom Providers
- Interface: Implement
OcrBaseServicefor new providers - Registration: Dynamic provider registration system
- Configuration: Extensible configuration schemas
File Type Support
- Handlers: Modular file type processors
- Capabilities: Declarative provider capabilities
- Future Support: PDF, document formats planned
Migration Strategy
Legacy System
- Data Migration: Automatic migration from old configuration formats
- Compatibility: Backward compatibility during transition
- Testing: Comprehensive test coverage for migration paths
Future Enhancements
- PDF Support: Planned extension to document OCR
- Cloud Providers: API-based OCR services integration
- AI Enhancement: Post-processing and accuracy improvements
Development Guidelines
Adding New Providers
- Create provider service extending
OcrBaseService - Define provider-specific configuration schema
- Register provider in
OcrService - Add configuration UI components
- Include comprehensive tests
Warning
Provider services should never directly access the data layer. All data operations must go through the OCR Service layer to maintain proper separation of concerns.
Configuration Changes
- Update provider configuration schema
- Add migration logic for existing configurations
- Update UI validation and error handling
- Test with various configuration scenarios
Warning
Always validate configuration changes before saving to the database. Use Zod schemas for runtime validation to prevent corrupted provider configurations.
Testing
- Unit Tests: Provider implementation testing
- Integration Tests: End-to-end OCR workflows
- Performance Tests: Resource usage and timing
- Error Scenarios: Comprehensive error handling testing