🍒 Cherry Studio is a desktop client that supports for multiple LLM providers.
Go to file
Phantom 0af5a85f67
feat: Image OCR (#9409)
* build: 添加 tesseract.js 及其类型定义依赖

* feat(ocr): 添加OCR类型定义文件以支持OCR功能扩展

* feat(ocr): 添加 Tesseract OCR 提供程序配置

* feat(ocr): 添加Tesseract.js的logo

* refactor(settings): 重构文档预处理设置模块结构

将PreprocessSettings重命名为DocProcessSettings并调整文件结构
更新相关路由和组件引用以保持功能一致性

* refactor(config): 重命名OCR_PROVIDER_CONFIG为BUILTIN_OCR_PROVIDERS以更准确描述用途

* refactor(ocr): 更改文件名

* refactor(ocr): 将获取OCR提供商logo的功能移动到utils目录

将getOcrProviderLogo函数从config/ocr.ts移动到utils/ocr.ts,保持功能集中

* refactor(ocr): 重构OCR配置结构以支持默认提供者

将内置OCR提供者数组重构为单独定义的常量,并添加默认OCR提供者映射。这提高了代码的可维护性并支持未来扩展。

* feat(store): 添加OCR状态管理切片

实现OCR提供商的增删改查功能,使用Redux Toolkit管理OCR相关状态

* feat(types): 添加图片文件类型守卫函数

添加 ImageFileMetadata 类型和 isImageFile 类型守卫函数,用于检查文件是否为图片类型

* feat(ocr): 添加对OCR支持文件类型的类型定义和校验函数

添加SupportedOcrFileType类型和isSupportedOcrFileType校验函数
添加SupportedOcrFile类型和isSupportedOcrFile校验函数

* feat(ocr): 添加OCR功能支持

实现基于Tesseract的OCR功能,包括文件类型检查、服务接口和IPC通信
新增OCR相关类型定义和服务实现

* refactor(OcrService): 更新日志上下文为'main:OcrService'

* feat(ocr): 添加OCR服务基础功能

实现OCR服务的基础功能,通过调用window.api.ocr接口处理支持的文件类型

* feat(store): 添加ocr模块到redux store

* feat(ocr): 添加OCR功能支持及文件类型校验

添加OCR功能钩子useOcr,支持图片文件识别
添加不支持文件类型的错误提示国际化文案

* refactor(ocr): 重命名updatePreprocessProvider为updateOcrProvider以保持命名一致性

* feat(ocr): 添加设置图片OCR提供商的功能

* refactor(ocr): 统一OCR类型导入路径

将所有OCR相关类型从'@renderer/types/ocr'改为从'@renderer/types'或'@types'导入
优化DEFAULT_OCR_PROVIDER类型定义

* feat(store): 更新持久化存储版本并添加OCR配置迁移

添加137版本迁移逻辑,初始化OCR提供者和默认图像提供者配置

* feat(ocr): 添加OCR服务设置界面及提供商选择功能

实现OCR服务设置界面,包含图片OCR提供商的选择功能
修复ocr.ts中imageProvider的类型定义
添加相关国际化文本

* fix(ocr): 添加图像大小检查并优化错误处理

检查图像文件大小是否超过50MB限制
使用buffer读取文件替代直接路径识别
简化错误处理逻辑,直接抛出原始错误

* feat(OCR服务): 支持base64字符串作为OCR输入

扩展tesseractOcr函数以接受base64字符串或图像文件作为输入

* build: 将 tesseract.js 从 devDependencies 移至 dependencies

确保生产环境能正确使用 tesseract.js 功能

* refactor(ocr): 将Tesseract服务文件移动到tesseract子目录并更新配置

* refactor(TesseractService): 添加日志记录并更新worker配置

添加loggerService用于记录worker日志,并更新createWorker配置以使用自定义logger

* feat(i18n): 添加OCR功能的多语言支持

* refactor(preload): 移动OCR类型定义到共享类型文件

将OCR相关的类型定义(OcrProvider, OcrResult, SupportedOcrFile)从渲染进程类型文件移动到共享类型文件@types,以提高代码复用性和维护性

* refactor(ocr): 修改tesseractOcr返回完整识别结果而非仅文本

返回完整识别结果以便后续处理使用更多OCR信息,同时简化imageOcr中的条件判断逻辑

* fix(ocr): 修复文件类型与OCR提供者能力不匹配时的错误抛出位置

将错误抛出语句移至else分支

* refactor(ocr): 简化 DEFAULT_OCR_PROVIDER 的类型定义

* fix(ocr): 改进OCR处理中的消息管理和错误处理

在useOcr钩子中统一管理OCR处理的消息提示,并完善错误处理逻辑
移除TranslatePage中重复的消息管理代码,简化OCR处理流程

* feat(i18n): 添加OCR相关的错误和状态翻译文本

* fix(useOcr): 修复未支持文件类型错误抛出位置

将不支持的OCR文件类型错误抛出逻辑移至条件判断内

* refactor(ocr): ocrImage实现使用OcrService并更新日志上下文

将ocrImage函数从useOcr钩子移动到OcrService中,提高代码复用性
更新日志服务上下文从'main'改为'renderer'以更准确反映模块位置

* style(TabContainer): 移除多余的空行并保持代码整洁

* refactor(ocr): 简化OCR文件类型检查逻辑

使用现有的isImageFile函数替代冗余的类型检查逻辑,提高代码复用性

* fix: 将迁移错误日志从136更新为137

* feat(ocr): enhance Tesseract service with language support and worker management

- Added support for multiple Tesseract languages: Chinese (Simplified and Traditional) and English.
- Refactored Tesseract worker management into a class for better encapsulation and reuse.
- Introduced methods to dynamically determine language path based on IP country and manage worker lifecycle.

* update cn url

* support cn data

* change to asyn

* use register design mode

* add type

* use bind function

* refactor(ipc): 简化OCR处理程序参数

* refactor(ocr): 修改ocrProviderCapabilityRecord类型定义

允许只定义部分能力

* refactor(ocr): 将Tesseract相关配置移至服务内部

将语言列表和下载URL常量从共享配置移至Tesseract服务内部
使用常量定义图片大小阈值以提高可读性

* refactor(ocr): 统一使用 SupportedOcrFile 类型替换 FileMetadata

更新 OCR 服务及其 Tesseract 实现,使用 SupportedOcrFile 类型替代原有的 FileMetadata 类型,以提高类型安全性和一致性。同时在 OcrService 中添加重复注册的警告日志。

* refactor(ocr): 重构OCR类型定义以支持模型和API配置

将OCR提供者配置拆分为独立类型,增加模型能力记录和API配置类型检查
添加OCR处理程序类型定义,为未来扩展提供更好的类型支持

* refactor(OcrService): 移除重复的OcrHandler类型定义

已在@types中定义OcrHandler类型,移除重复定义以提高代码一致性

* refactor(ocr): 将OcrService移动到ocr目录下并更新引用路径

* feat(ocr): 添加OCR API客户端工厂及示例实现

实现OCR API客户端工厂模式,支持根据不同提供商创建对应的客户端
新增OcrBaseApiClient作为基础类,提供通用功能
添加OcrExampleApiClient作为示例实现
修改OcrService以使用新的客户端工厂

* refactor(ocr): 添加日志记录以跟踪OCR文件处理

在OCR服务中添加日志记录功能,便于跟踪文件处理过程

* fix(deps): 更新 tesseract.js 依赖并添加补丁文件

修复 tesseract.js 类型定义问题并添加语言常量支持

* refactor(ocr): 移除注释掉的tesseract语言映射代码

使用Tesseract.js的LanguageCode类型替代硬编码的语言列表,提高类型安全性

* feat(ocr): 添加 Tesseract OCR 配置类型

* refactor(OCR设置): 重命名OcrImageProviderSettings为OcrImageSettings并优化代码结构

* refactor(ocr): 将 Tesseract 相关类型移动到文件底部以改善代码组织

* feat(ocr): 添加 Tesseract OCR 提供者类型检查函数

* feat(ocr): 添加更新OCR提供者配置的功能

* feat: 添加OCR提供者钩子函数

实现useOcrProvider钩子用于获取和更新OCR提供者配置

* refactor(ocr): 修改removeOcrProvider参数为字符串id

简化removeOcrProvider方法的参数类型,直接使用字符串id进行过滤,提高代码简洁性

* refactor(ocr): 将内置OCR提供者从数组改为映射结构

重构OCR配置模块,使用映射结构存储内置OCR提供者以便于扩展和维护

* refactor(ocr): 将BUILTIN_OCR_PROVIDERS改为只读数组

使用Object.freeze确保数组不可变,提高代码安全性

* feat(ocr): 添加OCR提供者管理功能并改进错误处理

添加useOcrProviders钩子用于管理OCR提供者的添加和删除
当内置OCR提供者不存在时自动恢复默认配置
改进错误提示信息并增加国际化支持

* Revert "refactor(ocr): 将BUILTIN_OCR_PROVIDERS改为只读数组"

This reverts commit f23e37941a.

* feat(ocr): 为Tesseract OCR添加多语言支持配置

添加对简体中文、繁体中文和英文的语言支持配置,扩展OCR功能以满足多语言识别需求

* refactor(types): 将Tesseract.LanguageCode重命名为TesseractLangCode以提高可读性

* feat(OCR设置): 添加OCR提供商设置组件及状态管理

新增OCR提供商设置组件,支持显示当前选择的OCR提供商信息
在OCR图片设置中添加状态管理,同步提供商选择到父组件
添加Tesseract OCR设置组件,支持多语言选择(暂不可用)

* fix(DocProcessSettings): 修复OCR语言选择默认值问题

* feat(i18n): 添加OCR提供商相关错误和警告的翻译

* fix(ocr): 将 Tesseract 语言配置类型改为部分

* fix(ocr): 修复ocrImage函数未使用await导致的问题

* fix(ocr): 修复迁移配置中ocr状态的初始化方式

将分散的属性赋值改为对象整体赋值,避免潜在的属性丢失问题

* chore: 移除不再使用的@types/tesseract.js依赖

* refactor(OCR设置): 添加错误边界处理并移除无用注释

在OCR设置组件中添加ErrorBoundary以处理潜在错误
移除OcrTesseractSettings中的TODO注释

* build: 添加 sharp 依赖以支持图片处理功能

* refactor(ocr): 添加OCR图像预处理功能并优化TesseractService

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* refactor(ocr): 移除独立的灰度处理模块并改进预处理流程

将灰度处理功能直接集成到OCR预处理中,不再需要单独的image模块
添加normalise和threshold处理以提升OCR识别效果

* improve image preprocess

---------

Co-authored-by: beyondkmp <beyondkmp@gmail.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2025-08-26 00:13:24 +08:00
.github chore(ci): refine pr ci steps (#9429) 2025-08-22 22:52:03 +08:00
.husky chore(pre-commit): add pre-commit hook to enforce code style (#3351) 2025-03-15 11:09:11 +08:00
.vscode chore(vscode): improve VSCode launch configurations for debugging (#9483) 2025-08-25 10:46:45 +08:00
.yarn feat: Image OCR (#9409) 2025-08-26 00:13:24 +08:00
build feat(installer): add architecture compatibility check to NSIS installer (#8587) 2025-07-28 21:41:05 +08:00
docs feat(translate): brand new translate feature (#8513) 2025-08-11 13:33:31 +08:00
packages feat: Image OCR (#9409) 2025-08-26 00:13:24 +08:00
resources feat: add code tools (#9043) 2025-08-12 11:54:38 +08:00
scripts feat(translate): brand new translate feature (#8513) 2025-08-11 13:33:31 +08:00
src feat: Image OCR (#9409) 2025-08-26 00:13:24 +08:00
tests refactor: Unified Logger / 统一日志管理 (#8207) 2025-07-18 09:40:56 +08:00
.editorconfig style: set eol to lf, code formatting (#7923) 2025-07-08 09:50:33 +08:00
.env.example fix: support gpt-5 (#8945) 2025-08-10 14:27:26 +08:00
.git-blame-ignore-revs chore: git blame ignore (#7925) 2025-07-08 14:23:55 +08:00
.gitattributes style: set eol to lf, code formatting (#7923) 2025-07-08 09:50:33 +08:00
.gitignore chore: add CLAUDE.local.md to .gitignore 2025-08-06 19:43:46 +08:00
.npmrc opt: optimise local dev with fixed yarn (#3456) 2025-03-19 13:18:11 +08:00
.prettierignore feat: nutstore integration (#3461) 2025-03-25 11:40:11 +08:00
.prettierrc Revert "feat(cherry-store): add cherry store (#8683)" 2025-08-06 14:29:55 +08:00
.yarnrc.yml chore: upgrade yarn version to v4.9.1 2025-05-22 15:48:20 +08:00
CLAUDE.md feat: add OpenAI o3 model support with enhanced tool calling (#8253) 2025-08-04 23:19:21 +08:00
CODE_OF_CONDUCT.md docs: update documentation for a more inclusive environment and added japanese and chinese documentation 2024-10-25 00:09:01 +08:00
CONTRIBUTING.md docs: add testplan md (#7854) 2025-07-05 17:19:25 +08:00
dev-app-update.yml feat: add after-build script for renaming files and updating latest.yml 2025-04-14 17:14:45 +08:00
electron-builder.yml chore: release v1.5.7-rc.1 2025-08-19 17:38:24 +08:00
electron.vite.config.ts Revert "feat(cherry-store): add cherry store (#8683)" 2025-08-06 14:29:55 +08:00
eslint.config.mjs Revert "feat(cherry-store): add cherry store (#8683)" 2025-08-06 14:29:55 +08:00
LICENSE Update LICENSE (#4744) 2025-04-13 08:00:41 +08:00
package.json feat: Image OCR (#9409) 2025-08-26 00:13:24 +08:00
playwright.config.ts test: more unit tests (#5130) 2025-05-26 16:50:26 +08:00
README.md refactor: match provider and model using a consistent method (#7933) 2025-07-23 10:45:09 +08:00
SECURITY.md refactor: model list and health check (#7997) 2025-07-21 15:57:08 +08:00
tsconfig.json Revert "feat(cherry-store): add cherry store (#8683)" 2025-08-06 14:29:55 +08:00
tsconfig.node.json Revert "feat(cherry-store): add cherry store (#8683)" 2025-08-06 14:29:55 +08:00
tsconfig.web.json chore(tsconfig): adjust the path order (#8769) 2025-08-02 00:05:15 +08:00
vitest.config.ts Revert "feat(cherry-store): add cherry store (#8683)" 2025-08-06 14:29:55 +08:00
yarn.lock feat: Image OCR (#9409) 2025-08-26 00:13:24 +08:00

banner

English | 中文 | Official Site | Documents | Development | Feedback

Featured|HelloGitHub kangfenmao%2Fcherry-studio | Trendshift Cherry Studio - AI Chatbots, AI Desktop Client | Product Hunt

🍒 Cherry Studio

Cherry Studio is a desktop client that supports multiple LLM providers, available on Windows, Mac and Linux.

👏 Join Telegram GroupDiscord | QQ Group(575014769)

❤️ Like Cherry Studio? Give it a star 🌟 or Sponsor to support the development!

🌠 Screenshot

🌟 Key Features

  1. Diverse LLM Provider Support:
  • ☁️ Major LLM Cloud Services: OpenAI, Gemini, Anthropic, and more
  • 🔗 AI Web Service Integration: Claude, Peplexity, Poe, and others
  • 💻 Local Model Support with Ollama, LM Studio
  1. AI Assistants & Conversations:
  • 📚 300+ Pre-configured AI Assistants
  • 🤖 Custom Assistant Creation
  • 💬 Multi-model Simultaneous Conversations
  1. Document & Data Processing:
  • 📄 Supports Text, Images, Office, PDF, and more
  • ☁️ WebDAV File Management and Backup
  • 📊 Mermaid Chart Visualization
  • 💻 Code Syntax Highlighting
  1. Practical Tools Integration:
  • 🔍 Global Search Functionality
  • 📝 Topic Management System
  • 🔤 AI-powered Translation
  • 🎯 Drag-and-drop Sorting
  • 🔌 Mini Program Support
  • ⚙️ MCP(Model Context Protocol) Server
  1. Enhanced User Experience:
  • 🖥️ Cross-platform Support for Windows, Mac, and Linux
  • 📦 Ready to Use - No Environment Setup Required
  • 🎨 Light/Dark Themes and Transparent Window
  • 📝 Complete Markdown Rendering
  • 🤲 Easy Content Sharing

📝 Roadmap

We're actively working on the following features and improvements:

  1. 🎯 Core Features
  • Selection Assistant with smart content selection enhancement
  • Deep Research with advanced research capabilities
  • Memory System with global context awareness
  • Document Preprocessing with improved document handling
  • MCP Marketplace for Model Context Protocol ecosystem
  1. 🗂 Knowledge Management
  • Notes and Collections
  • Dynamic Canvas visualization
  • OCR capabilities
  • TTS (Text-to-Speech) support
  1. 📱 Platform Support
  • HarmonyOS Edition (PC)
  • Android App (Phase 1)
  • iOS App (Phase 1)
  • Multi-Window support
  • Window Pinning functionality
  1. 🔌 Advanced Features
  • Plugin System
  • ASR (Automatic Speech Recognition)
  • Assistant and Topic Interaction Refactoring

Track our progress and contribute on our project board.

Want to influence our roadmap? Join our GitHub Discussions to share your ideas and feedback!

🌈 Theme

Welcome PR for more themes

🤝 Contributing

We welcome contributions to Cherry Studio! Here are some ways you can contribute:

  1. Contribute Code: Develop new features or optimize existing code.
  2. Fix Bugs: Submit fixes for any bugs you find.
  3. Maintain Issues: Help manage GitHub issues.
  4. Product Design: Participate in design discussions.
  5. Write Documentation: Improve user manuals and guides.
  6. Community Engagement: Join discussions and help users.
  7. Promote Usage: Spread the word about Cherry Studio.

Refer to the Branching Strategy for contribution guidelines

Getting Started

  1. Fork the Repository: Fork and clone it to your local machine.
  2. Create a Branch: For your changes.
  3. Submit Changes: Commit and push your changes.
  4. Open a Pull Request: Describe your changes and reasons.

For more detailed guidelines, please refer to our Contributing Guide.

Thank you for your support and contributions!

🔧 Developer Co-creation Program

We are launching the Cherry Studio Developer Co-creation Program to foster a healthy and positive-feedback loop within the open-source ecosystem. We believe that great software is built collaboratively, and every merged pull request breathes new life into the project.

We sincerely invite you to join our ranks of contributors and shape the future of Cherry Studio with us.

Contributor Rewards Program

To give back to our core contributors and create a virtuous cycle, we have established the following long-term incentive plan.

The inaugural tracking period for this program will be Q3 2025 (July, August, September). Rewards for this cycle will be distributed on October 1st.

Within any tracking period (e.g., July 1st to September 30th for the first cycle), any developer who contributes more than 30 meaningful commits to any of Cherry Studio's open-source projects on GitHub will be eligible for the following benefits:

  • Cursor Subscription Sponsorship: Receive a $70 USD credit or reimbursement for your Cursor subscription, making AI your most efficient coding partner.
  • Unlimited Model Access: Get unlimited API calls for the DeepSeek and Qwen models.
  • Cutting-Edge Tech Access: Enjoy occasional perks, including API access to models like Claude, Gemini, and OpenAI, keeping you at the forefront of technology.

Growing Together & Future Plans

A vibrant community is the driving force behind any sustainable open-source project. As Cherry Studio grows, so will our rewards program. We are committed to continuously aligning our benefits with the best-in-class tools and resources in the industry. This ensures our core contributors receive meaningful support, creating a positive cycle where developers, the community, and the project grow together.

Moving forward, the project will also embrace an increasingly open stance to give back to the entire open-source community.

How to Get Started?

We look forward to your first Pull Request!

You can start by exploring our repositories, picking up a good first issue, or proposing your own enhancements. Every commit is a testament to the spirit of open source.

Thank you for your interest and contributions.

Let's build together.

🏢 Enterprise Edition

Building on the Community Edition, we are proud to introduce Cherry Studio Enterprise Edition—a privately-deployable AI productivity and management platform designed for modern teams and enterprises.

The Enterprise Edition addresses core challenges in team collaboration by centralizing the management of AI resources, knowledge, and data. It empowers organizations to enhance efficiency, foster innovation, and ensure compliance, all while maintaining 100% control over their data in a secure environment.

Core Advantages

  • Unified Model Management: Centrally integrate and manage various cloud-based LLMs (e.g., OpenAI, Anthropic, Google Gemini) and locally deployed private models. Employees can use them out-of-the-box without individual configuration.
  • Enterprise-Grade Knowledge Base: Build, manage, and share team-wide knowledge bases. Ensures knowledge retention and consistency, enabling team members to interact with AI based on unified and accurate information.
  • Fine-Grained Access Control: Easily manage employee accounts and assign role-based permissions for different models, knowledge bases, and features through a unified admin backend.
  • Fully Private Deployment: Deploy the entire backend service on your on-premises servers or private cloud, ensuring your data remains 100% private and under your control to meet the strictest security and compliance standards.
  • Reliable Backend Services: Provides stable API services and enterprise-grade data backup and recovery mechanisms to ensure business continuity.

Online Demo

🚧 Public Beta Notice

The Enterprise Edition is currently in its early public beta stage, and we are actively iterating and optimizing its features. We are aware that it may not be perfectly stable yet. If you encounter any issues or have valuable suggestions during your trial, we would be very grateful if you could contact us via email to provide feedback.

🔗 Cherry Studio Enterprise

Version Comparison

Feature Community Edition Enterprise Edition
Open Source Yes Partially released to customers
Cost Free for Personal Use / Commercial License Buyout / Subscription Fee
Admin Backend ● Centralized Model Access
Employee Management
● Shared Knowledge Base
Access Control
Data Backup
Server Dedicated Private Deployment

Get the Enterprise Edition

We believe the Enterprise Edition will become your team's AI productivity engine. If you are interested in Cherry Studio Enterprise Edition and would like to learn more, request a quote, or schedule a demo, please feel free to contact us.

🔗 Related Projects

  • one-api: LLM API management and distribution system supporting mainstream models like OpenAI, Azure, and Anthropic. Features a unified API interface, suitable for key management and secondary distribution.

  • ublacklist: Blocks specific sites from appearing in Google search results

🚀 Contributors



📊 GitHub Stats

Stats

Star History

Star History Chart