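From 0af5a85f673977aa64db67e7c42491f6e2f9666f Mon Sep 17 00:00:00 2001
From: Phantom <59059173+EurFelux@users.noreply.github.com>
Date: Tue, 26 Aug 2025 00:13:24 +0800
Subject: [PATCH] feat: Image OCR (#9409)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* build: add tesseract.js and its type definitions as dependencies
* feat(ocr): add an OCR type definition file to support extending OCR features
* feat(ocr): add the Tesseract OCR provider configuration
* feat(ocr): add the Tesseract.js logo
* refactor(settings): restructure the document preprocessing settings module
  Rename PreprocessSettings to DocProcessSettings and adjust the file structure
  Update related routes and component references to keep behavior consistent
* refactor(config): rename OCR_PROVIDER_CONFIG to BUILTIN_OCR_PROVIDERS to describe its purpose more accurately
* refactor(ocr): rename files
* refactor(ocr): move the OCR provider logo lookup into the utils directory
  Move the getOcrProviderLogo function from config/ocr.ts to utils/ocr.ts to keep related functionality together
* refactor(ocr): restructure the OCR configuration to support default providers
  Refactor the built-in OCR provider array into individually defined constants and add a default OCR provider mapping. This improves maintainability and supports future extension.
* feat(store): add an OCR state slice
  Implement create, read, update, and delete for OCR providers, managing OCR state with Redux Toolkit
* feat(types): add an image file type guard
  Add the ImageFileMetadata type and the isImageFile type guard to check whether a file is an image
* feat(ocr): add type definitions and validation functions for OCR-supported file types
  Add the SupportedOcrFileType type and the isSupportedOcrFileType validation function
  Add the SupportedOcrFile type and the isSupportedOcrFile validation function
* feat(ocr): add OCR support
  Implement Tesseract-based OCR, including file type checks, the service interface, and IPC communication
  Add OCR-related type definitions and the service implementation
* refactor(OcrService): update the logger context to 'main:OcrService'
* feat(ocr): add basic OCR service functionality
  Implement the basic OCR service, which handles supported file types by calling the window.api.ocr interface
* feat(store): add the ocr module to the redux store
* feat(ocr): add OCR support and file type validation
  Add the useOcr hook for recognizing image files
  Add i18n strings for the unsupported-file-type error
* refactor(ocr): rename updatePreprocessProvider to updateOcrProvider for naming consistency
* feat(ocr): add the ability to set the image OCR provider
* refactor(ocr): unify OCR type import paths
  Import all OCR-related types from '@renderer/types' or '@types' instead of '@renderer/types/ocr'
  Improve the DEFAULT_OCR_PROVIDER type definition
* feat(store): bump the persisted store version and add an OCR config migration
  Add migration logic for version 137 that initializes the OCR providers and the default image provider config
* feat(ocr): add the OCR service settings UI and provider selection
  Implement the OCR service settings UI, including selection of the image OCR provider
  Fix the type definition of imageProvider in ocr.ts
  Add related i18n strings
* fix(ocr): add an image size check and improve error handling
  Check whether the image file exceeds the 50MB limit
  Read the file into a buffer instead of recognizing directly from the path
  Simplify error handling by rethrowing the original error
* feat(OCR service): accept base64 strings as OCR input
  Extend the tesseractOcr function to accept either a base64 string or an image file as input
* build: move tesseract.js from devDependencies to dependencies
  Ensure tesseract.js works correctly in production
* refactor(ocr): move the Tesseract service file into a tesseract subdirectory and update the config
* refactor(TesseractService): add logging and update the worker config
  Add loggerService to record worker logs, and update the createWorker config to use a custom logger
* feat(i18n): add multilingual support for the OCR feature
* refactor(preload): move OCR type definitions into the shared types file
  Move the OCR type definitions (OcrProvider, OcrResult, SupportedOcrFile) from the renderer types file to the shared @types file to improve reuse and maintainability
* refactor(ocr): have tesseractOcr return the full recognition result instead of only the text
  Return the full result so later processing can use more OCR information, and simplify the conditional logic in imageOcr
* fix(ocr): fix where the error is thrown when the file type does not match the OCR provider's capabilities
  Move the throw statement into the else branch
* refactor(ocr): simplify the DEFAULT_OCR_PROVIDER type definition
* fix(ocr): improve message management and error handling during OCR processing
  Manage OCR progress messages centrally in the useOcr hook and improve error handling
  Remove duplicated message management code from TranslatePage and simplify the OCR flow
* feat(i18n): add translations for OCR-related errors and status messages
* fix(useOcr): fix where the unsupported-file-type error is thrown
  Move the unsupported OCR file type error into the conditional check
* refactor(ocr): implement ocrImage with OcrService and update the logger context
  Move the ocrImage function from the useOcr hook into OcrService for better reuse
  Change the logger context from 'main' to 'renderer' to reflect the module's location more accurately
* style(TabContainer): remove extra blank lines to keep the code tidy
* refactor(ocr): simplify OCR file type checking
  Use the existing isImageFile function instead of redundant type-checking logic to improve reuse
* fix: update the migration error log from 136 to 137
* feat(ocr): enhance Tesseract service with language support and worker management
  - Added support for multiple Tesseract languages: Chinese (Simplified and Traditional) and English.
  - Refactored Tesseract worker management into a class for better encapsulation and reuse.
  - Introduced methods to dynamically determine language path based on IP country and manage worker lifecycle.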
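The worker management described in the last entry above (one lazily created Tesseract worker, a language-data mirror chosen from the IP country, and an explicit dispose) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patch's code: `LazyWorker`, `pickLangPath`, `LANG_PATHS`, and the injected `createFn` (standing in for tesseract.js `createWorker`) are hypothetical names; only the two mirror URLs come from this patch. One deliberate difference: `TesseractService.getWorker()` in this patch assigns the worker only after `await createWorker(...)` resolves, so two calls made before the first worker is ready can each create a worker. The sketch caches the pending promise instead, so concurrent callers share one.

```typescript
// Hypothetical sketch: share one lazily created worker between concurrent callers.
// The mirror URLs are the ones TesseractService uses in this patch.
const LANG_PATHS = {
  CN: 'https://gitcode.com/beyondkmp/tessdata/releases/download/4.1.0/',
  GLOBAL: 'https://github.com/tesseract-ocr/tessdata/raw/main/'
} as const

// Choose the traineddata mirror from a country code such as 'CN' or 'us'.
function pickLangPath(country: string): string {
  return country.toLowerCase() === 'cn' ? LANG_PATHS.CN : LANG_PATHS.GLOBAL
}

// Anything with an async terminate(), e.g. a tesseract.js Worker (assumption).
interface Terminable {
  terminate(): Promise<unknown>
}

class LazyWorker<W extends Terminable> {
  // Cache the in-flight promise, not the resolved worker, so get() calls made
  // before creation finishes all receive the same worker.
  private pending: Promise<W> | null = null
  private readonly createFn: () => Promise<W>

  constructor(createFn: () => Promise<W>) {
    this.createFn = createFn
  }

  get(): Promise<W> {
    if (!this.pending) {
      const attempt: Promise<W> = this.createFn().catch((err: unknown) => {
        // Forget a failed attempt so the next get() can retry.
        if (this.pending === attempt) this.pending = null
        throw err
      })
      this.pending = attempt
    }
    return this.pending
  }

  async dispose(): Promise<void> {
    const pending = this.pending
    this.pending = null
    // A worker whose creation failed has nothing to terminate.
    const worker = await pending?.catch(() => null)
    if (worker) await worker.terminate()
  }
}
```

Clearing a rejected attempt also means a transient failure, such as a language-data download error, is not cached for the lifetime of the process.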
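* update cn url
* support cn data
* change to async
* use the register design pattern
* add type
* use bind function
* refactor(ipc): simplify OCR handler parameters
* refactor(ocr): change the ocrProviderCapabilityRecord type definition
  Allow defining only a subset of capabilities
* refactor(ocr): move Tesseract-related config into the service
  Move the language list and download URL constants from the shared config into the Tesseract service
  Use a constant for the image size threshold to improve readability
* refactor(ocr): use the SupportedOcrFile type instead of FileMetadata throughout
  Update the OCR service and its Tesseract implementation to use SupportedOcrFile instead of FileMetadata for better type safety and consistency. Also add a warning log in OcrService when a provider is registered twice.
* refactor(ocr): restructure OCR type definitions to support models and API configuration
  Split the OCR provider configuration into separate types, and add model capability records and API configuration type checks
  Add the OCR handler type definition for better type support in future extensions
* refactor(OcrService): remove the duplicate OcrHandler type definition
  OcrHandler is already defined in @types; remove the duplicate for consistency
* refactor(ocr): move OcrService into the ocr directory and update import paths
* feat(ocr): add an OCR API client factory and an example implementation
  Implement an OCR API client factory that creates the matching client for each provider
  Add OcrBaseApiClient as a base class providing shared functionality
  Add OcrExampleApiClient as an example implementation
  Change OcrService to use the new client factory
* refactor(ocr): add logging to track OCR file processing
  Add logging to the OCR service to make file processing easier to trace
* fix(deps): update the tesseract.js dependency and add a patch file
  Fix tesseract.js type definition issues and add language constant support
* refactor(ocr): remove commented-out tesseract language mapping code
  Use the Tesseract.js LanguageCode type instead of a hard-coded language list for better type safety
* feat(ocr): add the Tesseract OCR config type
* refactor(OCR settings): rename OcrImageProviderSettings to OcrImageSettings and tidy the code structure
* refactor(ocr): move Tesseract-related types to the bottom of the file for better organization
* feat(ocr): add a type-check function for the Tesseract OCR provider
* feat(ocr): add the ability to update OCR provider configuration
* feat: add OCR provider hooks
  Implement the useOcrProvider hook for reading and updating OCR provider configuration
* refactor(ocr): change the removeOcrProvider parameter to a string id
  Simplify the removeOcrProvider parameter type by filtering directly on a string id
* refactor(ocr): change built-in OCR providers from an array to a map
  Restructure the OCR config module to store built-in OCR providers in a map for easier extension and maintenance
* refactor(ocr): make BUILTIN_OCR_PROVIDERS a read-only array
  Use Object.freeze to make the array immutable for better safety
* feat(ocr): add OCR provider management and improve error handling
  Add the useOcrProviders hook for adding and removing OCR providers
  Automatically restore the default config when a built-in OCR provider is missing
  Improve error messages and add i18n support
* Revert "refactor(ocr): make BUILTIN_OCR_PROVIDERS a read-only array"
  This reverts commit f23e37941abba4fcc703b31e955b67bff565c432.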
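Two of the provider-management behaviors listed above, removing a provider by its string id and restoring a built-in OCR provider that is missing from persisted state, can be sketched as pure functions. This is an illustrative sketch only: `ProviderLike`, `removeProvider`, and `ensureBuiltinProviders` are hypothetical names, and the real `useOcrProviders` hook and Redux slice are not shown in this excerpt and may work differently.

```typescript
// Hypothetical minimal provider shape; the real OcrProvider type has more fields.
interface ProviderLike {
  id: string
  name: string
}

// Remove a provider by its string id, returning a new array.
function removeProvider<P extends ProviderLike>(providers: readonly P[], id: string): P[] {
  return providers.filter((p) => p.id !== id)
}

// Append the default entry of every built-in provider missing from the stored list.
// Stored entries (including user edits to built-ins) are kept unchanged and in order.
function ensureBuiltinProviders<P extends ProviderLike>(stored: readonly P[], builtins: readonly P[]): P[] {
  const present = new Set(stored.map((p) => p.id))
  const missing = builtins.filter((b) => !present.has(b.id))
  return [...stored, ...missing]
}
```

Because the restore step is idempotent, it can run on every load (or inside a migration) without duplicating entries.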
* feat(ocr): add multilingual configuration for Tesseract OCR
  Add language support configuration for Simplified Chinese, Traditional Chinese, and English, extending OCR to multilingual recognition
* refactor(types): rename Tesseract.LanguageCode to TesseractLangCode for readability
* feat(OCR settings): add the OCR provider settings component and state management
  Add an OCR provider settings component that shows information about the currently selected OCR provider
  Add state management to the OCR image settings to sync the provider selection to the parent component
  Add a Tesseract OCR settings component with multi-language selection (not yet available)
* fix(DocProcessSettings): fix the default value of the OCR language selection
* feat(i18n): add translations for OCR provider errors and warnings
* fix(ocr): make the Tesseract language config type partial
* fix(ocr): fix an issue caused by ocrImage not being awaited
* fix(ocr): fix how the ocr state is initialized in the migration
  Assign the whole object instead of individual properties to avoid potentially losing properties
* chore: remove the no-longer-used @types/tesseract.js dependency
* refactor(OCR settings): add error boundary handling and remove useless comments
  Add an ErrorBoundary to the OCR settings components to handle potential errors
  Remove the TODO comment from OcrTesseractSettings
* build: add the sharp dependency for image processing
* refactor(ocr): add OCR image preprocessing and optimize TesseractService
  Co-authored-by: Qwen-Coder
* refactor(ocr): remove the standalone grayscale module and improve the preprocessing pipeline
  Integrate grayscale conversion directly into OCR preprocessing, removing the need for a separate image module
  Add normalise and threshold steps to improve OCR recognition
* improve image preprocess

---------

Co-authored-by: beyondkmp
Co-authored-by: Qwen-Coder
---
 .../tesseract.js-npm-6.0.1-2562a7e46d.patch | 348 +++++++++++++++
 package.json | 5 +-
 packages/shared/IpcChannel.ts | 5 +-
 src/main/ipc.ts | 4 +
 src/main/services/ocr/OcrService.ts | 34 ++
 .../ocr/tesseract/TesseractService.ts | 82 ++++
 src/main/utils/ocr.ts | 29 ++
 src/preload/index.ts | 7 +
 .../assets/images/providers/Tesseract.js.png | Bin 0 -> 23940 bytes
 src/renderer/src/config/ocr.ts | 32 ++
 src/renderer/src/config/ocrProviders.ts | 12 -
 src/renderer/src/hooks/useOcr.ts | 54 +++
 src/renderer/src/hooks/useOcrProvider.ts | 84 ++++
 src/renderer/src/i18n/locales/en-us.json | 34 ++
 src/renderer/src/i18n/locales/ja-jp.json | 34 ++
 src/renderer/src/i18n/locales/ru-ru.json | 34 ++
 src/renderer/src/i18n/locales/zh-cn.json | 34 ++
 src/renderer/src/i18n/locales/zh-tw.json | 34 ++
 src/renderer/src/i18n/translate/el-gr.json | 34 ++
 src/renderer/src/i18n/translate/es-es.json | 34 ++
 src/renderer/src/i18n/translate/fr-fr.json | 34 ++
 src/renderer/src/i18n/translate/pt-pt.json | 34 ++
 .../DocProcessSettings/OcrImageSettings.tsx | 62 +++
 .../OcrProviderSettings.tsx | 52
+++ .../DocProcessSettings/OcrSettings.tsx | 42 ++ .../OcrTesseractSettings.tsx | 51 +++ .../PreprocessProviderSettings.tsx} | 0 .../PreprocessSettings.tsx} | 8 +- .../settings/DocProcessSettings/index.tsx | 18 + .../src/pages/settings/SettingsPage.tsx | 8 +- src/renderer/src/services/ocr/OcrService.ts | 23 + .../ocr/clients/OcrApiClientFactory.ts | 28 ++ .../services/ocr/clients/OcrBaseApiClient.ts | 43 ++ .../ocr/clients/OcrExampleApiClient.ts | 15 + src/renderer/src/store/index.ts | 6 +- src/renderer/src/store/migrate.ts | 13 + src/renderer/src/store/ocr.ts | 61 +++ src/renderer/src/types/file.ts | 13 + src/renderer/src/types/index.ts | 2 + src/renderer/src/types/ocr.ts | 142 +++++++ src/renderer/src/utils/ocr.ts | 12 + yarn.lock | 398 +++++++++++++++++- 42 files changed, 1972 insertions(+), 27 deletions(-) create mode 100644 .yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch create mode 100644 src/main/services/ocr/OcrService.ts create mode 100644 src/main/services/ocr/tesseract/TesseractService.ts create mode 100644 src/main/utils/ocr.ts create mode 100644 src/renderer/src/assets/images/providers/Tesseract.js.png create mode 100644 src/renderer/src/config/ocr.ts delete mode 100644 src/renderer/src/config/ocrProviders.ts create mode 100644 src/renderer/src/hooks/useOcr.ts create mode 100644 src/renderer/src/hooks/useOcrProvider.ts create mode 100644 src/renderer/src/pages/settings/DocProcessSettings/OcrImageSettings.tsx create mode 100644 src/renderer/src/pages/settings/DocProcessSettings/OcrProviderSettings.tsx create mode 100644 src/renderer/src/pages/settings/DocProcessSettings/OcrSettings.tsx create mode 100644 src/renderer/src/pages/settings/DocProcessSettings/OcrTesseractSettings.tsx rename src/renderer/src/pages/settings/{PreprocessSettings/PreprocessSettings.tsx => DocProcessSettings/PreprocessProviderSettings.tsx} (100%) rename src/renderer/src/pages/settings/{PreprocessSettings/index.tsx => DocProcessSettings/PreprocessSettings.tsx} (90%) create 
mode 100644 src/renderer/src/pages/settings/DocProcessSettings/index.tsx create mode 100644 src/renderer/src/services/ocr/OcrService.ts create mode 100644 src/renderer/src/services/ocr/clients/OcrApiClientFactory.ts create mode 100644 src/renderer/src/services/ocr/clients/OcrBaseApiClient.ts create mode 100644 src/renderer/src/services/ocr/clients/OcrExampleApiClient.ts create mode 100644 src/renderer/src/store/ocr.ts create mode 100644 src/renderer/src/types/ocr.ts create mode 100644 src/renderer/src/utils/ocr.ts diff --git a/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch b/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch new file mode 100644 index 0000000000..0cb156ee99 --- /dev/null +++ b/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch @@ -0,0 +1,348 @@ +diff --git a/src/constants/languages.d.ts b/src/constants/languages.d.ts +new file mode 100644 +index 0000000000000000000000000000000000000000..6a2ba5086187622b8ca8887bcc7406018fba8a89 +--- /dev/null ++++ b/src/constants/languages.d.ts +@@ -0,0 +1,43 @@ ++/** ++ * Languages with existing tesseract traineddata ++ * https://tesseract-ocr.github.io/tessdoc/Data-Files#data-files-for-version-400-november-29-2016 ++ */ ++ ++// Define the language codes as string literals ++type LanguageCode = ++ | 'afr' | 'amh' | 'ara' | 'asm' | 'aze' | 'aze_cyrl' | 'bel' | 'ben' | 'bod' | 'bos' ++ | 'bul' | 'cat' | 'ceb' | 'ces' | 'chi_sim' | 'chi_tra' | 'chr' | 'cym' | 'dan' | 'deu' ++ | 'dzo' | 'ell' | 'eng' | 'enm' | 'epo' | 'est' | 'eus' | 'fas' | 'fin' | 'fra' ++ | 'frk' | 'frm' | 'gle' | 'glg' | 'grc' | 'guj' | 'hat' | 'heb' | 'hin' | 'hrv' ++ | 'hun' | 'iku' | 'ind' | 'isl' | 'ita' | 'ita_old' | 'jav' | 'jpn' | 'kan' | 'kat' ++ | 'kat_old' | 'kaz' | 'khm' | 'kir' | 'kor' | 'kur' | 'lao' | 'lat' | 'lav' | 'lit' ++ | 'mal' | 'mar' | 'mkd' | 'mlt' | 'msa' | 'mya' | 'nep' | 'nld' | 'nor' | 'ori' ++ | 'pan' | 'pol' | 'por' | 'pus' | 'ron' | 'rus' | 'san' | 'sin' | 'slk' | 'slv' ++ | 'spa' | 'spa_old' | 'sqi' | 
'srp' | 'srp_latn' | 'swa' | 'swe' | 'syr' | 'tam' | 'tel' ++ | 'tgk' | 'tgl' | 'tha' | 'tir' | 'tur' | 'uig' | 'ukr' | 'urd' | 'uzb' | 'uzb_cyrl' ++ | 'vie' | 'yid'; ++ ++// Define the language keys as string literals ++type LanguageKey = ++ | 'AFR' | 'AMH' | 'ARA' | 'ASM' | 'AZE' | 'AZE_CYRL' | 'BEL' | 'BEN' | 'BOD' | 'BOS' ++ | 'BUL' | 'CAT' | 'CEB' | 'CES' | 'CHI_SIM' | 'CHI_TRA' | 'CHR' | 'CYM' | 'DAN' | 'DEU' ++ | 'DZO' | 'ELL' | 'ENG' | 'ENM' | 'EPO' | 'EST' | 'EUS' | 'FAS' | 'FIN' | 'FRA' ++ | 'FRK' | 'FRM' | 'GLE' | 'GLG' | 'GRC' | 'GUJ' | 'HAT' | 'HEB' | 'HIN' | 'HRV' ++ | 'HUN' | 'IKU' | 'IND' | 'ISL' | 'ITA' | 'ITA_OLD' | 'JAV' | 'JPN' | 'KAN' | 'KAT' ++ | 'KAT_OLD' | 'KAZ' | 'KHM' | 'KIR' | 'KOR' | 'KUR' | 'LAO' | 'LAT' | 'LAV' | 'LIT' ++ | 'MAL' | 'MAR' | 'MKD' | 'MLT' | 'MSA' | 'MYA' | 'NEP' | 'NLD' | 'NOR' | 'ORI' ++ | 'PAN' | 'POL' | 'POR' | 'PUS' | 'RON' | 'RUS' | 'SAN' | 'SIN' | 'SLK' | 'SLV' ++ | 'SPA' | 'SPA_OLD' | 'SQI' | 'SRP' | 'SRP_LATN' | 'SWA' | 'SWE' | 'SYR' | 'TAM' | 'TEL' ++ | 'TGK' | 'TGL' | 'THA' | 'TIR' | 'TUR' | 'UIG' | 'UKR' | 'URD' | 'UZB' | 'UZB_CYRL' ++ | 'VIE' | 'YID'; ++ ++// Create a mapped type to ensure each key maps to its specific value ++type LanguagesMap = { ++ [K in LanguageKey]: LanguageCode; ++}; ++ ++// Declare the exported constant with the specific type ++export const LANGUAGES: LanguagesMap; ++ ++// Export the individual types for use in other modules ++export type { LanguageCode, LanguageKey, LanguagesMap }; +\ No newline at end of file +diff --git a/src/index.d.ts b/src/index.d.ts +index 1f5a9c8094fe4de7983467f9efb43bdb4de535f2..16dc95cf68663673e37e189b719cb74897b7735f 100644 +--- a/src/index.d.ts ++++ b/src/index.d.ts +@@ -1,31 +1,74 @@ ++// Import the languages types ++import { LanguagesMap } from "./constants/languages"; ++ ++/// ++ + declare namespace Tesseract { +- function createScheduler(): Scheduler +- function createWorker(langs?: string | string[] | Lang[], oem?: OEM, options?: Partial, config?: 
string | Partial): Promise +- function setLogging(logging: boolean): void +- function recognize(image: ImageLike, langs?: string, options?: Partial): Promise +- function detect(image: ImageLike, options?: Partial): any ++ function createScheduler(): Scheduler; ++ function createWorker( ++ langs?: LanguageCode | LanguageCode[] | Lang[], ++ oem?: OEM, ++ options?: Partial, ++ config?: string | Partial ++ ): Promise; ++ function setLogging(logging: boolean): void; ++ function recognize( ++ image: ImageLike, ++ langs?: LanguageCode, ++ options?: Partial ++ ): Promise; ++ function detect(image: ImageLike, options?: Partial): any; ++ ++ // Export languages constant ++ const languages: LanguagesMap; ++ ++ type LanguageCode = import("./constants/languages").LanguageCode; ++ type LanguageKey = import("./constants/languages").LanguageKey; + + interface Scheduler { +- addWorker(worker: Worker): string +- addJob(action: 'recognize', ...args: Parameters): Promise +- addJob(action: 'detect', ...args: Parameters): Promise +- terminate(): Promise +- getQueueLen(): number +- getNumWorkers(): number ++ addWorker(worker: Worker): string; ++ addJob( ++ action: "recognize", ++ ...args: Parameters ++ ): Promise; ++ addJob( ++ action: "detect", ++ ...args: Parameters ++ ): Promise; ++ terminate(): Promise; ++ getQueueLen(): number; ++ getNumWorkers(): number; + } + + interface Worker { +- load(jobId?: string): Promise +- writeText(path: string, text: string, jobId?: string): Promise +- readText(path: string, jobId?: string): Promise +- removeText(path: string, jobId?: string): Promise +- FS(method: string, args: any[], jobId?: string): Promise +- reinitialize(langs?: string | Lang[], oem?: OEM, config?: string | Partial, jobId?: string): Promise +- setParameters(params: Partial, jobId?: string): Promise +- getImage(type: imageType): string +- recognize(image: ImageLike, options?: Partial, output?: Partial, jobId?: string): Promise +- detect(image: ImageLike, jobId?: string): Promise +- 
terminate(jobId?: string): Promise ++ load(jobId?: string): Promise; ++ writeText( ++ path: string, ++ text: string, ++ jobId?: string ++ ): Promise; ++ readText(path: string, jobId?: string): Promise; ++ removeText(path: string, jobId?: string): Promise; ++ FS(method: string, args: any[], jobId?: string): Promise; ++ reinitialize( ++ langs?: string | Lang[], ++ oem?: OEM, ++ config?: string | Partial, ++ jobId?: string ++ ): Promise; ++ setParameters( ++ params: Partial, ++ jobId?: string ++ ): Promise; ++ getImage(type: imageType): string; ++ recognize( ++ image: ImageLike, ++ options?: Partial, ++ output?: Partial, ++ jobId?: string ++ ): Promise; ++ detect(image: ImageLike, jobId?: string): Promise; ++ terminate(jobId?: string): Promise; + } + + interface Lang { +@@ -34,43 +77,43 @@ declare namespace Tesseract { + } + + interface InitOptions { +- load_system_dawg: string +- load_freq_dawg: string +- load_unambig_dawg: string +- load_punc_dawg: string +- load_number_dawg: string +- load_bigram_dawg: string +- } +- +- type LoggerMessage = { +- jobId: string +- progress: number +- status: string +- userJobId: string +- workerId: string ++ load_system_dawg: string; ++ load_freq_dawg: string; ++ load_unambig_dawg: string; ++ load_punc_dawg: string; ++ load_number_dawg: string; ++ load_bigram_dawg: string; + } +- ++ ++ type LoggerMessage = { ++ jobId: string; ++ progress: number; ++ status: string; ++ userJobId: string; ++ workerId: string; ++ }; ++ + interface WorkerOptions { +- corePath: string +- langPath: string +- cachePath: string +- dataPath: string +- workerPath: string +- cacheMethod: string +- workerBlobURL: boolean +- gzip: boolean +- legacyLang: boolean +- legacyCore: boolean +- logger: (arg: LoggerMessage) => void, +- errorHandler: (arg: any) => void ++ corePath: string; ++ langPath: string; ++ cachePath: string; ++ dataPath: string; ++ workerPath: string; ++ cacheMethod: string; ++ workerBlobURL: boolean; ++ gzip: boolean; ++ legacyLang: boolean; ++ 
legacyCore: boolean; ++ logger: (arg: LoggerMessage) => void; ++ errorHandler: (arg: any) => void; + } + interface WorkerParams { +- tessedit_pageseg_mode: PSM +- tessedit_char_whitelist: string +- tessedit_char_blacklist: string +- preserve_interword_spaces: string +- user_defined_dpi: string +- [propName: string]: any ++ tessedit_pageseg_mode: PSM; ++ tessedit_char_whitelist: string; ++ tessedit_char_blacklist: string; ++ preserve_interword_spaces: string; ++ user_defined_dpi: string; ++ [propName: string]: any; + } + interface OutputFormats { + text: boolean; +@@ -88,36 +131,36 @@ declare namespace Tesseract { + debug: boolean; + } + interface RecognizeOptions { +- rectangle: Rectangle +- pdfTitle: string +- pdfTextOnly: boolean +- rotateAuto: boolean +- rotateRadians: number ++ rectangle: Rectangle; ++ pdfTitle: string; ++ pdfTextOnly: boolean; ++ rotateAuto: boolean; ++ rotateRadians: number; + } + interface ConfigResult { +- jobId: string +- data: any ++ jobId: string; ++ data: any; + } + interface RecognizeResult { +- jobId: string +- data: Page ++ jobId: string; ++ data: Page; + } + interface DetectResult { +- jobId: string +- data: DetectData ++ jobId: string; ++ data: DetectData; + } + interface DetectData { +- tesseract_script_id: number | null +- script: string | null +- script_confidence: number | null +- orientation_degrees: number | null +- orientation_confidence: number | null ++ tesseract_script_id: number | null; ++ script: string | null; ++ script_confidence: number | null; ++ orientation_degrees: number | null; ++ orientation_confidence: number | null; + } + interface Rectangle { +- left: number +- top: number +- width: number +- height: number ++ left: number; ++ top: number; ++ width: number; ++ height: number; + } + enum OEM { + TESSERACT_ONLY, +@@ -126,28 +169,36 @@ declare namespace Tesseract { + DEFAULT, + } + enum PSM { +- OSD_ONLY = '0', +- AUTO_OSD = '1', +- AUTO_ONLY = '2', +- AUTO = '3', +- SINGLE_COLUMN = '4', +- 
SINGLE_BLOCK_VERT_TEXT = '5', +- SINGLE_BLOCK = '6', +- SINGLE_LINE = '7', +- SINGLE_WORD = '8', +- CIRCLE_WORD = '9', +- SINGLE_CHAR = '10', +- SPARSE_TEXT = '11', +- SPARSE_TEXT_OSD = '12', +- RAW_LINE = '13' ++ OSD_ONLY = "0", ++ AUTO_OSD = "1", ++ AUTO_ONLY = "2", ++ AUTO = "3", ++ SINGLE_COLUMN = "4", ++ SINGLE_BLOCK_VERT_TEXT = "5", ++ SINGLE_BLOCK = "6", ++ SINGLE_LINE = "7", ++ SINGLE_WORD = "8", ++ CIRCLE_WORD = "9", ++ SINGLE_CHAR = "10", ++ SPARSE_TEXT = "11", ++ SPARSE_TEXT_OSD = "12", ++ RAW_LINE = "13", + } + const enum imageType { + COLOR = 0, + GREY = 1, +- BINARY = 2 ++ BINARY = 2, + } +- type ImageLike = string | HTMLImageElement | HTMLCanvasElement | HTMLVideoElement +- | CanvasRenderingContext2D | File | Blob | Buffer | OffscreenCanvas; ++ type ImageLike = ++ | string ++ | HTMLImageElement ++ | HTMLCanvasElement ++ | HTMLVideoElement ++ | CanvasRenderingContext2D ++ | File ++ | Blob ++ | (typeof Buffer extends undefined ? never : Buffer) ++ | OffscreenCanvas; + interface Block { + paragraphs: Paragraph[]; + text: string; +@@ -179,7 +230,7 @@ declare namespace Tesseract { + text: string; + confidence: number; + baseline: Baseline; +- rowAttributes: RowAttributes ++ rowAttributes: RowAttributes; + bbox: Bbox; + } + interface Paragraph { diff --git a/package.json b/package.json index 472326ee65..5613c74d3f 100644 --- a/package.json +++ b/package.json @@ -79,6 +79,7 @@ "officeparser": "^4.2.0", "os-proxy-config": "^1.1.2", "selection-hook": "^1.0.11", + "tesseract.js": "patch:tesseract.js@npm%3A6.0.1#~/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch", "turndown": "7.2.0" }, "devDependencies": { @@ -257,6 +258,7 @@ "remove-markdown": "^0.6.2", "rollup-plugin-visualizer": "^5.12.0", "sass": "^1.88.0", + "sharp": "^0.34.3", "shiki": "^3.9.1", "strict-url-sanitise": "^0.0.1", "string-width": "^7.2.0", @@ -296,7 +298,8 @@ "pdf-parse@npm:1.1.1": "patch:pdf-parse@npm%3A1.1.1#~/.yarn/patches/pdf-parse-npm-1.1.1-04a6109b2a.patch", 
"pkce-challenge@npm:^4.1.0": "patch:pkce-challenge@npm%3A4.1.0#~/.yarn/patches/pkce-challenge-npm-4.1.0-fbc51695a3.patch", "undici": "6.21.2", - "vite": "npm:rolldown-vite@latest" + "vite": "npm:rolldown-vite@latest", + "tesseract.js@npm:*": "patch:tesseract.js@npm%3A6.0.1#~/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch" }, "packageManager": "yarn@4.9.1", "lint-staged": { diff --git a/packages/shared/IpcChannel.ts b/packages/shared/IpcChannel.ts index 56ebfb3d58..f35db50bc6 100644 --- a/packages/shared/IpcChannel.ts +++ b/packages/shared/IpcChannel.ts @@ -281,5 +281,8 @@ export enum IpcChannel { TRACE_ADD_STREAM_MESSAGE = 'trace:addStreamMessage', // CodeTools - CodeTools_Run = 'code-tools:run' + CodeTools_Run = 'code-tools:run', + + // OCR + OCR_ocr = 'ocr:ocr' } diff --git a/src/main/ipc.ts b/src/main/ipc.ts index 2183c30831..3d72b67390 100644 --- a/src/main/ipc.ts +++ b/src/main/ipc.ts @@ -30,6 +30,7 @@ import { openTraceWindow, setTraceWindowTitle } from './services/NodeTraceServic import NotificationService from './services/NotificationService' import * as NutstoreService from './services/NutstoreService' import ObsidianVaultService from './services/ObsidianVaultService' +import { ocrService } from './services/ocr/OcrService' import { proxyManager } from './services/ProxyManager' import { pythonService } from './services/PythonService' import { FileServiceManager } from './services/remotefile/FileServiceManager' @@ -709,4 +710,7 @@ export function registerIpc(mainWindow: BrowserWindow, app: Electron.App) { // CodeTools ipcMain.handle(IpcChannel.CodeTools_Run, codeToolsService.run) + + // OCR + ipcMain.handle(IpcChannel.OCR_ocr, (_, ...args: Parameters) => ocrService.ocr(...args)) } diff --git a/src/main/services/ocr/OcrService.ts b/src/main/services/ocr/OcrService.ts new file mode 100644 index 0000000000..6ac8c311e3 --- /dev/null +++ b/src/main/services/ocr/OcrService.ts @@ -0,0 +1,34 @@ +import { loggerService } from '@logger' +import { 
BuiltinOcrProviderIds, OcrHandler, OcrProvider, OcrResult, SupportedOcrFile } from '@types'
+
+import { tesseractService } from './tesseract/TesseractService'
+
+const logger = loggerService.withContext('OcrService')
+
+export class OcrService {
+  private registry: Map<string, OcrHandler> = new Map()
+
+  register(providerId: string, handler: OcrHandler): void {
+    if (this.registry.has(providerId)) {
+      logger.warn(`Provider ${providerId} already has a handler; overwriting it.`)
+    }
+    this.registry.set(providerId, handler)
+  }
+
+  unregister(providerId: string): void {
+    this.registry.delete(providerId)
+  }
+
+  public async ocr(file: SupportedOcrFile, provider: OcrProvider): Promise<OcrResult> {
+    const handler = this.registry.get(provider.id)
+    if (!handler) {
+      throw new Error(`Provider ${provider.id} is not registered`)
+    }
+    return handler(file)
+  }
+}
+
+export const ocrService = new OcrService()
+
+// Register built-in providers
+ocrService.register(BuiltinOcrProviderIds.tesseract, tesseractService.ocr.bind(tesseractService))
diff --git a/src/main/services/ocr/tesseract/TesseractService.ts b/src/main/services/ocr/tesseract/TesseractService.ts
new file mode 100644
index 0000000000..d2ba6d2ed8
--- /dev/null
+++ b/src/main/services/ocr/tesseract/TesseractService.ts
@@ -0,0 +1,82 @@
+import { loggerService } from '@logger'
+import { getIpCountry } from '@main/utils/ipService'
+import { loadOcrImage } from '@main/utils/ocr'
+import { MB } from '@shared/config/constant'
+import { ImageFileMetadata, isImageFile, OcrResult, SupportedOcrFile } from '@types'
+import { app } from 'electron'
+import fs from 'fs'
+import path from 'path'
+import Tesseract, { createWorker, LanguageCode } from 'tesseract.js'
+
+const logger = loggerService.withContext('TesseractService')
+
+// config
+const MB_SIZE_THRESHOLD = 50
+const tesseractLangs = ['chi_sim', 'chi_tra', 'eng'] satisfies LanguageCode[]
+enum TesseractLangsDownloadUrl {
+  CN = 'https://gitcode.com/beyondkmp/tessdata/releases/download/4.1.0/',
+  GLOBAL = 'https://github.com/tesseract-ocr/tessdata/raw/main/'
+}
+
+export class TesseractService {
+  private worker: Tesseract.Worker | null = null
+
+  async getWorker(): Promise<Tesseract.Worker> {
+    if (!this.worker) {
+      // for now, only support limited languages
+      this.worker = await createWorker(tesseractLangs, undefined, {
+        langPath: await this._getLangPath(),
+        cachePath: await this._getCacheDir(),
+        gzip: false,
+        logger: (m) => logger.debug('From worker', m)
+      })
+    }
+    return this.worker
+  }
+
+  async imageOcr(file: ImageFileMetadata): Promise<OcrResult> {
+    const worker = await this.getWorker()
+    const stat = await fs.promises.stat(file.path)
+    if (stat.size > MB_SIZE_THRESHOLD * MB) {
+      throw new Error(`This image is too large (max ${MB_SIZE_THRESHOLD}MB)`)
+    }
+    const buffer = await loadOcrImage(file)
+    const result = await worker.recognize(buffer)
+    return { text: result.data.text }
+  }
+
+  async ocr(file: SupportedOcrFile): Promise<OcrResult> {
+    if (!isImageFile(file)) {
+      throw new Error('Only image files are supported currently')
+    }
+    return this.imageOcr(file)
+  }
+
+  private async _getLangPath(): Promise<string> {
+    const country = await getIpCountry()
+    return country.toLowerCase() === 'cn' ? TesseractLangsDownloadUrl.CN : TesseractLangsDownloadUrl.GLOBAL
+  }
+
+  private async _getCacheDir(): Promise<string> {
+    const cacheDir = path.join(app.getPath('userData'), 'tesseract')
+    // use access to check if the directory exists
+    if (
+      !(await fs.promises
+        .access(cacheDir, fs.constants.F_OK)
+        .then(() => true)
+        .catch(() => false))
+    ) {
+      await fs.promises.mkdir(cacheDir, { recursive: true })
+    }
+    return cacheDir
+  }
+
+  async dispose(): Promise<void> {
+    if (this.worker) {
+      await this.worker.terminate()
+      this.worker = null
+    }
+  }
+}
+
+export const tesseractService = new TesseractService()
diff --git a/src/main/utils/ocr.ts b/src/main/utils/ocr.ts
new file mode 100644
index 0000000000..b0079f2a50
--- /dev/null
+++ b/src/main/utils/ocr.ts
@@ -0,0 +1,29 @@
+import { ImageFileMetadata } from '@types'
+import { readFile } from 'fs/promises'
+import sharp from 'sharp'
+
+const preprocessImage = async (buffer: Buffer) => {
+  return await sharp(buffer)
+    .grayscale() // convert to grayscale
+    .normalize()
+    .sharpen()
+    .threshold(100) // may need tuning for specific images
+    .png({ quality: 100 })
+    .toBuffer()
+}
+
+/**
+ * Load and preprocess an image for OCR
+ * @param file - image file metadata
+ * @returns the preprocessed image Buffer
+ * @throws {Error} if the file does not exist or cannot be read, or if image preprocessing fails
+ *
+ * Preprocessing steps:
+ * 1. Read the image file
+ * 2. Convert to grayscale
+ * 3. Further preprocessing steps can be added later
+ */
+export const loadOcrImage = async (file: ImageFileMetadata): Promise<Buffer> => {
+  const buffer = await readFile(file.path)
+  return await preprocessImage(buffer)
+}
diff --git a/src/preload/index.ts b/src/preload/index.ts
index 1059826224..af4803fd50 100644
--- a/src/preload/index.ts
+++ b/src/preload/index.ts
@@ -17,9 +17,12 @@ import {
   MemoryConfig,
   MemoryListOptions,
   MemorySearchOptions,
+  OcrProvider,
+  OcrResult,
   Provider,
   S3Config,
   Shortcut,
+  SupportedOcrFile,
   ThemeMode,
   WebDavConfig
 } from '@types'
@@ -406,6 +409,10 @@ const api = {
       env: Record,
       options?: { autoUpdateToLatest?: boolean }
     ) => ipcRenderer.invoke(IpcChannel.CodeTools_Run, cliTool, model, directory, env, options)
+  },
+  ocr: {
+    ocr: (file: SupportedOcrFile, provider: OcrProvider): Promise<OcrResult> =>
+      ipcRenderer.invoke(IpcChannel.OCR_ocr, file, provider)
   }
 }
 
diff --git a/src/renderer/src/assets/images/providers/Tesseract.js.png b/src/renderer/src/assets/images/providers/Tesseract.js.png
new file mode 100644
index 0000000000000000000000000000000000000000..d60b9b68780650003097cf24063bbc4d4293d6fe
GIT binary patch
literal 23940

C)!!CpTJd+&jK=T|!g_|wwfTjDAv5k>acx$8V>4`aLT@hh&rny4?b6aJMiVR#saBWP9U!X;=_-*sd+o^Sl6Tj zE*XZxI5w#t*6$f1k2FAD$;a-Is!Q?9O7pLxSkB77c#hk;{itWfqUhJ!RI(Blk7Y(1 zaPn1^*j!nbBD7Jw++QelevsLAWheBfu^V_K0*_9HbbQ=kS8@*qfyc@GQsk{A26X-I;T@KWRVLj5$wC#Zp&F_<2F+Y1z?J#&= z2Pt@H(0rgCGdj-w-ak%i^N+YL8!^0Gw4mZAN!+NqavHI6IQ>^IsJKM;j!8mHxirb0 zYJB-yGIyIX%!rotnOwcF4^brbjsKkTl9~Y6u}sOZ)}t?RH-fcYj6`63Kc?o2DmNf7 zcTLpcRJ?p}S?;WFW2QT9DaLracz~qZzKIK8ux2-Tl~YZ?BM#8Y#6rjbVJ8ygGzN2`DHea53NV z+)Soa5wTh8_k>iHIdaxwLE@B7EuaGe}P z;3kd4H=FnO;cvFMGz?@t$}caI5=6h%BDApDw>0=1NoomYv_0_4zYlggKXWQb`nT%Q z4$vVjs)9UoL!dd8hJiabeRpGvdd! zBEtoN>HaUq(Q1OFJF(;6reF74mF128l2`xO-BL!&oMHrewY8Jx;vvevC}(9w0N_pi zf46|0*x0)_4g>1Z^L1$7wey5(D|x@I!OB-!NqZ#ErWO1owYwPZsXn>zAACzf`a~pcuTQ33jBgk%bo79MC_pL2T$Wgxsj`=-6?5|YTJnQ>9jxoGXJ1U+7fBzyZbOwuqA+60 zWE03T1YiKH9$fXQ@ZuAA+G2Gd{)6i~yO2PymGfHxa|ce1y7KF5g4c41H*pu~-roir zovSVu)S6?ix;|dSD{sWnGtAg3%A#+eJ1m0O0swX#Lwc(98r#1(!ukxnNwDXG zeZFQiC(;~C)4LLp03B(MeghXQXw~DT+RJav*maV6+L_ckaNE4SUM3OJMEYpq$wx{Z z058=${>!qRomkdn1kWae#LPTRktI;CNB?3Ftwwb1ZXkRA!JDqf1Lo2XU!HwSW>F9; zf%G~dPQGvtfb;!}xq#S3hZ4~>{BO70m2yTMf8?%rU3z>Cgh+sfN5g`8%b<1-36u^A zv>tJ~>VnPAB$WKwxX1@Z(tSiU0KqWl-{!d^kPrKQ`-+wppC4Q6gh9^CM5vFlWSRjm zl_9IIvsZjn5_0wVx<2!!YcQhFntd7W(x zL*r&%w-ApNXljWN>4!lnLA8agvycCfjRzEX^13%c9>je29tx@1q9Q6;<4fN^U((zV zlPrQL3bYa-@T_v6fQqZ#_FAGqR74{uV3CZx{+TMs&+C5{N=Jah{^EwK{6!DS9<1#!m=uE_v5w@C!u2N_!Io<1a`6TfYNI3akk0VwWK_ZGE zDxZjsOB#u;#fJvhOc|x+k6?-ko+ktC9DUJFK1&oV?RF{^XCJgW6cFlL`r3yksE3rht&%RebTZHz>aO z|FeZ&d7k&ng=0gYXH_k}y`NSI?;eF)B`C=GE%+w@X9r)l zkRBi7K^m+=k5B(g@_#^=1QI~>qm^K0j~fi!C%3NZqRP4GQQuf}uS-t^>o0m@(j(D! 
zj~8U`SFemdISLNbpSXDo;{ZD<+6$!`auN@0aXB}43UL*mvYu-Uhuv^y)nqk};J*aP z5`3QKtn-bOHf3%burZzsWKY&NQOF5L~Sm=iIzwh$Nc@ zGP#S2?S5kEyMY;v-7-eCMg8s}HJmzWG%~GSfUlrt!psSj2+DMZH$YdK(YpCpe9Cx( zDUaY9X&8Q6dq$aD^_rtBb9A)|9T-i^|MAD}ar@+X^(|slv@g@=XQA$m$Ss&S(&zCq zvL5A1jaTI^TmI_0`RyYm4IODe^fK=CipZ9w#?dHKx#byHEb{b=2G0q3n{3TJd~;ns z{y43H|3LhnfdUsPU>ABjDI)TOf&xb8$3uZWsfY00;nSwRJ$@qF9F%*}({44oY}xa; z^|D;eckDpp#}}zOD%tsF4#&}NF=Rd>lcTO%i;bI(dcjl!JNL_B*wcJg*=ChLF0eJVK`XzyY$^UE_j!b6TYA-@v+;#+agBp^7w~0g}s>||B~-= z=X6bWFotL-)ZQf4FF5tVHd-=vqTT5s_61bQ<^I~zM?CF03goff(ypdI>pwI%h8OI# z--gx3kzs&Uz3X7BW#bq!Dh^Q^Vzv?MSvtm3MkN1z{jHdULdrb>c0Ly{{AqYDj^EO~jDQ}V>Q4fw$JkUc{-WE|C1I}F1Ex!<-gwqr_35)D~}5C$UiqVdn% z>f*Ps5OC^m%GFv;S-6+Qaf}$Krw)W5hLl*1f@bd1HdZQDVWOl3R8fDZ;aGv9rU&|f z73KJvMQGd)H3EV|yvnurrbkRuF^_$}ZfSHKK>^7myEX4djt4uJ@p-#yR0g}D*i3{u z$-)svzE7t+a%H_!hO~!$zbQRZifz@w%b?@HOUaAGy7kW9in8guuWkke*zH=RKAeT% zg&K6jLd5(_{R|QQmudaPSGmqFpg>tOuFCRs>p$)C_}icQzbk<`HQb1j6x`f+nl0yn zH;>eqR)m^4;S@}a?$m`{pX&(~v^B0z1@R4%(xfMT)*f=}V}WAAba#8v&oT1^7$u!L zFr5<_ehP4o_#RynVuMrPIS98dEE6mLGUc`Y(Y8TridZg~X0c%N*l`}fCaKdMyxD!i zmK`tz@UQOC5?~g$-_f22aO>^8mjVMsFOO@4n0bSrk3}=93y}fFW9?tjSj-B~-)n6P z8VHyG%mHJu5IaiQl~0tsg*(J!JT3SkxoSB|f3kPR+l{4K4xhWf(KUG_CHt3sEHI(1 zmiH`VUK)oKBcqawf;CUZhF;B)$=`d>R~z4(TAcxH93<(|x(raQe$LU+Y7L8s8un^u z50<{p>*pBfr%ep&QdXb4v_Uo^`jZA_9ghpXK{Uu`V9mmB`mB*YjO#g2>=i-u$5K%zt?;2=l!Cd`B+DL9%kz{ ztMb{FZ&5qR$K8UUjXPH3aZl-_<-N5?D?3$Sge+u4d2E7T_H6(5qVU7l2DS1;sJ&Hf zP2CNZ`kEH`0ETi1dWp&Xotmy;Mp91}vGG@%_8>!}4kYyI2Az*H|A`E^XCK|tUlpaEVjpSJ)681gEYy?uuu<;Jw~L6bbzn#As} zatRoyZrXn-kkkP`yWbR}YECLV3J%{n`B@*n11A=B5TC_M@FSU<|Mg%UUjNH@g8)0- z>{|yIB#U&qixte9g4`^cm0?y57f<}Cn(n<6q%YtYI+eeTh<&<^zE@N}TKW#>1lsQ( zZlLqL93##R>QV{+L@O%m(@Zb2PZRGs#G^xq9Ri84WGdqXS|+n*#UCUH_KoA4heJZ>;jm%yt`o%WgoBRSnql}eY)56RC%R?%p zs3-{ClmpPg5oq(B0Im`~{CXB^MXVE29-bI)AY_%%r6EQ7=Jv_dB~?!D?4w?ttdcE+ z3(n8a)pQwC{Fdyqbf7K+%g$bL^v{DS3Gg}jy<%_fhlXNV0}xh~Jv-WWG-TBx3RtH+ zM}C(257FE>aZ>;f3)h;RM>Nh60$c@g@BdMyF`CdjuhaeBGSrcM*}ViRqOwBikmsdR 
zU@pH8j#hBb*AmJxO+su5E?~1ks&0{bY0bSoBE!l|GO@!8GOWl()l;IxYC;5eM~994 zG%^42rl0PMpUo#x3|w_C3PusB!oKSd${bm{Gd7139PsJv;I~LBA9VO< zC=jxh)h%+^&|n|%{bUksPJ56<4?Mj#G^~oAxYZ<3_@Pp11GCF5ezOt@*=bNi?8k)H zX>+oMjW~+?y(2n>WKI^U?iA6=qf3Mqrl*exC)u3^7@5vs7(tlAwS<~kFOhW0tt>3s z3H1JWvQp9fx_O@Nqv^^_>?p(uP!a?Bo)vv+0L4v?^k=bj9`9o3cdT*yv7Ce;7}*r(dAt7&WL>;Oj%XkAvGsE}p*=ncv4 z8pHQkg7syqKDlK8i$6R2eE-9*YCYz^tRAT)qx_3*L1;#ft z=&%(iocK)2@U$dN49V0G>2nE_WI3K4>*tmgeao%SP&l4q>Y%R^hMH9QFk-09;Kc9Y z&wUu)dlH6Wg<;vOSZIvc!6_ENH3#4@=mzQZA2A#dyfvS zW&&_B0U9W_Nn||jzn}p>uRNfCND};Pemoqy1&pVKk&g76qS&R0U6K-D1KI;bu%H+I z4=wWz z^4-(+KWqSQpNXTT(DEnFcOK7=M1M%Wqc{I0p%L{W^J(R`Z@*bdj4?W#{HF5|c)}6r zvb&!?%|$6;Nb)18r(Px?oNGhXq2<;at!;G}WTWb{iPk_1qu>HyJ>!vqiBxXs*7dLU zbP%Lu2r?U{$%1?KrcxA}WdF<-CquX_d-wAYo)N=bwAQO}$axFnIXNnyRI3RW+?14% z55#IZFp{W+Q?LRv%Id$vv$qCE8L}9Qwi1SETYP#8XIW;F7@IfP z>+dH=f0*4qek`>ZfFgFO$m_tu?Q%c*to>R&?DT!xI zi>#X&-Y~b|l3_C69}Yd!YWPj^Bg|^U@`xlEVbvFr6gnSO`BlX$WiYaJFzo7rX6Wy4 z8vE?k-z-ubRI0e4jC2Y_=!OI609mdsuCiYEP9}ThJmM;nT9ycTFy?8JqJLt;0a`uovAas@#X z*C7O(thbmtEZ&1smsyCSY*fG9CM*DqyN_+@es__CbN7YfR%?B)V6_n0{uq;+dbjLZ zMEB8Kl-4y-F#_xY5!rE^$k{fdCXbeJAW*2)%&@>Xjg!67*$cV7mByC%l|1dN-d^Qf zy@HIs_p^29H5UeXTtuQzmae<|_jkds7ri{6I1Xy1^S$ElW61I+oI^l~eXsgq=7)6a zTzkiTK1c3f5>6iWWEt3Tro;tv;14Q^HlpScsh;awFP5>kgW zt!sW7#@BiHR+M*CUr$JAE}1Zzmv4aJBnXg0zIS+}jt(=JL0wLQTCFwDf@U_{n%Ti~ z1d_T54oG6gzyLFcpkU8^vD$9G0tkNL(D4wogLo3@iQx{6JnOG1_#wRe^cs9SPf^jZ z#NQM7;T;W#Fp<2tU`s&ThkU_C>9FbHCnKAzKEg@ER%V&M*Y*4UeXsYu zU+?GZ^?HuS^YwU8eExh?{cw$60YP}y<}FJrf3awkEPM9cC&A@bjRoZ z{m$c(yMC6-uR(ccrBY>ybXq*5r@GWgy)!j0=L7}GKzl9A76d12xN$H3Yc7cO?Abp= zu_0R{)?LGWTtyLUrgg6EMVfd2+;b+mOs$j%FMx(ehNmfe1T~;tl_X~Ak z?zH$juqfh^_PI1|kZD?NNI|(Q5TVTrFUZOcoy)y#SrsAYo9lQ!1eHhtg**N%uyTc$0q0L@y`!Dn#e4 zKM65eIqjVzHOb`^{?hlVn!fIK=Xp3**;D>3!5va+s7SvVdfjV;F2p( zFo`~Qv%*fs=l6#`{>`6>F_)=)uIHUN zV#CA!3r*7uAVw^?}A@mvMILwEgQ3-dh(l<=2#0yS=jUmp!OcyGx16lNRObem-2 zAr2uN>&n-MrCV=*&Ss8=*;Y^<-Sp}wjNZ$s{e)+cWj0?Hk-K7XeAtf=zVdeK<@qg^ 
zFmv`e%MKwpHmdgM)X?Y_{mSf<69Nle*BeWC8omwS5fll~Ff91+rFHm672Dg-MQ@cK zPMc(k(m^^Xf$lrHXWs;lc^$+dL_ie??uI?a9=}DIy1$#$r8;W;lat*2V5sPdoOQMWiPK zsqa~C9vLK4ViLuo0W;=+Ni`?dGx|Euz80a^*9%Wm?y0AFn(V2SCH`+NToU- z0yJd!)216;T9Xexos)Q9GO0s$*Rb;1e?baPkjz&9w)RB8m&O{Q7sB()2AqHz5BSr| z1pKPI5Y%pq96yoB`6a#?{&3iV(Ps$<_>WlwAjRM9mk?jAO+b0ZyAqlKmuV24&Li~F z;cFv~y%J6z!-3pUH}L1_{yw!Q<*o`_enL?#?{>G0c5k{TX{J+Iw-u1$%}_SxLGnxW zStOiCOAMaS)t_GL)zL6e*Iqd2Lf`Y}nfqzu?e$Dhc1kjq>O32G0*%s~T}(@i))tsc z93f&nfsBTGOrqn^q0ASW$D_e(g<(ZrfoQ6jBa7n_Eb5s`q^c?cECE6`<_gZF5{g4F9a zG%a&Ha#^tYep%^+OB=icNlKO%S&%Y-+WjigYd}~~$Nr|cpsoV7YID!RcC>2kvnyG8 z%~H_!zhblJ&Go*5Tuy60FrO+WkEorLL^HEw@M}C-*Rjb*Q5XGO&}mX zK(wKvxeyfH-9ve!xH;im{8c~}!2O7u$&nS>QId-^Y-Wd5oE=MCAUiV-`F@n(3~4&4 z5mBAabfcfOm`c9;s@T_DDx~1^RT{PpTm&%ZGHPoOipaYbj^dc_BiG>t%*Zit+0 zK;e}3B(cw*KzFLTcQW#E8+-g^&SGA>4tB8RpokVGOOi__Y_|FlB-r?pb&h|-3(c&s zu(;3$i((IZ^%Hh~N()a3$+v@lUY_>IK*g((Qvi~x^HGk(aXtx7LV=r^LC6w*af81H z2@=|u1$5v0$8LSArRS3LRR^5zU0>(RJx|L$og!8LD2GXK1BRind7Ke;CN(Zg^O+GA zFkw5XV13(PbG?rbJCrGQl^=RM4{*G=YIT)&6*zGYjMtMteTlJ5LAqLJ3xz)R66I+TRb79oCzmHMR28#Wq|&=7kvUcRWr45#VP zfaWtim(xo3Auk}dx@i#W$ToU&#_GaEYb(qk^?ZPM2FQ!Bfn?EWCC$n9_SpA7x$1os zLwCcZyUB}-^b0~)O> z?lP=cUL2%cK8YZwWxz;q-`ock#ZB(k|G3!@Qc%PB8M}^@;&%yM)wuzD(vLz#O5bIR z0wl|ggQf2imY+Hvk;jVFIoipZbOf;mMUF zKVZbT{dl{R!K_Bp`X9bwdY><>8kfsFw)oLByi&*UN1FGqJh;dE3MUr@g5{seV(Z~T zs+g;DFhzwFBV8vxrW@29N%+Dk?`Kn}OW?%gOA`IL)f^>V3Q4FUX;lMqhJ(iM<)G-` zGMr!u37O~vGgXJ74@CITr~UGy0~?>Lh!_Lb4uS}mno@WjX=g-=Mdk9nl#rXmB_{*` zFyA6b-uAV5hu8}r#hZQCT^-i|CcKObTJK)ouFunZ?BR4(P{l?XK(L zSKUsZGcL&)d2LwTQjA8yZ#-Hr^2573k*ep3<4~qmzem}p)Mvk# zbUsuVIM+sIG|-MMHU)M9*q8PN29|V@l=cGwZM)eh#m9_{ODu6rlc3gwG2@4Vf*(=T znU0t)>T~0B$xHvf-_p6oZ$}ajCt%QGkK^S@Q>>T3ZgIf@)KKlHD{aPWeX7m-m0|P3 zS`Ez`rt9t8jUAwFF+yn3=5b)2fvwUB@U1{WDQc@a)`BIkwUUiHZ6j(M4;jphg05rZ z3_&?no8IV0#*Z+ftphu^}j)PACelg3xQ?7U79xGxX{vsA`1ucGD`>Q?0 ztOuy?-L8*D7kp|*K&z*RBZ$ZF)U|bk(HOVXsWT zm`{P_|2kElye9;>>x~_th*twM{aP@wLfVP-Z4o8xlF&d>rlUQEIYd~1x*)Nl;q?_t 
zfl%MTcd{t`uK14VNXH2elr(h*0gY`~SM9NtgxOKT+?jL1d04*Pn;qsgxggUsUp;<; zR*ibAz&R3`AuWArpKqU=6&n*FM&3CLkZ>-SZ$7A`bZ}OAF>wHEM$scm)>S?o=Y9hz z{0Pmw5C7Y$DT3%rg;=@g_Y=N-eo17z%b+nYmZk>qe_lO6f(2T zCPtOVpg|#=A_$*1+d1ssU03<)99kT1Ci(!HGg_GNlXy)=W-5HbnX-9+ZeICc+bwge zlth;p`w-Dr@;2QCsW4KOPf(uX|8WdWOJTQC9?m||uRJ?3I#xh`Gh_U6>=1aJV%V3- zJE3Ton-!YeU{bq_BP_e!h{PFOR6(Cnz5O;P8}Fh1>wi~7*dv*&EDH)q%J0+JVs z5BPvKKuTT<;j4)W?hMF2EY^r~Zwm^sH8WBx$#<5aV~16&uShvygEyBcpuctfZPqq@>>Fo)-<7Re8hQ$IaOJ6s~;u{vAbbXjCN7L-LG} z;{`=VPo5o_E}kl8uNlP5DQI*eZfgF#d41Wif9B<}#bTZpGgl1e3I>)txxssyf9*E0 zn!I50L@>c8Sb0Olxd)w~ES`k8#MtTz?(lE{$92B1RJRJq&msJoi_YWC3i;@8* zrB)SWjmaE>1Xj%&LoyB;B5YPd?+zy_?&lu++SnNj%gG0Wv}iS-R=-+4E9Mu)h>db2 zStI+wzXq2<`j1BMXZBk`>T@q%z$5lYwRb~7r!kZo$>sz0%P_7(K%jEJtCoI?(GlY^ z#*CDq+yzcXj)pGl+JAwWoavjrc;b&UNtH~r{8!y0KQLb4;~@>&+yEp<=4 z`dKsb`FCdm!GYiQeG%6zL>IHa2tSqRol(!)dD{S`;!8!_4@W7o(+Iw#hfWv z^Iw;k-f?a1*T?z^w~$gX51?ifbx#<&PQxJyWlaI=ka@&ka(SU~!U6SNrnWlzI-S10 zyG8oDTdFXz$Fz@~GBdP0T7cu&^sblS_9AvFsM>y2t2<8L`OM#FZulm^#(w5-=SDZs zfM8aZf|f8ag~-&FMhSE(%CkBS%$xZOyl%2oGCzlrZ!eF0faTich^wWhO{jvLJT(_9 zb__S_auShK`ZF12O$+Nl^o6$LEklvKnVanoUyUqG^#<`IEn7*ALGk=Jr-uFFcrQkf zbxr{Rqg$-NiFrI%A^c_xGrKBHR?~9)z1Z5LfHh>hfR7b>E21~bAI(E5BOhH%0qZl7 zCsc@Y>r<>cK_{9Az2B#_XLIG9E9{!jVobyP-iUBMqhnKbL}2IAsaJM*$l8uTTBaK!%_KrhFNFgUm;t>in_AU{yHEQ^1he zx(i~^@t3!DcZE&#C`+Qw&?jv|Tfz96WAe+}1lnLFAawD{Z0fhtmY zL;(%PkqpbFx;SoiN$c|qW@L{H7Q~mcZxvyoEdB>kH&^8{RtB;!9e?lsGR_*ZzW?F@ zG@*N*gah9pW30i!&0vR#_oUgali;Rufk4B@`$CCLdr9h?tm#!r2X@Tlw4V;pnxn3! 
zrQ+mPwc-N`J6n5e_5k8~!UDC;CtUQdH!zq~EZRQ`t0QUnd^o$M0~8s95#fuVe?-%K zNGZwvD@Su8&j2n4E<7BfA4a#*D4s_O7 zxlq+q{F4pQeeyb%X_5PJKRdv$bAZ3Xuwt{zGV87__Bt1$%hLQnTNSi?UuoX*DF9}- zqEzNV-zLj^MHO?ADPsNCE6|yuh9Y;tjf`FgIHKgpTD~s7E}j+123-0z95LbY;{*v8 zHbqQ@(@dP34AA`3tT6gd%b;QRK-D~IQCAm0nIe{+))<`QgcieKKv^q~NSkydE`+*c z{EHu|V*TC-AAe27E5M@%;qS)Vs&rLy;({jc>hG+tdLLW|nk4&TP^P7w@lcfI)vsYd z!v(Gbbemoi?|{UD|JFzUY5MDR1+CYFhe$odxbZZHmA9r*QrH}cfo*!0MBBI77R~Go zH|oAAr9QN7%Lk7<<-iq!st+Z^C`5%tkxV5d+F+;}NeZC0+4qEEp5${L$49HN9v($& zLjTrD(`-(;_w2AAE}L$H8Jy3jP#LG{u7`oP+6R1xZ(*0tMV!lt2yj4=4DRVV@Ct|bog6|1>({47Fasmv`Arqr%txENqkks zc!tn_Ispe!;;q@NEG>fTvbwGdA%;ZFC&0mN6tom#EB{TJ!S@~_ijLvT&*%dUFS9V$ z#5w+KP$`MG=drL?Jb^3e%^6wD#Y2CynmW6BSUTa&V#Q&9IeS2Dq)J{v)aY$yQQ_oo zpLewc9r{BD3}^-osrkA*jbNEO>0(Xo4Qr|Nj^qBtVow>L>wI8SGA+&JyK zQPQ%KFQD}m1~HO4uKx`REO%o@iV0A?v!p-_%VqU@Y_|$D31i0-L3w?OqYXZKb9|ABKN9ERf2P3xH-7dq zd-nag{_^{pL{ulp$k=Hq`Q-|tRgD7HA24sh76_Vm;i}}C#B(9b6y!o1lsJo2{L0mx zRLB1RHHzqu)Cnnn-BdR)LmyEx0-}p9zOr;j_iez<>q846Va{I4A>?W+7C>W%11;9} z+eU=cyE}b$)?%?YZG+{C;RZEqFnszNYob`bjxroV?!1@bgE2II>;ftKb<6 zmq-HBAH)6nB3i!vgt{IvV^q1^VJ6=SZiEEo4G9q4t3);!A|9c8;$bybn-Nh?d%8b< za?tdrX;U+HsQjU0f0P7KI`>&b-Q3P|A>6P3Dm(t1(y^WXG2Q*1=>?n?3?P6k7KnK| z{QFH!_3O11uvXHXu=?vhr6!?3HYs%se4rMJY&*ktV8Ptq0Dv5_#0zOFzvI^>z~ctd zaHT8^V%0y0vE?=QZWMiM?rHFlQ_hCuc(?!8DcmoB)&%JPpaXj~7v#5~a^rQq9Wcwa zoiIt)wHXS51o_99>@Qz+J@<7uZ&&4h(^^d0_RoV*B_h(RsO@ND5e8-jRo o60!aE%C}?;1R0P*nMkO^&`}trBy8lpP5kke+8xyzwAJJP0}W78dH?_b literal 0 HcmV?d00001 diff --git a/src/renderer/src/config/ocr.ts b/src/renderer/src/config/ocr.ts new file mode 100644 index 0000000000..b899cbb5f0 --- /dev/null +++ b/src/renderer/src/config/ocr.ts @@ -0,0 +1,32 @@ +import { + BuiltinOcrProvider, + BuiltinOcrProviderId, + ImageOcrProvider, + OcrProviderCapability, + OcrTesseractProvider +} from '@renderer/types' + +const tesseract: BuiltinOcrProvider & ImageOcrProvider & OcrTesseractProvider = { + id: 'tesseract', + name: 'Tesseract', + capabilities: { + image: true + }, + config: { + 
+ langs: { + chi_sim: true, + chi_tra: true, + eng: true + } + } +} as const satisfies OcrTesseractProvider + +export const BUILTIN_OCR_PROVIDERS_MAP = { + tesseract +} as const satisfies Record<BuiltinOcrProviderId, BuiltinOcrProvider> + +export const BUILTIN_OCR_PROVIDERS: BuiltinOcrProvider[] = Object.values(BUILTIN_OCR_PROVIDERS_MAP) + +export const DEFAULT_OCR_PROVIDER = { + image: tesseract +} as const satisfies Record<OcrProviderCapability, BuiltinOcrProvider> diff --git a/src/renderer/src/config/ocrProviders.ts b/src/renderer/src/config/ocrProviders.ts deleted file mode 100644 index 5e482e10ef..0000000000 --- a/src/renderer/src/config/ocrProviders.ts +++ /dev/null @@ -1,12 +0,0 @@ -import MacOSLogo from '@renderer/assets/images/providers/macos.svg' - -export function getOcrProviderLogo(providerId: string) { - switch (providerId) { - case 'system': - return MacOSLogo - default: - return undefined - } -} - -export const OCR_PROVIDER_CONFIG = {} diff --git a/src/renderer/src/hooks/useOcr.ts b/src/renderer/src/hooks/useOcr.ts new file mode 100644 index 0000000000..a1cbac0f8f --- /dev/null +++ b/src/renderer/src/hooks/useOcr.ts @@ -0,0 +1,54 @@ +import { loggerService } from '@logger' +import * as OcrService from '@renderer/services/ocr/OcrService' +import { useAppSelector } from '@renderer/store' +import { ImageFileMetadata, isImageFile, SupportedOcrFile } from '@renderer/types' +import { uuid } from '@renderer/utils' +import { formatErrorMessage } from '@renderer/utils/error' +import { useTranslation } from 'react-i18next' + +const logger = loggerService.withContext('useOcr') + +export const useOcr = () => { + const { t } = useTranslation() + const imageProvider = useAppSelector((state) => state.ocr.imageProvider) + + /** + * Run OCR on an image file. + * @param image Image file metadata + * @returns A Promise that resolves to the OCR result + * @throws If OCR fails + */ + const ocrImage = async (image: ImageFileMetadata) => { + return OcrService.ocr(image, imageProvider) + } + + /** + * Run OCR on a supported file.
+ * @param file A file whose type supports OCR + * @returns A Promise that resolves to the OCR result + * @throws If the file type is not supported or OCR fails + */ + const ocr = async (file: SupportedOcrFile) => { + const key = uuid() + window.message.loading({ content: t('ocr.processing'), key, duration: 0 }) + // Await the result so the loading message stays visible until OCR finishes + try { + if (isImageFile(file)) { + return await ocrImage(file) + } else { + // @ts-expect-error all types should be covered + throw new Error(t('ocr.file.not_supported', { type: file.type })) + } + } catch (e) { + logger.error('Failed to run OCR.', e as Error) + window.message.error(t('ocr.error.unknown') + ': ' + formatErrorMessage(e)) + throw e + } finally { + window.message.destroy(key) + } + } + + return { + ocr + } +} diff --git a/src/renderer/src/hooks/useOcrProvider.ts b/src/renderer/src/hooks/useOcrProvider.ts new file mode 100644 index 0000000000..ce2eb5b8fc --- /dev/null +++ b/src/renderer/src/hooks/useOcrProvider.ts @@ -0,0 +1,84 @@ +import { loggerService } from '@logger' +import { BUILTIN_OCR_PROVIDERS_MAP } from '@renderer/config/ocr' +import { useAppSelector } from '@renderer/store' +import { addOcrProvider, removeOcrProvider, updateOcrProviderConfig } from '@renderer/store/ocr' +import { isBuiltinOcrProviderId, OcrProvider, OcrProviderConfig } from '@renderer/types' +import { useTranslation } from 'react-i18next' +import { useDispatch } from 'react-redux' + +const logger = loggerService.withContext('useOcrProvider') + +export const useOcrProviders = () => { + const providers = useAppSelector((state) => state.ocr.providers) + const dispatch = useDispatch() + const { t } = useTranslation() + + /** + * Add a new OCR service provider. + * @param provider - The OCR provider object, including its id and other configuration + * @throws {Error} If a provider with the same id already exists + */ + const addProvider = (provider: OcrProvider) => { + if (providers.some((p) => p.id === provider.id)) { + const msg = `Provider with id ${provider.id} already exists` + logger.error(msg) + window.message.error(t('ocr.error.provider.existing')) + throw new Error(msg) + } +
dispatch(addOcrProvider(provider)) + } + + /** + * Remove an OCR service provider. + * @param id - The ID of the OCR provider to remove + * @throws {Error} If the provider is built in and therefore cannot be removed + */ + const removeProvider = (id: string) => { + if (isBuiltinOcrProviderId(id)) { + const msg = `Cannot remove builtin provider ${id}` + logger.error(msg) + window.message.error(t('ocr.error.provider.cannot_remove_builtin')) + throw new Error(msg) + } + + dispatch(removeOcrProvider(id)) + } + + return { providers, addProvider, removeProvider } +} + +export const useOcrProvider = (id: string) => { + const { t } = useTranslation() + const dispatch = useDispatch() + const { providers, addProvider } = useOcrProviders() + let provider = providers.find((p) => p.id === id) + + // Fall back safely if the provider is missing from the store + if (!provider) { + logger.error(`OCR provider ${id} not found`) + window.message.error(t('ocr.error.provider.not_found')) + if (isBuiltinOcrProviderId(id)) { + try { + addProvider(BUILTIN_OCR_PROVIDERS_MAP[id]) + } catch (e) { + logger.warn(`Failed to add ${BUILTIN_OCR_PROVIDERS_MAP[id].name}.
Using the temporary provider from config instead.`) + window.message.warning(t('ocr.warning.provider.fallback', { name: BUILTIN_OCR_PROVIDERS_MAP[id].name })) + } finally { + provider = BUILTIN_OCR_PROVIDERS_MAP[id] + } + } else { + logger.warn(`Falling back to tesseract`) + window.message.warning(t('ocr.warning.provider.fallback', { name: 'Tesseract' })) + provider = BUILTIN_OCR_PROVIDERS_MAP.tesseract + } + } + + const updateConfig = (update: Partial<OcrProviderConfig>) => { + dispatch(updateOcrProviderConfig({ id: provider.id, update })) + } + + return { + provider, + updateConfig + } +} diff --git a/src/renderer/src/i18n/locales/en-us.json b/src/renderer/src/i18n/locales/en-us.json index 48bb9664a1..9dbf612fa5 100644 --- a/src/renderer/src/i18n/locales/en-us.json +++ b/src/renderer/src/i18n/locales/en-us.json @@ -1574,6 +1574,26 @@ }, "tip": "If the response is successful, then only messages exceeding 30 seconds will trigger a reminder" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "Cannot delete built-in provider", + "existing": "The provider already exists", + "not_found": "OCR provider does not exist", + "update_failed": "Failed to update configuration" + }, + "unknown": "An error occurred during the OCR process" + }, + "file": { + "not_supported": "Unsupported file type {{type}}" + }, + "processing": "OCR processing...", + "warning": { + "provider": { + "fallback": "Reverted to {{name}}, which may cause issues" + } + } + }, "ollama": { "keep_alive_time": { "description": "The time in minutes to keep the connection alive, default is 5 minutes.", @@ -3498,6 +3518,20 @@ }, "title": "Settings", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "The provider does not exist" + }, + "tesseract": { + "langs": "Supported languages", + "temp_tooltip": "Currently only Chinese and English are supported" + }, + "title": "Image" + }, + "image_provider": "OCR service provider", + "title": "OCR service" + }, "preprocess": { "provider": "Document Processing
Provider", "provider_placeholder": "Choose a document processing provider", diff --git a/src/renderer/src/i18n/locales/ja-jp.json b/src/renderer/src/i18n/locales/ja-jp.json index d731da1934..f3a819565b 100644 --- a/src/renderer/src/i18n/locales/ja-jp.json +++ b/src/renderer/src/i18n/locales/ja-jp.json @@ -1574,6 +1574,26 @@ }, "tip": "応答が成功した場合、30秒を超えるメッセージのみに通知を行います" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "組み込みプロバイダーは削除できません", + "existing": "プロバイダーはすでに存在します", + "not_found": "OCRプロバイダーが存在しません", + "update_failed": "設定の更新に失敗しました" + }, + "unknown": "OCR処理中にエラーが発生しました" + }, + "file": { + "not_supported": "サポートされていないファイルタイプ {{type}}" + }, + "processing": "OCR処理中...", + "warning": { + "provider": { + "fallback": "{{name}} に戻されました。これにより問題が発生する可能性があります。" + } + } + }, "ollama": { "keep_alive_time": { "description": "モデルがメモリに保持される時間(デフォルト:5分)", @@ -3498,6 +3518,20 @@ }, "title": "設定", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "このプロバイダーは存在しません" + }, + "tesseract": { + "langs": "サポートされている言語", + "temp_tooltip": "現在のところ、中国語と英語のみをサポートしています" + }, + "title": "画像" + }, + "image_provider": "OCRサービスプロバイダー", + "title": "OCRサービス" + }, "preprocess": { "provider": "プレプロセスプロバイダー", "provider_placeholder": "前処理プロバイダーを選択してください", diff --git a/src/renderer/src/i18n/locales/ru-ru.json b/src/renderer/src/i18n/locales/ru-ru.json index 21251f332d..e5a7323bcc 100644 --- a/src/renderer/src/i18n/locales/ru-ru.json +++ b/src/renderer/src/i18n/locales/ru-ru.json @@ -1574,6 +1574,26 @@ }, "tip": "Если ответ успешен, уведомление выдается только по сообщениям, превышающим 30 секунд" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "Не удается удалить встроенного поставщика", + "existing": "Поставщик уже существует", + "not_found": "Поставщик OCR отсутствует", + "update_failed": "Не удалось обновить конфигурацию" + }, + "unknown": "Произошла ошибка в процессе распознавания текста" + }, + "file": { + "not_supported":
"Неподдерживаемый тип файла {{type}}" + }, + "processing": "Обработка OCR...", + "warning": { + "provider": { + "fallback": "Возвращено к {{name}}, это может вызвать проблемы" + } + } + }, "ollama": { "keep_alive_time": { "description": "Время в минутах, в течение которого модель остается активной, по умолчанию 5 минут.", @@ -3498,6 +3518,20 @@ }, "title": "Настройки", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "Поставщик не существует" + }, + "tesseract": { + "langs": "Поддерживаемые языки", + "temp_tooltip": "На данный момент поддерживаются только китайский и английский языки" + }, + "title": "Изображение" + }, + "image_provider": "Поставщик услуг OCR", + "title": "OCR-сервис" + }, "preprocess": { "provider": "Поставщик обработки документов", "provider_placeholder": "Выберите поставщика услуг обработки документов", diff --git a/src/renderer/src/i18n/locales/zh-cn.json b/src/renderer/src/i18n/locales/zh-cn.json index 4307fb1208..4ba42ba646 100644 --- a/src/renderer/src/i18n/locales/zh-cn.json +++ b/src/renderer/src/i18n/locales/zh-cn.json @@ -1574,6 +1574,26 @@ }, "tip": "如果响应成功,则只针对超过30秒的消息进行提醒" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "不能删除内置提供商", + "existing": "提供商已存在", + "not_found": "OCR 提供商不存在", + "update_failed": "更新配置失败" + }, + "unknown": "OCR 过程发生错误" + }, + "file": { + "not_supported": "不支持的文件类型 {{type}}" + }, + "processing": "OCR 处理中...", + "warning": { + "provider": { + "fallback": "已回退到 {{name}},这可能导致问题" + } + } + }, "ollama": { "keep_alive_time": { "description": "对话后模型在内存中保持的时间(默认:5 分钟)", @@ -3498,6 +3518,20 @@ }, "title": "设置", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "该提供商不存在" + }, + "tesseract": { + "langs": "支持的语言", + "temp_tooltip": "目前暂时只支持中文和英文" + }, + "title": "图片" + }, + "image_provider": "OCR 服务提供商", + "title": "OCR 服务" + }, "preprocess": { "provider": "文档处理服务商", "provider_placeholder": "选择一个文档处理服务商", diff --git a/src/renderer/src/i18n/locales/zh-tw.json 
b/src/renderer/src/i18n/locales/zh-tw.json index e9a41e2813..6d25a814b0 100644 --- a/src/renderer/src/i18n/locales/zh-tw.json +++ b/src/renderer/src/i18n/locales/zh-tw.json @@ -1574,6 +1574,26 @@ }, "tip": "如果回應成功,則只針對超過30秒的訊息發出提醒" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "不能刪除內建提供者", + "existing": "提供商已存在", + "not_found": "OCR 提供商不存在", + "update_failed": "更新設定失敗" + }, + "unknown": "OCR 過程發生錯誤" + }, + "file": { + "not_supported": "不支援的檔案類型 {{type}}" + }, + "processing": "OCR 處理中...", + "warning": { + "provider": { + "fallback": "已回退到 {{name}},這可能導致問題" + } + } + }, "ollama": { "keep_alive_time": { "description": "對話後模型在記憶體中保持的時間(預設為 5 分鐘)", @@ -3498,6 +3518,20 @@ }, "title": "設定", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "該提供商不存在" + }, + "tesseract": { + "langs": "支援的語言", + "temp_tooltip": "目前暫時只支援中文和英文" + }, + "title": "圖片" + }, + "image_provider": "OCR 服務提供商", + "title": "OCR 服務" + }, "preprocess": { "provider": "文件處理供應商", "provider_placeholder": "選擇一個文件處理供應商", diff --git a/src/renderer/src/i18n/translate/el-gr.json b/src/renderer/src/i18n/translate/el-gr.json index b0c96f2aa7..43bdc945ac 100644 --- a/src/renderer/src/i18n/translate/el-gr.json +++ b/src/renderer/src/i18n/translate/el-gr.json @@ -1574,6 +1574,26 @@ }, "tip": "Εάν η απάντηση είναι επιτυχής, η ειδοποίηση εμφανίζεται μόνο για μηνύματα που υπερβαίνουν τα 30 δευτερόλεπτα" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "Δεν είναι δυνατή η διαγραφή του ενσωματωμένου παρόχου", + "existing": "Ο πάροχος υπηρεσιών υπάρχει ήδη", + "not_found": "Ο πάροχος OCR δεν υπάρχει", + "update_failed": "Αποτυχία ενημέρωσης της διαμόρφωσης" + }, + "unknown": "Η διαδικασία OCR εμφάνισε σφάλμα" + }, + "file": { + "not_supported": "Μη υποστηριζόμενος τύπος αρχείου {{type}}" + }, + "processing": "Η επεξεργασία OCR βρίσκεται σε εξέλιξη...", + "warning": { + "provider": { + "fallback": "Επαναφέρθηκε στο {{name}}, το οποίο μπορεί να προκαλέσει
προβλήματα" + } + } + }, "ollama": { "keep_alive_time": { "description": "Χρόνος που ο μοντέλος διατηρείται στη μνήμη μετά τη συζήτηση (προεπιλογή: 5 λεπτά)", @@ -3498,6 +3518,20 @@ }, "title": "Ρυθμίσεις", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "Ο πάροχος δεν υπάρχει" + }, + "tesseract": { + "langs": "Υποστηριζόμενες γλώσσες", + "temp_tooltip": "Προς το παρόν υποστηρίζονται μόνο η κινεζική και η αγγλική γλώσσα" + }, + "title": "Εικόνα" + }, + "image_provider": "Πάροχος υπηρεσιών OCR", + "title": "Υπηρεσία OCR" + }, "preprocess": { "provider": "πάροχος υπηρεσιών προεπεξεργασίας εγγράφων", "provider_placeholder": "Επιλέξτε έναν πάροχο υπηρεσιών προεπεξεργασίας εγγράφων", diff --git a/src/renderer/src/i18n/translate/es-es.json b/src/renderer/src/i18n/translate/es-es.json index efd6643820..e0d86e7a37 100644 --- a/src/renderer/src/i18n/translate/es-es.json +++ b/src/renderer/src/i18n/translate/es-es.json @@ -1574,6 +1574,26 @@ }, "tip": "Si la respuesta es exitosa, solo se enviará un recordatorio para mensajes que excedan los 30 segundos" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "No se puede eliminar el proveedor integrado", + "existing": "El proveedor ya existe", + "not_found": "El proveedor de OCR no existe", + "update_failed": "Error al actualizar la configuración" + }, + "unknown": "El proceso OCR ha fallado" + }, + "file": { + "not_supported": "Tipo de archivo no compatible {{type}}" + }, + "processing": "Procesando OCR...", + "warning": { + "provider": { + "fallback": "Se ha revertido a {{name}}, lo que podría causar problemas" + } + } + }, "ollama": { "keep_alive_time": { "description": "Tiempo que el modelo permanece en memoria después de la conversación (por defecto: 5 minutos)", @@ -3498,6 +3518,20 @@ }, "title": "Configuración", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "El proveedor no existe" + }, + "tesseract": { + "langs": "Idiomas compatibles",
"temp_tooltip": "Actualmente solo se admiten chino e inglés." + }, + "title": "Imagen" + }, + "image_provider": "Proveedor de servicios OCR", + "title": "Servicio OCR" + }, "preprocess": { "provider": "Proveedor de servicios de preprocesamiento de documentos", "provider_placeholder": "Seleccionar un proveedor de servicios de preprocesamiento de documentos", diff --git a/src/renderer/src/i18n/translate/fr-fr.json b/src/renderer/src/i18n/translate/fr-fr.json index 37008d5c4f..646e2b28a4 100644 --- a/src/renderer/src/i18n/translate/fr-fr.json +++ b/src/renderer/src/i18n/translate/fr-fr.json @@ -1574,6 +1574,26 @@ }, "tip": "Si la réponse est réussie, un rappel est envoyé uniquement pour les messages dépassant 30 secondes" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "Impossible de supprimer le fournisseur intégré", + "existing": "Le fournisseur existe déjà", + "not_found": "Le fournisseur OCR n'existe pas", + "update_failed": "Échec de la mise à jour de la configuration" + }, + "unknown": "Une erreur s'est produite lors du processus OCR" + }, + "file": { + "not_supported": "Type de fichier non pris en charge {{type}}" + }, + "processing": "Traitement OCR en cours...", + "warning": { + "provider": { + "fallback": "Revenu à {{name}}, ce qui pourrait entraîner des problèmes" + } + } + }, "ollama": { "keep_alive_time": { "description": "Le temps pendant lequel le modèle reste en mémoire après la conversation (par défaut : 5 minutes)", @@ -3498,6 +3518,20 @@ }, "title": "Paramètres", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "Ce fournisseur n'existe pas" + }, + "tesseract": { + "langs": "Langues prises en charge", + "temp_tooltip": "Pour le moment, seuls le chinois et l'anglais sont pris en charge." 
+ }, + "title": "Image" + }, + "image_provider": "Fournisseur de service OCR", + "title": "Service OCR" + }, "preprocess": { "provider": "fournisseur de services de prétraitement de documents", "provider_placeholder": "Choisissez un prestataire de traitement de documents", diff --git a/src/renderer/src/i18n/translate/pt-pt.json b/src/renderer/src/i18n/translate/pt-pt.json index f245c4a463..e1828408f0 100644 --- a/src/renderer/src/i18n/translate/pt-pt.json +++ b/src/renderer/src/i18n/translate/pt-pt.json @@ -1574,6 +1574,26 @@ }, "tip": "Se a resposta for bem-sucedida, lembrete apenas para mensagens que excedam 30 segundos" }, + "ocr": { + "error": { + "provider": { + "cannot_remove_builtin": "Não é possível excluir o provedor integrado", + "existing": "O provedor já existe", + "not_found": "O provedor OCR não existe", + "update_failed": "Falha ao atualizar a configuração" + }, + "unknown": "O processo OCR apresentou um erro" + }, + "file": { + "not_supported": "Tipo de arquivo não suportado {{type}}" + }, + "processing": "Processamento OCR em andamento...", + "warning": { + "provider": { + "fallback": "Revertido para {{name}}, o que pode causar problemas" + } + } + }, "ollama": { "keep_alive_time": { "description": "Tempo que o modelo permanece na memória após a conversa (padrão: 5 minutos)", @@ -3498,6 +3518,20 @@ }, "title": "Configurações", "tool": { + "ocr": { + "image": { + "error": { + "provider_not_found": "O provedor não existe" + }, + "tesseract": { + "langs": "Idiomas suportados", + "temp_tooltip": "No momento, apenas chinês e inglês são suportados." 
+ }, + "title": "Imagem" + }, + "image_provider": "Provedor de serviços OCR", + "title": "Serviço OCR" + }, "preprocess": { "provider": "prestador de serviços de pré-processamento de documentos", "provider_placeholder": "Escolha um fornecedor de pré-processamento de documentos", diff --git a/src/renderer/src/pages/settings/DocProcessSettings/OcrImageSettings.tsx b/src/renderer/src/pages/settings/DocProcessSettings/OcrImageSettings.tsx new file mode 100644 index 0000000000..3efdf94fa0 --- /dev/null +++ b/src/renderer/src/pages/settings/DocProcessSettings/OcrImageSettings.tsx @@ -0,0 +1,62 @@ +import { loggerService } from '@logger' +import { useAppSelector } from '@renderer/store' +import { setImageOcrProvider } from '@renderer/store/ocr' +import { isImageOcrProvider, OcrProvider } from '@renderer/types' +import { Select } from 'antd' +import { useEffect } from 'react' +import { useTranslation } from 'react-i18next' +import { useDispatch } from 'react-redux' + +import { SettingRow, SettingRowTitle } from '..' 
+ +const logger = loggerService.withContext('OcrImageSettings') + +type Props = { + setProvider: (provider: OcrProvider) => void +} + +const OcrImageSettings = ({ setProvider }: Props) => { + const { t } = useTranslation() + const providers = useAppSelector((state) => state.ocr.providers) + const imageProvider = useAppSelector((state) => state.ocr.imageProvider) + const imageProviders = providers.filter((p) => isImageOcrProvider(p)) + const dispatch = useDispatch() + + // Update the external state on mount and whenever imageProvider changes + useEffect(() => { + setProvider(imageProvider) + }, [imageProvider, setProvider]) + + const updateImageProvider = (id: string) => { + const provider = imageProviders.find((p) => p.id === id) + if (!provider) { + logger.error(`Failed to find image provider by id: ${id}`) + window.message.error(t('settings.tool.ocr.image.error.provider_not_found')) + return + } + + setProvider(provider) + dispatch(setImageOcrProvider(provider)) + } + + return ( + <> + + {t('settings.tool.ocr.image_provider')} +

+ +
+ + + ) +} diff --git a/src/renderer/src/pages/settings/PreprocessSettings/PreprocessSettings.tsx b/src/renderer/src/pages/settings/DocProcessSettings/PreprocessProviderSettings.tsx similarity index 100% rename from src/renderer/src/pages/settings/PreprocessSettings/PreprocessSettings.tsx rename to src/renderer/src/pages/settings/DocProcessSettings/PreprocessProviderSettings.tsx diff --git a/src/renderer/src/pages/settings/PreprocessSettings/index.tsx b/src/renderer/src/pages/settings/DocProcessSettings/PreprocessSettings.tsx similarity index 90% rename from src/renderer/src/pages/settings/PreprocessSettings/index.tsx rename to src/renderer/src/pages/settings/DocProcessSettings/PreprocessSettings.tsx index f80c0cd679..a09265a637 100644 --- a/src/renderer/src/pages/settings/PreprocessSettings/index.tsx +++ b/src/renderer/src/pages/settings/DocProcessSettings/PreprocessSettings.tsx @@ -5,8 +5,8 @@ import { Select } from 'antd' import { FC, useState } from 'react' import { useTranslation } from 'react-i18next' -import { SettingContainer, SettingDivider, SettingGroup, SettingRow, SettingRowTitle, SettingTitle } from '..' -import PreprocessProviderSettings from './PreprocessSettings' +import { SettingDivider, SettingGroup, SettingRow, SettingRowTitle, SettingTitle } from '..' 
+import PreprocessProviderSettings from './PreprocessProviderSettings' const PreprocessSettings: FC = () => { const { preprocessProviders } = usePreprocessProviders() @@ -25,7 +25,7 @@ const PreprocessSettings: FC = () => { } return ( - + <> {t('settings.tool.preprocess.title')} @@ -52,7 +52,7 @@ const PreprocessSettings: FC = () => { )} - + ) } export default PreprocessSettings diff --git a/src/renderer/src/pages/settings/DocProcessSettings/index.tsx b/src/renderer/src/pages/settings/DocProcessSettings/index.tsx new file mode 100644 index 0000000000..526f507fff --- /dev/null +++ b/src/renderer/src/pages/settings/DocProcessSettings/index.tsx @@ -0,0 +1,18 @@ +import { useTheme } from '@renderer/context/ThemeProvider' +import { FC } from 'react' + +import { SettingContainer } from '..' +import OcrSettings from './OcrSettings' +import PreprocessSettings from './PreprocessSettings' + +const DocProcessSettings: FC = () => { + const { theme: themeMode } = useTheme() + + return ( + + + + + ) +} +export default DocProcessSettings diff --git a/src/renderer/src/pages/settings/SettingsPage.tsx b/src/renderer/src/pages/settings/SettingsPage.tsx index 3a72865d63..b8666a8f7d 100644 --- a/src/renderer/src/pages/settings/SettingsPage.tsx +++ b/src/renderer/src/pages/settings/SettingsPage.tsx @@ -26,10 +26,10 @@ import styled from 'styled-components' import AboutSettings from './AboutSettings' import DataSettings from './DataSettings/DataSettings' import DisplaySettings from './DisplaySettings/DisplaySettings' +import DocProcessSettings from './DocProcessSettings' import GeneralSettings from './GeneralSettings' import MCPSettings from './MCPSettings' import MemorySettings from './MemorySettings' -import PreprocessSettings from './PreprocessSettings' import ProvidersList from './ProviderSettings' import QuickAssistantSettings from './QuickAssistantSettings' import QuickPhraseSettings from './QuickPhraseSettings' @@ -100,8 +100,8 @@ const SettingsPage: FC = () => { 
{t('memory.title')} - - + + {t('settings.tool.preprocess.title')} @@ -144,7 +144,7 @@ const SettingsPage: FC = () => { } /> } /> } /> - } /> + } /> } /> } /> } /> diff --git a/src/renderer/src/services/ocr/OcrService.ts b/src/renderer/src/services/ocr/OcrService.ts new file mode 100644 index 0000000000..3d8339f6e3 --- /dev/null +++ b/src/renderer/src/services/ocr/OcrService.ts @@ -0,0 +1,23 @@ +import { loggerService } from '@logger' +import { isOcrApiProvider, OcrProvider, OcrResult, SupportedOcrFile } from '@renderer/types' + +import { OcrApiClientFactory } from './clients/OcrApiClientFactory' + +const logger = loggerService.withContext('renderer:OcrService') + +/** + * ocr a file + * @param file any supported file + * @param provider ocr provider + * @returns ocr result + * @throws {Error} + */ +export const ocr = async (file: SupportedOcrFile, provider: OcrProvider): Promise => { + logger.info(`ocr file ${file.path}`) + if (isOcrApiProvider(provider)) { + const client = OcrApiClientFactory.create(provider) + return client.ocr(file) + } else { + return window.api.ocr.ocr(file, provider) + } +} diff --git a/src/renderer/src/services/ocr/clients/OcrApiClientFactory.ts b/src/renderer/src/services/ocr/clients/OcrApiClientFactory.ts new file mode 100644 index 0000000000..e685c0e3f9 --- /dev/null +++ b/src/renderer/src/services/ocr/clients/OcrApiClientFactory.ts @@ -0,0 +1,28 @@ +import { loggerService } from '@logger' +import { OcrApiProvider } from '@renderer/types' + +import { OcrBaseApiClient } from './OcrBaseApiClient' +import { OcrExampleApiClient } from './OcrExampleApiClient' + +const logger = loggerService.withContext('OcrApiClientFactory') + +export class OcrApiClientFactory { + /** + * Create an ApiClient instance for the given provider + * 为给定的提供者创建ApiClient实例 + */ + static create(provider: OcrApiProvider): OcrBaseApiClient { + logger.debug(`Creating ApiClient for provider:`, { + id: provider.id, + config: provider.config + }) + + let instance: 
OcrBaseApiClient + + // Extend other clients here + // eslint-disable-next-line prefer-const + instance = new OcrExampleApiClient(provider) + + return instance + } +} diff --git a/src/renderer/src/services/ocr/clients/OcrBaseApiClient.ts b/src/renderer/src/services/ocr/clients/OcrBaseApiClient.ts new file mode 100644 index 0000000000..c9605671ae --- /dev/null +++ b/src/renderer/src/services/ocr/clients/OcrBaseApiClient.ts @@ -0,0 +1,43 @@ +import { OcrApiProvider, OcrHandler } from '@renderer/types' + +export abstract class OcrBaseApiClient { + public provider: OcrApiProvider + protected host: string + protected apiKey: string + + constructor(provider: OcrApiProvider) { + this.provider = provider + this.host = this.getHost() + this.apiKey = this.getApiKey() + } + + abstract ocr: OcrHandler + + // copy from BaseApiClient + public getHost(): string { + return this.provider.config.api.apiHost + } + + // copy from BaseApiClient + public getApiKey() { + const keys = this.provider.config.api.apiKey.split(',').map((key) => key.trim()) + const keyName = `ocr_provider:${this.provider.id}:last_used_key` + + if (keys.length === 1) { + return keys[0] + } + + const lastUsedKey = window.keyv.get(keyName) + if (!lastUsedKey) { + window.keyv.set(keyName, keys[0]) + return keys[0] + } + + const currentIndex = keys.indexOf(lastUsedKey) + const nextIndex = (currentIndex + 1) % keys.length + const nextKey = keys[nextIndex] + window.keyv.set(keyName, nextKey) + + return nextKey + } +} diff --git a/src/renderer/src/services/ocr/clients/OcrExampleApiClient.ts b/src/renderer/src/services/ocr/clients/OcrExampleApiClient.ts new file mode 100644 index 0000000000..34d28173bb --- /dev/null +++ b/src/renderer/src/services/ocr/clients/OcrExampleApiClient.ts @@ -0,0 +1,15 @@ +import { OcrApiProvider, SupportedOcrFile } from '@renderer/types' + +import { OcrBaseApiClient } from './OcrBaseApiClient' + +export type OcrExampleProvider = OcrApiProvider + +export class OcrExampleApiClient extends 
OcrBaseApiClient { + constructor(provider: OcrApiProvider) { + super(provider) + } + + public ocr = async (file: SupportedOcrFile) => { + return { text: `Example output: ${file.path}` } + } +} diff --git a/src/renderer/src/store/index.ts b/src/renderer/src/store/index.ts index d90ee7282d..cdd3be560d 100644 --- a/src/renderer/src/store/index.ts +++ b/src/renderer/src/store/index.ts @@ -20,6 +20,7 @@ import migrate from './migrate' import minapps from './minapps' import newMessagesReducer from './newMessage' import nutstore from './nutstore' +import ocr from './ocr' import paintings from './paintings' import preprocess from './preprocess' import runtime from './runtime' @@ -55,14 +56,15 @@ const rootReducer = combineReducers({ messages: newMessagesReducer, messageBlocks: messageBlocksReducer, inputTools: inputToolsReducer, - translate + translate, + ocr }) const persistedReducer = persistReducer( { key: 'cherry-studio', storage, - version: 136, + version: 137, blacklist: ['runtime', 'messages', 'messageBlocks', 'tabs'], migrate }, diff --git a/src/renderer/src/store/migrate.ts b/src/renderer/src/store/migrate.ts index ee677ead39..f1f170eafb 100644 --- a/src/renderer/src/store/migrate.ts +++ b/src/renderer/src/store/migrate.ts @@ -3,6 +3,7 @@ import { nanoid } from '@reduxjs/toolkit' import { DEFAULT_CONTEXTCOUNT, DEFAULT_TEMPERATURE, isMac } from '@renderer/config/constant' import { DEFAULT_MIN_APPS } from '@renderer/config/minapps' import { isFunctionCallingModel, isNotSupportedTextDelta, SYSTEM_MODELS } from '@renderer/config/models' +import { BUILTIN_OCR_PROVIDERS, DEFAULT_OCR_PROVIDER } from '@renderer/config/ocr' import { TRANSLATE_PROMPT } from '@renderer/config/prompts' import { isSupportArrayContentProvider, @@ -2174,6 +2175,18 @@ const migrateConfig = { logger.error('migrate 136 error', error as Error) return state } + }, + '137': (state: RootState) => { + try { + state.ocr = { + providers: BUILTIN_OCR_PROVIDERS, + imageProvider: DEFAULT_OCR_PROVIDER.image + 
} + return state + } catch (error) { + logger.error('migrate 137 error', error as Error) + return state + } } } diff --git a/src/renderer/src/store/ocr.ts b/src/renderer/src/store/ocr.ts new file mode 100644 index 0000000000..7e4ba3d348 --- /dev/null +++ b/src/renderer/src/store/ocr.ts @@ -0,0 +1,61 @@ +import { createSlice, PayloadAction } from '@reduxjs/toolkit' +import { BUILTIN_OCR_PROVIDERS, DEFAULT_OCR_PROVIDER } from '@renderer/config/ocr' +import { ImageOcrProvider, OcrProvider, OcrProviderConfig } from '@renderer/types' + +export interface OcrState { + providers: OcrProvider[] + imageProvider: ImageOcrProvider +} + +const initialState: OcrState = { + providers: BUILTIN_OCR_PROVIDERS, + imageProvider: DEFAULT_OCR_PROVIDER.image +} + +const ocrSlice = createSlice({ + name: 'ocr', + initialState, + reducers: { + setOcrProviders(state, action: PayloadAction) { + state.providers = action.payload + }, + addOcrProvider(state, action: PayloadAction) { + state.providers.push(action.payload) + }, + removeOcrProvider(state, action: PayloadAction) { + state.providers = state.providers.filter((provider) => provider.id !== action.payload) + }, + updateOcrProvider(state, action: PayloadAction>) { + const index = state.providers.findIndex((provider) => provider.id === action.payload.id) + if (index !== -1) { + Object.assign(state.providers[index], action.payload) + } + }, + updateOcrProviderConfig( + state, + action: PayloadAction<{ id: string; update: Omit, 'id'> }> + ) { + const index = state.providers.findIndex((provider) => provider.id === action.payload.id) + if (index !== -1) { + if (!state.providers[index].config) { + state.providers[index].config = {} + } + Object.assign(state.providers[index].config, action.payload.update) + } + }, + setImageOcrProvider(state, action: PayloadAction) { + state.imageProvider = action.payload + } + } +}) + +export const { + setOcrProviders, + addOcrProvider, + removeOcrProvider, + updateOcrProvider, + updateOcrProviderConfig, + 
setImageOcrProvider +} = ocrSlice.actions + +export default ocrSlice.reducer diff --git a/src/renderer/src/types/file.ts b/src/renderer/src/types/file.ts index db998c60d6..db5c51e5b3 100644 --- a/src/renderer/src/types/file.ts +++ b/src/renderer/src/types/file.ts @@ -100,3 +100,16 @@ export enum FileTypes { DOCUMENT = 'document', OTHER = 'other' } + +export type ImageFileMetadata = FileMetadata & { + type: FileTypes.IMAGE +} + +/** + * Type guard that checks whether a FileMetadata describes an image file + * @param file - the file metadata to check + * @returns true if the file is an image + */ +export const isImageFile = (file: FileMetadata): file is ImageFileMetadata => { + return file.type === FileTypes.IMAGE +} diff --git a/src/renderer/src/types/index.ts b/src/renderer/src/types/index.ts index edb81bd969..ee35d7202f 100644 --- a/src/renderer/src/types/index.ts +++ b/src/renderer/src/types/index.ts @@ -9,6 +9,8 @@ export * from './file' import type { FileMetadata } from './file' import type { Message } from './newMessage' +export * from './ocr' + export type Assistant = { id: string name: string diff --git a/src/renderer/src/types/ocr.ts b/src/renderer/src/types/ocr.ts new file mode 100644 index 0000000000..c537191318 --- /dev/null +++ b/src/renderer/src/types/ocr.ts @@ -0,0 +1,142 @@ +import Tesseract from 'tesseract.js' + +import { FileMetadata, ImageFileMetadata, isImageFile } from '.'
+ +export const BuiltinOcrProviderIds = { + tesseract: 'tesseract' +} as const + +export type BuiltinOcrProviderId = keyof typeof BuiltinOcrProviderIds + +export const isBuiltinOcrProviderId = (id: string): id is BuiltinOcrProviderId => { + return Object.hasOwn(BuiltinOcrProviderIds, id) +} + +// extensible +export const OcrProviderCapabilities = { + image: 'image' +} as const + +export type OcrProviderCapability = keyof typeof OcrProviderCapabilities + +export const isOcrProviderCapability = (cap: string): cap is OcrProviderCapability => { + return Object.hasOwn(OcrProviderCapabilities, cap) +} + +export type OcrProviderCapabilityRecord = Partial> + +// OCR models and providers share the same type definition. +// A provider can offer capabilities to process multiple file types, +// while a model belonging to that provider may be limited to processing only one specific file type. +export type OcrModelCapabilityRecord = OcrProviderCapabilityRecord + +export interface OcrModel { + id: string + name: string + providerId: string + capabilities: OcrModelCapabilityRecord +} + +/** + * Extend this type to define provider-specific config types. + */ +export type OcrProviderApiConfig = { + apiKey: string + apiHost: string + apiVersion?: string +} + +export const isOcrProviderApiConfig = (config: unknown): config is OcrProviderApiConfig => { + return ( + typeof config === 'object' && + config !== null && + 'apiKey' in config && + typeof config.apiKey === 'string' && + 'apiHost' in config && + typeof config.apiHost === 'string' && + (!('apiVersion' in config) || typeof config.apiVersion === 'string') + ) +} + +/** + * For future use: model-based OCR and API-based OCR may need different API clients. + * + * Extend this type to define provider-specific config types. + */ +export type OcrProviderConfig = { + /** Not used for now. Could safely remove. */ + api?: OcrProviderApiConfig + /** Not used for now. Could safely remove. */ + models?: OcrModel[] + /** Not used for now.
Could safely remove. */ + enabled?: boolean +} + +export type OcrProvider = { + id: string + name: string + capabilities: OcrProviderCapabilityRecord + config?: OcrProviderConfig +} + +export type OcrApiProvider = OcrProvider & { + config: OcrProviderConfig & { + api: OcrProviderApiConfig + } +} + +export const isOcrApiProvider = (p: OcrProvider): p is OcrApiProvider => { + return !!(p.config && p.config.api && isOcrProviderApiConfig(p.config.api)) +} + +export type BuiltinOcrProvider = OcrProvider & { + id: BuiltinOcrProviderId +} + +export const isBuiltinOcrProvider = (p: OcrProvider): p is BuiltinOcrProvider => { + return isBuiltinOcrProviderId(p.id) +} + +// Not sure a compatible API endpoint exists; custom OCR providers may not be supported +export type CustomOcrProvider = OcrProvider & { + id: Exclude +} + +export type ImageOcrProvider = OcrProvider & { + capabilities: OcrProviderCapabilityRecord & { + [OcrProviderCapabilities.image]: true + } +} + +export const isImageOcrProvider = (p: OcrProvider): p is ImageOcrProvider => { + return p.capabilities.image === true +} + +export type SupportedOcrFile = ImageFileMetadata + +export const isSupportedOcrFile = (file: FileMetadata): file is SupportedOcrFile => { + return isImageFile(file) +} + +export type OcrResult = { + text: string +} + +export type OcrHandler = (file: SupportedOcrFile) => Promise + +export type OcrImageHandler = (file: ImageFileMetadata) => Promise + +// Tesseract Types +export type OcrTesseractConfig = OcrProviderConfig & { + langs: Partial> +} + +export type OcrTesseractProvider = BuiltinOcrProvider & { + config: OcrTesseractConfig +} + +export const isOcrTesseractProvider = (p: OcrProvider): p is OcrTesseractProvider => { + return p.id === BuiltinOcrProviderIds.tesseract +} + +export type TesseractLangCode = Tesseract.LanguageCode diff --git a/src/renderer/src/utils/ocr.ts b/src/renderer/src/utils/ocr.ts new file mode 100644 index 0000000000..1c4e6628d3 --- /dev/null +++
b/src/renderer/src/utils/ocr.ts @@ -0,0 +1,12 @@ +import TesseractLogo from '@renderer/assets/images/providers/Tesseract.js.png' +import { isBuiltinOcrProviderId } from '@renderer/types' + +export function getOcrProviderLogo(providerId: string) { + if (isBuiltinOcrProviderId(providerId)) { + switch (providerId) { + case 'tesseract': + return TesseractLogo + } + } + return undefined +} diff --git a/yarn.lock b/yarn.lock index 6c1b8a4bb7..db442ef6f1 100644 --- a/yarn.lock +++ b/yarn.lock @@ -2953,7 +2953,7 @@ __metadata: languageName: node linkType: hard -"@emnapi/runtime@npm:^1.4.5": +"@emnapi/runtime@npm:^1.4.4, @emnapi/runtime@npm:^1.4.5": version: 1.4.5 resolution: "@emnapi/runtime@npm:1.4.5" dependencies: @@ -3524,6 +3524,207 @@ __metadata: languageName: node linkType: hard +"@img/sharp-darwin-arm64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-darwin-arm64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-darwin-arm64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-darwin-arm64": + optional: true + conditions: os=darwin & cpu=arm64 + languageName: node + linkType: hard + +"@img/sharp-darwin-x64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-darwin-x64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-darwin-x64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-darwin-x64": + optional: true + conditions: os=darwin & cpu=x64 + languageName: node + linkType: hard + +"@img/sharp-libvips-darwin-arm64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-darwin-arm64@npm:1.2.0" + conditions: os=darwin & cpu=arm64 + languageName: node + linkType: hard + +"@img/sharp-libvips-darwin-x64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-darwin-x64@npm:1.2.0" + conditions: os=darwin & cpu=x64 + languageName: node + linkType: hard + +"@img/sharp-libvips-linux-arm64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linux-arm64@npm:1.2.0" + conditions: os=linux & cpu=arm64 & libc=glibc + languageName: node 
+ linkType: hard + +"@img/sharp-libvips-linux-arm@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linux-arm@npm:1.2.0" + conditions: os=linux & cpu=arm & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-libvips-linux-ppc64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linux-ppc64@npm:1.2.0" + conditions: os=linux & cpu=ppc64 & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-libvips-linux-s390x@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linux-s390x@npm:1.2.0" + conditions: os=linux & cpu=s390x & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-libvips-linux-x64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linux-x64@npm:1.2.0" + conditions: os=linux & cpu=x64 & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-libvips-linuxmusl-arm64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linuxmusl-arm64@npm:1.2.0" + conditions: os=linux & cpu=arm64 & libc=musl + languageName: node + linkType: hard + +"@img/sharp-libvips-linuxmusl-x64@npm:1.2.0": + version: 1.2.0 + resolution: "@img/sharp-libvips-linuxmusl-x64@npm:1.2.0" + conditions: os=linux & cpu=x64 & libc=musl + languageName: node + linkType: hard + +"@img/sharp-linux-arm64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-linux-arm64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linux-arm64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linux-arm64": + optional: true + conditions: os=linux & cpu=arm64 & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-linux-arm@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-linux-arm@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linux-arm": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linux-arm": + optional: true + conditions: os=linux & cpu=arm & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-linux-ppc64@npm:0.34.3": + version: 0.34.3 + resolution: 
"@img/sharp-linux-ppc64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linux-ppc64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linux-ppc64": + optional: true + conditions: os=linux & cpu=ppc64 & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-linux-s390x@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-linux-s390x@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linux-s390x": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linux-s390x": + optional: true + conditions: os=linux & cpu=s390x & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-linux-x64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-linux-x64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linux-x64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linux-x64": + optional: true + conditions: os=linux & cpu=x64 & libc=glibc + languageName: node + linkType: hard + +"@img/sharp-linuxmusl-arm64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-linuxmusl-arm64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linuxmusl-arm64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linuxmusl-arm64": + optional: true + conditions: os=linux & cpu=arm64 & libc=musl + languageName: node + linkType: hard + +"@img/sharp-linuxmusl-x64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-linuxmusl-x64@npm:0.34.3" + dependencies: + "@img/sharp-libvips-linuxmusl-x64": "npm:1.2.0" + dependenciesMeta: + "@img/sharp-libvips-linuxmusl-x64": + optional: true + conditions: os=linux & cpu=x64 & libc=musl + languageName: node + linkType: hard + +"@img/sharp-wasm32@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-wasm32@npm:0.34.3" + dependencies: + "@emnapi/runtime": "npm:^1.4.4" + conditions: cpu=wasm32 + languageName: node + linkType: hard + +"@img/sharp-win32-arm64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-win32-arm64@npm:0.34.3" + conditions: os=win32 & cpu=arm64 + languageName: node + linkType: hard + 
+"@img/sharp-win32-ia32@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-win32-ia32@npm:0.34.3" + conditions: os=win32 & cpu=ia32 + languageName: node + linkType: hard + +"@img/sharp-win32-x64@npm:0.34.3": + version: 0.34.3 + resolution: "@img/sharp-win32-x64@npm:0.34.3" + conditions: os=win32 & cpu=x64 + languageName: node + linkType: hard + "@isaacs/cliui@npm:^8.0.2": version: 8.0.2 resolution: "@isaacs/cliui@npm:8.0.2" @@ -8631,11 +8832,13 @@ __metadata: rollup-plugin-visualizer: "npm:^5.12.0" sass: "npm:^1.88.0" selection-hook: "npm:^1.0.11" + sharp: "npm:^0.34.3" shiki: "npm:^3.9.1" strict-url-sanitise: "npm:^0.0.1" string-width: "npm:^7.2.0" styled-components: "npm:^6.1.11" tar: "npm:^7.4.3" + tesseract.js: "patch:tesseract.js@npm%3A6.0.1#~/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch" tiny-pinyin: "npm:^1.3.2" tokenx: "npm:^1.1.0" tsx: "npm:^4.20.3" @@ -9371,6 +9574,13 @@ __metadata: languageName: node linkType: hard +"bmp-js@npm:^0.1.0": + version: 0.1.0 + resolution: "bmp-js@npm:0.1.0" + checksum: 10c0/c651bd5936dcf8d67900050fac14dcbe30baf87c3d21c58f4934fcdf46172e152a87d8c0c3ca25caa2b4b2c7780ef3b5fcc6cd20afd8f0351856cadb1bef9694 + languageName: node + linkType: hard + "body-parser@npm:^2.2.0": version: 2.2.0 resolution: "body-parser@npm:2.2.0" @@ -10139,7 +10349,7 @@ __metadata: languageName: node linkType: hard -"color-string@npm:^1.6.0": +"color-string@npm:^1.6.0, color-string@npm:^1.9.0": version: 1.9.1 resolution: "color-string@npm:1.9.1" dependencies: @@ -10168,6 +10378,16 @@ __metadata: languageName: node linkType: hard +"color@npm:^4.2.3": + version: 4.2.3 + resolution: "color@npm:4.2.3" + dependencies: + color-convert: "npm:^2.0.1" + color-string: "npm:^1.9.0" + checksum: 10c0/7fbe7cfb811054c808349de19fb380252e5e34e61d7d168ec3353e9e9aacb1802674bddc657682e4e9730c2786592a4de6f8283e7e0d3870b829bb0b7b2f6118 + languageName: node + linkType: hard + "color@npm:^5.0.0": version: 5.0.0 resolution: "color@npm:5.0.0" @@ -11280,7 +11500,7 @@ 
__metadata: languageName: node linkType: hard -"detect-libc@npm:^2.0.3": +"detect-libc@npm:^2.0.3, detect-libc@npm:^2.0.4": version: 2.0.4 resolution: "detect-libc@npm:2.0.4" checksum: 10c0/c15541f836eba4b1f521e4eecc28eefefdbc10a94d3b8cb4c507689f332cc111babb95deda66f2de050b22122113189986d5190be97d51b5a2b23b938415e67c @@ -14050,6 +14270,13 @@ __metadata: languageName: node linkType: hard +"idb-keyval@npm:^6.2.0": + version: 6.2.2 + resolution: "idb-keyval@npm:6.2.2" + checksum: 10c0/b52f0d2937cc2ec9f1da536b0b5c0875af3043ca210714beaffead4ec1f44f2ad322220305fd024596203855224d9e3523aed83e971dfb62ddc21b5b1721aeef + languageName: node + linkType: hard + "ieee754@npm:^1.1.13, ieee754@npm:^1.2.1": version: 1.2.1 resolution: "ieee754@npm:1.2.1" @@ -14441,6 +14668,13 @@ __metadata: languageName: node linkType: hard +"is-url@npm:^1.2.4": + version: 1.2.4 + resolution: "is-url@npm:1.2.4" + checksum: 10c0/0157a79874f8f95fdd63540e3f38c8583c2ef572661cd0693cda80ae3e42dfe8e9a4a972ec1b827f861d9a9acf75b37f7d58a37f94a8a053259642912c252bc3 + languageName: node + linkType: hard + "is-wsl@npm:^2.2.0": version: 2.2.0 resolution: "is-wsl@npm:2.2.0" @@ -17550,6 +17784,15 @@ __metadata: languageName: node linkType: hard +"opencollective-postinstall@npm:^2.0.3": + version: 2.0.3 + resolution: "opencollective-postinstall@npm:2.0.3" + bin: + opencollective-postinstall: index.js + checksum: 10c0/8a0104a218bc1afaae943f0af378461eeb2836f9848bad872bbd067ec5d1d9791636f307454ab77d0746f10341366f295384656a340ebdb87a2585058e8567e5 + languageName: node + linkType: hard + "option@npm:~0.2.1": version: 0.2.4 resolution: "option@npm:0.2.4" @@ -19454,6 +19697,13 @@ __metadata: languageName: node linkType: hard +"regenerator-runtime@npm:^0.13.3": + version: 0.13.11 + resolution: "regenerator-runtime@npm:0.13.11" + checksum: 10c0/12b069dc774001fbb0014f6a28f11c09ebfe3c0d984d88c9bced77fdb6fedbacbca434d24da9ae9371bfbf23f754869307fb51a4c98a8b8b18e5ef748677ca24 + languageName: node + linkType: hard + 
"regex-recursion@npm:^6.0.2": version: 6.0.2 resolution: "regex-recursion@npm:6.0.2" @@ -20145,6 +20395,15 @@ __metadata: languageName: node linkType: hard +"semver@npm:^7.7.2": + version: 7.7.2 + resolution: "semver@npm:7.7.2" + bin: + semver: bin/semver.js + checksum: 10c0/aca305edfbf2383c22571cb7714f48cadc7ac95371b4b52362fb8eeffdfbc0de0669368b82b2b15978f8848f01d7114da65697e56cd8c37b0dab8c58e543f9ea + languageName: node + linkType: hard + "send@npm:^1.1.0, send@npm:^1.2.0": version: 1.2.0 resolution: "send@npm:1.2.0" @@ -20213,6 +20472,84 @@ __metadata: languageName: node linkType: hard +"sharp@npm:^0.34.3": + version: 0.34.3 + resolution: "sharp@npm:0.34.3" + dependencies: + "@img/sharp-darwin-arm64": "npm:0.34.3" + "@img/sharp-darwin-x64": "npm:0.34.3" + "@img/sharp-libvips-darwin-arm64": "npm:1.2.0" + "@img/sharp-libvips-darwin-x64": "npm:1.2.0" + "@img/sharp-libvips-linux-arm": "npm:1.2.0" + "@img/sharp-libvips-linux-arm64": "npm:1.2.0" + "@img/sharp-libvips-linux-ppc64": "npm:1.2.0" + "@img/sharp-libvips-linux-s390x": "npm:1.2.0" + "@img/sharp-libvips-linux-x64": "npm:1.2.0" + "@img/sharp-libvips-linuxmusl-arm64": "npm:1.2.0" + "@img/sharp-libvips-linuxmusl-x64": "npm:1.2.0" + "@img/sharp-linux-arm": "npm:0.34.3" + "@img/sharp-linux-arm64": "npm:0.34.3" + "@img/sharp-linux-ppc64": "npm:0.34.3" + "@img/sharp-linux-s390x": "npm:0.34.3" + "@img/sharp-linux-x64": "npm:0.34.3" + "@img/sharp-linuxmusl-arm64": "npm:0.34.3" + "@img/sharp-linuxmusl-x64": "npm:0.34.3" + "@img/sharp-wasm32": "npm:0.34.3" + "@img/sharp-win32-arm64": "npm:0.34.3" + "@img/sharp-win32-ia32": "npm:0.34.3" + "@img/sharp-win32-x64": "npm:0.34.3" + color: "npm:^4.2.3" + detect-libc: "npm:^2.0.4" + semver: "npm:^7.7.2" + dependenciesMeta: + "@img/sharp-darwin-arm64": + optional: true + "@img/sharp-darwin-x64": + optional: true + "@img/sharp-libvips-darwin-arm64": + optional: true + "@img/sharp-libvips-darwin-x64": + optional: true + "@img/sharp-libvips-linux-arm": + optional: true + 
"@img/sharp-libvips-linux-arm64": + optional: true + "@img/sharp-libvips-linux-ppc64": + optional: true + "@img/sharp-libvips-linux-s390x": + optional: true + "@img/sharp-libvips-linux-x64": + optional: true + "@img/sharp-libvips-linuxmusl-arm64": + optional: true + "@img/sharp-libvips-linuxmusl-x64": + optional: true + "@img/sharp-linux-arm": + optional: true + "@img/sharp-linux-arm64": + optional: true + "@img/sharp-linux-ppc64": + optional: true + "@img/sharp-linux-s390x": + optional: true + "@img/sharp-linux-x64": + optional: true + "@img/sharp-linuxmusl-arm64": + optional: true + "@img/sharp-linuxmusl-x64": + optional: true + "@img/sharp-wasm32": + optional: true + "@img/sharp-win32-arm64": + optional: true + "@img/sharp-win32-ia32": + optional: true + "@img/sharp-win32-x64": + optional: true + checksum: 10c0/df9e6645e3db6ed298a0ac956ba74e468c367fc038b547936fbdddc6a29fce9af40413acbef73b3716291530760f311a20e45c8983f20ee5ea69dd2f21464a2b + languageName: node + linkType: hard + "shebang-command@npm:^2.0.0": version: 2.0.0 resolution: "shebang-command@npm:2.0.0" @@ -21001,6 +21338,47 @@ __metadata: languageName: node linkType: hard +"tesseract.js-core@npm:^6.0.0": + version: 6.0.0 + resolution: "tesseract.js-core@npm:6.0.0" + checksum: 10c0/c04be8bbaa296be658664496754f21e857bdffff84113f08adf02f03a1f84596d68b3542ed2fda4a6dc138abb84b09b30ab07c04ee5950879e780876d343955f + languageName: node + linkType: hard + +"tesseract.js@npm:6.0.1": + version: 6.0.1 + resolution: "tesseract.js@npm:6.0.1" + dependencies: + bmp-js: "npm:^0.1.0" + idb-keyval: "npm:^6.2.0" + is-url: "npm:^1.2.4" + node-fetch: "npm:^2.6.9" + opencollective-postinstall: "npm:^2.0.3" + regenerator-runtime: "npm:^0.13.3" + tesseract.js-core: "npm:^6.0.0" + wasm-feature-detect: "npm:^1.2.11" + zlibjs: "npm:^0.3.1" + checksum: 10c0/1d73bb1fbc00c8629756d9594989d8bbfabda657a8cad84922ad68eb0f073148c82845bf71a882e5d2427a46edb5a470356864e60562c7a8442bddd70251435a + languageName: node + linkType: hard + 
+"tesseract.js@patch:tesseract.js@npm%3A6.0.1#~/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch": + version: 6.0.1 + resolution: "tesseract.js@patch:tesseract.js@npm%3A6.0.1#~/.yarn/patches/tesseract.js-npm-6.0.1-2562a7e46d.patch::version=6.0.1&hash=a9cf7b" + dependencies: + bmp-js: "npm:^0.1.0" + idb-keyval: "npm:^6.2.0" + is-url: "npm:^1.2.4" + node-fetch: "npm:^2.6.9" + opencollective-postinstall: "npm:^2.0.3" + regenerator-runtime: "npm:^0.13.3" + tesseract.js-core: "npm:^6.0.0" + wasm-feature-detect: "npm:^1.2.11" + zlibjs: "npm:^0.3.1" + checksum: 10c0/8a94fcc688ff21a9e82b721563d8fa174837ba807d0f01290fe9a1bb6a1c96ecaf7dc1c83510510f3d5185abd15f1cc5fc3cb7ad6c0eee0c4b3e278106f8a5da + languageName: node + linkType: hard + "test-exclude@npm:^7.0.1": version: 7.0.1 resolution: "test-exclude@npm:7.0.1" @@ -22173,6 +22551,13 @@ __metadata: languageName: node linkType: hard +"wasm-feature-detect@npm:^1.2.11": + version: 1.8.0 + resolution: "wasm-feature-detect@npm:1.8.0" + checksum: 10c0/2cb43e91bbf7aa7c121bc76b3133de3ab6dc4f482acc1d2dc46c528e8adb7a51c72df5c2aacf1d219f113c04efd1706f18274d5790542aa5dd49e0644e3ee665 + languageName: node + linkType: hard + "wcwidth@npm:^1.0.1": version: 1.0.1 resolution: "wcwidth@npm:1.0.1" @@ -22678,6 +23063,13 @@ __metadata: languageName: node linkType: hard +"zlibjs@npm:^0.3.1": + version: 0.3.1 + resolution: "zlibjs@npm:0.3.1" + checksum: 10c0/2d110bfcb0f8b8dbf225423f6556da9c5bca95c8b849c1218983676158a24b5cd0350357e0c4d504e27f8c7e18d471d9712576f35114a81a51bcf83453f02beb + languageName: node + linkType: hard + "zod-to-json-schema@npm:^3.22.3, zod-to-json-schema@npm:^3.22.4, zod-to-json-schema@npm:^3.22.5, zod-to-json-schema@npm:^3.24.1": version: 3.24.5 resolution: "zod-to-json-schema@npm:3.24.5"