Fuzzy Search for File List
This document describes the fuzzy search implementation for file listing in Cherry Studio.
Overview
The fuzzy search feature allows users to find files by typing partial or approximate file names/paths. It filters files in two tiers (ripgrep glob pre-filtering first, with greedy substring matching as a fallback) and then ranks the surviving files with a subsequence-based relevance score.
Features
- Ripgrep Glob Pre-filtering: Primary filtering using glob patterns for fast native-level filtering
- Greedy Substring Matching: Fallback file filtering strategy when ripgrep glob pre-filtering returns no results
- Subsequence-based Segment Scoring: During scoring, path segments gain additional weight when query characters appear in order
- Relevance Scoring: Results are sorted by a relevance score derived from multiple factors
Matching Strategies
1. Ripgrep Glob Pre-filtering (Primary)
The query is converted to a glob pattern for ripgrep to do initial filtering:
Query: "updater"
Glob: "*u*p*d*a*t*e*r*"
This leverages ripgrep's native performance for the initial file filtering.
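The conversion above can be sketched as follows. This is a minimal illustration, not the actual `queryToGlobPattern()` from FileStorage.ts: the name `queryToGlobPatternSketch`, the set of escaped characters, and the use of backslash escaping are all assumptions (backslash escaping in globs may behave differently on Windows).

```typescript
// Hypothetical sketch; the real queryToGlobPattern() may differ.
// Characters that glob syntax would otherwise treat as special (assumed set).
const GLOB_SPECIAL_CHARS = /[\\*?[\]{}!]/

function queryToGlobPatternSketch(query: string): string {
  const parts: string[] = []
  for (const ch of query) {
    // Escape special characters so they are matched literally.
    parts.push(GLOB_SPECIAL_CHARS.test(ch) ? `\\${ch}` : ch)
  }
  // Put "*" before, between, and after the characters.
  return parts.length === 0 ? '*' : `*${parts.join('*')}*`
}

console.log(queryToGlobPatternSketch('updater')) // *u*p*d*a*t*e*r*
```

Because a wildcard sits between every pair of characters, the pattern accepts any path that contains the query characters in order, which is why this tier acts as a coarse subsequence filter.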
2. Greedy Substring Matching (Fallback)
When the glob pre-filter returns no results, the system falls back to greedy substring matching. This allows more flexible matching:
Query: "updatercontroller"
File: "packages/update/src/node/updateController.ts"
Matching process:
1. Find "update" (longest match from start)
2. Remaining "rcontroller" → find "r" then "controller"
3. All parts matched → Success
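The matching process above can be sketched like this. It is an illustration under assumptions, not the actual `isGreedySubstringMatch()`: the function name `greedySubstringMatchSketch`, the case-insensitive comparison, and the left-to-right scan position are all hypothetical choices.

```typescript
// Hypothetical sketch of greedy substring matching; the real implementation may differ.
function greedySubstringMatchSketch(query: string, filePath: string): boolean {
  const q = query.toLowerCase()
  const text = filePath.toLowerCase()
  let queryPos = 0 // how much of the query has been consumed
  let textPos = 0 // where the next piece may start in the path

  while (queryPos < q.length) {
    let matched = false
    // Try the longest remaining piece of the query first, then shrink it.
    for (let len = q.length - queryPos; len >= 1; len--) {
      const piece = q.slice(queryPos, queryPos + len)
      const foundAt = text.indexOf(piece, textPos)
      if (foundAt !== -1) {
        queryPos += len
        textPos = foundAt + len
        matched = true
        break
      }
    }
    // Not even a single character could be placed: no match.
    if (!matched) return false
  }
  return true
}

const matched = greedySubstringMatchSketch(
  'updatercontroller',
  'packages/update/src/node/updateController.ts'
)
console.log(matched) // true
```

For the example path, the sketch consumes "update", then "r" (found in "src"), then "controller", mirroring the three steps listed above.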
Scoring Algorithm
Results are ranked by a relevance score based on named constants defined in FileStorage.ts:
| Constant | Value | Description |
|---|---|---|
| SCORE_FILENAME_STARTS | 100 | Filename starts with query (highest priority) |
| SCORE_FILENAME_CONTAINS | 80 | Filename contains exact query substring |
| SCORE_SEGMENT_MATCH | 60 | Per path segment that matches query |
| SCORE_WORD_BOUNDARY | 20 | Query matches start of a word |
| SCORE_CONSECUTIVE_CHAR | 15 | Per consecutive character match |
| PATH_LENGTH_PENALTY_FACTOR | 4 | Logarithmic penalty for longer paths |
Scoring Strategy
The scoring prioritizes:
- Filename matches (highest): Files where the query appears in the filename are most relevant
- Path segment matches: Multiple matching segments indicate stronger relevance
- Word boundaries: Matching at word starts (e.g., "upd" matching "update") is preferred
- Consecutive matches: Longer consecutive character sequences score higher
- Path length: Shorter paths are preferred (logarithmic penalty prevents long paths from dominating)
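To make these priorities concrete, here is a simplified scoring sketch. The constant values are the documented ones, but how they are combined is an assumption: `scoreSketch` and `isSubsequence` are hypothetical names, the natural-log form of the penalty is a guess, and the word-boundary and consecutive-character bonuses are left out for brevity.

```typescript
// Documented constant values; the way they are combined below is assumed.
const SCORE_FILENAME_STARTS = 100
const SCORE_FILENAME_CONTAINS = 80
const SCORE_SEGMENT_MATCH = 60
const PATH_LENGTH_PENALTY_FACTOR = 4

// True when every character of `query` appears in `text` in order.
function isSubsequence(query: string, text: string): boolean {
  let i = 0
  for (let j = 0; j < text.length && i < query.length; j++) {
    if (text[j] === query[i]) i++
  }
  return i === query.length
}

function scoreSketch(filePath: string, query: string): number {
  const q = query.toLowerCase()
  const segments = filePath.toLowerCase().split('/')
  const fileName = segments[segments.length - 1] ?? ''
  let score = 0

  // Award either the "starts with" or the "contains" bonus, never both.
  if (fileName.startsWith(q)) score += SCORE_FILENAME_STARTS
  else if (fileName.includes(q)) score += SCORE_FILENAME_CONTAINS

  // Each path segment that contains the query as a subsequence adds weight.
  for (const segment of segments) {
    if (isSubsequence(q, segment)) score += SCORE_SEGMENT_MATCH
  }

  // Logarithmic penalty: grows slowly, so long paths do not dominate the score.
  score -= PATH_LENGTH_PENALTY_FACTOR * Math.log(filePath.length + 1)
  return score
}

const shortPath = scoreSketch('src/updater.ts', 'updater')
const longPath = scoreSketch('packages/app/resources/helpers/updater.ts', 'updater')
console.log(shortPath > longPath) // true
```

With a logarithmic penalty, the gap between a 15-character path and a 40-character path is only a few points, so a strong filename match still outranks a weak match in a shallow directory.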
Example Scoring
For query updater:
| File | Score Factors |
|---|---|
| RCUpdater.js | Short path + filename contains "updater" |
| updateController.ts | Multiple segment matches |
| UpdaterHelper.plist | Long path penalty |
Configuration
DirectoryListOptions
interface DirectoryListOptions {
  recursive?: boolean // Default: true
  maxDepth?: number // Default: 10
  includeHidden?: boolean // Default: false
  includeFiles?: boolean // Default: true
  includeDirectories?: boolean // Default: true
  maxEntries?: number // Default: 20
  searchPattern?: string // Default: '.'
  fuzzy?: boolean // Default: true
}
Usage
// Basic fuzzy search
const fuzzyFiles = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'updater',
  fuzzy: true,
  maxEntries: 20
})

// Disable fuzzy search (exact glob matching)
// A distinct variable name avoids redeclaring a const in the same scope.
const exactFiles = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'update',
  fuzzy: false
})
Performance Considerations
- Ripgrep Pre-filtering: Most queries are handled by ripgrep's native glob matching, which is extremely fast
- Fallback Only When Needed: Greedy substring matching (which loads all files) runs only when glob matching returns no results
- Result Limiting: Only the top 20 results are returned by default (the maxEntries default)
- Excluded Directories: Common large directories are automatically excluded:
  - node_modules
  - .git
  - dist, build
  - .next, .nuxt
  - coverage, .cache
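As a rough illustration of how the pre-filter and the directory exclusions might be passed to ripgrep, the sketch below builds an argument list. The flags `--files`, `--max-depth`, `--iglob`, and `--glob` are real ripgrep options, but the function name `buildRipgrepArgsSketch`, the exact exclusion glob shape, and the argument order are assumptions and may not match what FileStorage.ts does.

```typescript
// Hypothetical sketch; the actual argument building in FileStorage.ts may differ.
const EXCLUDED_DIRS = ['node_modules', '.git', 'dist', 'build', '.next', '.nuxt', 'coverage', '.cache']

function buildRipgrepArgsSketch(globPattern: string, maxDepth: number): string[] {
  const args = [
    '--files', // list file paths instead of searching file contents
    '--max-depth', String(maxDepth),
    '--iglob', globPattern // case-insensitive include glob built from the query
  ]
  for (const dir of EXCLUDED_DIRS) {
    // A leading "!" turns a ripgrep glob into an exclusion.
    args.push('--glob', `!**/${dir}/**`)
  }
  return args
}

console.log(buildRipgrepArgsSketch('*u*p*d*a*t*e*r*', 10).join(' '))
```

Keeping the exclusions in the ripgrep invocation means the large directories are skipped during traversal rather than filtered out afterwards.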
Implementation Details
The implementation is located in src/main/services/FileStorage.ts:
- queryToGlobPattern(): Converts query to ripgrep glob pattern
- isFuzzyMatch(): Subsequence matching algorithm
- isGreedySubstringMatch(): Greedy substring matching fallback
- getFuzzyMatchScore(): Calculates relevance score
- listDirectoryWithRipgrep(): Main search orchestration