mirror of
https://github.com/CherryHQ/cherry-studio.git
synced 2025-12-31 08:29:07 +08:00
feat: add fuzzy search for file list with relevance scoring (#12131)
* feat: add fuzzy search for file list with relevance scoring
  - Add fuzzy option to DirectoryListOptions (default: true)
  - Implement isFuzzyMatch for subsequence matching
  - Add getFuzzyMatchScore for relevance-based sorting
  - Remove searchByContent method (content-based search)
  - Increase maxDepth to 10 and maxEntries to 20
* perf: optimize fuzzy search with ripgrep glob pre-filtering
  - Add queryToGlobPattern to convert query to glob pattern
  - Use ripgrep --iglob for initial filtering instead of loading all files
  - Reduces memory footprint and improves performance for large directories
* feat: add greedy substring match fallback for fuzzy search
  - Add isGreedySubstringMatch for flexible matching
  - Fallback to greedy match when glob pre-filter returns empty
  - Allows 'updatercontroller' to match 'updateController.ts'
* fix: improve greedy substring match algorithm
  - Search from longest to shortest substring for better matching
  - Fix issue where 'updatercontroller' couldn't match 'updateController'
* docs: add fuzzy search documentation (en/zh)
* refactor: extract MAX_ENTRIES_PER_SEARCH constant
* refactor: use logarithmic scaling for path length penalty
  - Replace linear penalty (0.8 * length) with logarithmic scaling
  - Prevents long paths from dominating the score
  - Add PATH_LENGTH_PENALTY_FACTOR constant with explanation
* refactor: extract scoring constants with documentation
  - Add named constants for scoring factors (SCORE_SEGMENT_MATCH, etc.)
  - Update en/zh documentation with scoring strategy explanation
* refactor: move PATH_LENGTH_PENALTY_FACTOR to class level constant
* refactor: extract buildRipgrepBaseArgs helper method
  - Reduce code duplication for ripgrep argument building
  - Consolidate directory exclusion patterns and depth handling
* refactor: rename MAX_ENTRIES_PER_SEARCH to MAX_SEARCH_RESULTS
* fix: escape ! character in glob pattern for negation support
* fix: avoid duplicate scoring for filename starts and contains
* docs: clarify fuzzy search filtering and scoring strategies
* fix: limit word boundary bonus to single match
* fix: add dedicated scoring for greedy substring match
  - Add getGreedyMatchScore function that rewards fewer fragments and tighter matches
  - Add isFuzzyMatch validation before scoring in fuzzy glob path
  - Use greedy scoring for fallback path to properly rank longest matches first

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
parent
068cf1083c
commit
bc9eeb9f30
129
docs/en/references/fuzzy-search.md
Normal file
@ -0,0 +1,129 @@
# Fuzzy Search for File List

This document describes the fuzzy search implementation for file listing in Cherry Studio.

## Overview

The fuzzy search feature allows users to find files by typing partial or approximate file names/paths. It uses a two-tier file filtering strategy (ripgrep glob pre-filtering with greedy substring fallback) combined with subsequence-based scoring for optimal performance and flexibility.

## Features

- **Ripgrep Glob Pre-filtering**: Primary filtering using glob patterns for fast native-level filtering
- **Greedy Substring Matching**: Fallback file filtering strategy when ripgrep glob pre-filtering returns no results
- **Subsequence-based Segment Scoring**: During scoring, path segments gain additional weight when query characters appear in order
- **Relevance Scoring**: Results are sorted by a relevance score derived from multiple factors
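
As a rough illustration of the subsequence idea used during scoring, the sketch below checks whether the characters of a query occur, in order, somewhere inside a path. The name `isSubsequence` and the sample values are made up for this example and are not taken from the project.

```typescript
// Subsequence test: every query character must appear in `text`
// in the same order, but not necessarily next to each other.
// The comparison is case-insensitive.
function isSubsequence(text: string, query: string): boolean {
  const t = text.toLowerCase()
  const q = query.toLowerCase()
  let j = 0 // index of the next query character still to be found
  for (let i = 0; i < t.length && j < q.length; i++) {
    if (t[i] === q[j]) j++
  }
  return j === q.length
}

const samplePath = 'packages/update/src/node/updateController.ts'
const demoHit = isSubsequence(samplePath, 'updater') // true
const demoMiss = isSubsequence('README.md', 'updater') // false
```

Adjacent characters are not required, so `updater` matches the sample path even though its final `r` is found in `src` rather than directly after `update`.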

## Matching Strategies

### 1. Ripgrep Glob Pre-filtering (Primary)

The query is converted to a glob pattern for ripgrep to do initial filtering:

```
Query: "updater"
Glob: "*u*p*d*a*t*e*r*"
```

This leverages ripgrep's native performance for the initial file filtering.
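
The conversion above can be sketched as follows. This is a minimal illustration, not the project's function: `toFuzzyGlob` and `GLOB_SPECIAL_CHARS` are hypothetical names, the list of special characters is an assumption, and the sketch assumes the glob matcher honors backslash escapes.

```typescript
// Characters treated as glob metacharacters in this sketch (an assumption,
// not an exhaustive list for any particular glob implementation).
const GLOB_SPECIAL_CHARS = '*?[]{}!\\'

// Hypothetical helper: builds "*c1*c2*...*cn*" from a query.
// Each character is escaped on its own before the "*" separators are added,
// so a backslash always stays next to the character it escapes.
function toFuzzyGlob(query: string): string {
  const parts = query
    .split('')
    .map((ch) => (GLOB_SPECIAL_CHARS.indexOf(ch) !== -1 ? '\\' + ch : ch))
  return '*' + parts.join('*') + '*'
}

const plainGlob = toFuzzyGlob('updater') // "*u*p*d*a*t*e*r*"
const escapedGlob = toFuzzyGlob('a!b') // "*a*\!*b*"
```

Escaping per character is a design choice of this sketch: if the whole query were escaped first and split afterwards, the `*` separators would land between each backslash and the character it was meant to protect.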

### 2. Greedy Substring Matching (Fallback)

When the glob pre-filter returns no results, the system falls back to greedy substring matching. This allows more flexible matching:

```
Query: "updatercontroller"
File: "packages/update/src/node/updateController.ts"

Matching process:
1. Find "update" (longest match from start)
2. Remaining "rcontroller" → find "rc" (in "src"), then "ontroller" (in "updateController")
3. All parts matched → Success
```
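
The matching process can be traced with a small sketch that returns the fragments it consumed. `greedyFragments` is a hypothetical helper written for this illustration; it follows the longest-first, left-to-right idea described above and makes no claim to match the project's code line for line.

```typescript
// Returns the query fragments matched in order, or null when some part
// of the query cannot be found after the previous fragment.
function greedyFragments(text: string, query: string): string[] | null {
  const t = text.toLowerCase()
  const q = query.toLowerCase()
  const found: string[] = []
  let queryIndex = 0 // start of the part of the query not yet matched
  let searchStart = 0 // position in the text where the next search begins

  while (queryIndex < q.length) {
    let matched = false
    // Try the longest remaining piece first, then shorter ones.
    for (let len = q.length - queryIndex; len >= 1; len--) {
      const piece = q.slice(queryIndex, queryIndex + len)
      const foundAt = t.indexOf(piece, searchStart)
      if (foundAt !== -1) {
        found.push(piece)
        queryIndex += len
        searchStart = foundAt + len
        matched = true
        break
      }
    }
    if (!matched) return null
  }
  return found
}

const demoFragments = greedyFragments(
  'packages/update/src/node/updateController.ts',
  'updatercontroller'
) // ["update", "rc", "ontroller"]
```

Because the scan never backtracks, the outcome depends on the whole path: against the bare file name `updateController.ts`, this sketch returns `null` for the same query, since the `r` it consumes leaves no `c` later in the text.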

## Scoring Algorithm

Results are ranked by a relevance score based on named constants defined in `FileStorage.ts`:

| Constant | Value | Description |
|----------|-------|-------------|
| `SCORE_FILENAME_STARTS` | 100 | Filename starts with the query (highest priority) |
| `SCORE_FILENAME_CONTAINS` | 80 | Filename contains the query as an exact substring; not added when the filename already starts with the query |
| `SCORE_SEGMENT_MATCH` | 60 | Added once for each path segment that contains the query as a subsequence |
| `SCORE_WORD_BOUNDARY` | 20 | The first three characters of the query (or the whole query, if shorter) start a word in the path; awarded at most once |
| `SCORE_CONSECUTIVE_CHAR` | 15 | Multiplied by the length of the longest run of consecutively matched characters |
| `PATH_LENGTH_PENALTY_FACTOR` | 4 | Multiplier for the logarithmic path-length penalty, `ln(path length + 1)` |
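
To give a feel for the size of the logarithmic penalty, the sketch below evaluates it for a few path lengths. The factor 4 comes from the table; the function name `pathLengthPenalty` is only illustrative.

```typescript
const PENALTY_FACTOR = 4 // value of PATH_LENGTH_PENALTY_FACTOR in the table

// Penalty subtracted from the score: natural log of (length + 1), scaled.
function pathLengthPenalty(pathLength: number): number {
  return Math.log(pathLength + 1) * PENALTY_FACTOR
}

// Rounded penalties for 50-, 100- and 200-character paths: 16, 18, 21
const roundedPenalties = [50, 100, 200].map((n) => Math.round(pathLengthPenalty(n)))
```

Quadrupling the path length from 50 to 200 characters adds only about 5 points of penalty, which is small next to the 60 to 100 point bonuses.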

### Scoring Strategy

The scoring prioritizes:

1. **Filename matches** (highest): Files where the query appears in the filename are most relevant
2. **Path segment matches**: Multiple matching segments indicate stronger relevance
3. **Word boundaries**: Matching at word starts (e.g., "upd" matching "update") is preferred
4. **Consecutive matches**: Longer consecutive character sequences score higher
5. **Path length**: Shorter paths are preferred (logarithmic penalty prevents long paths from dominating)
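
The "consecutive matches" factor can be illustrated with a sketch that finds the longest run of query characters matched back to back while scanning a path. `longestRun` and the sample paths are assumptions made for this example.

```typescript
// Scans the path once, matching query characters in order, and returns
// the length of the longest unbroken run of matched characters.
function longestRun(path: string, query: string): number {
  const p = path.toLowerCase()
  const q = query.toLowerCase()
  let j = 0 // next query character to match
  let run = 0 // length of the current unbroken run
  let best = 0 // longest run seen so far
  for (let i = 0; i < p.length && j < q.length; i++) {
    if (p[i] === q[j]) {
      run++
      best = Math.max(best, run)
      j++
    } else {
      run = 0
    }
  }
  return best
}

const runInFileName = longestRun('src/RCUpdater.js', 'updater') // 7
const runAcrossPath = longestRun('packages/update/src/node/updateController.ts', 'updater') // 6
```

With 15 points per character, the first path would earn 105 points from this factor and the second 90, since its run is broken after `update`.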

### Example Scoring

For query `updater`:

| File | Score Factors |
|------|---------------|
| `RCUpdater.js` | Short path + filename contains "updater" |
| `updateController.ts` | Multiple segment matches |
| `UpdaterHelper.plist` | Long path penalty |

## Configuration

### DirectoryListOptions

```typescript
interface DirectoryListOptions {
  recursive?: boolean // Default: true
  maxDepth?: number // Default: 10
  includeHidden?: boolean // Default: false
  includeFiles?: boolean // Default: true
  includeDirectories?: boolean // Default: true
  maxEntries?: number // Default: 20
  searchPattern?: string // Default: '.'
  fuzzy?: boolean // Default: true
}
```
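
One plausible way for such defaults to take effect is an object spread that lets caller-supplied fields override them. The sketch below assumes that approach; `SKETCH_DEFAULTS` and `resolveOptions` are hypothetical names, and the actual merging logic is not shown in this document.

```typescript
interface DirectoryListOptions {
  recursive?: boolean
  maxDepth?: number
  includeHidden?: boolean
  includeFiles?: boolean
  includeDirectories?: boolean
  maxEntries?: number
  searchPattern?: string
  fuzzy?: boolean
}

// Default values as documented above.
const SKETCH_DEFAULTS: Required<DirectoryListOptions> = {
  recursive: true,
  maxDepth: 10,
  includeHidden: false,
  includeFiles: true,
  includeDirectories: true,
  maxEntries: 20,
  searchPattern: '.',
  fuzzy: true
}

// Hypothetical helper: fields present in `options` win over the defaults.
function resolveOptions(options: DirectoryListOptions = {}): Required<DirectoryListOptions> {
  return { ...SKETCH_DEFAULTS, ...options }
}

const resolved = resolveOptions({ searchPattern: 'updater' })
// resolved.fuzzy is true and resolved.maxEntries is 20
```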

## Usage

```typescript
// Basic fuzzy search
const files = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'updater',
  fuzzy: true,
  maxEntries: 20
})

// Disable fuzzy search (exact glob matching)
const exactFiles = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'update',
  fuzzy: false
})
```

## Performance Considerations

1. **Ripgrep Pre-filtering**: Most queries are handled by ripgrep's native glob matching, which is extremely fast
2. **Fallback Only When Needed**: Greedy substring matching (which loads all files) only runs when glob matching returns empty results
3. **Result Limiting**: Only top 20 results are returned by default
4. **Excluded Directories**: Common large directories are automatically excluded:
   - `node_modules`
   - `.git`
   - `dist`, `build`
   - `.next`, `.nuxt`
   - `coverage`, `.cache`
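
Exclusions like these can be passed to ripgrep as negated globs. The sketch below is an assumption about how such an argument list might be assembled; `buildFileListArgs` is a hypothetical name and the exact patterns used by the project are not shown in this document. The flags `--files`, `--hidden`, `--glob` and `--max-depth` are standard ripgrep options.

```typescript
const EXCLUDED_DIRS = ['node_modules', '.git', 'dist', 'build', '.next', '.nuxt', 'coverage', '.cache']

// Hypothetical helper: builds an argument list for `rg --files`.
function buildFileListArgs(root: string, maxDepth: number, includeHidden: boolean): string[] {
  const args: string[] = ['--files'] // list file paths instead of searching content
  if (includeHidden) {
    args.push('--hidden') // ripgrep skips hidden files unless asked
  }
  for (const dir of EXCLUDED_DIRS) {
    args.push('--glob', '!**/' + dir + '/**') // a leading "!" negates the glob
  }
  if (maxDepth > 0) {
    args.push('--max-depth', String(maxDepth))
  }
  args.push(root) // directory to list goes last
  return args
}

const demoArgs = buildFileListArgs('/tmp/project', 10, false)
```

Keeping the directory as the last element makes it easy to splice extra flags, such as an `--iglob` pattern, in front of it later.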

## Implementation Details

The implementation is located in `src/main/services/FileStorage.ts`:

- `queryToGlobPattern()`: Converts query to ripgrep glob pattern
- `isFuzzyMatch()`: Subsequence matching algorithm
- `isGreedySubstringMatch()`: Greedy substring matching fallback
- `getFuzzyMatchScore()`: Calculates relevance score
- `getGreedyMatchScore()`: Calculates the relevance score for greedy fallback matches
- `buildRipgrepBaseArgs()`: Builds the common ripgrep arguments for file listing
- `listDirectoryWithRipgrep()`: Main search orchestration
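
The orchestration step boils down to a filter, score, sort, and truncate pipeline. The sketch below shows that shape in isolation, with the matcher and scorer passed in as parameters; `rankTopN` and the toy matcher and scorer are assumptions for this example only.

```typescript
// Keeps candidates accepted by `matches`, orders them by descending score,
// and returns at most `limit` paths.
function rankTopN(
  candidates: string[],
  matches: (path: string) => boolean,
  score: (path: string) => number,
  limit: number
): string[] {
  return candidates
    .filter(matches)
    .map((file) => ({ file, score: score(file) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((item) => item.file)
}

// Toy matcher and scorer: keep paths containing "upd", prefer shorter paths.
const demoRanked = rankTopN(
  ['a/b/updater.ts', 'updater.ts', 'readme.md'],
  (p) => p.indexOf('upd') !== -1,
  (p) => -p.length,
  1
) // ["updater.ts"]
```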
|
||||
129
docs/zh/references/fuzzy-search.md
Normal file
@ -0,0 +1,129 @@
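# Fuzzy Search for File List

This document describes the fuzzy search implementation for file listing in Cherry Studio.

## Overview

The fuzzy search feature lets users find files by typing partial or approximate file names/paths. It uses a two-tier file filtering strategy (ripgrep glob pre-filtering plus greedy substring fallback) combined with subsequence-based scoring for optimal performance and flexibility.

## Features

- **Ripgrep Glob Pre-filtering**: The primary filtering strategy, using glob patterns for fast native-level filtering
- **Greedy Substring Matching**: The fallback file filtering strategy when ripgrep glob pre-filtering returns no results
- **Subsequence-based Segment Scoring**: During scoring, path segments gain additional weight when query characters appear in order
- **Relevance Scoring**: Results are sorted by a multi-factor relevance score

## Matching Strategies

### 1. Ripgrep Glob Pre-filtering (Primary)

The query is converted to a glob pattern for ripgrep to do initial filtering:

```
Query: "updater"
Glob: "*u*p*d*a*t*e*r*"
```

This leverages ripgrep's native performance for the initial file filtering.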
### 2. Greedy Substring Matching (Fallback)

When the glob pre-filter returns no results, the system falls back to greedy substring matching. This allows more flexible matching:

```
Query: "updatercontroller"
File: "packages/update/src/node/updateController.ts"

Matching process:
1. Find "update" (longest match from start)
2. Remaining "rcontroller" → find "rc" (in "src"), then "ontroller" (in "updateController")
3. All parts matched → Success
```

## Scoring Algorithm

Results are ranked by a relevance score based on named constants defined in `FileStorage.ts`:

| Constant | Value | Description |
|----------|-------|-------------|
| `SCORE_FILENAME_STARTS` | 100 | Filename starts with the query (highest priority) |
| `SCORE_FILENAME_CONTAINS` | 80 | Filename contains the exact query substring |
| `SCORE_SEGMENT_MATCH` | 60 | Per path segment that matches the query |
| `SCORE_WORD_BOUNDARY` | 20 | Query matches the start of a word |
| `SCORE_CONSECUTIVE_CHAR` | 15 | Per consecutive character match |
| `PATH_LENGTH_PENALTY_FACTOR` | 4 | Logarithmic penalty for longer paths |

### Scoring Strategy

Scoring priorities:

1. **Filename matches** (highest): Files where the query appears in the filename are most relevant
2. **Path segment matches**: Multiple matching segments indicate stronger relevance
3. **Word boundaries**: Matching at word starts (e.g., "upd" matching "update") is preferred
4. **Consecutive matches**: Longer consecutive character sequences score higher
5. **Path length**: Shorter paths are preferred (the logarithmic penalty prevents long paths from dominating the score)

### Example Scoring

For query `updater`:

| File | Score Factors |
|------|---------------|
| `RCUpdater.js` | Short path + filename contains "updater" |
| `updateController.ts` | Multiple path segment matches |
| `UpdaterHelper.plist` | Long path penalty |
## Configuration

### DirectoryListOptions

```typescript
interface DirectoryListOptions {
  recursive?: boolean // Default: true
  maxDepth?: number // Default: 10
  includeHidden?: boolean // Default: false
  includeFiles?: boolean // Default: true
  includeDirectories?: boolean // Default: true
  maxEntries?: number // Default: 20
  searchPattern?: string // Default: '.'
  fuzzy?: boolean // Default: true
}
```

## Usage

```typescript
// Basic fuzzy search
const files = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'updater',
  fuzzy: true,
  maxEntries: 20
})

// Disable fuzzy search (exact glob matching)
const exactFiles = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'update',
  fuzzy: false
})
```
## Performance Considerations

1. **Ripgrep Pre-filtering**: Most queries are handled by ripgrep's native glob matching, which is extremely fast
2. **Fallback Only When Needed**: Greedy substring matching (which loads all files) only runs when glob matching returns empty results
3. **Result Limiting**: Only the top 20 results are returned by default
4. **Excluded Directories**: Common large directories are automatically excluded:
   - `node_modules`
   - `.git`
   - `dist`, `build`
   - `.next`, `.nuxt`
   - `coverage`, `.cache`

## Implementation Details

The implementation is located in `src/main/services/FileStorage.ts`:

- `queryToGlobPattern()`: Converts the query to a ripgrep glob pattern
- `isFuzzyMatch()`: Subsequence matching algorithm
- `isGreedySubstringMatch()`: Greedy substring matching fallback
- `getFuzzyMatchScore()`: Calculates the relevance score
- `listDirectoryWithRipgrep()`: Main search orchestration
@ -130,16 +130,18 @@ interface DirectoryListOptions {
   includeDirectories?: boolean
   maxEntries?: number
   searchPattern?: string
+  fuzzy?: boolean
 }

 const DEFAULT_DIRECTORY_LIST_OPTIONS: Required<DirectoryListOptions> = {
   recursive: true,
-  maxDepth: 3,
+  maxDepth: 10,
   includeHidden: false,
   includeFiles: true,
   includeDirectories: true,
-  maxEntries: 10,
-  searchPattern: '.'
+  maxEntries: 20,
+  searchPattern: '.',
+  fuzzy: true
 }

 class FileStorage {
@ -1046,10 +1048,226 @@ class FileStorage {
   }

   /**
-   * Search files by content pattern
+   * Fuzzy match: checks if all characters in query appear in text in order (case-insensitive)
+   * Example: "updater" matches "packages/update/src/node/updateController.ts"
    */
-  private async searchByContent(resolvedPath: string, options: Required<DirectoryListOptions>): Promise<string[]> {
-    const args: string[] = ['-l']
+  private isFuzzyMatch(text: string, query: string): boolean {
+    let i = 0 // text index
+    let j = 0 // query index
+    const textLower = text.toLowerCase()
+    const queryLower = query.toLowerCase()
+
+    while (i < textLower.length && j < queryLower.length) {
+      if (textLower[i] === queryLower[j]) {
+        j++
+      }
+      i++
+    }
+    return j === queryLower.length
+  }
+
+  /**
+   * Scoring constants for fuzzy match relevance ranking
+   * Higher values = higher priority in search results
+   */
+  private static readonly SCORE_SEGMENT_MATCH = 60 // Per path segment that matches query
+  private static readonly SCORE_FILENAME_CONTAINS = 80 // Filename contains exact query substring
+  private static readonly SCORE_FILENAME_STARTS = 100 // Filename starts with query (highest priority)
+  private static readonly SCORE_CONSECUTIVE_CHAR = 15 // Per consecutive character match
+  private static readonly SCORE_WORD_BOUNDARY = 20 // Query matches start of a word
+  private static readonly PATH_LENGTH_PENALTY_FACTOR = 4 // Logarithmic penalty multiplier for longer paths
+
+  /**
+   * Calculate fuzzy match score (higher is better)
+   * Scoring factors:
+   * - Consecutive character matches (bonus)
+   * - Match at word boundaries (bonus)
+   * - Shorter path length (bonus)
+   * - Match in filename vs directory (bonus)
+   */
+  private getFuzzyMatchScore(filePath: string, query: string): number {
+    const pathLower = filePath.toLowerCase()
+    const queryLower = query.toLowerCase()
+    const fileName = filePath.split('/').pop() || ''
+    const fileNameLower = fileName.toLowerCase()
+
+    let score = 0
+
+    // Count how many times query-related words appear in path segments
+    const pathSegments = pathLower.split(/[/\\]/)
+    let segmentMatchCount = 0
+    for (const segment of pathSegments) {
+      if (this.isFuzzyMatch(segment, queryLower)) {
+        segmentMatchCount++
+      }
+    }
+    score += segmentMatchCount * FileStorage.SCORE_SEGMENT_MATCH
+
+    // Bonus for filename starting with query (stronger than generic "contains")
+    if (fileNameLower.startsWith(queryLower)) {
+      score += FileStorage.SCORE_FILENAME_STARTS
+    } else if (fileNameLower.includes(queryLower)) {
+      // Bonus for exact substring match in filename (e.g., "updater" in "RCUpdater.js")
+      score += FileStorage.SCORE_FILENAME_CONTAINS
+    }
+
+    // Calculate consecutive match bonus
+    let i = 0
+    let j = 0
+    let consecutiveCount = 0
+    let maxConsecutive = 0
+
+    while (i < pathLower.length && j < queryLower.length) {
+      if (pathLower[i] === queryLower[j]) {
+        consecutiveCount++
+        maxConsecutive = Math.max(maxConsecutive, consecutiveCount)
+        j++
+      } else {
+        consecutiveCount = 0
+      }
+      i++
+    }
+    score += maxConsecutive * FileStorage.SCORE_CONSECUTIVE_CHAR
+
+    // Bonus for word boundary matches (e.g., "upd" matches start of "update")
+    // Only count once to avoid inflating scores for paths with repeated patterns
+    const boundaryPrefix = queryLower.slice(0, Math.min(3, queryLower.length))
+    const words = pathLower.split(/[/\\._-]/)
+    for (const word of words) {
+      if (word.startsWith(boundaryPrefix)) {
+        score += FileStorage.SCORE_WORD_BOUNDARY
+        break
+      }
+    }
+
+    // Penalty for longer paths (prefer shorter, more specific matches)
+    // Use logarithmic scaling to prevent long paths from dominating the score
+    // A 50-char path gets ~-16 penalty, 100-char gets ~-18, 200-char gets ~-21
+    score -= Math.log(filePath.length + 1) * FileStorage.PATH_LENGTH_PENALTY_FACTOR
+
+    return score
+  }
+
+  /**
+   * Convert query to glob pattern for ripgrep pre-filtering
+   * e.g., "updater" -> "*u*p*d*a*t*e*r*"
+   */
+  private queryToGlobPattern(query: string): string {
+    // Escape special glob characters (including ! for negation)
+    const escaped = query.replace(/[[\]{}()*+?.,\\^$|#!]/g, '\\$&')
+    // Convert to fuzzy glob: each char separated by *
+    return '*' + escaped.split('').join('*') + '*'
+  }
+
+  /**
+   * Greedy substring match: check if all characters in query can be matched
+   * by finding consecutive substrings in text (not necessarily single chars)
+   * e.g., "updatercontroller" matches "updateController" by:
+   * "update" + "r" (from Controller) + "controller"
+   */
+  private isGreedySubstringMatch(text: string, query: string): boolean {
+    const textLower = text.toLowerCase()
+    const queryLower = query.toLowerCase()
+
+    let queryIndex = 0
+    let searchStart = 0
+
+    while (queryIndex < queryLower.length) {
+      // Try to find the longest matching substring starting at queryIndex
+      let bestMatchLen = 0
+      let bestMatchPos = -1
+
+      for (let len = queryLower.length - queryIndex; len >= 1; len--) {
+        const substr = queryLower.slice(queryIndex, queryIndex + len)
+        const foundAt = textLower.indexOf(substr, searchStart)
+        if (foundAt !== -1) {
+          bestMatchLen = len
+          bestMatchPos = foundAt
+          break // Found longest possible match
+        }
+      }
+
+      if (bestMatchLen === 0) {
+        // No substring match found, query cannot be matched
+        return false
+      }
+
+      queryIndex += bestMatchLen
+      searchStart = bestMatchPos + bestMatchLen
+    }
+
+    return true
+  }
+
+  /**
+   * Calculate greedy substring match score (higher is better)
+   * Rewards: fewer match fragments, shorter match span, matches in filename
+   */
+  private getGreedyMatchScore(filePath: string, query: string): number {
+    const textLower = filePath.toLowerCase()
+    const queryLower = query.toLowerCase()
+    const fileName = filePath.split('/').pop() || ''
+    const fileNameLower = fileName.toLowerCase()
+
+    let queryIndex = 0
+    let searchStart = 0
+    let fragmentCount = 0
+    let firstMatchPos = -1
+    let lastMatchEnd = 0
+
+    while (queryIndex < queryLower.length) {
+      let bestMatchLen = 0
+      let bestMatchPos = -1
+
+      for (let len = queryLower.length - queryIndex; len >= 1; len--) {
+        const substr = queryLower.slice(queryIndex, queryIndex + len)
+        const foundAt = textLower.indexOf(substr, searchStart)
+        if (foundAt !== -1) {
+          bestMatchLen = len
+          bestMatchPos = foundAt
+          break
+        }
+      }
+
+      if (bestMatchLen === 0) {
+        return -Infinity // No match
+      }
+
+      fragmentCount++
+      if (firstMatchPos === -1) firstMatchPos = bestMatchPos
+      lastMatchEnd = bestMatchPos + bestMatchLen
+      queryIndex += bestMatchLen
+      searchStart = lastMatchEnd
+    }
+
+    const matchSpan = lastMatchEnd - firstMatchPos
+    let score = 0
+
+    // Fewer fragments = better (single continuous match is best)
+    // Max bonus when fragmentCount=1, decreases as fragments increase
+    score += Math.max(0, 100 - (fragmentCount - 1) * 30)
+
+    // Shorter span relative to query length = better (tighter match)
+    // Perfect match: span equals query length
+    const spanRatio = queryLower.length / matchSpan
+    score += spanRatio * 50
+
+    // Bonus for match in filename
+    if (this.isGreedySubstringMatch(fileNameLower, queryLower)) {
+      score += 80
+    }
+
+    // Penalty for longer paths
+    score -= Math.log(filePath.length + 1) * 4
+
+    return score
+  }
+
+  /**
+   * Build common ripgrep arguments for file listing
+   */
+  private buildRipgrepBaseArgs(options: Required<DirectoryListOptions>, resolvedPath: string): string[] {
+    const args: string[] = ['--files']

     // Handle hidden files
     if (!options.includeHidden) {
@ -1076,82 +1294,74 @@ class FileStorage {
       args.push('--max-depth', options.maxDepth.toString())
     }

-    // Handle max count
-    if (options.maxEntries > 0) {
-      args.push('--max-count', options.maxEntries.toString())
-    }
-
-    // Add search pattern (search in content)
-    args.push(options.searchPattern)
-
     // Add the directory path
     args.push(resolvedPath)
-
-    const { exitCode, output } = await executeRipgrep(args)
-
-    // Exit code 0 means files found, 1 means no files found (still success), 2+ means error
-    if (exitCode >= 2) {
-      throw new Error(`Ripgrep failed with exit code ${exitCode}: ${output}`)
-    }
-
-    // Parse ripgrep output (already sorted by relevance)
-    const results = output
-      .split('\n')
-      .filter((line) => line.trim())
-      .map((line) => line.replace(/\\/g, '/'))
-      .slice(0, options.maxEntries)
-
-    return results
+    return args
   }

   private async listDirectoryWithRipgrep(
     resolvedPath: string,
     options: Required<DirectoryListOptions>
   ): Promise<string[]> {
-    const maxEntries = options.maxEntries
+    // Fuzzy search mode: use ripgrep glob for pre-filtering, then score in JS
+    if (options.fuzzy && options.searchPattern && options.searchPattern !== '.') {
+      const args = this.buildRipgrepBaseArgs(options, resolvedPath)

-    // Step 1: Search by filename first
+      // Insert glob pattern before the path (last element)
+      const globPattern = this.queryToGlobPattern(options.searchPattern)
+      args.splice(args.length - 1, 0, '--iglob', globPattern)
+
+      const { exitCode, output } = await executeRipgrep(args)
+
+      if (exitCode >= 2) {
+        throw new Error(`Ripgrep failed with exit code ${exitCode}: ${output}`)
+      }
+
+      const filteredFiles = output
+        .split('\n')
+        .filter((line) => line.trim())
+        .map((line) => line.replace(/\\/g, '/'))
+
+      // If fuzzy glob found results, validate fuzzy match, sort and return
+      if (filteredFiles.length > 0) {
+        return filteredFiles
+          .filter((file) => this.isFuzzyMatch(file, options.searchPattern))
+          .map((file) => ({ file, score: this.getFuzzyMatchScore(file, options.searchPattern) }))
+          .sort((a, b) => b.score - a.score)
+          .slice(0, options.maxEntries)
+          .map((item) => item.file)
+      }
+
+      // Fallback: if no results, try greedy substring match on all files
+      logger.debug('Fuzzy glob returned no results, falling back to greedy substring match')
+      const fallbackArgs = this.buildRipgrepBaseArgs(options, resolvedPath)
+
+      const fallbackResult = await executeRipgrep(fallbackArgs)
+
+      if (fallbackResult.exitCode >= 2) {
+        return []
+      }
+
+      const allFiles = fallbackResult.output
+        .split('\n')
+        .filter((line) => line.trim())
+        .map((line) => line.replace(/\\/g, '/'))
+
+      const greedyMatched = allFiles.filter((file) => this.isGreedySubstringMatch(file, options.searchPattern))
+
+      return greedyMatched
+        .map((file) => ({ file, score: this.getGreedyMatchScore(file, options.searchPattern) }))
+        .sort((a, b) => b.score - a.score)
+        .slice(0, options.maxEntries)
+        .map((item) => item.file)
+    }
+
+    // Fallback: search by filename only (non-fuzzy mode)
     logger.debug('Searching by filename pattern', { pattern: options.searchPattern, path: resolvedPath })
     const filenameResults = await this.searchByFilename(resolvedPath, options)

     logger.debug('Found matches by filename', { count: filenameResults.length })

-    // If we have enough filename matches, return them
-    if (filenameResults.length >= maxEntries) {
-      return filenameResults.slice(0, maxEntries)
-    }
-
-    // Step 2: If filename matches are less than maxEntries, search by content to fill up
-    logger.debug('Filename matches insufficient, searching by content to fill up', {
-      filenameCount: filenameResults.length,
-      needed: maxEntries - filenameResults.length
-    })
-
-    // Adjust maxEntries for content search to get enough results
-    const contentOptions = {
-      ...options,
-      maxEntries: maxEntries - filenameResults.length + 20 // Request extra to account for duplicates
-    }
-
-    const contentResults = await this.searchByContent(resolvedPath, contentOptions)
-
-    logger.debug('Found matches by content', { count: contentResults.length })
-
-    // Combine results: filename matches first, then content matches (deduplicated)
-    const combined = [...filenameResults]
-    const filenameSet = new Set(filenameResults)
-
-    for (const filePath of contentResults) {
-      if (!filenameSet.has(filePath)) {
-        combined.push(filePath)
-        if (combined.length >= maxEntries) {
-          break
-        }
-      }
-    }
-
-    logger.debug('Combined results', { total: combined.length, filenameCount: filenameResults.length })
-    return combined.slice(0, maxEntries)
+    return filenameResults.slice(0, options.maxEntries)
   }

   public validateNotesDirectory = async (_: Electron.IpcMainInvokeEvent, dirPath: string): Promise<boolean> => {
@ -9,6 +9,7 @@ import { useTranslation } from 'react-i18next'
 const logger = loggerService.withContext('useActivityDirectoryPanel')
 const MAX_FILE_RESULTS = 500
+const MAX_SEARCH_RESULTS = 20
 const areFileListsEqual = (prev: string[], next: string[]) => {
   if (prev === next) return true
   if (prev.length !== next.length) return false
@ -193,11 +194,11 @@ export const useActivityDirectoryPanel = (params: Params, role: 'button' | 'mana
       try {
         const files = await window.api.file.listDirectory(dirPath, {
           recursive: true,
-          maxDepth: 4,
+          maxDepth: 10,
           includeHidden: false,
           includeFiles: true,
           includeDirectories: true,
-          maxEntries: MAX_FILE_RESULTS,
+          maxEntries: MAX_SEARCH_RESULTS,
           searchPattern: searchPattern || '.'
         })