mirror of
https://github.com/CherryHQ/cherry-studio.git
synced 2026-01-03 11:19:10 +08:00
feat(migration): enhance ChatMigrator for comprehensive chat data migration
- Implemented detailed preparation, execution, and validation phases for migrating chat topics and messages from Dexie to SQLite.
- Added robust logging and error handling to track migration progress and issues.
- Introduced data transformation strategies to convert old message structures into a new tree format, ensuring data integrity and consistency.
- Updated migration guide documentation to reflect changes in migrator registration and detailed comments for maintainability.
This commit is contained in:
parent 4fcf047fa9
commit 4f4785396a
@@ -31,9 +31,10 @@ src/main/data/migration/v2/
 - `execute(ctx)`: perform inserts/updates; manage your own transactions; report progress via `reportProgress`
 - `validate(ctx)`: verify counts and integrity; return `ValidateResult` with stats (`sourceCount`, `targetCount`, `skippedCount`) and any `errors`
 - Registration: list migrators (in order) in `migrators/index.ts` so the engine can sort and run them.
-- Current migrators:
+- Current migrators (see `migrators/README-<name>.md` for detailed documentation):
   - `PreferencesMigrator` (implemented): maps ElectronStore + Redux settings to the `preference` table using `mappings/PreferencesMappings.ts`.
-  - `AssistantMigrator`, `KnowledgeMigrator`, `ChatMigrator` (placeholders): scaffolding and TODO notes for future tables.
+  - `ChatMigrator` (implemented): migrates topics and messages from Dexie to SQLite. See [`README-ChatMigrator.md`](../../../src/main/data/migration/v2/migrators/README-ChatMigrator.md).
+  - `AssistantMigrator`, `KnowledgeMigrator` (placeholders): scaffolding and TODO notes for future tables.
 - Conventions:
   - All logging goes through `loggerService` with a migrator-specific context.
   - Use `MigrationContext.sources` instead of accessing raw files/stores directly.
@@ -62,3 +63,10 @@ src/main/data/migration/v2/
 - [ ] Wire progress updates through `reportProgress` so UI shows per-migrator progress.
 - [ ] Register the migrator in `migrators/index.ts` with the correct `order`.
 - [ ] Add any new target tables to `MigrationEngine.verifyAndClearNewTables` once those tables exist.
+- [ ] Include detailed comments for maintainability (file-level, function-level, logic blocks).
+- [ ] **Create/update `migrators/README-<MigratorName>.md`** with detailed documentation including:
+  - Data sources and target tables
+  - Key transformations
+  - Field mappings (source → target)
+  - Dropped fields and rationale
+  - Code quality notes
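The prepare/execute/validate contract and ordered registration described in the guide can be sketched as follows. The result interfaces are simplified stand-ins for the project's actual `@shared` types, and `runMigrators` is an illustrative engine loop, not `MigrationEngine` itself:

```typescript
// Simplified phase-result shapes (stand-ins for the real shared types)
interface PrepareResult { success: boolean; itemCount: number; warnings?: string[] }
interface ExecuteResult { success: boolean; processedCount: number; error?: string }
interface ValidateResult {
  success: boolean
  errors: { key: string; message: string }[]
  stats: { sourceCount: number; targetCount: number; skippedCount: number }
}

// The three-phase contract every migrator implements
interface Migrator {
  readonly id: string
  readonly order: number
  prepare(): Promise<PrepareResult>
  execute(): Promise<ExecuteResult>
  validate(): Promise<ValidateResult>
}

// Illustrative engine loop: sort by `order`, then run phases in sequence,
// skipping execute/validate when prepare finds nothing to migrate.
async function runMigrators(migrators: Migrator[]): Promise<boolean> {
  const sorted = [...migrators].sort((a, b) => a.order - b.order)
  for (const m of sorted) {
    const prep = await m.prepare()
    if (!prep.success) return false
    if (prep.itemCount === 0) continue // nothing to do for this migrator
    const exec = await m.execute()
    if (!exec.success) return false
    const val = await m.validate()
    if (!val.success) return false
  }
  return true
}
```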
@@ -5,7 +5,9 @@

 import { dbService } from '@data/db/DbService'
 import { appStateTable } from '@data/db/schemas/appState'
+import { messageTable } from '@data/db/schemas/message'
 import { preferenceTable } from '@data/db/schemas/preference'
+import { topicTable } from '@data/db/schemas/topic'
 import { loggerService } from '@logger'
 import type {
   MigrationProgress,
@@ -24,8 +26,6 @@ import { createMigrationContext } from './MigrationContext'

 // TODO: Import these tables when they are created in user data schema
 // import { assistantTable } from '../../db/schemas/assistant'
-// import { topicTable } from '../../db/schemas/topic'
-// import { messageTable } from '../../db/schemas/message'
 // import { fileTable } from '../../db/schemas/file'
 // import { knowledgeBaseTable } from '../../db/schemas/knowledgeBase'

@@ -197,12 +197,13 @@ export class MigrationEngine {
     const db = dbService.getDb()

     // Tables to clear - add more as they are created
+    // Order matters: child tables must be cleared before parent tables
     const tables = [
+      { table: messageTable, name: 'message' }, // Must clear before topic (FK reference)
+      { table: topicTable, name: 'topic' },
       { table: preferenceTable, name: 'preference' }
       // TODO: Add these when tables are created
       // { table: assistantTable, name: 'assistant' },
-      // { table: topicTable, name: 'topic' },
-      // { table: messageTable, name: 'message' },
       // { table: fileTable, name: 'file' },
       // { table: knowledgeBaseTable, name: 'knowledge_base' }
     ]
@@ -216,14 +217,15 @@ export class MigrationEngine {
       }
     }

-    // Clear tables in reverse dependency order
+    // Clear tables in dependency order (children before parents)
+    // Messages reference topics, so delete messages first
+    await db.delete(messageTable)
+    await db.delete(topicTable)
+    await db.delete(preferenceTable)
     // TODO: Add these when tables are created (in correct order)
-    // await db.delete(messageTable)
-    // await db.delete(topicTable)
     // await db.delete(fileTable)
     // await db.delete(knowledgeBaseTable)
     // await db.delete(assistantTable)
-    await db.delete(preferenceTable)

     logger.info('All new architecture tables cleared successfully')
   }

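The child-before-parent rule in the clearing code can be made explicit with a small topological sort. The `dependsOn` map mirrors the FK noted in the engine's comments (`message.topicId` references `topic`); `deletionOrder` itself is only an illustrative sketch, not code from the repository:

```typescript
// Parent tables each child references (mirrors the engine's FK comment)
const dependsOn: Record<string, string[]> = {
  message: ['topic'], // message.topicId → topic.id
  topic: [],
  preference: []
}

// Derive a safe deletion order: every table that references `t` (its
// children) is emitted before `t` itself.
function deletionOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = []
  const visited = new Set<string>()
  const visit = (t: string): void => {
    if (visited.has(t)) return
    visited.add(t)
    // Visit children (tables whose FK list includes `t`) first
    for (const [child, parents] of Object.entries(deps)) {
      if (parents.includes(t)) visit(child)
    }
    order.push(t)
  }
  Object.keys(deps).forEach(visit)
  return order
}
```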
@@ -1,81 +1,623 @@
 /**
- * Chat migrator - migrates topics and messages from Dexie to SQLite
+ * Chat Migrator - Migrates topics and messages from Dexie to SQLite
  *
- * TODO: Implement when chat tables are created
- * Data source: Dexie topics table (messages are embedded in topics)
- * Target tables: topic, message
+ * ## Overview
  *
- * Note: This migrator handles the largest amount of data (potentially millions of messages)
- * and uses streaming JSON reading with batch inserts for memory efficiency.
+ * This migrator handles the largest data migration task: transferring all chat topics
+ * and their messages from the old Dexie/IndexedDB storage to the new SQLite database.
+ *
+ * ## Data Sources
+ *
+ * | Data | Source | File/Path |
+ * |------|--------|-----------|
+ * | Topics with messages | Dexie `topics` table | `topics.json` → `{ id, messages[] }` |
+ * | Message blocks | Dexie `message_blocks` table | `message_blocks.json` |
+ * | Assistants (for meta) | Redux `assistants` slice | `ReduxStateReader.getCategory('assistants')` |
+ *
+ * ## Target Tables
+ *
+ * - `topicTable` - Stores conversation topics/threads
+ * - `messageTable` - Stores chat messages with tree structure
+ *
+ * ## Key Transformations
+ *
+ * 1. **Linear → Tree Structure**
+ *    - Old: Messages stored as linear array in `topic.messages[]`
+ *    - New: Tree via `parentId` + `siblingsGroupId`
+ *
+ * 2. **Multi-model Responses**
+ *    - Old: `askId` links responses to user message, `foldSelected` marks active
+ *    - New: Shared `parentId` + non-zero `siblingsGroupId` groups siblings
+ *
+ * 3. **Block Inlining**
+ *    - Old: `message.blocks: string[]` (IDs) + separate `message_blocks` table
+ *    - New: `message.data.blocks: MessageDataBlock[]` (inline JSON)
+ *
+ * 4. **Citation Migration**
+ *    - Old: Separate `CitationMessageBlock`
+ *    - New: Merged into `MainTextBlock.references` as ContentReference[]
+ *
+ * 5. **Mention Migration**
+ *    - Old: `message.mentions: Model[]`
+ *    - New: `MentionReference[]` in `MainTextBlock.references`
+ *
+ * ## Performance Considerations
+ *
+ * - Uses streaming JSON reader for large data sets (potentially millions of messages)
+ * - Processes topics in batches to control memory usage
+ * - Pre-loads all blocks into memory map for O(1) lookup (blocks table is smaller)
+ * - Uses database transactions for atomicity and performance
+ *
+ * @since v2.0.0
  */

+import { messageTable } from '@data/db/schemas/message'
+import { topicTable } from '@data/db/schemas/topic'
 import { loggerService } from '@logger'
-import type { ExecuteResult, PrepareResult, ValidateResult } from '@shared/data/migration/v2/types'
+import type { ExecuteResult, PrepareResult, ValidateResult, ValidationError } from '@shared/data/migration/v2/types'
+import { eq, sql } from 'drizzle-orm'
+import { v4 as uuidv4 } from 'uuid'

 import type { MigrationContext } from '../core/MigrationContext'
 import { BaseMigrator } from './BaseMigrator'
+import {
+  buildBlockLookup,
+  buildMessageTree,
+  type NewMessage,
+  type NewTopic,
+  type OldAssistant,
+  type OldBlock,
+  type OldTopic,
+  type OldTopicMeta,
+  resolveBlocks,
+  transformMessage,
+  transformTopic
+} from './mappings/ChatMappings'

 const logger = loggerService.withContext('ChatMigrator')

+/**
+ * Batch size for processing topics
+ * Chosen to balance memory usage and transaction overhead
+ */
+const TOPIC_BATCH_SIZE = 50
+
+/**
+ * Batch size for inserting messages
+ * SQLite has limits on the number of parameters per statement
+ */
+const MESSAGE_INSERT_BATCH_SIZE = 100
+
+/**
+ * Assistant data from Redux for generating AssistantMeta
+ */
+interface AssistantState {
+  assistants: OldAssistant[]
+}
+
+/**
+ * Prepared data for execution phase
+ */
+interface PreparedTopicData {
+  topic: NewTopic
+  messages: NewMessage[]
+}
+
 export class ChatMigrator extends BaseMigrator {
   readonly id = 'chat'
   readonly name = 'ChatData'
-  readonly description = 'Migrate chat data'
+  readonly description = 'Migrate chat topics and messages'
   readonly order = 4

-  async prepare(): Promise<PrepareResult> {
-    logger.info('ChatMigrator.prepare - placeholder implementation')
+  // Prepared data for execution
+  private topicCount = 0
+  private messageCount = 0
+  private blockLookup: Map<string, OldBlock> = new Map()
+  private assistantLookup: Map<string, OldAssistant> = new Map()
+  // Topic metadata from Redux (name, pinned, etc.) - Dexie only has messages
+  private topicMetaLookup: Map<string, OldTopicMeta> = new Map()
+  // Topic → AssistantId mapping from Redux (Dexie topics don't store assistantId)
+  private topicAssistantLookup: Map<string, string> = new Map()
+  private skippedTopics = 0
+  private skippedMessages = 0
+  // Track seen message IDs to handle duplicates across topics
+  private seenMessageIds = new Set<string>()
+  // Block statistics for diagnostics
+  private blockStats = { requested: 0, resolved: 0, messagesWithMissingBlocks: 0, messagesWithEmptyBlocks: 0 }

-    // TODO: Implement when chat tables are created
-    // 1. Check if topics.json export file exists
-    // 2. Validate JSON format with sample read
-    // 3. Count total topics and estimate message count
-    // 4. Check for data integrity (e.g., messages have valid topic references)
+  /**
+   * Prepare phase - validate source data and count items
+   *
+   * Steps:
+   * 1. Check if topics.json and message_blocks.json exist
+   * 2. Load all blocks into memory for fast lookup
+   * 3. Load assistant data for generating meta
+   * 4. Count topics and estimate message count
+   * 5. Validate sample data for integrity
+   */
+  async prepare(ctx: MigrationContext): Promise<PrepareResult> {
+    const warnings: string[] = []

-    return {
-      success: true,
-      itemCount: 0,
-      warnings: ['ChatMigrator not yet implemented - waiting for chat tables']
-    }
-  }
+    try {
+      // Step 1: Verify export files exist
+      const topicsExist = await ctx.sources.dexieExport.tableExists('topics')
+      if (!topicsExist) {
+        logger.warn('topics.json not found, skipping chat migration')
+        return {
+          success: true,
+          itemCount: 0,
+          warnings: ['topics.json not found - no chat data to migrate']
+        }
+      }

-  async execute(): Promise<ExecuteResult> {
-    logger.info('ChatMigrator.execute - placeholder implementation')
+      const blocksExist = await ctx.sources.dexieExport.tableExists('message_blocks')
+      if (!blocksExist) {
+        warnings.push('message_blocks.json not found - messages will have empty blocks')
+      }

-    // TODO: Implement when chat tables are created
-    // Use streaming JSON reader for large message files:
-    //
-    // const streamReader = _ctx.sources.dexieExport.createStreamReader('topics')
-    // await streamReader.readInBatches<OldTopic>(
-    //   BATCH_SIZE,
-    //   async (topics, batchIndex) => {
-    //     // 1. Insert topics
-    //     // 2. Extract and insert messages from each topic
-    //     // 3. Report progress
-    //   }
-    // )
+      // Step 2: Load all blocks into lookup map
+      // Blocks table is typically smaller than messages, safe to load entirely
+      if (blocksExist) {
+        logger.info('Loading message blocks into memory...')
+        const blocks = await ctx.sources.dexieExport.readTable<OldBlock>('message_blocks')
+        this.blockLookup = buildBlockLookup(blocks)
+        logger.info(`Loaded ${this.blockLookup.size} blocks into lookup map`)
+      }

-    return {
-      success: true,
-      processedCount: 0
-    }
-  }
+      // Step 3: Load assistant data for generating AssistantMeta
+      // Also extract topic metadata from assistants (Redux stores topic metadata in assistants.topics[])
+      const assistantState = ctx.sources.reduxState.getCategory<AssistantState>('assistants')
+      if (assistantState?.assistants) {
+        for (const assistant of assistantState.assistants) {
+          this.assistantLookup.set(assistant.id, assistant)

-  async validate(): Promise<ValidateResult> {
-    logger.info('ChatMigrator.validate - placeholder implementation')
+          // Extract topic metadata from this assistant's topics array
+          // Redux stores topic metadata (name, pinned, etc.) but with messages: []
+          // Also track topic → assistantId mapping (Dexie doesn't store assistantId)
+          if (assistant.topics && Array.isArray(assistant.topics)) {
+            for (const topic of assistant.topics) {
+              if (topic.id) {
+                this.topicMetaLookup.set(topic.id, topic)
+                this.topicAssistantLookup.set(topic.id, assistant.id)
+              }
+            }
+          }
+        }
+        logger.info(
+          `Loaded ${this.assistantLookup.size} assistants and ${this.topicMetaLookup.size} topic metadata entries`
+        )
+      } else {
+        warnings.push('No assistant data found - topics will have null assistantMeta and missing names')
+      }

-    // TODO: Implement when chat tables are created
-    // 1. Count validation for topics and messages
-    // 2. Sample validation (check a few topics have correct message counts)
-    // 3. Reference integrity validation
+      // Step 4: Count topics and estimate messages
+      const topicReader = ctx.sources.dexieExport.createStreamReader('topics')
+      this.topicCount = await topicReader.count()
+      logger.info(`Found ${this.topicCount} topics to migrate`)

-    return {
-      success: true,
-      errors: [],
-      stats: {
-        sourceCount: 0,
-        targetCount: 0,
-        skippedCount: 0
-      }
-    }
-  }
-}
+      // Estimate message count from sample
+      if (this.topicCount > 0) {
+        const sampleTopics = await topicReader.readSample<OldTopic>(10)
+        const avgMessagesPerTopic =
+          sampleTopics.reduce((sum, t) => sum + (t.messages?.length || 0), 0) / sampleTopics.length
+        this.messageCount = Math.round(this.topicCount * avgMessagesPerTopic)
+        logger.info(`Estimated ${this.messageCount} messages based on sample`)
+      }
+
+      // Step 5: Validate sample data
+      if (this.topicCount > 0) {
+        const sampleTopics = await topicReader.readSample<OldTopic>(5)
+        for (const topic of sampleTopics) {
+          if (!topic.id) {
+            warnings.push(`Found topic without id - will be skipped`)
+          }
+          if (!topic.messages || !Array.isArray(topic.messages)) {
+            warnings.push(`Topic ${topic.id} has invalid messages array`)
+          }
+        }
+      }
+
+      logger.info('Prepare phase completed', {
+        topics: this.topicCount,
+        estimatedMessages: this.messageCount,
+        blocks: this.blockLookup.size,
+        assistants: this.assistantLookup.size
+      })
+
+      return {
+        success: true,
+        itemCount: this.topicCount,
+        warnings: warnings.length > 0 ? warnings : undefined
+      }
+    } catch (error) {
+      logger.error('Prepare failed', error as Error)
+      return {
+        success: false,
+        itemCount: 0,
+        warnings: [error instanceof Error ? error.message : String(error)]
+      }
+    }
+  }

+  /**
+   * Execute phase - perform the actual data migration
+   *
+   * Processing strategy:
+   * 1. Stream topics in batches to control memory
+   * 2. For each topic batch:
+   *    a. Transform topics and their messages
+   *    b. Build message tree structure
+   *    c. Insert topics in single transaction
+   *    d. Insert messages in batched transactions
+   * 3. Report progress throughout
+   */
+  async execute(ctx: MigrationContext): Promise<ExecuteResult> {
+    if (this.topicCount === 0) {
+      logger.info('No topics to migrate')
+      return { success: true, processedCount: 0 }
+    }

+    let processedTopics = 0
+    let processedMessages = 0
+
+    try {
+      const db = ctx.db
+      const topicReader = ctx.sources.dexieExport.createStreamReader('topics')
+
+      // Process topics in batches
+      await topicReader.readInBatches<OldTopic>(TOPIC_BATCH_SIZE, async (topics, batchIndex) => {
+        logger.debug(`Processing topic batch ${batchIndex + 1}`, { count: topics.length })
+
+        // Transform all topics and messages in this batch
+        const preparedData: PreparedTopicData[] = []
+
+        for (const oldTopic of topics) {
+          try {
+            const prepared = this.prepareTopicData(oldTopic)
+            if (prepared) {
+              preparedData.push(prepared)
+            } else {
+              this.skippedTopics++
+            }
+          } catch (error) {
+            logger.warn(`Failed to transform topic ${oldTopic.id}`, { error })
+            this.skippedTopics++
+          }
+        }
+
+        // Insert topics in a transaction
+        if (preparedData.length > 0) {
+          await db.transaction(async (tx) => {
+            // Insert topics
+            const topicValues = preparedData.map((d) => d.topic)
+            await tx.insert(topicTable).values(topicValues)
+
+            // Collect all messages, handling duplicate IDs by generating new ones
+            const allMessages: NewMessage[] = []
+            for (const data of preparedData) {
+              for (const msg of data.messages) {
+                if (this.seenMessageIds.has(msg.id)) {
+                  const newId = uuidv4()
+                  logger.warn(`Duplicate message ID found: ${msg.id}, assigning new ID: ${newId}`)
+                  msg.id = newId
+                }
+                this.seenMessageIds.add(msg.id)
+                allMessages.push(msg)
+              }
+            }
+
+            // Insert messages in batches (SQLite parameter limit)
+            for (let i = 0; i < allMessages.length; i += MESSAGE_INSERT_BATCH_SIZE) {
+              const batch = allMessages.slice(i, i + MESSAGE_INSERT_BATCH_SIZE)
+              await tx.insert(messageTable).values(batch)
+            }
+
+            processedMessages += allMessages.length
+          })
+
+          processedTopics += preparedData.length
+        }
+
+        // Report progress
+        const progress = Math.round((processedTopics / this.topicCount) * 100)
+        this.reportProgress(
+          progress,
+          // zh-CN UI string: "Migrated X/Y conversations, Z messages"
+          `已迁移 ${processedTopics}/${this.topicCount} 个对话,${processedMessages} 条消息`
+        )
+      })
+
+      logger.info('Execute completed', {
+        processedTopics,
+        processedMessages,
+        skippedTopics: this.skippedTopics,
+        skippedMessages: this.skippedMessages
+      })
+
+      // Log block statistics for diagnostics
+      logger.info('Block migration statistics', {
+        blocksRequested: this.blockStats.requested,
+        blocksResolved: this.blockStats.resolved,
+        blocksMissing: this.blockStats.requested - this.blockStats.resolved,
+        messagesWithEmptyBlocks: this.blockStats.messagesWithEmptyBlocks,
+        messagesWithMissingBlocks: this.blockStats.messagesWithMissingBlocks
+      })
+
+      return {
+        success: true,
+        processedCount: processedTopics
+      }
+    } catch (error) {
+      logger.error('Execute failed', error as Error)
+      return {
+        success: false,
+        processedCount: processedTopics,
+        error: error instanceof Error ? error.message : String(error)
+      }
+    }
+  }

+  /**
+   * Validate phase - verify migrated data integrity
+   *
+   * Validation checks:
+   * 1. Topic count matches source (minus skipped)
+   * 2. Message count is within expected range
+   * 3. Sample topics have correct structure
+   * 4. Foreign key integrity (messages belong to existing topics)
+   */
+  async validate(ctx: MigrationContext): Promise<ValidateResult> {
+    const errors: ValidationError[] = []
+    const db = ctx.db
+
+    try {
+      // Count topics in target
+      const topicResult = await db.select({ count: sql<number>`count(*)` }).from(topicTable).get()
+      const targetTopicCount = topicResult?.count ?? 0
+
+      // Count messages in target
+      const messageResult = await db.select({ count: sql<number>`count(*)` }).from(messageTable).get()
+      const targetMessageCount = messageResult?.count ?? 0
+
+      logger.info('Validation counts', {
+        sourceTopics: this.topicCount,
+        targetTopics: targetTopicCount,
+        skippedTopics: this.skippedTopics,
+        targetMessages: targetMessageCount
+      })
+
+      // Validate topic count
+      const expectedTopics = this.topicCount - this.skippedTopics
+      if (targetTopicCount < expectedTopics) {
+        errors.push({
+          key: 'topic_count',
+          message: `Topic count mismatch: expected ${expectedTopics}, got ${targetTopicCount}`
+        })
+      }
+
+      // Sample validation: check a few topics have messages
+      const sampleTopics = await db.select().from(topicTable).limit(5).all()
+      for (const topic of sampleTopics) {
+        const msgCount = await db
+          .select({ count: sql<number>`count(*)` })
+          .from(messageTable)
+          .where(eq(messageTable.topicId, topic.id))
+          .get()
+
+        if (msgCount?.count === 0) {
+          // This is a warning, not an error - some topics may legitimately have no messages
+          logger.warn(`Topic ${topic.id} has no messages after migration`)
+        }
+      }
+
+      // Check for orphan messages (messages without valid topic)
+      // This shouldn't happen due to foreign key constraints, but verify anyway
+      const orphanCheck = await db
+        .select({ count: sql<number>`count(*)` })
+        .from(messageTable)
+        .where(sql`${messageTable.topicId} NOT IN (SELECT id FROM ${topicTable})`)
+        .get()
+
+      if (orphanCheck && orphanCheck.count > 0) {
+        errors.push({
+          key: 'orphan_messages',
+          message: `Found ${orphanCheck.count} orphan messages without valid topics`
+        })
+      }
+
+      return {
+        success: errors.length === 0,
+        errors,
+        stats: {
+          sourceCount: this.topicCount,
+          targetCount: targetTopicCount,
+          skippedCount: this.skippedTopics
+        }
+      }
+    } catch (error) {
+      logger.error('Validation failed', error as Error)
+      return {
+        success: false,
+        errors: [
+          {
+            key: 'validation',
+            message: error instanceof Error ? error.message : String(error)
+          }
+        ],
+        stats: {
+          sourceCount: this.topicCount,
+          targetCount: 0,
+          skippedCount: this.skippedTopics
+        }
+      }
+    }
+  }

+  /**
+   * Prepare a single topic and its messages for migration
+   *
+   * @param oldTopic - Source topic from Dexie (has messages, may lack metadata)
+   * @returns Prepared data or null if topic should be skipped
+   *
+   * ## Data Merging
+   *
+   * Topic data comes from two sources:
+   * - Dexie `topics` table: Has `id`, `messages[]`, `assistantId`
+   * - Redux `assistants[].topics[]`: Has metadata (`name`, `pinned`, `prompt`, etc.)
+   *
+   * We merge Redux metadata into the Dexie topic before transformation.
+   */
+  private prepareTopicData(oldTopic: OldTopic): PreparedTopicData | null {
+    // Validate required fields
+    if (!oldTopic.id) {
+      logger.warn('Topic missing id, skipping')
+      return null
+    }
+
+    // Merge topic metadata from Redux (name, pinned, etc.)
+    // Dexie topics may have stale or missing metadata; Redux is authoritative for these fields
+    const topicMeta = this.topicMetaLookup.get(oldTopic.id)
+    if (topicMeta) {
+      // Merge Redux metadata into Dexie topic
+      // Note: Redux topic.name can also be empty from ancient version migrations (see store/migrate.ts:303-305)
+      oldTopic.name = topicMeta.name || oldTopic.name
+      oldTopic.pinned = topicMeta.pinned ?? oldTopic.pinned
+      oldTopic.prompt = topicMeta.prompt ?? oldTopic.prompt
+      oldTopic.isNameManuallyEdited = topicMeta.isNameManuallyEdited ?? oldTopic.isNameManuallyEdited
+      // Use Redux timestamps if available and Dexie lacks them
+      if (topicMeta.createdAt && !oldTopic.createdAt) {
+        oldTopic.createdAt = topicMeta.createdAt
+      }
+      if (topicMeta.updatedAt && !oldTopic.updatedAt) {
+        oldTopic.updatedAt = topicMeta.updatedAt
+      }
+    }
+
+    // Fallback: If name is still empty after merge, use a default name
+    // This handles cases where both Dexie and Redux have empty names (ancient version bug)
+    if (!oldTopic.name) {
+      oldTopic.name = 'Unnamed Topic' // Default fallback for topics with no name
+    }
+
+    // Get assistantId from Redux mapping (Dexie topics don't store assistantId)
+    // Fall back to oldTopic.assistantId in case Dexie did store it (defensive)
+    const assistantId = this.topicAssistantLookup.get(oldTopic.id) || oldTopic.assistantId
+    if (assistantId && !oldTopic.assistantId) {
+      oldTopic.assistantId = assistantId
+    }
+
+    // Get assistant for meta generation
+    const assistant = this.assistantLookup.get(assistantId) || null
+
+    // Get messages array (may be empty or undefined)
+    const oldMessages = oldTopic.messages || []
+
+    // Build message tree structure
+    const messageTree = buildMessageTree(oldMessages)
+
+    // === First pass: identify messages to skip (no blocks) ===
+    const skippedMessageIds = new Set<string>()
+    const messageParentMap = new Map<string, string | null>() // messageId -> parentId
+
+    for (const oldMsg of oldMessages) {
+      const blockIds = oldMsg.blocks || []
+      const blocks = resolveBlocks(blockIds, this.blockLookup)
+
+      // Track block statistics for diagnostics
+      this.blockStats.requested += blockIds.length
+      this.blockStats.resolved += blocks.length
+      if (blockIds.length === 0) {
+        this.blockStats.messagesWithEmptyBlocks++
+      } else if (blocks.length < blockIds.length) {
+        this.blockStats.messagesWithMissingBlocks++
+        if (blocks.length === 0) {
+          logger.warn(`Message ${oldMsg.id} has ${blockIds.length} block IDs but none found in message_blocks`)
+        }
+      }
+
+      // Store parent info from tree
+      const treeInfo = messageTree.get(oldMsg.id)
+      messageParentMap.set(oldMsg.id, treeInfo?.parentId ?? null)
+
+      // Mark for skipping if no blocks
+      if (blocks.length === 0) {
+        skippedMessageIds.add(oldMsg.id)
+        this.skippedMessages++
+      }
+    }
+
+    // === Helper: resolve parent through skipped messages ===
+    // If parentId points to a skipped message, follow the chain to find a non-skipped ancestor
+    const resolveParentId = (parentId: string | null): string | null => {
+      let currentParent = parentId
+      const visited = new Set<string>() // Prevent infinite loops
+
+      while (currentParent && skippedMessageIds.has(currentParent)) {
+        if (visited.has(currentParent)) {
+          // Circular reference, break out
+          return null
+        }
+        visited.add(currentParent)
+        currentParent = messageParentMap.get(currentParent) ?? null
+      }
+
+      return currentParent
+    }
+
+    // === Second pass: transform messages that have blocks ===
+    const newMessages: NewMessage[] = []
+    for (const oldMsg of oldMessages) {
+      // Skip messages marked for skipping
+      if (skippedMessageIds.has(oldMsg.id)) {
+        continue
+      }
+
+      try {
+        const treeInfo = messageTree.get(oldMsg.id)
+        if (!treeInfo) {
+          logger.warn(`Message ${oldMsg.id} not found in tree, using defaults`)
+          continue
+        }
+
+        // Resolve blocks for this message (we know it has blocks from first pass)
+        const blockIds = oldMsg.blocks || []
+        const blocks = resolveBlocks(blockIds, this.blockLookup)
+
+        // Resolve parentId through any skipped messages
+        const resolvedParentId = resolveParentId(treeInfo.parentId)
+
+        // Get assistant for this message (may differ from topic's assistant)
+        const msgAssistant = this.assistantLookup.get(oldMsg.assistantId) || assistant
+
+        const newMsg = transformMessage(
+          oldMsg,
+          resolvedParentId, // Use resolved parent instead of original
+          treeInfo.siblingsGroupId,
+          blocks,
+          msgAssistant,
+          oldTopic.id
+        )
+
+        newMessages.push(newMsg)
+      } catch (error) {
+        logger.warn(`Failed to transform message ${oldMsg.id}`, { error })
+        this.skippedMessages++
+      }
+    }
+
+    // Calculate activeNodeId based on migrated messages (not original messages)
+    // If no messages were migrated, set to null
+    let activeNodeId: string | null = null
+    if (newMessages.length > 0) {
+      // Use the last migrated message as active node
+      activeNodeId = newMessages[newMessages.length - 1].id
+    }
+
+    // Transform topic with correct activeNodeId
+    const newTopic = transformTopic(oldTopic, assistant, activeNodeId)
+
+    return {
+      topic: newTopic,
+      messages: newMessages
+    }
+  }
+}

src/main/data/migration/v2/migrators/README-ChatMigrator.md (new file, 138 lines)
@@ -0,0 +1,138 @@
# ChatMigrator
|
||||
|
||||
The `ChatMigrator` handles the largest data migration task: topics and messages from Dexie/IndexedDB to SQLite.
|
||||
|
||||
## Data Sources
|
||||
|
||||
| Data | Source | File/Path |
|
||||
|------|--------|-----------|
|
||||
| Topics with messages | Dexie `topics` table | `topics.json` |
|
||||
| Topic metadata (name, pinned, etc.) | Redux `assistants[].topics[]` | `ReduxStateReader.getCategory('assistants')` |
|
||||
| Message blocks | Dexie `message_blocks` table | `message_blocks.json` |
|
||||
| Assistants (for meta) | Redux `assistants` slice | `ReduxStateReader.getCategory('assistants')` |
|
||||
|
### Topic Data Split (Important!)

The old system stores topic data in **two separate locations**:

1. **Dexie `topics` table**: contains only `id` and a `messages[]` array (NO `assistantId`!)
2. **Redux `assistants[].topics[]`**: contains metadata (`name`, `pinned`, `prompt`, `isNameManuallyEdited`) and, implicitly, the `assistantId` (from the parent assistant)

Redux deliberately clears `messages[]` to reduce storage size. The migrator merges these sources:

- Messages come from Dexie
- Metadata (name, pinned, etc.) comes from Redux
- `assistantId` comes from the Redux structure (each assistant owns its topics)
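The merge described above can be sketched roughly as follows. This is a hypothetical helper, not the migrator's actual code; the interface shapes and the `mergeTopic` name are assumptions based on the field names listed in this README.

```typescript
// Hypothetical sketch of the Dexie/Redux topic merge (names assumed).
interface DexieTopic {
  id: string
  messages: unknown[] // only Dexie has the messages
}

interface ReduxTopicMeta {
  id: string
  name: string
  pinned?: boolean
  prompt?: string
  isNameManuallyEdited?: boolean
}

interface ReduxAssistant {
  id: string
  topics: ReduxTopicMeta[] // Redux owns topic metadata, messages[] is cleared
}

interface MergedTopic extends ReduxTopicMeta {
  assistantId: string | null
  messages: unknown[]
}

// Walk the Redux assistants to find which one owns the topic; merge its
// metadata onto the Dexie record. Topics owned by no assistant fall back
// to Dexie values with a null assistantId.
function mergeTopic(dexieTopic: DexieTopic, assistants: ReduxAssistant[]): MergedTopic {
  for (const assistant of assistants) {
    const meta = assistant.topics.find((t) => t.id === dexieTopic.id)
    if (meta) {
      return { ...meta, assistantId: assistant.id, messages: dexieTopic.messages }
    }
  }
  // Missing assistantId case from the data-quality table below.
  return {
    id: dexieTopic.id,
    name: 'Unnamed Topic', // fallback for the ancient empty-name bug
    assistantId: null,
    messages: dexieTopic.messages
  }
}
```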
## Key Transformations

1. **Linear → Tree Structure**
   - Old: Messages stored as a linear array in `topic.messages[]`
   - New: Tree via `parentId` + `siblingsGroupId`

2. **Multi-model Responses**
   - Old: `askId` links responses to the user message; `foldSelected` marks the active one
   - New: Shared `parentId` + non-zero `siblingsGroupId` groups siblings

3. **Block Inlining**
   - Old: `message.blocks: string[]` (IDs) + separate `message_blocks` table
   - New: `message.data.blocks: MessageDataBlock[]` (inline JSON)

4. **Citation Migration**
   - Old: Separate `CitationMessageBlock` with `response`, `knowledge`, `memories`
   - New: Merged into `MainTextBlock.references` as `ContentReference[]`

5. **Mention Migration**
   - Old: `message.mentions: Model[]`
   - New: `MentionReference[]` in `MainTextBlock.references`
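A minimal sketch of transformations 1 and 2 (linear → tree, multi-model grouping) is shown below. It is deliberately simplified: it parents each user message to the previous user turn rather than to the previous response subtree, and all type and function names (`OldMsg`, `buildTree`) are illustrative, not the actual `ChatMappings` implementation.

```typescript
// Simplified sketch: convert a linear message array into parentId +
// siblingsGroupId form. Assumes assistant responses carry an `askId`
// pointing at their user message, per the old schema described above.
interface OldMsg {
  id: string
  role: 'user' | 'assistant'
  askId?: string
}

interface TreeMsg {
  id: string
  parentId: string | null
  siblingsGroupId: number // 0 = no siblings group
}

function buildTree(linear: OldMsg[]): TreeMsg[] {
  const out: TreeMsg[] = []
  let prevUserId: string | null = null
  let groupCounter = 0
  const groupByAskId = new Map<string, number>()

  for (const msg of linear) {
    if (msg.role === 'user') {
      // Simplification: chain user turns onto the previous user turn.
      out.push({ id: msg.id, parentId: prevUserId, siblingsGroupId: 0 })
      prevUserId = msg.id
    } else {
      // Responses hang off the message identified by askId.
      const parentId = msg.askId ?? prevUserId
      // Multiple responses sharing one askId (multi-model ask) become
      // siblings with a common non-zero group id.
      const siblingCount = linear.filter((m) => m.role === 'assistant' && m.askId === msg.askId).length
      let groupId = 0
      if (msg.askId && siblingCount > 1) {
        if (!groupByAskId.has(msg.askId)) groupByAskId.set(msg.askId, ++groupCounter)
        groupId = groupByAskId.get(msg.askId)!
      }
      out.push({ id: msg.id, parentId, siblingsGroupId: groupId })
    }
  }
  return out
}
```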
## Data Quality Handling

The migrator handles potential data inconsistencies from the old system:

| Issue | Detection | Handling |
|-------|-----------|----------|
| **Duplicate message ID** | Same ID appears in multiple topics | Generate new UUID, log warning |
| **TopicId mismatch** | `message.topicId` ≠ parent `topic.id` | Use correct parent `topic.id` (silent fix) |
| **Missing blocks** | Block ID not found in `message_blocks` | Skip missing block (silent) |
| **Invalid topic** | Topic missing required `id` field | Skip entire topic |
| **Missing topic metadata** | Topic not found in Redux `assistants[].topics[]` | Use Dexie values, fallback name if empty |
| **Missing assistantId** | Topic not in any `assistant.topics[]` | `assistantId` and `assistantMeta` will be null |
| **Empty topic name** | Both Dexie and Redux have an empty `name` (ancient bug) | Use fallback "Unnamed Topic" |
| **Message with no blocks** | `blocks` array is empty after resolution | Skip message, re-link children to parent's parent |
| **Topic with no messages** | All messages skipped (no blocks) | Keep topic, set `activeNodeId` to null |
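The "re-link children to parent's parent" repair in the table above can be sketched as a small tree operation. The helper name `relinkAfterSkip` and the `Node` shape are hypothetical; the point is only that dropping a node must not orphan its children.

```typescript
// Sketch: when a message with no blocks is skipped, re-parent its children
// to the skipped node's own parent so the tree stays connected.
interface Node {
  id: string
  parentId: string | null
}

function relinkAfterSkip(nodes: Node[], skippedId: string): Node[] {
  const skipped = nodes.find((n) => n.id === skippedId)
  const grandparent = skipped ? skipped.parentId : null
  return nodes
    .filter((n) => n.id !== skippedId) // drop the empty message
    .map((n) => (n.parentId === skippedId ? { ...n, parentId: grandparent } : n))
}
```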
## Field Mappings

### Topic Mapping

Topic data is merged from Dexie + Redux before transformation:

| Source | Target (topicTable) | Notes |
|--------|---------------------|-------|
| Dexie: `id` | `id` | Direct copy |
| Redux: `name` | `name` | Merged from Redux `assistants[].topics[]` |
| Redux: `isNameManuallyEdited` | `isNameManuallyEdited` | Merged from Redux |
| Redux: (parent `assistant.id`) | `assistantId` | From `topicAssistantLookup` mapping |
| (from Assistant) | `assistantMeta` | Generated from assistant entity |
| Redux: `prompt` | `prompt` | Merged from Redux |
| (computed) | `activeNodeId` | Last message ID or `foldSelected` |
| (none) | `groupId` | null (new field) |
| (none) | `sortOrder` | 0 (new field) |
| Redux: `pinned` | `isPinned` | Merged from Redux, renamed |
| (none) | `pinnedOrder` | 0 (new field) |
| `createdAt` | `createdAt` | ISO string → timestamp |
| `updatedAt` | `updatedAt` | ISO string → timestamp |

**Dropped fields**: `type` (`'chat' | 'session'`)
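Rendered as code, the topic mapping above looks roughly like this. The function is a sketch (the real `transformTopic` in `mappings/ChatMappings.ts` takes different inputs), but each line corresponds to one row of the table, including the `pinned` → `isPinned` rename and the ISO-string → timestamp conversion.

```typescript
// Sketch of the topic field mapping (shapes assumed, not the real code).
interface MergedTopicIn {
  id: string
  name: string
  pinned?: boolean
  prompt?: string
  isNameManuallyEdited?: boolean
  assistantId: string | null
  createdAt: string // ISO string in the old store
  updatedAt: string
}

function transformTopicSketch(old: MergedTopicIn, activeNodeId: string | null) {
  return {
    id: old.id, // direct copy
    name: old.name || 'Unnamed Topic', // empty-name fallback
    isNameManuallyEdited: old.isNameManuallyEdited ?? false,
    assistantId: old.assistantId,
    prompt: old.prompt ?? null,
    activeNodeId, // computed from migrated messages
    groupId: null, // new field
    sortOrder: 0, // new field
    isPinned: old.pinned ?? false, // renamed from `pinned`
    pinnedOrder: 0, // new field
    createdAt: Date.parse(old.createdAt), // ISO string -> timestamp
    updatedAt: Date.parse(old.updatedAt)
    // dropped: type ('chat' | 'session')
  }
}
```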
### Message Mapping

| Source (OldMessage) | Target (messageTable) | Notes |
|---------------------|-----------------------|-------|
| `id` | `id` | Direct copy (new UUID if duplicate) |
| (computed) | `parentId` | From tree-building algorithm |
| (from parent topic) | `topicId` | Uses parent `topic.id` for consistency |
| `role` | `role` | Direct copy |
| `blocks` + `mentions` + citations | `data` | Complex transformation |
| (extracted) | `searchableText` | Extracted from text blocks |
| `status` | `status` | Normalized to success/error/paused |
| (computed) | `siblingsGroupId` | From multi-model detection |
| `assistantId` | `assistantId` | Direct copy |
| `modelId` | `modelId` | Direct copy |
| (from `Message.model`) | `modelMeta` | Generated from model entity |
| `traceId` | `traceId` | Direct copy |
| `usage` + `metrics` | `stats` | Merged into a single stats object |
| `createdAt` | `createdAt` | ISO string → timestamp |
| `updatedAt` | `updatedAt` | ISO string → timestamp |

**Dropped fields**: `type`, `useful`, `enabledMCPs`, `agentSessionId`, `providerMetadata`, `multiModelMessageStyle`, `askId` (replaced by `parentId`), `foldSelected` (replaced by `siblingsGroupId`)
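The `searchableText` row deserves a concrete illustration: once blocks are inlined as JSON, SQLite cannot search inside them cheaply, so a plain-text projection of the text blocks is stored alongside. The sketch below assumes a minimal block shape with `type` and `content` fields; the real extraction may cover more block types.

```typescript
// Sketch: derive searchableText by concatenating main-text block content.
interface DataBlock {
  type: string
  content?: string
}

function extractSearchableText(blocks: DataBlock[]): string {
  return blocks
    .filter((b) => b.type === 'main_text' && typeof b.content === 'string')
    .map((b) => b.content!.trim())
    .join('\n')
}
```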
### Block Type Mapping

| Old Type | New Type | Notes |
|----------|----------|-------|
| `main_text` | `MainTextBlock` | Direct; references added from citations/mentions |
| `thinking` | `ThinkingBlock` | `thinking_millsec` → `thinkingMs` |
| `translation` | `TranslationBlock` | Direct copy |
| `code` | `CodeBlock` | Direct copy |
| `image` | `ImageBlock` | `file.id` → `fileId` |
| `file` | `FileBlock` | `file.id` → `fileId` |
| `video` | `VideoBlock` | Direct copy |
| `tool` | `ToolBlock` | Direct copy |
| `citation` | (removed) | Converted to `MainTextBlock.references` |
| `error` | `ErrorBlock` | Direct copy |
| `compact` | `CompactBlock` | Direct copy |
| `unknown` | (skipped) | Placeholder blocks are dropped |
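A per-block transform following this table might look like the sketch below. Only three representative cases are shown, and the old-block field names (`thinking_millsec`, `content`) are assumptions taken from the table itself; returning `null` models the "removed"/"skipped" rows.

```typescript
// Illustrative per-block transform (three cases only; names assumed).
type OldBlock =
  | { type: 'thinking'; content: string; thinking_millsec?: number }
  | { type: 'citation'; response?: unknown[] }
  | { type: 'main_text'; content: string }

function transformBlock(block: OldBlock): Record<string, unknown> | null {
  switch (block.type) {
    case 'thinking':
      // thinking_millsec -> thinkingMs rename from the table above.
      return { type: 'thinking', content: block.content, thinkingMs: block.thinking_millsec ?? 0 }
    case 'citation':
      // Citation blocks do not survive as blocks: their payload becomes
      // MainTextBlock.references, so the block itself is dropped here.
      return null
    case 'main_text':
      return { type: 'main_text', content: block.content }
  }
}
```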
## Implementation Files

- `ChatMigrator.ts` - Main migrator class with prepare/execute/validate phases
- `mappings/ChatMappings.ts` - Pure transformation functions and type definitions
## Code Quality

All implementation code includes detailed comments:

- File-level comments: describe purpose, data flow, and overview
- Function-level comments: purpose, parameters, return values, side effects
- Logic-block comments: step-by-step explanations for complex logic
- Data-transformation comments: old field → new field mapping relationships
1168 src/main/data/migration/v2/migrators/mappings/ChatMappings.ts (new file)
File diff suppressed because it is too large