fix(memory): CJK keyword search + vector search optimization

- Add trigram FTS5 table for CJK/mixed-language search with BM25 ranking
- Fix three-step search routing: unicode61 (ASCII) → trigram (CJK/mixed) → LIKE fallback
- Fix _bm25_rank_to_score: use abs(rank)/(1+abs(rank)) instead of max(0, rank), which zeroed out the negative ranks returned by FTS5 bm25()
- Replace INSERT OR REPLACE with UPSERT so rowids stay stable for the FTS5 content table
- Fix FTS5 JOIN to use rowid instead of id column
- Fix _search_like: single-char CJK match, dynamic scoring, merged CJK+ASCII path
- Add numpy vectorized cosine similarity + BLOB embedding storage (6x smaller)
- Add _decode_embedding backward compat for legacy JSON embeddings
- Add threading.RLock for concurrent write safety
- Add _meta table to avoid trigram backfill re-running on every startup
- Activate EmbeddingCache in MemoryManager for session-level query deduplication
- Add numpy>=1.24 to requirements.txt
- Merge upstream master (embedding package refactor, FTS5 self-healing methods)
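
The _bm25_rank_to_score fix listed above can be sketched as follows. This is a minimal illustration, not the committed code: the function name bm25_rank_to_score and its signature are assumptions. It relies on the fact that SQLite FTS5 bm25() returns negative values, with better matches being more negative, so max(0, rank) collapses every hit to 0.

```python
def bm25_rank_to_score(rank: float) -> float:
    """Map an FTS5 bm25() rank to a relevance score in [0, 1).

    Hypothetical helper for illustration. FTS5 reports better matches
    as more negative numbers, so the magnitude is used and squashed
    with x / (1 + x), which is monotonic and bounded below 1.
    """
    magnitude = abs(rank)
    return magnitude / (1.0 + magnitude)


def old_rank_to_score(rank: float) -> float:
    """The previous formula, shown only for comparison."""
    return max(0.0, rank)
```

With this mapping a stronger match (for example rank -3.0) scores higher than a weaker one (rank -1.0), whereas the old formula returns 0 for both.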

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yangluxin613
2026-05-25 08:56:08 +08:00
parent 73bf83d2ff
commit bc9c1691f5
3 changed files with 408 additions and 131 deletions

@@ -1,3 +1,4 @@
+numpy>=1.24
 aiohttp>=3.8.6,<3.10
 requests>=2.28.2
 chardet>=5.1.0