
Computer Engineering (计算机工程)


Keyword-Semantic Fusion Based Efficient Ciphertext Retrieval Scheme

  • Published: 2025-09-29

Abstract: Searchable encryption enables search over encrypted cloud data and thereby eases the constraints of limited local storage and computing resources. However, most existing schemes rely on either keyword term-frequency statistics or purely semantic retrieval, and cannot handle retrieval tasks that combine keywords and semantics. Most also use tree-based storage structures, which retrieve inefficiently on large-scale datasets. This paper therefore proposes an efficient hybrid ciphertext retrieval scheme built on the Milvus vector database and its built-in Hierarchical Navigable Small World (HNSW) graph index. The scheme uses the BGE-M3 general-purpose text embedding model from the Beijing Academy of Artificial Intelligence (BAAI) to extract high-quality document semantic vectors and keyword vectors. It encrypts the original vectors with cryptographic techniques including AES, the HMAC-based Extract-and-Expand Key Derivation Function (HKDF), and random matrix transformation, then builds an HNSW index over the encrypted vectors and stores it in Milvus. At query time, a dynamic weighted fusion ranking re-ranks the semantic and keyword retrieval results, achieving real-time, efficient ciphertext retrieval on large-scale data. The scheme also supports dynamic insertion, update, and deletion, and scales well. Experiments on real-world datasets show that, while preserving data security, the proposed scheme improves retrieval efficiency and accuracy and reduces computational overhead.
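The abstract does not spell out the construction, so the following is only a minimal sketch of how three of the named pieces could fit together, under stated assumptions. It assumes that the random matrix is orthogonal (so inner products between vectors survive the transformation, which is what lets a similarity index operate on transformed vectors), that the matrix is seeded from an HKDF-derived key, and that fusion is a plain weighted sum with a caller-supplied weight. All function names, the salt and info labels, and the sample scores are hypothetical. AES is omitted because Python's standard library has no AES primitive.

```python
import hashlib
import hmac
import math
import random


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF as specified in RFC 5869, using HMAC-SHA256 (extract, then expand)."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def secret_orthogonal_matrix(key: bytes, dim: int) -> list:
    """Assumption: build an orthogonal matrix by Gram-Schmidt on key-seeded noise.

    Python's `random` module is not a cryptographic generator; it is used
    here purely to keep the illustration dependency-free.
    """
    rng = random.Random(int.from_bytes(key, "big"))
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove the components along rows already accepted
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def transform(matrix: list, vec: list) -> list:
    """Multiply the secret matrix by a plaintext embedding vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse(semantic: dict, keyword: dict, weight: float) -> list:
    """Rank document ids by weight * semantic + (1 - weight) * keyword score."""
    ids = set(semantic) | set(keyword)
    scored = {
        d: weight * semantic.get(d, 0.0) + (1 - weight) * keyword.get(d, 0.0)
        for d in ids
    }
    return sorted(scored, key=lambda d: (-scored[d], d))


# Hypothetical inputs: a master secret, two toy embeddings, two score tables.
key = hkdf_sha256(b"master secret", b"example-salt", b"vector-transform", 32)
Q = secret_orthogonal_matrix(key, 4)
doc, query = [0.1, 0.7, 0.2, 0.4], [0.3, 0.5, 0.1, 0.6]
enc_doc, enc_query = transform(Q, doc), transform(Q, query)
ranking = fuse({"d1": 0.9, "d2": 0.4}, {"d2": 0.8, "d3": 0.5}, weight=0.6)
```

In this sketch `dot(enc_doc, enc_query)` equals `dot(doc, query)` up to floating-point error, so nearest-neighbor ordering is unchanged by the transformation. How the scheme actually chooses its matrix and adjusts the fusion weight dynamically is not stated in the abstract.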