Binary-seek-by-word Dictionary Mechanism Based on All-Hash


Computer Engineering, 2011, Vol. 37, Issue (21): 40-42. doi: 10.3969/j.issn.1000-3428.2011.21.014

Networks and Communications


PENG Huan-feng¹, DING Song-tao²

(1. School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China; 2. Software Institute, Nanjing University, Nanjing 210093, China)

Received: 2011-05-16; Online: 2011-11-05; Published: 2011-11-05


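About the authors: PENG Huan-feng (1978-), male, lecturer, M.S.; main research interests: large-volume data processing and search engines. DING Song-tao, lecturer, M.S.
Foundation item:
Scientific Research Foundation of Nanjing Institute of Technology, project "Research on Lucene-based Full-text Search Engines" (QKJB2009026)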

Abstract

To address the low efficiency of the traditional binary-seek-by-word dictionary mechanism for Chinese word segmentation, this paper analyzes several existing dictionary mechanisms and proposes a binary-seek-by-word dictionary mechanism based on all-Hash. The new mechanism divides dictionary entries into groups by the number of characters in each entry and uses the Hash value of the whole word to reduce the number of string comparisons. Theoretical analysis and experimental results show that the new mechanism improves word segmentation efficiency.
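The abstract states that entries are grouped by their number of characters and that the goal is faster word segmentation; maximum match is listed among the keywords. The Python sketch below shows one way such length groups could drive forward maximum matching. It is not the authors' implementation: the function names `build_length_groups` and `forward_max_match` are hypothetical, and Python's hash-based `set` stands in for the paper's whole-word Hash lookup.

```python
from collections import defaultdict


def build_length_groups(words):
    """Group entries by first character, then by length (character count).

    Python's built-in set (hash-based) stands in for the per-group
    whole-word Hash table; the paper's own Hash function is not reproduced.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for w in words:
        if w:
            groups[w[0]][len(w)].add(w)
    return groups


def forward_max_match(text, groups):
    """Forward maximum matching: at each position try the longest entry first.

    Only lengths that actually occur among entries starting with the
    current character are tried, so absent lengths cost no lookups.
    """
    tokens = []
    i = 0
    while i < len(text):
        by_len = groups.get(text[i], {})
        match = text[i]  # fall back to a single character
        for length in sorted(by_len, reverse=True):
            candidate = text[i:i + length]
            if len(candidate) == length and candidate in by_len[length]:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


example_groups = build_length_groups(["中文", "中文分词", "分词", "词典", "机制"])
print(forward_max_match("中文分词词典机制", example_groups))
# prints ['中文分词', '词典', '机制']
```

Trying only the lengths present for the current first character avoids probing lengths that no dictionary entry has.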

Key words: Chinese segmentation, Hash function, binary-seek-by-word, verbatim binary search, maximum match

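Abstract (Chinese version): To improve the segmentation efficiency of the binary-seek-by-word dictionary mechanism, this paper analyzes existing segmentation dictionary mechanisms and proposes a binary-seek-by-word dictionary mechanism based on all-Hash. The mechanism groups entries that share the same first character by character count, computes a Hash over the whole word, and performs binary search among entries with the same Hash value, thereby reducing the number of entry matches. Theoretical analysis and experimental results show that the mechanism achieves high segmentation efficiency.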
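The Chinese abstract describes the lookup more precisely: entries sharing a first character are grouped by character count, a Hash is computed over the whole word, and binary search is performed over entries with the same Hash value. A minimal sketch of that structure follows, assuming CRC-32 (`zlib.crc32`) as a stand-in Hash function; the names `AllHashDict`, `word_hash`, and `contains` are hypothetical and not taken from the paper.

```python
import bisect
import zlib
from collections import defaultdict


def word_hash(word):
    # Stand-in whole-word Hash: CRC-32 of the UTF-8 bytes
    # (an assumption, not the Hash function used in the paper).
    return zlib.crc32(word.encode("utf-8"))


class AllHashDict:
    """First character -> word length -> entries sorted by (hash, word)."""

    def __init__(self, words):
        buckets = defaultdict(lambda: defaultdict(list))
        for w in set(words):
            if w:
                buckets[w[0]][len(w)].append((word_hash(w), w))
        # For each (first char, length) group keep parallel sorted lists.
        self.groups = {}
        for first, by_len in buckets.items():
            self.groups[first] = {}
            for length, entries in by_len.items():
                entries.sort()
                self.groups[first][length] = (
                    [h for h, _ in entries],
                    [w for _, w in entries],
                )
        self.comparisons = 0  # string comparisons performed by lookups

    def contains(self, word):
        if not word:
            return False
        group = self.groups.get(word[0], {}).get(len(word))
        if group is None:
            return False  # no entry with this first char and length
        hashes, words = group
        h = word_hash(word)
        # Binary search on hash values to find the run of equal hashes.
        lo = bisect.bisect_left(hashes, h)
        hi = bisect.bisect_right(hashes, h, lo)
        # Full string comparison only for entries whose hash matches.
        for i in range(lo, hi):
            self.comparisons += 1
            if words[i] == word:
                return True
        return False


dictionary = AllHashDict(["中文", "中文分词", "分词", "词典", "机制"])
print(dictionary.contains("中文分词"), dictionary.contains("中国"))
# prints True False
```

Because each group is sorted by Hash value, the binary search compares integers; a full string comparison is made only for entries whose Hash equals the query's, so with few collisions a lookup typically performs at most one string comparison.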


CLC Number: TP391.1

PENG Huan-feng, DING Song-tao. Binary-seek-by-word Dictionary Mechanism Based on All-Hash[J]. Computer Engineering, 2011, 37(21): 40-42.



URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2011.21.014

http://www.ecice06.com/EN/Y2011/V37/I21/40
