Design and Implementation of Bilingual Parallel Web Page Mining System

doi:10.3969/j.issn.1000-3428.2009.14.093

Computer Engineering, 2009, Vol. 35, Issue (14): 267-269.

• Developmental Research •

CHEN Wei, HUANG Lei, LIU Feng, ZHAO Zhi-hong

(Institute of Software, Nanjing University, Nanjing 210089)

Online: 2009-07-20 Published: 2009-07-20

Abstract

Bilingual corpora are a critical resource for developing statistical machine translation systems. This paper presents a method that automatically mines bilingual parallel Web pages from the Web. Unlike traditional approaches that mine parallel pages from pre-specified Web sites, the method mines them from the entire Web, so it adapts well to new content domains and language pairs. A parallel Web page mining system based on the method is implemented. Experimental results show that the system can provide large-scale, high-quality parallel Web pages for statistical machine translation.

Key words: natural language processing, statistical machine translation, bilingual corpora, Web mining

CLC Number: TP312

CHEN Wei; HUANG Lei; LIU Feng; ZHAO Zhi-hong. Design and Implementation of Bilingual Parallel Web Page Mining System[J]. Computer Engineering, 2009, 35(14): 267-269.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2009.14.093

http://www.ecice06.com/EN/Y2009/V35/I14/267
