[1] EIKVIL L.Information extraction from World Wide Web:a survey[M].Oslo,Norway:Norwegian Computing Center,1999:8-9.

[2] VAPNIK V N.The nature of statistical learning theory[M].Berlin,Germany:Springer,1995.

[3] HAMMER J,MCHUGH J,GARCIA-MOLINA H.Semistructured data:the TSIMMIS experience[C]//Proceedings of East-European Conference on Advances in Databases and Information Systems.Swindon,UK:British Computer Society,1997:1-8.

[4] LIU Ling,PU Calton,HAN Wei.XWRAP:an XML-enabled wrapper construction system for Web information sources[C]//Proceedings of International Conference on Data Engineering.Washington D.C.,USA:IEEE Press,2000:611-621.

[5] CRESCENZI V,MECCA G,MERIALDO P.RoadRunner:automatic data extraction from data-intensive web sites[C]//Proceedings of ACM SIGMOD International Conference on Management of Data.New York,USA:ACM Press,2002:624.

[6] FINN A,KUSHMERICK N,SMYTH B.Fact or fiction:content classification for digital libraries[EB/OL].[2018-03-01].https://www.ercim.eu/publication/ws-proceedings/DelNoe02/AidanFinn.pdf.

[7] MANTRATZIS C,ORGUN M,CASSIDY S.Separating XHTML content from navigation clutter using DOM-structure block analysis[C]//Proceedings of ACM Conference on Hypertext and Hypermedia.New York,USA:ACM Press,2005:145-147.

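[8] SUN Chengjie,GUAN Yi.Research on a statistics-based method for extracting main-text information from Web pages[J].Journal of Chinese Information Processing,2004,18(5):18-23.
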
[9] SONG Ruihua,LIU Haifeng,WEN Jirong,et al.Learning important models for Web page blocks based on layout and content analysis[J].ACM SIGKDD Explorations Newsletter,2004,6(2):14-23.

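[10] HU Guoping,ZHANG Wei,WANG Renhua.Accurate extraction of main text from news Web pages based on two-layer decision[J].Journal of Chinese Information Processing,2006,20(6):1-9.
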
[11] GIBSON J,WELLNER B,LUBAR S.Adaptive Web-page content identification[C]//Proceedings of ACM International Workshop on Web Information and Data Management.New York,USA:ACM Press,2007:105-112.

[12] CAI Deng,YU Shipeng,WEN Jirong,et al.VIPS:a vision-based page segmentation algorithm[EB/OL].[2018-03-01].https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2003-79.pdf.

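[13] LI Lei,WANG Jinlin,BAI He,et al.Research and implementation of an FFT-based algorithm for Web page main-text extraction[J].Computer Engineering and Applications,2007,43(30):148-151.
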
[14] ZHU Zede,LI Miao,ZHANG Jian,et al.Web main-text extraction based on a text density model[J].Pattern Recognition and Artificial Intelligence,2013,26(7):667-672.

[15] WANG Hui,YU Bo,HONG Yu,et al.Web information extraction system based on knowledge graph[J].Computer Engineering,2017,43(6):118-124.