[1] |
EIKVIL L.Information extraction from World Wide Web:a survey[M].Oslo,Norway:Norwegian Computing Center,1999:8-9.
[2] |
VAPNIK V N.The nature of statistical learning theory[M].Berlin,Germany:Springer,1995.
[3] |
HAMMER J,MCHUGH J,GARCIA-MOLINA H.Semistructured data:the TSIMMIS experience[C]//Proceedings of East-European Conference on Advances in Databases and Information Systems.Swindon,UK:British Computer Society,1997:1-8.
[4] |
LIU Ling,PU Calton,HAN Wei.XWRAP:an XML-enabled wrapper construction system for Web information sources[C]//Proceedings of International Conference on Data Engineering.Washington D.C.,USA:IEEE Press,2000:611-621.
[5] |
CRESCENZI V,MECCA G,MERIALDO P.RoadRunner:automatic data extraction from data-intensive web sites[C]//Proceedings of ACM SIGMOD International Conference on Management of Data.New York,USA:ACM Press,2002:624-624.
[6] |
FINN A,KUSHMERICK N,SMYTH B.Fact or fiction:content classification for digital libraries[EB/OL].[2018-03-01].https://www.ercim.eu/publication/ws-proceedings/DelNoe02/AidanFinn.pdf.
[7] |
MANTRATZIS C,ORGUN M,CASSIDY S.Separating XHTML content from navigation clutter using DOM-structure block analysis[C]//Proceedings of ACM Conference on Hypertext and Hypermedia.New York,USA:ACM Press,2005:145-147.
[8] |
[9] |
SONG Ruihua,LIU Haifeng,WEN Jirong,et al.Learning important models for Web page blocks based on layout and content analysis[J].ACM SIGKDD Explorations Newsletter,2004,6(2):14-23.
[10] |
[11] |
GIBSON J,WELLNER B,LUBAR S.Adaptive Web-page content identification[C]//Proceedings of ACM International Workshop on Web Information and Data Management.New York,USA:ACM Press,2007:105-112.
[12] |
CAI Deng,YU Shipeng,WEN Jirong,et al.VIPS:a vision based page segmentation algorithm[EB/OL].[2018-03-01].https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2003-79.pdf.
[13] |
[14] |
[15] |
WANG Hui,YU Bo,HONG Yu,et al.Web information extraction system based on knowledge graph[J].Computer Engineering,2017,43(6):118-124.(in Chinese)