[1]CRESCENZI V,MECCA G,MERIALDO P.RoadRunner:towards automatic data extraction from large Web sites[C]//Proceedings of the 27th International Conference on Very Large Data Bases.[S.l.]:Morgan Kaufmann Publishers,2001:109-118.
[2]LIU B,GROSSMAN R,ZHAI Y.Mining data records in Web pages[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York,USA:ACM Press,2003:601-606.
[3]ZHAI Y,LIU B.Web data extraction based on partial tree alignment[C]//Proceedings of the 14th International Conference on World Wide Web.New York,USA:ACM Press,2005:76-85.
[4]BUTTLER D,LIU L,PU C.A fully automated object extraction system for the World Wide Web[C]//Proceedings of the 21st International Conference on Distributed Computing Systems.Washington D.C.,USA:IEEE Computer Society,2001:361-370.
[5]CHANG C,HSU C,LUI S,et al.Automatic information extraction from semi-structured Web pages by pattern discovery[J].Decision Support Systems,2003,35(1):129-147.
[6]LU Y,HE H,ZHAO H,et al.Annotating structured data of the deep Web[C]//Proceedings of the 23rd International Conference on Data Engineering.Washington D.C.,USA:IEEE Press,2007:376-385.
[7]扬少华,林海略,韩燕波.针对模板生成网页的一种数据自动抽取方法[J].软件学报,2008,19(2):209-223.
[8]LIU W,YAN H,XIAO J.Automatically mining review records from forum Web sites[C]//Proceedings of the 7th International Conference on Fuzzy Systems and Knowledge Discovery.Washington D.C.,USA:IEEE Press,2010:2450-2455.
[9]刘伟,严华梁,肖建国,等.一种Web评论自动抽取方法[J].软件学报,2010,21(12):3220-3236.
[10]ZHAO H,MENG W,WU Z,et al.Fully automatic wrapper generation for search engines[C]//Proceedings of the 14th International Conference on World Wide Web.New York,USA:ACM Press,2005:66-75.
[11]KAI S,LAUSEN G.ViPER:augmenting automatic information extraction with visual perceptions[C]//Proceedings of the 14th ACM International Conference on Information and Knowledge Management.New York,USA:ACM Press,2005:381-388.
[12]GUPTA S,KAISER G,NEISTADT D,et al.DOM-based content extraction of HTML documents[C]//Proceedings of the 12th international conference on World Wide Web.New York,USA:ACM Press,2003:207-214.
[13]曹冬林,廖祥文,许洪波,等.基于网页格式信息量的博客文章和评论抽取模型[J].软件学报,2009,20(5):1282-1291.
[14]MCCALLUM A,NIGAM K,RENNIE J,et al.Building domain-specific search engines with machine learning techniques[EB/OL].[2017-08-01].https://www.researchgate.net/publication/228738940_Building_Domain-Specific_Search_Engines_with_Machine_Learning_Techniques.
[15]BERGMARK D,LAGOZE C,SBITYAKOV A.Focused crawls,tunneling,and digital libraries[C]//Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries.Berlin,Germany:Springer,2002:91-106. |