[1] 刘昱然.多模态领域知识图谱构建方法及应用研究[D].银川:北方民族大学, 2020.
LIU Y R.Research on the construction method and application of multi-modality domain knowledge graph[D].Yinchuan:Northern University for Nationalities, 2020.(in Chinese)
[2] 何俊, 张彩庆, 李小珍, 等.面向深度学习的多模态融合技术研究综述[J].计算机工程, 2020, 46(5):1-11.
HE J, ZHANG C Q, LI X Z, et al.Survey of research on multimodal fusion technology for deep learning[J].Computer Engineering, 2020, 46(5):1-11.(in Chinese)
[3] BALTRUSAITIS T, AHUJA C, MORENCY L P.Multimodal machine learning:a survey and taxonomy[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2):423-443.
[4] 卢超.基于多模态知识图谱的图像描述[D].石家庄:河北科技大学, 2020.
LU C.Image description based on multimodal knowledge graph[D].Shijiazhuang:Hebei University of Science and Technology, 2020.(in Chinese)
[5] MOUSSELLY S H, BOTSCHEN T, GUREVYCH I, et al.A multimodal translation-based approach for knowledge graph representation learning[C]//Proceedings of the 7th Joint Conference on Lexical and Computational Semantics.Stroudsburg, USA:Association for Computational Linguistics, 2018:225-234.
[6] ZHU X R, LI Z X, WANG X D, et al.Multi-modal knowledge graph construction and application:a survey[EB/OL].[2022-01-10].https://arxiv.org/abs/2202.05786.
[7] 霍书全.人工智能符号接地问题研究的意义和挑战[J].上海师范大学学报(哲学社会科学版), 2019, 48(3):98-107.
HUO S Q.The research significance and challenge of the AI symbol grounding problem[J].Journal of Shanghai Normal University (Philosophy & Social Sciences Edition), 2019, 48(3):98-107.(in Chinese)
[8] Wikipedia[M].[S.l.]:PediaPress, 2004.
[9] Wikimedia Commons[EB/OL].[2022-01-10].https://commons.wikimedia.org.
[10] FERRADA S, BUSTOS B, HOGAN A.IMGpedia:a linked dataset with content-based analysis of Wikimedia images[C]//Proceedings of the 16th International Semantic Web Conference.Berlin, Germany:Springer, 2017:84-93.
[11] ALBERTS H, HUANG N Y, DESHPANDE Y, et al.VisualSem:a high-quality knowledge graph for vision and language[C]//Proceedings of the 1st Workshop on Multilingual Representation Learning.Stroudsburg, USA:Association for Computational Linguistics, 2021:1-15.
[12] WANG M, WANG H F, QI G L, et al.Richpedia:a large-scale, comprehensive multi-modal knowledge graph[J].Big Data Research, 2020, 22:1-10.
[13] GUNJAN V K, KUMARI M, KUMAR A, et al.Search engine optimization with Google[J].International Journal of Computer Science Issues, 2012, 9(1):206.
[14] OÑORO-RUBIO D, NIEPERT M, GARCÍA-DURÁN A, et al.Answering visual-relational queries in Web-extracted knowledge graphs[EB/OL].[2022-01-10].https://arxiv.org/abs/1709.02314.
[15] LIU Y, LI H, GARCIA-DURAN A, et al.MMKG:multi-modal knowledge graphs[C]//Proceedings of European Semantic Web Conference.Berlin, Germany:Springer, 2019:459-474.
[16] TORRALBA A, FERGUS R, FREEMAN W T.80 million tiny images:a large data set for nonparametric object and scene recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(11):1958-1970.
[17] CHEN X L, SHRIVASTAVA A, GUPTA A.NEIL:extracting visual knowledge from Web data[C]//Proceedings of 2013 IEEE International Conference on Computer Vision.Washington D.C., USA:IEEE Press, 2013:1409-1416.
[18] RADFORD A, KIM J W, HALLACY C, et al.Learning transferable visual models from natural language supervision[EB/OL].[2022-01-10].https://arxiv.org/abs/2103.00020.
[19] SIMONYAN K, ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[EB/OL].[2022-01-10].https://arxiv.org/abs/1409.1556.
[20] BREUNIG M M, KRIEGEL H P, NG R T, et al.LOF:identifying density-based local outliers[C]//Proceedings of 2000 ACM SIGMOD International Conference on Management of Data.New York, USA:ACM Press, 2000:93-104.
[21] ZHAO Z H, GUO S Q, XU Q L, et al.G-Means:a clustering algorithm for intrusion detection[M]//KÖPPEN M, KASABOV N, COGHILL G.Advances in neuro-information processing.Berlin, Germany:Springer, 2009:563-570.
[22] UYAR A, KARAPINAR R.Investigating the precision of Web image search engines for popular and less popular entities[J].Journal of Information Science, 2017, 43(3):378-392.
[23] VRANDEČIĆ D.Wikidata:a new platform for collaborative data collection[C]//Proceedings of the 21st International Conference on World Wide Web.New York, USA:ACM Press, 2012:1063-1064.
[24] JEONG J W, WANG X J, LEE D H.Towards measuring the visualness of a concept[C]//Proceedings of the 21st ACM International Conference on Information and Knowledge Management.New York, USA:ACM Press, 2012:2415-2418.
[25] LIU P F, YUAN W Z, FU J L, et al.Pre-train, prompt, and predict:a systematic survey of prompting methods in natural language processing[EB/OL].[2022-01-10].https://arxiv.org/abs/2107.13586.
[26] BOLLACKER K, EVANS C, PARITOSH P, et al.Freebase:a collaboratively created graph database for structuring human knowledge[C]//Proceedings of 2008 ACM SIGMOD International Conference on Management of Data.New York, USA:ACM Press, 2008:1247-1250.
[27] TSIMPOUKELLI M, MENICK J, CABI S, et al.Multimodal few-shot learning with Frozen language models[EB/OL].[2022-01-10].https://arxiv.org/abs/2106.13884.
[28] DEVLIN J, CHANG M W, LEE K, et al.BERT:pre-training of deep bidirectional transformers for language understanding[EB/OL].[2022-01-10].https://arxiv.org/abs/1810.04805.