Dynamic Reputation Measurement for Online Services Combining Large Language Models and Rainbow DQN

doi:10.19678/j.issn.1000-3428.0253433

Abstract

Online service reputation measurement aggregates user feedback into a reputation score that helps users judge a service's credibility when they lack sufficient information. Because the service environment evolves, however, service quality, the number of users, and user preferences all change over time, and methods that measure reputation at a single point in time cannot reflect these changes promptly or accurately. In addition, measurement mechanisms that do not aim to maximize user group satisfaction struggle to motivate users to give evaluations consistent with their real experience, so some services are assigned inaccurate reputation values. To address these issues, this paper proposes an online service reputation measurement method that maximizes user group satisfaction. First, reputation measurement in a dynamic environment is modeled as a Partially Observable Markov Decision Process (POMDP) optimization problem whose objective is to maximize user group satisfaction. Second, because users in a group apply inconsistent evaluation criteria, large language models (LLMs) are used to compute the reward function, which serves as the measure of user group satisfaction. Finally, the Rainbow DQN algorithm is used to solve the optimization problem. Experiments are conducted on two public datasets, MovieLens and Yelp, with multiple LLMs used for evaluation. The results show that the proposed method produces reputation measurements consistent with the preferences of most users, thereby maximizing user group satisfaction and verifying the method's effectiveness.

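The abstract does not specify the states, actions, or reward of the POMDP, so the following is only a minimal sketch of the kind of interaction loop such a formulation implies. Everything in it is an assumption made for illustration: the names `group_satisfaction`, `ToyReputationEnv`, `greedy_policy`, and `run_episode` are hypothetical, the tolerance-based agreement score merely stands in for the LLM-computed reward, and the greedy policy stands in for an agent trained with Rainbow DQN.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def group_satisfaction(reputation: float, ratings: Sequence[float],
                       tolerance: float = 0.5) -> float:
    """Hypothetical stand-in for the LLM-based reward (an assumption):
    the share of users whose rating lies within `tolerance` of the
    published reputation value."""
    if not ratings:
        return 0.0
    agree = sum(1 for r in ratings if abs(r - reputation) <= tolerance)
    return agree / len(ratings)


@dataclass
class ToyReputationEnv:
    """Toy environment: each step reveals one batch of user ratings.
    The underlying service quality is never observed directly, which is
    what makes the problem partially observable."""
    rating_batches: List[List[float]]
    levels: Tuple[float, ...] = (1.0, 2.0, 3.0, 4.0, 5.0)  # assumed action set
    t: int = 0

    def reset(self) -> List[float]:
        self.t = 0
        return self.rating_batches[0]

    def step(self, action: int):
        # Reward: satisfaction of the current user group with the chosen level.
        reward = group_satisfaction(self.levels[action], self.rating_batches[self.t])
        self.t += 1
        done = self.t >= len(self.rating_batches)
        obs = [] if done else self.rating_batches[self.t]
        return obs, reward, done


def greedy_policy(obs: Sequence[float], levels: Sequence[float]) -> int:
    """Placeholder for a learned policy (NOT Rainbow DQN): pick the level
    that satisfies the most users in the currently observed batch."""
    return max(range(len(levels)), key=lambda a: group_satisfaction(levels[a], obs))


def run_episode(env: ToyReputationEnv) -> float:
    """Run one episode and return the accumulated group satisfaction."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = greedy_policy(obs, env.levels)
        obs, reward, done = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    env = ToyReputationEnv([[4.0, 4.0, 5.0, 2.0], [3.0, 3.0, 3.0, 1.0]])
    print(run_episode(env))
```

In a fuller implementation, the reward call would presumably be replaced by an LLM-based judgment and the greedy rule by a value network trained with Rainbow DQN; neither is reproduced here because the abstract gives no details about them.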

HE Yaojie, FU Xiaodong. Dynamic Reputation Measurement for Online Services Combining Large Language Models and Rainbow DQN[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0253433.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0253433

