
Computer Engineering

   

Dynamic Reputation Measurement for Online Services Combining Large Language Models and Rainbow DQN

  

  • Published: 2026-05-29


Abstract: Online service reputation measurement aggregates user feedback into a service reputation that helps users judge a service's credibility when they lack sufficient information. However, the service environment evolves dynamically: service quality, the number of users, and user preferences all keep changing over time, so reputation measurement methods that focus on a single time point cannot reflect these changes in a timely and accurate manner. In addition, reputation measurement mechanisms that do not consider maximizing user group satisfaction struggle to attract users to give evaluations consistent with their real experience, which leads to some services being assigned inaccurate reputation values. To address these issues, this paper proposes an online service reputation measurement method that maximizes user group satisfaction. First, online service reputation measurement in a dynamic environment is modeled as a Partially Observable Markov Decision Process (POMDP) optimization problem whose objective is to maximize user group satisfaction. Second, because evaluation criteria are inconsistent across the user group, large language models (LLMs) are used to compute the reward function, which in turn measures user group satisfaction. Finally, the Rainbow DQN algorithm is used to solve the optimization problem. Experiments are conducted on two public datasets, MovieLens and Yelp, and multiple LLMs are used for evaluation. The results show that the proposed method produces reputation measurements consistent with the preferences of most users, thereby maximizing user group satisfaction and verifying the method's effectiveness.

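As a rough illustration of how a group-satisfaction reward can feed a value-based learner, the sketch below computes a double Q-learning target, one of the components that Rainbow DQN combines. It is not the paper's implementation: the function names `group_satisfaction` and `double_dqn_target`, the use of a plain mean over per-user scores in place of an LLM-computed reward, and all numeric values are assumptions made purely for illustration.

```python
from statistics import mean


def group_satisfaction(scores):
    """Hypothetical reward: mean of per-user satisfaction scores in [0, 1].

    In the abstract, this quantity is computed with large language models;
    a simple average stands in for that step here.
    """
    if not scores:
        return 0.0
    return mean(scores)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double Q-learning target for a single transition.

    The online network's Q-values select the next action, and the target
    network's Q-values evaluate it, which reduces overestimation bias.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Illustrative values only (not taken from the paper).
reward = group_satisfaction([0.8, 0.6, 1.0])
target = double_dqn_target(
    reward,
    gamma=0.9,
    q_online_next=[0.2, 0.5, 0.1],   # online net prefers action 1
    q_target_next=[0.3, 0.4, 0.9],   # target net evaluates action 1 at 0.4
    done=False,
)
print(round(reward, 2), round(target, 2))
```

Here the online estimates pick action 1, so the target bootstraps from the target network's value for that action rather than from its own maximum. Rainbow DQN additionally uses prioritized replay, dueling networks, multi-step returns, distributional value estimates, and noisy layers, none of which are shown in this sketch.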