LU Ti-guang, LIU Xin, LIU Ren-ren
Web crawlers and microblog APIs, the usual means of collecting microblog data, currently struggle to meet the data demands of public opinion systems. To address this problem, this paper presents a feasible solution that logs in to the microblog site in the same way a browser does and captures data from the resulting Web pages. In this way it can easily obtain all the data of any microblog user. On this basis, it constructs a microblog network from the interconnections among users and discovers new users through that network. To obtain high-quality data, it builds a mathematical model that calculates each user's influence index from the number of posts, posting frequency, number of fans, number of forwards, and number of comments. It then builds a priority queue from the calculated influence index, so that users with a larger influence index are crawled more frequently. Finally, it calculates a time interval to balance the lower acquisition frequency of inactive microblog users. Experimental results show that the method is simple to run and fast, obtains high-quality information, and is highly versatile.
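The scheduling idea summarized above can be illustrated with a minimal sketch. The abstract does not give the actual model, so everything specific here is an assumption made for illustration: the weights, the logarithmic damping, the interval formula, and the names `UserStats`, `influence_index`, `crawl_interval`, and `crawl_order` are all hypothetical. The sketch scores each user from the five listed features, maps the score to a crawl interval, and uses a heap ordered by next due time so that high-influence users are revisited more often while an upper bound on the interval keeps inactive users from being starved.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStats:
    name: str
    posts: int            # number of posts
    posts_per_day: float  # posting frequency
    fans: int             # number of fans (followers)
    forwards: int         # number of forwards (reposts)
    comments: int         # number of comments


# Hypothetical weights; the paper's actual model is not given in the abstract.
WEIGHTS = (0.1, 0.2, 0.3, 0.2, 0.2)


def influence_index(u: UserStats) -> float:
    """Weighted sum of log-damped features (an assumed form, not the paper's)."""
    features = (u.posts, u.posts_per_day, u.fans, u.forwards, u.comments)
    return sum(w * math.log1p(x) for w, x in zip(WEIGHTS, features))


def crawl_interval(index: float, min_s: float = 60.0, max_s: float = 3600.0) -> float:
    """Higher influence gives a shorter interval, clamped to [min_s, max_s].

    The max_s cap is the assumed balancing step: even a user with zero
    influence is revisited at least once every max_s seconds.
    """
    return max(min_s, min(max_s, max_s / (1.0 + index)))


def crawl_order(users, horizon_s: float):
    """Simulate which user is crawled at which time, up to horizon_s seconds."""
    intervals = {u.name: crawl_interval(influence_index(u)) for u in users}
    heap = [(0.0, u.name) for u in users]  # (next due time, user name)
    heapq.heapify(heap)
    visits = []
    while heap and heap[0][0] <= horizon_s:
        due, name = heapq.heappop(heap)
        visits.append((due, name))
        # Re-queue the user at the next due time.
        heapq.heappush(heap, (due + intervals[name], name))
    return visits


if __name__ == "__main__":
    users = [
        UserStats("active", 5000, 20.0, 1_000_000, 20_000, 10_000),
        UserStats("inactive", 0, 0.0, 0, 0, 0),
    ]
    for due, name in crawl_order(users, 3600.0):
        print(f"{due:8.1f}s  {name}")
```

In this sketch the heap is keyed on the next due time, not on the influence index directly; influence determines how soon a user re-enters the front of the queue. Keying the queue on the index alone would be simpler but would let low-influence users wait indefinitely, which is the situation the time-interval step is meant to avoid.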