Cloud Computing and Big Data Special Topic

  • WU Yinghao,LING Jie
    Computer Engineering. 2019, 45(3): 36-40. https://doi.org/10.19678/j.issn.1000-3428.0049086
    CSCD(1)

    Most existing data integrity verification methods for cloud storage suffer from low efficiency and high communication overhead. To address this problem, an improved data integrity verification method for cloud storage is proposed. Bilinear pairing is used to verify data integrity, which enables public verification. An index table mechanism is designed to support dynamic verification, and a random masking technique is used to improve the security of the method. Analysis and experimental results show that the method effectively resists malicious server attacks and achieves lower communication overhead and higher computational efficiency.
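
The abstract does not give the construction, so the following is only a minimal Python sketch of how an index table can support dynamic operations. The `IndexTable` class, the `Entry` fields `block_id` and `version`, and the `tag`/`verify` helpers are hypothetical. HMAC-SHA256 stands in for the bilinear-pairing tags, so this sketch offers only private (key-holder) verification and omits the random masking step.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Entry:
    block_id: int  # hypothetical: unique identifier, never reused
    version: int   # incremented on every modification of the block


class IndexTable:
    """Hypothetical index table: logical position -> (block_id, version)."""

    def __init__(self):
        self.entries = []
        self.next_id = 0

    def _new_entry(self):
        entry = Entry(self.next_id, 1)
        self.next_id += 1
        return entry

    def append(self):
        self.entries.append(self._new_entry())
        return len(self.entries) - 1

    def insert(self, pos):
        # Only the table shifts; other blocks keep their tags because tags
        # are bound to (block_id, version), not to the logical position.
        self.entries.insert(pos, self._new_entry())

    def modify(self, pos):
        self.entries[pos].version += 1

    def delete(self, pos):
        del self.entries[pos]


def tag(key, entry, data):
    # Stand-in for a pairing-based tag: HMAC over block_id, version and data.
    msg = f"{entry.block_id}|{entry.version}|".encode() + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key, entry, data, expected):
    return hmac.compare_digest(tag(key, entry, data), expected)
```

Because each tag is bound to a block identifier and version rather than to a position, inserting or deleting a block changes only the table; modifying a block bumps its version, which invalidates the old tag.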

  • GAO Quan,WAN Xiaodong
    Computer Engineering. 2019, 45(3): 32-35,40. https://doi.org/10.19678/j.issn.1000-3428.0049606
    CSCD(4)

    The lookup operation of the FP-Growth algorithm has high time complexity. To address this problem, this paper proposes a new algorithm named LBPFP. Building on the PFP algorithm, LBPFP adds a hash table to the header table to enable fast item access, and uses a workload model based on prefix length to optimize the parallel process and improve efficiency. Comparison experiments on the webdocs.dat dataset show that LBPFP outperforms the PFP, HPFP and DPFP algorithms.
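
A minimal Python sketch of the two ideas named above, under stated assumptions: the header table is a plain hash map from item to its tree nodes, and the workload of an item is assumed (hypothetically) to be the total length of the prefixes that precede it in frequency-ordered transactions. Items are then assigned to groups with a greedy longest-processing-time rule. `FPTree`, `prefix_workload` and `balance` are illustrative names, not the paper's code.

```python
import heapq
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class FPTree:
    def __init__(self):
        self.root = Node(None, None)
        # Header table as a hash map: item -> nodes holding that item,
        # so locating an item's nodes needs no scan of a header list.
        self.header = defaultdict(list)

    def insert(self, transaction):
        """Insert one transaction whose items are already frequency-ordered."""
        node = self.root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.header[item].append(child)
            child.count += 1
            node = child


def prefix_workload(transactions, rank):
    # Hypothetical workload model: the cost of mining an item is the total
    # length of the prefixes preceding it in frequency-ordered transactions.
    load = defaultdict(int)
    for t in transactions:
        for pos, item in enumerate(sorted(t, key=rank.__getitem__)):
            load[item] += pos
    return load


def balance(load, n_groups):
    # Greedy longest-processing-time: heaviest item goes to the lightest group.
    heap = [(0, g) for g in range(n_groups)]
    groups = [[] for _ in range(n_groups)]
    for item, cost in sorted(load.items(), key=lambda kv: -kv[1]):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + cost, g))
    return groups
```

Items deep in the frequency order have long conditional pattern bases, so spreading them across groups by estimated cost, rather than by item count, is the intent of the balancing step.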

  • GAO Jun,HUANG Xiance
    Computer Engineering. 2019, 45(3): 26-31. https://doi.org/10.19678/j.issn.1000-3428.0049976
    CSCD(1)

    The traditional TF-IDF algorithm calculates the relevance weight between keywords and documents only from term frequency and inverse document frequency, ignoring the influence of user interest on weight calculation. To better serve users' information retrieval needs, a relevance weight algorithm based on log association is proposed. From a user-oriented relevance perspective, a user interest model is built by analyzing the user's search logs, and, following the idea of distributed computing, the MapReduce programming framework is used to process the computing tasks in parallel. Experimental results show that the algorithm not only improves efficiency when processing massive data, but also dynamically adjusts the weights of query terms according to the user's historical search records, enhancing the interaction between users and the system.
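
The combination rule is not specified in the abstract, so this Python sketch assumes one: the TF-IDF weight is scaled by `1 + beta * interest(term)`, where `interest` is the share of the user's logged queries containing the term. The map and reduce steps are simulated in-process to show how the counting splits into key-value phases; `map_doc`, `reduce_pairs`, `interest_model`, `weights` and `beta` are hypothetical.

```python
import math
from collections import Counter


def map_doc(doc_id, text):
    # Map phase: emit ((term, doc_id), count) for every term in one document.
    return [((term, doc_id), c) for term, c in Counter(text.split()).items()]


def reduce_pairs(pairs):
    # Reduce phase: sum the counts that share a (term, doc_id) key.
    totals = Counter()
    for key, c in pairs:
        totals[key] += c
    return totals


def interest_model(query_log):
    # Hypothetical interest model: share of logged queries containing each term.
    if not query_log:
        return {}
    hits = Counter(term for q in query_log for term in set(q.split()))
    return {term: h / len(query_log) for term, h in hits.items()}


def weights(docs, query_log, beta=1.0):
    pairs = [p for doc_id, text in docs.items() for p in map_doc(doc_id, text)]
    tf = reduce_pairs(pairs)
    doc_len = Counter()
    for (_, doc_id), c in tf.items():
        doc_len[doc_id] += c
    df = Counter(term for term, _ in tf)
    interest = interest_model(query_log)
    result = {}
    for (term, doc_id), c in tf.items():
        tfidf = (c / doc_len[doc_id]) * math.log(len(docs) / df[term])
        # Assumed combination: boost TF-IDF by the user's interest in the term.
        result[(term, doc_id)] = tfidf * (1 + beta * interest.get(term, 0.0))
    return result
```

With an empty query log every boost factor is 1, so the weights reduce to plain TF-IDF; a term that appears often in the user's history is weighted up in every document that contains it.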

  • ZHANG Wei,WANG Zhijie
    Computer Engineering. 2019, 45(3): 20-25,31. https://doi.org/10.19678/j.issn.1000-3428.0052626

    Distributed systems are an ideal choice for processing join operations on temporal big data, but existing distributed systems do not natively support temporal join queries and cannot meet the low-latency, high-throughput requirements of temporal big data processing. Therefore, a two-level in-memory indexing scheme based on Spark is proposed. A global index is used to prune distributed partitions, and a local temporal index is used to query within each partition, improving data retrieval efficiency. A partitioning method for temporal data is also designed to optimize global pruning. Experimental results on real and synthetic datasets show that the scheme significantly improves the processing efficiency of temporal joins.
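
A single-machine Python sketch of a two-level index for an interval-overlap join, assuming closed intervals `(start, end, payload)` and simple range partitioning on start time (the paper's own partitioning method is not described here). The global level keeps each partition's time bounds for pruning; the local level keeps intervals sorted by start, and the partition's maximum interval length bounds how far back a binary search must look. All names are hypothetical; in a Spark deployment each `Partition` would correspond to a data partition rather than a Python list.

```python
import bisect


class Partition:
    """Local index: intervals sorted by start, plus bounds for the global index."""

    def __init__(self, intervals):
        self.intervals = sorted(intervals)
        self.starts = [iv[0] for iv in self.intervals]
        self.lo = self.starts[0]
        self.hi = max(iv[1] for iv in self.intervals)
        self.max_len = max(iv[1] - iv[0] for iv in self.intervals)


def make_partitions(intervals, k):
    # Hypothetical range partitioning on start time into k similar-sized chunks.
    ivs = sorted(intervals)
    size = max(1, -(-len(ivs) // k))
    return [Partition(ivs[i:i + size]) for i in range(0, len(ivs), size)]


def query(partitions, qs, qe):
    """Return intervals overlapping the closed range [qs, qe]."""
    out = []
    for p in partitions:
        if p.hi < qs or p.lo > qe:  # global index: prune the whole partition
            continue
        # Local index: an overlapping interval starts no earlier than
        # qs - max_len and no later than qe.
        i = bisect.bisect_left(p.starts, qs - p.max_len)
        j = bisect.bisect_right(p.starts, qe)
        out.extend(iv for iv in p.intervals[i:j] if iv[1] >= qs)
    return out


def temporal_join(left, right_partitions):
    return [(lhs, rhs) for lhs in left
            for rhs in query(right_partitions, lhs[0], lhs[1])]
```

Range partitioning on start time keeps each partition's `[lo, hi]` span narrow, which is what lets the global check discard most partitions for a short query interval.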

  • JIANG Meng,YU Minggang,WANG Zhixue
    Computer Engineering. 2019, 45(3): 14-19. https://doi.org/10.19678/j.issn.1000-3428.0052715

    Large-scale ontology mapping in the big data context suffers from high time complexity and low efficiency and accuracy. Therefore, a multi-strategy adaptive large-scale ontology mapping algorithm based on modularization and local confidence is proposed. The algorithm clusters and modularizes the ontologies internally, uses an information retrieval strategy to discover related sub-ontologies with high similarity between modules, computes the local confidence of each mapping strategy across the related sub-ontologies, and adaptively adjusts the weight of each strategy according to its local confidence when combining the mapping results. On this basis, a heuristic greedy strategy is used to extract the mapping results, which are then corrected with mapping rules. Experimental results show that, compared with the Falcon and ASMOV methods, the proposed algorithm achieves higher recall, precision and F-measure.
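
The abstract leaves the confidence measure and combination formula open. This Python sketch assumes a hypothetical local confidence (the share of source entities whose best target also selects them back as its best source), normalizes the confidences into strategy weights, sums the weighted similarity scores, and then extracts a one-to-one mapping greedily above a threshold. The mapping-rule correction step is not modeled, and all function names are illustrative.

```python
def local_confidence(table):
    # Hypothetical measure: share of source entities whose best target also
    # picks them back as its best source (mutual best matches).
    best_t, best_s = {}, {}
    for (s, t), v in table.items():
        if s not in best_t or v > best_t[s][1]:
            best_t[s] = (t, v)
        if t not in best_s or v > best_s[t][1]:
            best_s[t] = (s, v)
    if not best_t:
        return 0.0
    mutual = sum(1 for s, (t, _) in best_t.items() if best_s[t][0] == s)
    return mutual / len(best_t)


def combine(sims):
    # sims: {strategy: {(src, tgt): score}}; weights follow local confidence.
    conf = {name: local_confidence(table) for name, table in sims.items()}
    total = sum(conf.values())
    weights = {name: (c / total if total else 1 / len(sims))
               for name, c in conf.items()}
    combined = {}
    for name, table in sims.items():
        for pair, score in table.items():
            combined[pair] = combined.get(pair, 0.0) + weights[name] * score
    return combined


def greedy_extract(combined, threshold):
    # Heuristic greedy 1:1 extraction: best pairs first, each entity used once.
    used_src, used_tgt, result = set(), set(), []
    for (src, tgt), score in sorted(combined.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break
        if src in used_src or tgt in used_tgt:
            continue
        result.append((src, tgt, score))
        used_src.add(src)
        used_tgt.add(tgt)
    return result
```

A strategy whose best matches are ambiguous (several sources competing for one target) gets lower confidence, so its scores count for less in the combined table.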

  • LIU Biao,WANG Baosheng,DENG Wenping
    Computer Engineering. 2019, 45(3): 7-13. https://doi.org/10.19678/j.issn.1000-3428.0049811
    CSCD(2)

    Cloud computing and container technology make it more convenient to run workflows, but problems remain, such as difficult management, insufficient resource utilization, and low levels of intelligence and automation. Therefore, a containerized workflow framework supporting elastic scaling is proposed. On this basis, a CPU-usage-based automatic scaling model for workflows is presented, which automatically increases the number of containers when a workflow process is overloaded, reducing task waiting time. When the task load decreases, the number of processes can be reduced without losing tasks, saving resources and costs. Experimental results show that the number of process expansions is positively correlated with the processing time, and that the framework effectively eliminates workflow bottlenecks: under overload, the same number of tasks is completed in less time.
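
A hedged Python sketch of a CPU-based scaling decision. The scale-out step borrows the proportional rule used by the Kubernetes Horizontal Pod Autoscaler, `ceil(replicas * current / target)`; the paper's own model may differ. Scale-in removes a container only when no task is queued and that container is idle, which is one simple way to avoid losing tasks. `Policy`, `plan` and all threshold values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Policy:
    target_cpu: float = 0.5   # hypothetical target average CPU usage
    lower_cpu: float = 0.25   # hypothetical scale-in threshold
    min_replicas: int = 1
    max_replicas: int = 10


def plan(policy, running_tasks, avg_cpu, queued):
    """Return ("add", k), ("remove", index) or ("keep", None).

    running_tasks[i] is the number of tasks running in container i.
    """
    n = len(running_tasks)
    if avg_cpu > policy.target_cpu:
        # Proportional scale-out: ceil(n * current / target), capped.
        desired = min(policy.max_replicas,
                      math.ceil(n * avg_cpu / policy.target_cpu))
        if desired > n:
            return ("add", desired - n)
    if avg_cpu < policy.lower_cpu and n > policy.min_replicas and queued == 0:
        # Remove only an idle container so no in-flight task is lost.
        for i, running in enumerate(running_tasks):
            if running == 0:
                return ("remove", i)
    return ("keep", None)
```

Keeping the scale-in threshold well below the target leaves a band in which nothing changes, which avoids oscillating between adding and removing containers.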

  • ZHANG Haoshenglun,LI Chong,KE Yong,ZHANG Shibo
    Computer Engineering. 2019, 45(3): 1-6. https://doi.org/10.19678/j.issn.1000-3428.0050119
    CSCD(1)

    A distributed User Browsing Model (UBM) algorithm is proposed to quickly mine user behavior from massive search click logs. The examination parameter E derived from the original UBM algorithm depends only on the rank of a search result and the position of the previous click, and is very stable. Exploiting this property, the EM iterative solution is transformed into a distributed UBM algorithm that estimates the examination probability by sampling and then solves for the attractiveness. Simulation results on the Spark data platform show that, compared with the original UBM algorithm, the proposed algorithm resolves the severe data skew problem in click logs and is more efficient.
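
Under UBM, a click at rank r occurs with probability α(q,u)·γ(r,d), where α is the attractiveness of document u for query q and γ is the examination probability at rank r with distance d to the previous click. The Python sketch below assumes γ has already been estimated (for example from a sample, as the abstract describes) and runs EM for α only. With γ fixed, each (query, document) pair updates independently, which is what makes a per-key distributed computation possible. The distance convention (previous click rank taken as 0 when there is none) and all function names are assumptions.

```python
from collections import defaultdict


def distances(clicks):
    """(rank, distance) per position; distance = rank - rank of the previous
    click, taking the previous click rank as 0 when there is none (assumed)."""
    prev, out = 0, []
    for rank, clicked in enumerate(clicks, start=1):
        out.append((rank, rank - prev))
        if clicked:
            prev = rank
    return out


def estimate_attractiveness(sessions, gamma, iters=50, init=0.5):
    """EM for attractiveness alpha with examination gamma held fixed.

    sessions: list of (query, docs, clicks); gamma: {(rank, distance): prob}.
    """
    alpha = {}
    for _ in range(iters):
        num, den = defaultdict(float), defaultdict(int)
        for query, docs, clicks in sessions:
            for (rank, dist), doc, clicked in zip(distances(clicks), docs, clicks):
                key = (query, doc)
                a, g = alpha.get(key, init), gamma[(rank, dist)]
                # E-step: posterior that the document is attractive.
                post = 1.0 if clicked else a * (1 - g) / (1 - a * g)
                num[key] += post
                den[key] += 1
        # M-step: per-key average; keys are independent given gamma.
        alpha = {key: num[key] / den[key] for key in num}
    return alpha
```

A click contributes 1 to the posterior sum; a non-click contributes α(1-γ)/(1-αγ), the probability that the document was attractive but not examined.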