Abstract:
Real-time analysis of fine-grained log data can help people gain personalized insights into their business. For example, real-time analysis of e-commerce log data reveals recent changes in the browsing and shopping behavior of specific customers, which enables us to provide personalized recommendations. To make such analysis possible, log data must be loaded into the data warehouse quickly and without loss. This paper proposes a lossless staging and fast loading solution for log data. Building on open-source tools such as Kafka, HDFS, and Spark, we have designed and implemented an entity-fiber-based method for partitioning and staging log data, as well as a parallel loading algorithm. Our scheme achieves a data staging throughput of around 390,000 records/s and a data loading throughput of around 160,000 records/s.
Published in: 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
Date of Conference: 16-18 December 2016
Date Added to IEEE Xplore: 08 June 2017