Entity Fiber Based Partitioning, No Loss Staging and Fast Loading of Log Data | IEEE Conference Publication | IEEE Xplore

Abstract:

Real-time analysis of fine-grained log data can help businesses gain personalized insights. For example, real-time analysis of e-commerce log data reveals recent changes in the browsing and shopping behavior of specific customers, which makes personalized recommendations possible. To support such analysis, log data must be loaded into the data warehouse quickly and without loss. This paper proposes a no-loss staging and fast loading solution for log data. Building on open-source tools such as Kafka, HDFS, and Spark, we have designed and implemented an entity-fiber-based log data partitioning and staging method, as well as a parallel loading algorithm. Our scheme achieves a data staging throughput of around 390,000 records/s and a data loading throughput of around 160,000 records/s.
Date of Conference: 16-18 December 2016
Date Added to IEEE Xplore: 08 June 2017
Conference Location: Guangzhou, China
