Data engineering approach to efficient data warehouse: Life cycle development revisited

Authors:

Negin Daneshpour (Department of Computer Engineering & Information Technology, Amirkabir University of Technology, Tehran, Iran); Ahmad Abdollahzadeh Barfourosh

A data warehouse (DW) comprises technologies for collecting, integrating, and analyzing large volumes of homogeneous and heterogeneous data to provide information that enables better decision making. To achieve the main purpose of a data warehouse, which is answering online analytical queries, many parameters must be considered throughout the development life cycle. Among all the factors affecting DW efficiency, data quality deserves particular attention. A typical data warehouse architecture today consists of several components that consolidate data from multiple operational and historical databases to support a variety of front-end query, reporting, and analytical tools. The back end of the architecture relies mainly on the Extract-Transform-Load (ETL) process, which we usually prefer to implement as a tool. Designing and implementing an application-dependent ETL process that pipelines validated and verified data is labor-intensive and typically consumes a large fraction of the effort in data warehouse projects. The outcome of our experiment, in which we built a DW following the recommended methodology on thirty-three million actual population records, confirms that the DW development life cycle has to be revisited. Many works have reported on the impact of data quality on DW efficiency, but less attention has been paid to the data engineering aspects of revising the development life cycle to obtain an efficient DW. Our investigation through this experiment shows that the following three steps facilitate the life cycle process and yield a better-tailored DW:
1) Data cleaning as a pre-processing phase before the data cleansing performed in ETL.
2) Identifying the query types and their operations before the transform phase of ETL.
3) Identifying and materializing a suitable view for each query before the load phase of ETL.
The results with respect to accuracy, effort, and time have been tested and are highly promising.
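The three revised steps can be illustrated with a small Python sketch of where each one sits in the ETL flow. This is a hypothetical, minimal illustration under stated assumptions, not the authors' implementation: the record fields (`id`, `province`, `birth_year`), the normalization and validity rules, the `Query` type, the function names, and the restriction to `count` aggregates are all invented for illustration, and "load" is reduced to returning the materialized views in a dictionary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query descriptor: grouping dimensions plus an aggregate."""
    name: str
    group_by: tuple  # dimension names, e.g. ("province",)
    op: str          # aggregate operation; only "count" is sketched here


def pre_clean(raw):
    """Step 1 (assumed rules): before ETL cleansing, trim and lower-case
    string fields, then drop records that become exact duplicates."""
    seen, out = set(), []
    for r in raw:
        norm = {k: v.strip().lower() if isinstance(v, str) else v for k, v in r.items()}
        key = tuple(sorted(norm.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            out.append(norm)
    return out


def cleanse(records):
    """ETL cleansing (assumed rules): require a non-empty id and a plausible birth year."""
    return [
        r for r in records
        if r.get("id")
        and isinstance(r.get("birth_year"), int)
        and 1900 <= r["birth_year"] <= 2011
    ]


def classify(queries):
    """Step 2: identify each query's type (dimensions and operation) before transforming."""
    types = {}
    for q in queries:
        if q.op != "count":
            raise ValueError(f"unsupported operation in this sketch: {q.op}")
        types[q.name] = tuple(q.group_by)
    return types


def transform(records, dims):
    """ETL transform: keep only the dimensions that the classified queries use."""
    return [{d: r[d] for d in dims} for r in records]


def materialize(rows, query_types):
    """Step 3: build one pre-aggregated (count) view per query before loading."""
    return {
        name: Counter(tuple(row[d] for d in dims) for row in rows)
        for name, dims in query_types.items()
    }


def run_pipeline(raw, queries):
    query_types = classify(queries)  # step 2 runs before the transform phase
    dims = sorted({d for ds in query_types.values() for d in ds})
    rows = transform(cleanse(pre_clean(raw)), dims)
    return materialize(rows, query_types)  # "load" here just returns the views


if __name__ == "__main__":
    raw = [
        {"id": "1", "province": " Tehran ", "birth_year": 1980},
        {"id": "1", "province": "tehran", "birth_year": 1980},  # duplicate after normalization
        {"id": "2", "province": "Fars", "birth_year": 1975},
        {"id": "", "province": "Fars", "birth_year": 1990},     # rejected: empty id
    ]
    queries = [
        Query("by_province", ("province",), "count"),
        Query("by_province_year", ("province", "birth_year"), "count"),
    ]
    for name, view in run_pipeline(raw, queries).items():
        print(name, dict(view))
```

In this sketch, knowing the query types before the transform lets the pipeline carry only the dimensions those queries need, and pre-aggregating one view per query trades extra storage for faster answers at query time. A real warehouse would typically express step 3 as materialized views or summary tables in the target DBMS.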

Published in:

2011 CSI International Symposium on Computer Science and Software Engineering (CSSE)

Date of Conference:

15-16 June 2011