A data warehouse (DW) refers to technologies for collecting, integrating, and analyzing large volumes of homogeneous and heterogeneous data to provide information that enables better decision making. To fulfill the main purpose of a data warehouse, namely presenting analytical responses to online queries, many parameters must be considered throughout the development life cycle. Among all the factors affecting DW efficiency, data quality deserves the most serious attention. Today's data warehouse architecture typically consists of several components that consolidate data from multiple operational and historical databases to support a variety of front-end query, reporting, and analytical tools. The back end of the architecture relies mainly on the Extract-Transform-Load (ETL) process, which is usually preferred to be realized as a tool. Designing and implementing an application-dependent ETL process that pipelines validated and verified data is labor intensive and typically consumes a large fraction of the effort in data warehouse projects. The outcome of our experiment, building a DW from thirty-three million actual population records following the recommended methodology, confirms that the DW development life cycle has to be revisited. Many works have reported on the impact of data quality on DW efficiency, but less attention has been paid to the data engineering aspects of revising the development life cycle to obtain an efficient DW. Our investigation through this experiment shows that the following three steps facilitate the life cycle process and yield a more tailored DW: 1) data cleaning as a pre-processing phase before data cleansing in ETL; 2) identifying query types and their operations before the transform phase of ETL; 3) identifying and materializing a suitable view for each query before the load phase of ETL. The results, regarding accuracy, effort, and time, have been tested and are significantly promising.
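The three steps above can be illustrated with a minimal sketch of such a revised ETL pipeline. This is an assumption-laden illustration, not the paper's implementation: all function names, field names (e.g. `national_id`, `region`), and the choice of a simple count aggregate are hypothetical.

```python
# Hypothetical sketch of the three-step revised ETL life cycle:
# 1) pre-clean before ETL cleansing, 2) transform guided by known
# query types, 3) materialize one view per query before load.
# All names and fields are illustrative assumptions.

from collections import defaultdict

def pre_clean(records):
    """Step 1: data cleaning as a pre-process BEFORE ETL cleansing.
    Drop records that are structurally unusable (missing key fields),
    so the cleansing phase works on a smaller, saner input."""
    return [r for r in records if r.get("national_id") and r.get("birth_year")]

def transform(records, query_types):
    """Step 2: transform guided by the query types identified up front.
    Keep only the attributes the known queries actually need."""
    needed = set().union(*query_types.values())
    return [{k: r[k] for k in needed if k in r} for r in records]

def materialize_views(records, query_types):
    """Step 3: build one materialized (pre-aggregated) view per query
    type before loading, so online queries read precomputed results."""
    views = {}
    for qname, attrs in query_types.items():
        view = defaultdict(int)
        key_attrs = sorted(attrs)
        for r in records:
            key = tuple(r.get(a) for a in key_attrs)
            view[key] += 1  # simple count aggregate as a placeholder
        views[qname] = dict(view)
    return views

# Example: two hypothetical population queries and a tiny record set.
query_types = {
    "by_region": {"region"},
    "by_region_and_year": {"region", "birth_year"},
}
raw = [
    {"national_id": "1", "region": "north", "birth_year": 1990},
    {"national_id": "2", "region": "north", "birth_year": 1990},
    {"national_id": None, "region": "south", "birth_year": 1985},  # dropped in step 1
]
views = materialize_views(transform(pre_clean(raw), query_types), query_types)
print(views["by_region"])  # {('north',): 2}
```

The key design point the sketch tries to convey is ordering: knowing the query workload before the transform and load phases lets the pipeline discard unneeded attributes early and precompute exactly the views those queries will hit.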