Discovering Most Important Data Quality Dimensions Using Latent Semantic Analysis | IEEE Conference Publication | IEEE Xplore

Abstract:

Big Data quality is an emerging field. Many authors now agree that data quality remains highly relevant, even for Big Data applications. However, there is a lack of frameworks or guidelines for carrying out big data quality initiatives. The starting point of any data quality work is to determine the properties of data quality, termed data quality dimensions (DQDs). Even these dimensions lack rigorous, precise definitions in the existing literature. This research aims to help identify the most important DQDs for big data in the health industry. It continues previous work that identified the five most important DQDs using a human-judgement-based technique known as the inner hermeneutic cycle. To remove potential bias arising from human judgement, this research analyses the same body of literature with latent semantic analysis, a statistical technique for extracting knowledge from a collection of documents. The results confirm only two of the previously identified most important DQDs, namely accuracy and completeness.
Date of Conference: 06-07 August 2018
Date Added to IEEE Xplore: 16 September 2018
Conference Location: Durban, South Africa
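
Latent semantic analysis factorises a term-document matrix A with a singular value decomposition, A = U S V^T, and keeps only the k largest singular values, so that each term is represented by a row of U_k S_k in a low-dimensional latent space. The sketch below illustrates that idea in Python with NumPy. It is not the authors' procedure: the toy corpus DOCS, the candidate list DQD_TERMS, the raw-count weighting (no tf-idf or log-entropy weighting, no stop-word removal), the choice k=2, and the rule of scoring a dimension by the length of its latent term vector are all assumptions made for illustration.

```python
import re

import numpy as np

# Hypothetical toy corpus standing in for the reviewed literature (NOT the paper's data).
DOCS = [
    "accuracy and completeness are core dimensions of health data quality",
    "completeness of records and timeliness matter for big data analytics",
    "accuracy of clinical data affects decisions; consistency is also discussed",
    "data quality dimensions include accuracy completeness consistency and timeliness",
]

# Assumed list of candidate data quality dimensions to score.
DQD_TERMS = ["accuracy", "completeness", "consistency", "timeliness"]


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocab, A) where A[i, j] is the raw count of term i in document j."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return vocab, A


def lsa_term_scores(docs, terms, k=2):
    """Rank terms by the length of their vectors in the rank-k latent space (rows of U_k S_k)."""
    vocab, A = term_document_matrix(docs)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by their singular values.
    term_vectors = U[:, :k] * s[:k]
    index = {t: i for i, t in enumerate(vocab)}
    scores = {
        t: float(np.linalg.norm(term_vectors[index[t]])) for t in terms if t in index
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for term, score in lsa_term_scores(DOCS, DQD_TERMS, k=2):
        print(f"{term:12s} {score:.3f}")
```

With k equal to the number of documents, each score reduces to the plain row norm of the count matrix, because V has orthonormal columns; a smaller k keeps only the dominant latent topics, which is where the smoothing effect of LSA comes from.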
