Discovering Most Important Data Quality Dimensions Using Latent Semantic Analysis | IEEE Conference Publication | IEEE Xplore


Abstract:

Big Data quality is an emerging field. Many authors now agree that data quality remains highly relevant, even for Big Data uses. However, there is a lack of frameworks or guidelines on how to carry out big data quality initiatives. The starting point of any data quality work is to determine the properties of data quality, termed data quality dimensions (DQDs). Even these dimensions lack rigorous definitions in the existing literature. This research aims to contribute towards identifying the most important DQDs for big data in the health industry. It continues previous work, which identified the five most important DQDs using a technique based on human judgement, known as the inner hermeneutic cycle. To remove potential bias arising from human judgement, this research uses the same set of literature but applies latent semantic analysis, a statistical technique for extracting knowledge from a set of documents. The results confirm only two of the previously identified most important DQDs, namely accuracy and completeness.
Date of Conference: 06-07 August 2018
Date Added to IEEE Xplore: 16 September 2018
Conference Location: Durban, South Africa
