A new generation of health care IT systems is collecting and storing more and more patient data. Useful knowledge can be extracted from the data in electronic medical records (EMR) or personal health records (PHR) to provide medical advice to patients, and the statistics produced by data analysis can support scientific research. However, frameworks based on relational database management systems (RDBMSs) cannot meet the requirements of massive health care data storage, management, and analysis. To solve this problem, this paper proposes a massive data management and analysis solution based on Hadoop to achieve better performance, scalability, and fault tolerance. The data management framework is presented, and two data analysis methods are proposed, one based on MapReduce and one based on Hive. Experimental results for data upload, data query, and data analysis show that the performance of the proposed framework is greatly improved. The performance of the MapReduce and Hive methods and the differences between them are also briefly discussed.