Cgmquantify: Python and R Software Packages for Comprehensive Analysis of Interstitial Glucose and Glycemic Variability from Continuous Glucose Monitor Data

Goal: Continuous glucose monitoring (CGM) is commonly used in Type 1 diabetes management by clinicians and patients and in diabetes research to understand how factors of longitudinal glucose and glucose variability relate to disease onset and severity and the efficacy of interventions. CGM data presents unique bioinformatic challenges because the data is longitudinal, temporal, and there are infinite ways to summarize and use this data. There are over 25 metrics of glucose variability used clinically and in research, metrics are not standardized, and little validation exists across studies. The primary goal of this work is to present a software resource for systematic, reproducible, and comprehensive analysis of interstitial glucose and glycemic variability from continuous glucose monitor data. Methods: Comprehensive literature review informed the clinically-validated functions developed in this work. Software packages were developed and open-sourced through the Python Package Index (PyPi) and the Comprehensive R Archive Network (CRAN). cgmquantify is integrated into the Digital Biomarker Discovery Pipeline and MD2K Cerebral Cortex. Results: Here we present an open-source software toolbox called cgmquantify, which contains 25 functions calculating 28 clinically validated metrics of glucose and glycemic variability, as well as tools for visualizing longitudinal CGM data. Detailed documentation facilitates modification of existing code by the community for customization of input data and visualizations. Conclusions: We have built systematic functions and documentation of metrics and visualizations into a software resource available in both the Python and R languages. This resource will enable digital biomarker development using continuous glucose monitors.


I. INTRODUCTION
Continuous glucose monitoring (CGM) systems provide realtime, dynamic glucose information by tracking interstitial glucose values throughout the day. CGMs are commonly used in Type 1 diabetes (T1D) management, with 1.2 million diabetic patients using a CGM [1], [2]. These devices have been used extensively by the T1D community, including in the Open Artificial Pancreas System Project (OpenAPS) [3], a project developed to create a patient-implemented closed loop system between a CGM and an insulin pump.
CGM data is commonly provided from CGM manufacturers as either raw glucose values (in a .csv format) or in summary reports that utilize proprietary methods to plot and summarize glucose statistics. Because these algorithms are proprietary, they cannot be externally validated by clinical researchers [4]. Additionally, the provided glucose summaries are limited and do not usually contain any information about an important clinical metric, glycemic variability.
Glycemic variability, also known as glucose variability, is an established risk factor for hypoglycemia [5] and has been shown to be a risk factor in other diabetes complications [6]. Glucose variability is mentioned in over 26000 publications indexed in PubMed at the time of this publication and is a key metric in clinical research [7]. Over 25 metrics of glucose variability have been used in the literature, which makes it difficult to examine and compare results across numerous research studies analyzing and drawing conclusions about glucose variability.
There is a need for an open-source software toolbox containing algorithms that are utilized and validated in clinical research studies. This would enable standardized glucose variability metrics and the ability to compare findings from studies that utilize different metrics of glucose variability. Its availability in an open-source programming language with a low barrier to entry will encourage researchers, clinicians, and patients alike to explore CGM data.   (Table 1). There is currently no comprehensive resource for CGM data in Python, the third most common programming language used globally and the leading language among newcomers to programming [11]. Additionally, previous implementations of open source CGM data analysis present limited metrics of glucose variability. Further, these methods are typically developed for a research specific purpose and without sufficient documentation to be extended to other purposes.
The primary goal of this work is to provide a comprehensive, open-source software resource available across both the Python and R programming languages for systematic and reproducible analysis of continuous glucose monitor data.

II. MATERIALS AND METHODS
cgmquantify is an open-source software Python package and R package composed of 25 functions with 28 clinically validated metrics of glucose and glucose variability, as shown in Table 2. These packages are published under the MIT license.
Customizable visualizations (Fig. 1, Fig. 2) are also included as easy to implement functions. cgmquantify is version controlled through GitHub, the Python Package Index (PyPI), and the Comprehensive R Archive Network (CRAN). This allows for single-line installation in either language. Source code, an extensive user guide, and issue tracking are available on GitHub to facilitate ease of use and enable customization based on user needs. Tests are available in GitHub under the tests subdirectory to allow for manual testing of all functions. Results of tests of the current software versions are available in GitHub. The current software passes all tests. The community can expand and contribute code to cgmquantify through the DBDP (cite).

III. RESULTS
We have included import functions to format data for use with the cgmquantify package. These functions currently support Dexcom and Abbott Freestyle Libre CGM devices, with ongoing work adding data import functions for other CGM manufacturers (e.g., Medtronic). Our user guide also outlines how new data can be easily formatted data to make it compatible with the functions in cgmquantify.
Functions are available for all of the commonly studied glucose and glucose variability metrics (Table 2). Additionally, functions for data visualization of the longitudinal CGM data are provided. These visualizations are easily customizable and can include personalized or standard clinical thresholds for hyperand hypoglycemia. We have also implemented a function that enables LOWESS smoothing over the CGM data to facilitate viewing and interpretation (Fig. 1).
We have integrated cgmquantify in the Digital Biomarker Discovery Pipeline, an open-source software resource aimed at making digital biomarker development and digital medicine more accessible [12]. cgmquantify has also been integrated in MD2K Cerebral Cortex in the MD2K Center of Excellence for turning mobile sensor data into reliable and actionable health information [13]. The cgmquantify software package is already being used in several research projects at multiple institutions.

IV. DISCUSSION
cgmquantify is a package that simplifies the process of calculating interstitial glucose metrics from CGM data and thus allows for easy comparison across different research studies that use different metrics summarizing glucose and glucose variability. Functions have been developed using equations from clinically validated research studies such that users can compare their results to previous findings. The cgmquantify package is easily implemented with a one-line installation and contains an extensive user guide in both the Python and R languages. Detailed documentation facilitates modification of existing code for customization of input and visualizations. With the aim of build a community of developers to contribute to the literature in this burgeoning field, we anticipate that many researchers will not only find this resource useful for their own work but will contribute code updates and improvements through the DBDP to ensure that this package remains both relevant and useful for the community. This is a much-needed resource for the community of researchers, clinicians, and patients using CGM. Currently, little is understood about the relationships between glucose and glucose variability metrics from CGM data to diseases including but not limited to prediabetes, Type 2 diabetes, and severity of symptoms in T1D. As more researchers and clinicians start looking to CGM data to answer these questions, the need for a standardized resource available in the most common programming languages is necessary. As we have seen with the Open APS community, analysis of CGM data is not limited to researchers and clinicians but includes patients as well [14]. By providing this toolbox as an open-source resource, we hope to encourage patients to interact with their own data, determine personalized insights, and make meaningful contributions to the digital health landscape.
While the primary use case for cgmquantify is in research, in future directions we may incorporate a graphical user interface to enable ease of use outside of the Python or R programming languages.

V. CONCLUSION
The cgmquantify software resource enables comprehensive analysis of continuous glucose monitor data with clinically validated metrics and easy to implement visualizations. Future implementations include incorporating food diary and physical activity data into the visualizations. The fields of digital biomarker development and diabetes research will benefit greatly from this resource.