High Sampling Rate Smartphone-PPG via Built-in Rolling Shutter Image Sensor

Recent advances in CMOS camera image sensors (CIS) on smartphones have brought significant improvements to IoT-based mobile healthcare technology in the form of CIS-photoplethysmography (CPPG). Nevertheless, most available smartphones offer a limited sampling rate (Fs), typically 30 frames per second (fps), which often results in distorted CPPG signal acquisition. Such a distorted signal is difficult to use for advanced photoplethysmography (PPG)-derived physiological analysis and is useful only in simple pulse rate monitoring systems. In this article, the rolling-shutter camera mechanism is exploited to extract CPPG data points from CIS pixel rows, allowing high-Fs CPPG signal extraction from a common built-in, low-fps smartphone CIS. Multiple experiments were conducted to demonstrate the reliability of the rolling-shutter CPPG (RSCPPG) signal. First, we conducted iterative experiments with different CIS parameters to find their correlation with the acquired RSCPPG signal quality. The results indicate that a short exposure time produces a high-SNR CPPG signal of up to 25 ± 2.38 dB, with a signal morphology that is highly correlated (average $r$ = 0.95) with the reference PPG signal. Then, we demonstrated that the proposed RSCPPG algorithm allows high-rate CPPG data sampling with Fs = 150 Hz (that is, ≥ 5 times the CIS fps). Finally, a feasibility study was conducted on multiple features extracted from RSCPPG that can potentially be used in further physiological analysis applications. These findings suggest that the proposed RSCPPG algorithm is a reliable bio-signal acquisition technique for smartphone-based healthcare technology.


I. INTRODUCTION
The smartphone is an essential device that has enabled recent advances in IoT systems. It is equipped with multiple cutting-edge wireless communication interfaces, such as 5G, Bluetooth 5.0, near field communication (NFC), Wi-Fi 6, and a radio-frequency identification (RFID) module, which allow it to act as a gateway for interconnecting and controlling a variety of smart devices, such as smart home appliances, smart cars, and smartwatches. Moreover, smartphones now come with a number of built-in sensors, such as an accelerometer, gyroscope, ambient light sensor, fingerprint sensor, pressure sensor, magnetometer, GPS sensor, and CMOS camera image sensor (CIS), which make them a suitable platform for smart sensing in IoT systems. Embedded smartphone sensors have been valuable in many application areas, such as entertainment, navigation, security, and healthcare. Although the smartphone plays a central role in IoT smart systems, it has not shown a clear success case in smart healthcare other than as a data gateway, because embedded sensors have not yet proven their value for highly rigorous healthcare services. Thus, this study focuses on the use of embedded smartphone sensors, especially the CIS, for personalized healthcare monitoring applications. Recent advances in smartphone CIS technology have enabled early-phase IoT-based mobile healthcare applications, such as cardiovascular activity monitoring [e.g., heart rate (HR) and HR variability (HRV)], skin health monitoring, and stress monitoring [1], [2], [3]. In cardiovascular activity monitoring, HR and HRV are vital information to be derived continuously from the acquired bio-signals. However, more advanced smartphone-based analysis that gives detailed information about the cardiovascular system is emerging for future mobile healthcare applications, known as the Internet of Medical Things (IoMT) [4].
Therefore, the smartphone-based IoMT system is targeted at personalized healthcare, connecting healthcare devices through networks and allowing remote real-time monitoring of bio-signals and their derived vital parameters for early detection and diagnosis of diseases [5], [6], [7].

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
One bio-signal that is typically extracted with a smartphone is photoplethysmography (PPG). PPG is an optical, noninvasive sensing method that measures blood volume pulse changes in the microvascular bed of tissue [8], [9]. PPG has proven to be a powerful bio-signal from which different types of advanced information can be derived, such as clinical physiological monitoring (e.g., blood oxygen saturation (SpO2), HR, blood pressure, cardiac output, and respiration), vascular assessment (e.g., arterial stiffness, venous assessment, and arterial diseases), and autonomic function assessment (e.g., vasomotor function and thermoregulation) [10].
Typically, a PPG sensor system consists of a light-emitting diode (LED) as an emitter paired with either a photodiode (PD) or a CIS as a detector. The LED illuminates the skin tissue, and the detector (PD or CIS) measures the small variations in the absorption of LED light caused by the dynamic blood flow through the vessels. Unlike the LED-PD configuration, which is mostly used in wearable healthcare technologies, the LED-CIS configuration allows both contact and noncontact (at a distance) PPG signal acquisition, which opens up new applications of PPG-based healthcare systems [11], [12]. Moreover, the LED-CIS configuration is available in any mobile device on the market, making it a practical backbone technology for mobile healthcare. However, recent studies on smartphone CIS-photoplethysmography (CPPG) systems focus only on basic HR and HRV analysis, and two main problems have hampered the development of smartphone CPPG for broader PPG-derived physiological monitoring applications. The first is the low capture rate [in frames per second (fps)] of the built-in smartphone CIS. Existing CPPG signal acquisition methods, e.g., DSLR-based CPPG, remain highly technical and require an expensive camera that can record at high fps. For example, a previous study used multiple high-fps cameras (120 fps each) to extract HR and HRV from the subject's face [13]. This approach shows good results; however, it is not applicable to real implementations on smartphone devices. The second is that high-fps video recording on smartphone devices suffers from an unstable capture rate due to operating system processing. Such unstable recording is not recommended for PPG signal recording, as it may lead to poor PPG signal morphology.
In this study, we propose a novel algorithm, named rolling-shutter CPPG (RSCPPG), to solve the above-mentioned problem of the low Fs of the smartphone CIS and thus provide a high-Fs CPPG signal using a built-in, low-fps smartphone CIS. The proposed method is a practical solution because it works using only the built-in smartphone CIS. By comparison, other studies used additional components, such as a thermal camera [14], an external multiwavelength LED [15], and an external pressure sensor [16] for stress, hemoglobin, and blood pressure monitoring, respectively, which makes them less practical for real implementation. Moreover, multiple experiments have been conducted to show the reliability of the proposed RSCPPG algorithm. The three key contributions of this study can be summarized as follows.
1) We investigate the effects of the CIS parameters on the recorded RSCPPG signal quality. We conduct an iterative experiment to determine the optimized CIS parameters for improving CPPG signal quality. Different combinations of CIS parameters related to the exposure time, finger position, and output resolution have been tested.
2) We develop a novel algorithm to increase the acquired PPG Fs far beyond the CIS fps (Fs >> fps). Our RSCPPG algorithm is capable of extracting multiple data points from one image frame. We also propose the multirow combine with amplitude compensation (MRAC) algorithm based on the modified Beer-Lambert law. MRAC combines the multiple data points from different pixel rows and compensates for measurement noise, such as that caused by blooming effects and motion artifacts.
3) We conduct a preclinical study with 44 out of 50 recruited subjects to evaluate the effectiveness of the RSCPPG algorithm in a designed leg-press exercise protocol. A 7-min leg-press-and-hold exercise is used to increase the subject's HR, allowing signal analysis over a wide range of HR values. Beat-by-beat and ensemble-average-based PPG analyses are conducted to calculate the overall performance of the proposed RSCPPG algorithm in this preclinical study.

The remainder of this article is organized as follows. Sections II and III discuss the related work and the rolling-shutter mechanism, respectively. The proposed RSCPPG technique is discussed in Section IV. Section V discusses our experimental design. Section VI provides the detailed experimental results of this study, including the optimized CIS parameters for RSCPPG and the preclinical study with RSCPPG. Section VII provides our discussion of the experimental results. Finally, we conclude our work in Section VIII.

II. RELATED WORKS
The prior method of extracting PPG information from a CIS is to spatially average the intensity level A over the pixels of the whole frame, or of a preselected region of interest in the recorded image frame; this is referred to as the frame averaging method. It is assumed that the spatially averaged intensity signal $y_{avg}(t)$ is related to small changes in skin color and is proportional to the changes in blood volume in the arteries and capillaries underneath the skin during the cardiac cycle [15], [17], [18]. The main problem with this method is that its PPG data sampling is limited to the maximum capture rate (fps) of the CIS; hence, the acquired Fs of $y_{avg}$ is typically ≤ the CIS fps [19], [20]. Other researchers did not address the low sampling rate itself but instead developed algorithms that detect and separate clean PPG signals from noisy ones [18], [21]. For example, Tabei et al. [21] proposed a novel personalized motion and noise artifact detection algorithm using a neural network and a signal quality index to reduce false positive alarms in atrial fibrillation (AF) detection. Sun et al. [22] proposed empirical mode decomposition (EMD) for robust heartbeat detection from noisy PPG signals and showed that EMD combined with the Hilbert transform could reduce the interference of motion artifacts. Liu et al. [18] proposed a sinusoidal function-based PPG quality index to filter out low-quality PPG data and perform HRV measurements from smartphone PPG signals. Moreover, a parabola approximation method was developed to increase the accuracy of PPG peak detection from low-sampling-rate PPG signals for more reliable HRV parameter analysis [23].
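As an illustration of why frame averaging caps Fs at the camera frame rate, the following minimal sketch averages every pixel of each frame into a single sample. The nested-list frame layout and the function name `frame_average` are assumptions made for this example only; they are not taken from the cited works.

```python
def frame_average(frames):
    """Return one PPG sample per frame: the mean intensity over all pixels.

    frames: list of frames; each frame is a list of pixel rows and each row
    is a list of single-channel intensities (hypothetical layout).
    """
    signal = []
    for frame in frames:
        total = sum(sum(row) for row in frame)
        count = sum(len(row) for row in frame)
        signal.append(total / count)
    return signal


# Two tiny 2 x 2 frames: the output holds exactly one sample per frame,
# so the sampling rate of y_avg cannot exceed the camera fps.
frames = [
    [[100, 102], [98, 100]],
    [[110, 112], [108, 110]],
]
y_avg = frame_average(frames)
```

Because each frame collapses to one value, a 30-fps recording yields at most 30 samples per second, which is the limitation the rolling-shutter approach is meant to lift.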
In the previous section, we mentioned that a low-Fs PPG may result in poor performance when assessing physiological parameters that rely on pulse wave feature calculation. This statement is supported by earlier studies [24], [25], [26]. Sun and Thakor [24] recorded a 200-fps noncontact PPG and noted that a low-Fs PPG might influence pulse rate variability (PRV) analysis; they also showed by simulation that interpolation was required to improve the timing estimation of the pulse-to-pulse interval. Moreover, Choi and Shin [25] found that significant differences in most time-domain and frequency-domain PRV measures occurred at low PPG Fs. In another study, Fujita and Suzuki [26] showed that only features extracted from pulse wave measurements with a PPG Fs > 30 Hz can be considered meaningful for physiological monitoring. Table I gives an overview of the features commonly extracted from CPPG. As seen from the table, most CPPG signals were acquired at a low Fs, so only the systolic peak of the CPPG signal can be extracted for analysis because of the poor CPPG signal morphology. As a result, CPPG analysis is limited to simple HR, IBI (time-domain and frequency-domain HRV), and SpO2 measurements. Additional spline interpolation is usually required to increase the sampling rate of the IBI for HRV analysis [24]. By contrast, previous studies on wearable PPG signal acquisition consistently used a high Fs. A high Fs is required for further analysis, including complete characteristic point detection in the original PPG signal and in its first and second derivatives. For example, Charlton et al. [30] studied PPG for mental stress assessment; their study recorded PPG at three different locations on the arm and wrist with a PPG Fs = 135 Hz. A total of thirty-two PPG features were extracted and analyzed to assess human mental stress.
Therefore, there is still a strong need for a new algorithm that extracts high-rate, high-fidelity PPG signals for multiple types of physiological measurements on common mobile devices. To this end, this study proposes a novel RSCPPG method that can be implemented on any recent smartphone.

III. CIS AND ROLLING-SHUTTER MECHANISM
Most modern image sensors are made of silicon. If an incident photon hits the silicon and its energy ($E_{photon}$) exceeds the bandgap energy of the silicon semiconductor, the photon is absorbed in the silicon and produces charge according to the quantum efficiency of silicon at that wavelength, such that

$$E_{photon} = h\nu = \frac{hc}{\lambda} \geq E_g$$

where $h$, $c$, $\nu$, and $\lambda$ are Planck's constant, the speed of light, the frequency of light, and the wavelength of light, respectively, and $E_g$ is the bandgap energy, which is 1.1 eV for silicon [32]. The level of signal generated by the image sensor depends on the amount of light incident on the imager. In general, the image sensor consists of an array of PDs, each known as a pixel, that record the incoming light. To measure how much light hits the sensor, the photocurrent could be measured directly; in practice, it is converted to a voltage with a technique called integration, in which the PD collects photons for a known period before the voltage is read out. The output voltage across the PD ($V_{out}$) may be calculated with

$$V_{out} = V_{pd} - \frac{i_{photo}\, t_{int}}{C_D}$$

where $V_{pd}$ is the voltage across the PD, and $i_{photo}$, $t_{int}$, and $C_D$ are the photocurrent, the integration time, and the capacitance of the PD, respectively. Finally, the acquired $V_{out}$ values are amplified pixel by pixel to produce the digital output image, as seen at the output of a digital camera.
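As a worked example of the absorption condition, the short sketch below uses the standard relation E = hc/λ to compute the photon energy at a given wavelength and the longest wavelength that silicon with a 1.1-eV bandgap can absorb. The function names are illustrative, and the 860-nm value is simply the reference sensor wavelength mentioned later in this article.

```python
PLANCK_EV_S = 4.135667696e-15     # Planck's constant in eV*s
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light in m/s
SILICON_BANDGAP_EV = 1.1          # bandgap value quoted in the text


def photon_energy_ev(wavelength_nm):
    """Photon energy E = h*c/lambda, returned in electronvolts."""
    return PLANCK_EV_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)


def is_absorbed_by_silicon(wavelength_nm):
    """True when the photon energy exceeds the silicon bandgap."""
    return photon_energy_ev(wavelength_nm) > SILICON_BANDGAP_EV


# Longest wavelength whose photons still reach the bandgap energy.
cutoff_nm = PLANCK_EV_S * LIGHT_SPEED_M_S / SILICON_BANDGAP_EV * 1e9

energy_860 = photon_energy_ev(860.0)
```

Under these constants the cutoff falls near 1127 nm, so both visible red light and the 860-nm infrared light used by typical PPG emitters lie within the range that a silicon sensor can detect.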
Two technologies have been proposed for light signal amplification: the charge-coupled device (CCD) and CMOS. Unlike the CCD, the CIS integrates the charge-to-voltage converter and the voltage amplifier in each pixel, which makes the CIS processing speed much higher than that of a CCD image sensor. In addition, most smartphones on the market are now equipped with at least two CISs.
A CIS may produce images with one of two mechanisms: the global shutter and the rolling shutter. With a global shutter, the sensor exposes all pixels at the same time, so readout occurs simultaneously at the end of the integration time. The overall frame rate is then limited by the rate at which individual pixels can be transferred and digitized; if the sensor has more pixels to transfer, the total frame rate is reduced. This limitation is one of the reasons why global-shutter CISs are not integrated in current smartphones. By contrast, with a rolling shutter, the CIS scans the image row by row, with a delay between rows. That is, all pixel columns in the same row collect light during the same period, and the delay between consecutive rows is a constant $t_{row}$. The light collection time, during which the pixel sensors are open and integrating light, is denoted $t_{exposure}$. The values of $t_{row}$ and $t_{exposure}$ depend on the output resolution and the shutter speed, respectively, both of which are set by the user.
Moreover, in this mode the pixel rows are reset ($t_{reset}$) in sequence, starting at the top of the image and proceeding row by row to the bottom. When the reset process has moved some distance down the image, the readout process ($t_{readout}$) begins and proceeds at the same speed as the reset process. Finally, between the end time of the last row and the start time of the next frame there is a frame-to-frame time gap $t_{f2f}$ [33], [34], during which the CIS is completely blind and records no signal. To create a video file, the CIS combines the consecutive recorded images with a sampling time of 1/fps.
The overall rolling-shutter mechanism is shown in Fig. 1. In this illustrative setup, we placed an LED in front of the CIS and controlled the "ON" time of the LED. Here, the LED is "ON" during the whole $t_{exposure}$ of the first row, 60% of the $t_{exposure}$ of the second row, and 30% of the $t_{exposure}$ of the third row. The resulting image is therefore as follows: the first row of the image frame receives the highest light intensity, followed by light gray on the second row, dark gray on the third row, and black on the remaining pixel rows, with the light intensity expressed in the grayscale color domain. This shows that each pixel row is sampled at a different time and that the light intensity the CIS receives depends on the amount of light delivered by the transmitter.
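The row-by-row timing described above can be sketched as a simple timestamp calculation. This is a minimal sketch that assumes frame n starts at n/fps and that row i starts i row delays later; the exposure offset is ignored, and the 200-µs row delay is a hypothetical value chosen for illustration, not a measured one.

```python
def row_sample_time(frame_index, row_index, fps, t_row):
    """Approximate start time (s) of a pixel row under a rolling shutter.

    Assumes frame `frame_index` starts at frame_index / fps and row
    `row_index` (0-based) starts row_index * t_row later.
    """
    return frame_index / fps + row_index * t_row


FPS = 30.0
T_ROW = 200e-6  # hypothetical row delay in seconds

# Rows of the same frame are sampled at different instants.
times = [row_sample_time(0, i, FPS, T_ROW) for i in range(3)]

# The first row of the next frame is sampled one frame period later.
next_frame_start = row_sample_time(1, 0, FPS, T_ROW)
```

Each row therefore carries its own timestamp, which is what makes it possible to treat several rows of one frame as several samples in time.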

A. Exploiting Rolling-Shutter CIS to Increase PPG Sampling Rate
In this study, we adopted a diffuse-reflectance-type PPG measurement, as the LED is positioned side by side with the CIS at the same fingertip tissue location. With this configuration, the mean path of light in the tissue is banana-shaped before it reaches the detector [35]. According to the modified Beer-Lambert law [36], the incident light passing through human tissue is not only split into absorbed and transmitted light; part of the light is also reflected and scattered within the tissue. Thus, the attenuation of light in the tissue is formalized by the Beer-Lambert law as follows:

$$I = I_o\, e^{-\varepsilon c L}$$

where $I_o$ and $I$ are the incident light intensity and the reflected light intensity, respectively; $L$ is the mean optical path length, which depends on the tissue absorption coefficient $\mu_a$, the reduced scattering coefficient $\mu_s$, and the source-detector separation distance $l$; and $\varepsilon$ and $c$ denote the molar extinction coefficient and the tissue chromophore concentration, respectively. From the above equation, it can be seen that the source-detector separation distance $l$ strongly influences the reflected light intensity $I$, resulting in different PPG signal amplitudes: the PPG amplitude decreases exponentially as the source-detector separation distance $l$ increases. In addition, a CIS consists of an array of pixels in which each pixel row is at a different distance from the LED. Hence, this relation can be exploited to extract multiple temporal PPG signals, one from each pixel row of the rolling-shutter CIS. The PPG signal acquired from pixel row $i$ of image frame $n$, where $i \in \{1, 2, \ldots, M\}$, can be modeled in terms of the following components.
$D_i$ can be considered the constant of the reflected light intensity in pixel row $i$ (DC component), $p(n)$ the desired pulsatile part of the PPG signal (AC component), and $w_i(n)$ the noise component due to camera quantization, camera blooming effects, and motion artifacts. Therefore, a compensation model is also required to recover the PPG signal and compensate for the effects of $D_i$ and $w_i(n)$ caused by the different LED-to-row separation distances $l_i$. In the previous section, we showed that the pixel rows are sampled sequentially, row by row, and that most of the time-based parameters of a rolling-shutter CIS, such as $t_{row}$, $t_{exposure}$, and $t_{f2f}$, can be controlled or determined. Thus, we reasoned that the rolling-shutter CIS can be exploited to obtain multiple temporal pixel-row-based PPG signals from the same image frame. By combining the assumption that the reflected light $I$ from the finger can be compensated with the modified Beer-Lambert law with the fact that each temporal pixel-row PPG is sampled at a different time, we may merge multiple temporal pixel-row PPGs into a single high-Fs PPG signal. Our method for estimating the rolling-shutter CIS parameters is discussed in the following section.

IV. PROPOSED RSCPPG ALGORITHM
Fig. 2 shows the flowchart of the proposed RSCPPG algorithm, which consists of multiple steps explained in this section. The algorithm starts with fingertip data recording, in which the PPG signal is monitored from the subject's index finger using the rear CIS camera of the smartphone. Each recorded image frame is then processed identically in the image preprocessing step, where we only apply an image rotation to the acquired frame if necessary.
Moreover, each step of the proposed RSCPPG algorithm is detailed as follows.
Step 1 (Pixel-Time Information Extraction): In this step, we collect the time information of the RSCPPG signal from the image frame acquisition time and the pixel row index. An additional experiment, called time information extraction, was conducted for this purpose; in this experiment, we used an external LED controlled with pulse-width modulation (PWM). The LED blinks with a known duty cycle δ, and the blinks are packaged as a header and data of different sizes. Fig. 3 shows the consecutive images received by the CIS. From this experiment, we can estimate $t_{row}$ and $t_{f2f}$ and thereby obtain the time information of each pixel row.
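One way to picture this calibration is sketched below: a blinking LED produces bright and dark stripes across the rows of a frame, and the number of rows per blink period gives the row delay. This is a simplified illustration rather than the procedure used in this study. It ignores the header/data packet structure and the blur caused by a finite exposure time, and it assumes that one frame period equals M row delays plus $t_{f2f}$. All function names and numeric values are hypothetical.

```python
def rising_edges(profile, threshold):
    """Row indices where the row intensity crosses `threshold` upward."""
    return [i for i in range(1, len(profile))
            if profile[i - 1] < threshold <= profile[i]]


def estimate_t_row(profile, pwm_hz, threshold):
    """Row delay from the stripe spacing of an LED blinking at `pwm_hz`."""
    edges = rising_edges(profile, threshold)
    if len(edges) < 2:
        raise ValueError("need at least two LED periods in the frame")
    rows_per_period = (edges[-1] - edges[0]) / (len(edges) - 1)
    return 1.0 / (pwm_hz * rows_per_period)


def estimate_t_f2f(fps, n_rows, t_row):
    """Frame-to-frame gap, assuming 1/fps = n_rows * t_row + t_f2f."""
    return 1.0 / fps - n_rows * t_row


# Synthetic 144-row intensity profile: a 250-Hz LED at 50% duty cycle
# that spans 20 rows per blink period.
profile = [255 if (i % 20) < 10 else 0 for i in range(144)]

t_row = estimate_t_row(profile, pwm_hz=250.0, threshold=128)
t_f2f = estimate_t_f2f(fps=30.0, n_rows=144, t_row=t_row)
```

With these synthetic numbers, the stripes repeat every 20 rows, which corresponds to a row delay of 200 µs and leaves a few milliseconds of blind time between frames.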
Step 2 (Temporal Pixel Row-PPG Extraction): In this step, A(i, j, k) denotes the intensity of the pixel at pixel row i and pixel column j of the image frame for color channel k. The smartphone camera provides k = 3 channels, corresponding to the r, g, and b channels of the standard video output. First, all columns j in each row i are averaged; with a predetermined output resolution of, for example, 176 × 144 pixels (pixel columns × pixel rows), A(144, 1, 3) is obtained. Based on our validation study, the r-channel of the CIS-PPG shows the morphology most similar to that of the infrared PD of the reference PPG device, with R = 0.99. This finding is the rationale for using only the r-channel in this study, which gives A(144, 1, 1). Accordingly, based on (4) and assuming that the subject's finger does not move during the experiment, we collected A(144, 1, 1) from each frame n of all recorded image frames and obtained $y_i$, corresponding to the different light intensity of the underlying PPG signal in each pixel row. Fig. 4 shows the acquired $y_1, y_2, \ldots, y_{144}$ from all pixel rows.
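The column averaging in this step can be sketched as follows. The frame layout (nested lists of (r, g, b) tuples) and the function names are assumptions for illustration; a practical implementation would more likely operate on image arrays.

```python
def row_ppg_samples(frame_rgb):
    """Average the red channel over all columns of each pixel row.

    frame_rgb: list of rows, each row a list of (r, g, b) tuples.
    Returns one value per pixel row.
    """
    return [sum(pixel[0] for pixel in row) / len(row) for row in frame_rgb]


def row_signals(frames_rgb):
    """Stack per-frame row averages so that signals[i][n] is y_i(n)."""
    per_frame = [row_ppg_samples(frame) for frame in frames_rgb]
    return [list(samples) for samples in zip(*per_frame)]


# Two tiny frames with 2 rows and 2 columns each.
frames_rgb = [
    [[(10, 0, 0), (20, 0, 0)], [(30, 0, 0), (50, 0, 0)]],
    [[(12, 0, 0), (22, 0, 0)], [(32, 0, 0), (52, 0, 0)]],
]
y = row_signals(frames_rgb)
```

Each entry of `y` is one temporal pixel-row signal, so a 144-row frame yields 144 such signals, each sampled once per frame.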
Step 3 (Pixel Row-PPG Selection (PRS) Algorithm): Here, we propose the PRS algorithm to select θ pixel rows out of the 144 pixel rows available at a 176 × 144 output resolution. The acquired $y_1, y_2, \ldots, y_{144}$ can be considered redundant signals of the same pulsatile part of the PPG signal, sampled at different times and with different overall signal strengths because of the different reflected light intensity constants $D_i$ and different levels of noise. To minimize the effects of noise, we limited the data acquisition time to 1 min in our experiments and instructed the subjects to sit down and stay still. Under these conditions, the noise from the subject's finger motion may be neglected.
Moreover, in this step the SNR quality metric of each temporal pixel-row PPG $y_i$ is computed as the ratio, expressed in decibels, of the energy around the fundamental frequency plus the first harmonic of the pulse signal (SN1) to the remaining energy contained in the spectrum (SN). Because the recordings are highly controlled and the subjects are stationary, the spectra are relatively clean. Thus, the highest peak detected in the frequency analysis is selected as the fundamental frequency and the second-highest peak as its first harmonic. Based on this SNR calculation, we choose the temporal pixel-row PPG signal with the best SNR and automatically select the remaining θ − 1 temporal PPG signals. The outputs of the PRS algorithm are the raw temporal pixel-row PPGs, denoted $z_k$ with $k \in \{1, 2, \ldots, \theta\}$. Fig. 5 shows an example output of the PRS algorithm; in this case, we chose θ = 5 and selected five outputs from the 144 input signals of the previous step.
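The SNR-based ranking can be sketched as below. This is only an illustration under several assumptions: the spectrum is computed with a naive DFT, the energy "around" each peak is taken as a window of ±1 bin, the ratio is converted to decibels with 10·log10, and the θ rows are simply the θ highest-SNR rows. The text does not specify the window width or how the remaining θ − 1 rows are chosen, so these choices and all function names are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Power of DFT bins 1 .. n/2 - 1 of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    power = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2)
    return power


def snr_db(signal, half_width=1):
    """10*log10(SN1 / SN), with SN1 taken around the two strongest peaks."""
    power = power_spectrum(signal)
    bins = range(len(power))
    fundamental = max(bins, key=power.__getitem__)
    near_f0 = set(range(fundamental - half_width, fundamental + half_width + 1))
    # Second-highest peak, searched outside the fundamental's window.
    harmonic = max((m for m in bins if m not in near_f0),
                   key=power.__getitem__)
    pulse = near_f0 | set(range(harmonic - half_width,
                                harmonic + half_width + 1))
    sn1 = sum(p for m, p in enumerate(power) if m in pulse)
    sn = sum(p for m, p in enumerate(power) if m not in pulse)
    return 10.0 * math.log10(sn1 / sn)


def select_rows(signals, theta):
    """Indices of the `theta` rows with the highest SNR, best first."""
    scores = [snr_db(sig) for sig in signals]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[:theta]


def synthetic_row(noise_std, seed, n=120):
    """Pulse-like test signal: fundamental + first harmonic + noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 6 * t / n)
            + 0.3 * math.sin(2 * math.pi * 12 * t / n)
            + rng.gauss(0.0, noise_std) for t in range(n)]


# Three synthetic rows with low, high, and moderate noise levels.
rows = [synthetic_row(0.05, 1), synthetic_row(1.0, 2), synthetic_row(0.3, 3)]
best_two = select_rows(rows, 2)
```

In this synthetic case the least noisy row ranks first and the noisiest row is discarded, which mirrors the intent of keeping only the pixel rows that carry a clean pulse signal.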
Step 4 (MRAC Algorithm): Given the raw temporal pixel-row PPGs $z_1, z_2, \ldots, z_\theta$ from the PRS algorithm, our first step is to create an amplitude compensation model (AmpComp) describing the relation between the amplitude of $z_k$ and the pixel row index. The proposed MRAC algorithm is introduced here to recover and compensate the PPG signal based on the previously mentioned modified Beer-Lambert law. The $z_k$ amplitude is fitted to a second-order exponential model with a root mean square error (RMSE) of 1.33 and an $R^2$ of 0.99. The next step is to normalize the signal with

$$\hat{z}_k = \frac{z_k - \mu(z_k)}{\sigma(z_k)}$$

where $\mu(z_k)$ and $\sigma(z_k)$ are the mean and standard deviation of the signal $z_k$. Using the calibrated AmpComp model with the normalized signals as input, the full procedure of the MRAC algorithm is summarized in Algorithm 1.
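The two building blocks named in this step can be sketched as follows. The normalization subtracts the mean and divides by the standard deviation; the population standard deviation is used here as an assumption, since the text does not say which estimator is meant. The amplitude model is written as a two-term exponential with coefficients a, b, c, and d, which is one plausible reading of "second-order exponential model". The coefficient values are hypothetical, and the way the model enters Algorithm 1 is not reproduced.

```python
import math
import statistics


def zscore(signal):
    """Normalize a signal to zero mean and unit (population) std."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    return [(s - mu) / sigma for s in signal]


def amp_comp(row_index, a, b, c, d):
    """Two-term exponential amplitude model: a*exp(b*x) + c*exp(d*x)."""
    return a * math.exp(b * row_index) + c * math.exp(d * row_index)


z_k = [1.0, 2.0, 3.0, 4.0, 5.0]
z_hat = zscore(z_k)

# Hypothetical coefficients: amplitude decays as the row moves away
# from the LED (negative exponents).
amplitudes = [amp_comp(i, 2.0, -0.1, 1.0, -0.01) for i in (0, 50, 100)]
```

With negative exponents the modeled amplitude falls off with the row index, which is consistent with the exponential amplitude decay over source-detector distance described earlier.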
Here, X(j) is the collection of time information of each row $x_k$ extracted in the earlier pixel-time information extraction step; Y(j) is the amplitude of the high-rate RSCPPG signal obtained with the MRAC model; a, b, c, and d are the AmpComp coefficients; and ψ is a weighting parameter with 0 < ψ < 1. In our model, ψ is calculated from the sampling time interval T and a constant τ. Furthermore, a Savitzky-Golay digital filter is applied to remove the remaining small noise and obtain a smooth PPG signal. The output of this step is called the pseudo RSCPPG because its sampling instants depend on the pixel rows selected by the PRS algorithm. In the final step, linear interpolation is used to resample the acquired signal into a uniformly sampled RSCPPG signal. Fig. 6 shows an example of the MRAC algorithm output, namely the extraction of a 150-Hz RSCPPG signal from a 30-fps smartphone camera.
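The final merging and resampling stage can be illustrated with the sketch below, which interleaves the samples of θ = 5 rows by timestamp and linearly interpolates them onto a uniform 150-Hz grid. The row indices, the 200-µs row delay, and the function names are hypothetical, and the amplitude compensation and Savitzky-Golay smoothing are omitted. With 5 rows per frame at 30 fps, the merged series carries 5 × 30 = 150 samples per second on average, which is where the 150-Hz figure comes from.

```python
def merge_rows(row_times, row_values):
    """Interleave samples from several pixel rows into one sorted series."""
    samples = sorted((t, v)
                     for times, values in zip(row_times, row_values)
                     for t, v in zip(times, values))
    return [t for t, _ in samples], [v for _, v in samples]


def resample_linear(times, values, fs):
    """Linearly interpolate a strictly increasing series onto an fs grid."""
    n_out = int((times[-1] - times[0]) * fs + 1e-9) + 1
    grid = [times[0] + k / fs for k in range(n_out)]
    resampled = []
    j = 0
    for t in grid:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid, resampled


FPS = 30.0
T_ROW = 200e-6                    # hypothetical row delay in seconds
ROWS = [10, 40, 70, 100, 130]     # hypothetical selected rows (theta = 5)

# Timestamps of each selected row over three frames, with a linear
# test signal v(t) = 2t + 1 standing in for the PPG amplitude.
row_times = [[n / FPS + r * T_ROW for n in range(3)] for r in ROWS]
row_values = [[2.0 * t + 1.0 for t in times] for times in row_times]

merged_t, merged_v = merge_rows(row_times, row_values)
grid, rscppg = resample_linear(merged_t, merged_v, fs=150.0)
```

The merged samples are unevenly spaced (6 ms apart within a frame and about 9.3 ms across the frame boundary in this example), which is why a resampling step is needed before the signal can be treated as uniformly sampled.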

A. Experimental Protocol and Data Acquisition
In our experiment, all subjects used the same measurement setup, as shown in Fig. 7. Written informed consent was obtained from the participants after we had provided them with a complete description of the study. This study was approved by the ethics committee of the Pohang University of Science and Technology and conducted according to the ethical principles of the Declaration of Helsinki for medical research on human subjects (PIRB-2019-A004).
Three physiological measurements were taken simultaneously during the experiments. From each participant, we acquired three reference values: 1) a BP value from an OMRON HEM-7121; 2) an ECG signal measured with 3-gel electrodes in the Lead II configuration (ECG100C module, BIOPAC Systems, Goleta, CA, USA); and 3) a middle-finger PPG measured with an infrared emitter and PD detector. The LED/PD wavelength is 860 nm ± 60 nm with 3.81 nm spacing (TSD200 with PPG100C module, BIOPAC Systems, Goleta, CA, USA). The RSCPPG was recorded using a Samsung Galaxy S10 with a 16-MP ultrawide-angle camera located ∼8 mm (center to center) from the built-in LED flash. The signals from all devices other than the smartphone were digitized and transferred to the PC using an analog-to-digital converter (ADC) unit (MP150, BIOPAC Systems, Goleta, CA, USA) to record the reference BP, ECG, and middle-finger PPG at Fs = 1 kHz. In addition, all devices, including the smartphone, were controlled directly from the PC via a universal serial bus (USB) cable with custom-built software that triggers all devices to start synchronized recording. In the current study, only two reference signals are used: the ECG signal for HR analysis and the PPG signal for waveform morphology and pulse wave analysis comparison. Our experiments were performed repeatedly in an indoor laboratory from July to October 2020 at a controlled ambient temperature of 20 °C and a relative humidity of 60%. The total experiment time for each subject was approximately 30 min. Throughout the experiment, the participants were seated and asked to minimize movement, with both arms placed on a flat surface. The measurements were conducted over 27 min with a protocol consisting of five steps: 1) calibration step; 2) rest 1 step; 3) exercise step; 4) relax step; and 5) rest 2 step.
In the calibration, rest 1, and rest 2 steps, all signals were measured three times at 2-min intervals, for 1 min each, while the subject was resting. In the exercise step, the subjects held their legs slightly bent on a leg press machine for 7 min. In this step, after waiting 1 min for the HR to change, we measured all signals three times, for 1 min every 2 min. Finally, in the relax step, the subjects lowered their legs and rested for 2 min to recover their heart rate.

B. Performance Analysis
To evaluate the RSCPPG performance, multiple measurement indexes were used: the average absolute error (AAE), the SNR, and the correlation coefficient (r). These indexes are used to compare the RSCPPG features with the reference PPG signal features. Here, the AAE is defined as the average of the absolute differences between $t_j^{idx}$, the features from the RSCPPG signal, and $t_j^{ref}$, the corresponding features from the reference PPG signal. With $j = 1, 2, \ldots, 7$, $t_j$ is defined as the distance of a signal characteristic point from the systolic peak. Eight characteristic points are derived from the PPG signal, as listed in Table II, and the resulting features $t_1, t_2, \ldots, t_7$ are shown in Fig. 8.
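Two of these indexes can be computed as in the sketch below, which evaluates the AAE and the Pearson correlation coefficient between a list of RSCPPG feature values and the matching reference values. Averaging over paired observations is an assumption about how the error is aggregated, and the sample numbers are invented for illustration only.

```python
import math


def average_absolute_error(estimated, reference):
    """Mean of |estimated - reference| over paired observations."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented example: one timing feature (in seconds) over three beats.
t_idx = [0.10, 0.20, 0.30]   # feature values from the RSCPPG signal
t_ref = [0.12, 0.18, 0.30]   # feature values from the reference PPG

aae = average_absolute_error(t_idx, t_ref)
r = pearson_r(t_idx, t_ref)
```

A small AAE together with an r close to 1 indicates that the timing features from the two signals agree both in absolute value and in trend.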
In our second experiment, which concerns the preclinical study, we analyze the morphology of the RSCPPG signal compared with that of the reference PPG. Two types of analysis were conducted: ensemble-average-based morphology analysis and beat-by-beat morphology analysis. We used a 60-s signal from each experimental session of our protocol. Moreover, we analyzed the importance of the complete RSCPPG framework by comparing it with a version without MRAC, a version without digital filtering, and the prior frame averaging method.

A. CIS Parameters Optimization for RSCPPG
The first experiment focuses on understanding the effects of the camera parameters on the output RSCPPG signal. For this purpose, we designed an Android-based application capable of controlling the smartphone CIS camera parameters, including the built-in LED flash. We assumed that different CIS parameters may result in different RSCPPG signal quality. In this first experiment, we analyzed data from a total of ten healthy volunteers (age: 24 ± 6 years; gender: eight males and two females; weight: 73 ± 8 kg; height: 172 ± 8 cm) with no history of cardiovascular disease (CVD).
In the previous section, we explained that the time during which the pixel sensors are open and collecting light is $t_{exposure}$. We hypothesized that instantaneous PPG data points can be obtained by using the shortest $t_{exposure}$. The value of $t_{exposure}$ can be controlled on Android API level 21 and above through the Camera2 API.
We also hypothesized that different camera output resolutions may result in different PPG signal quality. In particular, we wanted to find out whether a small camera output resolution still yields a high PPG signal quality. If so, the proposed technique may ease the size limitation on shared video data and potentially reduce the smartphone processing energy and time considerably.
1) RSCPPG Signal Quality: We recorded a fingertip video and extracted two kinds of PPG signal, the frame averaging PPG y avg and the RSCPPG signal, to demonstrate the necessity of RSCPPG for high-quality bio-signal acquisition with the smartphone CIS. As mentioned previously, frame averaging is the most common method to extract a PPG signal from the CIS. We compared both signals with the reference PPG signal. Fig. 9(a) compares the traditional frame averaging method and the RSCPPG method with the reference PPG signal. For a fair comparison, y avg was resampled from 30 to 150 Hz using linear interpolation; the figure also shows our pseudo RSCPPG and the uniformly sampled 150-Hz RSCPPG. The most noticeable difference is the shift of the detected systolic peaks, which is caused by the distortion introduced by the frame averaging method. In contrast, the detected peaks of the RSCPPG signal fall in the same time window as those of the reference PPG signal. We then calculated the SNR to compare the signal quality of the traditional method and the proposed RSCPPG method. Fig. 9(b) shows that the SNR improves from 16.2 dB for y avg to 25 dB for the RSCPPG signal.
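The resampling of y avg from 30 to 150 Hz by linear interpolation can be sketched as below. The function name is hypothetical, and the sketch assumes a uniformly sampled input that starts at t = 0; it uses `numpy.interp` for the interpolation.

```python
import numpy as np

def resample_linear(y, fs_in, fs_out):
    """Resample a uniformly sampled signal by linear interpolation.

    Returns the new time axis (seconds) and the interpolated samples.
    The output covers the same time span as the input.
    """
    y = np.asarray(y, dtype=float)
    t_in = np.arange(len(y)) / fs_in
    n_out = int(round((len(y) - 1) * fs_out / fs_in)) + 1
    t_out = np.arange(n_out) / fs_out
    return t_out, np.interp(t_out, t_in, y)

# Three frame-averaged samples at 30 Hz become eleven samples at 150 Hz.
t_150, y_150 = resample_linear([0.0, 1.0, 2.0], fs_in=30, fs_out=150)
```

Interpolation only places new points on straight lines between the 30-Hz samples, so it equalizes the sample count for comparison but cannot restore waveform detail that frame averaging did not capture.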
2) Effects of Different Camera Parameters: In the previous subsection, we showed that the smartphone PPG signal quality can be increased with the proposed RSCPPG method. Most physiological monitoring applications use the full morphology of the PPG signal. Thus, we computed the correlation coefficient r between the RSCPPG features and the reference PPG signal features (denoted t 1 , t 2 , . . . , t 7 and illustrated in Fig. 8).
In this experiment, we defined two exposure groups, short and long exposure. The full set of CIS parameter variations is described in Table III. Fig. 9(c) shows that the short exposure time group has a relatively high correlation coefficient r with small variation, whereas the long exposure time group has a lower and less stable correlation (high standard deviation). For example, the shortest t exposure = 1/25 000 s yields 0.95 ± 0.03, 0.94 ± 0.03, and 0.92 ± 0.01 for r t1 , r t2 , and r t3 , respectively, whereas the longest t exposure = 1/100 s achieves only 0.63 ± 0.06, 0.61 ± 0.06, and 0.51 ± 0.09 for the same parameters.
Most previous studies did not consider the effect of t exposure , as most of them used the automatic value generated by the Android API [15], [16], [22]. The auto mode is very useful for common photography, and it mostly sets t exposure to a longer value because its algorithm is designed to balance the image based on exposure, focus, and white balance. However, our results show that a shorter exposure time produces an RSCPPG signal that agrees better with the reference PPG than a longer exposure time. Thus, we encourage the use of a short exposure to record PPG signals from the smartphone CIS.
Fig. 9(d) compares the morphology results for different finger positions. We tested two positions of the finger relative to the light source, vertical and horizontal. We found that the horizontal position produces a more highly correlated signal morphology than the vertical position.
Table IV compares the effects of different output resolutions on the PPG signal morphology, and representative examples of the RSCPPG signal acquired at these output resolutions are shown in Fig. 10. Two output resolutions, with 1080 pixel rows and 144 pixel rows, were used. As seen from the table, the correlation coefficients of both resolutions are comparable, with an average r > 0.95 for all time points. However, the file size differs considerably: the 1920 × 1080 recording produces files of 3-10 MB, whereas the 176 × 144 recording produces files of only ∼500 kB. This is because a higher-resolution frame contains more pixels and therefore more pixel information to store. This experiment shows that the proposed RSCPPG algorithm works well with the lower output resolution.

B. Preclinical Study Results
In total, 50 healthy volunteers (32 ± 8 years; gender: 35 males and 15 females) with no history of CVD took part in our preclinical study. Of the 50 subjects, four failed to finish the exercise experimental protocol and two declined to finish the experiment. Thus, only data from 44 subjects were analyzed in this study. First, we computed the correlation coefficient r between the RSCPPG signal acquired from each subject and the reference PPG signal. The results are shown in Fig. 11. Over all subjects, the average r of the ensemble-averaged and beat-by-beat signals was 0.94 ± 0.04 and 0.85 ± 0.05, respectively. The r remained relatively high for all subjects in both the ensemble-averaged and beat-by-beat RSCPPG signals, which can also be seen from the representative comparison of the ensemble-averaged RSCPPG and reference PPG signals in Fig. 12.
Tables V and VI show the representative AAE calculated on features extracted from the ensemble-averaged and beat-by-beat RSCPPG signals, respectively. The average AAE of the ensemble-averaged RSCPPG signal over all features is ∼0.009 s, which corresponds to ≈1 data point at the 150-Hz sampling rate. For the beat-by-beat signals, however, the performance degrades, with an average AAE of ≈17 data points. This degradation in the beat-by-beat analysis is possibly due to multiple factors, such as shaking hands, sweaty fingers, and the tendency of subjects to push on the CIS. Fig. 13 shows an example of the RSCPPG signal acquired while the subject pushes on the smartphone CIS; a hard push (left picture) may reduce the SNR and thus degrade the PPG signal morphology. Therefore, the ensemble-averaged RSCPPG is recommended for further feature extraction from our RSCPPG signal.
Furthermore, we also extracted the frame averaging signal from our preclinical study database. The comparison of the two methods, RSCPPG and traditional frame averaging, with the reference PPG signal is shown in Table VII. So far, because of the limited CPPG Fs, only the frame averaging method can be used as a benchmark algorithm for our research. As previously mentioned, we hypothesized that the limited usability of CPPG is mainly due to the low sampling rate of the prior extraction method. Table VII supports this hypothesis: the AAE is reduced with the proposed RSCPPG method, and the average reduction of the AAE over all features is ∼30%. This experiment shows that our RSCPPG algorithm achieves a high-quality PPG signal that is highly correlated with the reference PPG signal and has a highly comparable morphology, as reflected by the lower AAE.
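The conversions used in this comparison are simple arithmetic and can be written out as a short sketch. The helper names are hypothetical, and the two AAE values passed to the relative reduction are made-up numbers chosen only to show how a percentage reduction is obtained; they are not entries of Table VII.

```python
FS_HZ = 150.0  # RSCPPG sampling rate reported in the text

def seconds_to_samples(seconds, fs=FS_HZ):
    """Express a time error in data points at sampling rate fs."""
    return seconds * fs

def samples_to_seconds(samples, fs=FS_HZ):
    """Express an error in data points as a time in seconds."""
    return samples / fs

def relative_reduction_percent(aae_baseline, aae_proposed):
    """Percentage by which the proposed AAE is lower than the baseline AAE."""
    if aae_baseline <= 0:
        raise ValueError("baseline AAE must be positive")
    return 100.0 * (aae_baseline - aae_proposed) / aae_baseline

ensemble_points = seconds_to_samples(0.009)   # about 1.35, i.e. roughly one data point
beat_seconds = samples_to_seconds(17)         # about 0.113 s
example_reduction = relative_reduction_percent(0.010, 0.007)  # hypothetical inputs
```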

VII. DISCUSSION
This study proposed the RSCPPG algorithm, which aims to obtain a high-Fs CPPG signal from a built-in, low-fps smartphone CIS (in this case, a 30-fps smartphone CIS). Before we started the RSCPPG experiments, we performed validation studies on the use of the CIS color channels. As mentioned previously, the CIS provides three output channels at different optical wavelengths: the red, green, and blue channels. Moreover, we demonstrated that the camera parameters, such as the exposure time t exposure , finger position, and output resolution, can be optimized to acquire a high-quality CPPG signal. Based on our results, a short exposure time, e.g., t exposure = 1/25 000 s, gives a CPPG signal that is highly correlated with the reference PPG signal. We showed that a long exposure time, which is commonly used in other research, e.g., auto mode [15], [16], may result in poorly correlated extracted features (e.g., r ≈ 0.6 for r t1 , r t2 , r t3 , . . . , r t7 with respect to the features from the reference PPG signal). This poorly correlated overall signal morphology may explain why the previous CPPG algorithms are only capable of extracting the HR value from the CPPG signal. In short exposure time RSCPPG, the CIS pixels are open for a very short time, which allows an instantaneous collection of the reflected light carrying the subject's PPG signal. For measuring pulse rates, the switching time of the PD components had to be drastically reduced so that the light signal modulated with the heart could be time resolved [37].
Furthermore, we performed a preclinical study on 44 of the 50 recruited subjects. We analyzed the features calculated from the ensemble-averaged and beat-by-beat RSCPPG signals. We also compared the proposed RSCPPG signal with the traditional frame averaging method, and our experimental results showed that RSCPPG has a lower AAE in all extracted time features t 1 , t 2 , . . . , t 7 (the average AAE of RSCPPG is ∼30% lower than that of traditional frame averaging). These results show that the proposed RSCPPG algorithm tackles the typical limitation of CPPG, which is only useful for simple HR analysis. The proposed RSCPPG algorithm is capable of increasing the CPPG data sampling to an Fs equal to five times the CIS fps (Fs = 150 Hz).
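The relation Fs = 5 × fps can be illustrated with a simplified sketch in which each frame is split into five horizontal row bands and each band yields one sample, so that a 30-fps video gives 150 samples per second. This is only an illustration under the assumption that rows are read out sequentially from top to bottom; it omits MRAC and digital filtering, and the function name and the band-count parameter are hypothetical. On a real sensor the raw samples are not necessarily evenly spaced in time, so resampling onto a uniform grid would still be needed.

```python
import numpy as np

def rows_to_samples(frames, bands_per_frame=5):
    """Turn a stack of single-channel frames into a row-band time series.

    frames: array of shape (n_frames, n_rows, n_cols).
    Each frame is cut into `bands_per_frame` groups of consecutive rows and
    the mean intensity of each group becomes one sample, in readout order.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (n_frames, n_rows, n_cols)")
    row_means = frames.mean(axis=2)                        # (n_frames, n_rows)
    bands = np.array_split(row_means, bands_per_frame, axis=1)
    samples = np.stack([b.mean(axis=1) for b in bands], axis=1)
    return samples.reshape(-1)                             # frame 0 bands, frame 1 bands, ...

# Toy input: 2 frames of 10 rows x 4 columns, where every pixel in row r of
# frame f has the value 10*f + r, so the readout order is easy to follow.
toy = np.zeros((2, 10, 4))
for f in range(2):
    for r in range(10):
        toy[f, r, :] = 10 * f + r
series = rows_to_samples(toy, bands_per_frame=5)
effective_fs = 30 * 5   # samples per second for a 30-fps camera
```

With a 176 × 144 output, each of the five bands would average about 29 rows, which is consistent with the observation that a low output resolution still provides enough rows per sample.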
Some examples of the potential use of these features in physiological assessment are discussed as follows. First, regarding the position of the characteristic points, the use of features from the second derivative of the PPG is well investigated in [38], [39], [40], [41], [42], and [43]. Ratios of multiple combinations of the a, b, c, d, and e waves have been examined as indicators of arterial stiffness/vascular aging [38], [39] and for arteriosclerotic disease screening [38]. The distance between features is also important for further analysis. For example, t 6 (the distance between the systolic peak and the diastolic peak) has been found useful to establish the risk factor of CVD [40]. Cuffless BP and cardiac output estimation algorithms can also be built from PPG features, such as the pulse arrival time (PAT) [41], pulse transit time (PTT) [42], and inflection point area ratio (IPAR) [43]. Therefore, we also conducted a preliminary analysis to check the feasibility of the extracted RSCPPG features; the details of this experiment are explained in Supplementary Note 1. In total, five features related to arterial stiffness, stress assessment, and cuffless BP assessment were extracted: the d/a ratio, crest time (CT), T, stiffness index (SI), and PAT. We found that all extracted RSCPPG features achieved a relatively high correlation with the reference PPG features, with an average over all features of r = 0.88 (see Supplementary Table II). Finally, representative results showing the relation of the d/a ratio to the subject age and of the PAT to the systolic BP are given in Fig. S2(a) and (b), respectively. Combining high-Fs CPPG with the MRAC algorithm yields a good-quality PPG signal with extractable PPG features or characteristic points for multiple physiological applications.
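Two of the time features mentioned above can be illustrated on a single beat. The sketch below assumes that the crest time is measured from the foot (the minimum before the systolic peak) to the systolic peak and that t 6 is the systolic-to-diastolic peak distance; the peak picking is a deliberately naive local-maximum search, and the two-Gaussian beat is synthetic. The function names are hypothetical, and this is not the feature extraction used in Supplementary Note 1.

```python
import numpy as np

def local_maxima(x):
    """Indices of strict local maxima, endpoints excluded."""
    x = np.asarray(x, dtype=float)
    inner = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    return np.flatnonzero(inner) + 1

def crest_time_and_t6(beat, fs):
    """Return (crest time, systolic-to-diastolic distance) in seconds."""
    beat = np.asarray(beat, dtype=float)
    maxima = local_maxima(beat)
    if maxima.size < 2:
        raise ValueError("need a systolic and a diastolic peak")
    systolic = maxima[np.argmax(beat[maxima])]       # highest local maximum
    later = maxima[maxima > systolic]
    if later.size == 0:
        raise ValueError("no diastolic peak after the systolic peak")
    diastolic = later[np.argmax(beat[later])]        # highest later maximum
    foot = int(np.argmin(beat[: systolic + 1]))      # minimum before the peak
    return (systolic - foot) / fs, (diastolic - systolic) / fs

fs = 150
t = np.arange(120) / fs                              # 0.8 s of samples
# Synthetic beat: systolic wave centred at 0.16 s, smaller diastolic wave at 0.48 s.
beat = (np.exp(-((t - 0.16) ** 2) / (2 * 0.05 ** 2))
        + 0.4 * np.exp(-((t - 0.48) ** 2) / (2 * 0.08 ** 2)))
ct, t6 = crest_time_and_t6(beat, fs)
```

At Fs = 150 Hz each feature is resolved in steps of about 6.7 ms, whereas a 30-Hz signal would resolve the same features only in steps of about 33 ms.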
Table VIII also shows the importance of the full RSCPPG framework compared with partial RSCPPG frameworks, namely method 1, which is RSCPPG without MRAC, and method 2, which is RSCPPG without the digital filter in the MRAC step. The results show that the performance of RSCPPG is reduced if MRAC or digital filtering is missing, which demonstrates the importance of the full RSCPPG framework. Likewise, when we compare the full RSCPPG framework with the traditional frame averaging method, the full RSCPPG framework achieves a high correlation coefficient r of 0.94 ± 0.04 and 0.85 ± 0.05 for the ensemble-averaged and beat-by-beat signals, respectively, whereas the traditional frame averaging method achieves only 0.89 ± 0.06 and 0.79 ± 0.08, respectively.
Finally, the limitations and directions for future work on the proposed RSCPPG method are as follows. In this study, we only tested time-based morphological features to check the overall signal morphology against the reference PPG signal. For a richer assessment in physiological measurements, other features in the frequency and energy domains need to be analyzed. Because this manuscript focused on enabling a smartphone to produce a PPG signal highly correlated with the reference PPG by means of the RSCPPG method, a further clinical study is also required to validate the proposed RSCPPG signal for different types of assessment, including, but not limited to, arterial stiffness, stress monitoring, and cuffless BP monitoring. In our preclinical study, all subjects were instructed to minimize movement during data recording; further experiments are therefore required to analyze the effects of external interference on the acquired signal quality and its knock-on effects on feature extraction.
Also, one of the emerging methods for PPG-derived physiological monitoring on mobile devices is BP monitoring [41], [44]. Thus, in future work, we would like to examine the capability of the proposed RSCPPG signal to estimate BP continuously and without a cuff using only the built-in CIS of the smartphone.

VIII. CONCLUSION
This study proposed a novel method, named RSCPPG, to produce a high-quality PPG signal on a smartphone using an unmodified built-in camera. The main goal of this study is to acquire a high-rate and high-quality PPG signal using a common smartphone for IoT-based healthcare by exploiting the rolling-shutter camera mechanism. We showed the importance of controlling t exposure for smartphone PPG to obtain a signal highly correlated with the reference PPG. Results indicate that the short exposure time t exposure = 1/25 000 s yields a highly correlated signal with an average r = 0.95. The proposed RSCPPG algorithm also allows 150-Hz PPG data extraction from multiple pixel rows. Moreover, we conducted a preclinical study with 44 of the 50 recruited subjects to evaluate the effectiveness of the RSCPPG algorithm in a designed leg-press exercise protocol. Beat-by-beat and ensemble average-based PPG analyses were conducted to calculate the overall performance of the proposed RSCPPG algorithm in this preclinical study. Our results show that the RSCPPG correlated well with the reference PPG for all subjects, with an r of 0.94 ± 0.04 and 0.85 ± 0.05 for the ensemble-averaged and beat-by-beat signals, respectively. Moreover, compared with the prior frame averaging method, our RSCPPG shows a reduced AAE on all extracted features, with an average reduction of 30%. These experimental results indicate that the proposed RSCPPG algorithm is a reliable bio-signal acquisition technique for IoT-based mobile healthcare technology. In conclusion, RSCPPG uses a common built-in smartphone CIS without any modification; hence, it may turn any current smartphone into a powerful healthcare analysis platform for applications such as clinical physiological monitoring, vascular assessment monitoring, and autonomic function assessment monitoring.