A leak in PRNU based source identification? Questioning fingerprint uniqueness

Photo Response Non Uniformity (PRNU) is considered the most effective trace for the image source attribution task. Its uniqueness ensures that the sensor pattern noises extracted from different cameras are strongly uncorrelated, even when they belong to the same camera model. However, with the advent of computational photography, the most recent devices of the same model have started to expose correlated patterns, thus introducing a real chance of erroneous image source attribution. In this paper, after highlighting the issue in a controlled environment, we perform a large testing campaign on Flickr images to determine how widespread the issue is and what its plausible cause may be. To this aim, we tested over $240000$ image pairs from $54$ recent smartphone models comprising the most relevant brands. Experiments show that many Samsung, Xiaomi and Huawei devices are strongly affected by this issue. Although the primary cause of high false alarm rates cannot be directly related to specific camera models, firmware, or image contents, it is evident that the effectiveness of PRNU-based source identification on the most recent devices must be reconsidered in light of these results. Therefore, this paper is intended as a call to action for the scientific community rather than a complete treatment of the subject.


Introduction
Photo Response Non Uniformity (PRNU) is considered the most distinctive trace to link an image to its originating device [1]. Such a trace has been studied and improved for more than a decade and can be used for several tasks: (i) attribute an image to its source camera [2]; (ii) determine whether two images belong to the same camera [3]; (iii) cluster a large number of images based on the originating device [4]; (iv) determine whether an image and a video have been captured with the same camera [5]; (v) detect and localize the presence of a manipulated image region [2].
After its introduction [1], several refinements were proposed to improve the usage of the PRNU trace under challenging scenarios. Non-unique artifacts introduced by color interpolation and JPEG compression were studied and removed [2]; a more general approach was proposed to manage the case of cropped and resized images, and the peak-to-correlation-energy (PCE) metric was introduced as a more robust way to measure the peak value of the correlation [6], [7]; several filters and preprocessing steps have been proposed to further improve its effectiveness and efficiency [8], [9], [10], [11], [12]; PRNU compression techniques have been developed to enable very large scale operations [13], [14], previously impossible due to the size of the PRNU and to the complexity of the matching operations. Such a trace has also been studied under more complicated setups, e.g. when the media is exchanged through a social media platform [15], or when it is acquired with digital zoom [6].
All the mentioned works have been carried out under rather controlled scenarios, where the images used in the experiments were taken by camera devices or smartphones that follow the standard acquisition pipeline [16].
The first step towards more modern acquisition devices has been the study of PRNU detection in the presence of Electronic Image Stabilization (EIS); EIS introduces a pixel grid misalignment that, if not taken into account, leads PRNU-based testing to failure [17], [18]. However, the effectiveness of PRNU has never been put in doubt since, in all the above scenarios, the main risk was related to an increase of the missing detection rate only (i.e. the source camera is not identified), whereas the false alarm rate (i.e. the image is attributed to a wrong source camera) has always been negligible.
With the advent of computational photography, further challenges appear since the image acquisition pipeline is rapidly changing and appears to be strongly customized by each brand. The main novelties include the design of pixel binning [19], [20] and of customized HDR algorithms [21] in the image acquisition process, and the exploitation of artificial intelligence for the application of customized filters [22] in the in-camera processing step.
These new customized pipelines may introduce new non-unique artifacts, shared among different cameras of the same model, or even among cameras of different models, that could disturb the PRNU extraction process. This poses the serious risk that images belonging to different cameras expose correlated patterns, unexpectedly increasing the false alarm rate in PRNU detection. This is a very serious issue since PRNU-based source identification is currently used as evidence in court and is implemented in several forensic software packages supplied to law enforcement agencies and intelligence services, such as PRNU Compare Professional, developed by the Netherlands Forensic Institute (NFI), and Amped Authenticate from Amped Software.
To the best of our knowledge, this paper represents the first study where PRNU-based source identification is tested on a large set of images taken by the most recent devices that exploit the newest imaging technologies. In particular, we will highlight that this forensic technique, applied as it is on modern cameras, is no longer reliable since it is strongly affected by unexpectedly high correlations among different devices of the same smartphone model and/or brand. Considering that PRNU-based source camera identification is currently used by law enforcement agencies worldwide, often to investigate serious crimes such as child sexual exploitation, we believe it is fundamental that the scientific community cross-verifies the results we obtained (all datasets presented in the paper are made available) and tries to shed light on this potentially disruptive discovery as promptly as possible.
The paper is organized as follows: in Section 2 the theoretical framework and the main pipeline for PRNU extraction and comparison are summarized; in Section 3 the collected images and cameras are described; in Section 4 the state-of-the-art performance is verified on the currently available benchmark datasets. The fingerprint collision analysis is reported in Section 5: we highlight that high correlation values can be found on some very recent smartphone cameras; then, we assess how widespread the problem is by performing a large scale test on Flickr images. Section 6 draws the conclusions.
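Throughout this paper, vectors and matrices are indicated in bold as $\mathbf{X}$ and their components as $\mathbf{X}(i)$ and $\mathbf{X}(i, j)$ respectively. All operations are element-wise, unless mentioned otherwise. Given two vectors $\mathbf{X}$ and $\mathbf{Y}$, $\|\mathbf{X}\|$ is the Euclidean norm of $\mathbf{X}$, $\mathbf{X} \cdot \mathbf{Y}$ is the dot product between $\mathbf{X}$ and $\mathbf{Y}$, $\bar{\mathbf{X}}$ is the mean value of $\mathbf{X}$, and $\rho(s_1, s_2; \mathbf{X}, \mathbf{Y})$ is the normalized cross-correlation between $\mathbf{X}$ and $\mathbf{Y}$ calculated at the spatial shift $(s_1, s_2)$ as
$$\rho(s_1, s_2; \mathbf{X}, \mathbf{Y}) = \frac{\sum_{i} \sum_{j} \left(\mathbf{X}[i, j] - \bar{\mathbf{X}}\right)\left(\mathbf{Y}[i + s_1, j + s_2] - \bar{\mathbf{Y}}\right)}{\|\mathbf{X} - \bar{\mathbf{X}}\| \, \|\mathbf{Y} - \bar{\mathbf{Y}}\|}$$
where the shifts $[i, j]$ and $[i + s_1, j + s_2]$ are taken modulo the horizontal and vertical image dimensions. Furthermore, we denote the maximum by $\rho(s_{peak}; \mathbf{X}, \mathbf{Y}) = \max_{s_1, s_2} \rho(s_1, s_2; \mathbf{X}, \mathbf{Y})$. The notation is simplified to $\rho(s)$ and $\rho(s_{peak})$ when the two vectors cannot be misinterpreted.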
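As an illustration of the notation, the normalized cross-correlation at every circular shift can be evaluated at once through the FFT. The following is a minimal sketch, assuming NumPy and real-valued arrays of equal shape; the function name `ncc_all_shifts` is ours and is not taken from the original work.

```python
import numpy as np


def ncc_all_shifts(x, y):
    """Normalized cross-correlation of x and y at every circular shift.

    Entry [s1, s2] of the result equals
        sum_{i,j} (x[i,j] - mean(x)) * (y[i+s1, j+s2] - mean(y))
        / (||x - mean(x)|| * ||y - mean(y)||)
    with all indices taken modulo the array dimensions.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape or x.ndim != 2:
        raise ValueError("x and y must be 2-D arrays of the same shape")
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0:
        raise ValueError("constant input: correlation is undefined")
    # Circular cross-correlation theorem: c[s] = sum_i xc[i] * yc[i + s]
    c = np.fft.ifft2(np.conj(np.fft.fft2(xc)) * np.fft.fft2(yc)).real
    return c / denom


# Toy check: y is x circularly shifted by (3, 5), so y[i+3, j+5] = x[i, j]
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((32, 48))
y_demo = np.roll(x_demo, shift=(3, 5), axis=(0, 1))
rho_demo = ncc_all_shifts(x_demo, y_demo)
s_peak = np.unravel_index(np.argmax(rho_demo), rho_demo.shape)
print("peak shift:", s_peak)
```

The FFT route costs O(mn log mn) for all shifts together, which is why it is usually preferred over evaluating the sum shift by shift.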

PRNU based source identification
PRNU is a subtle variation in pixel amplitude due to the different sensitivity to light of the sensor's elements. This defect introduces a unique fingerprint into every image the camera takes. Camera fingerprints can then be estimated and compared among images to determine the originating device. The camera fingerprint is usually estimated from $n$ images $\mathbf{I}_1, \ldots, \mathbf{I}_n$ as follows: a denoising filter [1], [23] is applied to the images to obtain the noise residuals $\mathbf{W}_1, \ldots, \mathbf{W}_n$; the reference camera fingerprint estimate $\mathbf{K}$ is derived by the maximum likelihood estimator [2]:
$$\mathbf{K} = \frac{\sum_{i=1}^{n} \mathbf{W}_i \mathbf{I}_i}{\sum_{i=1}^{n} \mathbf{I}_i^2} \qquad (1)$$
Two further processing steps are applied to $\mathbf{K}$ to remove demosaicing traces, JPEG blocking and other non-unique artifacts [2].
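The estimation step can be sketched as follows. This is only an illustration under stated assumptions: the estimator is taken in its usual maximum likelihood form, the sum of W_i I_i divided by the sum of I_i squared (element-wise); a simple mean filter stands in for the denoising filter of [1], [23], which it does not reproduce; and the post-processing that removes non-unique artifacts is omitted. All function names are ours.

```python
import numpy as np


def box_denoise(img, radius=1):
    """Stand-in denoiser (assumption): mean filter with circular borders.

    This is NOT the denoising filter used in the PRNU literature; it only
    makes the sketch self-contained.
    """
    img = np.asarray(img, dtype=np.float64)
    acc = np.zeros_like(img)
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            acc += np.roll(img, shift=(di, dj), axis=(0, 1))
    return acc / (2 * radius + 1) ** 2


def noise_residual(img, denoise=box_denoise):
    """Noise residual W = I - denoise(I)."""
    img = np.asarray(img, dtype=np.float64)
    return img - denoise(img)


def estimate_fingerprint(images, denoise=box_denoise, eps=1e-12):
    """Element-wise estimate K = sum_i(W_i * I_i) / sum_i(I_i ** 2)."""
    num = None
    den = None
    for img in images:
        img = np.asarray(img, dtype=np.float64)
        w = noise_residual(img, denoise)
        if num is None:
            num = np.zeros_like(img)
            den = np.zeros_like(img)
        num += w * img
        den += img * img
    if num is None:
        raise ValueError("at least one image is required")
    # eps (assumption) avoids division by zero on fully black pixels
    return num / (den + eps)
```

Flat, well-exposed images are typically preferred as input because the residual then contains little scene content.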
The most common source identification test tries to determine whether a query image $\mathbf{I}$ belongs to a specific camera. Given the noise residual $\mathbf{W}$ extracted from $\mathbf{I}$ and the reference camera fingerprint estimate $\mathbf{K}$, the two-dimensional normalized cross-correlation $\rho(s_1, s_2; \mathbf{X}, \mathbf{Y})$ is computed with $\mathbf{X} = \mathbf{I}\mathbf{K}$, $\mathbf{Y} = \mathbf{W}$ for any plausible shift $(s_1, s_2)$; then the peak-to-correlation energy (PCE) ratio [7] is derived as
$$PCE = \frac{\rho(s_{peak})^2}{\frac{1}{mn - |\mathcal{V}|} \sum_{s \notin \mathcal{V}} \rho(s)^2}$$
where $\mathcal{V}$ is a small set of peak neighbours and $(m, n)$ is the image pixel resolution. When $PCE > \tau$, for a given threshold $\tau$, we decide that the fingerprint $\mathbf{K}$ is found within $\mathbf{I}$, i.e. the image belongs to the reference camera. A threshold of 60 is commonly accepted by the research community since it guarantees a negligible false alarm rate (FAR) [7], [17], [24], [18]. When multiple query images are available from the same camera, a query fingerprint $\mathbf{K}_Q$ can be estimated (through Eq. 1) and the test can be performed by using $\mathbf{X} = \mathbf{K}_Q$ and $\mathbf{Y} = \mathbf{K}$. This test is preferable since it allows suppressing most of the image content on both the reference and the query side. However, multiple query images from the same camera may not be available in practical cases.
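A compact sketch of the PCE test is given below, assuming NumPy. It computes the correlation over all circular shifts, takes the peak, and divides its square by the mean squared correlation outside a square neighbourhood of the peak. The neighbourhood half-width of 5 pixels and the function names are our assumptions; the literature also contains signed variants of the PCE, which are not shown here.

```python
import numpy as np


def pce(x, y, half_width=5):
    """Peak-to-correlation energy between x and y.

    Returns (pce_value, peak_shift). The excluded set V is a square of
    side 2 * half_width + 1 centred on the peak, with wrap-around.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape or x.ndim != 2:
        raise ValueError("x and y must be 2-D arrays of the same shape")
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0:
        raise ValueError("constant input: correlation is undefined")
    # Normalized cross-correlation at all circular shifts via the FFT
    rho = np.fft.ifft2(np.conj(np.fft.fft2(xc)) * np.fft.fft2(yc)).real / denom

    m, n = rho.shape
    p0, p1 = np.unravel_index(np.argmax(rho), rho.shape)
    keep = np.ones((m, n), dtype=bool)
    rows = np.arange(p0 - half_width, p0 + half_width + 1) % m
    cols = np.arange(p1 - half_width, p1 + half_width + 1) % n
    keep[np.ix_(rows, cols)] = False
    if not keep.any():
        raise ValueError("array too small for the chosen neighbourhood")
    # Mean over shifts outside V equals (1 / (mn - |V|)) * sum of rho^2
    energy = np.mean(rho[keep] ** 2)
    return float(rho[p0, p1] ** 2 / energy), (int(p0), int(p1))


def same_source(x, y, tau=60.0):
    """Decision rule PCE > tau, with the commonly used tau = 60."""
    value, _ = pce(x, y)
    return value > tau
```

For the image-versus-fingerprint test described above one would pass `x = I * K` and `y = W`; for the fingerprint-versus-fingerprint test, `x = K_Q` and `y = K`.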
The above test can also be applied to determine whether two images belong to the same camera, in order to perform device linking. In this case the query and reference fingerprints are simply estimated from a single image each. Device linking is expected to exhibit a higher missing detection rate (MDR), since the strength of the PRNU in a single image can be negligible and strongly contaminated by image content. However, the FAR is still negligible (see Section 4 for more details).
In the following sections we will show that fingerprints estimated from different devices exhibit unexpectedly high correlations on the most recent devices.

Data collection
In order to understand the impact of the technological novelties in the imaging field on PRNU detection, we considered three independent datasets:
• VISION dataset [15], used as a benchmark to verify the effectiveness of the source identification task; VISION is currently the most adopted dataset for the assessment of image forensics algorithms. It includes images acquired by 35 devices belonging to 11 different brands: Apple, Asus, Huawei, Lenovo, LG electronics, Microsoft, OnePlus, Samsung, Sony, Wiko, and Xiaomi. It is worth noting that all these devices were released in or before 2016, so their imaging technology is becoming obsolete. For each device, there are at least 60 flat images representing flat surfaces (e.g., skies or walls) and 130 natural images representing generic scenes.
• Control dataset: it is a collection of images acquired from 23 more recent smartphones, including 17 different models belonging to the brands Huawei, Xiaomi, Samsung and Apple.As shown in Table 1, each device is uniquely identified with an ID, since for some models we collected more than one device.
For each smartphone, 5 flat images and 10 natural images were collected. These devices were released from 2018 onwards, with the only exceptions of the Apple iPhone X and the Huawei P10, presented in 2017; in any case, all these models are more recent than the ones in VISION.
• Flickr dataset: we collected Flickr images from the same models considered in the Control dataset, plus 31 other camera models chosen among the most widespread in the market (see Table 2). For each targeted device model, we selected and downloaded 100 images (or fewer, when such a number was not available), ensuring they met the following constraints:
1. Exif Make and Model metadata were present and matched the targeted device;
2. image resolution matched the maximum resolution allowed by the device. For some devices, especially those featuring pixel binning, more than one resolution was selected for download. In such cases, 100 images were downloaded for each resolution;
3. image metadata did not contain traces of processing software. To achieve this, we checked the Exif software tag against a blacklist of over 90 image processing software packages;
4. no more than 10 images from the same Flickr user were downloaded. This was an upper bound; however, in most cases the abundance of available images allowed selecting pictures from tens of different users.
In total, we collected 6719 Flickr images. Let us highlight that, while in the Control dataset each ID represents a specific exemplar, in the Flickr dataset we gathered images belonging to multiple exemplars for each model, corresponding to the number of users indicated in Table 2. Images belonging to the Control and Flickr datasets will be made available to researchers upon request to the corresponding author.
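The four selection constraints lend themselves to a simple filter over image metadata. The sketch below is hypothetical: the record layout (dictionaries with `make`, `model`, `width`, `height`, `software`, and `user` keys), the function name, and the choice to apply the per-user cap per model are our assumptions, and no actual Flickr API call is shown.

```python
from collections import Counter


def select_images(records, make, model, resolutions, software_blacklist,
                  max_per_resolution=100, max_per_user=10):
    """Filter candidate records according to the four constraints.

    records: iterable of dicts (hypothetical layout, see lead-in).
    resolutions: accepted (width, height) pairs; orientation is ignored,
    which is an assumption of this sketch.
    """
    allowed = {tuple(sorted(r)) for r in resolutions}
    blacklist = [name.lower() for name in software_blacklist]
    per_user = Counter()
    per_resolution = Counter()
    selected = []
    for rec in records:
        # 1. Exif Make and Model present and matching the target
        if rec.get("make") != make or rec.get("model") != model:
            continue
        # 2. resolution among the accepted ones
        w, h = rec.get("width"), rec.get("height")
        if w is None or h is None:
            continue
        res = tuple(sorted((w, h)))
        if res not in allowed:
            continue
        # 3. no trace of blacklisted processing software
        software = (rec.get("software") or "").lower()
        if any(name in software for name in blacklist):
            continue
        # 4. at most max_per_user images from the same user
        user = rec.get("user")
        if per_user[user] >= max_per_user:
            continue
        if per_resolution[res] >= max_per_resolution:
            continue
        per_user[user] += 1
        per_resolution[res] += 1
        selected.append(rec)
    return selected
```

A substring match against the software tag is a deliberately loose choice here; an exact-match blacklist would be stricter but could miss version-suffixed strings.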

Performance on benchmark dataset
Given a good reference fingerprint, an image can be effectively linked to its originating device. This fact is already highlighted in [15], where Shullani et al. show that when a good reference is available, the source identification task on a single image can be achieved with great accuracy. Indeed, tests on native contents from the VISION dataset produce an Area Under Curve (AUC) of 0.99. Thanks to these results and to similar experiments carried out on other datasets, forensic experts, law enforcement agencies and the research community agree on the effectiveness of the source identification task. More specifically, a false positive is expected to be a very rare event, thus assuring that a high PCE value represents a very strong and reliable finding.
It is worth noting that good performance can be achieved even in device linking, when only two images are compared. To demonstrate this fact, we randomly selected 20 natural images for each VISION device, and we computed the PCE between fingerprints estimated from single images, thus obtaining 6650 matching image pairs and 238000 mismatching image pairs. In Fig. 1 the achieved PCE statistics are reported in boxplot form for each device. Matching and mismatching cases are reported in green and red respectively. When the commonly agreed threshold of 60 is used to determine whether two images belong to the same device, only 30 mismatching pairs exceed the threshold. As expected, the method thus assures a low (≤ 0.001) false alarm rate even when single images are compared. Furthermore, if we consider a more conservative threshold of 100, false positives completely disappear.
Table 2: List of tested camera models composing the Flickr dataset. In bold, we highlight the camera models also present in the Control dataset. For each model, the number of collected images from the given number of Flickr users is shown. For devices for which multiple resolutions were considered, both the number of images with maximum resolution (first number) and with lower resolution (second number) are indicated.
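The pair counts reported for the VISION experiment follow from simple combinatorics, assuming each unordered pair of images is counted once. The short check below reproduces them and the resulting empirical false alarm rate at the threshold of 60; variable names are ours.

```python
from math import comb

devices = 35            # VISION devices
images_per_device = 20  # natural images sampled per device

# Same-device pairs: all unordered pairs within each device
matching_pairs = devices * comb(images_per_device, 2)

# Cross-device pairs: every image of one device against every image
# of another device, for each unordered pair of devices
mismatching_pairs = comb(devices, 2) * images_per_device ** 2

false_alarms = 30  # mismatching pairs with PCE > 60, as reported
far = false_alarms / mismatching_pairs

print(matching_pairs)     # 6650
print(mismatching_pairs)  # 238000
print(f"{far:.2e}")       # 1.26e-04
```

The resulting rate, about 1.3 in 10000, is consistent with the stated bound of 0.001.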
In the next section we apply the same scheme on images belonging to the Control dataset.

Fingerprints collision
For each device of the Control dataset we estimated the camera fingerprint from 5 flat images, and we computed the PCE among all fingerprint pairs. In Fig. 2 we report the achieved PCE values in a confusion matrix. Since the fingerprints belong to different sensors, we expect all the achieved peaks to be below 60. Surprisingly, we obtained high PCE values among the three available Huawei P20 Pro devices and among the Samsung S9, the Samsung S9+ and the Samsung S10e. These results highlight that unexpectedly high correlation can occur among different cameras of the same model (Huawei P20 Pro) and among different camera models of the same brand (Samsung).
Figure 1: PCE statistics computed between pairs of images from VISION devices. Matching and mismatching pairs are reported in green and red respectively. The threshold of 60 is highlighted by the red dotted line.
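The pairwise comparison described in this section can be organized as a symmetric score matrix from which above-threshold pairs are listed. The sketch below assumes NumPy and takes the scoring function as a parameter, so any PCE implementation can be plugged in; treating the score as symmetric, and the names `pairwise_scores` and `collisions`, are our assumptions.

```python
import numpy as np


def pairwise_scores(fingerprints, score):
    """Score every unordered pair of fingerprints.

    fingerprints: dict mapping a device ID to a 2-D array.
    score: callable (a, b) -> float, e.g. a PCE implementation.
    Returns (ids, matrix); the diagonal is left as NaN because a
    fingerprint is never compared with itself.
    """
    ids = list(fingerprints)
    n = len(ids)
    matrix = np.full((n, n), np.nan)
    for a in range(n):
        for b in range(a + 1, n):
            value = float(score(fingerprints[ids[a]], fingerprints[ids[b]]))
            matrix[a, b] = value
            matrix[b, a] = value  # assumes a symmetric score
    return ids, matrix


def collisions(ids, matrix, threshold=60.0):
    """List pairs of distinct devices whose score exceeds the threshold."""
    found = []
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if matrix[a, b] > threshold:
                found.append((ids[a], ids[b], float(matrix[a, b])))
    return found
```

With fingerprints from distinct sensors, an empty collision list is the expected outcome; any entry corresponds to a pair of devices that would be wrongly linked at that threshold.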

Discussion
The results presented in the previous sections deserve some comments. Firstly, false positives are not independent of the resolution settings: the Galaxy S20 Ultra, for example, suffers from false positives at the 12 MP binned resolution, but not at the maximum resolution. Similarly, false positives are found on the Samsung Galaxy S10+ at 12 MP but not at 16 MP. Notably, the results do not establish an apparent link between false positives and devices featuring the pixel binning technique; for example, the Redmi Note 8T model features such a technique but has a low false positive rate, while the Samsung Galaxy S10 does not use pixel binning but is strongly affected by false positives. Finally and remarkably, some devices, e.g. the Samsung Galaxy S9, yield false positives in the Control dataset (see Figure 2) but not in the Flickr dataset (bottom of Figure 3). Even comparing image metadata (e.g. firmware version, picture acquisition date, camera settings) we could not find an explanation for this fact, which remains an open question. The existence of devices (e.g., the Samsung Galaxy S9) whose fingerprint uniqueness is questionable in the Control dataset but seems unquestionable in the Flickr dataset is worrying, as it suggests that, when working on a real investigation, even obtaining images from many exemplars of the same camera model might not suffice to compute a reliable threshold with the classical Neyman-Pearson approach.
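One plausible empirical reading of the Neyman-Pearson approach mentioned above is to pick the smallest threshold whose false alarm rate on a sample of mismatching scores does not exceed a target value. The sketch below, assuming NumPy, illustrates that idea only; the function names are ours and the procedure is not claimed to be the one used in any specific forensic tool.

```python
import numpy as np


def empirical_far(mismatch_scores, threshold):
    """Fraction of mismatching scores strictly above the threshold."""
    s = np.asarray(mismatch_scores, dtype=np.float64)
    if s.size == 0:
        raise ValueError("no scores given")
    return float(np.mean(s > threshold))


def neyman_pearson_threshold(mismatch_scores, target_far):
    """Smallest observed score usable as threshold with FAR <= target_far.

    The decision rule is score > threshold, so at most
    floor(target_far * n) of the n mismatching scores may lie above it.
    """
    s = np.sort(np.asarray(mismatch_scores, dtype=np.float64))
    n = s.size
    if n == 0:
        raise ValueError("no scores given")
    if not 0.0 <= target_far < 1.0:
        raise ValueError("target_far must be in [0, 1)")
    allowed = int(np.floor(target_far * n))
    # Only the 'allowed' largest scores can exceed this value
    return float(s[n - 1 - allowed])
```

The concern raised in the discussion is that a threshold calibrated this way on one set of exemplars gives no guarantee on another: evaluating `empirical_far` on scores from a different population can return a much larger rate than the target.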

Conclusions
In this paper, we have conducted a large scale validation of the classical PRNU-based source camera identification on a dataset of modern devices, with a particular focus on image-vs-image matching. After ensuring, using the well-known VISION dataset, that the image-vs-image noise residual comparison was a meaningful test, we repeated the experiment both on the Control dataset, containing 23 devices, and on a dataset obtained from Flickr, containing 54 devices. Results show that fingerprint uniqueness is not guaranteed for many device models: for the widely adopted PCE threshold of 60, false positive rates larger than 1% were observed for popular devices belonging to Huawei, Samsung, Nokia, and Xiaomi. Based on our results, recent Apple devices instead seem unaffected, their results being fully in line with those observed on the older VISION dataset.
After examining and discussing the results, we believe this paper opens two fundamental problems: 1) understanding if there is a single discriminating element which hinders fingerprint uniqueness for some models; 2) devising a general test that, starting from some images of an investigated device, tells whether PRNU source identification can be considered reliable for that specific device. Within the scope of this paper, we were not able to answer the above questions in general, but we were able to reasonably exclude some possible explanations, such as pixel binning.
Considering the widespread, worldwide application of PRNU-based source identification by law enforcement agencies, we believe it is of paramount importance to shed light on the issues raised in this paper. Therefore, this paper is intended as a call to action for the scientific community, which is invited to reproduce and validate our results and answer the questions that remain open.

Figure 2: PCE statistics computed among different camera fingerprints in the Control dataset.

Figure 3: PCE statistics computed between pairs of images from Apple (top), Huawei (middle), and Samsung (bottom) devices. Matching and mismatching pairs are reported in green and red respectively. The threshold of 60 is highlighted by the red dashed line.

Figure 4: PCE statistics computed between pairs of images from Xiaomi and other devices. Matching and mismatching pairs are reported in green and red respectively. The threshold of 60 is highlighted by the red dotted line.

Table 1: List of devices belonging to the Control dataset.