TVOR: Finding Discrete Total Variation Outliers Among Histograms


TVOR is a method for finding histogram outliers in terms of smoothness described by discrete total variation regardless of the underlying distribution of the histograms' ...

Notes: Editor's Note: Concerns have been raised by readers about this article and the context in which it is being discussed publicly. We do not share or endorse the views of the authors. The article should be read in conjunction with this published Comment: "Comment on "TVOR: Finding Discrete Total Variation Outliers Among Histograms"" by Melkior Ornik, in IEEE Access, Volume 9, May 2021, DOI: 10.1109/ACCESS.2021.3082900. (Derek Abbott, Editor-in-Chief, IEEE Access)

Abstract:

Pearson's chi-squared test can detect outliers in the data distribution of a given set of histograms. However, in fields such as demographics (e.g., birth years), outliers may be more easily found in terms of histogram smoothness, where techniques such as Whipple's or Myers' indices successfully handle only specific anomalies. This paper proposes detecting smoothness outliers among histograms by using the relation between their discrete total variations (DTV) and their respective sample sizes. This relation is mathematically derived to be applicable in all cases and is simplified by an accurate linear model. The deviation of a histogram's DTV from the value predicted by the model is used as the outlier score, and the proposed method is named Total Variation Outlier Recognizer (TVOR). TVOR requires no prior assumptions about the distribution of the histograms' samples, has no hyperparameters that require tuning, is not limited to specific patterns, and is applicable to histograms with the same bins. Each bin can have an arbitrary interval that can also be unbounded. TVOR finds DTV outliers more easily than Pearson's chi-squared test; for distribution outliers, the opposite holds. TVOR is tested on real census data, and it successfully finds suspicious histograms. The source code is available at https://github.com/DiscreteTotalVariation/TVOR.
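
The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The abstract does not state the form of the linear model, so the sketch assumes, purely for illustration, that the DTV of the normalized histogram is fitted by least squares against 1/sqrt(n), where n is the sample size. The function names discrete_total_variation and tvor_like_scores are hypothetical.

```python
import numpy as np


def discrete_total_variation(counts):
    """Sum of absolute differences between adjacent bins of the
    normalized histogram (one common definition of DTV)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()  # normalize so the bins sum to 1
    return np.abs(np.diff(p)).sum()


def tvor_like_scores(histograms):
    """Illustrative outlier scores: residual of each histogram's DTV
    from a linear model fitted over all histograms.

    ASSUMPTION: the regressor is 1/sqrt(n). The abstract only says the
    DTV-to-sample-size relation is simplified by a linear model; the
    exact form used in the paper may differ.
    """
    histograms = np.asarray(histograms, dtype=float)  # shape: (m, bins)
    n = histograms.sum(axis=1)                        # sample sizes
    dtv = np.array([discrete_total_variation(h) for h in histograms])
    x = 1.0 / np.sqrt(n)
    slope, intercept = np.polyfit(x, dtv, 1)          # least-squares line
    return dtv - (slope * x + intercept)              # larger = less smooth
```

Under these assumptions, a histogram whose DTV lies well above the fitted line for its sample size, such as one with strong alternation between neighboring bins, receives a large positive score.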
Published in: IEEE Access (Volume: 9)
Page(s): 1807 - 1832
Date of Publication: 24 December 2020
Electronic ISSN: 2169-3536
