NPT-Loss: Demystifying Face Recognition Losses With Nearest Proxies Triplet | IEEE Journals & Magazine | IEEE Xplore


Abstract:

Face recognition (FR) using deep convolutional neural networks (DCNNs) has seen remarkable success in recent years. One key ingredient of DCNN-based FR is the design of a loss function that ensures discrimination between various identities. The state-of-the-art (SOTA) solutions utilise normalised Softmax loss with additive and/or multiplicative margins. Despite being popular and effective, these losses are justified only intuitively, with little theoretical explanation. In this work, we show that under the LogSumExp (LSE) approximation, the SOTA Softmax losses become equivalent to a proxy-triplet loss that focuses on nearest-neighbour negative proxies only. This motivates us to propose a variant of the proxy-triplet loss, entitled Nearest Proxies Triplet (NPT) loss, which, unlike SOTA solutions, converges for a wider range of hyper-parameters and offers flexibility in proxy selection, and thus outperforms SOTA techniques. We generalise many SOTA losses into a single framework and give theoretical justifications for the assertion that minimising the proposed loss ensures a minimum separability between all identities. We also show that the proposed loss has an implicit mechanism of hard-sample mining. We conduct extensive experiments using various DCNN architectures on a number of FR benchmarks to demonstrate the efficacy of the proposed scheme over SOTA methods.
Page(s): 15249 - 15259
Date of Publication: 28 March 2022

PubMed ID: 35344485


1 Introduction

Face recognition (FR) has a wide variety of applications, including surveillance, access control, healthcare, and advertising. Owing to its significance, it is a widely studied topic in the computer vision literature. Recently, deep convolutional neural network (DCNN) based solutions [1], [2], [3], [4], [5], [6], [7] have seen remarkable success in FR applications, and these methods have replaced the classical FR techniques altogether. Generally, all state-of-the-art CNN-based systems rely on the following procedure. In the training phase, a deep CNN is trained on a large-scale dataset such as CasiaWeb [8] and/or MS-Celeb1M [9]. Some preprocessing, such as face detection and alignment, is carried out before training, and a suitable loss function, such as triplet loss [6], normalised Softmax [4], or ArcFace [2], is used to train the network. Once training is complete, the loss layer is discarded and the output of the CNN (an n-dimensional vector, with n usually equal to 512 or 2048) is treated as the feature vector corresponding to a given input face image. In the testing phase, a pair of inputs is fed to the trained network and the cosine similarity of the resulting feature vectors is evaluated. If the score is greater than a given threshold, the image pair is recognised as belonging to the same identity.
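
The testing-phase decision described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `cosine_similarity` and `same_identity`, the 4-dimensional toy vectors, and the threshold value of 0.5 are assumptions made purely for illustration. In practice the feature vectors would come from the trained CNN, and the threshold would be chosen on a validation set.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def same_identity(feat_a: Sequence[float],
                  feat_b: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Decide whether two feature vectors belong to the same identity.

    The default threshold of 0.5 is a hypothetical placeholder, not a value
    taken from the paper.
    """
    return cosine_similarity(feat_a, feat_b) > threshold


# Toy stand-ins for CNN outputs (real features are typically 512- or 2048-d).
face_a = [1.0, 2.0, 3.0, 4.0]
face_b = [2.0, 4.0, 6.0, 8.0]   # same direction as face_a
face_c = [4.0, -3.0, 2.0, -1.0]  # orthogonal to face_a

print(same_identity(face_a, face_b))
print(same_identity(face_a, face_c))
```

Because cosine similarity depends only on the angle between the vectors, rescaling a feature vector does not change the decision, which is why `face_b` (a scaled copy of `face_a`) is treated as a match in this toy case.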
