Can Reverse Nearest Neighbors Perceive Unknowns?

A novel open set classifier is presented in this work, where the neighborhood of a test instance is determined using the principles of Reverse <inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula>-nearest neighbors (<inline-formula> <tex-math notation="LaTeX">$\text{R}k$ </tex-math></inline-formula>NN). The <inline-formula> <tex-math notation="LaTeX">$\text{R}k$ </tex-math></inline-formula>NN count of an instance can take any non-negative value less than or equal to the size of the training set. When dealing with an open dataset, consisting of <italic>known</italic> and <italic>unknown</italic> classes, the zero count can provide a possible solution for detecting the unknown class. A positive <inline-formula> <tex-math notation="LaTeX">$\text{R}k$ </tex-math></inline-formula>NN count, along with the nearest <inline-formula> <tex-math notation="LaTeX">$\text{R}k$ </tex-math></inline-formula>NN distance information, is used to determine the known class classifications. Experiments are carried out on ten real-world datasets with various openness values, using five state-of-the-art open set learners and the proposed scheme. Their performance is measured on three evaluation metrics, namely <italic>accuracy, average</italic> <inline-formula> <tex-math notation="LaTeX">$F_{1}$ </tex-math></inline-formula> <italic>over known and unknown classes, and Known class</italic> <inline-formula> <tex-math notation="LaTeX">$F_{1}$ </tex-math></inline-formula>. Empirical results indicate that the proposed method delivers comparable to superior performance over the state-of-the-art approaches on all but one dataset.


I. INTRODUCTION
A conventional classification task aims to assign each instance to one of the known classes, whereas unknown class detection additionally requires recognizing the instances that belong to unknown classes. An unknown class is distinguished from the known classes by the non-availability of its instances during the training phase. Though humans perform classification and detection simultaneously, machines often fail to accomplish the latter effectively. Perception and consequent detection of unknowns pose a serious challenge for a machine that is designed to operate in a 'closed' world. Classifier design and the presumptions we make primarily account for this disparity. We grow and learn in an unknown world with an incrementally growing known subspace, whereas our classifiers are trained in a 'closed' setting of known distributions and classes. Traditionally, it is considered ideal when the training set and the test set have distributions that are as similar as possible. Under this assumption, a classifier is forced to restrict its predictions to the set of training classes. When the test set consists of seen and unseen class instances, the unseen instances get camouflaged as seen instances and are thus misclassified.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhao Zhang.
The above-mentioned problems can be generalized as follows. In the training phase, we have instances belonging to any one of the c possible classes, where c ≥ 1. Unlike regular classification, during testing an instance can be a member of any one of c + u classes, u ≥ 1. The c known classes are seen during the training as well as the test phase, while the remaining u classes, which constitute the set of unknown classes, appear in the test phase only. Before proceeding with further details, we should distinguish open world recognition from anomaly detection and outlier detection. In the latter, one has to detect the rare events or instances which deviate from the available population. In an open set scenario, we have a universe. During training, we are provided information about only a few aspects (known classes) of the universe, but in the test phase we have to classify what we have seen before (known classes) and detect the ones that we have not encountered earlier (unknown classes). We may also have some classes which we do not encounter in either the training or the test phase. The openness of a dataset is the degree of unknownness in the dataset. To quantify this characteristic, the following definition is provided by [1].
$$\text{Openness} = 1 - \sqrt{\frac{2 \times \text{Training classes}}{\text{Target classes} + \text{Test classes}}}$$
The target classes consist of all the training and test classes as well as the leftover unknown classes that do not participate in training or testing. The goal is to predict the known classes correctly and, at the same time, to recognize the test instances of the unknown class. In a practical scenario, we cannot really quantify the degree of openness because the unknown remains unknown.
Extant classifiers predict 'closed' class memberships in terms of the known classes only. A true open set solution has to possess the capability of saying 'no' or 'unknown' when a test point comes from an unknown class. In this work, we attempt to answer this by raising a simple question. Instead of querying a test instance p about its nearest neighbors in a given search space, we query about the reverse nearest neighbors of p. The reverse k-nearest neighbors of an instance p are all those points in a given search space whose k-nearest neighborhood contains p. When n is the total number of training points, p's RkNN count can be anything between 0 and n. When all the instances in the search space have p as one of their k-NNs, the RkNN count of p is n, which indicates sufficient belongingness of p to the given search space. On the contrary, when p does not lie in any point's neighborhood, it indicates significant disharmony between p and all other members of the search space. The latter situation is our motivation for rejecting such a point and assigning it to the unknown class. In this paper, we present a novel reverse k-nearest neighbor based classification scheme which performs simultaneous classification into the known classes as well as the unknown class. The key aspect of our work is the simplicity of the scheme. We do not need to provide any information other than the training (known class) instances and their respective class labels. The proposed scheme does not require any distance based thresholding for demarcation of the known and unknown spaces; the only user-modulated parameter is the neighborhood size k. In the next section, we describe the open set classification problem.

II. OPEN SET CLASSIFICATION
A closed set classifier makes its prediction within the set of classes that it encounters in the training phase. It assumes that all classes of the test data (queries) were well represented in the training phase. Closed set classifiers, mostly built on the Bayesian optimal posterior probability model, assume that a fixed set of classes shares the real space and that the classifier has to predict one of these classes according to the class boundaries. If the number of classes is c, the classifier computes $P(C_j \mid x)$ for j = 1, 2, . . . , c and assigns the query instance to the class i which gives the maximum value of $P(C_i \mid x)$, i = 1, 2, . . . , c.
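The closed set decision rule described above can be sketched in a few lines. This is only an illustration: the function name `closed_set_predict` and the posterior values are hypothetical, and the posteriors are assumed to be supplied by some already trained model.

```python
def closed_set_predict(posteriors):
    """Return the 1-based index i of the class maximizing P(C_i | x).

    `posteriors` is a list [P(C_1 | x), ..., P(C_c | x)] that is assumed
    to come from an already trained closed set model.
    """
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    return best + 1


# Hypothetical posteriors for c = 3 known classes. Even a nearly
# uniform (low-confidence) posterior is forced into a known class,
# because the rule has no 'unknown' option.
prediction = closed_set_predict([0.34, 0.33, 0.33])
```

The sketch makes the limitation visible: whatever the input, the output is always one of the c training classes.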
Open set classification is a type of classification problem where instances belonging to an unknown class appear in the test phase. An unknown class is a class which had no representation at training time. An open set classifier can encounter instances from such unrepresented classes in the test phase and should predict them as unknown instead of classifying them into the known classes. Open set classification is different from anomaly detection as well as incremental learning. In incremental learning, the scheme is to add a newly encountered class to the database of seen classes on encountering its instances. On the contrary, in open set classification it is not desirable to add the unknown (unrepresented) classes to the seen domain. What is unknown should remain unknown but should be recognized as unknown. Anomaly detection is a task in which a rare event or observation, such as an outlier, is identified as different from the regular ones. In anomaly detection, we do not need to discriminate among the known classes. Unlike anomaly detection, in open set classification the classifier does not make any assumption about the cardinalities of the unknowns, and no information is available about the unknowns at training time (or at test time). The problem of open set classification requires us to make provision for the unknown and unrepresented class, besides discriminating between and correctly predicting the known classes.
In an efficient open set classifier, these two characteristics (provision for the unknown class and discrimination among the known classes) should follow naturally from the scheme. Reverse nearest neighborhood provides an elegant way of solving these two issues simultaneously. Our scheme based on reverse nearest neighborhood principles is presented in Section V. We discuss the extant works on open set classification and the machine learning applications of reverse nearest neighbor principles in the next section.

III. LITERATURE REVIEW
This work deals with open set classification using the principles of reverse nearest neighborhood. Reverse k-nearest neighbor (RkNN) principle has been used in various applications but RkNN based classification has not been implemented or addressed in any existing piece of work so far. Keeping in mind these two aspects, the literature review of this work is presented in two contexts. First, we discuss extant works in the field of open set classification. In the second part, we present a brief discussion on works that have used principles of reverse nearest neighborhood to achieve some machine learning goals.
Open set recognition in a mixed bag of seen and unseen classes has appealed to the data science community for quite some time. Although the number of works to date is not large, the techniques applied are quite diverse. Reference [2] implemented unknown class recognition through estimation of the prior probability of the known classes and the posterior probabilities for the known as well as unknown classes. One-class classifiers, which try to model a class only through its positive instances, have been one of the foremost solutions to the open world problem. Though this is sufficient for a setup having one known class and the rest as the unknown class, the need for a more refined scheme which can tackle two or more known classes along with the unknown is natural. Reference [1] addressed this issue by implementing open set recognition in the context of two known classes with the remaining ones as the unknown class. They modified the conventional SVM for this. Besides drawing a decision boundary between the two known classes, [1] added one more hyperplane which separated the unknown class from the known subspace. Learning the classifier model and then incorporating Compact Abating Probability (CAP) is another solution. An amalgamation of extreme value theory and the probabilistic CAP model is implemented in [3] to classify the instances from the known classes and subsequently recognize the unknowns. The CAP model considers a decreasing confidence of class membership as one moves away from a known class instance into the unmarked space. Regions beyond a thresholded radius are subsequently categorized as the unknown or open space. In [4], a posterior probability estimator is implemented for each training class. A test instance is predicted into a known class only if the maximum probability surpasses the threshold. If no class surpasses it, the point is recognized as unknown.
Distribution learning of the known classes through Extreme Value Theory (EVT) and incremental learning are incorporated in [5] to implement open set classification. Object detection under open set constraints is solved using a dropout sampling approach in [6].
A few recent schemes have incorporated neural networks to recognize samples from unseen classes along with classification of samples into seen or known classes. The scheme by [7] is based on an ensemble of Convolutional Neural Networks with a provision for open set recognition. It separates plant images from unknown non-plant images. Open set recognition through a weightless neural network has been explored in [8]. In [9], a neural network based classifier detects the unknown samples by computing the similarity between the unknown data and the stored or bounded knowledge. Reference [10], on the other hand, proposed a theoretically sound method to estimate the 'sampling window' of the training data. Samples generated from regions outside the sampling window are used to represent the unknown world (class). They trained a neural network to learn the known and unknown classes. In [11], a Generative Adversarial Network (GAN) based approach separates the differential identity components of a face to build an identity-preserving open set face synthesizer.
Reference [12] has tweaked the traditional k-NN based classifier to facilitate open set recognition. It has proposed two schemes. In the first variant, an instance is classified as unknown when the class labels of its first two neighbors disagree; when they agree, the instance is assigned to the class of its first (as well as second) neighbor. The second variant looks at the distances of the test instance from its two nearest neighbors belonging to different classes and calculates their ratio (nearer/farther). If the ratio is beyond a threshold, the instance is classified as unknown, and vice versa. Reference [13] has employed a data fusion technique by integrating the open-set graph-based optimum-path forest (OSOPF) classifier with genetic programming (GP) and majority voting fusion techniques for open set recognition. Reference [14] explores the technique of classification-reconstruction learning for open set recognition.
Reverse nearest neighborhood might seem to be just the flip side of the k-nearest neighborhood, but it has been used to solve a number of data mining subtasks. Outlier detection in an unsupervised context and in data streams is implemented using reverse nearest neighborhood by [15] and [16] respectively. Efficient reverse nearest neighbor search in metric spaces is achieved by [17]. Reference [18] explored reverse nearest neighbor principles for protein information mining in bioinformatics. Problems of spatial data search are also addressed by the same authors in [19]. Reverse nearest neighbor based algorithms have solved spatio-temporal queries and range queries in [20]. The work in [21] has implemented a data clustering algorithm via RkNN. RkNN explores the locality of the instances to obtain meaningful data mining. In recent years, the techniques of local information exploration, feature embedding, and low-rank and sparse subspace recovery have been used as a backbone in a number of diversified domains. In [22], a technique of adaptive embedded label propagation with weight learning is used for classification of real-world image datasets. For efficient classification of images, [23] integrates embedded low-rank and sparse principal features with feature coding error and classification error. Reference [24] uses an analysis-based trained dictionary learning model for retrieval of query images. Reference [25] is another important work in the same context. It introduces a structured and scalable dictionary learning framework to handle image analysis.
A technical elaboration of the backbone of our work, the reverse nearest neighbor principles, is presented in the next section.

IV. REVERSE NEAREST NEIGHBORHOOD
Definition 1: Given a set of instances $X = \{x_1, x_2, \ldots, x_n\}$ ($X \subset \mathbb{R}^N$) and a point $p$ ($p \in \mathbb{R}^N$), a Reverse Nearest Neighbor query concerning $p$ in search space $X$ retrieves all the points $x_i \in X$ that have $p$ as their nearest neighbor. Thus, a Reverse k-Nearest Neighbor (RkNN) search returns all those points $x_i \in X$ ($i = 1, 2, \ldots, n$) whose k-nearest neighborhood contains $p$.
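Definition 1 can be illustrated with a brute-force sketch. The function names `kth_nn_distance` and `rknn` are hypothetical, Euclidean distance is assumed, and a strict inequality is assumed for ties; an efficient implementation would use a spatial index rather than this quadratic scan.

```python
import math


def kth_nn_distance(X, i, k):
    """Distance from X[i] to its k-th nearest neighbor in X, excluding itself."""
    dists = sorted(math.dist(X[i], X[j]) for j in range(len(X)) if j != i)
    return dists[k - 1]


def rknn(p, X, k):
    """Indices i such that p falls inside the k-nearest neighborhood of X[i].

    Assumption: p counts as a reverse neighbor of X[i] when it is strictly
    closer to X[i] than X[i]'s current k-th nearest neighbor.
    """
    return [i for i in range(len(X))
            if math.dist(p, X[i]) < kth_nn_distance(X, i, k)]


# Four corners of a unit square (illustrative data only).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
central = rknn((0.5, 0.5), X, k=1)    # close to every corner
remote = rknn((10.0, 10.0), X, k=1)   # far from every corner
```

Unlike a k-NN query, the size of the returned list is not fixed by k: it depends on where p sits relative to the neighborhoods of the search points.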
Extant neighborhood estimators estimate the neighborhood of a query instance $p$ through the distribution of the neighboring instances around $p$. The neighborhood is demarcated via a surrounding hypersphere or by encompassing a fixed number of nearest instances around the query point. They do not take into account the position of the query instance $p$ in the neighborhoods of the other search points. Reverse k-nearest neighborhood realizes the neighborhood paradigm in the latter light. To obtain the reverse nearest neighbors of a query point $p$, all points in a given search space are queried about their k-nearest neighbors to find whether $p$ is one of them. It is interesting to note that, unlike k-NN (where a query point has exactly k neighbors), the number of RkNNs of a query instance $p$ can be anything between 0 and $n$ (the search space cardinality). Depending on the data distribution, $p$ can be absent from the k-nearest neighborhood of every instance in the training data. The RkNN count of $p$ is then zero, which happens when, for all $i$, the distance between $p$ and $x_i$ exceeds the distance of $x_i$ from its $k$th nearest neighbor. The other extreme case arises when $p$ has an RkNN count of $n$, the size of the search space, by virtue of its presence within the k-nearest neighborhood of all the instances in the search space. Hence, $0 \le \text{RkNN count} \le n$ is the possible range of RkNN values. For $p$, its RkNNs constitute its neighborhood in the given search space $X$. The higher the RkNN count of $p$ in $X$, the stronger its agreement with the instances in $X$. A zero RkNN cardinality indicates a significant disharmony between the query point $p$ and the instances in the training set, and it is fair to assume that $p$ comes from an entirely different distribution. This is our principal motivation for predicting the unknown class instances (along with the usual prediction for the known classes) in a mixed bag of known and unknown class instances.

A. KNOWN AND UNKNOWN SPACE MODULATION
According to our scheme, a region of positive RkNN count constitutes the known subspace (the subspace covered by the known classes). We have a search space $X$ (as defined in the previous subsection) and a query instance $p$. Let $d_k(x_i)$ be the distance of $x_i$ from its $k$th nearest neighbor in the given search space $X$ (excluding itself). A hypersphere $S_{kx_i}$ of radius $d_k(x_i)$, centered at $x_i$, is taken as the $k$th-nearest neighborhood of $x_i$. $S_{kx_i}$ constitutes the known space corresponding to instance $x_i$. If $p$ lies inside $S_{kx_i}$, $x_i$ becomes an RkNN of $p$. Let $d(p, x_i)$ be the distance between $p$ and $x_i$. Then $p$ lies within $S_{kx_i}$ if $d_k(x_i) > d(p, x_i)$. Let $S$ be the subspace that is covered by the known classes, $S = \bigcup_{x_i \in X} S_{kx_i}$. If $x_i$ is a vector in $\mathbb{R}^N$, then $S$ is a subset of $\mathbb{R}^N$. Here, $S$ implicitly defines the sampling window of the training data and hence can be viewed as defining the boundary of the known classes. The volume of $S_{kx_i}$, the known subspace spanned by $x_i$, depends on $d_k(x_i)$. In Figure 1, we scatter-plot 100 points each from two Gaussian distributions $N_1(\mu_1, \Sigma_1)$ and $N_2(\mu_2, \Sigma_2)$, where $\mu_1 = [50, 50]$, $\mu_2 = [20, 15]$, $\Sigma_1 = \begin{bmatrix} 49 & 0 \\ 0 & 49 \end{bmatrix}$ and $\Sigma_2 = \begin{bmatrix} 9 & 0 \\ 0 & 9 \end{bmatrix}$. The points of $N_1$ are shown in red while the ones from $N_2$ are shown in blue. The $k$th nearest neighbor distance $d_k(x_i)$ for points in $N_1$ is usually greater than that of points in $N_2$. Accordingly, the points from $N_1$ span a larger volume of known space than those of $N_2$. Thus, RkNN gives an automatic modulation of the known class spaces depending on the local distribution of the training data points. In Figure 1, the regions marked in yellow correspond to the unknown region. The known space is auto-adaptive to the class boundaries, which vary from class to class. This is a desirable property when dealing with variable data distributions.
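The modulation of the known space can be checked numerically with a small sketch that mimics the two-Gaussian setup above (standard deviations 7 and 3, matching the stated covariances). The helper names, the random seed, and the choice k = 5 are assumptions made for illustration; the sketch does not reproduce Figure 1 itself.

```python
import math
import random


def kth_nn_radii(X, k):
    """d_k(x_i) for every x_i: distance to its k-th nearest neighbor in X, excluding itself."""
    radii = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        radii.append(dists[k - 1])
    return radii


def in_known_space(p, X, radii):
    """True if p lies inside at least one hypersphere centered at x_i with radius d_k(x_i)."""
    return any(math.dist(p, xi) < r for xi, r in zip(X, radii))


rng = random.Random(0)  # arbitrary seed, for repeatability only
class1 = [(rng.gauss(50, 7), rng.gauss(50, 7)) for _ in range(100)]  # sparse class
class2 = [(rng.gauss(20, 3), rng.gauss(15, 3)) for _ in range(100)]  # dense class
X = class1 + class2

radii = kth_nn_radii(X, k=5)
mean_radius_1 = sum(radii[:100]) / 100
mean_radius_2 = sum(radii[100:]) / 100
far_point_known = in_known_space((500.0, 500.0), X, radii)
```

Comparing `mean_radius_1` with `mean_radius_2` shows the sparser class spanning larger hyperspheres, which is the adaptive behavior described in the text, with no distance threshold supplied by the user.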

B. PRINCIPLES OF REVERSE NEAREST NEIGHBORHOOD AND CLASSIFICATION
Mathematically, reverse nearest neighborhood is another way of quantifying the neighborhood of a point. However, reverse k-nearest neighbor principles have not been used for handling classification problems. The k-nearest neighborhood principles provide a framework for classifying test data points. In a k-nearest neighborhood based classifier, the confidence of the contending classes is calculated from the class membership of the k nearest neighbors. A test point is likely to belong to the class which contributes the highest number of its neighbors. The working principles of reverse k-nearest neighborhood are analogous to those of k-nearest neighborhood. We can easily extend a similar classification protocol using reverse nearest neighborhood. For a certain k value, we can find the reverse k-nearest neighbors of a test point p and classify p to the class with the highest number of reverse nearest neighbors. It is indeed true that getting an empty reverse nearest neighborhood is also possible. An RkNN based classifier has to possess proper strategies for handling the zero neighborhood count according to the demands of the problem. In our case, the zero RkNN count allows us to solve the issue of open set recognition in a natural manner, hence we retain it as it is in our scheme.
The approach and its algorithm are elaborated in the next section.

V. PROPOSED WORK
A. APPROACH
While classifying a test instance, classifiers operating on principles of density estimation predict the class having the highest density estimate (that is, the class with the highest number of neighbors) as the test class. Now, let us assess their potential to address an unknown class classification task. For a window based classification paradigm [26], the number of neighbors inside the window volume can vary from zero to the maximum cardinality of the search space. Though a zero neighbor count can be used for unknown class detection, a single volume threshold is not expected to work well across the entire dataset when the density distribution is highly skewed. In addition, the volume thresholding is not automatic and needs empirical and manual modulation. In k-NN based classification motivated by [27], the k-nearest neighbors of a query point are searched in the training space. Consequently, a k-NN classifier can predict only one of the known classes. There is no provision for unknown class detection in this scheme unless some thresholding is involved.
VOLUME 8, 2020
[Figure caption: White space denotes the known subspace and the yellow colored region denotes the unknown subspace. It can be noted that the spread of the known subspace around each class increases with the sparsity of the distribution. Class 2, being a dense class with a lower value of sigma, spans a smaller area representing the known subspace. On the other hand, the known space volume around class 1 points is high since the relative distribution of the points is sparser. Fig A and Fig B show the known-unknown subspace delimitation at k = 5 and k = 10 respectively for the same set of data points. It can be noted that the known space volume increases with increasing k. At k = 10, the known subspaces of the two classes expand and we get an overlap between the two.]
An efficient neighborhood based solution to open set detection should detect test instances which fall into the zero neighborhood zones of a given known space and subsequently reject them as unknown class instances. On a similar note, a positive neighbor count of a test instance indicates a finite known class membership, and the instance should be predicted into one of the training classes. It is desirable that both these tasks (unknown class rejection and known class classification) follow naturally from the scheme, without any thresholding. The scheme should be uniform as well as robust to non-uniform class distributions in a dataset. In order to design a scheme satisfying these requirements, we propose a neighborhood based classifier where the neighborhood definition is a bit different from the one assumed in the above paragraph. The reverse k-nearest neighbors (RkNNs) of a query instance $p$ are searched in the training space $X$. When the RkNN count of $p$ is zero, we classify $p$ to the unknown class. In other words, if $p \in S^C$ (the complement of the known subspace or sampling window, $S$), then $p$ comes from some unknown class. When the RkNN count of $p$ is greater than 0, the class-specific membership scores are computed. The membership score of $p$ for a class depends on the number of RkNNs from that class and the distance between $p$ and the nearest RkNN in that class. The membership value increases with an increase in the RkNN count and with a decrease in the distance of the nearest RkNN. The instance $p$ is assigned to the training class with the highest membership score.

B. THE PROPOSED METHOD
We have an open instance set $D$, consisting of two mutually exclusive partitions $D_{tr}$ and $D_{te}$, which represent the training set and the test set respectively. The respective numbers of classes in $D_{tr}$ and $D_{te}$ are $c$ and $c + u$. The extra $u$ classes in $D_{te}$ remain unseen during training. We consider the $u$ unseen classes together as a single unknown class, resulting in $c + 1$ classes for the test set $D_{te}$. Classes $1, 2, \ldots, c$ correspond to the known classes and the $(c+1)$th class corresponds to the unknown class. We also assume that the neighborhood size is a fixed positive integer $k$.
We will classify a test instance $p \in D_{te}$ in $\mathbb{R}^N$ into any one of the known classes $1, 2, \ldots, c$ or into the unknown class, $c + 1$, on the basis of the training set $D_{tr}$ only.
Let $D_{tr} = \{(x_i, y_i) \mid 1 \le i \le n\}$, where $x_i$ is a training instance vector in $\mathbb{R}^N$ and $y_i$ is its corresponding class label. Hence, the number of training instances is $n$. The instances in $D_{tr}$ belong to the known classes only, hence their memberships lie in $\{1, 2, \ldots, c\}$. Next, we provide a stepwise description of the algorithm. Algorithm 1 depicts the same.
Step 1: We find the RkNNs of $p$ in $D_{tr}$. The outputs of the lookup are stored in $R_p(\cdot)$ and $M_p(\cdot)$.
Remarks: $R_p$ is a vector whose entries $R_p(i)$ can take only two values, 0 or 1. $M_p$ is a vector in $\mathbb{R}^n$.
Step 2: Now, we obtain the class-wise RkNN statistics for $p$ in $N_p(j)$ and $Mem_p(j)$. By 'class', only the seen training classes are meant. We calculate the distance of $p$ from its nearest RkNN in class $j$ and store it in $N_p(j)$. When $p$ does not find an RkNN in class $j$, $p$ is considered unreachable from the entire class $j$ and $N_p(j)$ is set to $\infty$. Next, we compute $Mem_p(j)$, which indicates the overall membership of $p$ to class $j$. $Mem_p(j)$ depends on the RkNN count from class $j$ as well as on $N_p(j)$, the distance of $p$ from its nearest RkNN in class $j$. The class membership score $Mem_p(j)$ is calculated for each class $j$, $j = 1, 2, \ldots, c$.
Remarks: A higher class-specific RkNN count and a smaller distance between $p$ and the nearest class-specific RkNN indicate a higher confidence of $p$ in that class. A zero RkNN count from a class results in zero confidence of the instance in that class. Note that $Mem_p(j)$ could be greater than 1. By RkNN principles, even for the same $k$ value, the neighbor counts of different points vary (depending on their configurations). In such a scenario, it is difficult to adopt the distribution of their distances (as the number of neighbors would vary widely). So, in $Mem_p(j)$, we have considered only the nearest neighbor distance from class $j$.
A toy example of $Mem_p(j)$ calculation: Let us have two classes A and B, and let the test point be $p$. We have the information about $p$'s RkNN counts and its respective nearest neighbor distances from class A and class B. Let the RkNN counts from class A and class B be 2 and 3 respectively. Let the nearest neighbor distances $N_p(A)$ and $N_p(B)$ be 0.5 and 1 respectively.
This indicates the importance of the nearest neighbor distance in our scheme. Though the RkNN count from class B is higher than that from class A, $p$'s class membership to A is greater than that to B by virtue of the smaller distance. Besides the reverse nearest neighbor configuration, the proximity of the nearest neighbor from a class is a decisive factor in computing the class memberships.
Step 3: In this step, we classify $p$ to any one of the known classes $1, 2, \ldots, c$ or to the unknown class on the basis of the class membership scores. A Max_Mem(p) value of 0 indicates a zero RkNN count from the entire set of known (training) classes. It indicates the remoteness of $p$ from the training classes, and $p$ is classified to the unknown class. Max_Mem(p) > 0 signifies the presence of $p$ within some known class space, and $p$ is assigned to the class attaining Max_Mem(p). Class_prediction(p) gives the final prediction for $p$; it can be the unknown class or any one of the known classes.
Algorithm 1 (closing steps)
[...]
if Max_Mem(p) = 0 then
Classify p as unknown (c+1).
else
Classify p to Max_Mem_class(p) (known class).
end if
End
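Steps 1 to 3 can be put together in a compact sketch. The exact expression for $Mem_p(j)$ is not reproduced in the text above, so the function `membership` below uses an assumed stand-in (RkNN count divided by nearest RkNN distance) chosen only because it satisfies the stated properties: it grows with the count, shrinks with the distance, is zero for a zero count, and can exceed 1. All function names, the `UNKNOWN` label, and the data are hypothetical.

```python
import math

UNKNOWN = "unknown"  # stands for the (c+1)-th class


def membership(count, nearest_distance):
    """ASSUMED stand-in for Mem_p(j): count / nearest RkNN distance.

    This is not the paper's formula, only a form consistent with the
    properties stated in the text.
    """
    if count == 0:
        return 0.0
    if nearest_distance == 0:
        return math.inf
    return count / nearest_distance


def classify_open_set(p, X, y, k):
    """Predict a known class label for p, or UNKNOWN when p has no RkNN in X."""
    counts, nearest = {}, {}
    for i, xi in enumerate(X):
        # d_k(x_i): k-th nearest neighbor distance of x_i within X, excluding itself
        d_k = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)[k - 1]
        d_p = math.dist(p, xi)
        if d_p < d_k:  # Step 1: x_i is a reverse k-nearest neighbor of p
            # Step 2: class-wise RkNN count and nearest RkNN distance
            counts[y[i]] = counts.get(y[i], 0) + 1
            nearest[y[i]] = min(nearest.get(y[i], math.inf), d_p)
    if not counts:  # Step 3: zero RkNN count, so every membership is 0
        return UNKNOWN
    return max(counts, key=lambda j: membership(counts[j], nearest[j]))


# Illustrative training data: two small, well-separated classes.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = ["A", "A", "A", "B", "B", "B"]
near_a = classify_open_set((0.2, 0.2), X, y, k=1)
near_b = classify_open_set((10.2, 10.2), X, y, k=1)
between = classify_open_set((5, 5), X, y, k=1)
```

Under this assumed score, the toy counts and distances quoted earlier (2 at distance 0.5 versus 3 at distance 1) give class A the higher membership, which agrees with the conclusion stated in the text; any other monotone form with the same properties could be substituted in `membership`.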

VI. EXPERIMENTAL SETUP
In this section, we propose a setup to make a comprehensive assessment of the performance of the proposed and competing methods on classification of the known classes and detection of the unknown class. A brief outline of the four essentials, namely Datasets, Comparing Methods, Parameter Optimization and Evaluating Metrics, is presented in the following subsections.

A. DATASETS
We have employed ten real-world multi-class datasets to evaluate the relative efficacies of the proposed and the comparing methods. Table 2 summarizes the basic statistics of their attributes. The MNIST dataset is obtained from https://pjreddie.com/projects/mnist-in-csv/ while the source of the remaining ones is the Keel Dataset Repository [28]. The MNIST dataset has 784 features, and we obtain a Reduced-MNIST version by extracting the top features whose eigenvalue summation covers 90% of the feature variance. The Reduced-MNIST dataset has 79 features. We present the results on both the MNIST and Reduced-MNIST datasets individually in this work. These datasets are obtained in closed form, that is, they do not possess any openness and the class information of all the instances is known. In order to adapt them for open set recognition, we have generated an open version of each dataset following the same protocol as [3]. The first step is to set the cardinalities of the known and unknown classes.
[Caption: It depicts TP, TN, FP, FN for a toy scenario which has 2 known classes and an unknown class. Class 1 and class 2 constitute the set of known classes and U denotes the unknown class. The first two diagonal elements correspond to the correct predictions for class 1 and class 2 and belong to the TP set. The 3rd diagonal cell corresponds to the correct predictions for the unknown class U and is hence counted as TN. The remaining elements of rows 1 and 2 correspond to the FPs, or false predictions into the known classes. For example, cell(2,1) counts the cases where the actual class is 1 but the prediction has been class 2. For cell(2,U), the actual class of the instances is the unknown class U but class 2 is predicted. Non-diagonal elements of row 3 correspond to the cases where the prediction has been made into the unknown class U but the actual class is a known class (1 or 2).]
For MNIST and Letter datasets, we have followed the recommended partition (by [3]) of 6 known, 1-4 unknown classes and 15 known, 1-11 unknown classes, respectively. For the remaining datasets, the following protocol is adopted.
Let the non-open or regular instance set be denoted by $D = \{(x_i, y_i) \mid x_i \in X, y_i \in C\}$, where $X \subset \mathbb{R}^N$ and $C = \{c_1, c_2, \ldots, c_n\}$. Hence the number of classes in the dataset is $n$. We randomly equi-partition $D$ into a training set $D_{tr}$ and a test set $D_{te}$, so that $D_{tr} \cup D_{te} = D$ and $D_{tr} \cap D_{te} = \emptyset$. We generate the open training-test tuple $(D^o_{tr}, D^o_{te})$ from $D_{tr}$ and $D_{te}$. We select the sets of known classes and unknown classes, $C_k$ and $C_u$ respectively, from $C$. The instances belonging to $C_k$ appear in both $D^o_{tr}$ and $D^o_{te}$, whereas the instances belonging to the unknown class set $C_u$ appear in $D^o_{te}$ only. The cardinality of $C_k$, denoted by $c_k$, is fixed to $0.5 \times n$. The cardinality of $C_u$, denoted by $c_u$, is varied from 1 to $0.5 \times n$. Here, the openness value of each such partition is computed using the formula proposed in [1].
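The open partition protocol can be sketched as follows. The function name `make_open_split`, the random choice of which classes become known, the rounding of $0.5 \times n$ down to an integer, and the `"unknown"` relabeling of the test instances are assumptions made for illustration.

```python
import random


def make_open_split(train, test, classes, n_unknown, rng):
    """Build the open training and test sets from a closed train/test partition.

    `train` and `test` are lists of (x, y) pairs; `classes` is the list of
    all class labels; `n_unknown` plays the role of c_u.
    """
    n_known = len(classes) // 2  # ASSUMPTION: 0.5 * n, rounded down
    shuffled = list(classes)
    rng.shuffle(shuffled)  # ASSUMPTION: known/unknown classes chosen at random
    known = set(shuffled[:n_known])
    unknown = set(shuffled[n_known:n_known + n_unknown])

    # Known class instances appear in both open sets.
    open_train = [(x, y) for x, y in train if y in known]
    # Unknown class instances appear in the open test set only,
    # pooled under a single 'unknown' label.
    open_test = [(x, y if y in known else "unknown")
                 for x, y in test if y in known or y in unknown]
    return open_train, open_test, known, unknown


# Illustrative closed dataset: 6 classes, 10 instances per class in each half.
classes = list(range(6))
train = [((i,), i % 6) for i in range(60)]
test = [((i,), i % 6) for i in range(60, 120)]
open_train, open_test, known, unknown = make_open_split(
    train, test, classes, n_unknown=2, rng=random.Random(1))
```

Classes that fall in neither `known` nor `unknown` are dropped from both open sets; they still count toward the target classes when openness is computed.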
$$\text{Openness} = 1 - \sqrt{\frac{2 \times \text{Training classes}}{\text{Target classes} + \text{Test classes}}}$$
The target classes consist of all the training and test classes as well as the leftover unknown classes that do not participate in training or testing. An example is illustrated below in the Openness Calculation Example.
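The openness formula can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the target class count is taken to be the total number of classes in the dataset, and rounding 0.5 × n down for an odd number of classes is an assumption, since the rounding rule is not stated here.

```python
import math
from typing import List, Tuple


def openness(n_training: int, n_test: int, n_target: int) -> float:
    """Openness = 1 - sqrt(2 * training / (target + test))."""
    return 1.0 - math.sqrt(2.0 * n_training / (n_target + n_test))


def openness_schedule(n_classes: int) -> List[Tuple[int, float]]:
    """Openness for every c_u in 1 .. n/2, with c_k fixed to n/2.

    Assumption: n/2 is rounded down when n is odd; the rounding rule
    is not given in the text.
    """
    c_k = n_classes // 2
    schedule = []
    for c_u in range(1, n_classes // 2 + 1):
        n_test = c_k + c_u  # test classes = known classes + unknown classes
        schedule.append((c_u, openness(c_k, n_test, n_classes)))
    return schedule


if __name__ == "__main__":
    # A 6-class dataset (such as Dermatology) gives 3 known classes.
    for c_u, value in openness_schedule(6):
        print(f"unknown classes = {c_u}, openness = {value:.4f}")
```

With this reading, openness is 0 when the training, test and target class counts coincide, and it grows as more unknown classes enter the test set.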
Remarks: As stated earlier, we have followed an openness generation protocol similar to that of the state-of-the-art methods.
We may note that the number of openness values generated for a dataset depends on the number of classes it originally has.
Openness Calculation Example: Let us consider the Dermatology dataset, which has 6 classes. The number of target classes for this dataset is always 6. Following the above-mentioned
P. Sadhukhan: Can Reverse Nearest Neighbors Perceive Unknowns?

B. PARAMETER OPTIMIZATION
Most of the open set learners, including ours, involve parameters whose values have to be determined empirically.
The optimized values of these parameters are determined via cross-validation on the training set. We carve out a cross-validation training set and a validation set from the open training set $D^o_{tr}$. Let us illustrate this with an example. Let there be 6 classes and 100 instances in $D^o_{tr}$. We randomly partition $D^o_{tr}$ into a cross-validation training set, T, and a validation set, V. Each of T and V has 50 instances. We randomly choose 3 classes as known classes, and the remaining 3 classes fall into the unknown class. We remove the instances of the unknown classes from the training set T. In the validation set, instances from the known classes as well as the unknown classes are present.
To optimize N parameters, we perform an N-dimensional grid search on the training-validation tuple (T, V) and select the parameter value(s) giving the best output on the validation set. Accuracy is used for evaluation of the performance.
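The tuning procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: `UNKNOWN`, `open_validation_split`, `grid_search` and the `fit_predict` callable are hypothetical names, instances are `(features, label)` pairs, and all unknown classes are pooled under a single label in V.

```python
import itertools
import random

UNKNOWN = "unknown"  # hypothetical label for the pooled unknown classes


def open_validation_split(instances, n_known, seed=0):
    """Split (x, label) pairs into T (known classes only) and V (known + unknown)."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train_half, valid_half = shuffled[:half], shuffled[half:]

    classes = sorted({label for _, label in shuffled})
    known = set(rng.sample(classes, n_known))

    # Unknown-class instances are removed from T and pooled under one label in V.
    T = [(x, y) for x, y in train_half if y in known]
    V = [(x, y if y in known else UNKNOWN) for x, y in valid_half]
    return T, V


def grid_search(fit_predict, T, V, grid):
    """Try every parameter combination and keep the one with the best accuracy on V."""
    names = list(grid)
    best_params, best_accuracy = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        predictions = fit_predict(T, [x for x, _ in V], **params)
        correct = sum(p == y for p, (_, y) in zip(predictions, V))
        accuracy = correct / len(V)
        if accuracy > best_accuracy:
            best_params, best_accuracy = params, accuracy
    return best_params, best_accuracy
```

Here `fit_predict(T, X, **params)` stands for any open set learner that trains on T and returns one label per element of X; a call such as `grid_search(fit_predict, T, V, {"k": [2, 3, 4, 5, 6]})` would cover a one-dimensional grid.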

C. COMPARING METHODS
Open set recognition and classification have been accomplished efficaciously by a number of works in the past few years. For a comparative assessment of the performance of our scheme, we have selected five methods, which are briefly described next.
• Multi-class probability of inclusion, PI-SVM [4]: Probability of inclusion, that is, the probability that an instance belongs to a class, is the foundation of this work. A '1-vs-rest' binary SVM with the threshold probability P set to 0.5 × openness is considered for execution. Similar to [3], tuning of γ and C is required for this scheme. The LETTER and MNIST datasets are run on the recommended values of C = 2, γ = 2, δ = 0.1 and C = 2, γ = 0.03125, δ = 0.1, respectively.
For the remaining datasets, parameters are fixed through grid search.
The nearest neighbor distance-ratio open set classifier of [12] addresses open set recognition through a modified kNN classifier. The authors proposed two slightly different schemes, which differ from each other in terms of performance. Since the interest of this work lies in classification through nearest neighborhood, we consider both versions for comparison.
• Nearest neighbor distance-ratio open set classifier (OSNN-CV): An instance is classified as unknown on getting a class mismatch between its two nearest neighbors. No user-defined parameter is involved.

• Nearest neighbor distance-ratio open set classifier (OSNN-NDR): For a test instance, the distances to its two nearest neighbors belonging to different classes are noted. The ratio of the two distances (nearer/farther) is computed and compared with a threshold, T. If the ratio is sufficiently large, that is, it exceeds T, the instance is classified as unknown. The threshold range suggested by the authors is between 0.5 and 1. Through parameter optimization, a single value is selected from 0.5, 0.55, . . . , 1 for each dataset.
• The proposed method: The proposed scheme requires tuning of the neighborhood size k. The value of k is chosen via cross-validation on the training set.
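The two nearest-neighbor baselines in the list above can be illustrated with a short sketch. It follows only the descriptions given here, not the reference implementation of [12]: the function names and the `UNKNOWN` label are hypothetical, Euclidean distance is assumed, and the training set is assumed to contain at least two classes.

```python
import math

UNKNOWN = "unknown"  # hypothetical label for the unknown class


def _neighbors_by_distance(train, x):
    """(distance, label) pairs for all training points, nearest first."""
    return sorted(((math.dist(p, x), y) for p, y in train), key=lambda pair: pair[0])


def osnn_cv(train, x):
    """OSNN-CV: unknown when the two nearest neighbors disagree on the class."""
    (_, first), (_, second) = _neighbors_by_distance(train, x)[:2]
    return first if first == second else UNKNOWN


def osnn_ndr(train, x, threshold):
    """OSNN-NDR: unknown when the nearer/farther distance ratio exceeds the threshold."""
    neighbors = _neighbors_by_distance(train, x)
    d_near, y_near = neighbors[0]
    # Nearest neighbor that belongs to a class other than y_near.
    d_other = next(d for d, y in neighbors if y != y_near)
    ratio = d_near / d_other if d_other > 0 else 1.0
    return y_near if ratio <= threshold else UNKNOWN


if __name__ == "__main__":
    train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
    print(osnn_cv(train, (0.0, 0.4)), osnn_ndr(train, (2.5, 2.9), 0.9))
```

A point deep inside one class has a small ratio and keeps the label of its nearest neighbor, while a point roughly equidistant from two classes has a ratio close to 1 and is rejected as unknown.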

D. EVALUATING INDICES
In this work, we deal with learners which detect unknown class instances alongside the usual classification of instances into one of the known classes. Accuracy, Average $F_1$ over known and unknown classes (AKUF$_1$) and Known class $F_1$ are employed to provide insight into known class classification as well as unknown class detection. Before going into the details, we describe a few notations.
The set of known classes (the known or training classes taken together) is considered the positive class, and the set of classes absent during training, or the unknown class, is dubbed negative. Let the set of known classes be $K = \{1, 2, \ldots, c\}$ and the unknown class label be $c + 1$. A true positive prediction denotes that the classifier prediction is correct and the actual class is any one among $1, 2, \ldots, c$. In a similar fashion, a true negative is a correct prediction where the actual class is $c + 1$, the unknown class. A false positive prediction is incorrect and the prediction is between 1 and $c$. There can be two cases of a false positive prediction. In the first, the true class is the unknown class but the learner has misclassified the instance into a known class. In the other, the true class is some known class (say, 1) but the prediction has been made into some other known class (say, 3). A false negative denotes that an instance from a known class has been incorrectly classified into the unknown class. The counts of true positives, true negatives, false positives and false negatives are represented as TP, TN, FP and FN, respectively.
• Accuracy: Accuracy is the fraction of test instances that are predicted correctly. Intricate details like the individual class performance of a learner cannot be deduced from accuracy alone. The next metric is employed to address the same.
• Average $F_1$ over known and unknown classes (AKUF$_1$): In order to address the limitation of accuracy and provide a better glimpse of the class performances, AKUF$_1$ is computed. $F_1$ is measured for a single class, where the possible classes can be more than one. $F_1$ is the harmonic mean of precision and recall for the concerned class. Below, the $F_1$ calculation for the positive class is demonstrated.
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
In the context of open set recognition, classes are broadly divided into known and unknown, and the two are equally significant. $F_1$ is individually calculated on the known (positive) class as well as the unknown (negative) class. The mean of the two is computed as the AKUF$_1$ and interpreted to evaluate the overall performance of the learners. A similar metric has been used by [12].
• Known class $F_1$: This measure estimates the efficiency of the schemes in correctly classifying the known class instances in a mixed bag of known and unknown instances.
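The three indices in the list above can be computed from the TP, TN, FP and FN counts as in the sketch below. It is an illustration only: the function names and the `UNKNOWN` label are hypothetical, and the precision and recall expressions are the standard definitions applied to the aggregate counts, which is an assumption about how the indices are evaluated. For the unknown (negative) class, recall is taken over the unknown instances only, so only the false positives whose true class is unknown enter its denominator.

```python
UNKNOWN = "unknown"  # hypothetical label for the unknown class


def open_set_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, plus the FPs whose true class is unknown."""
    tp = tn = fp = fn = fp_unknown = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == UNKNOWN:
            if truth == UNKNOWN:
                tn += 1          # unknown instance correctly rejected
            else:
                fn += 1          # known instance sent to the unknown class
        elif pred == truth:
            tp += 1              # known instance, correct known class
        else:
            fp += 1              # wrong prediction into a known class
            if truth == UNKNOWN:
                fp_unknown += 1
    return tp, tn, fp, fn, fp_unknown


def f1_score(hits, n_predicted, n_actual):
    """Harmonic mean of precision (hits/predicted) and recall (hits/actual)."""
    precision = hits / n_predicted if n_predicted else 0.0
    recall = hits / n_actual if n_actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def open_set_metrics(y_true, y_pred):
    tp, tn, fp, fn, fp_unknown = open_set_counts(y_true, y_pred)
    known_f1 = f1_score(tp, tp + fp, tp + fn)
    unknown_f1 = f1_score(tn, tn + fn, tn + fp_unknown)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "known_f1": known_f1,
        "unknown_f1": unknown_f1,
        "akuf1": (known_f1 + unknown_f1) / 2.0,
    }
```

Every test instance falls into exactly one of the four counts, so their sum in the accuracy denominator equals the size of the test set.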

VII. RESULTS AND DISCUSSION
This section of the paper is devoted to the summarization and comparative analysis of the experimental results. Before proceeding to the discussion, we would like to clarify the layout of the figures. The empirical results are obtained with different openness values, where the range of openness varies across datasets depending on the number of classes. For the LETTER and MNIST datasets, we have set the known and unknown class cardinalities according to the experimental protocol of [3]. For a proper presentation, we have provided three graphical layouts for each dataset, one for each of the three evaluating metrics, namely Accuracy, Average $F_1$ over known and unknown classes (AKUF$_1$) and Known class $F_1$.
Results on AKUF$_1$ are presented in Figure 4 to Figure 14. In the following three paragraphs, we discuss the comparative performance of the methods on accuracy, AKUF$_1$ and Known class $F_1$, in order, with reference to their corresponding plots and tables.
Accuracy is a primary choice when one has to evaluate a classifier. Table 3 records the number and percentage of best performances delivered by each of the comparing methods. Out of the 50 cases, the proposed method delivers the best results on 39 scenarios (78%), followed by 6 (12%),
observed that the performance of the proposed method lies above all others at all three openness values. But the degree of improvement over the other methods is more pronounced at openness values 0.2257 and 0.2929. A similar analysis for all datasets can be made by consulting the remaining figures. Table 4 shows the overall statistics of best AKUF$_1$ performance by the methods. The proposed method delivers the best performance on 50% or more cases for all but one dataset. Out of the 50 cases, the proposed method wins in 40 cases (80%), followed by 6 (12%) and 4 (8%) cases by WSVM and PI-SVM, respectively. These figures indicate the capability of the proposed scheme in correctly predicting the known classes as well as the unknown class.
Now, we analyze the relative capability of the proposed method to correctly predict the known class instances, measured by Known class $F_1$. In a practical scenario, this metric holds significance since it mimics the real world, where we predict known things in a world of knowns and unknowns. Table 5 records the data of best outcomes on each dataset and its respective openness values. Similar to the previous two measures, the proposed method gets the major share, 76% (38 out of 50), of the best outcomes. The remaining 24% is shared by WSVM (8%, 4 out of 50), PI-SVM (10%, 5 out of 50) and OSNN-CV (6%, 3 out of 50).
Detailed known class $F_1$ values are available in
Comparative results presented in the above three paragraphs indicate the efficaciousness of the proposed scheme in both the known and unknown aspects of open set learning. The proposed method maintains its superior performance on datasets with fewer classes (Dermatology, Vehicle) as well as on datasets with a large number of classes (LETTER, Vowel, Texture). The intrinsic multi-class framework of the proposed scheme accounts for this robustness.
The performance of the proposed method on the MNIST dataset is not as good as that of a couple of methods (namely WSVM and PI-SVM). Moreover, it also deviates from the proposed method's own performance on the remaining datasets. We investigated the loss of performance on the MNIST dataset, and our findings point to the high dimensionality of this dataset. Our method is based on RkNN principles, where distance and neighborhood relations are the only information that we use for classification. Our method suffers from the curse of dimensionality at 784 features and fails to perform as competently as on the remaining datasets. To validate our findings, we have generated outputs on a reduced version of the MNIST dataset. The Reduced-MNIST version is obtained by extracting the top features which cover 90% of the feature variance. The Reduced-MNIST dataset has 79 features. Figure 7 shows the AKUF$_1$ performance of the proposed and comparing methods on Reduced-MNIST. It shows that the performance of the proposed method is better than that of all others. The results are also superior to those of the best performing methods (WSVM and PI-SVM) on regular MNIST with all 784 features (refer to Figure 6 for MNIST and Figure 7 for Reduced-MNIST). Figures 18 and 29 show the accuracy and Known class $F_1$ results of these experiments. The results are in congruence with the AKUF$_1$ performance.
VOLUME 8, 2020 P. Sadhukhan: Can Reverse Nearest Neighbors Perceive Unknowns?

B. LIMITATIONS OF THE PROPOSED SCHEME
The proposed work deals with RkNN, in which distance and neighborhood relations are the only information that is interpreted. Like any other distance-based scheme, our method suffers from the curse of dimensionality at higher dimensions. This phenomenon was observed for the original MNIST dataset with 784 features. To curb this problem, we suggest reducing the feature dimension of a dataset with ≥ 100 features through feature extraction or selection before proceeding with the RkNN-based learning and classification. The improvement in performance on the Reduced-MNIST dataset (Figures 6, 17, 28) over the original MNIST dataset (Figures 5, 16, 27) manifests the same.
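The variance-based reduction suggested above amounts to choosing how many leading components to keep. A minimal sketch of that selection step is given below; the function name is hypothetical, and the eigenvalues are assumed to come from an eigendecomposition of the feature covariance matrix (as in principal component analysis), which is computed elsewhere.

```python
def components_for_variance(eigenvalues, target=0.90):
    """Smallest number of leading components whose eigenvalues cover `target` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


if __name__ == "__main__":
    # Toy spectrum: the first three components carry 90% of the variance.
    print(components_for_variance([5.0, 3.0, 1.0, 1.0], target=0.85))
```

The data are then projected onto the selected components before the distance computations, so that neighborhoods are formed in the lower-dimensional space.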

VIII. EXPERIMENT ON PARAMETER TUNING
On four datasets, namely Dermatology, Vehicle, Segment and Vowel, we have conducted a parameter tuning experiment. The neighborhood size k is the only tunable parameter of our scheme. From our detailed experimental study and analysis, we have seen that a k value in the set {2, 3, 4, 5, 6} works well for all the datasets that we have used. Accordingly, we have reported the accuracy results of the four mentioned datasets across these five k values. Figures 37-40 show the same. It is interesting to note that a single k value may not work well across a whole dataset. So, it is advisable to tune the k value across the different openness values of a single dataset. The detailed procedure for parameter optimization is given in Section VI-B.

IX. CONCLUSION
In this paper, we have presented a novel reverse k-nearest neighbor based classifier. The elegance of this classifier lies in its innate ability to address open set classification.
RkNN-based neighborhood identification naturally performs unknown class detection alongside the regular known class classification. The choice of k, or neighborhood size, is dataset dependent and is determined through cross-validation on the training set. Apart from that, no thresholding or parameters are involved to distinguish the known and unknown subspaces. A unique attribute of the proposed scheme is that it estimates and explores the sampling window implicitly. The RkNN process itself adaptively adjusts the class boundaries depending on the local sparseness of the training data, and this contributes to the simplicity and efficiency of the scheme. The proposed classifier also operates on an intrinsic multi-class framework. A comprehensive empirical study affirms the capability of the proposed scheme to deliver competitive to superior performance against the competing learners in an open set backdrop.
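The zero-count idea summarized above can be illustrated with a simplified sketch. It is not the classifier proposed in this paper: the function names and the `UNKNOWN` label are hypothetical, Euclidean distance is assumed, and assigning the class of the nearest reverse neighbor is a simplifying assumption standing in for the actual rule, which combines the positive RkNN count with the nearest RkNN distance. A training point counts as a reverse k-nearest neighbor of the test instance when the test instance lies within that point's distance to its k-th nearest training neighbor.

```python
import math

UNKNOWN = "unknown"  # hypothetical label for the unknown class


def kth_neighbor_radius(train, index, k):
    """Distance from training point `index` to its k-th nearest other training point."""
    point = train[index][0]
    distances = sorted(
        math.dist(point, other) for j, (other, _) in enumerate(train) if j != index
    )
    return distances[k - 1]  # requires k <= len(train) - 1


def rknn_predict(train, x, k):
    """Unknown when x has no reverse k-nearest neighbor among the training points."""
    reverse_neighbors = []
    for index, (point, label) in enumerate(train):
        distance = math.dist(point, x)
        # x would be among the k nearest neighbors of this training point.
        if distance <= kth_neighbor_radius(train, index, k):
            reverse_neighbors.append((distance, label))
    if not reverse_neighbors:
        return UNKNOWN  # zero RkNN count
    # Simplifying assumption: take the class of the nearest reverse neighbor.
    return min(reverse_neighbors, key=lambda pair: pair[0])[1]
```

Because each training point carries its own radius, the region in which a test instance gains reverse neighbors shrinks in dense areas and widens in sparse ones, which reflects the adaptive behavior described above. In practice the radii would be computed once during training rather than for every query.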