QBB: Quantile-Based Binarization of 3D Point Cloud Descriptors

Local 3D point feature descriptors play an important role in many areas of computer vision, such as object recognition and registration. There are many well-functioning feature descriptors, but they are typically real-valued, multidimensional vectors, which leads to high computational complexity in nearest neighbor searches. To overcome this challenge, methods for binarizing real-valued descriptors have emerged. In this paper, we first investigate the available binarization methods and standalone binary feature descriptors and show that existing binarization techniques cannot generally achieve good performance for arbitrary feature descriptors. To remedy this problem, we propose a new binarization method called quantile-based binarization (QBB) that can be applied to any real-valued feature descriptor. It analyzes the distribution of the feature descriptors, which is then used to form meaningful groups along each dimension. To this end, QBB computes quantiles of the empirical distribution and the interval lengths (bin sizes) defined by the quantile boundaries. Finally, it assigns a binary code to each group and concatenates the codes to obtain the final binary descriptor. QBB is able to adaptively compute the number of bits based on a capacity constraint, i.e., with an appropriate capacity setting, the resulting binary descriptor can be used on devices with lower computational power. We evaluate the descriptiveness of well-known descriptors binarized by QBB and compare them to state-of-the-art methods. According to our evaluation, QBB creates binary descriptors whose descriptiveness is closer to that of the real-valued descriptors than prior approaches. Finally, we also show that QBB can even compete with standalone binary feature descriptors.


I. INTRODUCTION
Many novel applications using 3D point clouds have recently emerged. For example, a robot manipulating the environment needs the ability to accurately sense the real world around it. Though stereo cameras and other approaches make it possible to obtain depth information from 2D images, similarly to human beings, point clouds collected by dedicated 3D depth sensors are more accurate. With the development of self-driving vehicles, the processing of 3D point clouds has become an even more important task. Detecting other vehicles or pedestrians around the vehicle is essential for safe and reliable control. Such methods (especially point cloud registration and object recognition algorithms) mostly rely on local feature descriptors. A local feature descriptor is a vector that characterizes the environment of a point in the point cloud and thereby makes points distinguishable from each other. The best known feature descriptors include FPFH [1], SHOT [2], SI [3], RoPS [4] and USC [5]. Survey papers like [6] have already collected and categorized these descriptors. Guo et al. [7] systematically compare local feature descriptors using different datasets. Their comparison covers a number of different aspects (such as computational efficiency, time-crucial, and space-crucial applications, etc.) that count in real-world use cases. Their work concludes that in most application areas FPFH or SHOT shows the best performance, depending on the number of points.
The associate editor coordinating the review of this manuscript and approving it for publication was Wenbing Zhao.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The most common operation with feature descriptors is the nearest neighbor search. The classic local descriptors are multi-dimensional real-valued features (e.g., SHOT has 352 dimensions), resulting in storage and computational costs that are too high for real-time applications. Augmented reality apps relying on 3D point clouds are very popular on mobile devices with high-quality imaging sensors. For these apps, real-time processing is of utmost importance for a good quality of experience. Some works [8], [9] propose dimension reduction of feature descriptors to accelerate the expensive point cloud matching step. Dimension reduction techniques can compress feature descriptors without significantly limiting their descriptiveness.
Another way to solve this problem is to create binary descriptors (i.e., bit sequences) from existing feature descriptors. Algorithms that convert a real-valued feature vector into a binary feature vector are called binarization methods. A binary vector requires much less space than a real-valued one. Furthermore, the Hamming distance between bit sequences can be computed with bit operations, thus nearest neighbor searches can be executed significantly faster [10].
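As a minimal sketch of why bit sequences make nearest neighbor searches cheap, the Hamming distance of two descriptors packed into integers reduces to an XOR followed by a bit count. The function names and the integer packing are illustrative assumptions, not part of any cited implementation.

```python
from typing import Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two descriptors packed into integers."""
    # XOR leaves a 1 exactly where the two bit sequences differ.
    return bin(a ^ b).count("1")


def nearest_neighbor(query: int, candidates: Sequence[int]) -> int:
    """Index of the candidate with the smallest Hamming distance (brute force)."""
    return min(range(len(candidates)),
               key=lambda i: hamming_distance(query, candidates[i]))
```

A brute-force scan is used only for clarity; any index structure for binary codes could replace it.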
A typical approach to binarization is to replace each real value of the original descriptor with one or more bits along a dimension. There are general binarization methods that can be applied to any real-valued descriptor. More advanced methods exploit the special properties of how a feature descriptor is computed. Finally, there are standalone binary feature descriptors, which create binary features directly.
In this work, we present a novel 3D point feature descriptor binarization method that creates groups by quantizing the distributions of each element in the original descriptor. These groups are then represented by bit sequences whose concatenation results in the binary descriptor. Using this method, the nearest neighbor searches can be performed much faster, and the descriptors require less storage. The main contributions of this paper are the following:
• We propose a new parameter-free binarization method called QBB that can generally be used to binarize any classic feature descriptor.
• We compare the performance and properties of QBB to the available binarization methods and standalone binary feature descriptors and show that QBB outperforms them on real-world point cloud data.
• We present possible variations of QBB which in some cases have a higher descriptiveness than the original descriptor.
• The source code of QBB and of the other binarization methods used in the evaluation of this paper is publicly available on GitHub. 1
1 https://github.com/ELTE-IK-Point-Cloud-Group/QBB
FIGURE 1. Grouping of binarization methods and standalone binary descriptors. RCS is a standalone, real-valued descriptor, but the authors gave 3 methods to binarize it. Two of these methods are general enough to be applied to any arbitrary real-valued descriptor.

II. RELATED WORK
The first method for binarizing real-valued 3D point feature descriptors, B-SHOT [10], was published only in 2015. Since then, a number of works have addressed the task of binarization. Binary feature descriptors can be divided into two groups (Fig. 1). One group includes methods that convert an existing real-valued descriptor into a binary one (such as B-SHOT). The advantage of these methods is that they can be applied to any real-valued feature descriptor. In general, binary feature descriptors created by binarization are less descriptive than the original descriptors, but they require much less space, and nearest neighbor searches on them are faster. One drawback of this approach is that the real-valued feature descriptor must first be calculated and then converted to binary, which takes extra time.
The binarization method by Prakhya et al. [10] aims at binarizing SHOT, a real-valued feature descriptor. The algorithm can binarize any real-valued vector, but it was inspired by the distribution of SHOT descriptors and works best in combination with this descriptor. The two parameters of the algorithm are the encoding length (L) and the encoding ratio E_r (a percentage). Each real value corresponds to one bit, i.e., the number of elements in the descriptor does not change. The first step is to select L consecutive values from the original, real-valued descriptor and compute their sum. The bits corresponding to the values that contribute E_r % of the sum are set to 1, and the other bits are set to 0. By concatenating the bit sequences of length L, we obtain the final descriptor. According to the authors, B-SHOT gives the best results with L = 4 and E_r = 90%. Later, they compared B-SHOT with B-FPFH and B-ROPS using the same algorithm. Their results show that while B-SHOT's descriptiveness is close to that of the original descriptor, the other two binarized descriptors are far behind. The authors acknowledge that in some cases the loss of information could be high.
Later, similar work was published, also aimed at binarizing the SHOT real-valued feature descriptor [11]. Lin et al. proposed a generic version of the B-SHOT algorithm (Gray-SHOT [12]) in which the encoding length (L) and the number of bits used to encode each element became input parameters. By increasing the number of bits used to represent an element, it was possible to binarize the real-valued descriptor with less loss of information, but the length of the bit sequence increased. When multiple bits are used to represent an element, Gray code is applied so that the Hamming distance of consecutive groups is always 1. Their parameter tuning shows that the best results are obtained when the encoding length is 352 and the number of bits used to represent each element is 2. The CI-SHOT [11] method is very similar to Gray-SHOT. This method also uses an encoding length and can use more bits to represent an element. The main difference compared to the previously mentioned methods is that the CI-SHOT algorithm decides the value of the bits based on a method inspired by Chebyshev's inequality. Based on their parameter tuning, the best choice for the encoding length is 11, and the number of bits representing one element is 2. The authors applied their binarization methods only to the SHOT real-valued descriptor.
Yang et al. proposed a real-valued feature descriptor called Rotational Contour Signatures (RCS) [13]. In their work, they give three methods to convert their descriptor into a binary one. All of these methods are general enough to be applied to an arbitrary real-valued descriptor. The first method is called thresholding, which works by calculating a threshold t. For each element of the original descriptor, a bit is assigned depending on whether the current value is greater or less than the threshold. The authors calculated the median for an element and used this value as the threshold. Their second approach is vector quantization, where one element is represented by several bits, resulting in 2^N groups, where N is the number of bits for one element. The advantage of this method is that it can store more information, but it uses more bits and is more sensitive to noise. The third method is called geometric binary encoding. It works with a series of real values and compares adjacent values. It assigns a bit to each real value depending on the result of the comparison: if a value is greater than the previous value, the corresponding bit is 1, otherwise 0. According to their results, the best binarization method was vector quantization, especially with 2 or 4 bits. In some cases, the binarized version performed almost as well as the real-valued descriptor itself. Our solution is based on vector quantization as well.
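The thresholding and geometric binary encoding schemes described above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not settle: the median is taken per element over a set of descriptors, and the first value in geometric encoding, which has no predecessor, receives no bit. The function names are hypothetical.

```python
import numpy as np


def threshold_binarize(descriptors):
    """One bit per element: 1 where the value exceeds the per-element median.

    Assumption: the median is computed column-wise over all given descriptors.
    """
    X = np.asarray(descriptors, dtype=float)
    medians = np.median(X, axis=0)
    return (X > medians).astype(np.uint8)


def geometric_binarize(descriptor):
    """Bit is 1 if a value is greater than the previous value, otherwise 0.

    Assumption: the first value has no predecessor, so the output has one
    bit fewer than the input has elements.
    """
    x = np.asarray(descriptor, dtype=float)
    return (x[1:] > x[:-1]).astype(np.uint8)
```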
The other group of binary descriptors consists of the standalone binary descriptors. These methods produce a binary descriptor directly. The advantage of this approach is that no separate binarization step is required. The disadvantage compared to the other approach is that when a new, better real-valued descriptor becomes available, binarization methods can easily produce a binary version of it, whereas standalone binary descriptors cannot benefit from it. The currently available standalone feature descriptors can be divided into two groups. The first group includes methods that build a voxel grid in the local neighborhood of the point and assign a bit to each voxel. The other group includes methods that project neighbor points onto planes or axes and process the resulting lower-dimensional data to calculate descriptors.
Two well-known voxel-based standalone descriptors are LoVS (Local Voxelized Structure) [14] and VBBD (Voxel-Based Buffer-Weighted Binary Descriptor) [15]. These two methods are very similar. In the first step, both methods compute a local reference frame (LRF) around the selected point, and the points in the local neighborhood are transformed with respect to the LRF (although they use different LRF computations). In the next step, they build a voxel grid, and each voxel corresponds to one bit. In the case of LoVS, a bit value is 1 if one or more points fall in the corresponding voxel, otherwise 0. The parameters of the method are the radius by which the neighborhood is determined and the number of voxels along an axis (m). The length of the descriptor is m^3. In the case of VBBD, the number of bits is also determined by the number of voxels, but the calculation of the bit values is more complex. The points closer than a radius h to a voxel center are part of the buffer region of that voxel. The Gaussian kernel density is calculated for each voxel's buffer region. The value of a bit corresponding to a voxel is 1 if the Gaussian kernel density of the voxel's buffer region is greater than the average Gaussian kernel density. By increasing the number of voxels, more information can be stored, but if the number is too high, the descriptors become sensitive to noise. For both methods, the authors recommend m = 9 voxels along each axis, so the length of the descriptors is 729 bits.
Projection-based methods have in common that they project the points of the local environment onto planes or axes. For this reason, they need to compute an LRF and transform the neighborhood with respect to it. The 3D Binary Signatures (3DBS [16]) method defines the local environment not by a radius but by the N nearest neighbors with angular constraints. This is important because in the case of 3DBS the number of neighbors determines the length of the descriptor. First, the algorithm projects the normal vector of each point onto the x, y, and z axes. In the second step, it creates ordered point pairs and compares the projected values of their normal vectors. Each ordered point pair corresponds to 3 bits (one comparison of projected values per axis). After concatenating the bits, the length of the final descriptor is 3 · N^2. The Binary Shape Context (BSC [17]) and the Binary Rotational Projection Histogram (BRoPH [18]) project the points in the local neighborhood onto the xy, xz, and yz planes. The idea is to reduce the binarization of 3D points to 2D binarization. BRoPH is similar to the RoPS [4] real-valued descriptor. The algorithm rotates the local point cloud around the x, y, and z axes by an angle (θ) and after each rotation projects the points onto the three planes. These 2D image patches are then divided into L × L bins. For each bin, the points in the bin are counted (distribution matrix D) and the average of their depth values (depth image I) is calculated. The bit sequence is calculated based on these 2D image patches. The length of the final descriptor is 3 × 3 × d broph_r × d broph_bl (broph_r is the rotation size, broph_bl is the bin length for every 2D image patch). BSC differs from BRoPH in that it projects the points onto the three planes without rotation, but it also produces 2D image patches based on distribution and depth.

A. MOTIVATION
Guo et al. highlight in their work [7] that most feature descriptors use histograms. Though there are significant differences between them, the feature elements are derived from the weights of histogram bins.
The FPFH feature descriptor is based on three angular features, θ, α and φ [1], that are related to the normal vectors and to the difference vector between two points. For each point in the point cloud, FPFH takes all the neighboring points, calculates the three angular features for that point and its neighbors, and finally forms three histograms from the features calculated for all the points (with 11 bins, which is the default parameter in the PCL [19]). The elements of the feature descriptor of a given point are computed based on how many of the feature values for the point-neighboring point pairs fall within the intervals of the bins. (If none of the feature values for the point-neighboring point pairs fall into a given bin, the element associated with that bin is 0; if all of them fall into a given bin, the element is 200, based on the Open3D FPFH implementation [20].) Since φ is the most interesting (cf. Fig. 3), we only show its typical histogram and the empirical density functions associated with two bins in Fig. 2. If a bin has a small weight, there will be many 0 values along the corresponding dimension of the feature descriptors. This is shown in Fig. 2/(b). For high bin weights, the values along the relevant dimension may give a more interesting density function than in the previous case, which is illustrated in Fig. 2/(c). In our experience, this holds for feature descriptors in general: for different reasons, many dimensions contain mostly 0 values and only a few have a really interesting density function.
In our binarization method, we aim to group the values of feature descriptors along a dimension. Values that fall into the same group are not distinguished from each other afterwards. The grouping should therefore be done in a way that preserves as much as possible of the descriptive power of the original feature descriptors. When grouping along a dimension, one of the goals is to have roughly the same number of adjacent values in each group, since we want each group to be equally important. Therefore, the boundaries of the groups are based on quantiles. Assuming that, for a fixed feature descriptor method and point cloud type, the values along a dimension come from a similar density function, a sufficiently large training set can be used to determine the boundaries of the groups in advance. The question is how many groups should be formed. This will be explained in more detail below. For now, we just note the following:
• For those dimensions of the FPFH along which there are many values of 0, even the median value (or even the third quartile) may be 0, which means that we will not be able to define more than two groups. (Zero values form the first group and the rest form the second group.) In this case, the groups will not even contain the same number of values.
• If we take more and more groups, even for the more interesting dimensions, the boundaries of a group may be too close to each other, so there will be a stopping condition.
The groups are then represented by bit sequences that take some account of the proximity between the groups. (Essentially, the values are quantized along a dimension and the quantized values are binarized, but the specific quantized values are irrelevant to our method.) For the full feature descriptor, binarization is achieved by concatenating the bit sequences obtained along the dimensions. Our method will be called quantile-based binarization (QBB for short).
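The quantile-based grouping idea can be illustrated with a short sketch. It only shows how equal-probability boundaries are obtained for a fixed number of groups and how heavily zero-valued dimensions collapse the boundaries; it is not the full method, and the function names and the use of NumPy's default quantile interpolation are assumptions.

```python
import numpy as np


def quantile_boundaries(values, num_groups):
    """Inner boundaries that split the values into num_groups equal-probability groups."""
    probs = np.arange(1, num_groups) / num_groups
    return np.quantile(np.asarray(values, dtype=float), probs)


def group_index(value, boundaries):
    """Index of the group that a value falls into for the given boundaries."""
    return int(np.searchsorted(boundaries, value, side="right"))


# A well-spread dimension yields distinct boundaries ...
spread = np.arange(1, 101)
# ... while a dimension dominated by zeros yields boundaries that all collapse to 0.
mostly_zero = [0] * 80 + list(range(1, 21))
```

With `mostly_zero`, all three quartile boundaries are 0, which is the situation described in the first bullet point: no more than two meaningful groups can be formed.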

B. QUANTILE-BASED BINARIZATION (QBB)
This subsection provides a detailed exposition of our method. First of all, let us introduce the notation. A D-dimensional feature descriptor space is assumed. The n feature descriptors, the d-th element of the i-th feature descriptor and the list of the d-th elements of the feature descriptors are denoted by X_1, ..., X_n, X_i^(d) and X^(d), respectively. Let Q^(d)(p) be the empirical quantile function of X^(d) for 0 ≤ p ≤ 1. The tuple-builder notation 2 is modelled on the set-builder notation and generates a list with first element f(0), second element f(1), etc. Let round(x) be the nearest integer function, which gives the integer closest to x ∈ R.
For the sake of simplicity, the number of groups will always be a power of two. It is not necessarily true that the more groups we define along the dimensions, the better the feature descriptor will retain its descriptive power, because we do not want to put very similar values into separate groups (for example, if the density function has a relatively narrow peak somewhere, we do not want to cut it in half). The choice of the number of groups is somewhat related to the choice of the number of bins when we want to approximate the distribution with a histogram. There should be enough bins to ensure that a bin does not hide any relevant information about the shape of the distribution, yet few enough that details due to random fluctuations are ignored [22].
If we want to avoid splitting the similar values mentioned above into two separate groups, care must be taken that the boundary of a group is not placed arbitrarily but is expressed as a multiple of a unit. The unit is given by the uniform bin width of the histogram of the values, which can be determined by several methods. Because of its popularity, we have used the Freedman-Diaconis rule [23], with the addition of a limit below which the bin size cannot go, in order to speed up the calculation. Let bw denote the bin width. The Freedman-Diaconis rule gives the width $2\,\mathrm{IQR}(X^{(d)}) / \sqrt[3]{n}$, and bw is this value bounded from below by the limit, where IQR(X^(d)) is the interquartile range, i.e. Q^(d)(0.75) − Q^(d)(0.25).
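A minimal sketch of the bin width computation follows. The Freedman-Diaconis part is the standard rule; the parameter `min_width` and its default value are hypothetical stand-ins for the lower limit, whose actual value is not given in the text.

```python
import numpy as np


def freedman_diaconis_width(values, min_width=1e-3):
    """Bin width 2 * IQR / n^(1/3), bounded from below by min_width.

    min_width is an assumed placeholder for the lower limit on the bin size.
    """
    x = np.asarray(values, dtype=float)
    q25, q75 = np.quantile(x, [0.25, 0.75])
    width = 2.0 * (q75 - q25) / np.cbrt(x.size)
    return float(max(width, min_width))
```

The lower bound also covers dimensions whose interquartile range is 0 (for example, dimensions that are mostly zero), for which the plain rule would return a width of 0.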
In accordance with the above, on the one hand, Algorithm 1 determines, for a given dimension d, the maximum group number for which details due to random fluctuations are (expected to be) ignored. On the other hand, it computes the boundaries for each group number up to the maximum group number. For a given group number gnum, this means that the values along dimension d of the feature descriptors are sorted in ascending order and divided into gnum equal parts, i.e. the gnum-quantiles are determined, with the addition that the boundaries can only be placed where the unit boundaries are. In fact, the algorithm will only run with the specified loop as long as there is a group whose interval length is 0. The algorithm returns a set G_d in which each element represents a group number and its associated boundaries. The reason why the algorithm does not only return the boundaries for the maximum number of groups is that we also want to consider a capacity limit. We will come back to this.
2 Similar notation can be found e.g. in Schrodt's thesis [21].
Algorithm 1
...
gnum ← 2 · gnum
10: until mdiff > 0
11: return G_d
In the case of the FPFH method, the maximum group numbers along the dimensions were determined separately for several point clouds (derived from the data set described in Sec. IV), which are shown in Fig. 3. It can be seen that for each dimension Algorithm 1 gives roughly similar group numbers for different point clouds. For this reason, a reasonable number of groups along the dimensions can be defined in advance, which will preserve a roughly similar descriptive power of the FPFH for future point clouds.
There are several ways to represent group indices with bit sequences. It is important to note that the similarity between binarized descriptors will be calculated by the Hamming distance (and a modified version of it, see Sec. V-B). Our goal is to make the distance between binarized descriptors reflect the distance between real-valued descriptors as closely as possible while keeping the number of bits low. We need at least log_2(gnum) bits to encode gnum groups. On the other hand, to minimize information loss, if there are k groups between two groups, we want their Hamming distance to be k + 1. A weaker condition is that the Hamming distance between bit sequences representing adjacent groups is exactly 1. A suitable method to satisfy the weaker condition is the Gray code [24], which uses log_2(gnum) bits. However, if we want to keep the distance between the groups accurate, it is easy to see that we need at least gnum − 1 bits. The Mersenne numbers expressed in the binary numeral system are suitable for this [25]. We will refer to this representation as the Mersenne code. Fig. 4 and Table 1 illustrate the Gray and Mersenne code representations for 8 groups.
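The two encodings can be sketched as follows. The Gray code is the standard reflected binary code; the Mersenne code is written here as the binary form of 2^i − 1 left-padded with zeros to gnum − 1 bits, where the padding and bit order are assumptions that may differ from Table 1. The function names are illustrative.

```python
def gray_code(index, num_groups):
    """Reflected binary Gray code of a group index, using log2(num_groups) bits."""
    bits = max(1, (num_groups - 1).bit_length())
    return format(index ^ (index >> 1), f"0{bits}b")


def mersenne_code(index, num_groups):
    """Binary form of the Mersenne number 2**index - 1, padded to num_groups - 1 bits."""
    return "0" * (num_groups - 1 - index) + "1" * index


def hamming(a, b):
    """Hamming distance between two equally long bit strings."""
    return sum(x != y for x, y in zip(a, b))
```

For 8 groups, the Gray code needs 3 bits and guarantees a distance of 1 only between adjacent groups, while the Mersenne code needs 7 bits and yields a distance of |i − j| between groups i and j.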

C. CAPACITY LIMIT
By introducing a capacity limit of C bits, our algorithm allows us to cap the amount of memory occupied by the binarized descriptor. In this case, we choose the Gray code because it is the most compact binary representation of the information, and our evaluation shows that, in general, very similar accuracy can be achieved using the Gray code as with the Mersenne code. To avoid information loss, each element of the original descriptor should receive at least one bit. It is assumed that D ≤ C; otherwise, the number of elements of the original descriptor can be reduced by changing its internal parameter(s) (e.g. by decreasing the number of FPFH bins). We would like to distribute the remaining C − D bits among the elements in proportion to their requests. To this end, let r_d be the number of bits requested by the d-th element, as defined in (2). Also, denote the sum of all requested bits by R, i.e. $R = \sum_{d=1}^{D} r_d$. If R ≤ C, then each element can get the requested bits, since the request fits within the capacity limit. The interesting case is R > C. Then the d-th element gets l_d bits, composed of the 1 bit that each element must receive and its share of the remaining C − D bits, which are distributed by the weight of the requested bits, (r_d − 1)/(R − D). (D ≤ C < R is satisfied by the precondition, i.e. l_d will be a positive integer in any case.)
Table 1 shows the Gray and Mersenne code representations of groups.
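The allocation described above can be sketched as follows. This is one reading of the rule, assuming l_d = 1 + round((C − D) · (r_d − 1)/(R − D)); the exact formula may differ. Note also that Python's built-in `round` resolves ties to the nearest even integer, and that rounding means the allocated bits need not sum exactly to C. The function name is hypothetical.

```python
def allocate_bits(requested, capacity):
    """Distribute at most `capacity` bits among elements requesting `requested` bits.

    Assumed rule: every element gets 1 bit, and the remaining capacity - D bits
    are shared in proportion to (r_d - 1) / (R - D), rounded to the nearest integer.
    """
    D = len(requested)
    R = sum(requested)
    if capacity < D:
        raise ValueError("capacity must allow at least one bit per element")
    if R <= capacity:
        # Every request fits within the capacity limit.
        return list(requested)
    return [1 + round((capacity - D) * (r - 1) / (R - D)) for r in requested]
```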

IV. EVALUATION
Our method was compared with all known 3D point feature descriptor binarization methods (B-SHOT [10], CI-SHOT [11], Gray-SHOT [12]) and standalone binary descriptors (VBBD [15], LoVS [14]). To evaluate QBB, we selected real-valued descriptors from those available in the Point Cloud Library [19] and Open3D [26]. The methods being compared were applied to the real-valued descriptors used by their authors. Therefore, not all binarization methods were run on all descriptors. The binarization methods (B-SHOT, CI-SHOT, Gray-SHOT) and the standalone binary descriptors (VBBD, LoVS) were implemented in Python by us. Our implementations and evaluation code are available via the following link: https://github.com/ELTE-IK-Point-Cloud-Group/QBB. The runtime of the various algorithms strongly depends on their implementation. As stated above, all methods were implemented in Python without parallelization. The runtime of the binarization methods for a point cloud is below a second (a few seconds for B-SHOT) on a notebook with a regular configuration (Intel Core i5-10300H, 2.50 GHz; 16 GB RAM, 2933 MHz). Therefore, we estimate the theoretical computational complexity of each method in Sec. V-D instead of comparing the runtimes of unoptimized implementations.
An important parameter for real-valued and binary feature descriptors is the radius, which is used to select the neighborhood of the point. The size of the radius depends on the point cloud, its noisiness, the size of the surfaces represented by the cloud, etc. Usually, a descriptor with a larger radius is able to encapsulate more information. Following Guo et al. [7], to ensure similar conditions for all descriptors, the same radius was used in all cases.
The publicly available 7-Scenes RGB-D redkitchen [27] and Redwood livingroom [28] datasets were used for the evaluation. In these datasets, a typical point cloud contains roughly 250 000 points. For faster evaluation, the point clouds were downsampled with a voxel leaf size of 0.01, reducing the typical size of the clouds to 100 000 points. The redkitchen dataset contains 60 overlapping point clouds with ground truth transformations. To make our evaluation accurate, we selected cloud pairs that overlap by at least 65% (45 point cloud pairs met this criterion). Fig. 5 shows an aligned point cloud pair from the dataset with different descriptors (the point clouds are cropped for better visualization).
The Precision-Recall Curve (PRC) is widely used to compare the descriptiveness of descriptors [7]. To obtain the precision and recall values, we iterate over all of the 45 overlapping cloud pairs. From both clouds, we randomly select sample_num points (we set sample_num to 5000, which is usually about 5% of all points). The descriptors are calculated for the selected points. In the next step, point pairs (correspondences) are created from the two clouds based on the nearest neighbor ratio [29]. We find the two nearest neighbors of each descriptor of one cloud among the descriptors of the other cloud. If the ratio between the first and the second nearest neighbor is greater than a threshold τ, we consider the point and its nearest neighbor a correspondence. We then need to check how many of the correspondences are correct. To do this, we transform the point clouds using the ground-truth transformation to align them. If the two points of a correspondence are closer to each other in Euclidean space after the transformation than a given support radius, the correspondence counts as a correct match (for the support radius we used the same radius as for the descriptors: 0.06). In the next step, we calculate the precision and recall values. To obtain the Precision-Recall Curves, we iterate through the τ threshold values from 0.5 to 1. We start from 0.5 because this means that the second nearest neighbor is twice as far away as the first nearest neighbor, which is very rare for real point clouds, and at lower values precision and recall can take extreme values. To illustrate the descriptiveness of a method in a compact way, we compare the area under curve (AUC) values for different descriptors (the AUC is calculated using the metrics.auc function from the scikit-learn library [30]). Since we randomly select points for the evaluation, we run the evaluator 5 times for each pair of clouds and average the results.

V. RESULTS
In this section, we present the results of our evaluations, where we compared QBB to other relevant methods. Unless otherwise indicated, QBB methods use the default Gray code and the conventional Hamming distance without a capacity limit. Fig. 6 (a) shows the precision and recall values of real-valued descriptors. It also includes the version of FPFH binarized with our method (QBB-FPFH). Spin Image has the worst performance of the real-valued descriptors. This result is consistent with the work of Guo et al. [7], where the performance of Spin Image is also worse compared to other descriptors (FPFH, SHOT, RoPS). When the threshold τ is low, SHOT and RoPS can achieve high accuracy, but at a higher threshold, the accuracy decreases drastically. Only FPFH can consistently achieve good precision. It can be seen that the performance of QBB-FPFH is almost as good as that of the original descriptor, and better than that of the other real-valued descriptors.
We will see later that QBB achieves the best results by binarizing the FPFH descriptor. Fig. 6 (b) shows how QBB-FPFH performs against two standalone binary descriptors. According to the authors of VBBD and LoVS, their methods give the best results when using 729 bits (9 voxels along each axis). The size of QBB-FPFH with no capacity limit is 77 bits, yet it has a higher AUC than the standalone descriptors of 729 bits. If we reduce the number of bits of the standalone methods to 64 and 343 (only cubic numbers are possible), we can see that their AUC value is much smaller. VBBD and LoVS are very similar methods, but they calculate the LRF in different ways, which may explain the difference in performance. In Fig. 6 (c), we can see how the descriptiveness of QBB-FPFH changes as the capacity limit decreases. As expected, using a capacity limit has a negative effect on performance. The largest decrease in AUC occurs when the capacity limit is changed to 50. Based on the figure, if we want to use a descriptor on a device with very limited memory and computational capacity, QBB-FPFH is suitable even when its length is limited to 65 bits.

B. POSSIBLE VARIATIONS OF QBB
The advantages of binary feature descriptors are that 1) they require less memory and 2) the Hamming distance computed by bitwise operations is much faster than computing Euclidean distances between real-valued descriptors. In this subsection, we describe modifications to QBB that improve its performance at the expense of the advantages mentioned above. As described in Sec. III, we considered several different methods for encoding groups. Table 1 shows the Gray and Mersenne codes in the case of 8 groups. In Fig. 7, we can see that QBB achieves better performance with the Mersenne code. However, the number of bits needed for one real-valued element increases from $\log_2 N$ to $N - 1$, so more memory is required. The increase in the number of bits also creates an anomaly. QBB may represent each element of the original descriptor by bit sequences of very different lengths. Consequently, if one element is represented by a longer bit sequence than another, its weight under the conventional Hamming distance increases relative to its weight in the original descriptor. This affects binary descriptors created with both the Gray and the Mersenne code. For example, if the values are gathered into four and eight groups along the two dimensions of the original descriptor, using Gray code we get 2 and 3 bits, respectively, and using Mersenne code we get 4 and 7 bits, respectively. Using conventional Hamming distance, the representation of the second element will have a weight 3/2 = 1.5 times higher for the Gray-code version and 7/4 = 1.75 times higher for the Mersenne-code version. To solve this problem, we introduce the Modified Hamming Distance (MHD) metric.
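$$\mathrm{MHD}(a, b) = \sum_{d=1}^{D} \frac{\mathrm{HD}(a_d, b_d)}{l_d},$$
where $a$ and $b$ are two binarized descriptors, $a_d$ and $b_d$ are their bit sequences corresponding to the $d$-th element of the original descriptor, HD is the Hamming distance function for bit sequences, $D$ is the number of dimensions of the original descriptor, and $l_d$ is the number of bits in the binarized descriptor corresponding to the $d$-th element of the original descriptor.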
To calculate the MHD, we calculate the Hamming distance for each bit sequence corresponding to each element of the original descriptor and then divide it by its length. Thus, each bit sequence associated with an element will contribute a value between 0 and 1 to the modified distance, i.e. each element will contribute an equal weight to the distance.
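The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the evaluation: the function names `hamming` and `modified_hamming` are hypothetical, and each per-element bit sequence is assumed to be stored as a non-negative integer together with its bit length.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Conventional Hamming distance between two codes stored as integers."""
    return bin(a ^ b).count("1")


def modified_hamming(a: Sequence[int], b: Sequence[int],
                     lengths: Sequence[int]) -> float:
    """Sum over all elements d of HD(a_d, b_d) / l_d.

    a[d], b[d]  -- code of the d-th element (assumed integer storage)
    lengths[d]  -- l_d, the number of bits used for the d-th element
    """
    if not (len(a) == len(b) == len(lengths)):
        raise ValueError("a, b and lengths must have the same length")
    return sum(hamming(x, y) / l for x, y, l in zip(a, b, lengths))


# Two elements encoded with 2 and 3 bits; every bit differs.
a = [0b00, 0b000]
b = [0b11, 0b111]
conventional = sum(hamming(x, y) for x, y in zip(a, b))  # 2 + 3 bits differ
modified = modified_hamming(a, b, [2, 3])                # 2/2 + 3/3
```

In this toy case the conventional distance weights the second element 1.5 times more than the first, whereas under the modified distance both elements contribute at most 1.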
To do this, we need to store how many bits are used to represent each element of the original descriptor. Unfortunately, the Modified Hamming distance cannot be as efficient as the conventional Hamming distance (which can be calculated by efficient bitwise operations). However, in some cases, QBB descriptors using MHD for nearest neighbor searches can achieve better results than the original real-valued descriptors. Fig. 7 shows how different variants of QBB perform. QBB-FPFH, QBB-RoPS and QBB-SI use Gray code and conventional Hamming distance, while QBB-FPFH (G+MHD) / (M+MHD), QBB-RoPS (G+MHD) / (M+MHD) and QBB-SI (G+MHD) / (M+MHD) use Gray code / Mersenne code with the Modified Hamming distance. The evaluation shows that the QBB variations of FPFH and RoPS using the Modified Hamming distance, and even QBB-RoPS using the conventional Hamming distance, can achieve better results than the original descriptor while requiring much less storage space. However, in the case of the Spin Image descriptor, the QBB variations perform worse than the original, although the Modified Hamming distance yields better results than the conventional one. With an optimized implementation, MHD can still be much faster to calculate than the Euclidean distance. (Note that in this work we did not focus on measuring the matching speed of distance metrics, nor did we aim for a computationally optimal implementation.) We believe that using the Modified Hamming distance instead of the conventional Hamming distance may be justified if matching time is not the most important criterion in the use case.
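To make the two encodings compared in this subsection concrete, the following Python sketch generates both codes for a group index. It rests on an assumption that should be checked against Table 1: the Mersenne code of group k is taken to be the binary form of 2^k − 1, i.e. k ones right-aligned in N − 1 bits. The function names are hypothetical.

```python
def gray_code(k: int, n_groups: int) -> str:
    """Reflected binary Gray code of group index k, ceil(log2(n_groups)) bits."""
    if not 0 <= k < n_groups or n_groups < 2:
        raise ValueError("need n_groups >= 2 and 0 <= k < n_groups")
    width = (n_groups - 1).bit_length()
    return format(k ^ (k >> 1), f"0{width}b")


def mersenne_code(k: int, n_groups: int) -> str:
    """Assumed Mersenne code: binary form of 2**k - 1 in n_groups - 1 bits."""
    if not 0 <= k < n_groups or n_groups < 2:
        raise ValueError("need n_groups >= 2 and 0 <= k < n_groups")
    width = n_groups - 1
    return format((1 << k) - 1, f"0{width}b")


def bit_diff(u: str, v: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(c1 != c2 for c1, c2 in zip(u, v))


# Codes for 8 groups: 3 bits per element with Gray, 7 bits with Mersenne.
gray = [gray_code(k, 8) for k in range(8)]
mers = [mersenne_code(k, 8) for k in range(8)]
```

Under this assumption, the Hamming distance between the Mersenne codes of groups i and j equals |i − j|, while the Gray code only guarantees a one-bit difference between neighboring groups (for 8 groups, the first and last Gray codes also differ in a single bit).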

C. COMPARING BINARIZATION METHODS
The properties of the compared binarization methods are summarized in Table 3. Fig. 8 and Fig. 9 show the AUC values of the original descriptors and their binarized versions on two different datasets (redkitchen and livingroom). The AUC values are shown on a logarithmic scale for proper visualization. For the methods with the QBB prefix, we used the Gray code and conventional Hamming distance. We also evaluated B-ROPS with different parameters (L = 4, 5), but the precision and recall values were zero.

TABLE 3. Properties of the compared binarization methods. The second column shows the number of comparisons for one feature vector. D is the length (dimension) of the input vector, L is the encoding length, and N is the number of bits for one real-valued element. The third column shows whether precomputations are required for the method. The last column shows the descriptors to which the methods were applied, according to the papers cited.
The B-SHOT, CI-SHOT, and Gray-SHOT methods were specifically designed for binarizing SHOT descriptors. The methods are general, i.e., they can binarize any real-valued descriptor, but they work well primarily for the value distribution of SHOT descriptors. Nevertheless, Prakhya et al. [31] also evaluated their method on FPFH and RoPS. B-SHOT and QBB-SHOT use 352 bits (like the original descriptor), while Gray-SHOT and CI-SHOT use 704 bits. Fig. 8 shows that none of the binarized descriptors can achieve the same performance as the original descriptor. QBB-SHOT and B-SHOT perform similarly for the SHOT descriptor.
In the case of FPFH, the performance of QBB-FPFH is much closer to the original descriptor than that of B-FPFH. For B-FPFH, we obtained results similar to those reported by the authors [31]. B-FPFH was evaluated with encoding lengths of 4 and 11.
On the redkitchen dataset, QBB-RoPS gives a slightly better result than the original descriptor (see Fig. 8). This can happen because binarization and the Hamming distance can smooth out the noise in the original descriptor. Note that the descriptiveness of QBB-FPFH is still considerably higher. For the Spin Image descriptor, QBB-SI performs worse than the original, although it still has a higher AUC than the binarized versions of SHOT.
On the livingroom dataset, every descriptor performs worse than on the redkitchen dataset (see Fig. 9). The probable reason is that the livingroom scene is very simple and contains only a few interesting surfaces. Thus, it is more difficult to distinguish, e.g., 3D objects and surfaces from each other. Note that the group boundaries of QBB are the same for both datasets, so the group numbers and their endpoints (Algorithm 1) do not need to be recalculated for the livingroom dataset. The ratio of the AUC values between different binarized descriptors is very similar for both datasets. The important differences compared to the redkitchen evaluation are that QBB-FPFH results in a much smaller AUC than the original descriptor, the AUC value of the Gray-SHOT method increases significantly, and QBB-RoPS no longer outperforms the original RoPS.
In summary, the QBB method performs better than the other binarization solutions for the considered real-valued descriptors. QBB-FPFH achieved the best results and can even outperform other real-valued descriptors.

D. COMPUTATIONAL COMPLEXITY
The B-SHOT, CI-SHOT, and Gray-SHOT binarization methods are the most similar to our method, but there is an important difference between them and QBB. As discussed in Sec. III, the group numbers and boundaries required for QBB must be determined before the binarization. This step cannot be performed during the binarization of individual feature descriptors. QBB works well when multiple feature vectors are used to determine the groups. During the evaluation, the feature vectors of several point clouds were used to determine the group numbers, similarly to the training phase in machine learning. Accordingly, the goal is to consider as many different feature vectors as possible. Determining the necessary amount of training data is not trivial and requires further work. Our experience shows that, for a specific feature descriptor method, it is sufficient to define the group numbers once, and these group numbers can then be reused for other datasets. The group numbers we determined can be found in the given GitHub repository. For the CI-SHOT and Gray-SHOT methods, calculations need to be performed before the binarization of feature vectors, as in the QBB method. For B-SHOT, no prior computations are required.
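The split between the precomputation and the binarization of individual descriptors can be illustrated with the following NumPy sketch. It is a deliberate simplification and not Algorithm 1: the number of groups per dimension is assumed to be given in advance (here as the hypothetical argument `groups_per_dim`), and the groups are simply equally populated quantile bins, whereas QBB derives the group numbers from the distribution itself. The function names are likewise hypothetical.

```python
from typing import List, Sequence

import numpy as np


def fit_boundaries(train: np.ndarray,
                   groups_per_dim: Sequence[int]) -> List[np.ndarray]:
    """Offline step: interior quantile boundaries for every dimension.

    train has shape (num_descriptors, D); groups_per_dim[d] >= 2 is the
    assumed, pre-chosen number of groups along dimension d.
    """
    boundaries = []
    for d, n_groups in enumerate(groups_per_dim):
        qs = np.arange(1, n_groups) / n_groups  # e.g. 0.25, 0.5, 0.75
        boundaries.append(np.quantile(train[:, d], qs))
    return boundaries


def binarize(x: np.ndarray, boundaries: List[np.ndarray]) -> str:
    """Online step: find the group of each element and emit its Gray code."""
    bits = []
    for value, edges in zip(x, boundaries):
        k = int(np.searchsorted(edges, value, side="right"))  # group index
        width = len(edges).bit_length()  # ceil(log2(number of groups))
        bits.append(format(k ^ (k >> 1), f"0{width}b"))
    return "".join(bits)


# Offline: boundaries are fitted once on a set of training descriptors.
rng = np.random.default_rng(0)
train = rng.random((1000, 3))
boundaries = fit_boundaries(train, [4, 8, 2])

# Online: each new descriptor only needs boundary look-ups.
code = binarize(train[0], boundaries)  # 2 + 3 + 1 bits
```

Only `binarize` runs per descriptor, which corresponds to the 'online' cost considered in the remainder of this section; `fit_boundaries` is executed once and its result can be stored and reused.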
As a result of the above considerations, the cost estimates we provide in this section only take into account the computational cost of 'online' processing. In a binarization method, the input is a feature vector, so the running time depends on the length of the input feature vector (denoted by D). The most common operation in binarization is the comparison of real values. Therefore, we believe that it is sufficient to consider only those. Other important constants are the encoding length (L) and the number of bits used for encoding a dimension (N). Let T(D) denote the cost of comparisons in a binarization algorithm.
The value of L is usually 4 or 11, while N is usually 4 or 5, so these parameters can be considered constants. One can see that the running time of all four methods is O(D), i.e., it increases linearly with the length of the input feature vector. In conclusion, QBB provides better performance than the other binarization methods without increasing the computational complexity of the 'online' processing phase.

VI. CONCLUSION
In this work, we proposed a quantile-based binarization method for 3D point feature descriptors, called QBB. We compared our method with other well-known binarization methods and standalone binary feature descriptors. Computations on real point clouds show that QBB can compete with standalone binary descriptors and gives better results than other binarized descriptors. We also presented possible variants of our binarization algorithm and their performance. Our results suggest that QBB is a suitable replacement for real-valued feature descriptors in certain use cases.
Our conjecture is that fewer groups would be sufficient for some dimensions. To test this conjecture, our future work includes applying different algorithms to determine the bin width. It would also be interesting to see how the group numbers and the performance would change if keypoint detection algorithms were used.