KRAMER: Interpretable Rarity Meter for Crypto Collectibles

People trade thousands of non-fungible tokens (NFTs) daily. NFT prices are expressed in cryptocurrency, which is volatile. As the interest in NFTs changes, their prices vary with time too. Is there an immanent meter to order NFTs by their value? Within a single collection, a vector of features (traits) characterizes each NFT. People construct rarity meters based on the assumed value of the rarity of the trait vector, but this process lacks formalism. In this paper, we formulate the optimal rarity meter problem and provide a pipeline for optimal rarity meter design. The proposed tournament score function is an essential part of the construction. We demonstrate the approach on the Kanaria NFT collection.

Eagle from 1878 to 1921. The coins they use differ only in terms of the date and mint mark that appear on each coin. The consideration of multiple mintages of each type eliminates the design and beauty factors. A linear model with the logarithms of coin age and mintage as features demonstrated a good fit [14] to the trade data for 1988 and 1993, and rarity influences the market price more strongly than age. Jonathan E. Hughes considered the impact of rarity based on the ''Magic: the Gathering'' (MTG) collectible card game. MTG has a primary market run by the Wizards of the Coast company, where one can buy a set of cards called a booster, followed by a secondary market. Players can build a card deck and play the game. Each card has attributes that make it useful in game play, i.e., cards have utility. But, beyond the in-game utility, cards have different circulation and a ''rarity'' indicator ranging from common to mythic rare. Moreover, some cards are reprinted with a different rarity indicator. The author collected price data for the secondary market from TCGplayer, an online marketplace for the collectible gaming market used by over 2,500 local gaming stores, for a six-week period in March-April 2019 and for cards from sixteen recent releases. The linear model based on card in-game attributes, the card rarity indicator, and the odds of obtaining the card demonstrates a moderate but statistically significant fit to the trade data. The rarity indicator assigned by the publisher influences the price, and the odds of obtaining the card are an important price factor in the small-odds region.
NFT market growth [15] challenges enthusiasts to design a rarity meter for token collections. Compared to coins and MTG cards, NFT collections have structured sets of traits. Rarity.tools [8] provides a rarity meter, which is now a de facto industry standard. The formula is a sum of the inverse frequencies of trait values (3). Alternative services [16] (Rarity Sniper, NFT Stats, and Fresh Drops, to name a few) use the same approach for their rarity meters. NFTs in some collections can have utility such as breeding or access to a DAO or club [17]; traits are dependent and have a different impact on the beauty of the associated picture. Nadini et al. demonstrate visual features to be good predictors of price [18] but do not go into the trait structure behind the features. The existing rarity approach does not allow adjusting the meter to a specific collection and thus can be improved. Moreover, the industry lacks a performance metric to compare rarity meters.
While the rarity meter by Rarity.tools is controversial, Mekacher et al. show that, on average and among other things, rarer NFTs sell for higher prices. The analysis includes a dataset of 3.7M transactions collected between January 2018 and June 2022, involving 1.4M NFTs distributed across 410 collections. Their generalizations use the Spearman rank correlation coefficient, i.e., they do not capture changes in the interest in a collection over time, which may affect the results, especially for expensive NFTs with few sales.
The main contributions of our work to the rarity meter design problem are as follows:
1) Introduce a performance metric for rarity meters based on the pair-wise weighted correlation between deals.
2) Provide a solution pipeline, where the optimal rarity meter is a combination of expert-driven score functions.
3) Construct artificial collection examples, where the state-of-the-art score function provides controversial results.
4) Propose a tournament-based modification to the state-of-the-art score function.
5) Demonstrate the approach on the Kanaria NFT collection.

III. OPTIMAL RARITY METER PROBLEM
Consider a single NFT collection consisting of $N$ tokens. Some collections allow minting new tokens. We consider them at a fixed moment and design the rarity meter for a given blockchain snapshot. So the number of tokens $N$ is the number of tokens minted in the given snapshot, and it is fixed. Let
$$\mathbf{X}_N = \{X_1, \dots, X_N\}$$
be the tokens in the collection. Let each token $X_n$ have $T$ traits:
$$X_n = (x_{n1}, \dots, x_{nT}).$$
The values of the traits are categorical variables, i.e., they have no order. The traits are the ''genes'' of the NFT, and their combination defines the ''phenotype'' of the token, for example, object shape or background color. We say that an arbitrary function
$$R : \mathbf{X}_N \to [0, \infty)$$
is a rarity meter. One expects that the bigger the value $R(X_n)$, the rarer the NFT $X_n \in \mathbf{X}_N$. Different people may have different views on rarity. So how can one compare two rarity meters and find the one that better meets expectations, i.e., the optimal rarity meter? The only answer we have is ''the market knows.'' Users trade NFTs, and one expects to see a greater rarity meter value for a more expensive NFT.
Let there be $D$ deals in a blockchain snapshot. Each deal $d = 1, \dots, D$ is characterized by the deal moment $t_d$, the index $i_d$ of the sold NFT, and the deal price $p_d$. The price is expressed in cryptocurrency, which is volatile. The interest in the collection changes over time as well. Thus, the NFTs' prices change with time too. We want a final rarity meter such that if two deals for different NFTs are close in time, the rarity meter value of the more expensive NFT is greater. Considering pairs of deals $(d_1, d_2)$ with weights $k(t_{d_1}, t_{d_2})$ is a possible formalization, where the weights decrease as the time between deals $|t_{d_1} - t_{d_2}|$ increases.
Let a function $\phi : [0, \infty) \times [0, \infty) \to \mathbb{R}$ compare the rarity meter values of two NFTs, where $\phi$ is a skew-symmetric function:
$$\phi(r_1, r_2) = -\phi(r_2, r_1).$$
Let a function $\psi : (0, \infty) \times (0, \infty) \to \mathbb{R}$ compare the prices of the deals of two NFTs, where $\psi$ is a skew-symmetric function:
$$\psi(p_1, p_2) = -\psi(p_2, p_1).$$
Let $\vec{\phi}$, $\vec{\psi}$, and $\vec{k}$ be the vectors of all relative rarities, prices, and weights, i.e., the vectors whose components over the pairs of deals $(d_1, d_2)$ are
$$\phi\big(R(X_{i_{d_1}}), R(X_{i_{d_2}})\big), \quad \psi(p_{d_1}, p_{d_2}), \quad k(t_{d_1}, t_{d_2}),$$
and let $F(\vec{\phi}, \vec{\psi}; \vec{k})$ be a similarity function of the vectors $\vec{\phi}$ and $\vec{\psi}$ with given weights $\vec{k}$. We propose the optimal rarity meter problem as follows:
$$R^{*} = \arg\max_{R \in \mathcal{R}} F\big(\vec{\phi}(R), \vec{\psi}; \vec{k}\big), \tag{1}$$
where $\mathcal{R}$ is a search space of rarity meters. In our setup, the weight function $k$ follows [19], and the weighted similarity of vectors is $F = \mathrm{corr}$, where corr is the weighted correlation function [20].
People already have their expectations of rarity. One can formulate preferences as partially ordered sets (posets) [21], where the greater element is rarer than the smaller one. For example, for a trait $t$, $X_i > X_j$ if $x_{it}$ has fewer entries in $\{x_{1t}, \dots, x_{Nt}\}$ than $x_{jt}$. We equip each poset with a score function
$$s : \mathbf{X}_N \to [0, \infty).$$
For example,
$$s_f(X_i) = \sum_{t=1}^{T} \frac{N}{\#\{n : x_{nt} = x_{it}\}}, \tag{3}$$
where $\#A$ is the cardinality of the set $A$ and the subscript $f$ stands for frequency. A score function is a rarity meter, but it is also expert-based and interpretable [22], [23]. We want an optimal interpretable rarity meter, so we take score functions as building blocks and define $\mathcal{R}$ as all the combinations of the chosen score functions with non-negative weights, i.e., a positive hull. Given $M$ score functions $s_1, \dots, s_M$, the rarity meter search space equals
$$\mathcal{R} = \Big\{ \sum_{m=1}^{M} w_m s_m \;:\; w_1, \dots, w_M \ge 0 \Big\}. \tag{4}$$
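To make the performance metric concrete, here is a minimal Python sketch of a pair-wise weighted correlation between deals. The concrete choices are assumptions made only for illustration and are not taken from the paper: the relative rarity is the difference of meter values, the relative price is the logarithm of the price ratio (both skew-symmetric), the weight is an exponential kernel with a hypothetical bandwidth `tau`, and the similarity is the standard weighted Pearson correlation. The names `weighted_corr` and `meter_performance` are hypothetical.

```python
import math
from itertools import combinations


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


def meter_performance(rarity, deals, tau=7.0):
    """Similarity between relative rarities and relative prices over deal pairs.

    rarity: dict mapping NFT index -> rarity meter value R(X).
    deals:  list of (t_d, i_d, p_d) tuples: deal moment, NFT index, price.
    tau:    assumed kernel bandwidth, in the same time units as t_d.
    """
    phis, psis, ks = [], [], []
    for (t1, i1, p1), (t2, i2, p2) in combinations(deals, 2):
        if i1 == i2:
            continue  # only deals for different NFTs are compared
        phis.append(rarity[i1] - rarity[i2])      # assumed relative rarity
        psis.append(math.log(p1 / p2))            # assumed relative price
        ks.append(math.exp(-abs(t1 - t2) / tau))  # assumed time-decaying weight
    return weighted_corr(phis, psis, ks)
```

With these choices, a meter that orders NFTs the same way as their prices in nearby deals scores close to 1, and a meter with the reverse order scores close to -1. The sketch does not guard against degenerate inputs, such as all relative rarities being equal, where the correlation is undefined.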

IV. TOURNAMENT SCORE FUNCTION
Score functions define a rarity meter search space (4). A score function is itself a rarity meter, i.e., a mapping from the NFT collection $\mathbf{X}_N$ to the non-negative real numbers $[0, \infty)$. The expert-based and interpretability requirements are subjective. This section introduces a heuristic approach to the score functions.

A. UNIQUE VS RARE
Thinking about different collections, we noticed several things. Imagine a collection of 1 red ball and 999 gray balls. We think everyone will agree that this red ball is unique and very rare. But imagine a collection with 500 gray balls and 500 balls of different, non-repeating colors. Each of these colored balls is unique, but, to be honest, that is not what we would like: we perceive these non-grays as ''just 500 colored balls.'' So we must think not only about a ball's color as membership in a group (like ''there are only 20 balls of this color, so this ball belongs to a group of 20 same-colored balls''), but also about which groups are in our collection, for example, 500 groups of 1 color each and 1 group of 500 gray balls.
The Rarity.tools formula [8] counts only the size of the group: you get (TOTAL NFTs)/(group size) points for every trait, and the sum of the points is your rarity result. The sense of a very similar formula could be: ''OK, this color is part of a group of $n$ same-colored balls.'' Let us count the differently colored balls and divide the result by the number of group members. The formula is (TOTAL NFTs $- n$)/$n$ (this is almost the same as the Rarity.tools formula; the difference is $-1$).
But this formula does not seem good enough. Imagine an NFT collection with 2 traits. The first is color, and the second is shape. There are 5000 NFTs with unique colors and 5000 gray NFTs. The 5000 ''just colored'' NFTs are all round, 4999 gray ones are triangles, and the last gray one is a square. If we count with the Rarity.tools formula, every ''just colored'' ball will have the same rarity score as the square (and truly unique) gray one.
If there are 500 groups of 20 same-colored balls, we will not treat belonging to one of them as a big achievement. So it is important to differentiate between Uniqueness, Rarity, and Uniqueness Rarity. Uniqueness has no big value if it is not rare. So we need a rarity meter that accounts for both the rareness of this exact ''color'' and the collection structure.
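The two-trait example above can be checked numerically. The following Python sketch implements the inverse-frequency sum as described in the text (collection size divided by group size, summed over traits); the function name `inverse_frequency_scores` and the tuple encoding of tokens are assumptions made for illustration, not the Rarity.tools implementation.

```python
from collections import Counter


def inverse_frequency_scores(tokens):
    """For each token, sum over traits of N / (tokens sharing that trait value)."""
    n = len(tokens)
    n_traits = len(tokens[0])
    # One frequency table per trait position.
    counts = [Counter(tok[t] for tok in tokens) for t in range(n_traits)]
    return [sum(n / counts[t][tok[t]] for t in range(n_traits)) for tok in tokens]


# 5000 uniquely colored round NFTs, 4999 gray triangles, 1 gray square.
tokens = [(f"color{i}", "round") for i in range(5000)]
tokens += [("gray", "triangle")] * 4999
tokens += [("gray", "square")]

scores = inverse_frequency_scores(tokens)
colored_round = scores[0]     # unique color, common shape
gray_triangle = scores[5000]  # common color, common shape
gray_square = scores[-1]      # common color, unique shape
```

Here `colored_round` and `gray_square` are equal: each is unique in one trait (10000/1) and shares the other trait with 5000 tokens (10000/5000). The formula therefore cannot separate the 5000 ''just colored'' NFTs from the single square one.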

B. SCORE FUNCTION DESIGN
Let us construct an alternative score function to the inverse frequency from Section IV-A. Let the values of a trait be not comparable by themselves, i.e., golden eyes are not better than brown eyes just because of their color. We will compare only the groups of the trait. If there are 50 golden eyes, 20 brown eyes, and 20 gray eyes, we will compare only 3 different groups of sizes 50, 20, and 20 for this trait.
So, first of all, we split every NFT by traits and compute the score for each trait. After that, we take the average over all traits.
For every trait, we write out all group sizes $N_1, \dots, N_G$, where $G$ is the number of groups and $\sum_{g=1}^{G} N_g = N$. The order does not matter. Then, we start a tournament between these groups. One point is played out in every battle between groups with $N_1$ and $N_2$ elements correspondingly, and each group's points are inverse to the group size, i.e., the groups get $\frac{N_2}{N_1 + N_2}$ and $\frac{N_1}{N_1 + N_2}$ points, respectively.

Good news
• We can compute the average rareness of an NFT with several traits.
• We are counting a percentage, not a score that depends on the collection size. A percentage is much clearer, and you do not need to know additional information.

Bad news
• Sometimes the results seem questionable; see, for example, Tables 2, 3, and 4. If we leave the results as is, we will get a worse score for 5 uniques (with 95 gray) than for a group of 10 same-colored balls in a (10, 10, 20, 20, 20, 20) collection.
The finishing touch. As we mentioned before, the average of the right column is always 0.5. But the average over all entities of the trait (trait_average) is different. So if one trait is not more important than another, we need to weigh all traits. The final average of each trait should be the same, namely 1. For every single instance we get group_score/trait_average. Formally, let $N(X)$ be the group size of $X$ for a given trait. The tournament score is
$$s(X) = \frac{\sum_{g=1}^{G} \frac{N_g}{N(X) + N_g} - \frac{1}{2}}{\frac{1}{N} \sum_{n=1}^{N} \left( \sum_{g=1}^{G} \frac{N_g}{N(X_n) + N_g} - \frac{1}{2} \right)}. \tag{5}$$
The term $-\frac{1}{2}$ eliminates the tournament within the group. See Tables 5, 6, and 7.
Different entities of the trait keep the same relative scores, and, in addition, the average of every trait is the same and equal to 1. Also, note that a More Rare Uniqueness will have a better score than a Less Rare Uniqueness.
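The tournament construction for a single trait can be sketched in Python as follows. This is one reading of the description above, under stated assumptions: each battle splits one point in inverse proportion to the group sizes, the raw group score is the average over battles with the other groups, and the final score is the raw score divided by the average raw score over all tokens. The function name `tournament_scores` is hypothetical, and the sketch assumes at least two groups.

```python
def tournament_scores(group_sizes):
    """Return (raw, normalized) tournament scores for one trait, keyed by group size.

    group_sizes: list with one entry per group, e.g. [1, 1, 1, 1, 1, 95].
    """
    total = sum(group_sizes)
    n_groups = len(group_sizes)

    def group_score(m):
        # Points won against every group, own group included; the self-battle
        # contributes m / (m + m) = 1/2, which is subtracted again.
        points = sum(h / (m + h) for h in group_sizes) - 0.5
        return points / (n_groups - 1)  # average over battles with other groups

    raw = {m: group_score(m) for m in set(group_sizes)}
    # Average raw score over all tokens: each group counts once per member.
    trait_average = sum(m * raw[m] for m in group_sizes) / total
    normalized = {m: raw[m] / trait_average for m in raw}
    return raw, normalized


# The two collections compared in the text.
raw_a, norm_a = tournament_scores([1, 1, 1, 1, 1, 95])
raw_b, norm_b = tournament_scores([10, 10, 20, 20, 20, 20])
```

Under these assumptions, the raw score of a unique in the first collection is about 0.60, below the 0.63 of a 10-group in the second collection, which reproduces the questionable ordering noted under ''Bad news.'' After dividing by the trait average, the unique scores about 15.0 and the 10-group about 1.34, so the rarer uniqueness wins.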

V. NUMERICAL EXPERIMENTS
A. KANARIA COLLECTION
The Kanaria NFT collection is based on the RMRK protocol and runs on the Kusama blockchain, an experimental Polkadot network [6], [24]. Each Kanaria NFT is associated with a canary bird picture (see Figure 2-a). Traits define the bird's appearance (see Figure 2-b). There is also a meta trait called edition, which affects the traits generator. Different traits may have the same value, in which case the picture parts have the same style. Let us consider the collection structure, leaving the hatching mechanism, items, and gems beyond the scope of the paper (see Figure 2-c). We grouped these parts intentionally into four groups.
First, it is important to understand the Theme and what ''trait values'' can be found in the birds. The Theme is an important source of ''color'' for the parts without a ''color.'' There are two types of bird part ''colors'': independent (like zombie, diamond, butterfly, etc.) and dependent (Plain, Pinstrip, Speckled, Solid, Iridescent). Dependent ''colors'' do not give the bird part a ''color''; they only conduct the Theme's ''color'' to the bird's part. You can think of these five ''colors'' as fluted glass pieces. These dependents are var0, var1, . . . , and var4 in the code (we suppose var stands for variation). We will call them Var (Vars). For example, bird #119 has the theme ''Sakura,'' and its Body, Right Wing, and Feet have inherited the Theme with the Var1 (Pinstrip) style (see Figure 2-b). Also, note that the Theme is never a Var.
In the code, Feet are represented as a left foot and a right foot, which are always equal. There are also left hands and right hands, but they are always equal to the left and right wings. The reason to have extra duplicated traits is to have slots to equip items. Next, 6 parts (Head, Left Wing, Right Wing, Tail, Feet, Body) were generated, taking a bit of probability from the emojis that were added to the Kanaria Eggs during the spring, June, and July 2021. If all of these 6 are not Vars and they are equal, it is called a Full Set, and there are only 105 of them based on the code plus one special Full Set #3413 (see Figure 1-a,b). The Top is a specific part: it overlays the tail and feet in the bird's image and overwrites the tail in the code (see Figure 1).
Eyes and Beak (shown as a Face to the user) are the same in most birds, and only 866 birds have something meaningful plus Var. Var is ignored in this case. Eyes and Beak did not take any probability from the egg emojis, so they form a standalone pair.

3) SETS
A Set is the best combination of the same non-Vars from {Head, Left Wing, Right Wing, Tail, Feet, Body}. For example, birds 1, 436, and 9997 are Full Sets, which means that 6 out of 6 of these parts are the same non-Vars. Here are the numbers:
• 6 same non-Vars: 105
• 5 same non-Vars: 1
• 4 same non-Vars: 23
• 3 same non-Vars: 828
• 2 same non-Vars: 1650
• 1 non-Var: 4120
• all Vars: 1751
Some sets are extended with Theme and/or Face. While counting the rareness of the sets, we defined that a 6-set with the same Theme is better than a 6-set with the same Face (due to its rarer appearance) and better than a non-extended 6-set. Of course, a bird with 3 zombies in a set and an additional zombie in the Face looks much more valuable than a bird with ''just three zombies,'' but we count it as worse than a bird with a 4-set (and no extension).
Notable facts
• No bird has three or more different non-Vars, e.g., a Diamond wing, a Volcano head, and a Tornado tail.
• There are no birds with all the same parts (including Eyes, Beak, and Theme).
• As mentioned, the only 5-set bird is #3413. It has 5 diamonds and 1 Jinn.
• Only 2 sets are extended by the Theme (the Theme is the same as the bird's set).
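The set size of a bird, as defined at the start of this subsection, can be computed with a short Python sketch. The dictionary keys in `SET_PARTS` and the function name `set_size` are hypothetical; only the identifiers var0 to var4 come from the description of the code above. Extensions by Theme or Face are not handled here.

```python
from collections import Counter

# Dependent "colors" (Vars), named as in the collection's code.
VARS = {"var0", "var1", "var2", "var3", "var4"}
# Hypothetical keys for the six parts that can form a set.
SET_PARTS = ("head", "left_wing", "right_wing", "tail", "feet", "body")


def set_size(bird):
    """Largest number of the six parts sharing one non-Var value (0 if all Vars)."""
    values = [bird[part] for part in SET_PARTS if bird[part] not in VARS]
    if not values:
        return 0
    return max(Counter(values).values())


# A bird with 5 diamond parts and 1 jinn part, like the 5-set described above.
five_set = {"head": "diamond", "left_wing": "diamond", "right_wing": "diamond",
            "tail": "diamond", "feet": "diamond", "body": "jinn"}
print(set_size(five_set))
```

Grouping birds by this value gives the group sizes listed above, which can then be fed into a score function.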

4) GEMS AND ITEMS
In the KRAMER (Kanaria RArity MEteR) rarity meter, we do not count Gems and Items for two reasons:
1) Gems and Items can be removed from the bird. Gems can be used for different cases, and Items can be moved from the bird directly. We wanted to calculate rarity only for the unchangeable parts of the bird.
2) The evaluation of Gems and Items is a matter of taste and subjective assessment. For example, the Njord/son Gem is valued with big expectations (it gives you a part of RMRK profits), but no one knows what profits Kokopelli can give you in the future (it gives you the possibility to enter RMRK-related projects very early, say, at seed-round prices).

B. OPTIMAL RARITY METER
We calculated 3 scores, for Traits, Sets, and Edition, using the tournament score (5). The Trait Score (TS) is the average of 9 tournament scores for 9 parts (all except Top) from Subsection V-A2. The Sets Score (SS) is a tournament score where the group size is treated as cumulative, i.e., it includes all groups with better sets. The Edition Score (ES) is a tournament score (Section V-A1). The search space (4) is the positive hull of TS, SS, and ES. We split the deals dataset into train (before November 2021) and test (November 2021) parts. The optimal rarity meter (1), KRAMER, is 0.75 · TS + 0 · SS + 0.25 · ES. Under the hood, TS, SS, and ES give correlations of 0.56, 0.38, and 0.74, respectively, on the train sample, while KRAMER gives 0.83 on the train sample and even more, 0.88, on the test sample.
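The text does not state how the optimization over the positive hull was carried out. One simple option, sketched below in Python, is a grid search over non-negative weights that sum to one, which matches the form of the reported result. The names `combine` and `grid_search`, the grid resolution `steps`, and the `performance` callable (any function mapping a meter to a number to maximize, such as a weighted correlation with deal prices) are assumptions for illustration.

```python
from itertools import product


def combine(scores, weights):
    """Weighted sum of score functions given as dicts: token -> score."""
    return {tok: sum(w * s[tok] for w, s in zip(weights, scores))
            for tok in scores[0]}


def grid_search(scores, performance, steps=20):
    """Search non-negative weights on a grid summing to 1; return (weights, value)."""
    best_weights, best_value = None, float("-inf")
    m = len(scores)
    # Choose the first m - 1 weights on the grid; the last one takes the rest.
    for combo in product(range(steps + 1), repeat=m - 1):
        if sum(combo) > steps:
            continue
        weights = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        value = performance(combine(scores, weights))
        if value > best_value:
            best_weights, best_value = weights, value
    return best_weights, best_value
```

In the setting above, `scores` would be the three dictionaries for TS, SS, and ES, and `performance` would be evaluated on the train deals only, with the test deals held out to check the selected weights.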
Notable facts
• Normally, the price of a bird is from 1 to 1600 Kusama. But bird #7723 was sold twice for 0.01 Kusama, at 8:46:42 and 8:49:00 UTC on September 6, 2021, by the developers. We consider these two events an RMRK team test and omit them.
• Note that the market does not appreciate the sets. We suppose this happens because sets (or, in other words, the concentration of some sort of gene) have no functionality. In the future, when, for example, birds can have offspring, the set will matter a lot, and we could see that by looking at the prices on the market.

VI. CONCLUSION AND FUTURE WORK
NFT rarity meters are accessible on the Web, but the area was missing a mathematical formalism. The proposed optimal rarity meter problem (1) resolves the issue. The exact form of the problem depends on the choice of similarity F, relative rarity ϕ, relative price ψ, weight function k, and search space R. We proposed a possible parameter setup, but one can investigate alternatives and use one's own. We choose the search space R (4) as a positive hull of score functions, where the score functions are elementary interpretable rarity meters, the hull allows us to interpret the resulting rarity meter as a weighted sum of elementary ones, and the hull is positive to allow voting for, but not against, each NFT's rarity. Section IV motivates further elaboration of the score function design, and the proposed tournament design (5) is a heuristic to open the discussion.
The proposed approach to rarity is demonstrated on the Kanaria NFT collection. The optimized rarity meter achieves a performance of 0.88 out of 1, compared to 0.56 for the default traits-based rarity meter. The web application with the resulting meter is available in [10], and we present detailed data mining and analysis process insights in Section V. We use the Kanaria collection as a reference example because it inspired this research during the hackathon. We consider the analysis of other collections a direction for further study.
The similarity measure F (1) allows comparing the rarity meters available online if their values are available for the same collection. As most meters do not reveal their approach, one can construct a new one by reverse-engineering the best available rarity meters, provided it is legal.
The proposed rarity approach compares NFTs within a single collection, while it is also interesting to compare different collections, for instance, to help collection designers achieve the desired diversity.
MIKHAIL KRASNOSELSKII received the Specialist degree in informatics and mathematics from the Maimonides State Jewish Academy, Moscow, Russia, in 1998.
He was a Hedera Hashgraph Ambassador and a Hedera Hackathon Co-Organizer. He is currently an Ambassador of RMRK, with interests in blockchain, governance, team building, and community research and development. He is also a Contract Bridge European Champion and a multiple-time Russian Champion. His research interests include peer-to-peer random number generation, interpretable machine learning, decentralized finance (DeFi), and non-fungible tokens (NFT).
YASH MADHWAL is currently pursuing the Ph.D. degree with the Skolkovo Institute of Science and Technology (Skoltech), specializing in implementing blockchain technology to resolve supply chain problems. He is currently a Teaching Assistant for the course ''Introduction to Blockchain.'' Additionally, he is invited as a Guest Lecturer at different universities to deliver introductory lectures on blockchain technology and its potential applications. He conducts technical seminars, showing listeners methods to build blockchain applications. He has authored multiple scientific articles in which he has built prototypes of blockchain-based decentralized applications (DApps), focusing on industrial problems, especially in the supply chain.
YURY YANOVICH (Member, IEEE) received the bachelor's and master's degrees (Hons.) in applied physics and mathematics from the Moscow Institute of Physics and Technology, Moscow, Russia, in 2010 and 2012, respectively, and the Ph.D. degree in probability theory and mathematical statistics from the Institute for Information Transmission Problems, Moscow, in 2017.
Since 2017, he has been a Lecturer for the ''Introduction to Blockchain'' course at top Russian universities. He is currently a Senior Research Scientist with the Skolkovo Institute of Science and Technology, Moscow. He is the author of the Exonum consensus protocol. His research interests include blockchain, consensus protocols, and privacy and applications.