Introduction
A majority of Internet transmission is highly redundant. Popular video/audio streaming applications such as radio, TV, Skype/Zoom/Webex teleconferencing, Netflix/Stan entertainment providers, Facebook social platforms, medical remote diagnosis and monitoring, and remote teaching are all good examples of Internet applications that transmit and process highly redundant data. To save communication bandwidth and make transmission faster, a redundant stream is compressed. Upon reception, the receiver recovers the original (redundant) stream. The focus of this paper is lossless compression, where a receiver is able to fully recreate the original/uncompressed data. Note, however, that video/audio compression is usually lossy.
The theoretical underpinning of compression is deeply rooted in Information Theory, initiated by Shannon’s seminal work [22]. Huffman codes (HC) [9] achieve optimal compression for symbol streams whose probabilities follow very specific patterns (i.e. natural powers of
The generic research question posed here is whether it is possible to design a single algorithm that simultaneously compresses and encrypts (called compcrypt). The work addresses its variant in which compression is based on ANS and there is no encryption per se; instead, internal states of ANS are controlled by random bits. To make implementation easy, we use a pseudorandom bit generator (PRBG) to provide the requested randomness. We expect that
\begin{align*} \text{cost}(\text{compcrypt})=&\text{cost}(\text{ANS}) + \text{cost}(\text{PRBG}),\\ \text{comp\_rate}(\text{compcrypt})\approx&\text{comp\_rate}(\text{ANS}),\\ \text{security}(\text{compcrypt})>&\text{security}(\text{ANS}),\end{align*}
Motivation: One of the emerging ANS applications is compression of data gathered, stored and transmitted by Internet of Things (IoT) devices. It is predicted that by 2025, the IoT infrastructure will include more than 75 billion devices [13]. These Internet-enabled devices will produce and consume large amounts of data. For example, one of the missions of 5G is to provide connectivity for data-intensive machines such as IoT devices, which form the foundation of the fourth industrial revolution. Resource limitations on most IoT devices restrict deployment of both compression and encryption algorithms, especially in real-time applications.
The work aims to extend ANS compression with cryptographic features to address this gap. The idea is to identify the natural properties of ANS that, together with lightweight cryptographic tools, can provide a “decent” level of security for confidentiality and integrity [3]. This topic has been previously investigated by Duda and Niemiec [7]. The authors consider a plain ANS with a (pseudo)randomly chosen encoding function. However, once it has been chosen (as a part of initialisation), it is fixed for the whole compression session. Due to the inherently cyclic nature of compression, this leaves the door ajar for possible integrity attacks, where an adversary may inject/remove parts of the output bits without detection by the receiver. Note that in many applications, such as environment sensing (temperature/pollution) or smart parking, data integrity is much more important than confidentiality.
In contrast, we cryptographically change the behaviour of an underlying ANS during symbol processing so that the above-mentioned integrity attacks fail with high probability. In particular, we investigate joint compression and encryption (compcrypt) for low-level security IoT communications. By low-level security, we mean IoT applications whose compromise causes relatively low damage or, alternatively, in which an adversary attacking IoT communication has limited computing resources. Our work is guided by the following requirements for lightweight compcrypt algorithms: (1) minimal use of cryptography, (2) security against ciphertext-only adversaries and (3) an integrity checking mechanism.
Contributions: The work
analyses confidentiality and integrity of data provided by a plain ANS (without any cryptography). The analysis is done for ciphertext-only and known-plaintext attacks. It also discusses integrity of output streams,
provides three compcrypt solutions. The first one is based on state jumps. This is a very basic solution that applies a plain ANS whose state transitions are controlled cryptographically. Its efficiency is similar to that of a plain ANS, with slightly reduced compression quality. The second one applies two plain ANS algorithms, with transitions between the two controlled by PRBG bits. It is as efficient as the first solution but uses two encoding tables, so it needs twice as much storage as the first compcrypt. Its compression quality is almost the same as that of ANS. The third one uses PRBG bits to modify the ANS encoding function. This solution is the most secure but the least efficient,
evaluates security and efficiency of the proposed compcrypt algorithms.
The rest of the work is structured as follows. Section II introduces the plain ANS. We first give a bird’s-eye view of ANS, followed by a formal description of its algorithms. The section is complemented by an example of a toy ANS. Section III briefly reviews pseudorandom bit generation. Section IV analyses confidentiality and integrity of the plain ANS under ciphertext-only and known-plaintext attacks. Section V describes our three lightweight compcrypt algorithms. Section VI evaluates security and efficiency of the proposed algorithms. Section VII concludes the work.
Description of Asymmetric Numeral System
Let
A. Bird’s-Eye View of ANS
ANS [5] achieves close-to-optimal compression for a source with an arbitrary probability distribution. ANS encoding and decoding can be done very efficiently. When describing ANS operations, it is helpful to think of ANS as a finite state machine (FSM), optimised for a given probability distribution, whose states are labelled by integers [16]. We describe the building blocks of ANS without giving the rationale for their design. This is enough to understand the encryption part of the paper. The reader interested in ANS details is referred to [5], [18].
The main data structure is determined by the number of states
the current state $x_{i}$ is re-normalised by truncating enough least significant bits (LSB) so that the truncated integer $y$ belongs to $\mathbb{L}_{s_{i}}$, calculates a new state $x_{i+1}$ by applying an encoding function $C(s_{i},y)$ and outputs the binary sequence $b_{i}=\mathrm{LSB}(x_{i})$, which is a binary encoding of $s_{i}$.
The crux of ANS is its encoding function
The approximation $\frac{L_{s}}{L}\approx p_{s}$ determines the quality of compression. This means that there are $L_{s}$ different integers $x\in\mathbb{L}$ that encode $s$.
By construction, for a given symbol $s$, $C(s,y)$ accepts integers $y\in\mathbb{L}_{s}$. The function $C(s,y)$ can be represented by a table whose rows are indexed by a symbol $s$ and columns by an integer $y\in\mathbb{L}_{s}$. The columns are indexed by all consecutive integers starting from $y_{min}=\min_{s}L_{s}$. The last column index is $2^{R}-1$. The entries of the $s$-th row for consecutive columns $L_{s},L_{s}+1,\ldots,2L_{s}-1$ create a set $\Gamma_{s}$ of states arranged in increasing order. $\Gamma_{s}$ is also called the symbol spread for $s$.
The integers $x\in\Gamma_{s}$ can be chosen at random from $\mathbb{L}$ as long as any pair of symbol spreads is disjoint, i.e. $\Gamma_{s}\cap\Gamma_{s'}=\emptyset$ for $s\neq s'$, where $\Gamma_{s}=\{x\in\mathbb{L} \mid x=C(s,y);\ y\in\mathbb{L}_{s}\}$ and $\bigcup_{s}\Gamma_{s}=\mathbb{L}$.
A decoding function
Encoding – given a state \begin{equation*}k=k_{s}(x)=\lfloor \log _{2}(x/L_{s}) \rfloor \longrightarrow b_{s}= x\mod {2^{k}}.\end{equation*}
The binary string \begin{equation*}x \longrightarrow x'=C(s',\lfloor x/2^{k} \rfloor).\end{equation*}
Note that
Decoding – for a state
B. ANS Algorithms
The tANS compression can be seen as a triplet
Initialisation
A set
Instantiation of the encoding functions $C(s,y)$ and $k_{s}(x)$ and the decoding functions $D(x)$ and $k_{s}(x)$.
Steps: Initialisation proceeds as follows:
calculate the number of states $L=2^{R}$;
determine the set of states $\mathbb{L}=\{L,\ldots,2L-1\}$;
for each symbol $s\in\mathbb{S}$, compute integers $L_{s}\approx Lp_{s}$, where $p_{s}$ is the probability of $s$;
define the symbol spread function $\overline{s}:\mathbb{L}\to\mathbb{S}$ such that $|\{x\in\mathbb{L}: \overline{s}(x)=s\}|=L_{s}$;
establish the coding function $C(s,y)=x$ for the integer $y\in\mathbb{L}_{s}=\{L_{s},\ldots,2L_{s}-1\}$, which assigns states $x\in\mathbb{L}$ according to the symbol spread function;
compute the function $k_{s}(x)=\lfloor\lg(x/L_{s})\rfloor$ for $x\in\mathbb{L}$ and $s\in\mathbb{S}$. The function gives the number of output bits generated during a single encoding step;
construct the decoding function $D(x)=(s,y)$, which for a state $x\in\mathbb{L}$ assigns its unique symbol (given by the symbol spread function) and the integer $y\in\mathbb{L}_{s}$. Note that $C(D(x))=x$ and $D(C(s,y))=(s,y)$;
calculate the function $k_{s}(x)=R-\lfloor\lg y\rfloor$, where $(s,y)=D(x)$, which determines the number of bits that need to be read out from the bitstream in a single decoding step.
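The initialisation steps can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes symbols are plain strings, that the integers $L_{s}$ are already given (how $Lp_{s}$ is rounded is left open in the text), and that consecutive $y\in\mathbb{L}_{s}$ take the states of $\Gamma_{s}$ in increasing order, as in the example of Section C. The helper `random_spread` is hypothetical and only illustrates that $\Gamma_{s}$ may be chosen at random. The number of bits read per decoding step is computed as $R-\lfloor\lg y\rfloor$ with $(s,y)=D(x)$.

```python
import random

R = 4
L = 1 << R  # number of states; the state set is {L, ..., 2L - 1}


def random_spread(Ls, rng):
    """Hypothetical helper: give each symbol s a random set Gamma_s of L_s states."""
    if sum(Ls.values()) != L:
        raise ValueError("the integers L_s must sum to L")
    states = list(range(L, 2 * L))
    rng.shuffle(states)
    spread, pos = {}, 0
    for s, count in Ls.items():
        spread[s] = sorted(states[pos:pos + count])
        pos += count
    return spread


def initialise(spread):
    """Build C(s, y), D(x), L_s and the two k functions from a symbol spread."""
    C, D, Ls = {}, {}, {}
    for s, gamma in spread.items():
        Ls[s] = len(gamma)
        # y = L_s, ..., 2L_s - 1 is mapped to the states of Gamma_s in increasing order
        for y, x in enumerate(sorted(gamma), start=len(gamma)):
            C[(s, y)] = x
            D[x] = (s, y)
    if sorted(D) != list(range(L, 2 * L)):
        raise ValueError("symbol spreads must be disjoint and cover every state")

    def k_enc(s, x):
        # floor(lg(x / L_s)): number of bits output when s is encoded from state x
        return (x // Ls[s]).bit_length() - 1

    def k_dec(x):
        # R - floor(lg y) with (s, y) = D(x): number of bits read when x is decoded
        _, y = D[x]
        return R - (y.bit_length() - 1)

    return C, D, Ls, k_enc, k_dec


# Symbol spread of the toy example in Section C
EXAMPLE_SPREAD = {
    "s0": [18, 22, 25],
    "s1": [16, 17, 21, 24, 27, 29, 30, 31],
    "s2": [19, 20, 23, 26, 28],
}
C, D, Ls, k_enc, k_dec = initialise(EXAMPLE_SPREAD)
```

With the spread of the example, `D` and `k_dec` reproduce the $y$ and $k$ rows of the decoding tables in Section C, and `C` reproduces the encoding function table there.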
The algorithm
Symbol Frame Encoding
A symbol frame
A binary frame
Steps: For
{
};
Store the final state
The next algorithm takes a binary frame and the final state and produces symbols of the corresponding frame.
Binary Frame Decoding
A binary frame
A symbol frame
Steps: while
{
}
Check
Note that
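Since the bodies of the two algorithms above are only outlined, the following Python sketch shows one way to realise them for the toy ANS of Section C. It assumes that the binary frame is the concatenation of $b_{1},\ldots,b_{n}$ in encoding order, that the decoder reads it from the end (ANS is last-in, first-out), and that decoding stops when all bits are consumed; the function names are illustrative only.

```python
R = 4
# Symbol spread Gamma_s of the toy example in Section C (L = 2^R = 16 states)
SPREAD = {
    "s0": [18, 22, 25],
    "s1": [16, 17, 21, 24, 27, 29, 30, 31],
    "s2": [19, 20, 23, 26, 28],
}
Ls = {s: len(g) for s, g in SPREAD.items()}
C = {(s, y): x for s, g in SPREAD.items() for y, x in enumerate(sorted(g), start=len(g))}
D = {x: sy for sy, x in C.items()}


def encode_frame(symbols, x):
    """Symbol frame encoding: returns the final state and the encodings b_1, ..., b_n."""
    chunks = []
    for s in symbols:
        k = (x // Ls[s]).bit_length() - 1                  # k_s(x) = floor(lg(x / L_s))
        chunks.append(format(x & ((1 << k) - 1), f"0{k}b") if k else "")  # b_i: k LSBs of x
        x = C[(s, x >> k)]                                  # x_{i+1} = C(s, floor(x / 2^k))
    return x, chunks


def decode_frame(x, bits):
    """Binary frame decoding from the final state; the frame is read from its end."""
    symbols, pos = [], len(bits)
    while pos > 0:                      # assumes every step reads >= 1 bit (true here)
        s, y = D[x]
        k = R - (y.bit_length() - 1)    # k = R - floor(lg y)
        if k > pos:
            raise ValueError("binary frame is too short")
        x = (y << k) | int(bits[pos - k:pos], 2)            # previous state
        pos -= k
        symbols.append(s)
    symbols.reverse()                   # symbols come out in reverse encoding order
    return x, symbols


frame = ["s1", "s1", "s2", "s1", "s2", "s1", "s1", "s0", "s2"]
final, chunks = encode_frame(frame, 19)
initial, decoded = decode_frame(final, "".join(chunks))
```

For the symbol frame traced in Section C, this sketch gives the final state 28 and the encodings 1, 1, 0, 0, 01, 1, 0, 011, 0; decoding from 28 returns the same symbols and ends in the initial state 19, which is the check performed at the end of decoding.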
C. Example
Design compression and decompression algorithms for a source with
Determine symbol spread function \begin{align*} \overline {s}(x)= \begin{cases} s_{0} & \quad \text {if } x\in \{18,22,25\} =\Gamma _{0} \\ s_{1} & \quad \text {if } x\in \{16,17,21,24,27,29,30,31\}=\Gamma _{1}\\ s_{2} & \quad \text {if } x \in \{19,20,23,26,28\}=\Gamma _{2} \end{cases}\end{align*}
Write the encoding function \begin{align*}\begin{array}{l||c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline s \backslash y &3&4&5& 6 &7&8&9& 10 &11&12&13& 14 &15 \\ \hline s_{0} &18&22&25& - &-&-&-& - &-&-&-& - & -\\ \hline s_{1} &-&-&-& - &-&16&17& 21 &24&27&29& 30 & 31\\ \hline s_{2} &-&-&19& 20 &23&26&28& - &-&-&-& - & -\\ \hline \end{array}\end{align*}
Construct the encoding table \begin{align*}& \begin{array}{l||c|c|c|c|c|c|c|c|} \hline s_{i} \backslash x_{i} &16&17&18& 19&20&21&22& 23 \\ \hline s_{0}&{\binom{22}{ 00}}&{\binom{22}{ 01}}&{\binom{22}{ 10}}&{\binom{22}{ 11}} & {\binom{25}{ 00}}&{\binom{25}{ 01}}&{\binom{25}{ 10}}&{\binom{25}{ 11}} \\ \hline s_{1}&{\binom{16}{ 0}}&{\binom{16}{ 1}}&{\binom{17}{ 0}}&{\binom{17}{ 1}} &{\binom{21}{ 0}}&{\binom{21}{ 1}}& {\binom{24}{ 0}}&{\binom{24}{ 1}} \\ \hline s_{2}&{\binom{26}{ 0}}&{\binom{26}{ 1}}&{\binom{28}{ 0}}&{\binom{28}{ 1}} & {\binom{19}{ 00}}&{\binom{19}{ 01}}&{\binom{19}{ 10}}&{\binom{19}{ 11}} \\ \hline \end{array}\\[2mm]& \begin{array}{l||c|c|c|c|c|c|c|c|} \hline s_{i} \backslash x_{i} &24&25&26& 27&28&29&30&31 \\ \hline s_{0}& {\binom{18}{ 000}}&{\binom{18}{ 001}}&{\binom{18}{ 010}}&{\binom{18}{ 011}} &{\binom{18}{ 100}}& {\binom{18}{ 101}}&{\binom{18}{ 110}}& {\binom{18}{ 111}}\\ \hline s_{1}& {\binom{27}{ 0}}&{\binom{27}{ 1}}&{\binom{29}{ 0}}&{\binom{29}{ 1}} & {\binom{30}{ 0}}&{\binom{30}{ 1}}&{\binom{31}{ 0}}&{\binom{31}{ 1}} \\ \hline s_{2}& {\binom{20}{ 00}}&{\binom{20}{ 01}}&{\binom{20}{ 10}}&{\binom{20}{ 11}} & {\binom{23}{ 00}}&{\binom{23}{ 01}}&{\binom{23}{ 10}}& {\binom{23}{ 11}} \\ \hline \end{array}\end{align*}
To illustrate calculations, assume that we have \begin{align*} (19) \rightarrow \begin{array}{c} \binom{19} {s_{1}} \\ \downarrow \\ 1 \end{array} \rightarrow \begin{array}{c} \binom{17} {s_{1}} \\ \downarrow \\ 1 \end{array} \rightarrow \begin{array}{c} \binom{16} {s_{2}} \\ \downarrow \\ 0 \end{array} \rightarrow \begin{array}{c} \binom{26} {s_{1}} \\ \downarrow \\ 0 \end{array} \rightarrow \begin{array}{c} \binom{29} {s_{2}} \\ \downarrow \\ 01 \end{array} \rightarrow \\ \rightarrow \begin{array}{c} \binom{23} {s_{1}} \\ \downarrow \\ 1 \end{array} \rightarrow \begin{array}{c} \binom{24} {s_{1}} \\ \downarrow \\ 0 \end{array} \rightarrow \begin{array}{c} \binom{27} {s_{0}} \\ \downarrow \\ 011 \end{array} \rightarrow \begin{array}{c} \binom{18} {s_{2}} \\ \downarrow \\ 0 \end{array} \rightarrow \begin{array}{c} (28) \end{array}\end{align*}
The output bits are
Build the decoding table. The decoding function \begin{align*} \begin{array}{l||c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline x&16&17&18& 19&20&21&22& 23&24&25&26& 27&28&29&30&31 \\ \hline y &8&9&3& 5&6&10&4& 7&11&5&8& 12&9&13&14&15 \\ \hline \end{array}\end{align*}
The decoding table \begin{align*} \begin{array}{l||c|c|c|c|c|c|c|c|} \hline x_{i} ~&16&17&18& 19&20&21&22& 23 \\ \hline s_{i} &s_{1}&s_{1}&s_{0}& s_{2}&s_{2}&s_{1}&s_{0}& s_{2}\\ \hline k &1&1&3& 2&2&1&2& 2 \\ \hline x_{i+1} &16\!\!+\!\!b_{i}&18\!\!+\!\!b_{i}&24\!\!+\!\!b_{i}&20\!\!+\!\!b_{i} &24\!\!+\!\!b_{i}&20\!\!+\!\!b_{i}&16\!\!+\!\!b_{i}& 28\!\!+\!\!b_{i} \\ \hline \end{array} \\ \begin{array}{l||c|c|c|c|c|c|c|c|} \hline x_{i} ~&24&25&26& 27&28&29&30&31 \\ \hline s_{i} &s_{1}&s_{0}&s_{2}& s_{1}&s_{2}&s_{1}&s_{1}&s_{1} \\ \hline k &1&2&1& 1&1&1&1&1 \\ \hline x_{i+1} & 22\!\!+\!\!b_{i}&20\!\!+\!\!b_{i}&16\!\!+\!\!b_{i}& 24\!\!+\!\!b_{i}&18\!\!+\!\!b_{i}&26\!\!+\!\!b_{i} &28\!\!+\!\!b_{i}&30\!\!+\!\!b_{i} \\ \hline \end{array}\end{align*}
Note that \begin{align*} (28) \rightarrow \begin{array}{c} \binom{28} {0} \\ \downarrow \\ s_{2} \end{array} \rightarrow \begin{array}{c} \binom{18} {011} \\ \downarrow \\ s_{0} \end{array} \rightarrow \begin{array}{c} \binom{27} {0} \\ \downarrow \\ s_{1} \end{array} \rightarrow \begin{array}{c} \binom{24} {1} \\ \downarrow \\ s_{1} \end{array} \rightarrow \begin{array}{c}\binom {23} {01} \\ \downarrow \\ s_{2} \end{array} \rightarrow \\ \rightarrow \begin{array}{c} \binom{29}{ 0} \\ \downarrow \\ s_{1} \end{array} \rightarrow \begin{array}{c}\binom {26}{ 0} \\ \downarrow \\ s_{2} \end{array} \rightarrow \begin{array}{c}\binom {16}{ 1} \\ \downarrow \\ s_{1} \end{array} \rightarrow \begin{array}{c} \binom{17}{ 1} \\ \downarrow \\ s_{1} \end{array} \rightarrow (19)\end{align*}
Pseudorandom Bit Generation
For many IT protocols and simulations, there is a need for a source of random bits. A typical solution applies a pseudorandom bit generator (PRBG). Unfortunately, the generated bits are no longer truly random. Many applications can be run successfully as long as pseudorandom sequences “look” random. Looking random can be equated to passing some statistical tests (such as the ones recommended by NIST [19]). However, the choice of statistical tests is highly arbitrary and, with many tests to choose from, one can ask which tests are really important and which ones can be ignored. Yao [24] argues that there is a universal test which, if passed, assures that all other statistical tests are passed as well. This is the well-known next-bit test. Consider an adversary with polynomially-bounded computing resources who can observe a polynomial-size output sequence generated by a PRBG. The PRBG passes the next-bit test if the adversary is able to predict the next bit with probability no better than
There are two classes of CSPRBG: those whose security is anchored to a heuristic argument and those whose security rests on an intractability assumption. The first class includes numerous designs based on nonlinear feedback shift registers (NFSR), for example Trivium, Snow and Sober (see the eStream portfolio https://www.ecrypt.eu.org/stream/). The second class includes an RSA-based PRBG that assumes intractability of integer factorisation [1] and the Blum-Blum-Shub PRBG, whose security rests on intractability of quadratic residuosity [2]. Needless to say, CSPRBGs based on an intractability assumption tend to be inherently slow and thus not appropriate for devices (such as IoT devices) with limited computing resources.
In the context of this work, we target PRBGs whose security is heuristic, as they are very efficient and can be easily implemented in both software and hardware. More secure candidates include the winners of the eStream competition. Less secure PRBGs should not be discounted, as they can be a viable option for lower-end security applications, especially if security is required only against a ciphertext-only adversary. In these circumstances, PRBG solutions based on linear feedback shift registers and linear congruences [12] can be used.
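As an illustration of the low-end option, a 16-bit Fibonacci LFSR fits in a few lines of Python. The taps (16, 14, 13, 11) correspond to the primitive polynomial $x^{16}+x^{14}+x^{13}+x^{11}+1$, so any non-zero seed gives the maximal period $2^{16}-1$. This is a sketch only: the register size, taps and seed are illustrative choices, and because an LFSR is linear its state can be recovered from a short output segment (e.g. with the Berlekamp-Massey algorithm), so it suits only the lower-end, ciphertext-only setting described above.

```python
def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11.

    Returns (output bit, next state).
    """
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return state & 1, (state >> 1) | (feedback << 15)


def lfsr16_bits(seed, n):
    """Illustrative PRBG: the first n output bits for a non-zero 16-bit seed K."""
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit integer")
    state, out = seed, []
    for _ in range(n):
        bit, state = lfsr16_step(state)
        out.append(bit)
    return out


def lfsr16_period(seed):
    """Number of steps until the register returns to its seed."""
    state, period = seed, 0
    while True:
        _, state = lfsr16_step(state)
        period += 1
        if state == seed:
            return period


keystream = lfsr16_bits(0xACE1, 32)
```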
Analysis of Plain ANS
A symbol frame
a view of a receiver who knows $C(s,y)$. It sees a sequence of encodings for consecutive symbols, i.e. $\mathcal{B}=(b_{1},\ldots,b_{n})$, where $b_{i}$ is an encoding of $s_{i}$. In other words, it knows how to divide a binary frame into encodings $b_{i}$. As each $b_{i}$ may have a different length, we can define a window frame $\mathcal{W}=(k_{1},\ldots,k_{n})$, where $k_{i}$ indicates the number of bits in $b_{i}$, i.e. $k_{i}=|b_{i}|$. In other words, the receiver knows both frames $\mathcal{B}$ and $\mathcal{W}$;
a view of an adversary $\mathcal{A}$ who does not know $C(s,y)$. $\mathcal{A}$ deals with a binary frame $\mathcal{B}$ and does not know how to extract particular encodings $b_{s_{i}}$. In other words, $\mathcal{A}$ knows $\mathcal{B}$ but does not know the window frame $\mathcal{W}$ (this is also called a synchronisation problem). Note that, in general, a window frame does not determine symbols, as the same symbol $s$ can be encoded into encodings $b_{s}$ of different lengths.
Note that we ignore active adversaries who may have access to an ANS encoder/decoder (or to oracles
The above scenarios are the most common in IoT applications. The ciphertext-only attack is relevant to any adversary who is able to see the traffic generated by an IoT device. This is the case if an IoT device uses broadcast communication (such as Bluetooth or Wi-Fi) to interact with a server. The known-plaintext attack can be launched if an adversary additionally has access to the source of symbols. For instance, it is easy to determine symbols for a temperature sensor by installing an adversarial sensor nearby that hopefully replicates the temperature readings.
A. Ciphertext-Only Attack Against ANS
A majority of IoT devices that use ANS for compression communicate with their servers via broadcast channels (such as Bluetooth or Wi-Fi). This makes them vulnerable to eavesdropping, i.e. to ciphertext-only attacks. The main difficulty for an adversary is to guess a window frame. Once it has guessed it, it can upload an observed binary frame
1) Symbol Versus Window Statistics:
A typical source includes all $2^{8}$ ASCII symbols. Its statistical properties are approximated by a geometric probability distribution truncated to $2^{8}$ events. To recall, geometric probability distribution is defined as
\begin{align*} \begin{array}{ccc} { { Source ~Statistics }} &\quad &\quad { { Window~ Statistics }} \\ P(s_{1})= p_{1} &\quad &\quad P(W=0)=P_{0} \\ \vdots &\quad \rightarrow \text ~{\textit{ANS}} \rightarrow &\quad \vdots \\ P(s_{m})= p_{m} &\quad &\quad P(W=\alpha)=P_{\alpha } \end{array}\end{align*}
Remark 1:
Given ANS window probabilities
2) Guessing Window Frames:
Assume that \begin{equation*} (x+y)^{r}=\sum _{k} {\binom{r}{ k}} x^{k} y^{r-k}.\end{equation*}
Let us illustrate the connection between the theorem and our problem. Assume that we are dealing with window frames that are built from \begin{align*}&\hspace {-.5pc}3^{n}=(1+2)^{n} = \sum _{k=0}^{n} {\binom{n }{ k }} 2^{k} = {\binom{n}{ 0 }}\cdot 2^{0} + {\binom{n }{ 1 }} \cdot 2^{1} +\ldots \\&\qquad\qquad\qquad\qquad\qquad\qquad+\, {\binom{n }{ {n}-1 }}\cdot 2^{n-1} + {\binom{n }{ {n} }}\cdot 2^{n}.\end{align*}
Note that the term \begin{equation*}{\binom{n }{ k }} 2^{k} \cdot P_{1}^{n-k}\cdot \left ({\frac {P_{2}+{\hat P}_{3}}{2} }\right)^{k}.\end{equation*}
The total length of window frames ranges from
Remark 2:
It is reasonable to assume that an adversary knows the length \begin{align*} {n}_{1} + {n}_{2} + \cdots + {n}_{\alpha }=&{n} \\ {n}_{1} + 2\cdot {n}_{2} + \cdots + \alpha \cdot {n}_{\alpha }=&N,\tag{1}\end{align*}
\begin{equation*} {\binom{n }{ {n}_{1},{n}_{2},\ldots,{n}_{\alpha }}}=\frac {n! }{n_{1}!{n}_{2}!\cdots {n}_{\alpha }! }\tag{2}\end{equation*}
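Equations (1) and (2) can be checked numerically. The sketch below, assuming windows of lengths $1,\ldots,\alpha$ as in (1), enumerates all solutions $(n_{1},\ldots,n_{\alpha})$ of (1), sums the multinomial coefficients (2) to count the window frames of $n$ windows with total length $N$, and cross-checks the count with a simple recursion. The function names are illustrative.

```python
from functools import lru_cache
from math import factorial


def length_profiles(n, N, alpha):
    """Yield (n_1, ..., n_alpha) with sum_j n_j = n and sum_j j * n_j = N, i.e. solutions of (1)."""
    def rec(j, left_n, left_N):
        if j > alpha:
            if left_n == 0 and left_N == 0:
                yield ()
            return
        for nj in range(left_n + 1):
            if j * nj > left_N:
                break
            for rest in rec(j + 1, left_n - nj, left_N - j * nj):
                yield (nj,) + rest
    yield from rec(1, n, N)


def multinomial(counts):
    """Multinomial coefficient (2): n! / (n_1! n_2! ... n_alpha!)."""
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out


def count_window_frames(n, N, alpha):
    """Number of window frames of n windows (lengths 1..alpha) with total length N."""
    return sum(multinomial(p) for p in length_profiles(n, N, alpha))


def count_window_frames_dp(n, N, alpha):
    """Same count by recursion on the length of the last window (cross-check)."""
    @lru_cache(maxsize=None)
    def ways(m, total):
        if m == 0:
            return 1 if total == 0 else 0
        return sum(ways(m - 1, total - k) for k in range(1, alpha + 1) if k <= total)
    return ways(n, N)


counts = [count_window_frames(3, N, 3) for N in range(3, 10)]
```

For $n=3$ and $\alpha=3$, the counts over $N=3,\ldots,9$ are 1, 3, 6, 7, 6, 3, 1; they add up to $3^{3}=27$, and only one window frame exists at each end of the range $N\in\{n,\ldots,\alpha\cdot n\}$.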
Let us make the following observations about ANS resistance against ciphertext-only attacks.
The adversary has a “good” chance to guess relatively short window frames. It may attempt to determine such frames at any position of the binary stream, as symbols are independently generated. As the number $n$ of symbols grows, the probability of success quickly becomes negligible. Note that this observation is consistent with the conclusion of Gillman et al. [8] that cryptanalysis of data compressed with Huffman codes is “surprisingly difficult”.
If the length $N$ of the binary frame is known and very close to either $n$ or $\alpha\cdot n$, then it is possible to guess the window frame with a non-negligible probability. Note, however, that the probability of such events is negligible for a large enough $n$.
B. Known-Plaintext Attack Against ANS
For a given symbol, ANS assigns binary encodings/windows of different lengths. The following observation can be used to determine the window lengths for each symbol.
Fact 1:
Given a symbol \begin{equation*}\lfloor \log _{2}{\frac {2^{R}}{L_{s}}} \rfloor \leq k_{s} \leq \lfloor \log _{2}{\frac {2^{R+1}-1}{L_{s}}} \rfloor.\end{equation*}
When the approximation \begin{equation*}\lfloor \log _{2}{p_{s}^{-1}} \rfloor \leq k_{s} \leq \lfloor \log _{2}{p_{s}^{-1}\left({2-\frac {1}{2^{R}}}\right)}\rfloor.\end{equation*}
A closer look at the above conditions reveals the following properties of window lengths:
If $p_{s}=(1/2)^{i}$, then ANS assigns an $i$-bit window.
If $p_{s}>1/2$, then ANS assigns a window of length either 0 or 1.
Otherwise, ANS assigns a window of length $k_{s}\in\{i,i+1\}$, where $i=\lfloor\log_{2}p_{s}^{-1}\rfloor$.
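Fact 1 and the three properties can be checked for small parameters with integer arithmetic only. The sketch below, assuming states $x\in\{2^{R},\ldots,2^{R+1}-1\}$ and $k_{s}(x)=\lfloor\lg(x/L_{s})\rfloor$ as in Section II, compares the bounds of Fact 1 with the window lengths actually taken over all states. For the toy example ($R=4$, $L_{s_0}=3$, $L_{s_1}=8$, $L_{s_2}=5$) it gives the windows $\{2,3\}$, $\{1\}$ and $\{1,2\}$, in agreement with the encoding table of Section C.

```python
def floor_lg_ratio(a, b):
    """floor(log2(a / b)) for integers a >= b >= 1, using integer arithmetic only."""
    return (a // b).bit_length() - 1


def window_bounds(L_s, R):
    """Lower and upper bound on k_s from Fact 1."""
    return floor_lg_ratio(2 ** R, L_s), floor_lg_ratio(2 ** (R + 1) - 1, L_s)


def window_lengths(L_s, R):
    """Window lengths k_s(x) = floor(lg(x / L_s)) actually used over all states x."""
    return sorted({floor_lg_ratio(x, L_s) for x in range(2 ** R, 2 ** (R + 1))})


R = 4
EXAMPLE_LS = {"s0": 3, "s1": 8, "s2": 5}   # L_s of the toy example, L = 2^R = 16
for s, L_s in EXAMPLE_LS.items():
    print(s, window_bounds(L_s, R), window_lengths(L_s, R))
```

A power-of-two $L_{s}$ (i.e. $p_{s}=(1/2)^{i}$) yields a single window length, and $L_{s}>L/2$ yields windows of length 0 or 1, matching the properties listed above.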

Consider ANS from our example. For
1) Guessing Window Frames:
This time our adversary knows both a symbol frame
finds a space of all solutions of the following relation
\begin{equation*} k_{1}+k_{2}+\ldots +k_{n}=N,\tag{3}\end{equation*}
where $k_{i}$ is the length of a window used by ANS to encode $s_{i}$; $i=1,\ldots,n$. Note that $k_{i}$ can take on two values only, so we can write $k_{i}=c_{i}+\gamma_{i}$, where a constant $c_{i}$ is known to the adversary and $\gamma_{i}\in\{0,1\}$ is unknown. Equation (3) can be re-written as
\begin{equation*}\sum_{i=1}^{n}\gamma_{i} = N - \sum_{i=1}^{n} c_{i}.\end{equation*}
The integer $\sum_{i=1}^{n}\gamma_{i}$ is the number of times when $\gamma_{i}=1$, and it is known to the adversary. The adversary enumerates all possible patterns $(\gamma_{i})_{1}^{n}$ whose weight is $N - \sum_{i=1}^{n} c_{i}$. Clearly, the number of patterns is
\begin{equation*} \binom{n}{N - \sum_{i=1}^{n} c_{i}}.\end{equation*}
To maximise its chances, the adversary tries the most probable patterns first. This can be done as it knows the probabilities $\beta_{s}$.
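The enumeration step can be sketched as follows. The sketch makes assumptions that go beyond the text: $\beta_{s}$ is taken to be the probability that symbol $s$ gets its longer window ($\gamma_{i}=1$), windows are treated as independent, and symbols with a single possible window length (such as $s_{1}$ in the example) are fixed at $\gamma_{i}=0$, so the number of candidates is $\binom{m}{w}$ for the $m$ symbols that have two lengths rather than $\binom{n}{w}$. The $\beta_{s}$ values used below are hypothetical.

```python
from itertools import combinations
from math import prod


def gamma_candidates(symbols, N, lengths, beta):
    """Candidate patterns (gamma_1, ..., gamma_n) for a known symbol frame and frame length N.

    lengths[s] = (c_s, longest window of s), differing by at most 1 (Fact 1);
    beta[s] is a hypothetical probability that s gets its longer window (gamma_i = 1).
    Returns (probability, pattern) pairs, most probable first.
    """
    c = [lengths[s][0] for s in symbols]
    free = [i for i, s in enumerate(symbols) if lengths[s][1] > lengths[s][0]]
    w = N - sum(c)                                   # weight of the pattern, known to A
    if not 0 <= w <= len(free):
        return []
    candidates = []
    for ones in combinations(free, w):
        ones = set(ones)
        p = prod(beta[symbols[i]] if i in ones else 1 - beta[symbols[i]] for i in free)
        pattern = tuple(1 if i in ones else 0 for i in range(len(symbols)))
        candidates.append((p, pattern))
    candidates.sort(key=lambda t: t[0], reverse=True)
    return candidates


# Toy example of Section C: window lengths from Fact 1, hypothetical beta values
LENGTHS = {"s0": (2, 3), "s1": (1, 1), "s2": (1, 2)}
BETA = {"s0": 0.5, "s1": 0.0, "s2": 0.75}
FRAME = ["s1", "s1", "s2", "s1", "s2", "s1", "s1", "s0", "s2"]
candidates = gamma_candidates(FRAME, 12, LENGTHS, BETA)
```

For the symbol frame of the example, the binary frame has $N=12$ bits, so $w=2$ and four positions are free, giving six candidates; one of them is the true pattern, with longer windows at the fifth ($s_{2}$) and eighth ($s_{0}$) symbols.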
2) Adaptive Attack Against ANS:
We assume that an adversary knows a symbol frame \begin{align*} \begin{array}{cccl} {~\text {ANS }} &\quad &\quad {~\text {ANS$_{\mathcal{ A}}$}}&\quad \\ \downarrow &\quad &\quad \downarrow &\quad \\ b_{i} &\quad \stackrel {F}{\longleftrightarrow } &\quad b'_{i} &\quad {~\text {for }} i=1,2,\cdots \\ \end{array}\end{align*}
The adaptive attack proceeds along the following steps:
The adversary $\mathcal{A}$:
designs her ANS$_{\mathcal{A}}$ applying the same parameters as the original ANS;
chooses an initial state $x'_{1}$ at random. For the first observation $(s_{1},b_{1})$, it finds $b'_{1}$ from the encoding table of ANS$_{\mathcal{A}}$. It records
\begin{equation*}(s_{1},x_{1},b_{1}) \stackrel{F}{\longrightarrow} (s_{1},x'_{1},b'_{1});\end{equation*}
continues with subsequent observations and builds the function (table) $F$. This process is successful if the function is fully determined for all symbols and states. If the original ANS or ANS$_{\mathcal{A}}$ contains cycles, then the algorithm fails. If a cycle occurs in the original ANS, $\mathcal{A}$ needs to “re-design” ANS$_{\mathcal{A}}$ by introducing a cycle of an appropriate length. On the other hand, if ANS$_{\mathcal{A}}$ hits a cycle, $\mathcal{A}$ needs to re-design it to remove the cycle.
C. Integrity of ANS Binary Frames
ANS is normally represented by its encoding table \begin{align*}&\hspace {-.5pc}x_{i} \stackrel {s}{\longrightarrow } x_{i+1}=C\left({s, \Big \lfloor {\frac {x_{i}}{2^{k_{s}}}} \Big \rfloor }\right) \stackrel {s}{\longrightarrow } x_{i+2}=C\left({s, \Big \lfloor {\frac {x_{i+1}}{2^{k_{s}}}} \Big \rfloor }\right) \\&\qquad\qquad\qquad\qquad\quad\stackrel {s}{\longrightarrow } \cdots \stackrel {s}{\longrightarrow } x_{i+j}=C\left({s, \Big \lfloor {\frac {x_{i+j-1}}{2^{k_{s}}}} \Big \rfloor }\right)\end{align*}
Consider ANS from our Example. Assume that ANS starts from an initial state \begin{align*} (19) \rightarrow \begin{array}{c} \binom{28} {s_{2}} \\ \downarrow \\ 1 \end{array} \rightarrow \underbrace { \begin{array}{c} \binom{23}{ s_{2}} \\ \downarrow \\ 00 \end{array} \rightarrow \begin{array}{c}\binom{19}{ s_{2}} \\ \downarrow \\ 11 \end{array} \rightarrow \begin{array}{c} \binom{28}{ s_{2}} \\ \downarrow \\ 1 \end{array} }_{cycle} \rightarrow \begin{array}{c} \binom{23} {s_{2}} \\ \downarrow \\ 00 \end{array} \rightarrow \begin{array}{c} \binom{19} {s_{2}} \\ \downarrow \\ 11 \end{array} \rightarrow \begin{array}{c} \cdots \end{array}\end{align*}
An adversary can inject, delete or replace the cycle, and the receiver fails to detect it because the other bits are correctly decompressed and ANS reaches the correct final state. The deletion is illustrated below.
\begin{align*} (19) \rightarrow \begin{array}{c} \binom{28} {s_{2}} \\ \downarrow \\ 1 \end{array} \rightarrow \begin{array}{c} \binom{23} {s_{2}} \\ \downarrow \\ 00 \end{array} \rightarrow \begin{array}{c} \binom{19}{ s_{2}} \\ \downarrow \\ 11 \end{array} \rightarrow \begin{array}{c} \cdots \end{array}\end{align*}
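The deletion can be reproduced with the toy ANS of Section C. The sketch below assumes that the binary frame is the concatenation of the encodings in encoding order, that the decoder reads it from the end and decodes until the frame is exhausted, and that the integrity check compares the state reached after decoding with the pre-agreed initial state. It encodes seven copies of $s_{2}$ from state 19, then removes (or injects) one copy of the cycle bits `00111`.

```python
R = 4
SPREAD = {  # symbol spread of the toy example in Section C
    "s0": [18, 22, 25],
    "s1": [16, 17, 21, 24, 27, 29, 30, 31],
    "s2": [19, 20, 23, 26, 28],
}
Ls = {s: len(g) for s, g in SPREAD.items()}
C = {(s, y): x for s, g in SPREAD.items() for y, x in enumerate(sorted(g), start=len(g))}
D = {x: sy for sy, x in C.items()}


def encode_frame(symbols, x):
    bits = ""
    for s in symbols:
        k = (x // Ls[s]).bit_length() - 1             # k_s(x); at least 1 for this spread
        bits += format(x & ((1 << k) - 1), f"0{k}b")  # append b_i
        x = C[(s, x >> k)]
    return x, bits


def decode_frame(x, bits):
    symbols, pos = [], len(bits)
    while pos > 0:                                    # decode until the frame is used up
        s, y = D[x]
        k = R - (y.bit_length() - 1)
        if k > pos:
            raise ValueError("binary frame is too short")
        x = (y << k) | int(bits[pos - k:pos], 2)
        pos -= k
        symbols.append(s)
    return x, symbols[::-1]


final, frame = encode_frame(["s2"] * 7, 19)   # cycle 19 -> 28 -> 23 -> 19 with bits 1, 00, 11
tampered = frame[:1] + frame[6:]               # delete one copy of the cycle bits "00111"
injected = frame[:1] + "00111" + frame[1:]     # or inject an extra copy
```

All three frames decode from the final state 28 back to state 19, so a check of that state passes although the tampered frame yields only four symbols and the injected one yields ten.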
The periodic nature of ANS has the following security and design implications.
Cycles in ANS are unavoidable. A designer of ANS can avoid loops (cycles of length 1) by making sure that for each state $x_{i}$ and any symbol $s_{j}\in\mathbb{S}$
\begin{equation*} x_{i+1}=C\left({s_{j}, \Bigl \lfloor \dfrac {x_{i}}{2^{k_{i}}}\Bigr \rfloor }\right) \neq x_{i}.\end{equation*}
Getting rid of longer cycles requires more and more computation overhead, as the designer has to consider different combinations of states and symbols. It also means that the entropy of state selection drops, so an adversary does not need to enumerate encoding functions $C(s,y)$ that have short cycles.
Cycles are easy to identify by searching a binary frame for repeating sequences. Detecting a concatenation of two or more copies of a bit pattern allows the adversary to remove or insert the bit pattern an arbitrary number of times without detection by the receiver. This is true as injection/removal of a repeated bit pattern corresponds to adding/removing a cycle without disturbing the decoding process for other parts of the binary frame (before and after the injection/removal).
If a ciphertext-only adversary $\mathcal{A}$ can remove/inject binary patterns from/into the binary frame, then a decoder recovers an incorrect symbol frame. A typical integrity check applied in ANS, which checks correctness of the final state, fails to detect the modification.
For an observed binary cycle in $\mathcal{B}$, a known-plaintext adversary can assemble a relation for the encoding function $C(s,y)$. This reduces the entropy of the encoding function.
$\mathcal{A}$ can increase its chances of guessing lengths of possible cycles by experimenting with random instances of ANS for a known $R$ and symbol statistics. $\mathcal{A}$ hopes that cycles of a target ANS follow the statistics gathered from random instances.
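Searching a binary frame for repeated patterns is straightforward. The following brute-force sketch lists every position where a bit pattern is immediately followed by a copy of itself (a tandem repeat). It is only illustrative: short repeats such as `00` occur by chance in any frame, so an adversary would still need to decide which repeats correspond to ANS cycles, and the `min_period` filter is a hypothetical heuristic for that.

```python
def find_tandem_repeats(bits, min_period=1, max_period=None):
    """Return (start, period) pairs where bits[start:start + period] is directly repeated."""
    n = len(bits)
    if max_period is None:
        max_period = n // 2
    hits = []
    for p in range(min_period, max_period + 1):
        for i in range(n - 2 * p + 1):
            if bits[i:i + p] == bits[i + p:i + 2 * p]:
                hits.append((i, p))
    return hits


# Binary frame of seven s2 symbols encoded from state 19 in the toy ANS (cycle bits 1, 00, 11)
hits = find_tandem_repeats("10011100111", min_period=2)
```

On this frame, the only period-5 repeats start at positions 0 and 1; the one at position 1 is exactly the cycle bits `00111` that can be removed or injected without detection.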
Lightweight Encryption With ANS
The analysis given in Section IV identifies strengths and weaknesses of ANS and is a major driver for our design of a cryptographically strengthened ANS-based compcrypt. Note that it is easy to design a very secure compcrypt algorithm when one can use a full range of cryptographic tools. The price to pay for increased security is a heavy resource overhead, which discourages potential users from adopting such designs. This is especially true if ANS is applied to relatively low-security communication (such as collecting data from IoT devices). In general, IoT devices have very restricted CPU and storage resources, and adding an extra encryption algorithm may not be practical. As IoT devices communicate with their servers via broadcasting (Bluetooth and Wi-Fi), the data can be subject to both eavesdropping and tampering. Our constructions are guided by the following design principles:
Minimal application of cryptographic tools, so that compcrypt preserves its efficiency and compression quality. In other words, our designs must be lightweight, avoiding “heavy” cryptography and encouraging potential users to adopt the designs for protection of data collected by IoT devices.
Security against a ciphertext-only adversary who additionally can modify binary frames by injecting/removing bit cycles. In other words, detection of a cycle in a binary frame is a “false positive” with overwhelming probability.
Repair of the existing ANS authentication/integrity checking mechanism so that any bit stream modification is detected with probability $\approx(1-2^{-R})$. Note that the plain ANS allows checking equality of (pre-agreed) initial encoding states on both communicating sides. As discussed in Section IV, this may involve a careful selection of an encoding function $C(s,y)$ with no short cycles.
A. Compcrypt With State Jumps
This solution closely follows the original ANS. The only change is a pseudorandom selection of the next state. Consequently, it preserves the efficiency of ANS and enhances resistance against confidentiality and integrity attacks. The main cryptographic tool used here is a pseudorandom bit generator (PRBG), whose seed
\begin{equation*} x:=(x+state\_cor) \bmod 2^{R} + 2^{R}.\tag{4}\end{equation*}
The integer
A simple illustration of compcrypt with state jumps is given below.\begin{equation*} \cdots \stackrel {s_{i-1}}{\longrightarrow } \boxed {x_{i-1}} \stackrel {s_{i}}{\longrightarrow } \boxed { x_{i} \stackrel {{{ jump}}}{\longleftarrow } x_{i}+state\_{}cor} \stackrel {s_{i+1}}{\longrightarrow } \boxed { x_{i+1}} \cdots\end{equation*}
Implementation of the algorithm seems to introduce a relatively light overhead. A few points are relevant here.
State jumps tend to have a negative impact on the quality of compression. This implies that jumps should not occur too often. Consequently, very short cycles of output bits may be observable. To avoid such cycles, ANS should be carefully designed to exclude short cycles.
Consider a state jump. Note that a binary encoding $b_{i}$ has to be computed for the state after the jump, i.e. $x_{i}+state\_cor$. Otherwise, decoding fails.
The only cryptographic component used is PRBG. It could be as simple as a linear feedback shift register (LFSR), whose seed (or initial state) is $K$. It could also be a cryptographically strong PRBG based on a nonlinear feedback shift register (NFSR), a block cipher or hashing.
Generation of integers $\mathrm{PRBG}(i,K)$ for state correction should be easy in both directions: backward (for encoding, where $i$ decreases) and forward (for decoding, where $i$ increases).
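A minimal sketch of compcrypt with state jumps on the toy ANS is given below. Several details are assumptions: `PRBG(i, K)` is replaced by a hypothetical SHA-256-based function of the position $i$ and key $K$ (one of the hashing options mentioned above; being random-access, it is easy to evaluate in both directions), jumps occur only at every `jump_every`-th position (a hypothetical sparsity rule), and symbols are encoded backwards so that the decoder can run forwards with increasing $i$. The jump follows (4), and $b_{i}$ is computed from the state after the jump.

```python
import hashlib

R = 4
L = 1 << R
SPREAD = {  # symbol spread of the toy example in Section C
    "s0": [18, 22, 25],
    "s1": [16, 17, 21, 24, 27, 29, 30, 31],
    "s2": [19, 20, 23, 26, 28],
}
Ls = {s: len(g) for s, g in SPREAD.items()}
C = {(s, y): x for s, g in SPREAD.items() for y, x in enumerate(sorted(g), start=len(g))}
D = {x: sy for sy, x in C.items()}


def prbg(i, key):
    """Hypothetical keyed PRBG(i, K): 32 pseudorandom bits for position i (SHA-256 stand-in)."""
    return int.from_bytes(hashlib.sha256(key + i.to_bytes(8, "big")).digest()[:4], "big")


def state_cor(i, key, jump_every):
    """Correction for position i; 0 (no jump) except at every jump_every-th position."""
    return prbg(i, key) % L if i % jump_every == 0 else 0


def jump(x, cor):
    return (x + cor) % L + L          # equation (4); the result stays in {L, ..., 2L - 1}


def unjump(x, cor):
    return (x - cor) % L + L          # inverse of (4) for states in {L, ..., 2L - 1}


def encode(symbols, x, key, jump_every=3):
    chunks = []
    for i in reversed(range(len(symbols))):            # encoding runs backward: i decreases
        s = symbols[i]
        x = jump(x, state_cor(i, key, jump_every))     # pseudorandom state jump
        k = (x // Ls[s]).bit_length() - 1              # b_i is taken from the jumped state
        chunks.append(format(x & ((1 << k) - 1), f"0{k}b"))
        x = C[(s, x >> k)]
    return x, "".join(chunks)


def decode(x, bits, key, jump_every=3):
    symbols, pos, i = [], len(bits), 0
    while pos > 0:                                      # decoding runs forward: i increases
        s, y = D[x]
        k = R - (y.bit_length() - 1)
        if k > pos:
            raise ValueError("binary frame is too short")
        x = (y << k) | int(bits[pos - k:pos], 2)        # state after the jump ...
        pos -= k
        x = unjump(x, state_cor(i, key, jump_every))    # ... and before it
        symbols.append(s)
        i += 1
    return x, symbols


msg = ["s1", "s1", "s2", "s1", "s2", "s1", "s1", "s0", "s2"]
final, frame = encode(msg, 19, b"example key")
```

Decoding with the same key recovers the symbol frame and the initial state; the sketch only demonstrates correctness of the jump/unjump bookkeeping and says nothing about the security level achieved.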
B. Compcrypt With Double ANS
This solution is more expensive than the first one (it needs twice the storage), offers better security, especially against a known-plaintext adversary, and preserves compression quality. The idea is to design two copies of ANS with their encoding functions
\begin{align*} \boxed { \begin{array}{c} x_{i+1}=C_{1}\left({s,\lfloor \frac {x_{i}}{2^{k_{s}}}\rfloor }\right) \\ b_{i}=x_{i}\mod {2^{k_{s}}} \end{array} } + \boxed { \begin{array}{c} x_{i+1}=C_{2}\left({s,\lfloor \frac {x_{i}}{2^{k_{s}}}\rfloor }\right) \\ b_{i}=x_{i}\mod {2^{k_{s}}} \end{array} } \stackrel {{{ merge}}}{\longrightarrow } \\ \stackrel {{{ merge}}}{\longrightarrow } \boxed { \begin{array}{c} x_{i+1}\stackrel {\$}{\leftarrow } \left\{{ C_{1}\left({s,\lfloor \frac {x_{i}}{2^{k_{s}}}\rfloor }\right), C_{2}\left({s,\lfloor \frac {x_{i}}{2^{k_{s}}}\rfloor }\right) }\right\}\\ b_{i}=x_{i}\mod {2^{k_{s}}} \end{array} }\end{align*}
Note that
We assume that PRBG generates a single bit for each call
Intuitively, switching encoding functions should not have an impact on compression quality.
Compared to a single ANS, compcrypt requires a larger memory (twice as much) to store two encoding functions $C_{1}(s,y)$ and $C_{2}(s,y)$. The same size of memory is enough to store an encoding function $C(s,y)$ for ANS with a double number of states, which allows a better approximation of symbol statistics and consequently better compression.
An adversary who detects a cycle in the bit stream is unlikely to succeed in injecting it into the stream without detection.
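To make the double-ANS idea concrete, here is a toy Go round trip. It is a sketch under several assumptions that do not come from the text: a tiny table with $R=3$ (8 states), two hand-picked symbol spreads defining $C_{1}$ and $C_{2}$, and `math/rand` seeded with a stand-in for $K$ as the bit source. The encoder processes symbols backwards (ANS is last-in, first-out) and, for symbol $i$, uses the table selected by PRBG bit $i$; the decoder regenerates the same bits and walks forwards.

```go
package main

import (
	"fmt"
	"math/rand"
)

const R = 3 // toy table: 2^R = 8 states, x in [8, 16)

// table holds one tANS instance built from a symbol spread.
type table struct {
	L   []int   // L[s]: occurrences of symbol s in the spread
	enc [][]int // enc[s][y-L[s]] = C(s, y)
	sym []int   // sym[x-2^R]: symbol decoded from state x
	y   []int   // y[x-2^R]: the y with C(sym, y) = x
}

func build(spread []int, nsym int) table {
	t := table{L: make([]int, nsym), enc: make([][]int, nsym),
		sym: make([]int, len(spread)), y: make([]int, len(spread))}
	for pos, s := range spread {
		t.sym[pos], t.y[pos] = s, t.L[s] // rank of this occurrence of s
		t.enc[s] = append(t.enc[s], 1<<R+pos)
		t.L[s]++
	}
	for pos, s := range spread {
		t.y[pos] += t.L[s] // y = L_s + rank
	}
	return t
}

// encode walks the message backwards; choice[i] selects C1 (0) or C2 (1).
func encode(msg []int, tabs [2]table, choice []int) (int, []uint8) {
	x, stack := 1<<R, []uint8{}
	for i := len(msg) - 1; i >= 0; i-- {
		t, s := tabs[choice[i]], msg[i]
		k := 0
		for x>>k >= 2*t.L[s] { // k bits bring x into [L_s, 2L_s)
			k++
		}
		for j := 0; j < k; j++ { // b_i = x mod 2^k, least significant bit first
			stack = append(stack, uint8(x>>j&1))
		}
		x = t.enc[s][x>>k-t.L[s]]
	}
	return x, stack
}

// decode walks forwards, popping bits until the state is back in [2^R, 2^(R+1)).
func decode(x int, stack []uint8, n int, tabs [2]table, choice []int) []int {
	msg := make([]int, n)
	for i := 0; i < n; i++ {
		t := tabs[choice[i]]
		msg[i], x = t.sym[x-1<<R], t.y[x-1<<R]
		for x < 1<<R {
			x = x<<1 | int(stack[len(stack)-1])
			stack = stack[:len(stack)-1]
		}
	}
	return msg
}

// selection regenerates the per-symbol table choices from a seed standing in for K.
func selection(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	choice := make([]int, n)
	for i := range choice {
		choice[i] = rng.Intn(2)
	}
	return choice
}

func main() {
	tabs := [2]table{
		build([]int{0, 1, 0, 2, 0, 1, 0, 1}, 3), // hypothetical spread for C1
		build([]int{1, 0, 0, 1, 2, 0, 1, 0}, 3), // hypothetical spread for C2
	}
	msg := []int{0, 0, 1, 2, 0, 1, 0, 0, 1, 0}
	choice := selection(42, len(msg))
	x, bits := encode(msg, tabs, choice)
	fmt.Println("final state:", x, "output bits:", len(bits))
	fmt.Println("decoded:", decode(x, bits, len(msg), tabs, choice))
}
```

Both spreads give every symbol the same number of occurrences $L_{s}$, so switching tables leaves the encoding lengths, and hence the compression rate, essentially unchanged; only the state sequence differs.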
C. Compcrypt With Encoding Function Evolution
Compcrypt based on two ANS can be seen as a graph built from two subgraphs. Each subgraph represents a plain ANS. Compression is done by using both subgraphs, where the transition between them is controlled by PRBG. As already noted, this may be perceived as a waste of resources. An option could be to modify the encoding function on the fly, from $C(s,y)$ to $C'(s,y)$, as follows: \begin{align*} \boxed { x_{i-1}\;\substack { s_{i-1}\\ \xrightarrow {\hspace {10mm}}\\ C(s,y) } \;\;x_{i} }\;\;\; \boxed { x'_{i}=x_{i}+{{ PRBG}}(i,K) \substack { s_{i}\\ \xrightarrow {\hspace {10mm}}\\ C'(s,y) } \;\;x_{i+1} }\end{align*}
The symbol \begin{equation*}x'_{i}=\left(x_{i}+{{ PRBG}}(i,K)\right) \bmod 2^{R} +2^{R},\end{equation*}
This variant has interesting properties. Let us discuss some of them.
As the encoding function is constantly updated, it seems difficult to mount attacks whose goal is to recover it. Additionally, insertion/deletion of binary cycles into/from a binary frame is detected with high probability.
The quality of compression could suffer; this aspect needs more investigation.
As we have already noted, the $C(s,y)$ update does not need to be done for every symbol. It looks reasonable to allow longer runs of compression without a $C(s,y)$ update. If the interval between two consecutive updates is too long, then one can expect that short cycles could be detectable. However, we do not know how this can be exploited by an adversary.
Security and Efficiency of Lightweight Encryption With ANS
Our goal is to strengthen a plain ANS using as little cryptography as possible. In our three compcrypt versions, we use a cryptographically strong PRBG (for possible solutions see [4], [15]). This is the only cryptographic tool needed. Note that we assume that the adversary knows all details of our ANS algorithm except the cryptographic key $K$.
A. Security of Compcrypt With State Jumps
We start with a lemma that sets up the background for our security discussion.
Lemma 1:
Consider a plain ANS as described in Section II. Then, for a symbol $s$, the encoding table row contains an equal number of zeros and ones if $L_{s}=2^{R-1}$ (1-bit encodings); it contains an equal number of zeros and ones if $L_{s}>2^{R-1}$ (either empty-bit or 1-bit encodings); and it includes multiples of $2^{k_{s}}$ and $2^{k_{s}+1}$ entries if $L_{s}<2^{R-1}$ (either $k_{s}$-bit or $(k_{s}+1)$-bit encodings), where all $2^{k_{s}}$ and $2^{k_{s}+1}$ entries run through all possible $k_{s}$-bit and $(k_{s}+1)$-bit strings, respectively.
Proof:
According to the frame coding algorithm (see Section II), for a state $x$, the length $k_{s}$ of the encoding of a symbol $s$ satisfies \begin{equation*} \lfloor \log _{2}{\frac {2^{R}}{L_{s}}} \rfloor \leq k_{s} \leq \lfloor \log _{2}{\frac {2^{R+1}-1}{L_{s}}} \rfloor\tag{5}\end{equation*}
Case 1
If $L_{s}=2^{R-1}$, then Equation (5) becomes \begin{equation*}\lfloor \log _{2}{\frac {2^{R}}{2^{R-1}}} \rfloor \leq k_{s} \leq \lfloor \log _{2}{\frac {2^{R+1}-1}{2^{R-1}}} \rfloor.\end{equation*} There is a single value, $k_{s}=1$, for which the above relation holds. As states $x$ are chosen from the range $\{2^{R},\ldots, 2^{R+1}-1\}$, it is easy to see that encodings equal 1 if $x$ is odd and 0 otherwise. The numbers of zeros and ones are the same ($x$ runs through all consecutive integers from the interval).
Case 2
If $L_{s}>2^{R-1}$, then the left side of Equation (5) gives $k_{s}=0$, while the right side equals $k_{s}=1$. We can find the smallest $x$ for which $k_{s}(x)=1$. It is easy to see that $x=2L_{s}$. ANS produces empty encodings for $x\in \{2^{R},\ldots, 2L_{s}-1\}$. The other states output 1-bit encodings. As the number of states in the set $\{2L_{s},\ldots, 2^{R+1}-1\}$ is even, the encodings contain an equal number of zeros and ones.
Case 3
If $L_{s}<2^{R-1}$, then $k_{s}=\lfloor \lg (2^{R}/L_{s}) \rfloor$. The smallest $x$ that yields a $(k_{s}+1)$-bit encoding is $x=2^{k_{s}+1}L_{s}$. All states $x\in \{2^{R},\ldots,2^{k_{s}+1}L_{s}-1\}$ generate $k_{s}$-bit encodings, while $x\in \{ 2^{k_{s}+1}L_{s},\ldots, 2^{R+1}-1\}$ produce $(k_{s}+1)$-bit encodings. The number of states in the set $\{ 2^{k_{s}+1}L_{s},\ldots, 2^{R+1}-1\}$ equals \begin{equation*}2^{R+1} - 2^{k_{s}+1}L_{s}= 2^{k_{s}+1}(2^{R-k_{s}}-L_{s}),\end{equation*} where $(2^{R-k_{s}}-L_{s})\geq 0$ is a multiplier; it is positive unless $L_{s}$ is a power of two, in which case every state yields a $k_{s}$-bit encoding. The encoding table row therefore contains a multiple of $2^{k_{s}+1}$ such entries. Any $2^{k_{s}+1}$ consecutive entries cover all possible $(k_{s}+1)$-bit strings, as they correspond to consecutive states in the interval. Similarly, the number of states in the set $\{2^{R},\ldots,2^{k_{s}+1}L_{s}-1\}$ can be calculated as $2^{k_{s}+1}L_{s} - 2^{R}=2^{k_{s}} (2L_{s}-2^{R-k_{s}})$. Using similar arguments, the entries cover all possible $k_{s}$-bit strings and they are repeated $(2L_{s}-2^{R-k_{s}})$ times.
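Lemma 1 can be checked by brute force for small parameters. The Go sketch below (the function name `row` and the chosen values of $R$ and $L_{s}$ are illustrative) enumerates all states $x\in\{2^{R},\ldots,2^{R+1}-1\}$, finds the smallest $k$ with $\lfloor x/2^{k}\rfloor < 2L_{s}$ (the encoding length, consistent with Equation (5)) and tallies every encoding $b = x \bmod 2^{k}$ by its length and value.

```go
package main

import "fmt"

// row tallies, for a symbol with L occurrences in a table of 2^R states, how often
// each encoding (length k, value b) is produced over all states x in [2^R, 2^(R+1)).
func row(R, L int) map[[2]int]int {
	counts := map[[2]int]int{}
	for x := 1 << R; x < 1<<(R+1); x++ {
		k := 0
		for x>>k >= 2*L { // smallest k with x>>k in [L, 2L)
			k++
		}
		counts[[2]int{k, x & (1<<k - 1)}]++ // b = x mod 2^k, k bits long
	}
	return counts
}

func main() {
	for _, L := range []int{8, 11, 3} { // L_s = 2^(R-1), L_s > 2^(R-1), L_s < 2^(R-1)
		fmt.Printf("R=4, L_s=%d: %v\n", L, row(4, L))
	}
}
```

For $R=4$ and $L_{s}=3$ we have $k_{s}=2$: each of the four 2-bit strings appears $2L_{s}-2^{R-k_{s}}=2$ times and each of the eight 3-bit strings appears $2^{R-k_{s}}-L_{s}=1$ time, as Case 3 predicts. For $L_{s}=11$, six states give empty encodings and the remaining ten give five zeros and five ones (Case 2).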
We are ready to prove security of the compcrypt algorithm. Assume that we deal with a chosen-plaintext adversary
Theorem 1:
Given ANS with state jumps described by Algorithm 1 and a chosen-plaintext adversary ${\cal A}$, the probability that ${\cal A}$ guesses the values of $state\_cor$ for $n$ occurrences of a symbol $s$ is $2^{-n(R-1)}$ if $p_{s}=1/2$; ${\binom{N}{ N-n}}^{-1} L_{s}^{-(N-n)} (2^{R-1}-L_{s}/2)^{-n}$ if $p_{s}>1/2$; and ${\binom{n}{ \alpha }}^{-1} \left ({2^{-(R-1)}L_{s} - 2^{-k_{s}}}\right)^{\alpha } \left ({2^{-k_{s}} - L_{s} 2^{-R} }\right)^{n-\alpha }$ if $p_{s}<1/2$, where $\alpha$ is the number of $k_{s}$-bit encodings.
Proof:
As
If $p_{s}=1/2$, ${\cal A}$ knows that each $b_{i}$ is either 0 or 1 for $i=1,\ldots, n$. Assume that PRBG generates the jumps $state\_cor$ uniformly at random. ${\cal A}$ needs to identify a $state\_cor$ from the values of two consecutive output bits $b_{i},b_{i+1}$. ${\cal A}$ knows the encoding table and, according to Lemma 1, ${\cal A}$ can guess $state\_cor$ with probability $2^{-(R-1)}$, as precisely half of the states produce the correct $b_{i},b_{i+1}$. For $n$ symbols, ${\cal A}$ succeeds with probability $2^{-n(R-1)}$.
If $p_{s}>1/2$, ${\cal A}$ observes an output of $N<n$ bits with $N-n$ empty encodings $\varnothing$. It can compute all ${\binom{N}{ N-n}}$ possible binary frames, where each frame includes $n$ bits and $(N-n)$ empty encodings $\varnothing$. Only one of them is correct. It is easy to verify that we have $L_{s}$ distinct $state\_cor$ values when moving from $b_{i}$ to $b_{i+1}$ in the three cases $0 \rightarrow \varnothing$, $1 \rightarrow \varnothing$ and $\varnothing \rightarrow \varnothing$. Each of the other six options (i.e. $0 \rightarrow 0$, $0 \rightarrow 1$, $1 \rightarrow 0$, $1 \rightarrow 1$, $\varnothing \rightarrow 0$ and $\varnothing \rightarrow 1$) involves $2^{R-1}-L_{s}/2$ possible $state\_cor$ values. The probability of guessing the correct values of $state\_cor$ is therefore equal to \begin{equation*}{\binom{N}{ N-n}}^{-1} L_{s}^{-(N-n)} (2^{R-1}-L_{s}/2)^{-n}.\end{equation*}
If $p_{s}<1/2$, ${\cal A}$ observes an output of $N>n$ bits, where each encoding $b_{i}$ can be either $k_{s}$-bit or $(k_{s}+1)$-bit long. Let $\alpha$ and $\beta$ be the numbers of encodings with the length $k_{s}$ and $(k_{s}+1)$, respectively. ${\cal A}$ can compute $\alpha$ and $\beta$ by solving the following two equations: (1) $\alpha + \beta =n$ and (2) $k_{s}\alpha + (k_{s}+1)\beta =N$. ${\cal A}$ does not know the partition of output bits into encodings or, in other words, it does not know the correct window frame. Clearly, there are ${\binom{n}{ \alpha }}={\binom{n}{ \beta }}$ possibilities and only one is correct. For each guess of a window frame, we analyse the possible $state\_cor$ values that lead to a correct transition of $b_{i}$ to $b_{i+1}$. If $b_{i+1}$ is $k_{s}$-bit long, then there are $(2L_{s}-2^{R-k_{s}})$ possibilities (out of $2^{R}$) that are consistent with the observation (Lemma 1). If $b_{i+1}$ is $(k_{s}+1)$-bit long, then there are $(2^{R-k_{s}}-L_{s})$ possibilities aligned with the observation (Lemma 1). Wrapping up, the probability of a successful guess of ${\cal A}$ is \begin{equation*} {\binom{n}{ \alpha }}^{-1} \left ({2^{-(R-1)}L_{s} - 2^{-k_{s}}}\right)^{\alpha } \left ({2^{-k_{s}} - L_{s} 2^{-R} }\right)^{n-\alpha }.\end{equation*}
A few points are relevant here.
Theorem 1 proves security when the values $state\_cor$ have the same length as the ANS states. This means that each jump is chosen independently at random from the full range of $2^{R}$ states. The probabilities of guessing the pseudorandom bits $state\_cor$ are then the smallest, and they give the upper bound on security. Should $state\_cor$ be shorter, the guessing probabilities grow. In the case when $state\_cor$ is a single bit, the guessing probabilities are equal to 1.
In Lemma 1, ${\cal A}$ is free to choose an arbitrary strategy of symbol selection. However, it is expected that, for a given instance of the algorithm, ${\cal A}$ first evaluates its chances by computing the relevant success probabilities from Lemma 1 and then chooses the one that maximises its success probability.
State jumps have a negative impact on compression. The reason for this is a flat probability distribution of states forced by PRBG. Note that the probability distribution of states of a plain ANS is approximately proportional to $1/x$, where $x$ is a state. So a plain ANS favours states with shorter encodings.
Let us compare the compcrypt with state jumps with a plain ANS whose output is XOR-ed with a PRBG keystream (ANS $\oplus$ PRBG). From the efficiency point of view, both solutions are more or less equivalent. A major difference lies in security. Note that for ANS $\oplus$ PRBG, a chosen-plaintext adversary ${\cal A}$ can extract the whole keystream generated by PRBG. This may have grave implications for integrity, as ${\cal A}$ can create valid but fake binary frames. For the compcrypt in hand, this is still possible but with a probability that quickly becomes negligible (see Lemma 1). Additionally, because ${\cal A}$ is forced to make guesses about the values $state\_cor$ generated by PRBG, it is possible to use a PRBG with a lower security level that is more efficient.
B. Security of Compcrypt With Double ANS
Consider a chosen-plaintext adversary
C. Security of Compcrypt With Encoding Function Evolution
As this compcrypt algorithm needs a recalculation of the encoding table every time the states are swapped, it is reasonable to expect that swapping is not frequent. This assumption allows the adversary to launch the following attack. Our chosen-plaintext adversary
D. Other Cryptographic Attacks
Because of its internal structure, ANS is difficult to analyse using standard tools such as meet-in-the-middle, differential and linear attacks. The main difficulty seems to be the irregular lengths of the binary encodings that are assigned to symbols. The encodings are glued together into a single binary frame. To do any meaningful analysis, an adversary needs to split the frame into separate encodings. This unavoidably leads to guessing. There is, however, an exception: algebraic cryptanalysis. The heart of ANS is its symbol spread function. If the function can be represented by short polynomials or short Boolean expressions, then there is hope that this analysis can work.
E. Efficiency Evaluation
Our implementation of the tabular version of ANS was written in the Go language (version 1.15.2). Throughout our experiments, we used OpenBSD 6.8 installed on a Dell Precision T3610 desktop PC with 32 GB of RAM and an Intel Xeon E5-1650 with 6 physical cores running at 3.5 GHz and hyper-threading enabled, which makes 12 threads available in total. All our compcrypt algorithms invoke PRBG. The impact of the PRBG on the execution time of encoding and decoding heavily depends on its implementation. Our implementation uses the standard Go functions provided by the math/rand package.
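Since `math/rand` generates its sequence forwards only, one simple way to serve both the backward (encoding) and forward (decoding) passes is to regenerate the whole sequence of corrections from the key and index it as needed. The Go sketch below assumes this approach; `corrections` and its parameters are hypothetical, and `math/rand` itself is not a cryptographically secure generator.

```go
package main

import (
	"fmt"
	"math/rand"
)

// corrections regenerates the per-symbol state corrections for a frame of n
// symbols from a seed K, each drawn uniformly from [0, 2^R). The encoder reads
// the slice backwards (i = n-1..0) and the decoder forwards (i = 0..n-1).
func corrections(K int64, n, R int) []int {
	rng := rand.New(rand.NewSource(K))
	cor := make([]int, n)
	for i := range cor {
		cor[i] = rng.Intn(1 << R)
	}
	return cor
}

func main() {
	enc := corrections(2021, 5, 10) // encoder side
	dec := corrections(2021, 5, 10) // decoder side, same key
	for i := len(enc) - 1; i >= 0; i-- {
		fmt.Println("encode step", i, "uses", enc[i], "decoder agrees:", enc[i] == dec[i])
	}
}
```

Buffering costs one integer per symbol of memory; a counter-based PRBG avoids the buffer at the price of a more expensive call per symbol.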
Let us discuss briefly some implementation details of our compcrypt algorithms with:
state jumps – our experiments assume that state jumps are performed for each input symbol. The initial encoding/decoding tables are created precisely as in the plain ANS.
double ANS – there are two plain ANS algorithms. The switch between the two is done by PRBG for each input symbol. The execution time should not differ much from that of the previous algorithm. A significant difference relates to the extra memory needed to store two encoding/decoding tables. Consequently, loading time may impact the overall execution time. This may be noticeable when processing short streams of symbols.
encoding function evolution – the algorithm is initialised to a plain ANS and then its encoding table is modified for each symbol by swapping the current state with a random one (chosen by PRBG). The swap might look like a computationally cheap operation but, in fact, each non-trivial swap involves recomputation of the encoding table. This means that for each symbol, we may expect up to $2^{R}$ table operations.
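To see why each non-trivial swap costs up to $2^{R}$ table operations, consider the Go sketch below. It assumes that a swap exchanges the symbols that the spread assigns to two states (an interpretation; the names `buildEnc` and `swapAndRebuild` and the toy spread are hypothetical) and then rebuilds $C(s,y)$ with a full pass over all $2^{R}$ spread entries.

```go
package main

import "fmt"

const R = 3 // toy table with 2^R = 8 states, x in [8, 16)

// buildEnc rebuilds C(s, y) from a symbol spread: enc[s][r] = C(s, L_s + r) is the
// state of the r-th occurrence of s. Every call makes one pass over all 2^R entries.
func buildEnc(spread []int, nsym int) [][]int {
	enc := make([][]int, nsym)
	for pos, s := range spread {
		enc[s] = append(enc[s], 1<<R+pos)
	}
	return enc
}

// swapAndRebuild is a hypothetical evolution step: exchange the symbols assigned
// to states x and z, then recompute the whole encoding table.
func swapAndRebuild(spread []int, nsym, x, z int) [][]int {
	i, j := x-1<<R, z-1<<R
	spread[i], spread[j] = spread[j], spread[i]
	return buildEnc(spread, nsym)
}

func main() {
	spread := []int{0, 1, 0, 2, 0, 1, 0, 1} // hypothetical spread for states 8..15
	fmt.Println(buildEnc(spread, 3))
	fmt.Println(swapAndRebuild(spread, 3, 8, 11)) // swap the symbols of states 8 and 11
}
```

Running it prints `[[8 10 12 14] [9 13 15] [11]]` and then `[[10 11 12 14] [9 13 15] [8]]`: a single swap changes the rows of two symbols, and the whole table is recomputed. Swapping two states that carry the same symbol leaves the table unchanged, which is why only non-trivial swaps cost a rebuild.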
Comparison of execution times of plain ANS (blue □) and compcrypt algorithms with: double tables (red □), encoding function evolution (green □) and state jumps (brown □).
Let us consider the quality of compression provided by the three compcrypt algorithms. We use a plain ANS as a reference. Figure 2 describes our results. We observe that compcrypt with encoding function evolution lengthens the output stream by less than 10% in comparison to the plain ANS. Compcrypt with double tables increases the number of output bits by less than 1%. The compression quality of compcrypt with state jumps is similar to that of a plain ANS.
Quality of the compression (measured by the number of output bits) for plain ANS (blue □) and compcrypt algorithms with: double tables (red □), encoding function evolution (green □), state jumps (brown □).
F. Comparison of ANS Algorithms
Table II summarizes our security and efficiency discussion. A plain ANS with a secret symbol spread function is a good option when ciphertext-only security is required and integrity is not important. The ANS-AES compcrypt provides strong security and, if implemented as authenticated encryption, can support strong integrity. Its Achilles heel is poor efficiency, which precludes it from IoT applications. ANS
Table III shows results of IoT implementations for compcrypt algorithms. Experiments have been done for frames of 1000 symbols generated by a source of 256 symbols according to the geometric probability distribution with
Conclusion and Future Research
The work investigates joint compression and encryption for lightweight applications, where the natural behaviour of ANS is enhanced using as little cryptography as possible. Consequently, the resulting compcrypt algorithms offer a low security level for both confidentiality and integrity (against ciphertext-only adversaries). The only cryptographic tool used is PRBG, which can be chosen depending on efficiency and security requirements. For applications that require a decent security level, a PRBG based on a good-quality stream cipher (such as Trivium [4]) is recommended. As hinted in the work, PRBG can be removed altogether and replaced by a cryptographic key that makes the encoding table dynamic (using encoding function evolution). This is an attractive direction for future research.
We propose three compcrypt algorithms. The first one applies a single ANS with state jumps controlled by PRBG. The second one uses two copies of ANS, where PRBG manages the transition between copies. The third compcrypt deploys encoding function evolution that modifies encoding tables. Assuming a ciphertext-only adversary, the security level for confidentiality is mainly determined by the probability of guessing input symbols. It is significant for a small number of symbols but diminishes exponentially as the number grows. This is true for all three algorithms. However, when the guess is correct, we deal with a known-plaintext attack. Under this attack, compcrypt with encoding function evolution offers the best security. With the exception of compcrypt with encoding function evolution, the algorithms offer efficiency and compression quality similar to those of the plain ANS.
Note that compcrypt with encoding function evolution can be slightly modified so that it preserves good security features and has “almost” the same efficiency and compression quality as the plain ANS. Instead of swapping states after processing every single symbol, compcrypt starts as the original algorithm (swapping states frequently) and then gradually increases the number of symbols between two consecutive swaps.