
SECTION I

## INTRODUCTION

A LINEAR scrambler is usually used in a communication system to convert a data bit sequence into a pseudorandom sequence that is free from long strings of 1s and 0s. It is easy to implement, a wide variety of scrambler polynomials are available to choose from, and the choice of polynomial has relatively little impact on the performance of the communication system. However, based on the scrambler reconstruction technique detailed in [1], it is found in [2] that not all scrambler polynomials offer equal protection against reconstruction. In this work, we examine further the reconstruction of the feedback polynomial of a linear scrambler, assuming that the source bits are encoded with a forward error correction code before being scrambled. The findings of this work are envisaged to aid the design of secure digital communication systems implemented on a flexible platform such as software defined radio (SDR). Our results point out what can be done to protect a communication system against the various scrambler reconstruction techniques proposed in [1], [2], [3], [4], [5]. The proposed approach also adds to the plethora of techniques for designing an intelligent receiver which can adapt itself to the different building blocks of the transmitter, such as those proposed in [6], [7], [8]. It is also an extension of the results and findings on recovery of error-correcting codes, which include linear block codes [9], [10], [11] and convolutional codes [12], [13], [14], [15], [16].

There are generally two types of linear scrambler, namely the synchronous scrambler and the self-synchronized scrambler. Both types usually consist of a linear feedback shift register (LFSR) whose output sequence ${(s_{t})}_{t\geq 0}$ is combined with the input sequence ${(x_{t})}_{t\geq 0}$ to give the scrambled sequence ${(y_{t})}_{t\geq 0}$, i.e., $$y_{t}=x_{t}\oplus s_{t},\quad t\geq 0\eqno{\hbox{(1)}}$$ where $\oplus$ denotes modulo-2 summation. In this paper, for simplicity, only synchronous scramblers are considered. Reconstruction of a synchronous scrambler consists of reconstructing the feedback polynomial of the LFSR as well as its initial state. When some input and scrambled bits are known, the Berlekamp-Massey algorithm [3] can be used to reconstruct the feedback polynomial of the LFSR. In [4], a method is proposed to estimate the initial state of the LFSR from the scrambled sequence only, assuming that the feedback polynomial of the LFSR is known. Recently, Cluzeau proposed an algorithm [1] for reconstructing the feedback polynomial of the LFSR using only the scrambled sequence. In the following, this algorithm will be referred to as Cluzeau's algorithm.
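To make (1) concrete, a synchronous scrambler can be sketched in a few lines of Python; the feedback taps below encode $P(X)=1+X^{2}+X^{3}+X^{4}+X^{8}$, chosen here purely as an illustrative example:

```python
def lfsr_keystream(taps, init, nbits):
    """Keystream of an LFSR with feedback polynomial P(X) = 1 + sum_i X^i
    (i in taps), i.e. the recurrence s_t = XOR over i in taps of s_{t-i}."""
    s = list(init)                      # init = (s_0, ..., s_{L-1})
    for t in range(len(init), nbits):
        bit = 0
        for i in taps:
            bit ^= s[t - i]
        s.append(bit)
    return s[:nbits]

def scramble(x, keystream):
    """y_t = x_t XOR s_t as in (1); descrambling is the same operation."""
    return [xt ^ st for xt, st in zip(x, keystream)]

x = [1, 0, 1, 1, 0, 0, 1, 0] * 4
s = lfsr_keystream([2, 3, 4, 8], [1, 0, 0, 0, 0, 0, 0, 0], len(x))
y = scramble(x, s)
assert scramble(y, s) == x              # scrambling twice recovers the input
```

Because the scrambler is a pure XOR with the keystream, the receiver descrambles by applying exactly the same operation with a synchronized LFSR.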

Although Cluzeau's algorithm is much more efficient than a brute force search in recovering the feedback polynomial of the LFSR, it is based on the critical assumption that the source bits, which are XORed directly with the outputs of the LFSR, are distributed with a biased probability $\hbox{Pr}(x_{t}=1)=(1/2)-\varepsilon$, where $\varepsilon\neq 0$. Although this assumption usually holds for natural sources, when the source bits pass through a channel encoder before they are scrambled, the bias in the bit sequence might become very small. Consequently, the number of bits required for the reconstruction becomes exorbitantly large. To deal with this problem, a scheme is proposed in this paper that uses the property of “dual words”, which are orthogonal to the codewords generated by the channel encoder, instead of the bias in the encoded bit sequence, to achieve reconstruction of the scrambler. It will be shown that the proposed scheme reduces the number of bits required for reconstruction drastically.

The paper is organized as follows. In Section II, Cluzeau's algorithm is reviewed. In Section III, the bias existing in the encoded bit sequence after a channel encoder is analyzed. In Section IV, the scheme to recover the feedback polynomial as well as the initial state of the LFSR in a linear scrambler placed after a channel encoder is proposed. In Section V, the problem of reconstructing the scrambler in the presence of channel noise is investigated. Some security propositions are given in the concluding Section VI.

SECTION II

## CLUZEAU'S ALGORITHM FOR RECONSTRUCTING A SYNCHRONOUS SCRAMBLER

In a synchronous scrambler, $s_{t}$ is generated independently of $x_{t}$ and $y_{t}$, as shown in Fig. 1.

Fig. 1. Structure of synchronous scrambler.

Instead of brute force searching for the feedback polynomial $P(X)$ directly, Cluzeau's algorithm searches for sparse multiples of $P(X)$ with the degree of the sparse multiples varying from low to high. After two multiples of $P(X)$ are detected, it returns the nontrivial greatest common divisor (gcd) of the two detected multiples as the detected feedback polynomial. The determination of whether a sparse polynomial is a multiple of $P(X)$ or not is based on a statistical test on the absolute value of a variable $Z$, which is given by $$Z=\sum_{t=i_{d-1}}^{N-1}(-1)^{z_{t}},\eqno{\hbox{(2)}}$$ where $z_{t}$ is a modulo-2 summation of $d$ scrambled bits, i.e., $z_{t}=y_{t}\oplus\bigoplus_{j=1}^{d-1}y_{t-i_{j}}$, $(0<i_{1}<i_{2}<\cdots<i_{d-1})$, and $N$ is the number of bits required for the reconstruction. Let $Q(X)=1+\sum_{j=1}^{d-1}X^{i_{j}}$. When $Q(X)$ is a multiple of $P(X)$, we have $$z_{t}=y_{t}\oplus\bigoplus_{j=1}^{d-1}y_{t-i_{j}}=x_{t}\oplus\bigoplus_{j=1}^{d-1}x_{t-i_{j}}\eqno{\hbox{(3)}}$$ since $s_{t}\oplus\bigoplus_{j=1}^{d-1}s_{t-i_{j}}=0$ and $y_{t}=x_{t}\oplus s_{t}$. According to the statistical analysis in [1], $z_{t}$ is distributed with a bias, $\hbox{Pr}(z_{t}=1)=(1/2)[1-(2\varepsilon)^{d}]$, if the input bits are distributed with a bias, $\hbox{Pr}(x_{t}=1)=(1/2)-\varepsilon$, where $\varepsilon\neq 0$. Consequently, the value of $Z$, i.e., $\sum_{t=i_{d-1}}^{N-1}(-1)^{z_{t}}=(N-i_{d-1})-2\sum_{t=i_{d-1}}^{N-1}z_{t}$, is Gaussian distributed with mean $\mu$ given by $$\mu=(N-i_{d-1})(2\varepsilon)^{d}\eqno{\hbox{(4)}}$$ and variance $\sigma^{2}$ [5] bounded by $$\sigma^{2}\leq(N-i_{d-1})\left[1+d\left((2\varepsilon)^{2}-(2\varepsilon)^{2d}\right)\right].\eqno{\hbox{(5)}}$$

It can also be shown that when $Q(X)$ is not a multiple of $P(X)$, $\hbox{Pr}(z_{t}=0)=1/2$, implying that $Z$ has a Gaussian distribution with the mean value 0 and the variance $N-i_{d-1}$. The two distributions are depicted in Fig. 2.

From Fig. 2, it can be observed that when the two distributions of $Z$ have a small enough intersection, a threshold $T$ can be used to determine whether $Q(X)$ is a multiple of $P(X)$, i.e., when $\vert Z\vert<T$, $Q(X)$ is not a multiple of $P(X)$; otherwise, $Q(X)$ is a multiple of $P(X)$. The threshold $T$ and the number of bits required for the reconstruction $N$ depend on two factors, i.e., the false-alarm probability $P_{f}$ and the nondetection probability $P_{n}$.

Fig. 2. Distributions of Z.

Let $$a=\Phi^{-1}\left(1-{P_{f}\over 2}\right)={T\over\sqrt{N-i_{d-1}}}\eqno{\hbox{(6)}}$$ and $$b=-\Phi^{-1}(P_{n})={T-\vert\mu\vert\over\sigma},\eqno{\hbox{(7)}}$$ where $\Phi$ denotes the standard normal cumulative distribution function. From (6) and (7), it can be derived that the threshold $T$ is $$T={a(a+b\bar{\sigma}_{l})\over\left(2\vert\varepsilon\vert\right)^{d}},\eqno{\hbox{(8)}}$$ and the number of bits required for the reconstruction is $$N=i_{d-1}+{(a+b\bar{\sigma}_{l})^{2}\over(2\varepsilon)^{2d}},\eqno{\hbox{(9)}}$$ where $\bar{\sigma}_{l}$ is the normalized upper bound of $\sigma$, which is given by $$\bar{\sigma}_{l}=\sqrt{1+d\left((2\varepsilon)^{2}-(2\varepsilon)^{2d}\right)}.\eqno{\hbox{(10)}}$$ A more detailed description of Cluzeau's algorithm can be found in [1] and [5].
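The quantities in (6)–(10) can be evaluated directly. The sketch below (a minimal Python illustration, using the standard library's `statistics.NormalDist` for $\Phi^{-1}$; the parameter values are the ones used in the simulations later in the paper) reproduces the growth of $N$ as roughly $(2\varepsilon)^{-2d}$:

```python
from math import sqrt
from statistics import NormalDist

def threshold_and_bits(eps, d, i_last, Pf, Pn):
    """Threshold T, eq. (8), and required bits N, eq. (9), for testing a
    weight-d candidate Q(X) whose highest exponent is i_last."""
    a = NormalDist().inv_cdf(1 - Pf / 2)                          # eq. (6)
    b = -NormalDist().inv_cdf(Pn)                                 # eq. (7)
    sigma_l = sqrt(1 + d * ((2 * eps)**2 - (2 * eps)**(2 * d)))   # eq. (10)
    T = a * (a + b * sigma_l) / (2 * abs(eps))**d                 # eq. (8)
    N = i_last + (a + b * sigma_l)**2 / (2 * eps)**(2 * d)        # eq. (9)
    return T, N

# With eps = 0.1, d = 3, Pf = 1e-7, Pn = 1e-5, more than a million bits
# are already needed; shrinking eps inflates N rapidly via (2*eps)**(2*d).
T, N = threshold_and_bits(0.1, 3, 100, 1e-7, 1e-5)
```

This dependence of $N$ on $\varepsilon$ is exactly what makes Cluzeau's algorithm expensive once a channel encoder has reduced the bias.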

SECTION III

## BIAS AFTER CHANNEL ENCODER

In many communication systems, error correcting codes are used to combat errors introduced by the communication channel. In this work, we consider the case where the channel encoder is placed between the source and the scrambler, as shown in Fig. 3.

In the following, the bias existing in the encoded bit sequence after a channel encoder will be analyzed. Two commonly used error correcting codes are considered, i.e., linear block code and convolutional code.

### A. Bias of a Bit Sequence After a Linear Block Encoder

Fig. 3. Chain of scrambler and channel encoder.

Generally, for an $(n,k)$ binary linear block code ${\cal C}$, where $k$ is the number of information bits and $n$ is the number of coded bits, a generator matrix can be defined by the following $k\times n$ array: $${\bf G}_{\bf b}=\left[\matrix{{\bf g}_{\bf 0}\cr{\bf g}_{\bf 1}\cr\vdots\cr{\bf g}_{{\bf k}-{\bf 1}}}\right]=\left[\matrix{g_{0,0}&g_{0,1}&\cdots&g_{0,n-1}\cr g_{1,0}&g_{1,1}&\cdots&g_{1,n-1}\cr\vdots&&&\cr g_{k-1,0}&g_{k-1,1}&\cdots&g_{k-1,n-1}}\right],\eqno{\hbox{(11)}}$$ where $g_{i,j}\in\{0,1\}$ and ${\bf g}_{\bf 0},{\bf g}_{\bf 1},\ldots,{\bf g}_{{\bf k}-{\bf 1}}$ are linearly independent $n$-tuples that form a basis for ${\cal C}$. Considering a $k$-tuple message, i.e., $${\bf x}=(x_{0},x_{1},\ldots,x_{k-1}),$$ the encoder transforms each message ${\bf x}$ independently into an $n$-tuple codeword ${\bf c}=(c_{0},c_{1},\ldots,c_{n-1})$ by $${\bf c}={\bf x}\cdot{\bf G}_{\bf b}=(x_{0},x_{1},\ldots,x_{k-1})\left[\matrix{{\bf g}_{\bf 0}\cr{\bf g}_{\bf 1}\cr\vdots\cr{\bf g}_{{\bf k}-{\bf 1}}}\right].\eqno{\hbox{(12)}}$$ Any encoded bit $c_{i}$ $(i=0,1,\ldots,n-1)$ can be written as a linear binary summation of the message bits, i.e., $$c_{i}=g_{0,i}x_{0}\oplus g_{1,i}x_{1}\oplus\cdots\oplus g_{k-1,i}x_{k-1}.\eqno{\hbox{(13)}}$$

Suppose the source bit sequence is produced by a biased and memoryless source with bias $\varepsilon$, and the number of nonzero terms (the weight) in the $i$th column of ${\bf G}_{\bf b}$ is $L_{i}$ $(i=0,1,\ldots,n-1)$; then the probability that $c_{i}=1$ is given by \eqalignno{{\rm Pr}(c_{i}=1)=&\,\sum_{l=1,3,\ldots}^{L_{i}}{L_{i}\choose l}\left({1\over 2}-\varepsilon\right)^{l}\left({1\over 2}+\varepsilon\right)^{L_{i}-l}\cr=&\,{1\over 2}\left[1-(2\varepsilon)^{L_{i}}\right].&\hbox{(14)}} According to (14), the bias in the $i$th encoded bit $c_{i}$ is $\varepsilon_{c_{i}}={1\over 2}(2\varepsilon)^{L_{i}}$. As $L_{i}\geq 1$ and $\varepsilon\leq 0.5$, we have $\varepsilon_{c_{i}}\leq\varepsilon$. The bias of the whole encoded bit sequence, $\varepsilon_{bc}$, can be expressed as $$\varepsilon_{bc}={1\over n}\sum_{i=0}^{n-1}\varepsilon_{c_{i}}\leq{1\over n}\sum_{i=0}^{n-1}\varepsilon=\varepsilon.\eqno{\hbox{(15)}}$$ From the above equation, it can be observed that the bias in the encoded bit sequence is less than or equal to the bias in the bit sequence before the encoder. Consider a systematic encoder, for which $L_{0}=L_{1}=\cdots=L_{k-1}=1$ and $L_{k},L_{k+1},\ldots,L_{n-1}>1$. The bias in the encoded bit sequence can be roughly estimated by $$\varepsilon_{bc}={1\over n}\sum_{i=0}^{n-1}\varepsilon_{c_{i}}={k\over n}\varepsilon+{1\over 2n}\sum_{i=k}^{n-1}(2\varepsilon)^{L_{i}}\approx{k\over n}\varepsilon.\eqno{\hbox{(16)}}$$
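The closed form in (14) and the estimate (16) can be checked numerically. The sketch below uses a systematic Hamming-style (7,4) code whose parity columns all have weight 3; these column weights are an assumed example for illustration, not one of the codes from the paper's tables:

```python
from math import comb

def pr_coded_one(L, eps):
    """Pr(c_i = 1) when c_i is the XOR of L source bits with Pr(x=1) = 1/2 - eps:
    the binomial sum over odd l in (14)."""
    return sum(comb(L, l) * (0.5 - eps)**l * (0.5 + eps)**(L - l)
               for l in range(1, L + 1, 2))

eps = 0.1
# The binomial sum agrees with the closed form (1/2)[1 - (2 eps)^L] of (14):
for L in range(1, 9):
    assert abs(pr_coded_one(L, eps) - 0.5 * (1 - (2 * eps)**L)) < 1e-12

# Average bias (15) over the column weights, vs. the (k/n)*eps estimate (16):
col_weights = [1, 1, 1, 1, 3, 3, 3]          # systematic part + weight-3 parity
eps_bc = sum(0.5 * (2 * eps)**L for L in col_weights) / len(col_weights)
assert abs(eps_bc - (4 / 7) * eps) < 0.005   # close to (k/n)*eps
```

As (16) predicts, the parity columns contribute almost nothing, so the average bias is essentially the source bias scaled by the code rate.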

To verify (16), the bias in the bit sequences at the output of several BCH encoders is obtained by computer simulations, and the results are shown in Table I. In each simulation, a bit sequence containing $10000\times k$ information bits is input into a (systematic) BCH encoder, and the simulation is repeated 100 times. The bias in the bit sequence before the encoder is set to 0.1. From Table I, it can be observed that the bias after the BCH encoder determined by simulation matches very well with that computed by (16).

TABLE I BIAS AFTER SOME BCH ENCODERS

### B. Bias of a Bit Sequence After a Convolutional Encoder

An $(n,k,m)$ convolutional code, where $k$ is the number of information bits, $n$ is the number of coded bits and $m$ is the constraint length, can be defined by a $k\times n$ generator matrix ${\bf G}_{\bf c}$ which consists of $k\times n$ binary “impulse responses” ${\bf g}_{\bf j}^{({\bf i})}$, where $i$ denotes the $i$th input $(0\leq i<k)$ and $j$ denotes the $j$th output $(0\leq j<n)$, i.e., $${\bf G}_{\bf c}=({\bf G}_{\bf 0},\ldots,{\bf G}_{{\bf n}-{\bf 1}})=\left[\matrix{{\bf g}_{\bf 0}^{({\bf 0})}&{\bf g}_{\bf 1}^{({\bf 0})}&\cdots&{\bf g}_{{\bf n}-{\bf 1}}^{({\bf 0})}\cr{\bf g}_{\bf 0}^{({\bf 1})}&{\bf g}_{\bf 1}^{({\bf 1})}&\cdots&{\bf g}_{{\bf n}-{\bf 1}}^{({\bf 1})}\cr\vdots&\vdots&\ddots&\vdots\cr{\bf g}_{\bf 0}^{({\bf k}-{\bf 1})}&{\bf g}_{\bf 1}^{({\bf k}-{\bf 1})}&\cdots&{\bf g}_{{\bf n}-{\bf 1}}^{({\bf k}-{\bf 1})}}\right],\eqno{\hbox{(17)}}$$ where $${\bf g}_{\bf j}^{({\bf i})}=\left(g_{j}^{(i)}(0),g_{j}^{(i)}(1),\ldots,g_{j}^{(i)}(m-1)\right).\eqno{\hbox{(18)}}$$ Supposing the bit sequence at the $i$th input of the convolutional encoder is ${\bf x}_{\bf i}=(x_{i,0},x_{i,1},\ldots)$, the bit sequence at the $j$th output is given by $${\bf c}_{\bf j}={\bf x}_{\bf 0}\ast{\bf g}_{\bf j}^{({\bf 0})}\oplus{\bf x}_{\bf 1}\ast{\bf g}_{\bf j}^{({\bf 1})}\oplus\cdots\oplus{\bf x}_{{\bf k}-{\bf 1}}\ast{\bf g}_{\bf j}^{({\bf k}-{\bf 1})}=\sum_{i=0}^{k-1}{\bf x}_{\bf i}\ast{\bf g}_{\bf j}^{({\bf i})},\eqno{\hbox{(19)}}$$

Fig. 4. Dot product of a dual word of a linear block code with the received bit sequence.

where $\ast$ is the convolution operation. Suppose the number of nonzero terms in ${\bf g}_{\bf j}^{({\bf i})}$ is $\mathtilde{L}_{i,j}$; then the bias of the whole encoded bit sequence, $\varepsilon_{cc}$, can be expressed as $$\varepsilon_{cc}={1\over kn}\sum_{i=0}^{k-1}\sum_{j=0}^{n-1}{1\over 2}(2\varepsilon)^{\mathtilde{L}_{i,j}}.\eqno{\hbox{(20)}}$$
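Equation (20) is a simple average over the tap weights. As a sketch, the (2,1,5) code that appears later in Fig. 6 has generators 11011 and 11001, i.e., tap weights 4 and 3:

```python
def conv_output_bias(eps, tap_weights):
    """Average output bias of an (n,k,m) convolutional encoder, eq. (20);
    tap_weights[i][j] = number of nonzero taps in the impulse response g_j^(i)."""
    k, n = len(tap_weights), len(tap_weights[0])
    return sum(0.5 * (2 * eps)**L for row in tap_weights for L in row) / (k * n)

# (2,1,5) code with generators 11011 and 11001 (weights 4 and 3):
eps_cc = conv_output_bias(0.1, [[4, 3]])    # -> 0.0024, far below the input 0.1
```

Even for a source bias of 0.1, the encoded stream's bias drops by nearly two orders of magnitude, which is why Cluzeau's algorithm needs so many bits here.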

TABLE II BIAS AFTER SOME RATE 1/2 CONVOLUTIONAL ENCODERS

To verify (20), the bias in the bit sequences after some optimum rate 1/2 convolutional encoders [17] is obtained by computer simulations, and the results are shown in Table II. In each simulation, a bit sequence containing 1,000,000 information bits is input into a convolutional encoder, and the simulation is repeated 1000 times. The bias in the bit sequence before the encoder is assumed to be 0.1.

From Table II, it can again be observed that, in general, the bias in the bit sequence after it has passed through a convolutional encoder is very low, as $\mathtilde{L}_{i,j}$ is normally greater than 2.

SECTION IV

## RECONSTRUCTION OF THE SCRAMBLER AFTER A CHANNEL CODE

In the previous section, our analysis showed that after passing through a channel encoder, the bias in the bit sequence drops, especially when convolutional codes are used. In this section, a novel scheme is proposed for reconstructing the feedback polynomial and initial state of the LFSR in a scrambler placed after a channel encoder. This scheme exploits the property of dual words instead of the bias in the encoded bit sequence. In the following, the reconstruction of a scrambler placed after a linear block code is considered first; after that, the proposed scheme is extended to the case of convolutional codes.

### A. Reconstruction of the Scrambler After Linear Block Code

#### 1) Reconstruction of the Feedback Polynomial of the LFSR

Consider an $(n,k)$ binary linear block code ${\cal C}$ with $k\times n$ generator matrix ${\bf G}_{\bf b}$. The rows of ${\bf G}_{\bf b}$ form a basis for ${\cal C}$. The parity-check matrix for ${\cal C}$ is an $(n-k)\times n$ matrix ${\bf H}_{\bf b}$ whose rows span the dual code ${\cal C}^{\perp}$, i.e., $${\bf H}_{\bf b}=\left[\matrix{h_{0,0}&h_{0,1}&\cdots&h_{0,n-1}\cr h_{1,0}&h_{1,1}&\cdots&h_{1,n-1}\cr\vdots&&&\cr h_{n-k-1,0}&h_{n-k-1,1}&\cdots&h_{n-k-1,n-1}}\right]\eqno{\hbox{(21)}}$$ and ${\bf G}_{\bf b}\cdot{\bf H}_{\bf b}^{T}=0$. ${\bf h}_{\bf 0},{\bf h}_{\bf 1},\ldots,{\bf h}_{{\bf n}-{\bf k}-{\bf 1}}$ denote rows $0,1,\ldots,n-k-1$ of ${\bf H}_{\bf b}$ and are called dual words of ${\cal C}$.

To use the property of dual words to reconstruct the feedback polynomial of the LFSR, first, the received bit sequence ${\bf y}=(y_{0},y_{1},\ldots)$ is divided into blocks ${\bf y}_{\bf 0},{\bf y}_{\bf 1},\ldots$, with each block containing $n$ bits, i.e., ${\bf y}_{\bf t}=(y_{nt},y_{nt+1},\ldots,y_{n(t+1)-1})$. Then, a new sequence ${\bf r}=(r_{0},r_{1},\ldots)$ can be generated, in which each bit $r_{t}$ is the dot product of ${\bf y}_{\bf t}$ with a dual word, say ${\bf h}_{\bf 0}$, as shown in Fig. 4.

From Fig. 4, it can be seen that \eqalignno{r_{0}=&\,{\bf y}_{\bf 0}\cdot{\bf h}_{\bf 0}=\bigoplus_{i=0}^{n-1}y_{i}\cdot h_{0,i}\cr=&\,y_{0}\cdot h_{0,0}\oplus y_{1}\cdot h_{0,1}\oplus\cdots\oplus y_{n-1}\cdot h_{0,n-1}\cr r_{1}=&\,{\bf y}_{\bf 1}\cdot{\bf h}_{\bf 0}=\bigoplus_{i=0}^{n-1}y_{n+i}\cdot h_{0,i}\cr=&\,y_{n}\cdot h_{0,0}\oplus y_{n+1}\cdot h_{0,1}\oplus\cdots\oplus y_{2n-1}\cdot h_{0,n-1}\cr&{\hskip-15pt}\vdots.&\hbox{(22)}}

As ${\bf y}_{\bf t}={\bf c}_{\bf t}\oplus{\bf s}_{\bf t}$, $(t=0,1,2,\ldots)$, where ${\bf c}_{\bf t}$ is the $n$-tuple codeword at time index $t$ and ${\bf s}_{\bf t}=(s_{nt},s_{nt+1},\ldots,s_{n(t+1)-1})$ are the outputs of the scrambler, we have $$r_{t}={\bf y}_{\bf t}\cdot{\bf h}_{\bf 0}={\bf c}_{\bf t}\cdot{\bf h}_{\bf 0}\oplus{\bf s}_{\bf t}\cdot{\bf h}_{\bf 0}.\eqno{\hbox{(23)}}$$ According to the property of dual words, ${\bf c}_{\bf t}\cdot{\bf h}_{\bf 0}=0$; therefore, $r_{t}$ can be written as $$r_{t}={\bf y}_{\bf t}\cdot{\bf h}_{\bf 0}={\bf s}_{\bf t}\cdot{\bf h}_{\bf 0},\eqno{\hbox{(24)}}$$ i.e., \eqalignno{r_{0}=&\,s_{0}\cdot h_{0,0}\oplus s_{1}\cdot h_{0,1}\oplus\cdots\oplus s_{n-1}\cdot h_{0,n-1}\cr r_{1}=&\,s_{n}\cdot h_{0,0}\oplus s_{n+1}\cdot h_{0,1}\oplus\cdots\oplus s_{2n-1}\cdot h_{0,n-1}\cr&{\hskip-15pt}\vdots.&\hbox{(25)}}
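In code, forming ${\bf r}$ from (22)–(25) is a block-wise modulo-2 dot product. A minimal sketch, where the 3-bit blocks and the dual word $(1,1,0)$ of a (3,1) repetition code are illustrative assumptions:

```python
def dual_sequence(y, h0):
    """r_t = y_t . h0 (mod 2), one bit per n-bit block of y, as in (22)."""
    n = len(h0)
    return [sum(y[n * t + i] & h0[i] for i in range(n)) % 2
            for t in range(len(y) // n)]

# Blocks (0,0,0) and (1,1,1) are codewords of the (3,1) repetition code, so
# their dot product with the dual word (1,1,0) is 0; (1,0,1) is not a codeword.
r = dual_sequence([0, 0, 0, 1, 1, 1, 1, 0, 1], [1, 1, 0])   # -> [0, 0, 1]
```

When the blocks are unscrambled codewords, ${\bf r}$ is identically zero; once the stream is scrambled, ${\bf r}$ instead reflects only the LFSR output, which is the key observation behind (24).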

#### Proposition 1

For a set of $d-1$ integers $(0<i_{1}<i_{2}<\cdots<i_{d-1})$, if $r_{t}\oplus r_{t-i_{1}}\oplus r_{t-i_{2}}\oplus\cdots\oplus r_{t-i_{d-1}}\equiv 0$ for any $t\geq i_{d-1}$, then $1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is a multiple of the feedback polynomial $P(X)$.

##### Proof

According to (24), $r_{t}$ can be written as $$r_{t}=s_{nt}\cdot h_{0,0}\oplus s_{nt+1}\cdot h_{0,1}\oplus\cdots\oplus s_{n(t+1)-1}\cdot h_{0,n-1}.\eqno{\hbox{(26)}}$$ Similarly, \eqalignno{r_{t-i_{1}}=&\,s_{n(t-i_{1})}\cdot h_{0,0}\oplus s_{n(t-i_{1})+1}\cdot h_{0,1}\oplus\cr&\cdots\oplus s_{n(t-i_{1}+1)-1}\cdot h_{0,n-1}\cr&{\hskip-15pt}\vdots\cr r_{t-i_{d-1}}=&\,s_{n(t-i_{d-1})}\cdot h_{0,0}\oplus s_{n(t-i_{d-1})+1}\cdot h_{0,1}\oplus\cr&\cdots\oplus s_{n(t-i_{d-1}+1)-1}\cdot h_{0,n-1}.&\hbox{(27)}} Therefore, \eqalignno{&r_{t}\oplus r_{t-i_{1}}\oplus r_{t-i_{2}}\oplus\cdots\oplus r_{t-i_{d-1}}\cr&\quad=\left(s_{nt}\oplus s_{nt-ni_{1}}\oplus\cdots\oplus s_{nt-ni_{d-1}}\right)\cdot h_{0,0}\cr&\qquad\oplus\left(s_{nt+1}\oplus s_{nt+1-ni_{1}}\oplus\cdots\oplus s_{nt+1-ni_{d-1}}\right)\cdot h_{0,1}\oplus\cdots\cr&\qquad\oplus\left(s_{n(t+1)-1}\oplus s_{n(t+1)-1-ni_{1}}\oplus\cdots\oplus s_{n(t+1)-1-ni_{d-1}}\right)\cr&\qquad\cdot h_{0,n-1}.&\hbox{(28)}}

As ${\bf h}_{\bf 0}$ is a dual word, $h_{0,0},h_{0,1},\ldots,h_{0,n-1}$ cannot all be 0. Therefore, $r_{t}\oplus r_{t-i_{1}}\oplus r_{t-i_{2}}\oplus\cdots\oplus r_{t-i_{d-1}}\equiv 0$ only holds when $s_{k}\oplus s_{k-ni_{1}}\oplus\cdots\oplus s_{k-ni_{d-1}}\equiv 0$, i.e., $s_{k}\equiv s_{k-ni_{1}}\oplus\cdots\oplus s_{k-ni_{d-1}}$. This means that $1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is a multiple of the feedback polynomial $P(X)$.$\hfill\square$

It is interesting to note that since the encoded bits are removed according to (24), the sequence ${\bf r}$ can be taken as a combination of some $n$th-decimated sequences of the original sequence produced by the LFSR. Some properties of such decimated sequences are given in [19]. In fact, Proposition 1 can also be proved by using the properties of decimated sequences presented in [19].

From Proposition 1, it can be observed that when the sequence ${\bf r}$ is obtained, Cluzeau's algorithm, with only minor changes, can be applied to ${\bf r}$ to find the feedback polynomial of the LFSR. In the following, the scheme to determine the feedback polynomial of the LFSR in a scrambler placed after a channel encoder is described:

1. Divide the received bit sequence ${\bf y}=(y_{0},y_{1},\ldots)$ into blocks ${\bf y}_{\bf 0},{\bf y}_{\bf 1},\ldots$, with each block containing $n$ bits.
2. Generate a new bit sequence ${\bf r}$, in which each bit $r_{t}$ is the dot product of the received block with a dual word.
3. For $(i_{1},\ldots,i_{d-1})$, $0<i_{1}<\ldots<i_{d-1}\leq D$, compute the number of bits in ${\bf r}$, $N_{r}$, required for the summation of $\mathtilde{Z}$. How to compute $N_{r}$ will be described later. Let $N_{c}=i_{d-1}+N_{r}$.
4. Initialize $\mathtilde{Z}$ with $\mathtilde{Z}=0$.
5. For $t$ varying from $i_{d-1}+1$ to $N_{c}$, compute $$\mathtilde{z}_{t}=r_{t}\oplus\bigoplus_{j=1}^{d-1}r_{t-i_{j}}\eqno{\hbox{(29)}}$$ and $$\mathtilde{Z}=\mathtilde{Z}+(-1)^{\mathtilde{z}_{t}}\eqno{\hbox{(30)}}$$
6. If $\mathtilde{Z}=N_{r}$, store $Q(X)=1+\sum_{j=1}^{d-1}X^{i_{j}\cdot n}$ in a table.
7. For $Q^{\prime}(X)\neq Q(X)$ in the table, compute the nontrivial greatest common divisor (gcd) of $(Q(X),Q^{\prime}(X))$.
Fig. 5. Distributions of $\mathtilde{Z}$.

Steps 3 to 7 are repeated until a ${\rm gcd}(Q(X),Q^{\prime}(X))=P(X)$ $(P(X)\neq 1)$ is found or all combinations of $(i_{1},\ldots,i_{d-1})$ have been tested.
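The scheme above can be exercised end-to-end on a toy system; all parameters below are illustrative assumptions. A (3,1) repetition-coded, biased stream is scrambled by a degree-3 LFSR whose feedback polynomial $P(X)=1+X+X^{3}$ has period 7; since $3\times 7=21$ is a multiple of the period, the weight-2 candidate $Q(X)=1+X^{21}$, i.e., $i_{1}=7$ in the ${\bf r}$-domain, is detected, while $i_{1}=5$ is rejected:

```python
import random

def lfsr_keystream(taps, init, nbits):
    """s_t = XOR of s_{t-i} over i in taps, seeded with init."""
    s = list(init)
    for t in range(len(init), nbits):
        s.append(0)
        for i in taps:
            s[t] ^= s[t - i]
    return s

random.seed(1)
x = [int(random.random() < 0.4) for _ in range(400)]   # biased source, eps = 0.1
c = [b for b in x for _ in range(3)]                   # (3,1) repetition code
s = lfsr_keystream([1, 3], [1, 0, 0], len(c))          # P(X) = 1 + X + X^3
y = [ci ^ si for ci, si in zip(c, s)]                  # scrambled stream

h0 = [1, 1, 0]                                         # a dual word of the code
r = [sum(y[3 * t + i] & h0[i] for i in range(3)) % 2   # step 2: the sequence r
     for t in range(len(y) // 3)]

def is_multiple(r, i1, Nr=50):
    """Steps 3-6 for a weight-2 candidate Q(X) = 1 + X^(n*i1): accept only if
    z~_t = r_t XOR r_{t-i1} is 0 at all Nr tested positions."""
    return all(r[t] == r[t - i1] for t in range(i1, i1 + Nr))

assert is_multiple(r, 7)       # 1 + X^21 is a multiple of P(X) (period 7)
assert not is_multiple(r, 5)   # 1 + X^15 is not
```

Note that the source bias plays no role here: the dual word annihilates the codeword contribution exactly, so the test on $\mathtilde{z}_{t}$ is deterministic rather than statistical.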

The scheme proposed above is based on the fact that if $Q(X)=1+\sum_{j=1}^{d-1}X^{i_{j}\cdot n}$ is a multiple of the feedback polynomial, $\mathtilde{z}_{t}$ will always be 0 for $t$ varying from $i_{d-1}+1$ to $N_{c}$, and therefore the value of $\mathtilde{Z}$ should be $N_{c}-i_{d-1}=N_{r}$. If $Q(X)=1+\sum_{j=1}^{d-1}X^{i_{j}\cdot n}$ is not a multiple of the feedback polynomial, ${\rm Pr}(\mathtilde{z}_{t}=1)=0.5$ and $\mathtilde{Z}$ will be Gaussian distributed with mean 0 and variance $N_{r}$. The distribution of $\mathtilde{Z}$ is shown in Fig. 5.

Similar to Cluzeau's algorithm, the number of bits in ${\bf r}$ used in the summation of $\mathtilde{Z}$, $N_{r}$, affects the false-alarm probability $P_{f}$ and the nondetection probability $P_{n}$. As shown in Fig. 5, the value of $\mathtilde{Z}$ is always equal to $N_{r}$ when $Q(X)$ is a multiple of $P(X)$. That means $P_{n}=0$ when the proposed scheme is used. A false alarm can happen only when $\mathtilde{Z}=N_{r}$ but $Q(X)$ is not a multiple of $P(X)$, and its probability is given by \eqalignno{P_{f}=&\,{\rm Pr}\left(\mathtilde{Z}=N_{r}\mid Q(X)\ {\rm is\ not\ a\ multiple\ of}\ P(X)\right)\cr=&\,\left.{1\over\sqrt{2\pi\sigma^{2}}}e^{-{(N_{r}-\mu)^{2}\over 2\sigma^{2}}}\right\vert_{\mu=0,\sigma^{2}=N_{r}}\cr=&\,{1\over\sqrt{2\pi N_{r}}}e^{-{N_{r}\over 2}}.&\hbox{(31)}}
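Equation (31) explains why a very small $N_{r}$ already suffices; as a quick sketch, evaluating (31) directly:

```python
from math import exp, pi, sqrt

def false_alarm(Nr):
    """Eq. (31): Gaussian density at Z~ = N_r when Q(X) is not a multiple."""
    return exp(-Nr / 2) / sqrt(2 * pi * Nr)

# N_r = 50 already gives a false-alarm probability far below 1e-10,
# and P_f shrinks rapidly as N_r grows.
pf = false_alarm(50)
```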

TABLE III SIMULATION RESULTS FOR RECONSTRUCTION OF SCRAMBLERS PLACED AFTER LINEAR BLOCK CODES

It can be observed that a small value of $N_{r}$, say 50, can already make $P_{f}<10^{-10}$. The total number of bits in ${\bf r}$ used in the reconstruction is $i_{d-1}+N_{r}$. According to (22) and Fig. 4, each bit in ${\bf r}$ is a dot product of a dual word with a received block consisting of $n$ bits. Therefore, the total number of bits required by the proposed scheme is $$N_{c}=(i_{d-1}+N_{r})n\approx(i_{d-1}+50)n.\eqno{\hbox{(32)}}$$

Comparing (32) with (9), it can be observed that the number of bits required for the reconstruction by the proposed algorithm no longer depends on the bias $\varepsilon$. Obviously, when $\varepsilon$ is small, it is most probable that $N_{c}<N$. To show this fact more clearly, the proposed algorithm is applied to reconstruct the feedback polynomials of the LFSRs in synchronous scramblers placed after different linear block codes. The numbers of bits required by the proposed algorithm $(N_{c})$ are shown in Table III. The numbers of bits required by Cluzeau's algorithm $(N)$ are also shown in Table III for comparison. In the simulations, it is assumed that the bias in the bit sequence before the block encoder is 0.1 and $d=3$. For Cluzeau's algorithm, it is assumed that $P_{f}=10^{-7}$ and $P_{n}=10^{-5}$. For the proposed algorithm, it is assumed that $N_{r}=50$, which leads to $P_{n}=0$ and $P_{f}<10^{-10}$.

From Table III, it can be observed that the number of bits required by the proposed algorithm for the reconstruction is much lower than that required by Cluzeau's algorithm, especially when the Hamming (7,4) code is used. This is because the proposed algorithm exploits the property of the dual word instead of the bias in the encoded bit sequence. Since the code rate of the Hamming (7,4) code is the lowest among the three codes shown in Table III, the bias in the encoded bit sequence is also the lowest, and the number of bits required for the reconstruction is the largest when Cluzeau's algorithm is used.

It should be noted that in Table III, the gcd of the two detected multiples is normally not the feedback polynomial itself but a multiple of the feedback polynomial. Suppose the gcd of the two detected multiples is $F(X)$. To find the correct feedback polynomial, $F(X)$ is first factorized. The correct feedback polynomial can then be found by descrambling the bit sequence with each polynomial factor of $F(X)$ in turn, and checking which one leads to a descrambled bit sequence satisfying the condition that the dot product of each codeword in the sequence with the dual words ${\bf h}_{\bf i}$, $i=0,1,\ldots,n-k-1$, equals 0. For example, the first two detected multiples in Table III are $x^{112}+x^{7}+1$ and $x^{266}+x^{245}+1$. Their gcd is $x^{56}+x^{42}+x^{35}+x^{21}+1$, which is the product of the 3 polynomial factors $x^{24}+x^{20}+\cdots+1$, $x^{24}+x^{19}+\cdots+1$ and $x^{8}+x^{4}+x^{3}+x^{2}+1$. After descrambling the bit sequence with each polynomial factor, it is found that only $x^{8}+x^{4}+x^{3}+x^{2}+1$ leads to a sensible descrambled sequence. Hence, it is the correct feedback polynomial.
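The gcd step works on polynomials over GF(2), which are conveniently represented as integer bit masks in Python. The sketch below uses the three polynomials quoted in the example above and checks the stated divisibility relations:

```python
def gf2_mod(a, b):
    """Remainder of a(X) divided by b(X) over GF(2); ints as bit masks."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def gf2_gcd(a, b):
    """Euclidean gcd of two GF(2) polynomials."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

m1 = (1 << 112) | (1 << 7) | 1                       # x^112 + x^7 + 1
m2 = (1 << 266) | (1 << 245) | 1                     # x^266 + x^245 + 1
g56 = (1 << 56) | (1 << 42) | (1 << 35) | (1 << 21) | 1
# x^56 + x^42 + x^35 + x^21 + 1 divides both detected multiples, and hence
# also divides their gcd:
assert gf2_mod(m1, g56) == 0 and gf2_mod(m2, g56) == 0
assert gf2_mod(gf2_gcd(m1, m2), g56) == 0
```

The bit-mask representation keeps the arithmetic exact for arbitrary degrees, which matters here since the detected multiples can have degrees in the hundreds.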

#### 2) Reconstruction of the Initial State of the LFSR

After the feedback polynomial of the LFSR is determined, to descramble the received bit sequence, the initial state of the LFSR also needs to be recovered. In the following, a scheme to determine the initial state of the LFSR is described. It is similar to the scheme proposed in [4], which also uses the encoder redundancy to determine the initial state of the LFSR.

Suppose the feedback polynomial of the LFSR is denoted by $P(X)=1+a_{1}X+a_{2}X^{2}+\cdots+a_{L}X^{L}$, where $L$ is the degree of the feedback polynomial and $a_{i}\in\{0,1\}$; then the output of the LFSR at time index $t$ is $$s_{t}=\sum_{i=1}^{L}a_{i}s_{t-i}.\eqno{\hbox{(33)}}$$ Suppose the state of the LFSR at time index $t$ is $${\bf S}_{t}=(s_{t}\quad s_{t+1}\quad s_{t+2}\quad\ldots\quad s_{t+L-1})^{T}\eqno{\hbox{(34)}}$$ and a transition matrix $F$ is defined as $$F=\left(\matrix{0&1&0&\ldots&0&0\cr 0&0&1&\ldots&0&0\cr\vdots&\vdots&\vdots&\ddots&\vdots&\vdots\cr 0&0&0&\ldots&0&1\cr 1&a_{L-1}&a_{L-2}&\ldots&a_{2}&a_{1}}\right).\eqno{\hbox{(35)}}$$ According to (33) and the shift property of the LFSR, the LFSR state at time index $t+i$, $(i=0,1,2,\ldots)$, can be written as $${\bf S}_{t+i}=F^{i}\cdot{\bf S}_{t}.\eqno{\hbox{(36)}}$$ Let the $1\times L$ array $U$ be defined as $$U=(1\quad 0\quad 0\quad\cdots\quad 0);\eqno{\hbox{(37)}}$$ $s_{t}$ can then be calculated by $$s_{t}=U\cdot{\bf S}_{t}=U\cdot F^{t}\cdot{\bf S}_{0}.\eqno{\hbox{(38)}}$$ According to (26) and (38), $r_{0}$ can be rewritten as \eqalignno{r_{0}=&\,U\cdot{\bf S}_{0}\cdot h_{0,0}\oplus U\cdot{\bf S}_{1}\cdot h_{0,1}\oplus\cdots\oplus U\cdot{\bf S}_{n-1}\cdot h_{0,n-1}\cr=&\,U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cdot{\bf S}_{0}&\hbox{(39)}} where $I_{L}$ is an $L\times L$ identity matrix.
Similarly, $r_{1},r_{2},\ldots,r_{t}$ can be rewritten as \eqalignno{r_{1}=&\,U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cdot F^{n}\cdot{\bf S}_{0}\cr r_{2}=&\,U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cdot F^{2n}\cdot{\bf S}_{0}\cr&{\hskip-15pt}\vdots\cr r_{t}=&\,U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cr&\cdot F^{tn}\cdot{\bf S}_{0}.&\hbox{(40)}} Suppose $G$ is the $L\times L$ matrix whose rows correspond to $r_{0},r_{1},\ldots,r_{L-1}$ according to (39) and (40), i.e., $$G=\left(\matrix{U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cr U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cdot F^{n}\cr\vdots\cr U\cdot(I_{L}\cdot h_{0,0}\oplus F\cdot h_{0,1}\oplus\cdots\oplus F^{n-1}\cdot h_{0,n-1})\cdot F^{(L-1)n}}\right).\eqno{\hbox{(41)}}$$ Then the initial state ${\bf S}_{0}$ can be calculated by $${\bf S}_{0}=G^{-1}\cdot(r_{0}\quad r_{1}\quad\cdots\quad r_{L-1})^{T}.\eqno{\hbox{(42)}}$$
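The state-transition machinery (33)–(38) is easy to verify numerically. The sketch below uses the degree-3 polynomial $P(X)=1+X+X^{3}$ as an illustrative choice ($a_{1}=a_{3}=1$, $a_{2}=0$), so that the last row of $F$ in (35) is $(1\ 0\ 1)$:

```python
def step(F, S):
    """S_{t+1} = F . S_t over GF(2), i.e. (36) with i = 1."""
    return [sum(f & s for f, s in zip(row, S)) % 2 for row in F]

F = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 1]]                # last row (1, a_2, a_1) for P(X) = 1 + X + X^3
S = [1, 0, 0]                  # S_0 = (s_0, s_1, s_2)^T
stream = []
for _ in range(10):
    stream.append(S[0])        # s_t = U . S_t with U = (1 0 0), eq. (38)
    S = step(F, S)

# P(X) = 1 + X + X^3 is primitive of degree 3, so the keystream has period 7:
assert stream == [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
```

Iterating `step` is equivalent to multiplying by powers of $F$, which is exactly what the rows of $G$ in (41) encode for the decimated observation times $0, n, 2n, \ldots$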

In many cases, there is more than one dual word for an error correcting code. According to (41), for the same feedback polynomial and different dual words, the matrices $G$ are different. For each $G$ and vector $(r_{0}\ r_{1}\ \cdots\ r_{L-1})$, an initial state ${\bf S}_{0}$ can be obtained by using (42). Obviously, if the feedback polynomial is the true feedback polynomial of the LFSR, the ${\bf S}_{0}$ obtained from (42) is the same no matter which dual word is used. Otherwise, the values of ${\bf S}_{0}$ obtained from different dual words are most likely to be different. This property can be used to determine the correct feedback polynomial of the LFSR without descrambling the bit sequence.

### B. Reconstruction of the Scrambler After a Convolutional Code

Similar to the linear block code case, the generator matrix ${\bf G}_{\bf c}$ of an $(n,k,m)$ convolutional code generates a vector space of dimension $k$ over the finite field $GF(2)$. This vector space has an orthogonal space of dimension $n-k$, and any element $({\bf h}_{{\bf c},{\bf 0}},{\bf h}_{{\bf c},{\bf 1}},\ldots,{\bf h}_{{\bf c},{\bf n}-{\bf 1}})$ in this space satisfies the property $\sum_{j=0}^{n-1}{\bf g}_{j}^{(i)}\ast{\bf h}_{{\bf c},{\bf j}}=0$ $\forall i\in[0,k-1]$. $({\bf h}_{{\bf c},{\bf 0}},{\bf h}_{{\bf c},{\bf 1}},\ldots,{\bf h}_{{\bf c},{\bf n}-{\bf 1}})$ can therefore be “translated” into a “dual word”. Suppose ${\bf h}_{{\bf c},{\bf j}}=(h_{c,j}^{0},h_{c,j}^{1},\ldots,h_{c,j}^{\mathtilde{m}-1})$, where $h_{c,j}^{i}\in\{0,1\}$ and $\mathtilde{m}$ is the length of each ${\bf h}_{{\bf c},{\bf j}}$. The binary vector $${\bf h}_{\bf c}=\left(h_{c,0}^{\mathtilde{m}-1},\ldots,h_{c,n-1}^{\mathtilde{m}-1},\ldots,h_{c,0}^{0},\ldots,h_{c,n-1}^{0}\right)$$ of length $n\times\mathtilde{m}$ will then be the corresponding dual word.

After the dual word is obtained, the remaining steps for reconstructing the feedback polynomial and initial state of the LFSR are the same as those used for the linear block code. The only difference is that the received bit sequence is not divided into blocks. In fact, the dual word is orthogonal to any segment of $n\times\mathtilde{m}$ bits in the coded sequence whose starting offset is a multiple of $n$. An example of the dot product of the dual word of a convolutional code with the received bit sequence is shown in Fig. 6.

Fig. 6. Dot product of a dual word of a convolutional code with the received bit sequence.

In Fig. 6, the convolutional code is a (2,1,5) convolutional code with generator matrix [11011 11001]. It is found that a dual word of this convolutional code is 1101001111. As shown in Fig. 6, $r_{t}$ is generated by taking the dot product of the dual word with 10 bits of the coded sequence at time index $t$. For every increase of the time index $t$, the starting offset of the 10 bits is increased by $n=2$ bits. To see the effect of the proposed algorithm more clearly, it is used to reconstruct the feedback polynomials of the LFSRs in synchronous scramblers placed after different convolutional codes with optimum distance spectrum [18]. The multiples detected and the number of bits required by the proposed algorithm are shown in Table IV. The number of bits required by Cluzeau's algorithm is also shown in Table IV for comparison. The parameter settings for the simulation are the same as before.
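The construction of the dual word and the sliding dot product of Fig. 6 can be sketched in code. The following Python snippet is a minimal illustration (not the authors' implementation; the helper `conv_encode` and all variable names are ours). For a rate-1/2 code with generators $g_0$ and $g_1$, the pair $({\bf h}_{c,0},{\bf h}_{c,1})=(g_1,g_0)$ satisfies the orthogonality condition, and interleaving these highest-degree-first reproduces the dual word 1101001111 quoted above; every $r_t$ then evaluates to zero on the noiseless coded sequence:

```python
import random

def conv_encode(bits, gens):
    """Encode with a rate-1/n convolutional code, zero initial state.
    gens: generator coefficient lists; g[i] multiplies x[t-i]."""
    m = len(gens[0])
    out = []
    for t in range(len(bits)):
        for g in gens:
            out.append(sum(g[i] * (bits[t - i] if t - i >= 0 else 0)
                           for i in range(m)) % 2)
    return out

# The (2,1,5) code from the example: generators 11011 and 11001.
g0 = [1, 1, 0, 1, 1]
g1 = [1, 1, 0, 0, 1]

# Dual word: (h_{c,0}, h_{c,1}) = (g1, g0) gives g0*g1 + g1*g0 = 0 over
# GF(2); interleave the coefficients highest degree first.
mt = len(g0)    # m~ = 5
n = 2
dual = []
for j in reversed(range(mt)):
    dual += [g1[j], g0[j]]
assert dual == [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]   # 1101001111

random.seed(0)
x = [random.randint(0, 1) for _ in range(200)]
c = conv_encode(x, [g0, g1])

# r_t: dot product of the dual word with the n*m~ = 10 coded bits whose
# starting offset advances by n = 2 bits for every increase of t.
for t in range(len(x) - mt + 1):
    seg = c[n * t : n * t + n * mt]
    r_t = sum(a * b for a, b in zip(dual, seg)) % 2
    assert r_t == 0
print("all r_t = 0: the dual word is orthogonal to the coded sequence")
```

The key property exploited later is visible here: the same 10-bit dual word is reused at every shift by a multiple of $n$, so no block segmentation of the received sequence is needed.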

From Table IV, it can be observed that the reduction in the number of bits required for the reconstruction is very significant. This is because, firstly, as described previously, the bias remaining in the bit sequence after it has passed through a convolutional encoder is very low, and consequently $N$ is very large according to (9). Secondly, for convolutional codes, the value of $n$ is usually very small (<10), and consequently $N_{c}$ is small according to (32). Therefore, the proposed scheme is most suitable for convolutional codes, as the number of bits it requires for the reconstruction is very small.

SECTION V

## RECONSTRUCTION OF SCRAMBLER WHEN CHANNEL NOISE IS PRESENT

In the previous sections, it is assumed that the channel is noiseless, i.e., there is no error in the received bit sequence. In practical situations, there is usually noise in the channel and some of the received bits will be wrong, as shown in Fig. 7. When channel errors are present, the dual words are no longer completely orthogonal to the received encoded bit sequence and the scheme proposed in Section IV cannot be applied directly.

TABLE IV SIMULATION RESULTS FOR RECONSTRUCTION OF SCRAMBLER PLACED AFTER CONVOLUTIONAL CODES
Fig. 7. Chain of scrambler, channel encoder, and channel.

Suppose the channel is modelled as a binary symmetric channel (BSC). The probabilities that the channel error $e$ is equal to 1 and 0 are $\hbox{Pr}(e=1)=p=0.5-\delta$ and $\hbox{Pr}(e=0)=1-p=0.5+\delta$, respectively. Let the $n$-tuple of channel errors at time index $t$ be denoted by ${\bf e}_{\bf t}=(e_{nt},e_{nt+1},\ldots,e_{n(t+1)-1})$; the $n$-tuple received codeword with errors, ${\bf y}_{\bf t}^{\bf e}$, is given by $${\bf y}_{\bf t}^{\bf e}={\bf y}_{\bf t}\oplus{\bf e}_{\bf t}.\eqno{\hbox{(43)}}$$ Since ${\bf y}_{\bf t}={\bf c}_{\bf t}\oplus{\bf s}_{\bf t}$, the dot product of the dual word ${\bf h}_{\bf 0}$ with the received bit sequence is given by $$r_{t}^{e}={\bf y}_{\bf t}^{\bf e}\cdot{\bf h}_{\bf 0}^{T}={\bf c}_{\bf t}\cdot{\bf h}_{\bf 0}^{T}\oplus{\bf s}_{\bf t}\cdot{\bf h}_{\bf 0}^{T}\oplus{\bf e}_{\bf t}\cdot{\bf h}_{\bf 0}^{T}.\eqno{\hbox{(44)}}$$ According to the property of the dual word, we have ${\bf c}_{\bf t}\cdot{\bf h}_{\bf 0}^{T}=0$; therefore, $$r_{t}^{e}=({\bf s}_{\bf t}\oplus{\bf e}_{\bf t})\cdot{\bf h}_{\bf 0}^{T},\eqno{\hbox{(45)}}$$ i.e., \eqalignno{r_{0}^{e}=&\,(s_{0}\oplus e_{0})\cdot h_{0,0}\oplus(s_{1}\oplus e_{1})\cdot h_{0,1}\oplus\cr&\cdots\oplus(s_{n-1}\oplus e_{n-1})\cdot h_{0,n-1}\cr r_{1}^{e}=&\,(s_{n}\oplus e_{n})\cdot h_{0,0}\oplus(s_{n+1}\oplus e_{n+1})\cdot h_{0,1}\oplus\cr&\cdots\oplus(s_{2n-1}\oplus e_{2n-1})\cdot h_{0,n-1}\cr&{\hskip-15pt}\vdots&\hbox{(46)}}
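The reduction from (44) to (45) is easy to check numerically: the codeword contribution vanishes in the dot product, leaving only the scrambler and error bits. The sketch below uses a (7,4) Hamming code and one of its dual words, both of which are our illustrative choices, not taken from the paper:

```python
import random

# Numerical check of (43)-(45): with channel errors, the dot product of a
# dual word with the received block depends only on the scrambler and
# error bits, because c_t . h^T = 0 for every codeword.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 0, 1]]
h = [1, 0, 1, 1, 1, 0, 0]   # a parity-check row, orthogonal to all rows of G
assert all(sum(g * b for g, b in zip(row, h)) % 2 == 0 for row in G)

random.seed(2)
p = 0.05                    # BSC crossover probability
for _ in range(1000):
    msg = [random.randint(0, 1) for _ in range(4)]
    c = [sum(msg[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]
    s = [random.randint(0, 1) for _ in range(7)]       # scrambler bits
    e = [int(random.random() < p) for _ in range(7)]   # channel errors
    y_e = [ci ^ si ^ ei for ci, si, ei in zip(c, s, e)]
    r_e = sum(yi * hi for yi, hi in zip(y_e, h)) % 2
    # (45): r_t^e = (s_t XOR e_t) . h^T -- the codeword has dropped out
    assert r_e == sum((si ^ ei) * hi for si, ei, hi in zip(s, e, h)) % 2
print("r_t^e matches (s_t XOR e_t) . h^T in all trials")
```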

#### Proposition 2

Suppose $\mathtilde{z}_{t}^{e}=r_{t}^{e}\oplus r_{t-i_{1}}^{e}\oplus r_{t-i_{2}}^{e}\oplus\cdots\oplus r_{t-i_{d-1}}^{e}$ $(t\geq i_{d-1})$. When $1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is not a multiple of the feedback polynomial $P(X)$, $\hbox{Pr}(\mathtilde{z}_{t}^{e}=1)=1/2$. When $1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is a multiple of $P(X)$, $\hbox{Pr}(\mathtilde{z}_{t}^{e}=1)\leq 1/2[1-(2\delta)^{wd}]$, where $w$ is the weight of the dual word and $\delta=0.5-p$ ($p$ is the channel crossover probability).

##### Proof

For linear block codes, $r_{t}^{e}$ can be written as $$\displaylines{r_{t}^{e}=(s_{nt}\oplus e_{nt})\cdot h_{0,0}\oplus(s_{nt+1}\oplus e_{nt+1})\cdot h_{0,1}\oplus\hfill\cr\hfill\cdots\oplus\left(s_{n(t+1)-1}\oplus e_{n(t+1)-1}\right)\cdot h_{0,n-1}.\quad\hbox{(47)}}$$ Similarly, \eqalignno{r_{t-i_{1}}^{e}=&\,\left(s_{n(t-i_{1})}\oplus e_{n(t-i_{1})}\right)\cr&\cdot h_{0,0}\oplus\left(s_{n(t-i_{1})+1}\oplus e_{n(t-i_{1})+1}\right)\cr&\cdot h_{0,1}\oplus\cdots\oplus\left(s_{n(t-i_{1}+1)-1}\oplus e_{n(t-i_{1}+1)-1}\right)\cr&\cdot h_{0,n-1},\cr&{\hskip-15pt}\vdots\cr r_{t-i_{d-1}}^{e}=&\,\left(s_{n(t-i_{d-1})}\oplus e_{n(t-i_{d-1})}\right)\cr&\cdot h_{0,0}\oplus\left(s_{n(t-i_{d-1})+1}\oplus e_{n(t-i_{d-1})+1}\right)\cr&\cdot h_{0,1}\oplus\cdots\oplus\left(s_{n(t-i_{d-1}+1)-1}\oplus e_{n(t-i_{d-1}+1)-1}\right)\cr&\cdot h_{0,n-1}.&\hbox{(48)}} Therefore, \eqalignno{\mathtilde{z}_{t}^{e}=&\,r_{t}^{e}\oplus r_{t-i_{1}}^{e}\oplus r_{t-i_{2}}^{e}\oplus\cdots\oplus r_{t-i_{d-1}}^{e}\cr=&\,\left(s_{nt}\oplus s_{nt-ni_{1}}\oplus\cdots\oplus s_{nt-ni_{d-1}}\right)\cdot h_{0,0}\cr&\oplus\left(e_{nt}\oplus e_{nt-ni_{1}}\oplus\cdots\oplus e_{nt-ni_{d-1}}\right)\cdot h_{0,0}\cr&\oplus\left(s_{nt+1}\oplus s_{nt+1-ni_{1}}\oplus\cdots\oplus s_{nt+1-ni_{d-1}}\right)\cdot h_{0,1}\cr&\oplus\left(e_{nt+1}\oplus e_{nt+1-ni_{1}}\oplus\cdots\oplus e_{nt+1-ni_{d-1}}\right)\cdot h_{0,1}\cr&{\hskip-10pt}\vdots\cr&\oplus\left(s_{n(t+1)-1}\oplus\cdots\oplus s_{n(t+1)-1-ni_{d-1}}\right)\cdot h_{0,n-1}\cr&\oplus\left(e_{n(t+1)-1}\oplus\cdots\oplus e_{n(t+1)-1-ni_{d-1}}\right)\cdot h_{0,n-1}.&\hbox{(49)}}

According to the property of the LFSR, when $1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is not a multiple of $P(X)$, and as $\hbox{Pr}(s_{t}=1)=1/2$, it is apparent that $\hbox{Pr}(\mathtilde{z}_{t}^{e}=1)=1/2$. When $1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is a multiple of $P(X)$, $s_{k}\oplus s_{k-n{i_{1}}}\oplus\cdots\oplus s_{k-ni_{d-1}}=0$ for any $k\geq ni_{d-1}$ and we have \eqalignno{{\hskip-20pt}\mathtilde{z}_{t}^{e}=&\,\left(e_{nt}\oplus e_{nt-ni_{1}}\oplus\cdots\oplus e_{nt-ni_{d-1}}\right)\cdot h_{0,0}\cr{\hskip-20pt}&\oplus\left(e_{nt+1}\oplus\cdots\oplus e_{nt+1-ni_{d-1}}\right)\cdot h_{0,1}\oplus\cdots\cr{\hskip-20pt}&\oplus\left(e_{n(t+1)-1}\oplus\cdots\oplus e_{n(t+1)-1-ni_{d-1}}\right)\cdot h_{0,n-1}.&\hbox{(50)}}

In (50), $\mathtilde{z}_{t}^{e}$ is a modulo 2 summation of $wd$ channel errors $e$, where $w$ is the weight of the dual word. Similar to (14), it can be derived that \eqalignno{{\rm Pr}\left(\mathtilde{z}_{t}^{e}=1\right)=&\,\sum_{l=1,3,\ldots}^{wd}{wd\choose l}\left({1\over 2}-\delta\right)^{l}\left({1\over 2}+\delta\right)^{wd-l}\cr=&\,{1\over 2}\left[1-(2\delta)^{wd}\right].&\hbox{(51)}}
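The closed form (51) is the familiar piling-up behaviour of an XOR of independent biased bits, and it is easy to verify by simulation. The Monte Carlo sketch below is our own, with $w$, $d$, and $p$ chosen arbitrarily for illustration:

```python
import random

# Monte Carlo check of (51): the XOR of w*d independent BSC errors with
# Pr(e=1) = p = 0.5 - delta equals 1 with probability (1-(2*delta)^(w*d))/2.
random.seed(1)
p, w, d = 0.05, 4, 3
delta = 0.5 - p
trials = 200_000
ones = 0
for _ in range(trials):
    z = 0
    for _ in range(w * d):
        z ^= (random.random() < p)   # XOR in one BSC error bit
    ones += z
empirical = ones / trials
predicted = 0.5 * (1 - (2 * delta) ** (w * d))
print(empirical, predicted)
assert abs(empirical - predicted) < 0.01
```

For convolutional codes, overlapping error positions only push $\hbox{Pr}(\mathtilde{z}_{t}^{e}=1)$ further below this value, which is why (52) is stated as an inequality.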

For convolutional codes, similarly, $\mathtilde{z}_{t}^{e}$ is a modulo 2 summation of $wd$ channel errors $e$. However, according to Fig. 6, some of the channel errors might overlap; therefore, we have $${\rm Pr}\left(\mathtilde{z}_{t}^{e}=1\right)\leq{1\over 2}\left[1-(2\delta)^{wd}\right].\eqno{\hbox{(52)}}$$$\hfill\square$

Suppose $\mathtilde{Z}_{e}=\sum_{t=i_{d-1}+1}^{i_{d-1}+N_{r}^{e}}\mathtilde{z}^{e}_{t}$, where $N_{r}^{e}$ is the number of bits in ${\bf r}$ required for the reconstruction when noise is present. According to Proposition 2 and the scheme described in Section IV, when $Q(X)=1+X^{ni_{1}}+X^{ni_{2}}+\cdots+X^{ni_{d-1}}$ is not a multiple of $P(X)$, $\mathtilde{Z}_{e}$ is Gaussian distributed with mean 0 and variance $N_{r}^{e}$. Similar to the derivation of the distribution of $Z$ [5], when $Q(X)$ is a multiple of $P(X)$, it can be derived that $\mathtilde{Z}_{e}$ is Gaussian distributed with mean $\mu_{e}=N_{r}^{e}(2\delta)^{wd}$ and variance $\sigma_{e}^{2}\leq N_{r}^{e}[1+d((2\delta)^{2w}-(2\delta)^{2wd})]$. Therefore, the algorithm proposed in Section IV can still be used with a minor change in Step 4, i.e., a threshold $T_{e}$ can be used to determine whether $Q(X)$ is a multiple of the feedback polynomial. Similar to Cluzeau's algorithm described in Section II, when the false-alarm probability $P_{f}$ and the nondetection probability $P_{n}$ are given, the threshold $T_{e}$ can be determined by $$T_{e}={a_{e}^{2}+a_{e}b_{e}\sqrt{1+d\left((2\delta)^{2w}-(2\delta)^{2wd}\right)}\over(2\delta)^{wd}}\eqno{\hbox{(53)}}$$ where $$a_{e}=\Phi^{-1}(1-P_{f})={T_{e}\over\sqrt{N_{r}^{e}}}\eqno{\hbox{(54)}}$$ and $$-b_{e}=\Phi^{-1}(P_{n})={T_{e}-\mu_{e}\over\sigma_{e}}.\eqno{\hbox{(55)}}$$ From (54) and (55), it can be derived that the total number of bits $N_{c}^{e}$ used in the reconstruction is given by \eqalignno{{\hskip-20pt}N_{c}^{e}\!\!=\!&\,n\left(i_{d-1}\!+\!N_{r}^{e}\right)\cr{\hskip-20pt}\!\!=\!&\,n\left(i_{d-1}\!+\!{\left(a_{e}\!+\!b_{e}\sqrt{1\!+\!d\left((2\delta)^{2w}\!-\!(2\delta)^{2wd}\right)}\right)^{2}\over(2\delta)^{2wd}}\right)\!.&\hbox{(56)}}
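Equations (53)-(56) translate directly into a small calculator. The sketch below is our own helper (the name `bits_needed` and the argument `i_max` standing for $i_{d-1}$ are our choices), using Python's `statistics.NormalDist` for $\Phi^{-1}$:

```python
from statistics import NormalDist

def bits_needed(n, i_max, w, d, p, P_f=1e-7, P_n=1e-5):
    """Total bits N_c^e from (56) needed for reconstruction over a BSC.

    n: codeword tuple size; i_max: largest shift i_{d-1} in Q(X);
    w: dual word weight; d: number of terms in Q(X); p: crossover prob.
    """
    delta = 0.5 - p
    a_e = NormalDist().inv_cdf(1 - P_f)           # (54)
    b_e = -NormalDist().inv_cdf(P_n)              # (55)
    var_term = 1 + d * ((2 * delta) ** (2 * w) - (2 * delta) ** (2 * w * d))
    N_r = (a_e + b_e * var_term ** 0.5) ** 2 / (2 * delta) ** (2 * w * d)
    return n * (i_max + N_r)

# More channel noise or a heavier dual word both drive the requirement up.
print(bits_needed(2, 100, 3, 3, 0.02))
print(bits_needed(2, 100, 3, 3, 0.05))
print(bits_needed(2, 100, 6, 3, 0.05))
```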

In Figs. 8 and 9, the numbers of bits required for reconstruction when channel noise is present are shown for different error correcting codes and channel error probabilities. It is assumed that $d=3$, $P_{f}=10^{-7}$ and $P_{n}=10^{-5}$. The feedback polynomial is assumed to be $x^{8}+x^{4}+x^{3}+x^{2}+1$.

Fig. 8. Number of bits required for reconstruction when linear block codes are used and channel noise is present.
Fig. 9. Number of bits required for reconstruction when convolutional codes are used and channel noise is present.

From Figs. 8 and 9, it can be observed that the number of bits required for the reconstruction when channel noise is present is larger than that required in the noiseless condition. The larger the channel error probability, the larger the number of bits required for the reconstruction. Another factor which affects the number of bits is the dual word weight $w$: as $w$ increases, the number of bits required increases accordingly, especially when the channel error probability is large. Therefore, for the same error correcting code, the dual word of minimum weight $w$ is the best choice for the reconstruction.

In practical situations, the number of bits available for reconstruction is usually limited. In that case, the false-alarm probability or the nondetection probability will be affected. Suppose the number of bits in ${\bf r}$ available for reconstruction is $\bar{N}_{r}^{e}$ and the false-alarm probability, and hence $a_{e}$, is fixed in advance. The threshold $\bar{T}_{e}$ is then given by $$\bar{T}_{e}=a_{e}\cdot\sqrt{\bar{N}_{r}^{e}}\eqno{\hbox{(57)}}$$ and the nondetection probability $\bar{P}_{n}$ can then be calculated by \eqalignno{\bar{P}_{n}=&\,\Phi\left({\bar{T}_{e}-\mu_{e}\over\sigma_{e}}\right)\cr\approx&\,\Phi\left({a_{e}\cdot\sqrt{\bar{N}_{r}^{e}}-\bar{N}_{r}^{e}(2\delta)^{wd}\over\sqrt{\bar{N}_{r}^{e}\left[1+d\left((2\delta)^{2w}-(2\delta)^{2wd}\right)\right]}}\right).&\hbox{(58)}}
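Equation (58) can likewise be evaluated numerically. The function below is our illustrative helper (name and signature are our assumptions), again using `statistics.NormalDist`:

```python
from statistics import NormalDist

def nondetection_prob(N_r, w, d, p, P_f=1e-7):
    """Nondetection probability from (57)-(58) when only N_r dot-product
    bits are available and the false-alarm probability P_f is fixed."""
    delta = 0.5 - p
    a_e = NormalDist().inv_cdf(1 - P_f)
    T_bar = a_e * N_r ** 0.5                                 # (57)
    mu = N_r * (2 * delta) ** (w * d)
    sigma = (N_r * (1 + d * ((2 * delta) ** (2 * w)
                             - (2 * delta) ** (2 * w * d)))) ** 0.5
    return NormalDist().cdf((T_bar - mu) / sigma)            # (58)

# With more bits available, missing a true multiple becomes less likely.
print(nondetection_prob(10_000, 3, 3, 0.2))
print(nondetection_prob(1_000_000, 3, 3, 0.2))
```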

In Fig. 10, the nondetection probabilities versus the number of bits available for reconstruction are plotted. It is assumed that $d=3$, $P_{f}=10^{-7}$ and the feedback polynomial is $x^{8}+x^{4}+x^{3}+x^{2}+1$.

For recovering the initial state of the LFSR when noise is present, some known techniques, such as those proposed in [20], [21], can be used.

SECTION VI

## CONCLUSION

In this paper, the problem of reconstructing the LFSR in a linear scrambler placed after a channel encoder is studied. The existing algorithm, i.e., Cluzeau's algorithm, is very promising for reconstructing the feedback polynomial, based on the assumption that the source bits are biased.

Fig. 10. Nondetection probabilities versus the number of bits available for reconstruction.

However, after passing through a channel encoder, the bias (the imbalance between the numbers of 1s and 0s) in the bit sequence drops, especially when a convolutional code is used, and the number of bits required by Cluzeau's algorithm becomes exorbitantly large. In this paper, a new scheme is studied which, instead of relying on the bias in the bit sequence, uses the orthogonality between the dual words and the codewords generated by the channel encoder. Our analysis shows that with this proposed scheme the feedback polynomial can be reconstructed much faster, as the number of bits required for the reconstruction is greatly reduced, especially when convolutional codes are used as the error correcting codes. When channel noise is present, the scheme can still be used to perform the reconstruction, as long as the number of bits used is increased accordingly. It is noted that the larger the channel error probability, the larger the number of bits required for the reconstruction.

Based on the above results, it is clear that scrambling the source bits before applying the FEC offers better protection against scrambler reconstruction, all else being equal.

Secondly, it has been shown that for a linear block code, the bias of the binary bit stream before scrambling can be approximated by the product of the bias of the source bits and the code rate (16). For a convolutional encoder, the resultant bias is much lower (20). However, using the dual words of the encoder, our results show that a convolutional code-linear scrambler pair is much weaker than a linear block code-linear scrambler pair. This is because any shift of a dual word by a multiple of $n$ bits is orthogonal to the coded sequence, and for most practical convolutional codes, $n$ is typically a small number.

The work presented in this paper focuses on determining the scrambler polynomial assuming the dual word is known and word synchronization has been achieved a priori. A more challenging problem would be to reconstruct both the code and the scrambler at the same time. One possible solution is to incorporate into the scheme proposed in this paper a scheme which recovers the code's length and achieves synchronization without considering the scrambler, such as those proposed in [10], [11]. For example, for a short linear block code or a convolutional code, an exhaustive search can be used to test all possible dual words and generate all possible ${\bf r}$. Obviously, after applying the scheme proposed in Section IV-A to ${\bf r}$, in the noiseless case, only the ${\bf r}$ generated by the correct dual word will lead to two different distributions of $Z$, as shown in Fig. 5. In a noisy condition, the situation is similar. For longer block codes, more sophisticated schemes are needed to recover both the code and the scrambler at the same time. Finally, the weight of the dual word plays a key part in the reconstruction, as low-weight dual words are easier to find and, in noisy conditions, lead to fewer bits being required for the reconstruction. Therefore, one might consider using error correcting codes which do not have low-weight dual words. How to find such codes is also an interesting topic for future work.

## Footnotes

The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Y.-W. Peter Hong.

X.-B. Liu and S. N. Koh are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.

C.-C. Chui is with the Temasek Laboratories at Nanyang Technological University, Singapore 639798.

X.-W. Wu is with the School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
