A Robust Semantic Text Communication System

Semantic communication is increasingly viewed as a promising solution to improve the transmission efficiency. However, semantic communications are susceptible not only to physical channel impairments, but also to semantic impairments, which degrade semantic understanding at the receiver and disrupt the associated downstream tasks. Hence, we focus our attention on the robustness of semantic communications against semantic impairments. Specifically, we first categorize textual semantic impairments into three categories based on their sources. Then, we propose a robust deep learning enabled semantic communication system (R-DeepSC) by introducing a semantic corrector for robust semantic encoding so as to facilitate semantic transmission. Moreover, we develop a non-autoregressive version of R-DeepSC, namely NA-RDeepSC, which offers improved inference speed by relying on a non-autoregressive architecture and an adaptive generator embedded into the semantic decoder. NA-RDeepSC performs semantic decoding in parallel, hence reducing the decoding complexity from $O(n)$ to $O({1})$ with a comparable performance to that of R-DeepSC. Our experimental results demonstrate the superior robustness of the proposed R-DeepSC and NA-RDeepSC architectures in eliminating semantic impairments, hence highlighting the significance of this work in advancing the development of robust semantic communications.

space' for narrowing the semantic discrepancy between the transmitter and the receiver, rather than minimizing the classic symbol error rate [2]. Explicitly, the transmitted content typically represents task-oriented features conveying semantics. Consequently, the optimization objective of semantic communications is no longer the classic bit error rate or symbol error rate, but the fidelity of the semantic information at the receiver. This optimization objective implies that semantic communications are most suitable for scenarios involving either human-to-machine or machine-to-machine communications.
Apart from text transmission, similar joint semantic communication designs have also been proposed for diverse other applications. For instance, Weng and Qin [6] developed a semantic speech communication system, while Xie et al. [7] designed a bespoke task-oriented multiuser system. Moreover, researchers have also developed various architectures for image and video transmission. More particularly, Huang et al. [8] designed a Generative Adversarial Network-based encoder for image coding relying on adaptive bandwidth allocation. Zhang et al. [9] exploited a deep reinforcement learning-based resource allocation scheme to reduce the transmission delay. Zhang et al. [10] proposed a semantic communication system for flexible code rate optimization to achieve bandwidth efficiency while maintaining transmission quality. Qin et al. [11], [12] presented a computing network enabled semantic communication system for optimizing the computing resources and a generalized semantic communication framework for leveraging the semantics from source and wireless channels. Jiang et al. [13] developed a semantic communication system for video conferencing over hostile time-varying channels. Hanzo et al. [14] conceived a model-based parametric semantic coding enhancement technique to improve subjective quality and to harness the limited communication resources by prioritizing semantic regions. Xie et al. [15] devised a semantic communication system incorporating a memory module for conducting scenario question answering.
Although the aforementioned contributions have succeeded in expanding the range of tasks that semantic communications can perform, the study of their robustness against transmission impairments is still in its infancy. Specifically, the robustness of semantic communications is affected by a pair of impairments. On the one hand, the transmitted signals are corrupted by the inevitable physical channel impairments, such as pathloss, slow and fast fading, dispersion, as well as noise. These impairments can be mitigated by channel equalization [16] and channel coding [17], while relying on channel estimation [18], which have been extensively investigated.
On the other hand, semantic communications are also contaminated by semantic impairments causing semantic mismatch between the transmitter and the receiver [19]. Fig. 1 illustrates the concept of semantic impairments, which degrade the integrity of semantic communication systems, namely adversarial semantic contamination and literal semantic impairments [20]. Adversarial semantic contamination degrades the integrity of semantic communications through semantic channels, while a literal semantic impairment is imposed by corrupted source data. Despite the potentially grave impact of both types of semantic degradations, they have not been extensively studied, hence highlighting the pressing need for further research in this area.
The main focus of this paper is on literal semantic impairments of text transmission, which can arise from various sources, including typing errors introduced by users or incorrect recognition by DNN-based systems, such as automatic speech recognition (ASR) algorithms. This type of impairment may result in misspelled or homophonic words that can cause semantic ambiguity. For instance, consider the sentence "I saw the sun rise". If the word 'sun' is misspelled as 'son' due to a speech recognition error caused by their similar pronunciation, this imposes semantic impairments by causing the sentence to be misinterpreted, potentially leading to erroneous decisions [21].
Conventional communication systems are typically optimized by minimizing the symbol error rate and lack the ability to extract semantics, hence they are vulnerable to such errors. By contrast, semantic communication systems are expected to eliminate semantic impairments and recover the original meaning even from corrupted text, which is made possible by their ability to understand and interpret semantics. However, existing semantic communication systems primarily focus on physical channel impairments, while overlooking semantic impairments, which leads to unreliable communications between the transmitter and the receiver [20].

A. Robustness Against Adversarial Attacks
The first approach focuses on enhancing the robustness against adversarial attacks, which are perturbations that are intentionally designed to mislead DNNs into producing counter-intuitive predictions. These methods involve designing effective defense strategies. Szegedy et al. [22] found that adding invisible perturbations to an image may still deceive a classification model. Goodfellow et al. [23] developed a protection mechanism based on a fast gradient method, while Miyato et al. [24] proposed a semi-supervised method to defend against adversarial attacks. These methods typically aim to increase the resilience of DNNs by adding adversarial examples to the training data. By incorporating these defensive strategies, DNN models become more robust to semantic impairments.

B. Robustness Against Literal Errors
Literal errors may gravely affect the semantic perception of DNNs [32]. To address this issue, designing robust DNNs capable of correcting literal errors is desired. Literal errors typically arise from the following pair of distinct scenarios. Firstly, errors may have accidentally been made by users during spelling. Numerous studies have focused on eliminating spelling errors. Zhao and Wang [25] investigated the grammatical error correction capability at the data level by harnessing a dynamic mask for generating 'clean-corrupt' example pairs for training. Zhao et al. [26] also introduced a copy mechanism to build a pre-trained model, so as to improve the accuracy of grammatical error correction. Zhang et al. [27] proposed a novel detection and correction framework to deal with Chinese literal errors. By applying error correction methods, DNNs can better cope with semantic impairments and perform well in downstream tasks.
Furthermore, literal errors can also be generated due to the limitations of DNN-based algorithms. For instance, automatic speech recognition algorithms may generate literal errors due to their limited recognition accuracy, background noise, and the clarity of the speech sources [33]. Compared to spelling errors, ASR errors exhibit significant differences in terms of their nature. In addition to misspellings, ASR could also introduce homonym errors, which are caused by the similar pronunciation of words, such as 'sun' and 'son'. To address these challenges, researchers have developed various techniques to reduce the probability of ASR errors.
Leng et al. [28] proposed an edit alignment method to generate edit labels for 'clean-corrupt' data pairs, which are utilized for training. Zhang et al. [29] proposed a dual channel model that leverages both contextual and phonetic information for ASR error correction. Li et al. [30] designed a pre-trained BERT-based model and a copy mechanism to eliminate ASR errors. By reducing the impact of semantic impairments in ASR systems, DNNs succeed in reliably interpreting the semantics of speech sources.

C. Robust Design of Semantic Communications
More recently, these successful defense and correction techniques have also been adopted for improving the robustness of communication systems. Peng et al. [1] proposed a robust semantic communication system to combat adversarial semantic contamination and a specific type of literal semantic impairments. Hu et al. [20] proposed a robust semantic communication system relying on shared codebooks to tackle both sample-dependent and sample-independent semantic contamination. Sadeghi and Larsson [31] studied the robustness of an end-to-end communication system to physical adversarial attacks and defined a metric termed as the perturbation-to-signal ratio for characterizing the strength of adversarial semantic contamination.
Just like any other communication systems, semantic text systems are also prone to semantic impairments [34], which are harder to mitigate than channel impairments. Despite the progress made in addressing the deleterious effects of semantic impairments, challenges in enhancing the robustness of semantic communication systems persist, including the lack of a unified taxonomy and metrics, the absence of semantic impairments datasets for training, as well as the challenges in designing effective yet affordable decontamination modules.
In this context, the existing error correction techniques applied to natural language processing do not consider the realistic constraints of the communication process, while the semantic communication studies have not as yet addressed the grave potential impact of semantic impairments. Therefore, this paper investigates the mechanisms of literal semantic impairments and addresses these challenges. Table I boldly contrasts the contributions of this paper against those reviewed above. Our new contributions are further detailed as follows in a point-wise fashion:
• To quantify the semantic impairments that the proposed system can handle, we categorized semantic impairments into three distinct types and developed a new metric termed as semantic impairment intensity. Furthermore, we established a semantic impairments dataset having varying semantic impairment intensity.
• We developed a robust semantic communication system, referred to as R-DeepSC, which employs a semantic corrector for robust semantic encoding.
• Additionally, we proposed a speedy version, namely NA-RDeepSC, which utilizes an adaptive generator and a non-autoregressive architecture for significantly improving the inference speed while maintaining robustness, making it a cost-effective yet efficient solution for online services.
The rest of this paper is organized as follows. Section II introduces the semantic communication system models with particular emphasis on semantic impairments. Section III presents our proposed robust semantic communication system design, while our experimental results are discussed in Section IV. Finally, Section V concludes this paper.

II. THE ROBUST SEMANTIC COMMUNICATION MODEL
In this section, we present a robust semantic communication model that is specifically designed for minimizing the effects of semantic impairments in communication channels and propose a novel technique for modeling semantic impairments. Furthermore, we formulate our problem in detail.

A. The Robust Semantic Communication System Model
Fig. 2 portrays the semantic communication system architecture considered, which can handle both physical channel effects and semantic impairments. Denote the input text as $S$, which is broken into tokens based on the tokenization rules. For instance, when tokenizing the sentence "this is predefined", the resulting tokenized sequence could be ['this', 'is', 'predefined'] or ['this', 'is', 'pre', 'defined'], depending on the specific tokenization rules used. During the process of tokenization, the collection of all tokens is referred to as the dictionary, denoted as $\nu$. This dictionary serves as the knowledge base in the proposed system, facilitating semantic encoding and decoding. The tokenized sequence can be represented as $S = \{s_1, s_2, \dots, s_L\}$, where $s_i$ is the $i$-th token.
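The tokenization and One-Hot steps described above can be illustrated by a minimal sketch. It assumes a plain whitespace tokenizer and a toy two-sentence corpus; the helper names `tokenize`, `build_dictionary` and `one_hot`, as well as the reserved `<PAD>` and `<UNK>` entries placed at indices 0 and 1, are illustrative choices rather than the tokenization rules actually used by the proposed system.

```python
def tokenize(sentence):
    """Whitespace tokenizer (assumption); sub-word rules would split further."""
    return sentence.lower().split()


def build_dictionary(sentences):
    """Collect every token of the corpus into an index map acting as the dictionary."""
    vocab = {"<PAD>": 0, "<UNK>": 1}  # reserved entries (illustrative ordering)
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(tokens, vocab):
    """Map each token to a one-hot row over the dictionary; unseen tokens map to <UNK>."""
    rows = []
    for token in tokens:
        row = [0] * len(vocab)
        row[vocab.get(token, vocab["<UNK>"])] = 1
        rows.append(row)
    return rows


vocab = build_dictionary(["this is predefined", "I saw the sun rise"])
tokens = tokenize("this is predefined")
encoded = one_hot(tokens, vocab)
```

In a full system, the one-hot rows would subsequently be multiplied by a trainable embedding matrix to obtain the embedding vector.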
Then, through the One-Hot encoding and the embedding layer, these tokens can be converted into the embedding vector, $E$, which is represented as

$$E = f_\gamma[f_\nu[S]], \tag{1}$$

where $f_\nu[\cdot]$ represents the One-Hot encoder's action associated with the released knowledge base $\nu$ and $f_\gamma[\cdot]$ is the embedding layer relying on the trainable parameter set $\gamma$. Before transmission, the robust semantic communication system of Fig. 2 must carry out semantic encoding to extract the pertinent semantic features, followed by semantic correction to refine the semantics, and deep learning enabled channel encoding to guard against physical channel impairments, including impairments caused by AWGN and Rician fading channels. Therefore, the transmitted signal, $X$, is given by

$$X = f_\phi[f_\lambda[f_\eta[E]]], \tag{2}$$

where $f_\phi[\cdot]$ represents the channel encoder's action associated with the trainable parameter set $\phi$, $f_\lambda[\cdot]$ is the semantic corrector having the trainable parameter set $\lambda$, and $f_\eta[\cdot]$ is the semantic encoder having the trainable parameter set $\eta$.
The transmitted signal, $X$, may become distorted by the fading channels and receiver noise. Hence the received signal, $Y$, can be represented as

$$Y = HX + N_p, \tag{3}$$

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I CONTRASTING OUR CONTRIBUTIONS TO THE LITERATURE
Fig. 2. The robust semantic communication system proposed for mitigating the impact of literal semantic impairments. The example in this figure is intended to illustrate the operating principles of our systems, while the proposed systems can handle more complex semantic impairments, encompassing a broader range of forms and higher degrees.
where $H$ characterizes the fading channel and $N_p \sim \mathcal{CN}(0, \sigma_n^2)$. By utilizing a channel decoder, adaptive generator, and semantic decoder, the received text, $\hat{S}$, can be represented as

$$\hat{S} = g_\zeta[g_\mu[g_\delta[Y]]], \tag{4}$$

where $g_\delta[\cdot]$ is the channel decoder having the trainable parameter set $\delta$, $g_\mu[\cdot]$ is the adaptive generator having the trainable parameter set $\mu$, and $g_\zeta[\cdot]$ is the semantic decoder having the trainable parameter set $\zeta$. Specifically, the trainable parameters of our system, including those of the channel encoder and channel decoder, are optimized by joint training in an end-to-end manner. The proposed semantic communication system is designed to enhance robustness against semantic impairments, which are alleviated by introducing a semantic corrector at the transmitter to rectify semantic errors. At the receiver side, an adaptive generator is used for producing the input sequence of the semantic decoder to speed up the decoding process. These modules allow the system to cope with diverse types and degrees of semantic impairments, thereby improving the reliability and efficiency of semantic communications.
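As an illustration of the received-signal model, in which the transmitted symbols are scaled by a fading coefficient and perturbed by circularly symmetric complex Gaussian noise, the following sketch may be helpful. It assumes a single flat-fading coefficient per block; the helper names `rician_coefficient` and `channel`, the K-factor of 2 and the noise level are hypothetical choices made purely for illustration.

```python
import random


def rician_coefficient(k_factor, rng):
    """Draw one unit-power Rician fading coefficient (hypothetical helper)."""
    line_of_sight = (k_factor / (k_factor + 1.0)) ** 0.5
    # Scattered part ~ CN(0, 1 / (K + 1)): each component has half of that variance.
    scatter_std = (1.0 / (2.0 * (k_factor + 1.0))) ** 0.5
    return complex(line_of_sight + rng.gauss(0.0, scatter_std),
                   rng.gauss(0.0, scatter_std))


def channel(x, h=1 + 0j, noise_std=0.1, rng=None):
    """Return h * x + n for a list of complex symbols x, with n ~ CN(0, noise_std^2)."""
    rng = rng or random.Random(0)
    # The real and imaginary noise parts each carry half of the total variance.
    component_std = noise_std / 2.0 ** 0.5
    return [h * xi + complex(rng.gauss(0.0, component_std),
                             rng.gauss(0.0, component_std)) for xi in x]


rng = random.Random(42)
x = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
y_awgn = channel(x, h=1 + 0j, noise_std=0.1, rng=rng)  # AWGN: fading coefficient of 1
y_rician = channel(x, h=rician_coefficient(2.0, rng), noise_std=0.1, rng=rng)
```

Setting `noise_std` to zero reduces the model to pure fading, which is convenient for checking the channel decoder in isolation.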

B. Semantic Impairments
The semantic impairments, $N_s$, which are considered to be literal semantic impairments in the source text, $S$, may pose challenges for both humans and DNN models. Fig. 3 illustrates two ways of generating semantic impairments, namely by spelling errors during typing and recognition errors generated by deficient DNN models, such as ASR and optical character recognition. Semantic impairments may cause semantic ambiguity and mislead the DNN models. For instance, misspelling the word 'excited' as 'exhausted', as in the sentence "we are exhausted with this movie", may confuse a sentiment analyzer.
Semantic impairments may be introduced by three operations: replacement, R, deletion, D, and insertion, I. The $i$-th word of a sentence corrupted by semantic impairments, $N_s = \{N_s^1, N_s^2, \dots, N_s^n\}$, is defined as $F(N_s^i, e, i)$, where $F(\cdot)$ is a semantic impairment simulation function that fits the error distribution of users or incomplete DNN-assisted systems, $N_s^i$ is the corresponding corrupted word of $u_i$, and $e$ is the error type. For instance, the corrupted text "I saw the son rise" is obtained after applying the function $F(\text{son}, R, 4)$ to the uncorrupted sentence "I saw the sun rise". After applying the function $F(\cdot)$ to the uncorrupted sentence, $U = \{u_1, u_2, \dots, u_n\}$, the corrupted text, $S$, can be obtained.
The semantic impairment simulation function $F(\cdot)$ illustrates how semantic impairments are generated. The objective of the proposed robust semantic communication system is to mitigate these impairments by approximating the inverse function of the semantic impairment simulation function, denoted as $F^{-1}(\cdot)$.
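To make the three operations concrete, a minimal sketch of a single deterministic impairment is given below. The function name `impair` and its argument order are illustrative, and the position is taken to be 1-based so that it matches the example of replacing the fourth word 'sun' by 'son'; the statistical fitting of the error distribution is not modelled here.

```python
def impair(tokens, word, error_type, position):
    """Apply one literal impairment at a 1-based position.

    error_type is 'R' (replacement), 'D' (deletion) or 'I' (insertion);
    `word` is the corrupted word and is ignored for deletion.
    """
    corrupted = list(tokens)  # leave the uncorrupted sentence untouched
    index = position - 1
    if error_type == "R":
        corrupted[index] = word
    elif error_type == "D":
        del corrupted[index]
    elif error_type == "I":
        corrupted.insert(index, word)
    else:
        raise ValueError("error_type must be 'R', 'D' or 'I'")
    return corrupted


uncorrupted = ["I", "saw", "the", "sun", "rise"]
replaced = impair(uncorrupted, "son", "R", 4)
deleted = impair(uncorrupted, None, "D", 4)
inserted = impair(uncorrupted, "bright", "I", 4)
```

The inverse mapping sought by the system would take `replaced` back to `uncorrupted`, which is only possible by exploiting the sentence context.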

C. Problem Formulation
The semantic impairment simulation function has no explicit formula, since the distribution of its input variables may vary in different scenarios. It therefore becomes necessary to formulate the associated problem and devise a solution.
The proposed system takes corrupted text with semantic impairments as its input and generates text without semantic impairments. Moreover, when transmitting over physical channels, the transmitted signal will be subject to the effects of channel noise and fading, as seen in Eq. (3).
Fig. 3. Illustration of the effects of semantic impairments on semantic communications, and the role of the semantic corrector in mitigating these effects.
The objective of the proposed system is to eliminate semantic impairments in the transmitted text and achieve high-fidelity end-to-end semantic communications. This objective is expressed in terms of $E(\cdot)$, which quantifies the semantic similarity between the uncorrupted text and the received text, and of the semantic impairment dataset $D$. To address this challenge, we design robust deep learning enabled semantic communication systems to tackle the problem at hand.

III. PROPOSED ROBUST SEMANTIC COMMUNICATION SYSTEMS
In this section, we propose a robust deep learning aided semantic communication system, namely R-DeepSC, relying on a semantic corrector for robust semantic encoding. Moreover, we develop a non-autoregressive speedy form of R-DeepSC, termed as NA-RDeepSC, which adopts an adaptive generator to perform semantic decoding at an accelerated inference speed. Additionally, we discuss the model's implementation in practical scenarios.

A. Robust Semantic Encoding Relying on the Semantic Corrector
Vaswani et al. [35] calculate attention scores based on the semantic correlation between tokens, regardless of whether they are corrupted or not. By applying these scores to the semantic representations of all tokens, the semantics of the sentence may be obtained. However, if a token is incorrect, its corrupted semantic representation may interfere with the semantics of other tokens, leading to corrupted semantic information. For instance, if an incorrect word is present in the input sequence, such as 'son' instead of 'sun' in "I saw the sun rise", the self-attention mechanism may calculate its representation vector and attention score, which can cause a deviation in the contextual representation of other words and lead to inaccurate model output.
To cope with this problem, we propose a novel semantic encoder which utilizes a semantic corrector and a calibrated self-attention mechanism to eliminate the influence of semantic impairments. For example, in "I saw the son rise", we can adjust the attention score of 'son' to minimize its impact on the contextual representation of other words. This process can help eliminate the interference of corrupted text and improve the accuracy and performance of the Transformer model.
The architecture of the robust semantic encoder developed is shown in Fig. 4. The adopted knowledge base is the dictionary for conducting the One-Hot encoding. The extracted semantics, $M$, is obtained by

$$M = f_\varrho(E),$$

where $f_\varrho(\cdot)$ is the semantic encoder having the trainable parameter set $\varrho$, and $E$ is the embedding vector obtained with the knowledge base in Equation (1).
A novel semantic corrector is introduced to rectify the corrupted semantics obtained by the semantic encoder. It is the core component of the proposed model and comprises a Gated Recurrent Unit (GRU) [36], a fully connected layer, and a sigmoid activation function. The error probability $P$ of the tokens may be represented as

$$P = f_\epsilon(M),$$

where $f_\epsilon(\cdot)$ is the semantic corrector having the trainable parameter set $\epsilon$, and $M$ is the output of the semantic encoder.
The calibration matrix, $C$, is a weight matrix that adjusts the attention scores of a model to reduce the impact of corrupted text. It assigns smaller weights to the corrupted words, which helps the model better understand the semantic information of the input sequence and enhances its accuracy and performance.
Then, the attention score is calibrated by $C$ for ensuring that more attention is devoted to uncorrupted tokens. The calibrated attention score, $A_c$, is computed from $C$ and from $Q$, $K$, $V$ and $d_k$, where $\odot$ represents the element-wise product used for applying $C$, while $Q$, $K$, $V$, and $d_k$ are the query, key, value, and the dimension of the encoded semantics, respectively.
The calculation process of calibrated self-attention is summarized in Algorithm 1. The value of $C$ is initially set as none, and it is updated after passing through the encoder layer of Fig. 2. The semantic error corrector has to be activated $N - 1$ times throughout the semantic encoding process, where $N$ is the number of layers.
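The effect of calibration can be illustrated by a small pure-Python sketch. It should be read as one possible interpretation rather than as the exact formulation of the proposed system: it assumes that the calibration weight of the j-th key token is 1 − P_j, that the weights are applied element-wise to the softmax-normalized attention scores, and that each row is renormalized afterwards. The function names and the toy inputs are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def calibrated_attention(Q, K, V, P):
    """Scaled dot-product attention whose weights are damped for suspect tokens.

    ASSUMPTION: the calibration weight of key token j is 1 - P[j], applied
    element-wise after the softmax and followed by row renormalization.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = [w * (1.0 - p) for w, p in zip(softmax(scores), P)]
        total = sum(weights) or 1.0  # guard against all tokens being flagged
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[c] for w, v in zip(weights, V))
                        for c in range(len(V[0]))])
    return outputs


Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
P = [0.0, 1.0]  # the second token is judged to be certainly corrupted
M = calibrated_attention(Q, K, V, P)
```

With the second token fully suppressed, every output row reduces to the value vector of the first token, which mirrors the intended behaviour of devoting attention to uncorrupted tokens only.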
Algorithm 1 Calibrated self-attention. Output: $M$.
Furthermore, to train the semantic corrector, a novel loss function, $L_{SC}(\cdot)$, is developed by relying on the binary cross-entropy loss [37], where $L = \{l_1, l_2, \dots, l_n\}$ is the label indicating whether the token is corrupted or not and $l_i$ can be

B. Non-Autoregressive Decoder for Inference Acceleration
Although the Transformer of [35] has achieved remarkable performance, the inference time of this autoregressive form is substantially increased by the complete dependence between tokens. To enhance the inference speed and minimize the communication delay, we conceive a reduced-complexity decoder structure of non-autoregressive form. To realize this goal, the mechanisms of the autoregressive and non-autoregressive architectures are analyzed as follows.
An autoregressive semantic communication system generates the $i$-th token of the received sentence $\hat{S}$ from the previously generated tokens and from $O$, where $O$ is the output of the channel decoder, $g_\zeta(\cdot)$ is an autoregressive semantic decoder, and $\nu$ is the knowledge base. Naturally, generating a sequence in an autoregressive manner, which predicts one token at a time based on the previously predicted tokens, has to carry out decoding in series. As a result, the inference time of the autoregressive decoder is on the order of $O(n)$, which incurs an escalating communication delay.
By contrast, a non-autoregressive semantic communication system can transmit data in parallel, because it directly generates a sequence at once. The received text $\hat{S}$ is generated by $g_\pi(\cdot)$, the non-autoregressive decoder, which utilizes an independent conditional sequence, $I$, rather than the generated tokens, $\hat{s}_1, \hat{s}_2, \dots, \hat{s}_{i-1}$, for conducting semantic decoding. The non-autoregressive architecture is capable of decoding in parallel, hence accomplishing decoding at an inference time order of $O(1)$.
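The difference in dependency structure between the two decoding modes can be sketched with a toy example. The callables standing in for a trained decoder, as well as the names `autoregressive_decode` and `non_autoregressive_decode`, are hypothetical; the sketch only counts the number of sequential decoding passes and does not model any neural network.

```python
def autoregressive_decode(next_token, length):
    """Generate tokens in series: each step consumes all previously generated tokens."""
    tokens = []
    sequential_steps = 0
    for _ in range(length):
        tokens.append(next_token(tokens))
        sequential_steps += 1
    return tokens, sequential_steps


def non_autoregressive_decode(token_at, conditional_sequence):
    """Generate every position from the conditional sequence alone.

    No position depends on another output, so all of them could be evaluated
    in a single parallel pass; the list comprehension merely emulates this.
    """
    tokens = [token_at(k, conditional_sequence)
              for k in range(len(conditional_sequence))]
    return tokens, 1


target = ["i", "saw", "the", "sun", "rise"]  # toy stand-in for decoder outputs
ar_tokens, ar_steps = autoregressive_decode(
    lambda prefix: target[len(prefix)], len(target))
nar_tokens, nar_steps = non_autoregressive_decode(
    lambda k, sequence: target[k], ["<UNK>"] * len(target))
```

For a sentence of n tokens, the first routine needs n dependent passes, whereas the second needs a single pass provided that sufficient parallel hardware is available.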
The non-autoregressive architecture is capable of significantly reducing the inference time. However, designing the independent conditional sequence, $I$, constitutes a critical challenge when establishing a non-autoregressive model. The previously proposed non-autoregressive models rely either on a source-target alignment constraint with fertility [38], or on duration prediction [28] to regulate $I$. Although these solutions achieve excellent performance, their premise is that the decoder has access to the source text, $S$, which can then be utilized to build the independent conditional sequence, $I$, for the semantic decoder.
Unfortunately, this assumption is not applicable to realistic communication scenarios, because as seen in Fig. 2, the semantic decoder receives its input signal, $O$, from the channel decoder and it can only obtain the source text in case of error-free channel decoding. Therefore, an appropriate conditional sequence must be designed along with the corresponding loss function for training.
In this paper, an adaptive generator, which consists of linear layers, the popular ReLU activation function, and the softmax function, is devised for predicting the target length, $T$, of the input text. This prediction is carried out by $g_\mu(\cdot)$, the adaptive generator module having the trainable parameter set $\mu$.
The $k$-th token of the input sequence, $I$, is defined in terms of ⟨UNK⟩ and ⟨PAD⟩, which are predefined tokens indicating that the token is not in the dictionary and that the token is used for padding, respectively. The architecture of the proposed semantic decoder is shown in Fig. 5. Compared to the autoregressive form, the input sequence is no longer constituted by the generated tokens, but by the predicted conditional sequence. Furthermore, since the model predicts in parallel, it is no longer necessary to rely on a mask mechanism, in contrast to the Transformer of [35].
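A possible construction of the conditional sequence from the predicted target length is sketched below. It rests on an assumption that is not confirmed by the text above, namely that the first T positions are filled with ⟨UNK⟩ and the remaining positions up to a fixed maximum length are filled with ⟨PAD⟩. The helper names, the maximum length of 5 and the toy probability vector are hypothetical.

```python
def predict_length(length_probabilities):
    """Pick the most probable target length from a softmax output (index = length)."""
    return max(range(len(length_probabilities)),
               key=length_probabilities.__getitem__)


def build_conditional_sequence(target_length, max_length, unk="<UNK>", pad="<PAD>"):
    """ASSUMPTION: <UNK> marks the T positions to be decoded, <PAD> fills the rest."""
    if not 0 <= target_length <= max_length:
        raise ValueError("target_length must lie in [0, max_length]")
    return [unk] * target_length + [pad] * (max_length - target_length)


length_probabilities = [0.05, 0.05, 0.10, 0.70, 0.10]  # toy adaptive generator output
T = predict_length(length_probabilities)
I = build_conditional_sequence(T, max_length=5)
```

Because the sequence depends only on the predicted length, it can be built at the receiver without access to the source text.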
Moreover, the cross-entropy loss function is utilized to develop an adaptive generator loss function, $L_{AG}(\cdot)$, to regulate the output of the adaptive generator with respect to $G$, the ground truth for the adaptive generator.

C. Loss Function for Robust Semantic Communications
To allow the whole system to function appropriately, a new loss function is proposed for training the robust semantic communication systems developed, which is given by

D. Model Implementation
To enhance the accuracy and efficiency of predictions, we develop a pair of models, namely R-DeepSC and NA-RDeepSC. Specifically, R-DeepSC focuses on transmitting text with a high semantic fidelity by utilizing our robust semantic encoding and autoregressive decoding architecture. Its loss function is composed of the first three terms of $L_{total}$ in (18), which helps us to optimize the accuracy of the encoding process. By contrast, NA-RDeepSC aims for eliminating the semantic impairments, while maintaining a high inference speed by relying on both our robust semantic encoding and non-autoregressive architecture. Its loss function is $L_{total}$, which optimizes the overall performance of the model.
The choice between these models depends on the specific task at hand. R-DeepSC is suitable when the objective is to accurately encode text into a structured representation for transmission. Conversely, NA-RDeepSC is better suited for efficiently decoding structured representations back into text at a superior speed, while maintaining high accuracy. By leveraging the strengths of both R-DeepSC and NA-RDeepSC, our semantic communication solutions are capable of striking a flexible inference accuracy versus speed trade-off.
For time-sensitive scenarios, such as real-time chat, deploying NA-RDeepSC is advantageous due to its lower inference complexity. Conversely, for professional document transmission, R-DeepSC is recommended to ensure high accuracy in error correction. The model selection algorithm is summarized in Algorithm 2.
Algorithm 2 Model selection: if the scenario is time-sensitive, load the parameters of NA-RDeepSC for transmission; otherwise, load the parameters of R-DeepSC for transmission.

IV. NUMERICAL RESULTS
In this section, we construct semantic impairments datasets for employment in our experiments. Furthermore, we present our performance metrics, simulation settings, and the experimental results to validate the robustness of the proposed models.

A. Datasets and Baseline Models
We adopt the Europarl corpus dataset [39], which is based on the proceedings of the European Parliament in 11 different languages. The English corpus, which contains 98,751 sentences, is selected as the transmitted data.

TABLE II DETAILS OF PROPOSED DATASETS
Then, a pair of semantic impairments datasets are harvested based on the Europarl dataset. The first dataset is termed as the induced spelling error dataset, which is obtained by randomly sampling the words in the corpus and performing operations based on the predefined transformations of [40] to introduce semantic impairments. The transformation types include substitution, insertion, deletion errors, and verb replacements. We simulate literal errors that may occur in typing by imposing these operations on the corpus, which are determined by sampling a Multinoulli distribution defined in [41]. By adjusting the probability of the above errors and the sampled word index, induced spelling error datasets associated with different levels of semantic impairment intensity were collected. The second dataset is referred to as the spontaneous spelling error dataset, which is also constructed from the Europarl dataset by leveraging the released spelling error replacement rules [42] and relying on the same method.
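The sampling procedure used for building an induced spelling error dataset may be sketched as follows. The sketch is a simplified stand-in for the transformations of [40]: it assumes character-level substitution, insertion and deletion, omits verb replacements, and draws the error type from a categorical (Multinoulli) distribution with user-supplied weights. All function names, weights and probabilities are hypothetical.

```python
import random
import string

ERROR_TYPES = ("substitution", "insertion", "deletion")


def corrupt_word(word, error_type, rng):
    """Apply one character-level error to a word (simplifying assumption)."""
    if not word:
        return word
    position = rng.randrange(len(word))
    letter = rng.choice(string.ascii_lowercase)
    if error_type == "substitution":
        return word[:position] + letter + word[position + 1:]
    if error_type == "insertion":
        return word[:position] + letter + word[position:]
    if error_type == "deletion":
        # Single-letter words are kept, so that no token vanishes entirely.
        return word[:position] + word[position + 1:] if len(word) > 1 else word
    raise ValueError("unknown error type")


def corrupt_sentence(tokens, error_prob, type_weights, seed=0):
    """Corrupt each token with probability error_prob; return the tokens and 0/1 labels."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        new_token = token
        if rng.random() < error_prob:
            error_type = rng.choices(ERROR_TYPES, weights=type_weights, k=1)[0]
            new_token = corrupt_word(token, error_type, rng)
        corrupted.append(new_token)
        labels.append(int(new_token != token))  # 1 marks a corrupted token
    return corrupted, labels


clean = ["i", "saw", "the", "sun", "rise"]
noisy, labels = corrupt_sentence(clean, error_prob=0.4, type_weights=(0.5, 0.25, 0.25))
```

Raising `error_prob` yields sentences having a higher impairment intensity, while the returned labels can serve as per-token corruption indicators during training.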
Moreover, we use the speech-recognition and synthesis based semantic communication system of [43] to transmit elements of the Librispeech [44] dataset over AWGN channels. Briefly, Librispeech is a dataset of about 1,000 hours of English speech excerpts, which is used for conducting ASR tasks. By varying the signal-to-noise ratio (SNR), we obtained an ASR error dataset having different levels of semantic impairment intensity.
The proposed models and baseline models are evaluated by relying on these datasets. Details of these datasets are presented in Table II. There are different types of errors in these datasets, which are suitable for comprehensively testing the performance and for yielding reproducible results.
Our proposed system is compared to a range of baseline models. The first one is DeepSC [3], which is a semantic communication system based on deep learning. The second one harnesses the SoftMaskedBERT of [27] along with the BERT tokenizer [45] as the semantic codec. The remaining systems use Huffman and low-density parity-check (LDPC) codes with a 0.5 code rate for channel coding, and adaptive modulation [46] techniques for transmission. In AWGN channels, the SNR is stable, but in Rayleigh and Rician fading channels, it can fluctuate significantly. To ensure efficient communication, we use adaptive modulation (AM), which dynamically adjusts the modulation scheme based on the channel conditions to maximize the data rate, while maintaining a low bit error rate. Specifically, we utilize 8-QAM modulation for unfavorable channel conditions, while we employ 16-QAM modulation for good channel conditions. To ensure a fair comparison, DeepSC utilizes the same parameters for training as the proposed R-DeepSC.
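The adaptive modulation rule of the conventional baselines can be illustrated by a short sketch. The switching threshold of 10 dB is a purely hypothetical value, since the text does not state how favorable and unfavorable channel conditions are distinguished; the function names are likewise illustrative.

```python
BITS_PER_SYMBOL = {"8-QAM": 3, "16-QAM": 4}


def select_modulation(snr_db, threshold_db=10.0):
    """Use 16-QAM for good channels and 8-QAM otherwise (threshold is an assumption)."""
    return "16-QAM" if snr_db >= threshold_db else "8-QAM"


def symbols_needed(num_bits, modulation):
    """Number of symbols required to carry num_bits, rounded up."""
    bits = BITS_PER_SYMBOL[modulation]
    return -(-num_bits // bits)  # ceiling division


poor_channel = select_modulation(3.0)
good_channel = select_modulation(18.0)
symbols_poor = symbols_needed(120, poor_channel)
symbols_good = symbols_needed(120, good_channel)
```

The higher-order constellation conveys the same payload in fewer symbols, which is why it is reserved for channel conditions capable of supporting it.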

B. Simulation Settings
In this experiment, we set the number of layers to 3 and the number of heads to 4. The semantic corrector is set to a gated recurrent unit associated with 128 units and a linear layer, activated by the sigmoid activation. The channel encoder is a dense net having 2 layers, with a hidden dimension of 256 and an output dimension of 16. The channel decoder has three layers, with a hidden dimension of 512. The adaptive generator consists of two linear layers and a normalization layer. After passing through the triple-layer semantic decoders, the predicted sequence is generated by the head layer. The details of these settings can be found in Table III.

C. Performance Metrics
Again, in contrast to conventional communication systems, classic metrics, such as the bit-error rate and symbol-error rate, are unable to adequately quantify the performance of semantic communication systems. Instead, we have to consider whether there is a semantic gap between the transmitted and the received text. Hence, we take advantage of the BLEU score [47] and the BERTScore [48] for characterizing the communication performance, while utilizing the semantic impairment intensity for quantifying semantic impairments.
1) BLEU Score: The BLEU score utilizes the n-gram matching criterion for evaluating the integrity or intensity of the received text. For example, if we take the sentence "I saw the sun rise", the 1-grams would be 'I', 'saw', 'the', 'sun', and 'rise', while the 2-grams would be 'I saw', 'saw the', 'the sun', and 'sun rise'. We denote the count of the k-th element of the n-gram text by $C_k$, while the weight of the n-gram precision and the penalty index are denoted by $W_n$ and BP, respectively. The BLEU score combines the $W_n$-weighted n-gram precisions with the penalty BP, which is determined by $l_R$, the length of the received text, and by $l_T$, the length of the transmitted text. The value of the BLEU score is between 0 and 1, and a higher score implies a more similar sentence. The BLEU score is efficient, but it only estimates the literal, rather than the semantic difference. As a result, we also harness the BERTScore as the metric of quantifying the semantic similarity between two sentences.
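For concreteness, a minimal sketch of the standard sentence-level BLEU computation of [47] is given below. It assumes uniform weights $W_n$ by default, applies no smoothing, and uses the usual brevity penalty, namely 1 if $l_R > l_T$ and $\exp(1 - l_T / l_R)$ otherwise. The helper names `ngrams` and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(received, transmitted, max_n=4, weights=None):
    """Sentence-level BLEU of the received tokens against the transmitted ones.

    ASSUMPTIONS: uniform weights unless given, and no smoothing, so the
    score is 0 whenever some n-gram order has no match.
    """
    weights = weights or [1.0 / max_n] * max_n
    log_precision = 0.0
    for n, w in zip(range(1, max_n + 1), weights):
        rx_counts = Counter(ngrams(received, n))
        tx_counts = Counter(ngrams(transmitted, n))
        total = sum(rx_counts.values())
        # Clip each received n-gram count by its count in the transmitted text.
        matched = sum(min(c, tx_counts[g]) for g, c in rx_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precision += w * math.log(matched / total)
    l_r, l_t = len(received), len(transmitted)
    bp = 1.0 if l_r > l_t else math.exp(1.0 - l_t / l_r)  # brevity penalty
    return bp * math.exp(log_precision)
```

A received sentence that drops the final word of "I saw the sun rise" still has perfect 1-gram and 2-gram precision, so its score with `max_n=2` is reduced only by the brevity penalty.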
2) BERTScore: The BERTScore quantifies the semantic similarity and applies different weights to words according to their corresponding semantic importance. It was shown in [48] that the semantic similarity assessed by the BERTScore is closely related to human judgements.
We denote the corresponding vector representation of the transmitted text $S$ by $\langle T_1, T_2, \ldots, T_n \rangle$, and the vector representation of the received text $\hat{S}$ by $\langle R_1, R_2, \ldots, R_m \rangle$. All these vectors are calculated by the BERT model. The importance weight function idf(·) is evaluated over the test corpus $R_0, R_1, \ldots, R_M$. The BERTScore between the transmitted and the received text is obtained from these vectors and their importance weights. Next, the BERTScore is stretched to an expanded range by a rescaling transformation, where $b$ is a scaling factor. The rescaled BERTScore ranges from -1 to 1, and a higher score implies a higher similarity between the pair of input sentences.
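The following sketch illustrates one common form of the computation in [48]: idf-weighted greedy matching of cosine similarities, followed by baseline rescaling. It rests on several assumptions: the recall-style variant is used (the text does not state whether precision, recall or F1 is meant), the idf is plus-one smoothed, and toy vectors stand in for BERT embeddings. All function names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def idf(token, corpus):
    """Plus-one smoothed inverse document frequency over tokenized sentences."""
    m = len(corpus)
    df = sum(1 for sentence in corpus if token in sentence)
    return math.log((m + 1) / (df + 1))


def bert_score_recall(tx_vectors, rx_vectors, tx_weights):
    """ASSUMPTION: recall-style score; each transmitted vector is matched to
    its most similar received vector, weighted by its importance weight.
    Requires at least one positive weight."""
    tx = [_unit(v) for v in tx_vectors]
    rx = [_unit(v) for v in rx_vectors]
    weighted = sum(w * max(_dot(t, r) for r in rx) for t, w in zip(tx, tx_weights))
    return weighted / sum(tx_weights)


def rescale(score, b):
    """Stretch the score using the scaling factor b."""
    return (score - b) / (1.0 - b)
```

In practice the vectors would be contextual embeddings produced by a BERT model, and `b` would be estimated empirically as in [48].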
3) Semantic Impairment Intensity: Moreover, to quantitatively characterize the semantic impairments, we devise a new metric, namely the semantic impairment intensity (SII). It is computed from BLEU($S$, $U$), where BLEU(·) is the function quantifying the so-called bilingual evaluation understudy (BLEU) score between the corrupted sentence, $S$, and the uncorrupted sentence, $U$. The higher the SII, the stronger the semantic impairments in the source text.
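A minimal sketch of such a metric is given below. It assumes the complementary form SII = 1 - BLEU($S$, $U$), which is consistent with the statements that a higher SII indicates stronger impairments and that uncorrupted text has SII = 0; the exact definition is not recoverable from the text above. A clipped unigram precision serves as a stand-in for a full BLEU implementation, and both function names are hypothetical.

```python
from collections import Counter


def unigram_bleu(candidate, reference):
    """Clipped unigram precision: a simplified stand-in for full BLEU
    (single n-gram order, no brevity penalty)."""
    if not candidate:
        return 0.0
    ref_counts = Counter(reference)
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(candidate).items())
    return matched / len(candidate)


def sii(corrupted, uncorrupted, bleu_fn=unigram_bleu):
    """ASSUMPTION: SII = 1 - BLEU(S, U), with S corrupted and U uncorrupted."""
    return 1.0 - bleu_fn(corrupted, uncorrupted)
```

Under this assumption, a five-word sentence with a single misspelt word yields an SII of about 0.2 with the unigram stand-in.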

D. System Performance
We conducted comprehensive experiments to validate the performance of the proposed semantic communication systems relying on our semantic impairments datasets.
1) System Performance Versus SNR: Fig. 6 illustrates the performance of our systems for transmission over AWGN channels at various signal-to-noise ratios, in the face of different types of semantic impairments. Specifically, Fig. 6(a), Fig. 6(b), and Fig. 6(c) show the BLEU score of our systems in the face of ASR errors, spontaneous spelling errors, and induced spelling errors, respectively. These test datasets have a semantic impairment intensity of 0.4, and the models are trained on a combination of the three kinds of semantic impairments.
Observe by comparing Figs. 6(a) to 6(c) that our semantic communication systems exhibit lower BLEU scores when tested on the ASR error dataset than on the other types of semantic impairments. This result indicates that correcting ASR errors presents the gravest challenge for semantic communication systems. A plausible reason for this is that ASR errors are more complex and more varied, making them more difficult to correct. Nonetheless, our solutions still achieve significant improvements in correcting ASR errors, hence they are eminently suitable for practical real-world applications, such as speech recognition.
Furthermore, the results of Fig. 6 suggest that DeepSC struggles to eliminate the semantic impairments inflicted by the ASR error dataset, as evidenced by its BLEU scores remaining below 0.6 even at high SNRs. This indicates that DeepSC lacks the capability of eliminating semantic impairments without dedicated designs.
By contrast, R-DeepSC efficiently mitigates the semantic impairments imposed by all three datasets, as evidenced by its superior BLEU scores in Fig. 6. This is because R-DeepSC is specifically designed for correcting semantic errors through robust semantic encoding by relying on the semantic corrector and the calibrated self-attention mechanism of Fig. 2.
Similarly, observe in Fig. 6 that NA-RDeepSC is also capable of mitigating both spontaneous and induced spelling errors. However, it struggles to correct ASR errors, as evidenced by its lower BLEU scores compared to R-DeepSC. This is because NA-RDeepSC utilizes non-autoregressive decoding, which limits its ability to handle the complex errors inflicted by the ASR datasets.
In addition to AWGN channels, we also conducted experiments under Rician fading channels (k = 1) associated with various literal errors. The results shown in Fig. 7 exhibit similar trends to those under AWGN channels. Explicitly, the error correction capability quantified in the face of these datasets follows the order of ASR error < induced spelling error < spontaneous spelling error, in line with the gravity of the afflictions experienced.
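For readers unfamiliar with the channel model, the sketch below draws a single Rician fading coefficient using the standard construction of a fixed line-of-sight component plus a complex Gaussian scattered component, normalized to unit average power. Interpreting k as the Rician K-factor and the function name `rician_coefficient` are assumptions.

```python
import math
import random


def rician_coefficient(k_factor, rng):
    """Draw one complex Rician fading coefficient with unit average power.

    ASSUMPTION: `k_factor` is the ratio of line-of-sight power to
    scattered power (k = 1 in the experiments above).
    """
    los = math.sqrt(k_factor / (k_factor + 1.0))  # line-of-sight amplitude
    sigma = math.sqrt(1.0 / (2.0 * (k_factor + 1.0)))  # per real dimension
    return complex(los + rng.gauss(0, sigma), rng.gauss(0, sigma))
```

A received symbol would then be modeled as the transmitted symbol multiplied by this coefficient, plus additive Gaussian noise.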
Furthermore, the performance gap between NA-RDeepSC and R-DeepSC becomes narrower under Rician fading channels than under AWGN channels. This could be attributed to the fact that Rician fading channels often impose more severe channel impairments at a given SNR. However, the proposed NA-RDeepSC beneficially leverages both the calibrated self-attention mechanism and the non-autoregressive decoding architecture, which allow it to handle the complex errors arising in Rician fading channels more effectively. As a result, the transmission performance of NA-RDeepSC is closer to that of R-DeepSC under Rician fading channels, despite the complexity of the propagation environment.
2) System Performance Versus SII: To further evaluate the performance of these communication systems, we conducted experiments under various semantic impairment intensities, namely 0, 0.2, 0.4, 0.6, and 0.8. Fig. 8 shows the performance versus SII at an SNR of 18 dB for an AWGN channel. The test set is composed of three types of semantic impairments.
The results indicate that both R-DeepSC and NA-RDeepSC outperform the other systems, especially when the SII is greater than 0.2. This demonstrates that R-DeepSC and NA-RDeepSC are capable of supporting robust text transmission.
Fig. 9 shows the performance of semantic communication systems versus the SII under Rician fading channels. The results demonstrate that the semantic fidelity of conventional communication systems is significantly degraded in the face of Rician fading channels, while our semantic communication systems achieve superior robustness, as evidenced by both the BLEU score and BERTScore. This is because the proposed NA-RDeepSC and R-DeepSC leverage joint semantic-channel coding methods, allowing them to handle the complex impairments inflicted by Rician fading channels more effectively. Additionally, the proposed models achieve higher semantic fidelity than DeepSC, even in the face of violently fluctuating SII.
The BLEU score and BERTScore metrics used in this study quantify the text similarity differently, with BLEU evaluating character-level similarity, while the BERTScore measures semantic similarity. Although they show a similar tendency in most cases, this is not always so. For example, in Fig. 8, NA-RDeepSC achieves a similar BLEU score to the conventional communication systems using LDPC coding and adaptive modulation at SII = 0.2, while achieving higher semantic fidelity, as evidenced by its BERTScore. This highlights the importance of considering both metrics for confidently quantifying the performance of semantic communication systems, since they provide different insights in terms of character-level and semantic-level fidelity.
Fig. 10 demonstrates the loss evolution of the proposed NA-RDeepSC. It can be observed that the MI loss keeps on increasing, while the other components of the loss function gradually decrease and eventually converge, demonstrating the effectiveness of the system. Table IV shows the transmission results of samples containing different kinds of semantic impairments for SII = 0.4, which further demonstrate the effectiveness of our proposed models.

E. Computational Complexity Analysis
The proposed NA-RDeepSC exhibits higher inference speed than DeepSC and R-DeepSC as a benefit of its non-autoregressive architecture, which allows for parallel computation and $O(1)$ time complexity. By contrast, DeepSC and R-DeepSC rely on sequential decoding, resulting in a decoding time complexity of $O(n)$. As detailed in Table V, the inference of NA-RDeepSC is significantly faster. This substantial acceleration improves the efficiency of NA-RDeepSC for semantic communications, making it a practical solution for online services.
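The difference in decoding complexity can be illustrated by counting sequential decoder passes, as in the toy sketch below. The callables `step_fn` and `parallel_fn` are hypothetical stand-ins for the semantic decoders; the sketch counts passes only and does not model the actual networks or their wall-clock time.

```python
def autoregressive_decode(step_fn, length):
    """Generate tokens one by one; every step depends on the previous tokens,
    so `length` sequential decoder passes are needed."""
    tokens, passes = [], 0
    for _ in range(length):
        tokens.append(step_fn(tokens))
        passes += 1
    return tokens, passes


def non_autoregressive_decode(parallel_fn, length):
    """Generate all tokens in a single decoder pass."""
    return parallel_fn(length), 1
```

For a sentence of n tokens, the first routine needs n dependent passes, whereas the second needs a single pass whose per-token work can be parallelized.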

V. CONCLUSION
We commenced by categorizing semantic impairments into ASR errors, spontaneous spelling errors, and induced spelling errors. To investigate their impact, we have generated semantic impairment datasets and devised the SII metric for our further analysis. We have then conceived the R-DeepSC and NA-RDeepSC schemes. R-DeepSC employs the novel semantic corrector of Fig. 2 to perform robust semantic encoding and an autoregressive scheme for semantic decoding. NA-RDeepSC incorporates R-DeepSC into a non-autoregressive scheme by adopting an adaptive generator, thereby accelerating inference. The experimental results demonstrate that both R-DeepSC and NA-RDeepSC are more robust than the benchmarks, as evidenced by their BLEU score and BERTScore. By applying R-DeepSC and NA-RDeepSC, robust semantic communications can be supported, which could pave the way for their application in real-world scenarios.

Fig. 4. The robust semantic encoder developed, which relies on a semantic corrector and a calibrated self-attention mechanism.

Algorithm 1 Algorithm of Calibrated Self-Attention Mechanism
Input: Q: The query of the input sentence; K: The key of the input sentence; V: The value of the input sentence; N: The number of encoder layers.
Output: M: The semantic output.
1: C = None
2: for each i ∈ [1, N] do
3:   if C = None then

In the loss function, $L_{CE}(\cdot)$ aims for making the uncorrupted text, $U$, and the received text, $\hat{S}$, as similar as possible. Furthermore, $L_{MI}(\cdot)$ maximizes the capacity or the data transmission rate by maximizing the mutual information between the transmitted signal, $X$, and the received signal, $Y$. $L_{SC}(\cdot)$ is the predefined loss used for training the semantic corrector. Finally, $L_{AG}(\cdot)$ is the loss employed for training the adaptive generator. The proportions of $L_{MI}(\cdot)$, $L_{SC}(\cdot)$, and $L_{AG}(\cdot)$ in the loss function can be controlled by the positive parameters α, β, and γ.

Algorithm 2 Algorithm of Model Selection
Initialization: Load the pretrained R-DeepSC and NA-RDeepSC;
Function: Select model for inference.
Input: Business type b_t
1: if b_t is time-sensitive business then
2:   Load the parameters of NA-RDeepSC for transmission.
3: else
4:
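The selection logic of Algorithm 2 can be sketched as follows. The fallback to R-DeepSC in the else branch is an assumption, since that step is truncated in the listing; the function name, the dictionary keys, and the business-type labels are likewise hypothetical.

```python
def select_model(business_type, models, time_sensitive_types=("time-sensitive",)):
    """Pick the pretrained model used for transmission.

    `models` maps the names "R-DeepSC" and "NA-RDeepSC" to loaded models.
    """
    if business_type in time_sensitive_types:
        # Time-sensitive traffic uses the faster non-autoregressive model.
        return models["NA-RDeepSC"]
    # ASSUMPTION: the truncated else branch is taken to select R-DeepSC.
    return models["R-DeepSC"]
```

Here the dictionary would hold the two models loaded during the initialization step.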

Fig. 6. System performance in AWGN channels versus the SNR with various types of semantic impairments.

Fig. 7. System performance in Rician fading channels for k = 1 versus the SNR with various types of semantic impairments.

TABLE IV TRANSMISSION RESULTS FOR SAMPLES WITH SII = 0.4

TABLE V DETAILS OF INFERENCE TIME

NA-RDeepSC requires only about 22% of the inference time required by R-DeepSC.