Design of Red Culture Retrieval System Based on Multimodal Data Fusion and Innovation of Communication Strategy Path

Cultural communication plays a vital role in social development and human interaction. Red culture, as an integral part of China's revolutionary history and socialist construction, holds significant meaning and exerts a wide influence. However, in the era of information technology, effectively disseminating red culture and stimulating public interest and participation has become an urgent challenge. In this study, we use advanced deep learning techniques to explore the use of multimodal data fusion for enhancing the effectiveness and impact of red culture communication. Specifically, we extract text features and image features from users' browsing information using BI-GRU and CNN, respectively. These features are then fused with user portraits to create a multi-source information fusion vector. Subsequently, we employ a BPNN (Backpropagation Neural Network) to perform user interest classification based on the fused features. Experimental results demonstrate that our proposed user recognition framework achieves an average recognition rate of 95.4% across three types of users, indicating high accuracy. Therefore, the user interest classification model incorporating fused multi-features presented in this paper offers a promising approach for future red culture communication, as well as intelligent user recommendation and analysis.


I. INTRODUCTION
Red culture holds a significant position and exerts a profound influence on modern Chinese culture. It serves not only as a crucial component of China's revolutionary history but also as an integral element of contemporary Chinese culture, playing a vital role in enriching its connotation and enhancing its influence [1]. With the changing times and societal developments, the status of red culture in Chinese society is also evolving and progressing. This integration ensures that the spiritual essence of red culture is closely intertwined with modern Chinese culture, rendering it an indispensable and pivotal aspect of Chinese culture [1]. In recent years, the Chinese government has intensified its efforts to safeguard and preserve red culture. Through diverse forms of cultural construction and tourism development, red culture has been effectively disseminated to society and the public. Simultaneously, red culture has been seamlessly integrated into various sectors of China's social development, thereby facilitating the inheritance of historical and cultural heritage as well as fostering innovative advancements in Chinese society. Consequently, red culture has emerged as a significant source of confidence in Chinese culture [2]. Red culture, as a vital part of China's revolutionary history and culture, possesses substantial historical, political, and cultural values. However, during the process of preserving and promoting red culture, challenges such as fragmented data, asymmetric information, and inadequate dissemination have emerged. Therefore, it is imperative to address these challenges and ensure the transmission of red culture, enabling a broader audience to appreciate its exceptional heritage [3].
The study of red culture has gained significant traction as a popular research field, encompassing various aspects such as historical background, connotation, and inheritance. Moreover, the dissemination methods of red culture are continuously evolving, incorporating innovative and expansive approaches. In addition to conventional methods like book publishing, TV dramas, and movies, emerging media platforms such as Internet live streaming and mobile applications have emerged, enabling a wider audience to access and learn about red culture [4]. Concurrently, red culture is increasingly integrated into people's daily lives through red tourism, cultural experiences, and thematic education. These initiatives ensure greater participation in the preservation and development of red culture [4]. To conclude, red culture holds immense importance in China's revolutionary history and contemporary culture, making its research and dissemination highly significant. Moving forward, it is crucial to reinforce the research and preservation of red culture and promote its transmission through various means, contributing actively to the great rejuvenation of the Chinese nation [5].
With the rapid advancement of Internet technology, the study of red culture has undergone a transformation from traditional academic research to digital research. This includes the exploration of red culture retrieval based on multimodal data and the utilization of AI and machine learning for red culture classification and prediction [6]. Artificial intelligence algorithms have the potential to enhance users' reading and learning experiences by providing more accurate red culture content recommendations based on their interests and preferences. Additionally, these algorithms can fuse multimodal data, such as images, videos, and texts related to red culture, to construct a robust red culture retrieval system. This system offers users convenient access to red culture content, thereby deepening their understanding and awareness of red culture. Thus, the role of artificial intelligence in the dissemination of red culture is pivotal, and its application is expected to further drive the development and preservation of red culture.
In this investigation, we propose an intelligent user identification method that integrates multimodal information fusion. This method incorporates user retrieval information, as well as textual and image information extracted from browsed news articles, to cater to the needs of red heritage culture. The primary objective is to swiftly identify and analyze users who have a specific interest in red culture. The contributions of this paper can be summarized as follows: (1) this paper implements text and image feature extraction of red culture learning users' browsing information using BI-GRU and CNN; (2) multi-source information is fused based on user profiles and the extracted text and image features; (3) user interest classification is completed based on the multi-source fusion features, with classification accuracy exceeding 95%.
The remainder of this paper is organized as follows: Section II introduces related work on user interest analysis; in Section III, the BI-GRU and CNN are both introduced to establish the user interest classification model; Section IV describes the experiment and result analysis; in Section V, we discuss the results and what attention should be paid to culture transmission; the conclusion is presented at last.

II. RELATED WORKS
A. CONTENT RETRIEVAL RESEARCH
Information retrieval is a crucial process for users to obtain the desired information, and dealing with the immense volume of available data has become a prominent research area. In terms of unimodal retrieval, image retrieval research began in the 1970s, primarily focusing on text-based image retrieval. This approach describes image features using text and utilizes text matching to achieve effective image retrieval [7]. Lowe introduced the scale-invariant feature transform method to address challenges related to poor invariance of global descriptors in scenarios like occlusion and luminance variations. Subsequently, the FV algorithm and VLAD algorithm were developed to enhance the overall information expression of images through coding local features [8]. To overcome the limitations of image retrieval methods based solely on visual features, semantic-based image retrieval emerged, incorporating natural language processing and traditional image retrieval techniques. In recent years, with advancements in deep learning theory and computer performance optimization, algorithms based on neural networks for feature extraction have rapidly evolved [9].
Text retrieval [10] involves organizing and processing text collections according to specific methods. The retrieval algorithm compares user-submitted keywords or sentences to retrieve relevant document information based on the user's requirements [11].
In recent years, the rapid growth of multimodal data, including text, images, videos, and audio, has surpassed the capabilities of traditional unimodal retrieval methods. Consequently, cross-modal retrieval has gained popularity, allowing users to submit queries of any modality and obtain results with diverse modalities. The key challenge in cross-modal retrieval is the ''media gap'' problem, which arises due to the inherent differences between modalities, making similarity measurement a significant challenge [12]. However, cross-modal similarity measures have made it possible to compute similarity directly without the need for a predefined common space [13]. Most current retrieval methods focus on retrieving both image and text modalities, but cross-modal retrieval is expanding to encompass various modalities such as audio and video. This integration of multiple modalities represents the future trend and direction of cross-modal research [14].

B. USER RECOMMENDATION STUDY
In the big data era, the problem of information overload has become increasingly severe, requiring users to invest significant time in selecting relevant information. Recommendation algorithms have emerged as effective solutions to address this challenge. These algorithms have been widely applied in various domains, including video websites, e-commerce platforms, and search engines, yielding substantial economic benefits. Consequently, research on recommendation algorithms has gained momentum as a hot topic.
Jiang et al. proposed a new collaborative filtering method by incorporating information entropy and double clustering into collaborative work, enabling the extraction of locally dense scoring modules [15]. Tian et al. designed a hybrid recommendation algorithm that utilized clustering techniques to alleviate the issue of data sparsity. They combined a big data platform with a hybrid recommendation algorithm to achieve personalized book recommendations [16]. Li et al. devised a collaborative filtering recommendation algorithm combining spectral clustering and transfer learning. This approach effectively addressed the challenges of data sparsity and the lack of knowledge transfer between multiple matrices [17].
To mitigate the over-reliance of traditional recommendation algorithms on user behavior, some scholars have conducted extensive research. From the aforementioned research, it becomes evident that the application of AI technology in big data analysis has significantly contributed to the acceleration of system development efficiency and enhanced user experiences in various fields. Particularly, in the context of propagating red culture, which is often perceived as dull, it becomes crucial to incorporate user behavior and browsing information. This integration enables cross-modal fusion recognition and facilitates the creation of more specific user tags, thereby enhancing the dissemination of red culture.

III. CNN-BI-GRU BASED MULTIMODAL INFORMATION FUSION USER IDENTIFICATION MODEL
In the dissemination of red culture, it is crucial to quickly propagate new ideas and new consciousness, requiring a certain degree of real-time capability. Therefore, during the process of user interest recognition, it is essential to achieve real-time recognition and updating of relevant browsing information. This recognition should go beyond the conventional user behavior portrait and encompass the integration of both image and text information from browsing data. By doing so, user interest recognition can be enhanced, providing detailed reference data for future red culture dissemination efforts. The structure of the model proposed in this paper is depicted in Figure 1.

A. TEXT FEATURE EXTRACTION BASED ON BI-GRU
To process textual information in this paper, the BI-GRU (Bidirectional Gated Recurrent Unit) model is chosen. The BI-GRU model is widely employed in text feature processing due to its effectiveness in analyzing contextual semantics, which is crucial given the biased nature of news content. BI-GRU is a deep learning model used for text sequence modeling and finds applications in various natural language processing tasks such as text classification and machine translation [23]. GRU, an improved recurrent neural network (RNN) structure, exhibits better modeling capabilities with a shorter memory distance. BI-GRU considers both past and future contextual information simultaneously within a single time step. This enables it to comprehensively capture semantic and contextual information in text. Figure 2 illustrates the structure of the BI-GRU model [24]. In Figure 2, the important feature that distinguishes GRU from traditional RNN and LSTM methods is its update gate z_t and reset gate r_t, which are calculated as shown in equations (1) and (2):

z_t = σ(W_z x_t + U_z h_{t-1} + b_z)    (1)
r_t = σ(W_r x_t + U_r h_{t-1} + b_r)    (2)

where σ represents the activation function, W and U are the weights of the cell, b is the bias, and x_t represents the input at each step. The BI-GRU consists of two independent GRU layers, one for the forward sequence (left to right) and the other for the reverse sequence (right to left). Each GRU layer contains an update gate and a reset gate, which control the flow of information and the updating of memory. Specifically, in the forward sequence, the input text is propagated forward through a GRU layer, and the hidden state is updated continuously over time. In the reverse sequence, the input text is propagated backward through another independent GRU layer, and the reverse hidden state is also updated over time.
Finally, the forward and reverse hidden states are concatenated or weighted and summed to form the final bidirectional hidden representation. The forward candidate hidden state and the forward update are shown in Eqs. (3) and (4):

h̃→_t = tanh(W_h x_t + U_h (r_t ⊙ h→_{t-1}) + b_h)    (3)
h→_t = (1 − z_t) ⊙ h→_{t-1} + z_t ⊙ h̃→_t    (4)

Similarly, for the reverse update, the candidate hidden state and the reverse update are shown in Eqs. (5) and (6):

h̃←_t = tanh(W_h x_t + U_h (r_t ⊙ h←_{t+1}) + b_h)    (5)
h←_t = (1 − z_t) ⊙ h←_{t+1} + z_t ⊙ h̃←_t    (6)

After the forward and reverse updates are completed, the final result is obtained by splicing and weighting the states. The concatenation of the forward and reverse hidden states is shown in equation (7), while the final weighted summation of the forward and reverse states can be expressed by equation (8):

h_t = [h→_t ; h←_t]    (7)
h_t = α h→_t + β h←_t    (8)

where α and β are the weights of the forward and reverse hidden states, respectively.
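As an illustration of the update rules in Eqs. (1)–(7), the following pure-Python sketch runs a GRU cell over a sequence in both directions and concatenates the per-step hidden states. The scalar inputs and hidden states and the parameter dictionary `p` are toy assumptions chosen for readability; the paper's actual network uses vector-valued states and learned weight matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step for a scalar input and hidden state (toy sizes).
    p holds assumed weights W_* (input), U_* (recurrent) and biases b_*."""
    z = sigmoid(p["Wz"] * x_t + p["Uz"] * h_prev + p["bz"])   # update gate, Eq. (1)
    r = sigmoid(p["Wr"] * x_t + p["Ur"] * h_prev + p["br"])   # reset gate, Eq. (2)
    h_cand = math.tanh(p["Wh"] * x_t + p["Uh"] * (r * h_prev) + p["bh"])  # candidate, Eq. (3)
    return (1.0 - z) * h_prev + z * h_cand                    # new state, Eq. (4)

def bi_gru(seq, p):
    """Run a forward and a backward GRU over `seq` and concatenate the
    per-step hidden states, as in Eq. (7)."""
    h, fwd = 0.0, []
    for x in seq:                      # left-to-right pass
        h = gru_step(x, h, p)
        fwd.append(h)
    h, bwd = 0.0, []
    for x in reversed(seq):            # right-to-left pass
        h = gru_step(x, h, p)
        bwd.append(h)
    bwd.reverse()                      # align backward states with positions
    return [(f, b) for f, b in zip(fwd, bwd)]
```

In practice both passes would use separate parameters; sharing `p` here just keeps the toy example short.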

B. CNN-BASED FEATURE EXTRACTION FOR NEWS IMAGES
In the process of spreading information related to red culture, image information is an important carrier besides text, and the color scheme, embedded text, and other content of the images are also crucial to the spread of red culture; therefore, this paper uses the CNN method to extract features from related news images. CNN is a very powerful tool for feature extraction from image data: through its unique structure and convolutional operations, CNN can effectively capture local features in an image and gradually combine and abstract them to generate a representation with high-level semantic information. A typical CNN consists of multiple convolutional layers, pooling layers, and fully connected layers. The convolutional layer performs feature extraction on the input image using convolutional operations, the pooling layer downsamples and reduces the feature dimension, and the fully connected layer maps the extracted features to the final output class. The convolutional operations of CNN perform feature extraction on the image through sliding windows (convolutional kernels); each sliding window performs a convolutional operation with a local region of the input image to generate the corresponding feature map. This local connectivity and weight sharing effectively reduce the number of parameters in the model and capture the spatially local features of the image. The convolution operation procedure is shown in equation (9):

y(i, j) = Σ_m Σ_n x(i + m, j + n) · w(m, n)    (9)

where x is the input feature map, w is the convolution kernel, y is the output feature map, (i, j) are the coordinates of the output feature map, and (m, n) are the coordinates of the convolution kernel. Pooling operations reduce feature dimensionality and retain primary information by aggregating the features in a local region R into a single value through subsampling over the feature map. The common pooling operations are max pooling and average pooling, which are calculated as shown in Eqs. (10) and (11):

y = max_{(m,n)∈R} x(m, n)    (10)
y = (1/|R|) Σ_{(m,n)∈R} x(m, n)    (11)

where y represents the pooling result of the layer. Therefore, in this paper, we use this method to extract and analyze the features of image data and form the corresponding feature sequences to provide the basis for subsequent user classification.
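The convolution and pooling operations of Eqs. (9)–(11) can be sketched directly on nested lists; this is an illustrative reference implementation, not the paper's actual (GPU-based) pipeline, and the single-channel input and square pooling window are simplifying assumptions.

```python
def conv2d(x, w):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries):
    y(i, j) = sum_m sum_n x(i+m, j+n) * w(m, n), Eq. (9)."""
    H, W = len(x), len(x[0])
    kH, kW = len(w), len(w[0])
    return [[sum(x[i + m][j + n] * w[m][n]
                 for m in range(kH) for n in range(kW))
             for j in range(W - kW + 1)]
            for i in range(H - kH + 1)]

def pool2d(x, size, mode="max"):
    """Non-overlapping pooling over size x size windows: max pooling keeps
    the largest activation (Eq. 10); average pooling takes the mean (Eq. 11)."""
    out = []
    for i in range(0, len(x) - size + 1, size):
        row = []
        for j in range(0, len(x[0]) - size + 1, size):
            window = [x[i + m][j + n] for m in range(size) for n in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out
```

For example, a 2x2 max pool over a 4x4 map yields a 2x2 map holding the largest value of each quadrant.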

C. USER IDENTIFICATION BASED ON USER BEHAVIOR WITH CNN-BI-GRU
After the basic user features and the text and image features of the browsing information are extracted separately, this paper classifies users based on these features. A variety of methods can be used to fuse text and image features: the combination can be as simple as concatenation, or a weighted average, etc. In this paper, due to the small amount of data and the few labels of the classification results, a simple concatenation is used to form a multimodal feature vector. After completing the construction of the feature vector, this paper uses a BPNN to complete the classification of users; the softmax function is shown in equation (12):

softmax(z_i) = exp(z_i) / Σ_j exp(z_j)    (12)

Using the softmax function as the activation function of the output layer helps the model separate the different classes and provides an intuitive probabilistic interpretation. At the same time, the derivatives of the softmax function have good mathematical properties, which facilitate the gradient calculation and parameter updates in model training.
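The concatenation-based fusion and the softmax output of Eq. (12) can be sketched as follows; the feature names are hypothetical placeholders, and the max-subtraction in `softmax` is a standard numerical-stability trick rather than part of the paper's formulation.

```python
import math

def fuse(text_feat, image_feat, user_feat):
    """Early fusion by simple concatenation into one multimodal vector,
    as chosen in this paper for its small dataset."""
    return list(text_feat) + list(image_feat) + list(user_feat)

def softmax(z):
    """softmax(z_i) = exp(z_i) / sum_j exp(z_j), Eq. (12).
    Subtracting max(z) first avoids overflow without changing the result."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]
```

The fused vector would then feed the BPNN, whose output layer applies `softmax` to produce one probability per interest class.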

IV. EXPERIMENT RESULT AND ANALYSIS
In response to the issue of user classification involved in the dissemination of red culture, we conducted relevant data model training and testing using the data stored in the current system, with a division ratio of 70% training data and 30% test data. Each data entry mainly includes modal data such as user browsing image data, text information, attention duration, and text content. Due to some missing data, a total of over 200 user records were obtained when considering only valid data.
In this paper, according to the user interest recognition model built in Section III, we analyzed the selected data and mainly completed the training of the model, the feature fusion test, and the comparison of classification methods. We divided the testers' interests into history, politics, and economy according to their interests in red culture content. The results are as follows:

A. MODEL TRAINING PROCESS AND THE RECOGNITION RESULT
In this paper, the relevant training of the model was carried out using the collected user data, and the variation of the loss function and accuracy during training is shown in Figure 3. We can see from Figure 3 that the loss decreases smoothly during the training process and the accuracy curve gradually improves; after about 20 rounds of training, the model reaches convergence, and its final average recognition accuracy reaches 95.4%.
To better analyze the performance of the model for user interest classification, we calculated three commonly used metrics in machine learning, Precision, Recall, and F1-score, and the results are shown in Figure 4. We can see that for the three types of data, the overall recognition rate is maintained at around 95%, and the F1-score is over 0.95. This indicates that the proposed fusion of multidimensional features for user interest recognition is more balanced and its results are more satisfactory.
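For reference, the three metrics reported above are computed per class in a one-vs-rest fashion; the following sketch shows the standard definitions (the label names in the test are illustrative, not the paper's data).

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1-score for one class, one-vs-rest:
    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = 2 * P * R / (P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```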

B. METHOD COMPARISON IN THE USER CLASSIFICATION
In the analytical validation of the model, classical machine learning methods such as SVM were selected for accuracy comparison in classification according to the fused features, and the results are shown in Figure 5. In Figure 5, we can find that the proposed method is significantly better than the other classical methods in recognition precision; it can be seen that, after fusing the features, the BPNN achieves better recognition because more nonlinear features are involved.

C. FEATURE CONTRIBUTION FOR THE RECOGNITION RESULT
After completing the classification comparison, we also performed a comparison of features, the results of which are shown in Figure 6, Table 1, and Table 2. In the feature comparison, we selected single-modal features and combinations of two and three features, and found that among these features, the recognition rate using user information alone is the lowest, while the overall recognition rate is improved after the introduction of browsing information; the overall recognition rate of the proposed method fusing three types of information features is significantly better than that of single features and two-feature combinations. The fusion of features is crucial to the final recognition accuracy of the model.

D. THE PRACTICAL TEST FOR THE MODEL
In this paper, after the construction of the model, we conducted a practical test in which we recruited relevant volunteers to evaluate and analyze the data. We conducted a practical classification test in which users were asked to browse relevant information at a specific time and match it with their interests; the resulting confusion matrix is shown in Figure 7. We can see from the confusion matrix in Figure 7 that the recognition rate of all three categories of user interests exceeds 90%, and the recognition accuracy of the economy category is the highest, because the data in this category occupy a larger proportion, which makes their overall characteristics more obvious and the recognition rate higher. In the confusion matrix, we can also find that the classification of the different categories is balanced and the overall misclassification rate is low.
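A confusion matrix like the one in Figure 7 can be built as below; the rows count true classes and the columns predicted classes, so off-diagonal entries are misclassifications. The label values in the test are illustrative placeholders, not the paper's data.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a confusion matrix: rows index the true class, columns the
    predicted class, following the order given in `labels`."""
    index = {lab: k for k, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m
```

The per-class recognition rate reported in the text is each diagonal entry divided by its row sum.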

V. DISCUSSION
In this paper, our focus is on studying communication methods in the realm of red culture dissemination. We propose a classification approach for user interests by fusing features from multiple information sources, including user retrieval information, user portraits, and browsing information. Specifically, for extracting features from textual data, we employ the BI-GRU method, which allows us to capture contextual information effectively. The BI-GRU model comprises two directional GRUs that process sequences in the forward and reverse directions, respectively. This architecture enables the model to learn dependencies and semantic information efficiently within the text. The GRU units incorporate a memory and update gate mechanism, which selectively forgets or updates previous states, facilitating the modeling of long-term dependencies within the sequences. With its bidirectional structure, the BI-GRU model excels at capturing contextual information and extracting rich semantic features from the text. On the other hand, the CNN model excels in processing image data and is adept at capturing local features and spatial structures within images. CNNs extract features through convolutional and pooling layers, where the convolutional layer detects local patterns and features, while the pooling layer reduces the feature map's size while preserving essential information. These convolution and pooling operations enable CNNs to learn abstract features in images, making them well-suited for image recognition and classification tasks. The effect of fusion stages is also discussed by Pawłowski [25], whose results illustrate that the choice of fusion technique for building a multimodal representation is crucial to obtaining the highest possible model performance from the proper modality combination. Similar results can also be found in the feature fusion investigations [26], [27]. In these studies, the authors discussed the feature fusion effect under different application scenarios, which provided more inspiration for the fusion choice in this paper. The comparison of different fusion strategies in our results shows that feature fusion using CNN and BI-GRU achieves the best performance.
By fusing features extracted from the BI-GRU and CNN models, we leverage the complementary nature of text and image data, resulting in a more comprehensive and integrated feature representation. Through this fusion process, regardless of the classifier employed, we achieve high recognition accuracy in the final recognition results, effectively meeting the requirements of the respective task. Red culture dissemination, as an important part of China's revolutionary history and socialist construction, possesses both historical uniqueness and the need for integration with contemporary society's needs and development [28]. In order to ensure its effectiveness and adaptability, attention should be given to key elements in the process of red culture communication. Firstly, red culture communication should emphasize diverse communication channels. While traditional forms of communication, such as red culture exhibitions and commemorative activities, retain their importance, the emergence of social media and digital technology necessitates the active utilization of these new media platforms for broader red culture dissemination. Through thoughtful planning and design, sharing multimedia content, including stories, audio, video, and pictures related to red culture, on social media can better capture the attention and engagement of the younger generation. Secondly, red culture communication should focus on innovative ways of expression. Traditional approaches often adopt a narrative and documentary style. However, in today's information age, people seek novel, creative, and participatory content. Thus, combining art, cultural creative industries, and new media technologies can create more appealing and interactive forms of red culture expression. For instance, virtual reality technology can immerse audiences in historical red culture scenes, while the design and promotion of cultural creative products can combine red culture elements with fashion, art, and other popular trends to capture the interest of a wider audience. Additionally, red culture communication should prioritize personalized needs and participation. When people receive information and cultural content, they tend to resonate and interact with it. Therefore, in the process of red culture communication, it is important to consider the interests and needs of different individuals and provide personalized content and experiences that inspire their participation and sharing. Customized red culture experience activities, online interactive discussions, and community building can foster a sense of participation and belonging among viewers and users, leading to a more positive communication effect. It is evident that introducing more artificial intelligence technologies can enhance these processes and improve overall communication efficiency, as illustrated in Figure 8, enabling users to have a better experience of the charm of red culture.

VI. CONCLUSION
In this paper, we have focused on studying user interests in the context of red culture communication. We have proposed a user classification framework that combines BI-GRU and CNN for feature extraction. The framework utilizes user portraits, such as user retrieval information, and integrates text and image features extracted from user browsing information. The BI-GRU model is employed for text feature extraction from browsing information, while the CNN model is used to analyze user browsing images. By fusing these features with user portrait information, we construct multi-source information feature vectors. The interests of red culture learning users are then classified using a BPNN, achieving recognition accuracies of over 90% for the three interest categories: history, politics, and economy. The performance of the framework was also evaluated through actual tests, yielding recognition rates of over 90% overall and over 95% for users interested in the economy. As discussed for the different data fusion strategies, the feature combination using CNN and BI-GRU reaches the most satisfactory result when compared with the other strategies. These results provide valuable data support and new insights for user profile construction and the design of user recommendation mechanisms in future red culture dissemination.
However, it is important to note that the identity of red culture learning users is complex, and their interests may extend beyond the aforementioned three categories to include other forms of red culture content dissemination.Further analysis is required to explore this aspect in future research.Additionally, expanding the data samples, enhancing the model's robustness, increasing the recognition types, and improving accuracy are areas that need further improvement.
Xiao et al. proposed an implicit feedback model for displaying feedback, introducing the alternating least squares optimization method ALS-WR [18]. Lu et al. presented a Bayesian probability tensor framework for tag and item recommendation, which better captured potential interaction patterns [19]. Zheng et al. mined shared check-in patterns from users in different regions and utilized these patterns to explore more similar users across regions [20]. They focused on studying collaborative filtering recommendation algorithms based on interest points, aiming to recommend interest points with similar characteristics to a user's preferred interest points. Zhao et al. incorporated co-occurring geographic influence to filter geographically noisy interest points and integrated it into a personalized pairwise preference ranking matrix decomposition model [21]. Li et al. integrated deep neural networks, migration learning techniques, and density-based resampling methods into a unified framework to develop a novel deep neural network model for cross-city point-of-interest recommendation [22].

FIGURE 1. Framework for the user classification.

FIGURE 2. The structure of the GRU.

FIGURE 3. The training loss and precision for the proposed framework.

FIGURE 4. The result for the user interest classification.

FIGURE 5. The method comparison for the classification.

FIGURE 6. The feature fusion comparison for the different methods.

FIGURE 7. The confusion matrix for the user interest recognition in the practical test.

FIGURE 8. What AI can do in the culture transmission.

TABLE 1. Feature fusion result for combining two features.

TABLE 2. Feature fusion result for combining three features.