Radar Self-Following Shopping Cart Based on Multi-Sensor Fusion

Long checkout queues are common in busy supermarkets worldwide. To improve operating efficiency and reduce the cost of manual supermarket operations, this article proposes a multi-sensor fusion shopping cart system that mainly includes image recognition, radar self-following technology, and line patrol. An improved SIFT algorithm effectively identifies a variety of commodities. Relying on a public cloud platform, discount information is recommended through a WeChat mini program, and a member login interface was developed. After the total price is computed, it is displayed on an EAIDK-610 user interface built with Tkinter to complete checkout. The LD-14 radar realizes autonomous following and pedestrian obstacle avoidance. When a sensor in the system fails or its detection deviation is too large, multi-sensor fusion keeps the system operating normally thanks to a certain degree of redundancy, improving the overall credibility and effectiveness of the data. This not only solves the problem of queuing for checkout in supermarkets, but also helps brand merchants achieve precise marketing and scenario access. Finally, computer vision technology and various embedded hardware devices were used to build a hardware prototype of the radar self-following shopping cart for experimental verification. The average precisions (APs) of the system reach 98.31% and 97.28%, respectively.


I. INTRODUCTION
In today's society, supermarket shopping carts, as tools that meet people's consumption needs in the supermarket sales model, play a crucial role in promoting the future unmanned process. Their manufacturing level and popularity have become a benchmark for measuring digital services in a smart city [1]. The O2O (online-to-offline) model [2] has successfully transformed many traditional enterprise models, bringing consumers a better consumption experience. The reform of conventional supermarkets is therefore imperative: building on existing enterprise operations, it should optimize the pain points in services and continue the current diversification of payment methods [3]. Effectively analyzing the advantages and disadvantages of offline supermarkets, amplifying their benefits, remedying their weaknesses, and combining them with mobile intelligent platforms can better serve consumers [4].
With more and more research focusing on smart supermarkets, the deep integration of fast internet technology, machine learning, multi-sensor perception, and computer vision has become a research hotspot, and digital technology is developing rapidly in intelligent supermarkets. Reference [5] builds an overall framework for understanding the primary forms and dimensions of smart services and, against the background of the service industry and the service economy, creates a conceptual model of the critical dimensions of intelligent services and intelligent service platforms. Reference [6] takes Alibaba's Hema Fresh new retail business as an example, combining fresh supermarkets and catering experiences with online warehousing, and analyzes the reconstruction path of the new retail industry driven by big data technology; it finds that these three essential elements evolve and cooperate with the support of new technology, providing consumers with new consumption scenarios. Reference [7] establishes the comprehensive impact of shopping motivation, store attributes, and demographic factors on the prediction of store format choice in Vietnam, designing a binary-choice model and using logistic regression to determine how these factors affect whether non-food and fresh food are purchased in traditional markets or supermarkets.
The commercialization of smart cities, the mainstream direction of the next stage of social development, is inseparable from the promotion of digital services and smart city construction [8], [9], [10]. Digital technology has become a trend in the development of future intelligent supermarkets [11]. A smart commercial supermarket can help consumers estimate the total cost of purchased commodities, reduce billing errors, and improve shopping efficiency by adding a QR-code scanning checkout function to shopping carts [12], [13], [14]. Reference [15] proposes a self-unloading robot system that applies digital technology to supermarket logistics and achieves horizontal or vertical grasping of commodities of different sizes; a structured execution program modularizes and layers the queuing of commodities, and a heuristic strategy adjusts their order. Reference [16] uses tools such as Arduino to design a cashier application integrating servers and databases; simulations over various variables show that its efficiency grows as the numbers of commodities and consumers increase. In the future, the system will also be extended to industrial research and development scenarios.
During epidemic periods, online shopping replaced supermarkets and shopping malls to reduce population exposure [17], [18]. Intelligent shopping tools can accelerate the deployment of unmanned supermarkets: in the context of normalized epidemic prevention and control, they help keep prevention measures in place, reduce person-to-person contact, and maintain normal economic development [19].
Embedding intelligent devices in large offline supermarket services and replacing manual services with artificial intelligence can not only reduce supermarket operating costs and save labor but also improve consumers' shopping experience and efficiency [20]. In [21], the authors propose a lightweight, low-cost intelligent checkout shopping cart based on mobile cloud computing and deep learning cloud services, which uses a YOLOv2 deep learning network on a Linux cloud server for cloud image recognition. Reference [22] proposes a passive RFID-based intelligent shopping cart that uses an ID3 decision tree algorithm to classify consumer shopping lists and determine discounts, minimizing queues at the checkout counter. Based on multi-sensor information fusion and servo motor technology, the authors of [23] designed an intelligent cart integrating photoelectric sensors, ultrasonic sensors, and a small SLR camera, which follows the shopper automatically and integrates shopping and payment. Reference [24] proposes a shopping list based on a radio frequency identification (RFID) sensor, an Arduino microcontroller, a Bluetooth module, and a mobile application; information for each commodity is displayed in the application, and consumers can easily manage the shopping list according to their preferences.
In stereoscopic image recognition, super-resolution (SR) methods based on convolutional neural networks have developed rapidly. Reference [25] applied the expectation-maximization attention mechanism to image super-resolution for the first time and proposed a progressive multi-scale feature extraction block (PMSFE) to extract feature information at different scales, reducing parameters and improving visual quality. Reference [26] exploits local and global cross-view features, proposing cross-view blocks that capture different spatial levels in different views and outperform current state-of-the-art stereo image SR methods in reconstruction quality and efficiency. Going further, reference [27] applied memory learning to stereo SR, proposing an interactive memory learning strategy that maps image features into latent spaces and establishes their correspondences, effectively improving the quality of stereo image detection.
To better enhance the consumer service experience, this article combines the problems consumers currently encounter in supermarkets with shopping tools, analyzes the service process of supermarkets and consumers' shopping process, and draws design inspiration from supermarket shopping carts in the unmanned sales mode. A checkout function is added to the shopping cart, so consumers can scan a QR code at any time and complete shopping through wireless payment.
The main contributions of this article are as follows: 1. Compared with traditional QR-code self-checkout, the system improves shopping efficiency, shortens the time consumers queue for checkout, and enhances the shopping experience. Real-time recognition of images captured by the camera is realized, which ensures the accuracy of commodity identification regardless of the state of the purchased commodity. The YOLOv4 detection model trained on a portable dataset is twice as fast as EfficientDet with comparable performance.
2. The proposed shopping cart system integrates payment and loading. Compared with traditional infrared following, LD-14 radar following achieves multi-angle following with higher sensitivity and is better suited to complex, changeable shopping malls extending in all directions. 3. A control scheme based on cloud remote data interaction is provided. Since the owner logs in through the WeChat mini program and permission is obtained from the cloud, shoppers are prevented from pushing other shopping carts by mistake during the shopping process; this increases the efficiency and accuracy of commercial supermarket operations and reduces the cost of hiring shopping guides, providing a reference for improving the operation of commercial supermarkets in the future.
4. Using Internet of Things technology, a hardware smart cart prototype with a multi-device management platform and multiple sensors is designed. The management platform connects consumers, embedded device location information, WeChat mini programs, and database platforms. The intelligent identification system and the motion control system use EAIDK-610 and STM32 main chips, respectively. The EAIDK-610 uses a high-performance Arm SoC (RK3399) and is equipped with the OPEN AI LAB embedded development platform; compared with the EAIDK-310 intelligent processing system, it focuses on machine vision processing. Compared with single-sensor systems, its machine vision pipeline also requires less processing time.
The remaining structure of this article is as follows: Section II introduces the related work; Section III presents the system in terms of hardware, mechanical architecture, and software, and introduces the linkage between the multi-sensor setup and the IoT system; Section IV reports the experimental results; finally, we summarize the work of this article.

II. RELATED WORK

A. IMAGE PROCESSING
Currently, the SIFT algorithm is the most commonly used feature point detection method in the intelligent container detection of commercial supermarkets. The algorithm is scale invariant and detects image keypoints by constructing a difference-of-Gaussians pyramid. The algorithm proceeds as follows.
Firstly, the input image is sampled to obtain images at different scales, and these images are stacked in a pyramid shape from bottom to top, from large to small. A Gaussian pyramid is constructed using the Gaussian convolution formula (equation (1)):

L(x, y, σ) = G(x, y, σ) * I(x, y), with G(x, y, σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)) (1)
Then the DoG pyramid (equation (2)) is used to construct a scale space, with each octave having one layer fewer than the Gaussian pyramid. The first layer of the first octave of the DoG pyramid is computed as the difference between the first and second layers of the Gaussian pyramid; repeating this operation constructs the whole DoG pyramid.
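The octave construction just described can be sketched in a few lines of numpy. This is a minimal illustration under simplifying assumptions (single octave, separable Gaussian filtering); the function names are ours, and a production system would use an optimized library such as OpenCV:

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D Gaussian kernel, normalized to sum to 1
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    # separable Gaussian filtering: blur rows, then columns
    k = gaussian_kernel(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)

def dog_pyramid(img, sigma=1.6, k=2**0.5, layers=4):
    # Gaussian stack for one octave, then DoG = difference of adjacent layers,
    # so the DoG octave has one layer fewer than the Gaussian octave
    gaussians = [gaussian_blur(img, sigma * k**i) for i in range(layers)]
    return [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
```

Extrema of these DoG layers across space and scale are the candidate keypoints filtered in the next step.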
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) (2)

Here, σ is the scale space factor, determining the degree of image blur and smoothness. After the previous operation, many points are retained; each is compared with the points in its neighborhood, and the extremum points are taken as candidate feature points. A threshold is then set to filter out points with poor contrast and stability. Comparing feature points mainly involves calculating the gradient magnitude (equation (3)) and gradient direction (equation (4)) between a point and its neighbors.
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²) (3)

θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))) (4)

In the formulas, L is the scale space in which the feature point lies, and m(x, y) and θ(x, y) are the magnitude and direction of the gradient at (x, y), respectively.
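The finite-difference gradient at a pixel, as used in equations (3) and (4), can be sketched as follows (an illustrative helper, not the paper's implementation):

```python
import math

def gradient_at(L, x, y):
    # finite differences in the blurred scale-space image L
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    m = math.hypot(dx, dy)        # gradient magnitude m(x, y), eq. (3)
    theta = math.atan2(dy, dx)    # gradient direction θ(x, y), eq. (4)
    return m, theta
```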
A multi-scale convolutional model was constructed based on the SIFT algorithm and the single feature pyramid model of the FPN network. The gradient histogram was obtained from the magnitude and direction of the gradient, the input image was sampled at multiple scales, and matches were determined against a threshold. While preserving robustness, the number of detectable feature points was increased, improving the accuracy of feature point detection. When extracting image features through convolution, changes in image resolution may increase the time spent on feature detection, so detection speed is improved by selecting key focus areas in the image. The scale space in the SIFT algorithm mimics how the human eye observes objects, focusing on the main parts of the target to obtain essential detail and emphasizing local features. In convolutional neural networks, attention mechanisms similarly improve efficiency by allocating attention resources sensibly, a concept analogous to scale space.
The attention mechanism can be divided into two approaches: soft attention and hard attention. The soft attention mechanism attends to all data without setting filtering conditions and distinguishes primary from secondary parts by computing attention weights; the hard attention mechanism filters out parts whose attention weights do not meet the requirements, that is, it no longer attends to those non-compliant parts.
First, the convolution operation produces a batch of feature maps (channels). Global average pooling then turns each two-dimensional channel into a real number representing the global distribution of responses on that channel (equation (5)):

z_c = (1 / (H × W)) Σᵢ Σⱼ u_c(i, j) (5)

The next step is the excitation operation (equation (6)):

s = σ(W₂ δ(W₁ z)) (6)

where δ is the ReLU activation, σ is the sigmoid activation function, and W₁ and W₂ are the weights of two fully connected layers. The purpose of this operation is to assign different weights to different channels and enhance attention to key channel domains.
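The squeeze-and-excitation style channel attention described above can be sketched in numpy. This is a minimal illustration; the weight shapes and function names are our assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def squeeze_excite(x, w1, w2):
    # x: feature maps of shape (C, H, W); w1: (C//r, C), w2: (C, C//r)
    z = x.mean(axis=(1, 2))                    # squeeze: global average pooling
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return x * s[:, None, None]                # re-weight each channel
```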
However, in the process of commodity recognition, the more feature points there are, the better the reconstruction effect. Therefore, while ensuring detection speed, increasing the number of detected feature points is also necessary. In order to avoid forgetting some regional feature points during the detection process, a direct mapping method is adopted to prevent the loss of image features as the network layer deepens.
To obtain an identity mapping, let H(x) = F(x) + x; when F(x) = 0, H(x) = x, which achieves the identity mapping. The learning objective thus becomes optimizing the difference between H(x) and x (i.e., the residual), and the training objective is to drive the residual toward 0, ensuring that image features are not significantly lost and that accuracy does not decrease as the network deepens.
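The skip connection is one line of code; the sketch below (illustrative only) makes the identity-mapping property explicit:

```python
def residual_block(x, f):
    # H(x) = F(x) + x: the skip path carries the input forward unchanged,
    # so when the learned residual F(x) is zero, H is an identity mapping
    return f(x) + x
```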

B. TARGET DETECTION
In deep learning algorithms, the main problem in image target recognition is to extract image features using deep convolutional neural networks, here by constructing a YOLOv4 network. Analyzing the existing YOLOv4 network essentially involves feature layer extraction and decision inference; to extract features as accurately as possible, these network layer structures must be combined.
Figure 2 shows the overall block diagram of the YOLOv4 object detection algorithm. An object detection algorithm is usually divided into four general modules: input, backbone network, neck network, and head output.
1. Mosaic data augmentation is applied to the input data, with each image having its corresponding sample boxes. Four images are concatenated into a new image, and the corresponding sample boxes are adjusted accordingly; this new image is fed into the neural network for learning. This method dramatically enriches the background of detected objects and allows batch normalization (BN) statistics to be computed over four images at once.
2. The CSP module of the backbone network first divides the feature mapping of the base layer into two parts and then merges them through a cross-stage hierarchical structure. This not only reduces the computational workload but also preserves model accuracy: it enhances the learning ability of the CNN while keeping the model lightweight, reduces the computational bottleneck of the whole model, and reduces the memory cost of the algorithm.
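The mosaic augmentation from step 1 can be sketched as a 2x2 paste plus a translation of the sample boxes into the mosaic's coordinates. This is a simplified illustration (equal-sized tiles, no random cropping or scaling):

```python
import numpy as np

def mosaic(imgs, size):
    # paste four equally sized (size, size, 3) images into a 2x2 grid
    out = np.zeros((2 * size, 2 * size, 3), dtype=imgs[0].dtype)
    out[:size, :size], out[:size, size:] = imgs[0], imgs[1]
    out[size:, :size], out[size:, size:] = imgs[2], imgs[3]
    return out

def shift_boxes(boxes, dx, dy):
    # boxes: (N, 4) arrays of [x1, y1, x2, y2]; translate into mosaic coordinates
    return boxes + np.array([dx, dy, dx, dy])
```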
The DropBlock used in YOLOv4 is also a regularization method to alleviate overfitting. DropBlock improves on the Cutout data augmentation method: whereas Cutout zeroes out regions of the input image, DropBlock applies the same idea to each feature map. The zeroing ratio is not fixed; it starts small and increases linearly during training.
The three feature maps are the detection results output by YOLO, each prediction comprising the detection box position (4 dimensions), detection confidence (1 dimension), and category (80 dimensions), adding up to exactly 85 dimensions. The final dimension of the feature map, 85, carries this information, while the other dimensions, N×N×3, index the reference position of the detection box, with 3 prior boxes of different scales per cell.
YOLOv3 used K-means clustering to obtain the sizes of prior boxes, and YOLOv4 continues this method, setting three prior boxes for each downsampling scale, so that a total of 9 prior box sizes are clustered. The first three prior boxes are assigned to the 76×76×255 feature map, the middle three to the 38×38×255 feature map, and the last three to the 19×19×255 feature map.
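The anchor clustering can be sketched with a plain K-means over ground-truth box widths and heights. This simplified sketch uses Euclidean distance and deterministic initialization for brevity; YOLO itself clusters with a 1 − IoU distance:

```python
import numpy as np

def kmeans_anchors(wh, k, iters=50):
    # wh: (N, 2) ground-truth box widths/heights
    # initialize centers spread across box sizes (sorted by area)
    idx = np.argsort(wh.prod(axis=1))
    centers = wh[idx[np.linspace(0, len(wh) - 1, k).astype(int)]].astype(float)
    for _ in range(iters):
        # assign each box to its nearest anchor, then move anchors to cluster means
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # smallest anchors first
```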
The detection box parameters x, y, w, and h are decoded from the prior box and the output feature map:

b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w e^{t_w}, b_h = p_h e^{t_h}

Here, σ(t_x) and σ(t_y) are the offsets of the bounding box center from the upper-left corner (c_x, c_y) of the grid cell, and (p_w, p_h) is the prior box size. From the four offsets predicted for the bounding box, the actual position of the box can be calculated using the formulas above.
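Decoding a single prediction can be sketched as follows (σ is the logistic sigmoid; the function name and the explicit stride argument are illustrative assumptions):

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    # (cx, cy): grid cell of the prediction; (pw, ph): prior box size
    sig = lambda t: 1.0 / (1.0 + math.exp(-t))
    bx = (sig(tx) + cx) * stride   # box center x, in input-image pixels
    by = (sig(ty) + cy) * stride   # box center y, in input-image pixels
    bw = pw * math.exp(tw)         # box width
    bh = ph * math.exp(th)         # box height
    return bx, by, bw, bh
```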

C. RADAR MODULE
The 2D radar has a high scanning rate of 50 Hz and can detect large protrusions or obstacles at high speed. Its 360° scanning angle suits positioning, area monitoring, and collision avoidance applications, and its very high angular resolution and accurate measurement provide precise navigation for autonomous carts and mobile robots. Distance data is received as packets over a separate TCP/IP channel, which meets the software development requirements.
To ensure that the smart cart accurately tracks shoppers in real time, this product adopts SLAM technology that does not rely on satellite navigation: the smart cart continuously gathers environmental observations through its sensors in unknown environments, gradually builds an environmental map, and eliminates positioning errors through repeated observation. The upper computer exchanges data with and controls the lower computer through the TCP communication protocol; the cart's position is measured by the onboard distance sensor and passed to the WiFi module through a serial port. The processed radar data is fused by the Gmapping algorithm (a 2D SLAM algorithm) to construct a two-dimensional grid map. The intelligent cart connects to the front end through Bluetooth, and the result is displayed on the WeChat mini program interface.
The LD-14 radar mainly consists of a laser ranging core, a wireless power transmission unit, a wireless communication unit, an angle measurement unit, a motor drive unit, and a mechanical casing.
Firstly, the ranging core adopts triangulation and can perform 2300 range measurements per second. In each measurement, the radar emits an infrared laser at a fixed angle, which is reflected back to the receiving unit when it encounters a target object; the distance is calculated from the triangle formed by the laser, the target object, and the receiving unit. The distance data is then fused with the angle values measured by the angle measurement unit to form point cloud data, which is sent to an external interface through wireless communication. Meanwhile, the motor drive unit drives the motor and regulates it to the specified speed with closed-loop PID control.
In the radar, the laser emits pulses that are reflected from the target and reach the photodetector, which converts them into electrical signals, thereby measuring the target's distance. The distance formula is

R = c Δt / (2n)

where R is the distance, c is the speed of light in vacuum, n is the refractive index of the propagation medium, and Δt is the flight time.
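The time-of-flight relation R = cΔt/(2n) is a one-liner; the factor of 2 accounts for the round trip. A sketch (function name is illustrative):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_distance(dt, n=1.0):
    # R = c * Δt / (2n): the pulse travels to the target and back,
    # hence the factor of 2; n is the refractive index of the medium
    return C * dt / (2.0 * n)
```
For example, a flight time of 1 µs in air corresponds to a target roughly 150 m away.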
The LD-14 supports both internal and external speed control; external speed control is used in this design, and a square wave signal must be connected to the PWM pin. The start, stop, and speed of the motor can be controlled through the PWM duty cycle. The conditions for triggering external speed control are: the input PWM frequency is 15-30 kHz; the duty cycle is strictly within the range of 45% to 55% (excluding 45% and 55%); and the continuous input time is at least 100 ms. Once triggered, the radar remains in the external speed control state, and the speed can then be adjusted by changing the PWM duty cycle.
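The trigger conditions above can be expressed as a simple validity check (an illustrative helper written from the conditions as stated, not vendor code):

```python
def pwm_triggers_external_control(freq_hz, duty, hold_ms):
    # conditions as described: 15-30 kHz PWM, duty cycle strictly
    # between 45% and 55%, held continuously for at least 100 ms
    return (15_000 <= freq_hz <= 30_000
            and 0.45 < duty < 0.55
            and hold_ms >= 100)
```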
After the cart enters the control program, the radar-follow mode is selected. The cart starts moving and automatically follows the object closest to it within a 360° range. However, the cart does not collide with the nearest object; it maintains a certain distance from it. The default distance value is X cm: if the distance to the nearest object exceeds this value, the cart follows, and if it is less than this value, the cart retreats.
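The follow/retreat decision can be sketched as a distance controller with a deadband. The setpoint and deadband values below are placeholders of our own (the paper leaves the default distance as X cm):

```python
def follow_command(distance_cm, setpoint_cm=50.0, deadband_cm=5.0):
    # keep the cart a fixed distance from the nearest tracked object:
    # advance when too far, retreat when too close, hold inside the deadband
    error = distance_cm - setpoint_cm
    if error > deadband_cm:
        return "forward"
    if error < -deadband_cm:
        return "backward"
    return "hold"
```

A deadband avoids oscillating between forward and backward when the shopper stands still near the setpoint.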

D. RADAR OBSTACLE AVOIDANCE
After the cart starts the control program, Radar Avoidance Mode is selected. The cart sets off with an initial forward speed and, if it encounters obstacles ahead, rotates to avoid them. However, the cart's automatic obstacle avoidance function is temporarily unsupported during remote control.
Considering that obstacles on the road are mostly irregular in shape, it is difficult to model and quantify them directly for collision detection during obstacle avoidance, so their shapes must be appropriately simplified. This article adopts a rectangular model for collision detection during the obstacle avoidance of intelligent carts. Determining whether the intelligent cart will collide with an obstacle can then be converted into determining whether any rectangle vertex of the intelligent cart falls inside the obstacle's rectangle along the avoidance trajectory. If a vertex of the intelligent cart overlaps the obstacle rectangle during avoidance, the trajectory will collide with the obstacle; if there is no overlap, the trajectory is safe. When the cart turns, the actual path essentially coincides with the reference path. To avoid a collision when passing the first corner, the actual path deviates from the reference path; after the first corner, obstacles ahead are observed and local path planning steers around them, after which the actual path gradually returns to the reference path. After passing the second corner, the cart smoothly reaches the vicinity of the target point. The approach has been applied in commercial supermarket scenarios with good results.
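For axis-aligned rectangles, the overlap test described above reduces to four comparisons. This is a simplified sketch (real cart footprints may be rotated, which would require a separating-axis test):

```python
def rects_overlap(a, b):
    # a, b: axis-aligned rectangles as (x1, y1, x2, y2) with x1 < x2, y1 < y2;
    # overlap iff the intervals overlap on both the x and y axes
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2
```

A candidate avoidance trajectory is safe if the cart's rectangle, swept along the trajectory, never overlaps any obstacle rectangle.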

E. MULTI SENSOR FUSION TECHNOLOGY
The basic principle of multi-sensor data fusion is to combine and process various types of sensor data at multiple levels and across different spatial information so that they complement and optimize each other, ultimately forming a consistent interpretation of the observed environment. Throughout the fusion process, the data fed back by multiple sensors must be fully exploited. The fundamental goal of sensor data fusion is to derive more effective information by combining the different types of information detected by different sensors at multiple levels and from multiple aspects. It draws on several fields, including signal processing, statistical estimation and inference, and deep learning. Because data fusion integrates data from multiple sensors, it has the following advantages over single-sensor processing: 1. when a particular sensor malfunctions, the system can continue to operate thanks to a certain degree of redundancy; 2. it enhances the overall credibility and effectiveness of the data. As shown in Figure 9, we adopt a multi-sensor fusion approach to track shopping carts. The 2D radar enables the intelligent shopping cart to perform multi-angle tracking; compared with traditional ultrasonic multi-sensor cooperation, 2D radar integration solves the problem of tracking accuracy. The shopping cart and infrared sensor cooperate to achieve automatic line-patrol homing after users finish purchasing, breaking away from traditional manual transportation.
The intelligent shopping cart target detection system integrates three sensors: a camera, the LD-14 radar, and an ultrasonic ranging module. The radar supports 3D point cloud detection and connects to the host through a USB interface; it weighs only about 100 g, which places no meaningful load on the shopping cart. The ultrasonic ranging module is connected to the target detection system and fixed in the same plane as the other detection sensors, correcting the target orientation when a sensor tilts. The camera is an industrial camera, also connected to the host through a USB port, providing image information. The camera and radar in the target detection system are fixedly connected and installed on a pan-tilt (gimbal) platform, which maintains the vertical stability of the sensors in three-dimensional space and preserves sensor data quality.
The purpose of multi-sensor fusion here is to improve accuracy by integrating the results of two types of sensors for human detection. Compared with a single sensor, which has a limited detection range or insufficient accuracy, the fused detection is more robust to environmental changes and pedestrian interference. When both sensors work normally, the fusion module fits their outputs to the single most likely target; when one sensor malfunctions or its detection deviation is large, fusion prevents a system crash and keeps the system operating normally. After joint calibration of the radar and camera, the target points obtained from the two sensors can be mapped onto the same plane. At the same time, the computational cost of the visual recognition algorithm is kept to an extremely low threshold without generating excessive load: the calculation process is simple, and the detection results of a single sensor are accurate without occupying system resources, effectively improving the stability of system operation.
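One common way to realize the behavior described above is inverse-variance weighted fusion with a fallback to the surviving sensor when the other fails. This sketch is an assumption of ours about how such a fusion rule could look, not the paper's exact method:

```python
def fuse(est_a, var_a, est_b, var_b):
    # inverse-variance weighted fusion of two 1-D position estimates;
    # if one sensor reports failure (None), fall back to the other
    if est_a is None:
        return est_b
    if est_b is None:
        return est_a
    wa, wb = 1.0 / var_a, 1.0 / var_b
    return (wa * est_a + wb * est_b) / (wa + wb)
```

The less noisy sensor dominates the fused estimate, and a failed sensor simply drops out instead of crashing the pipeline.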
As is well known, a cloud-interconnected multi-sensor fusion autonomous tracking system is an environmental sensing system. The autonomous following function can be implemented fully independently, and the entire system recognizes commodities. In the next section, we present the overall design framework of the system.

III. SYSTEM OVERVIEW
This article designs an intelligent system based on autonomous tracking, comprising control, cloud remote commands, and intelligent edge processing. The hardware structure of the entire platform is shown in Figure 10. The autonomous tracking system includes an intelligent control system and a data management system (covering pattern recognition and big data management).
Motion control system: the upper computer of the control system, the EAIDK-610, undertakes the motion monitoring tasks of the entire intelligent vehicle system, including human-computer interaction, information connection, data display, and sending control commands. The lower computer, an STM32, is mainly responsible for relaying intelligent vehicle motion commands, transmitting pulse commands, and processing feedback signals. The lower-computer motion control system comprises an STM32F103C8T6, an AT8236 motor drive board, and a JGB-520 DC motor.
Radar following system: the LD-14 radar obtains its own position and the position of the target object (the consumer) in an unknown environment. The lower computer acquires the processed target position information, and the STM32 board receives the signals and drives the motor drive board, controlling the intelligent shopping cart to follow the target object automatically.
Intelligent detection system: the intelligent control system includes modules such as the EAIDK-610, an LCD display, and an OV9750 camera. As the main host computer, the EAIDK-610 deploys the YOLOv4 training model and calls the camera to identify and detect commodities in real time. Thanks to its rich peripheral interfaces, it also communicates with the cloud through WiFi modules to upload checkout data and control the shopping cart's unlocking signal; as the main carrier of the edge intelligence hardware platform, it accelerates the deployment of scenario-based AI commodities in terminal applications.
Cloud platform system: abandoning the traditional reliance on coins to mechanically unlock shopping carts, the unmanned shopping cart system is unlocked through the coordination of the mini program and the cloud after members log in. By recording user purchase records in the cloud, supermarket warehousing can be optimized and personalized discounts pushed to users. The mini program can register a delivery address for the commodities, and after shopping, consumers can choose to have the merchant deliver them home. According to user needs, purchased commodities can be queried online, and reimbursement vouchers can be printed at any time.

A. MECHANICAL STRUCTURE
The flexible integration of the intelligent vehicle structure is the main task of mechanical design, and the primary consideration is the layout of the LD-14 radar and the sensors. Human-machine interaction is considered from the perspective of sensor measurement. Improving and innovating on the original vehicle model, the mechanical design of the unmanned vehicle covers: 1. The design of the body.
2. The layout of the hardware structure.
3. The design of the sensors and the human-machine interaction.
The tracked mobile chassis used in this article can adapt to complex road conditions with obstacles and is widely used in the military, firefighting, industrial, and agricultural fields. It has the following advantages: 1. The contact area between the track and the ground is large, the ground pressure is low, and sinkage on soft and uneven roads is small, giving good passing performance.
2. The tracked chassis can steer through the differential speed of the tracks on the two sides, giving a small turning radius and even allowing it to turn in place.
3. The treads on the support surface of the track resist slipping and provide high traction.
When the motor turns the drive wheel, the reducer's driving torque makes the drive wheel continuously roll up the track from the rear through the meshing between its teeth and the track chain. The grounded part of the track exerts a backward force on the ground, and the ground exerts a corresponding forward reaction force on the track, driving the vehicle forward. When the driving force is sufficient to overcome the walking resistance, the support wheels roll forward on the track surface and the machine moves forward. The front and rear tracks of the crawler running mechanism can be steered independently, reducing the turning radius.
When the track moves without sliding on the ground, the vehicle's travel speed equals the speed of the platform frame relative to the grounded track, which is numerically equal to the speed of the track winding motion. With no relative sliding of the track on the ground, let the average travel speed of the smart shopping cart be V_T; it is numerically equal to the average speed of the track winding motion:

V_T = (z_K · L_t · n_K) / 60

where L_t is the track pitch (m), z_K is the number of teeth on the driving wheel, ω_K is the angular velocity of the driving wheel (s^-1), and n_K is the driving-wheel speed (r/min). As the number of meshing teeth of the driving wheel increases, the speed of the track winding motion approaches its average value and tends to a constant.
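As a quick numeric check of the average-track-speed relation above (reconstructed here with z_K taken as the number of drive-wheel teeth; the example values are purely illustrative):

```python
# Average track winding speed V_T = z_K * L_t * n_K / 60, where z_K is the
# number of drive-wheel teeth (assumed symbol), L_t the track pitch in metres,
# and n_K the drive-wheel speed in r/min. Values below are illustrative only.

def track_speed(z_k, pitch_m, rpm):
    """Average track winding speed in m/s."""
    return z_k * pitch_m * rpm / 60.0

# e.g. a 12-tooth sprocket, 25 mm pitch, 120 r/min gives 0.6 m/s:
v = track_speed(12, 0.025, 120)
```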

B. SOFTWARE SYSTEM
EAIDK-610 is an embedded artificial intelligence development platform from Arm China. It runs the Fedora Linux distribution and is equipped with the deep learning inference framework Tengine and the lightweight CV acceleration library Bladecv. It can accelerate the deployment of AI commodities and scenario-based applications and provides a unified interface for artificial-intelligence applications.
The entire system consists of three hardware components: a webcam, the EAIDK-610, and a display terminal. The network camera captures high-definition commodity images, analyzes and encodes them locally, and transmits them to the image processing device, the EAIDK-610, over Ethernet. The EAIDK-610 receives the transmitted image data, decodes and analyzes it, and displays the target detection results in real time on the display terminal through the image interface. The software is developed in C++, using OpenCV and TensorFlow, deploying the trained models, and optimizing and accelerating computation with the Tengine framework.
When the system runs, the OV9750 webcam captures commodity images, encodes them, and transmits them to the EAIDK-610 over Ethernet. After decoding the image stream, the EAIDK-610 performs object detection on each frame, draws a box around the target's position, and accurately classifies the target. Finally, the annotated image is displayed in real time on the display terminal.
To enhance the driverless device, a WeChat mini program user interface was developed on this basis. It displays the shopping cart's driving path and determines its parking position through data interaction with the cloud platform, making it convenient for users to issue remote-control instructions.
Calculate the Euclidean distance between each descriptor in the test image and all descriptors in the template image. If the ratio of the nearest-neighbor distance to the next-nearest-neighbor distance is below a threshold, the feature point corresponding to the nearest neighbor is taken as a matching point. The template image that retains the most matching points after screening is the recognition result; the recognition process is shown in Figure 13.
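The nearest/next-nearest ratio test described above (Lowe's ratio test) can be sketched in plain NumPy; in the real system an OpenCV matcher would run over 128-dimensional SIFT descriptors, while the arrays and the 0.7 threshold here are illustrative.

```python
import numpy as np

# Lowe's ratio test over descriptor arrays. `test_desc` and `template_desc`
# stand for SIFT descriptor matrices (one row per keypoint); the 0.7 threshold
# is a common choice, not a value quoted from this design.

def ratio_test_matches(test_desc, template_desc, ratio=0.7):
    """Return (test_idx, template_idx) pairs passing the ratio test."""
    matches = []
    for i, d in enumerate(test_desc):
        dists = np.linalg.norm(template_desc - d, axis=1)  # Euclidean distances
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:                       # distinctive match
            matches.append((i, int(order[0])))
    return matches
```

A test descriptor lying almost exactly on one template descriptor passes, while a descriptor equidistant from two templates is rejected as ambiguous, which is the screening step the recognition pipeline relies on.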
When SIFT feature vectors are obtained, the same point may be assigned multiple orientations and thus appear as several distinct feature points. All or some of these may produce correct matching pairs, yet they are in fact the same point, so such duplicate matches must be removed according to their pixel coordinates. The random sample consensus algorithm (RANSAC) can remove most of these duplicate matching points effectively; as one of the standard algorithms for eliminating image mismatches, it is widely used.
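A toy RANSAC in the spirit described above: matched point pairs vote for a geometric model, and pairs inconsistent with the best model are discarded. A real pipeline would fit a homography from 4-point samples; the 2D-translation model below keeps the sketch short and is an assumption for illustration.

```python
import random
import numpy as np

# Toy RANSAC over matched keypoint coordinates: src[i] in the test image is
# claimed to match dst[i] in the template. Pairs inconsistent with the best
# translation model (mismatches, duplicates mapped to the wrong spot) are
# rejected. Parameters are illustrative.

def ransac_translation(src, dst, iters=200, tol=2.0, seed=0):
    """Return indices of inlier matches under the best translation model."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        k = rng.randrange(len(src))
        shift = dst[k] - src[k]                        # 1-point model
        err = np.linalg.norm((src + shift) - dst, axis=1)
        inliers = np.nonzero(err < tol)[0].tolist()
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```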
The software platform is designed with a structured programming method and is divided into three parts: intelligent image processing, multi-sensor fusion, and multi-data management. The trained YOLOv4 model is deployed on the EAIDK-610 and works with the improved SIFT algorithm to complete commodity image processing. Multi-sensor fusion is handled by the STM32F103C8T6. Data exchange among the embedded devices, the WeChat mini program, and the database platform can run across multiple device platforms. An LCD screen displays the commodity information collected by the camera together with the recognition results, enhancing the interaction between the smart device and the user.
Stable and reliable commodity recognition is a prerequisite for promoting the shopping cart. However, because of occlusion and the surrounding environment, feature-point detection alone cannot fully meet the requirements. We therefore built a dataset from the commodity types frequently sold in stores, photographing five commonly purchased items: apples, bananas, Coca-Cola, paper, and mineral water. These five commodity types serve as templates; the captured images are annotated with rectangular boxes using LabelImg (an image-annotation tool, with a custom label defined for each class) and saved as XML files in PASCAL VOC format. A fast image-annotation method is introduced to achieve image matching and acceleration with bounding boxes. Finally, the results of the five models above are compared.
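For illustration, one such PASCAL VOC annotation can be read back with the Python standard library; the XML below is a minimal hand-written sample in the format LabelImg produces, not a file from this paper's dataset.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written PASCAL VOC sample (not from the paper's dataset).
SAMPLE = """<annotation>
  <filename>apple_001.jpg</filename>
  <object>
    <name>apple</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) boxes."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes
```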
Five commodities, such as apples, are marked in Figure 16. As an image annotation tool, LabelImg saves the generated annotations directly as PASCAL VOC-format XML files, with no secondary conversion needed. Good recognition accuracy requires a rich dataset for building the model. We take the five commodities as examples to verify the effectiveness of the established model and to achieve comprehensive detection. The YOLOv4-related parameters must be set before model training. A learning-rate schedule is adopted; the training input image size is 800×800, training runs for 100 epochs with a batch size of 8, and the initial learning rate is 1e-3 (if the learning rate is too high, the updates jump too far, accuracy declines, and the best result cannot be reached). The dataset is split into a training set and a test set of 215 and 5 images, respectively. The model parameter settings are shown in the table below.

The specific process is as follows: after the consumer has placed the commodities and confirmed them, the system calls the camera to collect the commodity image and performs image preprocessing. The commodity detection module detects the image and cuts several sub-images of the commodities from the original image based on the detection results. The commodity recognition module extracts features from each sub-image and retrieves and matches them in the commodity feature library file to obtain the category of each commodity to be recognized.
Finally, the bill settlement module uses the commodity categories to look up information such as prices in the database and summarizes them to form a bill.
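The bill-settlement step reduces to a lookup and a sum; a minimal sketch follows, with placeholder prices that are not real data from this system.

```python
# Bill settlement sketch: recognized categories are tallied and priced.
# PRICE_TABLE values are hypothetical placeholders.

PRICE_TABLE = {"apple": 1.50, "banana": 0.80, "coca-cola": 2.00,
               "paper": 3.20, "mineral water": 1.00}

def settle(categories):
    """Total a list of recognized commodity categories into one bill."""
    bill = {}
    for c in categories:
        bill[c] = bill.get(c, 0) + 1
    total = sum(PRICE_TABLE[c] * n for c, n in bill.items())
    return bill, round(total, 2)
```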
The integrated platform for interconnected software and hardware systems designed in this article includes three important functional modules: user login module, cloud sharing platform module, and interconnected data management module. The user login module is a channel for direct interaction between intelligent systems and users, which needs to be easy to use, fully functional, and easy to control.
The WeChat mini program is developed on the application framework provided by WeChat, which encapsulates the essential functions of the WeChat client, such as the file system, network communication, task management, and data security, and exposes a complete set of JavaScript APIs to the upper layer. This lets shoppers easily check the quantity and type of commodities they have purchased, have their preferences sensed, and track the location of the shopping cart in real time, achieving data visibility on mobile terminals. It greatly enhances the consumer's shopping experience.
The user's checkout query relies on public cloud storage, and personalized discounts can be pushed for commodities the user often purchases, based on their habits. If too many commodities have been purchased, a delivery service can be chosen through the mini program. First comes the user login interface: logging into the membership system shows whether the shopping cart can be unlocked and whether promotional discounts are available.
The online settlement subsystem is one of the core components of the commodity recognition system. The detection model is a YOLOX model trained on the settlement-image dataset, and the feature extraction network is an improved MobileNet-V2 structure trained on the commodity-image dataset. Features are extracted from all commodity images in advance, and the resulting feature vectors are stored to form a commodity feature library file. The flowchart of this settlement process is shown in Figure 20.
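The feature-library lookup at the heart of this subsystem can be sketched as a nearest-neighbor search by cosine similarity; the tiny vectors below are placeholders, whereas the real system would store MobileNet-V2 embeddings.

```python
import numpy as np

# Feature-library retrieval sketch: each cropped commodity image yields a
# feature vector, matched against pre-extracted library vectors by cosine
# similarity. Library contents here are hypothetical placeholders.

LIBRARY = {                        # category -> stored feature vector
    "apple":  np.array([0.9, 0.1, 0.0]),
    "banana": np.array([0.1, 0.9, 0.1]),
}

def recognize(feature):
    """Return the library category whose stored vector is most similar."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(LIBRARY, key=lambda c: cos(LIBRARY[c], feature))
```

Extracting the library vectors offline, as the text describes, means checkout only pays the cost of one forward pass plus this cheap similarity search.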
First, the consumer logs into the mini program, establishes a connection with the smart cart, and unlocks it, putting the shopping cart into follow mode. The consumer can enter ''My Home Page,'' choose commodities to purchase based on discounts or personal preference, and click ''Publish to Cloud.'' Users in the system can browse the supermarket's commodity types and stock in their interface, locate the commodities they need, and add them to the shopping cart in the mini program.
After shopping, when the consumer arrives at the supermarket exit, they enter the payment password in the mini program to check out automatically. Users who have purchased only a few commodities can pack them up and take them home; users who have purchased many can choose the delivery service, in which case the shopping cart activates its self-inspection function, travels to the packaging point, and the goods are delivered home by the delivery staff. After checkout is completed, the shopping cart is locked, the user is logged out, and the shopping trip is complete.

FIGURE 20. Self-following shopping cart usage flowchart.
The backend management subsystem integrates four functional areas: account settings, commodity management, inventory management, and information statistics. Each area offers one or more functions. The commodity management area is the core of backend management and the innovation of the entire system.
OneNET is positioned as a PaaS (Platform as a Service) offering that aims to build an efficient, stable, and secure application platform between IoT applications and real devices. It is device-oriented, adapts to various network environments and standard transmission protocols, and provides fast access solutions and device-management services for different hardware terminals.
This design stores and matches the control data flow to guarantee real-time access to the location data and user information of the WeChat mini program and the EAIDK-610, achieving intelligent interaction and remote control of the entire shopping cart. Access to user data in the cloud requires providing the corresponding data-stream key.
The WeChat mini program at the information-transmission end is interconnected with the cloud platform and connected to the upper computer EAIDK-610 via a WiFi module. Two failure conditions of the cloud platform network are considered. First, when the phone is powered on and logs into the network but the first access attempt fails, the cloud platform sends a message to the EAIDK-610 through the WiFi module and calls the stop-function module; the smart shopping cart's brake stays engaged until the consumer successfully joins the network and the cart is unlocked. Second, if the cloud platform's wireless network is interrupted while the smart cart is operating, the terminal device's main function captures the NWK_JOIN_REQ event, processes it in the network-layer data-processing function, reinitializes the mini program on the mobile terminal, and loops through the network-rediscovery function until the consumer's mobile terminal rejoins the network. In the NWK_JOIN_REQ handler, a stop command in UART protocol data format is sent to the EAIDK-610 main controller to run the stop function, clear all status bits of the smart cart, and make sure it halts in place, preventing safety and quality accidents while the cart is not under control and ensuring stable, reliable operation of the system.
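A heavily hedged sketch of this fail-safe behaviour follows: on network loss a stop command goes out over UART and the cart stays held until the network rejoins. The class name, the frame bytes, and the event strings are all illustrative assumptions, not taken from this system's firmware.

```python
# Fail-safe sketch: when the cloud / mini-program link drops, push a stop
# command to the motor side over UART and hold the cart until the network
# rejoins. STOP_FRAME, event names, and the class are hypothetical.

STOP_FRAME = b"\xAA\x01\x00\x55"   # assumed UART stop-command frame

class CartLink:
    def __init__(self, uart_write):
        self.uart_write = uart_write   # callable sending bytes to the STM32 side
        self.locked = True

    def on_event(self, event):
        if event == "NWK_JOIN_REQ":    # network lost: brake and wait to rejoin
            self.uart_write(STOP_FRAME)
            self.locked = True
        elif event == "NWK_JOINED":    # consumer back online: release the cart
            self.locked = False
```

Defaulting to the locked state means any unhandled condition leaves the cart stopped, matching the safety-first behaviour the text describes.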
Because the cloud platform, as the network module, remains connected to the WeChat mini program terminal, it cannot by itself detect that a network sub-node has gone down. During normal use, the WiFi module therefore sends data-interaction commands to the EAIDK-610 controller through the wireless cloud network at fixed intervals to verify that communication is normal. If an interaction fails, the mini program displays a communication error, reminding the consumer that the shopping cart has malfunctioned and should be replaced with another one.

IV. EXPERIMENTS AND RESULTS
This chapter evaluates the test results through experiments. It verifies the accuracy of radar tracking and of detecting five types of commodities (apples, bananas, paper, Coca-Cola, and mineral water) in a real working environment, and verifies the entire system control process. The design considers the adaptability and individual differences of various commodities, so we repeated the experiments on different commodities. We made the following assumptions about the experimental environment: 1. The embedded intelligent shopping cart simulates a real shopping cart, and the motion control system simulates the motion of a real vehicle.
2. The signal coverage of the entire experimental environment is good.
3. The dedicated EAIDK-610 development board handles visualization processing, with real-time road-section information obtained through the camera.
4. The tested WeChat mini program is only for debugging and has not been released yet.
This section evaluates the commodity image detection algorithm based on the radar attention mechanism on the official dataset. The dataset is environmental information recorded by a 2D radar mounted on the intelligent cart 37 cm above the ground indoors. The radar records at a frequency of 13 Hz, with a 225° field of view and a 0.5° resolution.
The radar obstacle-avoidance module detects obstacles from obstacle images supplied by the upper computer and, in linkage with the lower computer STM32, drives the drive-wheel motors to make the cart turn and avoid them. We take three common obstacles as examples. Different obstacle images are pre-stored locally on the vehicle; detected images are matched against them in collaboration with the cloud, and after successful verification the unmanned vehicle avoids the obstacle safely. We ran 300 tests on the three obstacle types to verify recognition accuracy. When an obstacle is encountered, the front indicator light turns on and the cart waits 0.3 s before reacting. After the obstacle is correctly identified, the drive wheels turn and the cart is woken over the serial port (if permitted by the cloud).

A. COMMODITY IMAGE RECOGNITION EXPERIMENT
The dataset captures the dynamic changes of the intelligent vehicle's movement scene under specific control modes, and the OV9750 depth camera mounted at the front of the vehicle provides the depth data used to annotate it.
This dataset contains 220 annotated radar scans, divided into a training set (215 scans) and a test set (5 scans). The annotations cover the positions of two types of objects: shelves and target pedestrians. Since this research focuses only on commodity detection, background annotations such as shelves are ignored.
Following the standards of the target-detection field, the average precision (AP) at different association distances is used as the main evaluation index, supplemented by the equal error rate (EER, the operating point at which the false-acceptance rate equals the false-rejection rate) and the F1 score, the harmonic mean of precision and recall. Precision and recall are defined by the following formulas, which apply only to binary classification:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN) (15)

AP is the area under the precision-recall curve and represents the classifier's overall performance across precision and recall. The F1 score is the harmonic mean of precision and recall; both indicators range from 0 to 1, and the larger the value, the better the classifier performs.
Recall is the proportion of all positive cases that are correctly predicted, and evaluates how completely the detector covers the targets to be detected. Taking commodity recognition as an example, the commodities in the image are treated as positive cases; a high recall means the model finds most of the objects in the image. Precision is the proportion of true positives among the predicted results, and evaluates how exact the detector's successful detections are. High precision means that most of the commodities detected by the model really are commodities, with only a few non-commodity objects treated as commodities. In short, recall is the useful part of the detection results as a proportion of the useful part of the whole dataset, while precision is the useful part of the detection results as a proportion of all detection results.
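A worked numeric example of these definitions, with made-up TP/FP/FN counts rather than results from the experiments:

```python
# Precision, recall, and F1 from raw counts; the counts are illustrative.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 95 commodities found correctly, 5 false boxes, 5 commodities missed:
p, r, f1 = precision_recall_f1(95, 5, 5)
```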
The recognition results in Figure 27 show good performance for commodities with a single color; for commodities with diversified colors, recognition accuracy drops by 0.02. For an 800×800-pixel image, the actual processing time per frame is 0.09 s; in actual testing, we expect the communication device's feedback time to stay within 1 s. To verify the recognition effect of the SIFT-based algorithm, the five commodity types were evaluated on the dataset at an association distance of R = 0.5 m, with qualitative and quantitative evaluations conducted separately on the test and validation sets. The qualitative results are shown in Figure 25; the AP in Figure 32 is the area under the PR curve, and the quantitative results are listed in Table 4.
To keep the variables fixed, the improved algorithm model is likewise trained on a single training set. Four sets of radar scans are selected as samples for 40 rounds of training in total, using the Adam optimizer. The initial learning rate is 10^-3 and decays exponentially to 10^-6 over the course of training (after each iteration). Binary cross-entropy loss is used for classification and the L1 norm of the regression error for regression, converting the network output into detections.
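The learning-rate schedule just described can be sketched as an exponential decay from 10^-3 to 10^-6 over the 40 rounds; the per-round decay factor below is derived from those endpoints, not quoted from the paper.

```python
# Exponential LR decay from 1e-3 to 1e-6 over 40 rounds. The per-round factor
# is derived from the endpoints (an assumption about the schedule's shape).

INITIAL_LR, FINAL_LR, ROUNDS = 1e-3, 1e-6, 40
DECAY = (FINAL_LR / INITIAL_LR) ** (1.0 / ROUNDS)   # ~0.841 per round

def lr_at(round_idx):
    """Learning rate after `round_idx` decay steps."""
    return INITIAL_LR * DECAY ** round_idx
```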
According to the experimental results, under the R = 0.5 m evaluation on the test set, the highest AP of the first three commodities reached 100%, the highest AP of paper reached 95.24%, and the highest AP of mineral water reached 96.29%. The top three commodities reach a maximum F1 of 100%, with maximum F1 values of 98.1% for paper and 96.8% for mineral water. The AP and F1 values of this target-detection algorithm are thus substantial.
Nevertheless, except for mineral water, the accuracy indicators of all commodities exceed 95%. Investigation shows that the color, shape, and background of a commodity strongly affect its recognition. The slightly lower accuracy stems from detecting more complex items, which generates more false detections; the recall, however, is similar to that of the other commodities, indicating a relatively low miss rate.
For commodities with a single color and a relatively fixed shape, such as apples and bananas, precision is high at a score threshold of 0.5. The detection results are accurate, but a risk of missed detections remains.

V. SUMMARY AND PROSPECT
This article designs a radar-based autonomous-following shopping cart to assist consumers in shopping. The system automatically identifies commodities, carries them, and follows the consumer independently, enhancing the user's sense of autonomy. A member login system is also designed to optimize the warehousing of unmanned supermarkets while providing personalized discount promotions for users. Through Internet of Things technology, the interaction between the WeChat mini program and cloud data helps promote the shared concepts of autonomous driving and automatic following.
Experiments demonstrate the feasibility of commodity identification using the YOLOv4-tiny network on the OPEN AI LAB embedded development platform. At performance comparable to EfficientDet, its computing speed is twice as fast. Compared with the YOLOv3-tiny network, its average recognition accuracy is higher, at 97.28%; compared with single-sensor systems, the machine-vision pipeline has a shorter processing time.
In future work, we plan to replace the SIFT-based fast recognition algorithm with an advanced graph neural network or a CNN-based super-recognition 3D model algorithm, which would reduce parameter design, strengthen interactive memory learning, and further improve recognition speed and efficiency. We will also add a voice recognition function that can interact with customers at any time to answer and resolve their questions, and an AI-based scene-capture function to help supermarkets monitor shelf shortages and assist the intelligent management and efficient operation of supermarkets.