Real-Time Analysis of Animal Feeding Behavior With a Low-Calculation-Power CPU

Our goal was to develop an automated system to determine in real-time whether animals have learned and changed their behavior, using a low-calculation-power central processing unit (CPU). The bottleneck of real-time analysis is the speed of image recognition. For fast image recognition, 99.5% of the image was excluded from the recognition process by distinguishing between the subject and the background. We achieved this by applying binarization and a connected-component labeling technique. This task is important for developing a fully automated learning apparatus, and the use of such an automated system can improve the efficiency and accuracy of biological studies. The pond snail Lymnaea stagnalis can be classically conditioned to avoid food that naturally elicits feeding behavior, and to consolidate this aversion into long-term memory. Determining memory status in the snail requires real-time analysis of the number of bites the snail makes in response to food presentation. The main algorithm for counting bites comprises two parts: extracting the mouth images from the recorded video and measuring the bite rate corresponding to the memory status. Supervised machine learning and image recognition were used to extract the mouth images. A change in the size of the mouth area was used as the cue for counting the number of bites. The accuracy of the final judgment of whether or not the snail had learned was the same as that determined by human observation. This method of improving the processing speed of image recognition has the potential for broad application beyond biological fields.

I. INTRODUCTION

Due to the variety of shapes and behaviors among individual animals, it is difficult to automate the evaluation of animal behaviors using a machine. Recent progress in machine learning and its improved availability, however, have enhanced the utilization of computers for assessing animal behavior.
Several automated devices using image recognition technology are commercially available for studying learning and memory in rodents. Many of these devices, however, require expensive high-precision computers with 3D video tracking systems, which is a limiting factor for researchers in institutes and laboratories with a restricted budget [1], [2]. Improving the processing speed of image recognition is therefore strongly needed, both for such laboratories and for broader applications beyond biological fields.
The pond snail Lymnaea stagnalis can be both classically and operantly conditioned in a manner that allows for monitoring the acquisition of learning and the subsequent consolidation of associative learning into long-term memory [3], [4]. Application of an appetitive stimulus, such as sucrose (the conditioned stimulus; CS), in naive snails increases the feeding response, whereas application of an aversive stimulus, such as KCl (the unconditioned stimulus: US), causes snails to withdraw into their shells and terminate feeding. Repeated presentation of CS-US pairs significantly suppresses feeding behavior in response to the CS presentation. This is referred to as conditioned taste aversion (CTA) [5]- [10]. In snails, CTA persists for more than a month [11].
Until recently, snail conditioning was performed manually, and the memory status of the snail was assessed by human observation [12]- [15]. Manual conditioning of a large number of snails is laborious and time-consuming for the experimenter. Further, it is difficult, if not impossible, to analyze multiple parameters simultaneously by human observation. For example, the size of the mouth opening and the time required for the first reaction to the CS presentation have not yet been systematically analyzed. Although an automated learning apparatus for snail CTA was recently developed, the analysis itself still requires manual judgements made by the experimenter [16].
We built an analytical system to determine the memory status of snails in real-time using a low-performance CPU. For accurate image recognition in real-time, 99.5% of the image was excluded from the image recognition process by distinguishing between the subject and the background. Here, we present a fully automated learning apparatus that covers all stages of learning from execution of the conditioning to assessment of the animal memory status on the basis of feeding behavior.

II. MATERIALS AND METHODS

A. Animals
Laboratory-reared freshwater pulmonate mollusks Lymnaea stagnalis with a shell length of 20-27 mm were used. All snails were maintained in dechlorinated tap water under a 12-h light:12-h dark cycle at 20-22°C. They were fed ad libitum on turnip leaves (Brassica rapa var. peruviridis, known as Komatsuna in Japanese) every other day. Food deprivation was performed 1 day prior to the conditioning.

B. Experimental Conditioning Apparatus
The basic concept and composition of the experimental setup were identical to that described previously [16], and are therefore only briefly summarized below.
The experimental system comprised multiple parallel independent training systems, each with a 50-ml test tube containing a continuously flowing water stream (3.3 ml/s). The system had 2-channel signal receivers and drivers equipped with transistor-transistor logic (TTL) that activated the CS and US in any temporal sequence, which was used to control the interstimulus interval and stimulus duration. The CS, i.e., the sucrose solution (100 mM, 2 ml/s for 5 s), or the US, i.e., the KCl solution (200 mM, 2 ml/s for 5 s), was applied by a trigger signal controlling a diaphragm valve connected to each reservoir. The TTL signal was initiated by an on-board microcontroller. In place of the 3-channel digital stimulator described in the previous study [16], we used the Arduino-Uno REV3 (http://arduino.cc/forum/). Video images of mouth movement, captured with a camera (Raspberry Pi Camera Module V2: Raspberry Pi Foundation; sensor image area 4.6 mm diagonal) placed underneath the test chamber under laboratory room-light conditions, were fed into an on-board microprocessor, the Raspberry Pi 3 (http://www.raspberrypi.org). The video images were transferred to a Linux-based notebook computer (OS: Raspbian 4.14, CPU: Intel® Centrino™ mobile technology, Intel® Pentium® M processor ULV 753, RAM: 1024 MB, HDD: 40 GB) or another Raspberry Pi 3 for image processing and analysis.
All experiments were performed at room temperature ranging between 20-25°C.

C. Image Capture
Images of snail feeding behavior were captured with the video imaging device placed underneath the test chamber through a transparent window, at a video rate of 6 frames per second (fps) with 32-bit 640 × 480 pixel color images. To detect the mouth portion of the snails, we extracted an area morphologically characterized by a dark round shape surrounded by light beige foot muscle. Lymnaea has a maximum feeding response to sucrose solution of 15 to 20 times per minute, corresponding to ∼0.3 Hz [3]. Thus, we could analyze the images at 6 fps (6 Hz) without loss of feeding events. The video images of the mouth movements were stored in a computer for subsequent analysis. The snail's response to the stimulus application was recorded for 1 min immediately after the termination signal for the stimulus application.
Image processing and machine learning were carried out using the open-source library OpenCV 3.1 (https://opencv.org/). OpenCV provides various interfaces (e.g., C, C++, Python). In the present study, we used Python 2.7 (https://www.python.org/) as the development platform.
To obtain the coordinates of the mouth in real-time, a machine learning algorithm was applied to characterize and identify the snail's mouth prior to the conditioning experiment. For this purpose, we used 4491 sample mouth images as positive references and 6583 non-mouth images as negative samples for machine learning using the Real AdaBoost algorithm with the Local Binary Pattern (LBP) feature detection operator, as described below.

D. Conditioned Taste Aversion (CTA) Procedure
Snails placed in the test tube were physically fixed in place at the anterior and posterior portions of the shell with a hand-made clip to prevent them from changing their position. Following 10 min acclimatization, the pre-test was performed to examine the innate feeding behavior before conditioning. Feeding behavior in response to the CS presentation was estimated by counting the number of mouth openings per minute. Originally this procedure was conducted by human observation through a mirror placed underneath the test tube, but in the previous report by Takigami et al. [9] this task was performed by video image observation captured with the Raspberry Pi Camera Module V2, and the video images were played back after the experiment for off-line evaluation.
After the pretest, a 10-min adaptation was interposed, and then repeated presentations of the CS-US pairs, i.e., forward conditioning, were applied 10 times with an interstimulus interval of 1 min. The CS was presented for 5 s, and then the US was applied beginning at the termination of the CS for 5 s at 2 ml/s. The conditioned response was observed 10 min after the last presentation of the CS-US pair. To examine the temporal specificity, we applied US-CS pairs, in which the sequence of the CS and US was reversed, i.e., backward conditioning, as a control group in addition to a naive group, which was placed in the test tube without any prior CS or US presentation (Fig. 2). In the present study, for the sake of simplicity, forward-conditioned snails are referred to as conditioned snails.
To determine whether the conditioning was successful, we observed the number of bites (i.e., mouth openings) elicited by presentation of the CS alone for 5 s beginning 10 min after the last CS-US/US-CS pairings (i.e., the 10-min post-test). After the 10-min post-test, the snails were returned to their home aquarium. All the behavioral experiments were performed in the morning.

E. Statistics
The data are expressed as the mean ± standard error of the mean (SEM). Significant differences were defined as P < 0.05. Welch's paired t-test was used to compare the number of bites between human visual observation and computerized automated-counting. Welch's independent t-test was used to assess whether the automated-counting could detect the significant suppression of the snail's response to the CS the same as human observation. To evaluate the performance of the computerized measurement program, the ratio of good learners vs. poor learners was estimated using Fisher's exact probability test. We defined a good learner as a snail that made 0-1 bites/min during the post-test session in response to the CS. A poor learner was defined as a snail that made ≥2 bites/min in response to the CS during the post-test session. Data analysis was performed using R software version 3.3.1 (https://www.r-project.org/).
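The good/poor-learner criterion above reduces to a simple threshold on the post-test bite count. The following is an illustrative Python sketch (the function names and the example data are ours, not from the analysis program):

```python
def classify_learner(bites_per_min):
    """Classify a snail by its post-test response to the CS.

    A good learner makes 0-1 bites/min; a poor learner makes
    >= 2 bites/min (the thresholds defined in the Statistics section).
    """
    return "good" if bites_per_min <= 1 else "poor"


def learner_ratio(bite_counts):
    """Return (n_good, n_poor) for a list of post-test bite counts."""
    labels = [classify_learner(b) for b in bite_counts]
    return labels.count("good"), labels.count("poor")


# Hypothetical conditioned group in which most snails suppress feeding.
good, poor = learner_ratio([0, 1, 0, 0, 3, 0, 1, 0])
```

The resulting good/poor counts per evaluation method form the 2 × 2 contingency table that Fisher's exact probability test is applied to.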

III. APPROACH
Although the snails were mechanically constrained with a clip at the anterior and posterior portion of their shell, the snail was still able to change its position within the shell; therefore, we used image recognition technology with a machine-learning algorithm to obtain the coordinates of the mouth from continuous video images.

A. Machine-Learning Procedure to Detect Mouth Coordinates
As the first step, the original 32-bit color images (640 × 480 pixels) were processed to detect the mouth area as shown in Fig. 1. To detect a mouth, we used the LBP [17], [18] and AdaBoost [19] algorithms.

1) Local Binary Pattern (LBP):
LBP is a strong grayscale texture operator that is robust against intensity variations. Using a neighborhood set of P pixels, we computed the difference between the central pixel g_c and its neighbors {g_0, . . . , g_{P−1}}. Function (1) calculates the LBP value:

LBP_P = Σ_{p=0}^{P−1} s(g_p − g_c) · 2^p, where s(x) = 1 if x > 0, otherwise 0.   (1)

Pixels with a grayscale value greater than that of the central pixel are given a value of 1, otherwise 0. Fig. 3 shows the principle of the LBP operator, and Fig. 4 shows the flow for obtaining the LBP features from an image. Let the upper left of the target pixel be a(i, j) and treat the nine pixels from a(i, j) to a(i + 2, j + 2) as one unit. The brightness of each pixel was digitized with 256 levels. Pixels brighter than the target pixel were mapped to 1, otherwise 0. The value, read clockwise from a(i, j) to a(i + 1, j), becomes the feature intensity. The original grayscale images of various sizes were transformed into 40 × 40 pixels, with the intensity of each pixel ranging from 0 to 255. The intensity of each pixel was compared with that of the surrounding 8 pixels (i.e., P = 8) using function (1). Thus, we obtained a uniquely featured density histogram.
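The 3 × 3 LBP computation described above can be sketched as follows. This is an illustrative implementation (the function names are ours), following the text's convention that neighbors brighter than the central pixel map to 1 and the bits are read clockwise starting from the upper-left neighbor:

```python
def lbp_code(patch):
    """Compute the LBP code of a 3x3 grayscale patch (list of lists).

    Neighbors brighter than the central pixel map to 1, otherwise 0;
    the 8 bits are read clockwise from a(i, j) to a(i+1, j).
    """
    center = patch[1][1]
    # Clockwise neighbor coordinates, from the upper-left neighbor
    # around to the middle-left neighbor.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in order:
        code = (code << 1) | (1 if patch[r][c] > center else 0)
    return code


def lbp_histogram(image):
    """Density histogram of LBP codes over every 3x3 unit of an image."""
    hist = [0] * 256
    for i in range(len(image) - 2):
        for j in range(len(image[0]) - 2):
            patch = [row[j:j + 3] for row in image[i:i + 3]]
            hist[lbp_code(patch)] += 1
    return hist
```

For a 40 × 40 input, `lbp_histogram` yields the density histogram over all interior pixels that serves as the texture feature.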
2) Real AdaBoost: Boosting improves the performance of weak classifiers by iteratively selecting a small number of weak classifiers and combining them into a strong classifier. The Real AdaBoost algorithm is a variation of AdaBoost: rather than making a simple Boolean classification, it assigns a real-valued confidence to each candidate weak classifier.
The following is a brief description of the Real AdaBoost algorithm. Given a set of training examples S = {(x_1, y_1), . . . , (x_N, y_N)}, where y_i = ±1:
1) Initialize the sample distribution D_1(i) = 1/N.
2) Partition the sample space into regions X_1, . . . , X_n, and let W_j^l be the weighted appearance probability of the samples with label l that fall into region X_j. Set the output of each candidate weak classifier h on X_j as
h(x) = (1/2) ln((W_j^{+1} + ε) / (W_j^{−1} + ε)) for x ∈ X_j,
where ε is a small positive constant to avoid division by zero.
3) Calculate the normalization factor
Z = 2 Σ_j √(W_j^{+1} · W_j^{−1}).
4) Select the h_t that minimizes Z.
5) Update the sample distribution
D_{t+1}(i) = D_t(i) · exp(−y_i h_t(x_i)),
and normalize so that the sum of D_{t+1}(i) becomes 1.
6) Output the strong classifier
H(x) = sign(Σ_t h_t(x) − b),
where b is a threshold whose default is zero.

Both the LBP and Haar-like detectors were used in our experiments, and LBP always had a smaller false detection ratio than Haar-like. Out of the total of 24,101 images of Lymnaea recorded during a CTA procedure, the LBP feature yielded a mouth coordinate with a mean success ratio of 55%, with a false match ratio of 7% and a false non-match ratio of 38%. The Haar-like feature yielded a mean success ratio of 45%, with a false match ratio of 9% and a false non-match ratio of 46%.
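The weighted-probability, confidence, and normalization computations at the heart of Real AdaBoost (the quantities W_j^l, h, and Z) can be sketched for one candidate weak classifier as follows. This is an illustrative Python sketch, assuming the feature values have already been quantized into region indices; the function name and data layout are ours:

```python
import math


def real_adaboost_step(regions, labels, dist, n_regions, eps=1e-6):
    """Evaluate one candidate weak classifier for Real AdaBoost.

    regions : region index X_j into which each training sample falls
    labels  : class labels, +1 or -1
    dist    : current sample distribution D_t (sums to 1)

    Returns the per-region confidence outputs h(j) and the
    normalization factor Z used to rank candidates (the classifier
    minimizing Z is selected at each round).
    """
    w_pos = [0.0] * n_regions  # W_j^{+1}: weighted positives per region
    w_neg = [0.0] * n_regions  # W_j^{-1}: weighted negatives per region
    for j, l, d in zip(regions, labels, dist):
        if l > 0:
            w_pos[j] += d
        else:
            w_neg[j] += d
    # eps avoids division by zero when a region holds only one class.
    h = [0.5 * math.log((wp + eps) / (wn + eps))
         for wp, wn in zip(w_pos, w_neg)]
    z = 2.0 * sum(math.sqrt(wp * wn) for wp, wn in zip(w_pos, w_neg))
    return h, z
```

A perfectly separating classifier gives Z = 0, while a classifier whose regions mix both classes equally gives Z = 1, so minimizing Z favors confident, well-separated partitions.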
Mouths of various sizes in the images were detected using the cascade classifier cv2.CascadeClassifier.detectMultiScale in OpenCV. During this extraction process, a dataset of 4491 mouth images was used as the positive samples and a dataset of 6583 non-mouth images as the negative samples for the machine-learning procedure. Fig. 5 shows the successful mouth-detection ratio of Real AdaBoost with the LBP feature. Detection failures occurred because the mouth did not appear on camera when the body was twisted in the shell, because the closed mouth was small, or because another organ resembled the mouth (see Section V). If the loss of mouth images lasted longer than 1 min, the data were aborted and discarded. If it was difficult to assess the changes in mouth area, the program did not produce a final judgement. Abrupt movements of the animal that interfered with human observation occurred at about the same ratio as with computer detection.
It is common to compare algorithms, such as the Haar wavelets feature, with the detection error trade-off curve (DET curve). Haar-like features are digital image features used in human face detection. Viola and Jones adapted the idea of using Haar wavelets and developed the Haar-like features algorithm [20]. The Haar-like features algorithm considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region, and calculates the differences between these sums. Because the Haar-like features algorithm does not require heavy calculation power, we compared the DET curves of these two algorithms. For real-time analysis, the results of Real AdaBoost with the LBP algorithm were better than those of the Haar-like features algorithm (Fig. 5).

B. Image Analysis for CTA
The signal processing sequence is shown in Fig. 6. Analysis of whether a snail acquired and retained the associative memory was performed in the following three steps: detect the mouth area from the global images; measure the temporal rate change in the mouth area; and judge the conditioned or unconditioned state from the animal's behavioral response to CS application in terms of the number of mouth openings per minute. Although the Raspberry Pi Camera Module V2 can record video at a maximum frame rate of 30 fps, we recorded at only 6 fps to reduce the computational burden and allow for real-time analysis.

1) Mouth Detection:
The bottleneck for real-time analysis of feeding behavior is the mouth-extraction process. To obtain accurate image recognition at high speed with a low-performance CPU, we excluded 99.5% of the image area as non-mouth area using a connected-component labeling technique. Fig. 7 shows the preprocessing procedure for connected-component labeling: original color video images recorded at 6 fps (Fig. 7-A) were transformed into grayscale images (Fig. 7-B) and passed through several filtering processes: "Adaptive Threshold" (cv2.adaptiveThreshold) to detect the edges (Fig. 7-C), followed by "Morphological Transformation" (Fig. 7-D), "Moving Filter" (Fig. 7-E), and "Binarization" (Fig. 7-F) to remove small white dots. The morphological gradient filter (scipy.ndimage.morphological_gradient) in SciPy extracts the difference between the dilation and erosion of an image. The connected-component labeling technique (cv2.connectedComponentsWithStats) in OpenCV was applied after binarization (Fig. 7-F). With these filters, we could reduce the candidate mouth area by 22%, as shown in Fig. 8-D. Subsequently, we excluded mouth candidate regions that were too big or too small, or whose average color differed from that of the mouth, reducing the candidate mouth area to 0.5%. These processes allowed us to identify a mouth even with a low-performance computer: after reducing the candidate mouth region, it is no longer necessary to analyze the entire 640 × 480-pixel image.
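The labeling-and-filtering idea can be sketched in pure Python. The following is an illustrative stand-in for cv2.connectedComponentsWithStats (a breadth-first flood fill over 4-connected white pixels), followed by a size filter; the area thresholds shown are hypothetical, not the values used in the study:

```python
from collections import deque


def connected_components(binary):
    """Label 4-connected white regions in a binary image (list of lists).

    Returns a list of (area, (min_x, min_y, max_x, max_y)) per
    component, analogous to the stats that
    cv2.connectedComponentsWithStats produces.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    comps = []
    next_label = 1
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                # Breadth-first flood fill from an unlabeled white pixel.
                area, q = 0, deque([(y, x)])
                labels[y][x] = next_label
                min_x = max_x = x
                min_y = max_y = y
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    min_x, max_x = min(min_x, cx), max(max_x, cx)
                    min_y, max_y = min(min_y, cy), max(max_y, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                comps.append((area, (min_x, min_y, max_x, max_y)))
                next_label += 1
    return comps


def plausible_mouth_regions(comps, min_area, max_area):
    """Discard components too small or too large to be a mouth."""
    return [c for c in comps if min_area <= c[0] <= max_area]
```

Only the pixels inside the surviving bounding boxes then need to be passed to the mouth classifier, which is what makes real-time analysis feasible on a low-performance CPU.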
After setting the strong classifier through the learning process, we could effectively select the mouth candidate region from the global images, as shown in Fig. 9. When several mouth candidates were detected from one frame, the candidate area with the most neighboring positively-classified rectangles was defined as the mouth.
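The neighbor-voting rule for resolving multiple detections might look like the following sketch; the `radius` parameter is illustrative, not a value from the paper:

```python
def pick_mouth(candidates, radius=20):
    """Choose the detected rectangle most likely to be the mouth.

    candidates : list of (x, y, w, h) rectangles from the detector.
    The winner is the rectangle whose center has the most other
    candidate centers within `radius` pixels, i.e., the candidate
    with the most neighboring positively-classified rectangles.
    """
    def center(rect):
        x, y, w, h = rect
        return (x + w / 2.0, y + h / 2.0)

    best, best_neighbors = None, -1
    for i, r in enumerate(candidates):
        cx, cy = center(r)
        n = 0
        for j, other in enumerate(candidates):
            if i == j:
                continue
            ox, oy = center(other)
            if (ox - cx) ** 2 + (oy - cy) ** 2 <= radius ** 2:
                n += 1
        if n > best_neighbors:
            best, best_neighbors = r, n
    return best
```

Because true detections from a cascade classifier tend to cluster around the real target while false positives are scattered, the densest cluster is a reasonable proxy for the mouth.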
2) Mouth Areal Change: The image pixels I(x, y) were binarized using the threshold t(x, y) according to functions (8) and (9) to prevent effects of differences in the background brightness:

I'(x, y) = 1 if I(x, y) > t(x, y)   (8)
I'(x, y) = 0 otherwise   (9)

The enlarged mouth images obtained by image recognition were then processed by grayscale conversion, Gaussian filtering, adaptive thresholding, and morphological transformation, as shown in Fig. 10-A. After noise reduction through the morphological gradient filter, the number of white pixels in the processed images (dilation image of Fig. 10-A) was counted. The temporal moving average over three consecutive frames was then calculated by linear convolution with numpy.convolve(a, v, mode = 'valid'), where a is the one-dimensional input array of mouth areas and v is the three-point averaging window.
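The binarization and three-frame smoothing can be sketched with NumPy as follows. Here `binarize` is a simplified stand-in for the adaptive-threshold step (it accepts a precomputed threshold map or scalar rather than computing t(x, y) itself), and the function names are ours:

```python
import numpy as np


def binarize(image, threshold):
    """Per-pixel binarization against a threshold map t(x, y):
    pixels brighter than the local threshold become 1, others 0."""
    return (image > threshold).astype(np.uint8)


def mouth_area_trace(frames, threshold):
    """White-pixel count per frame, smoothed with a 3-frame moving
    average via np.convolve (mode='valid'), as in the text."""
    areas = np.array([binarize(f, threshold).sum() for f in frames], float)
    window = np.ones(3) / 3.0  # average over three consecutive frames
    return np.convolve(areas, window, mode='valid')
```

The resulting trace of white-pixel counts per frame is the signal on which the bite counter operates.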
For detection of one cycle of open/close mouth movement, 2 to 4 s of imaging was required [21], [22]. In the 6 fps video, 12 to 24 frames were therefore required for one mouth opening. We empirically found that the minimum areal change from an open to a closed mouth was a total of 200 pixels for the entire sequence; thus, the maximum threshold value for detecting a mouth movement was estimated as 16.7 (200/12) pixels/frame. On the basis of this estimation, the absolute threshold value for the mouth-area change was set to 5, which gave the best correct-answer ratio. The mouth-opening rate was defined by the temporal change in mouth area, calculated with numpy.gradient (https://www.scipy.org/). When the gradient exceeded the threshold value for three successive frames, the flag was set to +1 or −1 as an index indicating the mouth open/close condition, respectively. The number of sign changes of the flags indicated the number of mouth openings or closings. Therefore, the number of bites was represented as half the number of sign changes.
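The flag-based bite counter described above might be implemented as follows. This sketch uses the values given in the text (threshold 5, three successive frames), but the bookkeeping details, such as counting each flip to a new open/close state as one sign change, are our interpretation:

```python
import numpy as np


def count_bites(areas, threshold=5, run=3):
    """Count bites from a smoothed mouth-area trace (pixels per frame).

    The gradient of the trace is thresholded; when it exceeds the
    threshold for `run` successive frames, a +1 (opening) or -1
    (closing) flag is set. Each flip to a new flag value counts as
    one sign change, and the number of bites is half the number of
    sign changes (one bite = one opening plus one closing).
    """
    grad = np.gradient(np.asarray(areas, float))
    events = 0
    prev = 0  # last flag value; 0 means no flag set yet
    for i in range(len(grad) - run + 1):
        window = grad[i:i + run]
        if np.all(window > threshold):
            cur = 1
        elif np.all(window < -threshold):
            cur = -1
        else:
            continue
        if cur != prev:  # flag flipped: one sign change
            events += 1
            prev = cur
    return events // 2
```

On a synthetic trace that rises by 10 pixels/frame, plateaus, then falls back, the counter registers one opening flag and one closing flag, i.e., a single bite.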

IV. AUTOMATED EVALUATION OF MEMORY RETENTION STATUS
To determine whether the algorithm could accurately count the number of mouth openings and adequately judge the snail's learning and memory status, we analyzed 82 experimental videos obtained from conditioned (32) and unconditioned (50) snails. Examples of the mouth area (pixels) traces in conditioned and naive snails are shown in Fig. 11. Well-conditioned snails no longer exhibited feeding behavior in response to the CS (Fig. 11-B, D), whereas naive snails responded to the CS with mouth openings (Fig. 11-A, C).
The feeding rate (bites/min) did not differ significantly between the two counting methods, i.e., the analytical method developed in the present study and human observation (Welch's paired t-test, n = 50 each, P = 0.44 for naive snails, P = 0.64 for conditioned snails; Fig. 12). The mean counting error of the machine was ±1.1 bites per minute. Both counting methods showed that the feeding behavior of conditioned snails was significantly suppressed in response to the CS compared with that of naive snails (Welch's independent t-test, n = 50 each, P < 2.2 × 10^−16; Fig. 13-A, B). To further confirm that the automated counting method could properly judge the snail's learning and memory status, we examined the ratio between the good and poor learners in response to the CS in the conditioned group. On the basis of our previous studies [23], almost all conditioned snails never open their mouths (i.e., no biting) in response to presentation of the CS, exhibiting long-term memory of the association. Some snails, however, exhibit chance bites (i.e., spontaneous mouth openings) in the absence of a delivered stimulus [11]. As such spontaneous openings occur at a rate of about 1/min, a good learner was defined as showing 0-1 bites/min, and a poor learner as opening/closing the mouth 2 or more times/min. The ratio of good learners to poor learners was not significantly different between the human and computer evaluation methods, as examined by Fisher's exact probability test (n = 50 each, P = 1.0; Fig. 13-C). By human observation, 90% of snails learned and formed a CTA memory, and by computer evaluation, 94% did so. The computer evaluation produced 6% false positives and 2% false negatives.
Computer analysis, however, made it possible to examine new elements of snail feeding behavior. Fig. 14 shows comparisons of the latency to respond to the CS presentation and the degree of mouth opening between naive and conditioned snails. Conditioned snails had a significantly longer latency to respond to sucrose (56.5 ± 1.4 s for conditioned snails; 7.5 ± 1.4 s for naive snails, Welch's independent t-test, n = 50 each, P < 2.2 × 10^−16; Fig. 14-A), and the degree of mouth opening was significantly smaller than that in naive snails (125.9 ± 10.3 pixels/bite for conditioned snails; 319.0 ± 9.0 pixels/bite for naive snails, Welch's independent t-test, n = 50 each, P < 2.2 × 10^−16; Fig. 14-B). These two new analyses allowed us to evaluate memory from various viewpoints. As mentioned before, the memory score evaluated by counting the number of bites is affected by the occurrence of spontaneous bites, but by using both the latency and the degree of mouth opening, we can avoid the influence of spontaneous bites. The number of bites and the latency depend on the amount of signal produced by the motor neurons [24], [25]. On the other hand, Haque et al. showed that the opening size of a pneumostome depends on the burst duration of motor neurons [26]. Therefore, the size of a mouth opening may also depend on the burst duration of motor neurons. Our analytical system made it possible to estimate not only the frequency of the motor neuron activities, but also the burst duration of those neurons.
V. LIMITATIONS

Fig. 15 presents the three main failure modes of our analysis system. In the first example (top), the pneumostome was mistakenly detected as a closed mouth due to the similarity in their shapes. In the second example (middle), no good matches were found because the mouth appeared as almost a single point and had no distinguishing morphological characteristics. In the third example (bottom), the snail twisted its body and the mouth was out of frame, resulting in a failure to detect the mouth in the image.
As shown in Fig. 5, the maximum ratio for detecting the correct coordinates of the mouth using the LBP algorithm was 55%. It is important to note that this ratio was not used to determine whether or not the snails had learned. There are two main reasons for this low detection ratio: the first is the small sample size (4491 images) used for machine learning, which probably resulted in insufficient training; the second is the difference in shape between the open and closed mouth, even in the same snail. In this study, we attempted to detect the mouth with a single detector, and, owing to the dynamic change in its shape during feeding (as shown in Fig. 10), it may be difficult to detect the mouth using only one detector. Thus, the single detector could detect an open mouth with a high ratio, but was less able to detect a closed mouth. Dynamic changes in the mouth were nevertheless easily detected by analyzing only the open state, without analyzing the closed state. Therefore, even with a detection ratio of 55%, it was possible to determine whether or not the snails had learned.

VI. GENERALIZABILITY
Here, we present a simple and efficient method to improve the processing speed of image recognition. The present method is particularly effective when the contrast between the object and background is large and the color of the region surrounding the object is relatively uniform. Because many behavioral experiments satisfy these conditions, this method has the potential for broad application across biological fields. For example, in the field of invertebrate conditioning studies, the method can be applied to evaluate feeding behavior of a snail eating an actual food item [24], siphon-gill withdrawal reflexes of Aplysia [27], taste learning of Drosophila and Apis (honeybees) [28]- [30], and locomotor behavior of Drosophila larvae and C. elegans [31]- [33]. The method can also be applied to behavioral experiments in rodents [34], such as in the open field test, Barnes maze, water maze, and elevated plus-maze. In each case, the background color is relatively uniform and the contrast between the animal and background is large. In addition, our method can be applied to fields other than biological research, such as to assess a driver's performance by evaluating eye blink latency using a dashboard camera and the detection of unacceptable products in automated manufacturing.

VII. CONCLUSION
We successfully developed a real-time evaluation system to assess animal feeding behavior using a low-calculation-power CPU. Conditioned snails were characterized by suppressed feeding behavior, in contrast to naive snails. Because this behavior can be observed as a decrease in the number of bites in response to sucrose application, we developed an analytical system to detect the mouth, track the temporal changes in the mouth area, and count the number of mouth openings/closings. The results obtained by the machine did not differ statistically from those obtained by human observation. Furthermore, our system could evaluate not only the bite rate, but also the response latency and the degree of mouth opening in response to sucrose, which is impossible with real-time human observation. These new evaluation metrics can reveal additional aspects of an animal's behavior as an index of its neural activity. Moreover, this simple and efficient method improves the processing speed of image recognition, permits broader generalization, and requires substantially less computing power than other available methods. It thus enables more researchers to perform the necessary analyses with modest hardware, and will contribute to future scientific studies.