Design and Verification of a Low-Cost Multispectral Camera for Precision Agriculture Application

With the rapid increase of the world population, climate change, and the slow expansion of cultivated areas, only precision agriculture (PA) can provide enough food and resources. PA requires flexible instruments for measuring the spectral signatures of crops to understand their condition. Unfortunately, the high initial cost of multispectral (MS) cameras limits the implementation of PA on small farms, which constitute a large portion of arable land in Europe and contribute social bonds, local know-how, and cultural legacy. To speed up the adoption of MS imaging, in this article, we present a novel low-cost imaging device consisting of an MS camera with nine bands and a thermal imager, whose price is several times lower than that of commercially available devices. This article describes the design and the calibration of the imaging device, which is based on off-the-shelf components: Raspberry Pi, a dedicated quad camera kit, a thermal core, and multiband optical filters. The spectral reconstruction accuracy is high, with an average $R^2$ score of 0.986. Finally, images from multiple sensors are aligned using phase-only correlation and dense optical flow, providing a method that can be implemented on all platforms. The presented solution is open source, permitting one to modify and expand the capabilities of the described device and adapt it to specific needs. Moreover, the device is flexible, as the thermal camera can be removed to reduce the total system cost if it is not required. Although its primary application is PA, the proposed solution can be used for other applications.


I. INTRODUCTION
AS THE world population grows exponentially, reaching 9.7 billion people by 2050 [1] (having more than tripled from the 1950s to 2022 [2]), there is a constant demand for agriculture to produce more food and resources for other industries. At the same time, the essential acceleration of productivity growth is being impeded by the deterioration of natural resources, the decrease in biodiversity, and the transmission of plant and animal pests and diseases, some of which are developing immunity to antimicrobial agents [3]. In addition, extreme weather events, such as floods and droughts due to climate change, greatly impact agricultural productivity [4]. With the cultivated area having expanded by no more than 30% in the last 70 years [5], and considering all of the above, the motivation to push agricultural efficiency is clear.
To meet the needs of a growing population, agricultural intensification is necessary, which involves using fertilizers, pesticides, water, and other resources. However, intensification leads to environmental pollution and economic losses [6], especially through contamination of drinking water and aquatic ecosystems [7], [8]. Also, spreading pesticide residues in the environment results in mass killings of nonhuman biota, such as bees, birds, amphibians, fish, and small mammals [9]. As high-input, resource-intensive farming systems have caused massive deforestation, water scarcity, soil depletion, and high levels of greenhouse gas emissions, there is a need for innovative systems that protect and enhance natural resources, delivering sustainable food and agricultural production [3].
Achieving enhanced crop yield production, improved quality, reduced operating expenses, and diminished environmental contamination are the objectives of precision agriculture (PA), which can be considered a key component of sustainable agriculture in the 21st century [10], [11] with the main goal to design a location-specific crop management system based on measuring, observing, and responding to crop variations [10].
Remote and proximal sensing technologies acquire information about vegetation. These sensing techniques allow gathering spectral, spatial, and temporal data about land and crops from ground-based vehicles, aircraft, satellites, and handheld radiometers [10]. These data can then be analyzed to determine the health of plants and crops [12], weed presence [13], soil types [14], water availability [15], disease occurrence [16], and other factors that can impact vegetation.
The most widespread method for obtaining information about vegetation is multispectral (MS) imaging. With MS data, it is possible to calculate several vegetation indices, among which the normalized difference vegetation index is a standard measure of crop condition and plant health [7]. Usually, these MS images are acquired from aircraft or satellites, but only a portion of the available satellite images are distributed free of charge. Moreover, free satellite images have a resolution of at best 10 m per pixel, which is usually too coarse to detect problems in crops, since each pixel represents the average reflectance of the corresponding area [17]. Satellite data are also affected by cloud coverage in regions of Europe, which is an issue for image interpretation [18]. Finally, free software solutions for analyzing these images tend to be too complex for nonexperts [18].
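As a brief illustration of the vegetation index mentioned above, the following sketch computes the normalized difference vegetation index from a near-infrared and a red band using its standard definition, (NIR - Red)/(NIR + Red). The function name `ndvi`, the small constant `eps`, and the reflectance values are illustrative assumptions and not data from this work.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    # eps (an assumption of this sketch) avoids division by zero on dark pixels.
    return (nir - red) / (nir + red + eps)

# Hypothetical reflectance values for a 2 x 2 pixel patch.
nir_band = np.array([[0.50, 0.45], [0.30, 0.08]])
red_band = np.array([[0.05, 0.06], [0.10, 0.07]])
index = ndvi(nir_band, red_band)
```

Pixels with high near-infrared and low red reflectance give values close to 1, while similar reflectance in both bands gives values close to 0.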
A practical and adaptable approach is unmanned aerial vehicle (UAV) based MS image acquisition [19]. To some extent, this approach has been adopted by medium and large farms, at rates that depend on the geographical region. Crop farms in the USA show an adoption level of 67%, which is much higher than the 25% average for the EU [20]. Unfortunately, small farms are lagging behind in the application of PA due to the significant initial costs of the technology [18], [21]. The starting investment can be as high as 25 000 USD, without the expertise required for operation [20]. Furthermore, because agriculture is a low-margin business [22], small farms need longer periods to benefit from the savings potential of this technology. This situation is especially evident within the highly fragmented arable land in mountain zones [23], as in Italy and Greece [24]. According to a survey of 360 farmers in Italy, only 4.4% declared that they currently use UAVs in their fields [25], mainly due to the high initial investment. Yet, in the EU, small farms account for about 86% of all farms [20]; in Italy, this category counts 6771 farms [26]. Small farms are vital for preserving the territory and for the economic development of local rural areas [26]. They guarantee the permanence of populations and agricultural activities in rural areas, contributing not only to incomes but also to social capital, local knowledge, and cultural heritage [27], [28]. Moreover, it is crucial for global food security that smallholder farmers adopt PA in the coming years [29], [30].
To overcome this high price barrier, in this article, we present a new, low-cost imaging device with nine narrow bands in the visible/near-infrared (VIS/NIR) range and one thermal infrared channel, whose price is around seven times lower than that of commercially available systems. In the design of the device, we use only off-the-shelf components. Compared with devices proposed in the scientific literature, our solution does not require specific tools, expertise, or fabrication equipment (e.g., for lithography). In addition, the imaging device has no moving mechanical parts, which would introduce additional synchronization mechanisms and acquisition constraints.
After assembling the proposed device, we performed calibration of the MS and thermal cameras. During the calibration procedures, we also verified the validity of the assumed camera models using high-performance hyperspectral and thermal cameras as reference devices. The last step in designing this imaging device was aligning the images from the five cameras. Alignment is necessary to produce a final output image, as the five sensors are physically separated and their fields of view do not completely overlap.
The rest of this article is organized as follows. Section II-A presents the most used commercial MS cameras and a review of the literature on MS solutions described in scientific sources. Section II-B is a brief review of the standard methods for image alignment. Section III describes the design process of the proposed camera, presenting each hardware element and the details of the software implementation that are significant for proper system functioning. Section IV explains the calibration procedure used for the MS and thermal cameras and shows high agreement between reconstructed MS values and true values, with an $R^2$ score higher than 0.985 in most cases. Similarly, it is verified that a small thermal core provides accurate temperature readings, with an $R^2$ score of 0.974 compared to the data obtained with a high-quality infrared camera, the FLIR Duo Pro R. Section V illustrates a simple algorithm for the alignment of all images, which consists of a two-step procedure. In the first step, the homography between two images is found and applied to one image. In the second step, nonrigid image registration based on dense optical flow is applied to the result of the first step. The second step successfully removes all residual misalignments present after the first step. Finally, Section VI concludes this article.

A. State of the Art on Multispectral Cameras
As MS cameras are the most common source of data for PA applications, there are several commercially available MS devices whose prices depend on the number of different spectral bands. Table I shows the main systems, their characteristics, the number of bands, and the price range. This price range should be considered only as an approximation derived by analyzing information available on the web.
All models presented in Table I are standalone, except DJI's new product (camera, UAV, incident radiation sensor, and RTK GPS module), which is dedicated to UAV-based acquisition when four narrow bands and an RGB image are sufficient [35]. For comparison with our device, two solutions (Altum-PT from AgEagle [37] and 6X Thermal from Sentera [41]) should be mentioned, which incorporate a thermal camera from FLIR, a Boson thermal sensor with a resolution of 320 × 256 pixels and a thermal sensitivity of 50 mK.
The custom-designed devices found in the literature are based on special sensors or optics. Ono [42] describes an MS camera realized using an industrial camera with four directional polarization filters on each quad pixel (Sony IMX250MYR sensor) and a uniquely designed lens with nine bandpass filters. The suggested solution does not need any postprocessing alignment. However, filters and polarizers reduce the amount of light that reaches each pixel, requiring longer exposure to obtain a sufficient signal-to-noise ratio (SNR), which can introduce motion blur. Moreover, specific tools are required to modify a standard industrial lens, as the polarizers at the custom-made lens must be aligned with the polarizers at each pixel; otherwise, the camera does not operate properly. In [43], the authors proposed a filter array with 16 narrow band filters in which spectral transmittance depends on the thickness of the amorphous silicon or aluminum oxide layer sandwiched between the top and bottom silver layers deposited on a glass substrate. Although the solution performs well, the main issue is that special equipment is required for evaporation and lithography to produce the suggested structure. If this equipment is not available, producing a specially designed filter array is too costly for prototype purposes. Only a few manufacturers can realize a mosaic of filters, and the minimum order is usually 10 000 items [44]. Williams et al. [45] demonstrated a similar solution with a 3 × 3 filter pattern, but special equipment is still needed for lithography. An MS design with only one sensor can be accomplished using a plenoptic (or light field) camera. That idea is presented in [46], showing an MS camera with 18 small lenses (17 narrow band + panchromatic). Despite a compact design, prolonged exposure is required for a usable SNR due to the low amount of light collected by the individual lenses. Moreover, several optical elements have to be added to each lens to compensate for high distortion levels, and specific tools and expertise are required to assemble such a lens design. Another type of solution uses several sensors with additional optics to split the incident light before optical filtering. In [47], three image sensors and three beam-splitter prisms were combined in one housing, while in [48], two cyan, magenta, yellow, and green (CMYG) cameras with two different four-band filters and a beam splitter were utilized. The first solution does not need any alignment postprocessing. The losses due to light splitting can be compensated with longer exposure, which introduces motion blur for nonstatic scenes. In addition, splitting optics can be bulky, and tools and practice are needed to align the optical elements. The second approach provides eight narrow bands, but CMYG cameras are rarely available on the market. A similar idea to [48], but with a typical RGB sensor and triple-band filters, was suggested by Fawzy et al. [49]. By using six different triple-band filters and an optical wheel, it is possible to reconstruct 18 narrow bands. Using a wheel with several single-band filters (between 5 and 9) is one of the most common ways to build an MS camera, especially where there is no rapid movement during the acquisition of one MS image [50], [51], [52], [53]. With this approach, Morales et al. [50] described a device that incorporates five single narrow band filters, one industrial C-mount camera, and a rotating mechanism, with an overall price of 2000 EUR. However, all wheel-based designs can have issues with the rotating mechanism, especially with moving objects. Even in static scenes with vegetation, a slow wind can cause misalignment of many leaves, which becomes easier to spot the more the images are separated in time. Also, precise synchronization is needed between the camera's trigger and the wheel's rotation.

B. Brief Description of the Most Used Image Registration Methods
Traditional image alignment is usually conducted in three steps. First, matching keypoints between the reference image and the image that should be registered are found. As keypoint detectors, SIFT [54], SURF [55], or ORB [56] are the most common choices. Differences between the results of these and similar keypoint detectors can be spotted only in some specific tasks, when there is a high rotation or scale difference between matching keypoints [57]. The next step calculates the homography matrix, which transforms keypoint coordinates from the second image to the coordinates of the matched keypoints in the reference image [58]. In the final step, the obtained homography is applied to the second image. For UAV-acquired MS images, some researchers suggested aligning the final orthorectified images [59], while others advised aligning individual images in the dataset before mosaicking, using SIFT [60] or AKAZE [61] for feature detection (AKAZE is very similar to SIFT). Instead of feature detection, a pyramid algorithm based on the enhanced correlation coefficient (ECC) [62] has been proposed [63]. There are also attempts to register close-range MS images. They are primarily based on finding a homography for the whole image; only the approaches for keypoint detection and matching differ [64], [65], [66].
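The homography step described above can be sketched with a plain direct linear transform. This is an illustrative example only: it assumes that matched keypoints are already available and free of outliers, whereas practical pipelines usually add robust estimation (e.g., RANSAC). The function names and the test homography are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find the 3x3 H with dst ~ H @ src (>= 4 point pairs)."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map Nx2 points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=np.float64)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical ground-truth homography and matched keypoints.
true_h = np.array([[1.02, 0.01, 5.0], [-0.015, 0.98, -3.0], [1e-5, 2e-5, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 40]], dtype=np.float64)
dst = apply_homography(true_h, src)
est_h = estimate_homography(src, dst)
```

With noise-free correspondences, `est_h` reproduces `true_h` up to numerical precision; applying it to the whole second image is then a standard warping operation.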
Besides keypoint matching and gradient-based correlation, images can be aligned using phase correlation techniques, such as phase-only correlation (POC) [67], which is FFT based. Multimodal images, like MS images, have similar structural properties but contain different intensity and texture information. It was shown that structural characteristics are preserved in the phase of the FFT rather than in its amplitude, a property that has been used to register multimodal remote sensing images [68].
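To make the idea of phase-only correlation concrete, the sketch below estimates an integer translation between two images from the phase of their cross-power spectrum. It is a minimal illustration under simplifying assumptions (pure cyclic translation, integer shifts, no windowing or subpixel refinement); the function name and the synthetic data are hypothetical.

```python
import numpy as np

def phase_only_correlation(reference, moving, eps=1e-12):
    """Return the integer (dy, dx) shift that maps `reference` onto `moving`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_mov * np.conj(f_ref)
    # Keep only the phase: normalize the cross-power spectrum to unit magnitude.
    cross /= np.abs(cross) + eps
    surface = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    shift = [p - s if p > s // 2 else p for p, s in zip(peak, surface.shape)]
    return int(shift[0]), int(shift[1])

# Synthetic test: shift a random image and change its intensity scale and offset.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = 0.5 * np.roll(reference, shift=(3, -5), axis=(0, 1)) + 0.1
shift = phase_only_correlation(reference, moving)
```

Because the amplitude is discarded, the intensity change applied to `moving` does not affect the recovered shift, which is the property that makes the phase attractive for multimodal images.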
In recent years, some algorithms based on deep neural networks for image registration have been presented [69], [70]. However, the lack of sufficient image pairs for training deep-learning models is already a recognized problem in the remote sensing community [71]. As we face the same issue, these methods are not considered in this article.
Section V compares several chosen methods for aligning images obtained using the proposed device. A two-step algorithm for image alignment is then suggested, which showed the best performance among the tested approaches.

III. MULTISPECTRAL CAMERA DESIGN AND DEVELOPMENT
In this article, we present a low-cost MS camera based on off-the-shelf components. The core of our solution is a quad camera kit that triggers all four cameras simultaneously, avoiding the need for any synchronization mechanism. Furthermore, the proposed device does not restrict acquiring images of moving objects. We selected four filters, three of which are multiband, resulting in a device that can provide nine narrow bands without additional optical elements for splitting incoming light. A Raspberry Pi (RPi) 4 was chosen as the control unit. Because the RPi can control and acquire additional data, we added a thermal camera to get information in the thermal infrared band. This is motivated by the fact that thermal properties can be helpful for some PA applications, as there is evidence that diseased plants show a temperature difference of up to 1.3°C compared to healthy plants during a hot day [72]. The overall cost of the proposed solution is less than 2000 EUR (Table II shows prices for the selected components), which is at least seven times lower than the price of available commercial devices with similar characteristics.
Due to the modular design, if the thermal camera is unnecessary, the price of the proposed imaging device is around 1500 EUR. Also, multiband filters have a high price, and an additional reduction in the overall price can be obtained by optimizing the number of bands needed for a specific PA application. It is worth noting that, unlike some previously described systems, our solution does not require any specific tool for assembly, and the housing is made on standard three-dimensional (3-D) printers. Furthermore, as the filters are the only elements that reduce the incident light intensity, the standard exposure time for MS imaging is sufficient to obtain data with satisfactory SNR, and motion blur does not occur. The presented imaging device, although motivated by PA applications, can be considered a general-purpose MS/thermal instrument based only on easily obtainable components and with a significantly lower price than commercially available devices. In addition, unlike some interesting systems described in the scientific literature, which require specific tools, expertise, and equipment (e.g., for lithography), the proposed design can be assembled by anyone with basic technical skills.
More details on the proposed device are given in the following subsections.

A. Components Selection
We started designing the proposed device with three key objectives: modularity, low cost, and availability of all elements. We selected the RPi, version 4, because it is one of the most widely used computer boards worldwide [73], has a ubiquitous community and a continuously growing body of free code for various applications, and has enough power and memory capacity for our purposes [74]. Our idea was to start with a design similar to that described by Fawzy et al. [49], using an RGB camera and triple-band filters. However, instead of the rotating wheel, we considered using several cameras as an alternative to mechanical components. For acquiring images in the VIS/NIR range, we chose a quad camera kit dedicated to RPi [75], characterized by low weight and low power consumption. The image sensor is the OV9782, with 1280 × 800 pixels and a global shutter. The RPi has only one mobile industry processor interface camera serial interface (MIPI-CSI) port for connecting a camera, as the second MIPI-CSI port is for connecting the display. The selected quad camera kit has an additional chip, or "hat," that integrates four camera streams into one CSI lane and delivers them to the RPi as a single image. In this way, the RPi can be used for image acquisition from four MIPI-CSI cameras without additional hardware. The acquired image is four times wider than the image from a single camera, and it can be split into four images with a few lines of Python code.
The chosen solution with the quad camera kit has several benefits compared to using four separate cameras, three connected to USB (one USB port is reserved for the thermal camera) and one to the CSI port. With separate cameras, additional hardware would be needed to construct the triggering mechanism; USB cables are much heavier than flat MIPI cables, which leads to a heavier camera housing; and USB cameras dissipate more heat, causing RPi overheating and battery drain. Finally, USB cameras would occupy all available USB ports, so connecting other devices would not be possible. Also, as mentioned in the previous sections, the goal was to use only widely available components that do not require knowledge of specific hardware design and/or driver-level programming. Although this is not the cheapest solution in terms of component prices, it can be assembled without a specific tool, as we wanted a solution that is "plug and play." Users can focus on the applications rather than on the hardware. The future users of the proposed device do not need to know any aspects of the interfaces or communication protocols between the selected components, and the savings from using low-level components are smaller than the cost of assembling them, as specific knowledge and tools are necessary in that case.
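The splitting of the composite frame mentioned above can be sketched as follows. This is a minimal illustration that assumes the four camera views are laid out side by side, left to right, in a single array; the function name and the synthetic frame are hypothetical, and the actual layout delivered by the hat should be checked against the kit documentation.

```python
import numpy as np

def split_quad_frame(frame, n_cameras=4):
    """Split a side-by-side composite frame into equal-width per-camera views."""
    width = frame.shape[1]
    if width % n_cameras:
        raise ValueError("frame width is not divisible by the number of cameras")
    return np.split(frame, n_cameras, axis=1)

# Synthetic stand-in for one composite frame: four 1280 x 800 views, 800 x 5120 in total.
frame = np.zeros((800, 4 * 1280), dtype=np.uint8)
for k in range(4):
    frame[:, k * 1280:(k + 1) * 1280] = k  # mark each view with its camera index
views = split_quad_frame(frame)
```

Each element of `views` is then an 800 × 1280 array that can be processed or saved independently.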
RPi requires a 5 V power supply with a current of up to 3 A. To determine the maximum current consumption of the designed device, we monitored the RPi input current for 30 min while both camera modules (thermal core and quad camera kit) acquired images at full frame rate and all images were continuously saved to the RPi's disk. This measurement was conducted because the power consumption of the RPi with the two added modules cannot be estimated otherwise: only the maximum power consumption of the RPi (15 W) is stated, which occurs only with processing-demanding tasks such as video manipulation (not the case here); the thermal core consumes 300 mW; and no power consumption figure is available for the quad camera kit (the OV9782 image sensor consumes 156 mW, but the power consumption of the hat is not specified). Using a USB voltage and current meter, we observed that the current never exceeded 1.9 A during the test (at 5 V); for further calculations, we rounded this value up to 2 A (10 W power consumption). As this imaging device is intended for outdoor use, we chose a standard power bank with a 5 V/3 A output and a capacity of 10 000 mAh, which can provide at least 5 h (10 Ah/2 A) of continuous image acquisition, usually enough for most situations.
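The runtime estimate above is a simple ratio of capacity to load current, as the following sketch shows. The function name and the optional `efficiency` parameter are assumptions of this example; the parameter is included only to indicate that conversion losses in a real power bank, which are not considered in the text, would shorten the runtime.

```python
def runtime_hours(capacity_mah, load_current_a, efficiency=1.0):
    """Idealized runtime: capacity (Ah) times an assumed efficiency, divided by load current (A)."""
    return (capacity_mah / 1000.0) * efficiency / load_current_a

# Values from the text: 10 000 mAh power bank and 2 A rounded load current.
ideal_runtime = runtime_hours(10000, 2.0)
```

With the values from the text, `ideal_runtime` equals 5 h; any `efficiency` below 1 reduces it proportionally.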
The key idea behind getting three narrow bands using a color camera and a triple-band filter is to combine the spectral response of the standard RGB microfilter array deposited on the CMOS camera chip with the filter transmissivity in each of the three bands. Due to the wide passband of the RGB microfilter on each pixel, each pixel value is a linear combination of the light transmitted in the three bands of the triple-band filter [48], [49]. Details about the reconstruction of each spectral band will be presented in the calibration section of this article.
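The linear-combination idea can be illustrated with a small numerical sketch: if the sensitivities of the R, G, and B pixels in the three pass bands are known, the band signals follow from solving a 3 × 3 linear system per pixel. The sensitivity values in `omega` are invented for illustration and are not the calibrated values of the device; the function name is likewise hypothetical.

```python
import numpy as np

# Hypothetical sensitivities: rows are R, G, B pixels; columns are the three
# pass bands of a triple-band filter (ordered from short to long wavelength).
omega = np.array([[0.05, 0.10, 0.80],
                  [0.10, 0.75, 0.08],
                  [0.70, 0.12, 0.03]])

def unmix(rgb, omega):
    """Recover per-band signals from RGB values by solving rgb = omega @ bands."""
    rgb = np.asarray(rgb, dtype=np.float64)
    flat = rgb.reshape(-1, 3).T            # 3 x n_pixels
    bands = np.linalg.solve(omega, flat)   # 3 x n_pixels
    return bands.T.reshape(rgb.shape)

# Simulate RGB pixel values from known band signals, then invert the mixing.
rng = np.random.default_rng(1)
bands_true = rng.random((4, 5, 3))
rgb = bands_true @ omega.T
bands_est = unmix(rgb, omega)
```

The inversion is well behaved only when the mixing matrix is well conditioned, that is, when each color pixel responds predominantly to a different pass band.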
With a quad camera kit, we should be able to get 12 different bands. Unfortunately, triple-band filters are not so common on the market. Thus, our final choice, guided by availability, price, and the wavelength regions necessary for PA, was a combination of one single-band, one dual-band, and two triple-band filters, resulting in nine narrow bands in total (see Fig. 1) with the following central wavelengths (as near as possible to the central wavelengths of the commercially available devices with 9 or 10 narrow bands mentioned in Table I): 1) filter 1: 432, 517, and 615 nm, 2) filter 2: 577 and 690 nm, 3) filter 3: 750 nm, and 4) filter 4: 550, 660, and 850 nm.
As previously mentioned, two commercially available MS cameras share their housing with a thermal imager [37], [41], the Boson thermal core from FLIR. However, due to the high prices of devices that include a Boson core, the thermal sensor is not a standard part of MS cameras. Several studies indicate the potential use of thermal imaging not only for water stress detection [59], but also for optimal braiding [76] and disease detection [77], [78], as there are both positive and negative measurable differences between healthy and infected leaves due to the different ways in which various leaf pathogens affect stomata opening and change transpiration between leaves and their surroundings. Considering this and the significant knowledge gap between the acquisition of thermal images and their use for assessment at the individual plant scale [79], we were motivated to add a thermal sensor to our design and provide researchers and individual users with an additional valuable tool. Guided by availability, price, and performance, we selected the Seek Mosaic Core [80], which has low weight, a small footprint, low energy consumption, a spatial resolution of 320 × 240 pixels, and a thermal sensitivity of 65 mK. These characteristics are similar to Boson's specifications but at a much lower price, around 1/4 of Boson's. The chosen thermal core has a USB interface, which was the only option, as the MIPI-CSI port was occupied by the quad camera kit. As stated previously, thanks to the modular design of the MS camera, the thermal sensor can be omitted to reduce the price if it is not needed.
Fig. 2 shows all cameras with the 3-D-printed component holder and enclosure. RPi 4 is fixed below all cameras. We chose plastic for the enclosure material as its price is lower than that of metal. We did not consider materials other than plastic, as we used plastic enclosures in the field during the summer, when temperatures were up to 35°C, without overheating issues. However, aluminum would be more suitable for acquisitions in fields with high ambient temperatures due to better heat dissipation. If necessary, some active cooling, like a Peltier temperature controller, can be added; in that case, a metallic enclosure is the only solution.

B. Software Development
To control all components, we developed an application using the Python programming language. Although RPi 4 has enough resources for some image processing tasks, to extend battery time and prevent overheating we restrict the program's function on RPi 4 to acquiring and saving images and changing the camera's exposure time and gain during acquisition. Images are saved in raw format, as grayscale conversion and demosaicing can be conducted offline after transferring the images to a PC. The raw image contains the images from all four cameras, and it could be converted to RGB, split into four images, and saved in lossless compressed TIFF format immediately after the transfer from the hat to the RPi, before acquiring the next integrated image. However, this procedure is time-consuming and could lead to frame drops. Moreover, as it heavily loads the CPU of the RPi, it causes both overheating and battery drain, drastically reducing usage time in the field. The solution is to save the raw images directly, without compression, in BMP file format on an SD card placed in the dedicated slot on the RPi. After the acquisition in the field, the RPi is connected to the same LAN as the local PC, the raw images are transferred to the PC, and then debayering with lossless compression is performed.
Illumination levels can change during a UAV flight or close-range imaging. To compensate for this effect, we implemented a simple algorithm that keeps the mean grayscale value of the brightest visible image at a constant level.
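One simple way to keep the mean grayscale value at a constant level is a proportional update of the exposure time, sketched below. This is an assumption-laden illustration rather than the implemented algorithm: the function name, the target mean, the exposure limits, and the choice to adjust only the exposure time (and not the gain) are all hypothetical.

```python
import numpy as np

def next_exposure(images, exposure_us, target_mean=110.0, min_us=50.0, max_us=20000.0):
    """Propose a new exposure time so that the brightest image reaches `target_mean`."""
    # The brightest image is the one with the highest mean grayscale value.
    brightest = max(float(np.mean(img)) for img in images)
    if brightest <= 0:
        return max_us  # completely dark frame: open up to the assumed upper limit
    proposed = exposure_us * target_mean / brightest
    # Clamp to the assumed exposure limits of the camera.
    return float(np.clip(proposed, min_us, max_us))

# Hypothetical grayscale frames from two of the cameras.
dim = np.full((4, 4), 55.0)
bright = np.full((4, 4), 220.0)
new_exposure = next_exposure([dim, bright], exposure_us=1000.0)
```

Here the brightest frame is twice as bright as the target, so the proposed exposure is halved; in practice, the update would be applied gradually to avoid oscillations.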
The quad cameras are triggered simultaneously, and this mechanism is implemented in the hat. Unfortunately, there is no triggering mechanism for the thermal camera. It always delivers thermograms at a constant frame rate (9 or 25, depending on the model). For static scene imaging, this does not introduce any problems. However, with a UAV-based solution, some latency between thermal and visible images is present. Still, this can be mitigated by the alignment of the final orthorectified images, which can be achieved with standard algorithms and ground control points.
All functions necessary for the further processing of acquired images were developed in Python using the OpenCV library. These functions include camera calibration for generating undistorted thermal and visible images, vignetting correction for visible images, image undistortion, image registration between one visible image and the other three visible images and the thermal image, thermal camera calibration, temperature correction for thermal images, calibration of the MS camera, image reconstruction for each band, and saving of the final composite images. All these functionalities are realized as Jupyter notebooks and can be executed on any computer with the appropriate libraries.

IV. CALIBRATIONS OF MULTISPECTRAL CAMERA
In the computer vision community, the procedure in which the parameters for removing geometrical distortion and compensating the vignetting effect are found is called camera calibration. In this article, we use the term standard camera calibration for this procedure, and we call radiometric calibration the procedure in which we find the spectral sensitivity in each of the nine bands.

A. Standard Camera Calibration
Before calibration of the MS camera, we have to compensate for the vignetting effect of each camera in VIS/NIR and to undistort images for all cameras, including the thermal one.
The first task was accomplished using the standard method based on obtaining dark and flat images [81], [82]. A dark image is an image acquired with no light entering the lens. This image, also known as fixed pattern noise, is characteristic of a CMOS sensor. The flat image represents the signal acquired from a uniformly illuminated surface with the same reflectance at each point. The flat image decreases in pixel values radially from the center of the image toward the corners as a consequence of natural vignetting [66]. If $I_i$ is the acquired image ($i$ represents a camera in the quad camera kit), $I_{df,i}$ is the dark frame, and $I_{ff,i}$ is the flat image, then vignetting correction can be described by the following equation:
$$I_{corr,i}(x,y,c) = \frac{I_i(x,y,c) - I_{df,i}(x,y,c)}{I_{ff,i}(x,y,c) - I_{df,i}(x,y,c)}\, m_{i,c} \tag{1}$$
where $x$ and $y$ are the coordinates of each pixel, $c$ is one of the three channels (RGB), and $m_{i,c}$ is the mean pixel value for channel $c$ of the flat image from camera $i$. Fig. 3 shows an image from the camera before and after vignetting correction. Standard lenses introduce noticeable distortion in the image, in the form of barrel or pincushion distortion. The first one is present in the lenses we use [see Fig. 4(a)]. Correction, or undistortion, was performed by the usual procedure described in [58]. We made regularly spaced circular holes in a flat wooden board and took several images with each camera. Fig. 4(a) and (b) shows images before and after the undistortion step. If $I_{corr,i}$ is the vignetting-free image for camera $i$ (including the thermal one), we denote the corrected image after undistortion by $I_{un,i}$:
$$I_{un,i} = \mathrm{undistort}\left(I_{corr,i},\ \mathrm{in\_par\_cam}_i\right) \tag{2}$$
where $\mathrm{in\_par\_cam}_i$ are the intrinsic parameters of camera $i$ and undistort is a function that conducts the undistortion process [84].
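The dark/flat correction described above can be sketched in a few lines. The example assumes the common flat-field form, in which the dark frame is subtracted from both the acquired and the flat image and the ratio is rescaled by the per-channel mean of the flat image; the function name, the small constant `eps`, and the synthetic vignetting profile are illustrative assumptions.

```python
import numpy as np

def correct_vignetting(image, dark, flat, eps=1e-6):
    """Flat-field correction: (image - dark) / (flat - dark), rescaled per channel."""
    image, dark, flat = (np.asarray(a, dtype=np.float64) for a in (image, dark, flat))
    # Per-channel mean of the flat image (taken here before dark subtraction).
    mean_flat = flat.mean(axis=(0, 1), keepdims=True)
    # eps (an assumption of this sketch) guards against division by zero.
    return (image - dark) / np.maximum(flat - dark, eps) * mean_flat

# Synthetic radial falloff applied to a uniform scene and to the flat image.
yy, xx = np.mgrid[0:20, 0:30]
r2 = (yy - 9.5) ** 2 + (xx - 14.5) ** 2
vignette = (1.0 - 0.5 * r2 / r2.max())[..., None]
dark = np.full((20, 30, 3), 4.0)
flat = 200.0 * vignette + dark
image = 80.0 * vignette + dark
corrected = correct_vignetting(image, dark, flat)
```

For this synthetic uniform scene, the corrected image is constant across the frame, which is the expected outcome of a successful vignetting correction.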

B. Radiometric Calibration
Radiometric camera calibration is applied after removing the vignetting effect and undistorting images.The goal of radiometric calibration is to obtain the reflectance r m of an object under inspection, which is the ratio of radiance measured by the sensor (camera) and incident radiance where r m is spectral reflectance of the target in the band m, L C,m is the measured spectral radiance in the band m at the camera, and L in,m is the incident spectral radiance in band m.In our system m = 1, …,9.Equation ( 3) is valid when the distance between the target and the camera is small, like when using a low-altitude UAV, around 100 m.In this situation, attenuation by the atmosphere layer between the ground and the camera is negligible, as was proven in the field measurements [85] and MODTRAN simulations [86].
To calculate spectral radiance from a pixel value, a linear relationship between the digital number at coordinate $(x, y)$ and radiance is used [82]

$$L_{C,m}(x,y) = a_m\,\frac{I_{un,m}(x,y)}{g_m\,\tau_m\,2^N} + b_m \tag{4}$$

where $g_m$ and $\tau_m$ are the gain and the exposure time for band $m$, respectively, $I_{un,m}(x,y)$ is a pixel value of the corrected and undistorted image, $N$ is the number of bits in the digital image, and $a_m$ and $b_m$ are the slope and the intercept of the empirical line model (ELM) [87], respectively, as (4) is called in the remote sensing literature. The ELM is either provided by the manufacturer of the MS camera or found in the calibration procedure.
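The two-point nature of the ELM can be illustrated with a short sketch. It assumes that the pixel value is normalized by gain, exposure time, and the $2^N$ range before the slope and intercept are applied; all numeric values (gain, exposure, pixel values, radiances) and the function names are hypothetical and serve only to show the procedure.

```python
import numpy as np

def scaled_value(pixel, gain, exposure_s, bits=10):
    """Normalize a digital number by gain, exposure time, and bit depth (assumed form)."""
    return np.asarray(pixel, dtype=float) / (gain * exposure_s * 2 ** bits)

def fit_elm(scaled, radiance):
    """Fit slope a and intercept b of the empirical line from calibration targets."""
    a, b = np.polyfit(scaled, radiance, 1)
    return a, b

def reflectance(pixel, gain, exposure_s, a, b, incident_radiance, bits=10):
    """Radiance from the empirical line, divided by the incident radiance."""
    radiance = a * scaled_value(pixel, gain, exposure_s, bits) + b
    return radiance / incident_radiance

# Hypothetical acquisition settings and two uniform targets (dark and bright)
gain, exposure_s = 2.0, 0.005
target_pixels = np.array([100.0, 800.0])
a_true, b_true = 2.5, 0.1                          # synthetic ground truth
target_radiance = a_true * scaled_value(target_pixels, gain, exposure_s) + b_true

a, b = fit_elm(scaled_value(target_pixels, gain, exposure_s), target_radiance)
r = reflectance(450.0, gain, exposure_s, a, b, incident_radiance=250.0)
```

With only two targets the line is determined exactly; additional targets would turn the same call into a least-squares fit.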
During the development of the MS camera, it is essential to verify that the applied algorithm for spectral reconstruction is efficient and highly accurate. This requires precise, calibrated equipment such as a high-grade hyperspectral camera, a broadband light source, and several uniform reflectance boards. We use this equipment to find the radiometric transfer function for each camera in the quad kit in only one setup. The radiometric transfer function converts the digital image into one, two, or three radiances, depending on how many passbands (one to three) are present on the optical filter of the camera. Here, we describe the underlying idea for the triple-band pass filter, but a similar approach can be applied to dual- and single-band filters.
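Pixel sensitivity when the triple-band filter is mounted in front of the lens is influenced by the transmissivity of the corresponding microfilter (red, green, or blue) and the transmissivity of the triple-band filter. In other words, the response at a red pixel (the same holds for the blue or the green pixel) is a linear combination of responses from incident radiation in three narrow bands (defined by the triple-band filter) weighted with the transmissivity of the red microfilter in those three bands. This is demonstrated in Fig. 5, where Fig. 5(a) presents the standard transmittance for the RGB filters in the Bayer pattern together with the spectral transmissivity of the triple-band filter 550/660/850, while Fig. 5(b) illustrates how the responses of both filters shape the spectral transmissivity on each pixel. The scaled digital values $Y_i = [Y_{R,i}, Y_{G,i}, Y_{B,i}]^T$ at each RGB pixel can be calculated using the following equations:

$$\begin{aligned}
Y_{R,i} &= \omega_{R1,i} L_{1,i} + \omega_{R2,i} L_{2,i} + \omega_{R3,i} L_{3,i} \\
Y_{G,i} &= \omega_{G1,i} L_{1,i} + \omega_{G2,i} L_{2,i} + \omega_{G3,i} L_{3,i} \\
Y_{B,i} &= \omega_{B1,i} L_{1,i} + \omega_{B2,i} L_{2,i} + \omega_{B3,i} L_{3,i}
\end{aligned} \tag{5}$$

where $i$ represents one of the four cameras ($i = 1, 2, 3, 4$), $L_{1,i}$, $L_{2,i}$, and $L_{3,i}$ are the radiances in the three spectral bands in which triple-band filter $i$ transmits light, and $\omega_{R1,i}$, $\omega_{R2,i}$, and $\omega_{R3,i}$ are the sensitivities of the red pixel in the first, second, and third spectral bands (the same holds for the green and the blue pixels). This sensitivity considers the spectral characteristics of the triple-band filter and the lens together. Each time one of the components is changed, the camera must be recalibrated. We can write (5) in the matrix form

$$Y_i = \Omega_i L_i \tag{6}$$

where $L_i = [L_{1,i}, L_{2,i}, L_{3,i}]^T$ is the radiance reaching the sensor in the three spectral bands, and $\Omega_i$ is the spectral calibration matrix (all stated for camera $i$). Here, we call $\Omega_i$ the spectral calibration matrix to distinguish it from the standard camera calibration matrix, which is used for image undistortion. $\Omega_i$ is given by

$$\Omega_i = \begin{bmatrix}
\omega_{R1,i} & \omega_{R2,i} & \omega_{R3,i} \\
\omega_{G1,i} & \omega_{G2,i} & \omega_{G3,i} \\
\omega_{B1,i} & \omega_{B2,i} & \omega_{B3,i}
\end{bmatrix} \tag{7}$$

The scaled image value $Y_i$ is obtained from the corrected image $I_{un,i}(x,y) = [I_{un,i}(x,y)_R, I_{un,i}(x,y)_G, I_{un,i}(x,y)_B]^T$ by dividing each pixel value by the exposure and gain of the camera during the acquisition of the considered image

$$Y_i = \frac{I_{un,i}(x,y)}{g_i\,\tau_i} \tag{8}$$

Although (5)-(8) are valid for all VIS/NIR cameras, when a single- or dual-band filter is mounted on the camera, we reduce the number of coefficients in the spectral calibration matrix to one or four, as they are sufficient, and the transmissivity of the remaining two pixels or one pixel in the RGB pattern is lower than that of the pixels used in the calculation. At 750 nm, the red microfilter has higher transmissivity than the green and blue microfilters. Similarly, the blue microfilter has much lower transmissivity than the green microfilter at 577 nm and the red microfilter at 690 nm [Fig. 5(a)]. For the 750 and 577/690 filters, (8) can be written in the following forms:

$$Y_{R,3} = \omega_{R1,3} L_{1,3}, \qquad
\begin{bmatrix} Y_{G,2} \\ Y_{R,2} \end{bmatrix} =
\begin{bmatrix} \omega_{G1,2} & \omega_{G2,2} \\ \omega_{R1,2} & \omega_{R2,2} \end{bmatrix}
\begin{bmatrix} L_{1,2} \\ L_{2,2} \end{bmatrix} \tag{9}$$

where $i = 3$ represents the camera with the single-band filter of 750 nm, and $i = 2$ is for the camera with the dual-band filter of 577/690 nm. The result of radiometric calibration is the $S_i$ matrix, which is the inverse of $\Omega_i$, as we want to get the radiance in each band from the values of the digital image.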
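The relation between the spectral calibration matrix and the recovered radiances can be sketched as follows. The sketch assumes a 3 x 3 matrix for a triple-band camera and scaling by gain and exposure only; the numeric sensitivities in `omega`, the acquisition settings, and the function name `pixels_to_radiance` are hypothetical.

```python
import numpy as np

# Hypothetical sensitivities: rows are R, G, B pixels, columns are the three filter bands
omega = np.array([[0.05, 0.80, 0.60],
                  [0.70, 0.10, 0.58],
                  [0.10, 0.03, 0.55]])
S = np.linalg.inv(omega)      # maps scaled RGB values back to band radiances

def pixels_to_radiance(rgb_image, gain, exposure_s, S):
    """Apply S to every pixel of an H x W x 3 corrected, undistorted image."""
    y = np.asarray(rgb_image, dtype=float) / (gain * exposure_s)
    return y @ S.T            # per-pixel product S @ y, vectorized over the image

# Round trip on synthetic data: radiances -> RGB response -> radiances
rng = np.random.default_rng(0)
gain, exposure_s = 2.0, 0.005
radiance_true = rng.uniform(0.1, 1.0, size=(4, 5, 3))
rgb = (radiance_true @ omega.T) * gain * exposure_s
radiance_est = pixels_to_radiance(rgb, gain, exposure_s, S)
```

For a dual-band camera the same code applies with a 2 x 2 matrix and the two pixel colors that are retained.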
Fig. 6 shows the equipment we used to acquire the data needed for spectral calibration. Although only two uniform targets with calibrated spectral reflectivity are sufficient for finding the parameters of the ELM, we first need to verify that the ELM can be applied to the designed camera. As previously mentioned, a hyperspectral camera, a broadband light source, and a uniform target are one option for this verification and for finding the spectral calibration matrix. We used the pushbroom hyperspectral scanner (HSS) HySpex Mjolnir V-1240 [88] with 200 channels in the 400-1000 nm range and 1240 samples in each channel. The HSS was used as a reference as it was the only available device with high sensitivity and spectral resolution that was previously calibrated. The HSS has a 12-bit resolution, which is sufficient for calibrating the proposed MS device, whose resolution is below 10 bits (although the OV9782 image sensor is 10 bits, some information is lost due to debayering).
We selected a Quartz-Tungsten-Halogen (QTH) lamp and an LED reflector as the illumination source. QTH is a broadband spectral source with much less optical power in the blue wavelength range. To compensate for this shortcoming, we added an LED reflector, as it was shown that the combination of LED and QTH sources reduces the standard deviation by more than a factor of two compared to when only QTH is used, leading to a better reconstruction of the reflectance in the 400-500 nm range [89]. As a calibration target, we used the ColorChecker Classic, which has 24 patches with uniform reflectance. The other three targets [Fig. 6(c)] were included to obtain their spectral reflectance, as they are more appropriate to use in the field for measuring incident radiance.
Equations (8) and (9) are linear, and we used linear regression to find the spectral calibration matrix for each camera. We split the set of 24 pairs $(I_{un,i}, L_i)$ into training and test subsets (with a ratio of 2:1) and estimated the matrix $S_i$ for each of the cameras. The radiance value (vector $L_i$) for each band in an $(I_{un,i}, L_i)$ pair was obtained from the hyperspectral image as the average value inside a uniform box in that band, while the digital value $I_{un,i}$ is the average pixel value from the same box in the RGB image.
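The regression step described above can be sketched with ordinary least squares. The sketch assumes a model without an intercept, in which the radiance vector equals $S_i$ times the scaled pixel vector; the synthetic `omega_true`, the noise-free patches, and the function name `fit_calibration_matrix` are hypothetical and stand in for the 24 ColorChecker measurements.

```python
import numpy as np

def fit_calibration_matrix(Y, L):
    """Least-squares estimate of S such that L ≈ Y @ S.T.

    Y: n_patches x n_pixels scaled values, L: n_patches x n_bands radiances.
    """
    S_T, *_ = np.linalg.lstsq(Y, L, rcond=None)
    return S_T.T

# Synthetic, noise-free stand-in for 24 patch measurements of one camera
rng = np.random.default_rng(0)
omega_true = np.array([[0.05, 0.80, 0.60],
                       [0.70, 0.10, 0.58],
                       [0.10, 0.03, 0.55]])
L_all = rng.uniform(0.1, 1.0, size=(24, 3))   # reference radiances per patch
Y_all = L_all @ omega_true.T                  # simulated scaled RGB responses

# 2:1 split into training and test patches
order = rng.permutation(24)
train, test = order[:16], order[16:]

S = fit_calibration_matrix(Y_all[train], L_all[train])
L_pred = Y_all[test] @ S.T
max_test_error = np.abs(L_pred - L_all[test]).max()
```

On real measurements the residual on the test patches would not vanish, and it is this residual that the quality score in the next paragraph summarizes.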
For quality measurement, we used the $R^2$ score. Results for two patches in the test set are shown in Fig. 7, together with the $R^2$ values for all patches in the test set. The $R^2$ values are high ($R^2 > 0.98$ for all patches except the bluish green patch), indicating that a very good radiance estimation can be obtained using the suggested design. Also, this procedure, based on a high-quality hyperspectral camera, verified that the ELM is an appropriate model for the proposed device and that, for the calibration of each new device of the same design, only a couple of targets with uniform known reflectance are sufficient. An even cheaper option for ELM calibration is to use low-cost uncalibrated uniform targets with a low-cost spectral sensor.
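For reference, a minimal sketch of the $R^2$ score is given here, assuming the usual coefficient of determination computed per patch over the nine band radiances. The measured and predicted values are hypothetical.

```python
import numpy as np

def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical radiances of one patch in nine bands (reference vs. reconstruction)
measured = np.array([0.12, 0.30, 0.45, 0.50, 0.41, 0.22, 0.35, 0.48, 0.60])
offsets = np.array([0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01])
predicted = measured + offsets
score = r2_score(measured, predicted)
```

A score of 1 corresponds to a perfect reconstruction, while a score of 0 corresponds to predicting the mean radiance in every band.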
Fig. 8 shows images from all five single cameras (including the thermal one, for demonstration purposes) followed by the reconstructed RGB image and the reconstructed spectral reflectance in each of the nine bands. All reflectances are normalized to the range [0, 1]. For visible bands, color maps were selected to represent the central bands of each filter, while the color map for the nonvisible bands is grayscale.

C. Temperature Calibration
We performed temperature calibration of a small thermal core using a high-quality thermal camera (FLIR Duo Pro R) as a reference. The FLIR Duo Pro R was used as a factory-calibrated thermal camera with a temperature sensitivity of less than 50 mK, which is sufficient as the selected thermal core has a sensitivity that is at least 30% lower. Both characteristics, certified calibration and temperature sensitivity below 50 mK, guarantee that a small thermal core can be used as a temperature sensor after calibration. Also, with this setup, we can verify the linearity of the small infrared camera. In the procedure, one surface of a black-painted aluminum plate was attached to a 50 W resistor for heating. During heating, thermograms were acquired synchronously from both the FLIR and Seek cameras.
The Seek thermal core and the FLIR Duo Pro R used the same emissivity of 0.97, a value that, within ±0.03, covers many natural surfaces such as vegetation, snow, water, soil, and rock [90]. During simultaneous acquisition, we collected 18 pairs of temperatures from room temperature up to 80 °C. Fig. 9 shows a highly linear relationship between the temperatures, with an $R^2$ score of 0.974. Using linear regression, we obtained a model that converts the temperature measured by the Seek thermal core to a more accurate value. After a successful linearity check, thermal calibration at only two points is sufficient for each new similar device, and it can be done with just one low-cost contact sensor.
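The linear thermal correction and its two-point variant can be sketched as follows. The temperature pairs are synthetic and the coefficients 1.08 and 1.5 are hypothetical; they are not the values obtained for the Seek thermal core.

```python
import numpy as np

def fit_linear_correction(t_core, t_reference):
    """Least-squares line mapping thermal-core readings to reference temperatures."""
    slope, intercept = np.polyfit(t_core, t_reference, 1)
    return slope, intercept

def two_point_correction(core_points, reference_points):
    """Same line determined from only two calibration points."""
    (c1, c2), (r1, r2) = core_points, reference_points
    slope = (r2 - r1) / (c2 - c1)
    return slope, r1 - slope * c1

# Synthetic stand-in for 18 simultaneous readings between 25 and 80 degrees Celsius
t_reference = np.linspace(25.0, 80.0, 18)
t_core = (t_reference - 1.5) / 1.08            # hypothetical core response

slope, intercept = fit_linear_correction(t_core, t_reference)
slope2, intercept2 = two_point_correction((t_core[0], t_core[-1]),
                                          (t_reference[0], t_reference[-1]))
t_corrected = slope * t_core + intercept
```

Once linearity has been verified with the full set of pairs, the two-point form needs only a contact sensor reading at a low and a high plate temperature.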
All images used for the calibration procedures, 3-D drawings, and code for image acquisition are available in the Zenodo open repository (https://zenodo.org/) using the keyword MCAPEFA.

V. IMAGE ALIGNMENT
The optical axes of all five cameras are almost parallel, but due to the distance between the sensors, their fields of view do not entirely overlap. The images should first be aligned for further processing. Although we first developed the spectral reconstruction procedure to verify the starting idea in designing the MS system, for final use, we perform the alignment process after image undistortion and before radiometric correction, as RGB images are richer in features compared to individual bands.
We tested several well-established classical methods for image registration. The following techniques were selected: SIFT, ECC, POC, and the B-spline nonrigid algorithm from the SITK library [91]. The idea behind nonrigid registration is that different image parts should have different transformations during registration. This is valid for close-range imaging, where the objects in the scene do not all lie in the same plane. Table III presents the results of image alignment with the different algorithms. The average execution time for the alignment of one image is also shown for each method.
The B-spline method is the slowest, and although it shows the best NMI score, it distorts the registered images significantly, so it was not considered for the final algorithm. Its high NMI score may be a result of the optimization process, as the B-spline method uses the NMI score in an iterative procedure, unlike the other tested alignment algorithms.
After B-spline, the ECC-based algorithm showed the next highest SSI and NMI, but with a significantly longer execution time compared to POC, which in turn aligns images better than the feature-based SIFT algorithm.
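The principle behind POC can be illustrated with a minimal, translation-only sketch: the cross-power spectrum of two images is normalized to unit magnitude, and the position of the peak of its inverse transform gives the displacement. The function name `phase_only_correlation` and the synthetic test image are hypothetical, and the sketch does not include rotation, scale, or subpixel estimation, which a complete registration pipeline may need.

```python
import numpy as np

def phase_only_correlation(reference, moving, eps=1e-12):
    """Return the integer (dy, dx) shift that aligns `moving` to `reference`."""
    F = np.fft.fft2(reference)
    G = np.fft.fft2(moving)
    cross = F * np.conj(G)
    cross /= np.abs(cross) + eps            # keep phase only
    poc = np.real(np.fft.ifft2(cross))      # sharp peak at the displacement
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    # Peaks past the half size correspond to negative (wrapped) shifts
    shift = tuple(int(p) if p <= s // 2 else int(p) - s
                  for p, s in zip(peak, poc.shape))
    return shift, float(poc[peak])

# Synthetic check: a circularly shifted copy of a random image
rng = np.random.default_rng(0)
reference = rng.random((64, 80))
moving = np.roll(reference, (5, -3), axis=(0, 1))
shift, peak_value = phase_only_correlation(reference, moving)
aligned = np.roll(moving, shift, axis=(0, 1))
```

Because only two forward FFTs and one inverse FFT are required, the cost is low, which is consistent with the short execution time reported for POC.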
To visually check the registration quality, we generated an image as the average of the reference and registered images (Fig. 11). Due to the depth of field in the scene, some misalignment is present, which can be noticed in Fig. 11(a). To address this problem, we applied additional image alignment using a dense optical flow algorithm, starting from a globally registered image so that the additional displacement vector is no larger than several pixels. The dense optical flow finds a vector that maps each pixel in the registered image to one in the reference image. We used the Farneback algorithm [93] for dense optical flow as it is implemented in the OpenCV library. As the initial image for optical flow, we used the outputs of the ECC and POC methods. We obtained comparable SSI and NMI values for both algorithms after applying dense optical flow (Table III), but POC is almost six times faster. Fig. 11(b) demonstrates that the misalignment from Fig. 11(a) is successfully removed.
Next, we applied the same image registration methods to image pairs acquired with UAVs. SIFT did not produce a homography matrix due to insufficient keypoints, as feature matching is sensitive to nonlinear radiometric differences in UAV-based images [88], and there are not as many strong features as in close-range images. The ECC-based and POC algorithms successfully overcame this problem. However, all algorithms failed to align a thermal image to a VIS/NIR one. Fortunately, for most applications, UAV-acquired images are stitched to produce orthorectified images. With orthorectified images, there is no need to register each pair of images, only the final orthorectified images, which can be achieved with ground control points. Yet, if it is necessary to register individual images from a single UAV campaign, including thermal ones, a solution is to obtain a distance-dependent homography [92] and to apply it based on the distance from the UAV to the ground.
Considering all of the above, we selected the POC algorithm followed by dense optical flow as a fast and robust method for the registration of close-range images.

VI. CONCLUSION
In this article, we have presented the design of a low-cost MS camera based on commercially available components. We show that obtaining information in nine spectral bands is possible using four optical filters. The proposed device, with these nine bands and thermal measurements, is several times cheaper than other solutions on the market. An RPi 4 controls all the components, the code is written in Python with open-source libraries, and it is publicly available. Both the hardware and software designs are modular, and the camera can be adapted to a specific application, further reducing the price. This can lower the high barrier to broader implementation of PA among small farms (more than 86% of all farms in the EU), thus increasing their sustainability. Also, it is essential to emphasize that the presented solution provides a low-cost MS acquisition system that is easy to use in other (non-PA) research, which can increase the applications of MS imaging in different scientific areas.
We demonstrated the radiometric calibration procedure for the VIS/NIR cameras and temperature calibration for the thermal camera. The reconstruction algorithm calculates the response in nine narrow bands. Compared to ground truth values obtained with a hyperspectral camera, the accuracy is very high. The lowest $R^2$ value is 0.935, but most $R^2$ values are higher than 0.985. In addition, we proposed a two-step image registration process. In the first step, global image registration is performed using the POC method. The second step, based on dense optical flow (Gunnar Farneback algorithm), removes misalignment in close-range images caused by the significant depth of field compared to the distance from objects to the camera. Although misalignment after the first step is sometimes difficult to spot, the second step is necessary for close-range imaging of vegetation, especially at the leaf level, where this misalignment can cause the feature vector at one spatial coordinate to contain MS data from different leaves. This can lead to wrong classification and even to missed detection of disease or other issues in the initial phase, introducing higher economic loss or ecological risk than necessary. Furthermore, the proposed two-step alignment does not require special hardware and is easy to implement, as both algorithms are part of available Python libraries.
For our future work, two improvements are necessary. We used a hyperspectral camera and a high-quality thermal camera to calibrate the proposed device. During development, they are needed to verify that the model for MS image reconstruction is valid and to assess the linearity of the small thermal core. However, these devices are very expensive, and new methods are needed to provide affordable tools for a wider group of users. Also, after verification of the design, it is only necessary to calibrate each imaging sensor at two points for each newly assembled device. We plan to develop a radiometric calibration procedure based on a low-cost MS sensor (around 70 euros) with 18 narrow bands in the VIS/NIR range [94]. Similarly, a single contact temperature sensor and a plate with a temperature-regulated heater are sufficient for thermal camera calibration, and we will also design a procedure for generating the thermal camera's transfer function. Together, these will provide a low-cost MS and thermal imaging solution with accompanying calibration procedures.

Fig. 1. Spectral transmittance of the four selected filters of the proposed camera.

Fig. 2. Design of the proposed multispectral camera: (a) position of each camera inside the housing, on the top is thermal core; and (b) camera with enclosure.

Fig. 5. Spectral transmissivity of RGB pixel with triple-band 550/660/850 filter on it.(a) Standard spectral transmissivities for RGB Bayer pattern and triple-band 550/660/850 filter.(b) Spectral transmissivity for each pixel in RGB pattern when both spectral transmissivities of microfilters and triple-band filters are considered (remark: in the wavelength range 800-900 nm, all three microfilters have almost the same spectral transmissivity).
) presents standard transmittance for the RGB filters in the Bayer pattern together with spectral transmissivity for triple band filter 550/660/850, while Fig. 5(b) illustrates how responses from both filters shape the spectral transmissivity on each pixel.Digital scaled values Y = [Y R , Y G , Y B ] at each RGB pixel can be calculated using following equations:

Fig. 7. (a) Measured and predicted spectral radiance for two patches in the test set. (b) Obtained $R^2$ score on all patches in the test set.

Fig. 8. From top to bottom: (a) to (e) images of one scene from each single camera, including the thermal image, (f) reconstructed RGB image, (g) to (o) normalized reflectance in each band, where the band is shown at the top of each image.