Towards Visual Inspection of Wind Turbines: A Case of Visual Data Acquisition Using Autonomous Aerial Robots

This article presents a novel framework for acquiring visual data around 3D infrastructures, by establishing a team of fully autonomous Micro Aerial Vehicles (MAVs) with robust localization, planning and perception capabilities. The proposed aerial system reaches a high level of autonomy on a large scale, while pushing the boundaries of real-life deployment of aerial robotics. In the presented approach, the MAVs deployed around the structure rely only on their onboard computer and sensory systems. The developed framework envisions a modular system that addresses open research challenges in the fields of localization, path planning and mapping, offers fast on-site deployment and a reduced execution time, and can repeatably perform the mission according to the operator's needs. The architecture of the established system includes: 1) a geometry-based path planner for coverage of complex structures by multiple MAVs, 2) an accurate yet flexible localization component, which provides the pose estimation for the MAVs by utilizing an Ultra Wideband fused inertial estimation scheme, and 3) a visual data post-processing scheme for 3D model building. The performance of the proposed framework has been experimentally demonstrated in multiple realistic outdoor field trials, all focusing on the challenging structure of a wind turbine as the main test case. The successful experimental results depict the merits of the proposed autonomous navigation system as the enabling technology towards aerial robotic inspectors.

Nowadays, Micro Aerial Vehicles (MAVs) are gaining more and more attention from the scientific community, constituting a fast-paced emerging technology whose limits are constantly being pushed towards accomplishing complex tasks [1]. These platforms are characterized by their mechanical simplicity, agility, stability and outstanding autonomy to reach remote and distant places. Endowing MAVs with proper sensor suites, while navigating in indoor/outdoor, cluttered and complex environments, could establish them as a powerful aerial tool for a wide span of applications. Some characteristic examples of application scenarios for such a novel deployment of the aerial technology include infrastructure

The associate editor coordinating the review of this manuscript and approving it for publication was Wai-Keung Fung.
One of the most common application areas in which MAVs are employed is the filming industry, but there are efforts from other industries, such as mining, oil, and energy providers, to invest in the commercialization of MAVs for remote inspection applications. Towards this vision, MAVs are powerful tools that have the potential to decrease the risk to human life, decrease the execution time and increase the efficiency of the overall inspection task, especially when compared to conventional methods [7]. Despite the fact that research in aerial robotics has reached significant milestones regarding localization [8], planning [9] and perception [10], successful real-life demonstrations of autonomous inspection systems have rarely been reported in the literature. The majority of the applications focus on impressive laboratory trials in fully controlled environments, in most cases relying on expensive motion capture systems [11], or on small-scale and well-defined outdoor environments [12], [13].
In [14] an Ultra Wideband (UWB) state estimation framework has been proposed for a quadrotor platform. The method used the dynamic model of the platform together with measurements from an IMU and the UWB to observe the platform's state. The method included the thrust constant estimation for predicting the acceleration of the vehicle. Similarly, in [15] a UWB-Inertial localization scheme has been proposed. The method is based on a non-linear observer that fuses measurements from the accelerometer, the gyroscope and the magnetometer with the UWB ranges. In [16] a UWB localization framework for two-robot state estimation has been proposed. In this work the main aim is to extract the relative position between the robots, considering an anchor robot and a tag robot. In [17] a high-accuracy and energy-efficient localization scheme has been developed. The system is based on UWB using the time-difference of arrival (TDOA) topology, in which the mobile nodes transmit periodically and the infrastructure-based nodes are mostly passive.
In the related literature there have been many works that addressed the Coverage Path Planning (CPP) problem in 2D spaces [18] and fewer approaches that addressed the coverage of 3D spaces. In [19], a complete survey was presented on CPP methods in 2D and 3D. Towards the 3D CPP, Atkar et al. [20] presented an off-line 3D CPP method for the spray painting of automotive parts. Their method used a CAD model and the resulting CPP could satisfy certain requirements for paint deposition. In [21], the authors presented a CPP with real-time re-planning for the inspection of 3D underwater structures, where the planning assumed a priori knowledge of a bathymetric map using an autonomous underwater vehicle, while their overall approach contained no branches. The authors in [22] introduced a new algorithm for producing paths that cover complex 3D environments. The algorithm was based on off-line sampling for autonomous ship hull inspection, and was able to generate paths for structures with unprecedented complexity.
Observing the related literature, there is a continuously evolving effort to deploy MAVs for infrastructure inspection [23]. Although this survey lists multiple application scenarios for MAVs, the wind turbine case is not presented. In [24] a system for MAV infrastructure inspection has been proposed, based on an ultrasonic beacon network for localization, a Convolutional Neural Network (CNN) for damage detection and a geo-tagging method for damage localization. The authors report that, due to regulations, they could not perform field tests and therefore provide results from a constrained lab environment, in contrast to the proposed system, which has been experimentally deployed at a real wind turbine. Performing experiments on real infrastructure reveals issues and lessons learnt that are not visible during lab tests. The authors in [25] present a method for localizing an aerial vehicle while simultaneously performing model fitting of a wind turbine using a skeletal parametrization. They use a CNN to estimate the projections of the skeletal model into image frames from the onboard camera. The method is based on graph optimization that considers measurements from the IMU and the GPS. It has been evaluated using datasets collected from previously performed flights, therefore the method itself was not used to collect the datasets. Similarly, in [26] the authors proposed a method to estimate the relative position of the aerial platform and the wind turbine, as well as the position of the blades, based on the Hough transform, again evaluating the method on pre-recorded data and without using it for the dataset collection. In [27] the authors proposed a vision-based method for MAV navigation around wind turbines. In this system the authors combined visual odometry with the Hough transform to detect the position of the hub and the angle of the blades, aiming to identify the relative position of the platform with respect to the infrastructure. The overall goal was to extract 3D waypoints for the platform to visit, while using a path planner for generating the trajectories. This method has been experimentally verified on a downscaled, 3 meter tall mock-up wind turbine, which shows the difficulty and complexity of getting access to such infrastructures. In [28] an aerial inspection system has been developed for dataset collection and damage analysis through histogram processing of images. In this work three aerial vehicles have been deployed to collect images from an infrastructure. The authors selected a bridge, which differs from a wind turbine since it scales horizontally in urban scenery and is more suitable for vision-based approaches compared to high-altitude wind turbines surrounded by forest. Similar work is presented in [29].
Thus, one of the most important contributions of this article is the establishment of an aerial system capable of visually pre-inspecting an outdoor large-scale infrastructure through the coordination of multiple aerial vehicles. Towards this contribution, the article further contributes the implementation of a novel and accurate localization scheme for collaborative aerial visual data collection of infrastructure, a scheme that is based on the fusion of Ultra WideBand (UWB) distance measurements and Inertial Measurement Unit (IMU) data. In this approach, the aerial platforms navigate autonomously based on the UWB-Inertial fused state estimation, using a local UWB network placed around the structure under inspection. A second contribution of this article is the experimental evaluation of a Collaborative Coverage Path Planner (C-CPP) algorithm that has the ability to guarantee the full coverage of the infrastructure by considering camera, geometry, collision, and other application-posed constraints. The coverage path is generated for every MAV based on the geometric characteristics of the structure, while identifying and assigning parts of the structure to different agents, leading to faster mission execution. Additionally, the coverage path planner allows the definition of the desired overlap between spatially adjacent frames in the dataset, which is needed for 3D model generation. The final contribution stems from the successful real-life demonstration of a fully functional onboard visual sensor scheme that has the dual role of providing: a) low-resolution compressed data for the visual assessment of the structure, and b) high-resolution data for post-processing, e.g. building 3D models and area image stitching. The overall concept of the proposed collaborative aerial system is presented in Figure 1, where two collaborative MAVs are performing an aerial data acquisition around a wind turbine; a corresponding video is also available (see footnote 1).
The rest of the article is structured as follows. The overall system is described in Section II. More specifically, Section II-C presents the geometric approach for the C-CPP problem for infrastructure navigation. Section II-D provides an analysis on UWB fused inertial based localization for aerial platforms, while Section II-E establishes the 3D reconstruction problem from multiple images and multiple MAVs. Section III demonstrates the experimental setup and presents the experimental trials for the proposed system. Finally, the concluding remarks are presented in Section V.

II. AERIAL DATA ACQUISITION SYSTEM
This article, inspired by the increasing capabilities of MAVs, establishes an autonomous aerial system, which is specialized in large scale industrial facilities. The system is realized by either a single agent or a team of agents and is characterized by advanced localization and structure coverage capabilities, all demonstrated in real life by inspecting a wind turbine power plant, where the aim of the system is to provide visual data to infrastructure owners for further analysis and asset management. The overall scheme of the proposed system is depicted in Figure 2.

A. FIELD TRIALS AND OPEN CHALLENGES
During the development of the proposed aerial framework, the wind turbine site located in Bureå, Sweden, has been visited multiple times. In these sites, the wind speed was

1 https://youtu.be/z_Lu8HvJNoc

Operating MAVs outside the lab, and especially around large-scale infrastructures such as wind turbines, raises significant multidisciplinary research issues, one of the most important being the provision of an accurate localization system that is at the same time easily deployable. At the wind turbines the GPS solution fails at low height due to multipath errors, which happen when the GPS receiver cannot distinguish a direct signal from a reflection, a fact that causes significant errors in the measurements. GPS usually works well in positions where the interference from the structure is small enough; in this case, however, this holds only at significant heights. Conversely, visual odometry, a currently popular technology, cannot provide reliable localization feedback at high altitudes. These algorithms base part of their processing on visual measurements, detecting areas of high contrast and texture [30] to extract visual landmarks/features in the image, which are then used for the rotation and translation estimation. The feature-based methods aim to find distinctive points in the image and determine their position in pixels. The identified pixels should be uniquely described in a way that allows them to be found in adjacent frames, usually by tracking them with vectors that describe the local region around the identified points, which constitute the feature descriptors [31]. The most common features found in the literature are the Harris corner detector [32], FAST [33], ORB [34], BRIEF [35] and SURF [36]. Features are usually denoted by a) pixel locations that correspond to the local maximum of the first derivative, b) intersections of edges, c) gradient rate of change and direction.
More specifically, at high altitudes this processing becomes unreliable, since these algorithms cannot detect and extract distinctive features from the environment due to the lack of feature-rich local surfaces/areas, e.g. in the case of wind turbines, whose surface is a flat white color. This makes it difficult for the visual inertial odometry software to converge its motion estimate to the actual state. As depicted in Figure 3, the detected features are far away, there are no features on the wind turbine tower itself except for unstable boundary features, and egomotion causes very little movement of the background features.
Furthermore, the challenges of the visual sensors identified for localization extend also to other visual processing tasks, such as 3D reconstruction, where during the performed experimental trials it was found that the depth and stereo sensors failed to provide a solid 3D model of the wind turbine. Additionally, MAVs provide a limited flight time, which can be affected by external factors such as wind gusts, payloads and the temperature of the environment. This limits the feasibility of the mission with one MAV, especially for large-scale structures such as wind turbines. Moreover, strong wind gusts cause significant drift of the MAV from the predefined trajectory, which should be compensated by the MAV's position controller. Thus, the limited flight time and the deviation from the trajectory should be taken into consideration, otherwise the overall system can fail to perform the task or, even worse, collide with the infrastructure, which might damage both the infrastructure and the aerial platform.

B. SYSTEM HARDWARE
1) MAV
For the envisioned aerial inspection system for large-scale infrastructures, the Ascending Technologies NEO hexacopter was utilized as the MAV platform; the overall specifications and the selected sensors are presented in Figure 4. This platform is capable of providing a flight time of up to 26 min without payload and in ideal conditions, with a maximum payload capacity of up to 2 kg. For onboard processing, the belly of the MAV contains an Intel NUC computer with a Core i7-5557U and 8 GB of RAM that runs Ubuntu Server 16.04 with the Robot Operating System (ROS) as its core. The platform has been equipped with a large set of different sensors, as depicted in Figure 4, where each component will be explained in the sequel.

2) LOCALIZATION SYSTEM
Due to the feature-less surface of the wind turbines for visual odometry and the existence of multipath errors in the GPS measurements, as was discussed in the prequel, the localization algorithms based on cameras and GPS failed during the field trials, and thus the proposed localization system was based on UWB and IMU fusion. This component is extensively explained in Section II-D.

3) SENSOR SUITE
The proposed sensory suite for the aerial system included 3 different cameras: a) the Visual-Inertial (VI) sensor, b) the GoPro Hero4, c) the PlayStation Eye, and an additional laser range finder, the RPLIDAR, as depicted in Figure 4. The VI sensor, developed by Skybotix AG, has a weight of 0.117 kg and was attached below the hexacopter with a 45° tilt from the horizontal plane. It is a monochrome, high dynamic range, global shutter stereo camera with 120° DFOV and a resolution of 752 × 480 pixels; moreover, it houses an Analog Devices ADIS16445 tactical grade IMU. Both cameras and the IMU were tightly temporally aligned with hardware synchronization, while the cameras were operated at 20 fps. The GoPro Hero4 camera, with a weight of 0.2 kg, was attached on top of the hexacopter facing forward. It is capable of recording high-definition video at various resolutions, ranging from 720p to 4000p, and at a rate of 15-120 fps, while during the experimental trials the camera was operated with a 2K resolution at 30 fps. The PlayStation Eye camera, with a weight of 0.150 kg, was attached in the middle of the hexacopter's housing, facing forward; this camera was operated at 20 fps and with a resolution of 640 × 480 pixels. The variety in the specifications of the camera suite was motivated by the need to test their performance under challenging conditions regarding the dataset collection. The main aim was to use the captured frames for direct visual inspection by experts in structure maintenance, while the data from the VI sensor and the GoPro camera were also used to provide 3D models of the inspected parts. Finally, the RPLIDAR is a low-cost laser sensor, which provides a 360° scan field at a 5.5 Hz/10 Hz rotating frequency with a guaranteed 8 meter range. This laser scanner has also been tested during the experimental trials for enabling online obstacle avoidance schemes.

C. COOPERATIVE COVERAGE PATH PLANNER
Towards the vision of the inspector MAV, the theoretical framework established in [2] is integrated in the autonomous framework and experimentally tested in the complex case of a wind turbine structure. The major difference in the application scenario is the scale difference between the campus fountain and a real wind turbine, with a height of 10 meters compared to 60 meters respectively. Another difference between those two types of infrastructure is the location: the fountain is located in a public space at the university, while the wind turbine is located in a windy private place without any public access. Briefly, the coverage scheme is capable of providing a path that accomplishes a full coverage of the infrastructure, without any shape simplification, by slicing it with horizontal planes to identify branches of the infrastructure and assign specific areas to each agent. Complicated structures have multiple branches, e.g. in a wind turbine the base and each blade are considered as branches; the proposed method identifies these branches and assigns paths to n agents. If the structure has one branch, all n agents are assigned to the same branch, otherwise the n agents are equally distributed to different branches. Furthermore, to guarantee a full coverage that facilitates visual processing, the introduced path planning creates an overlapping visual area for each agent. In addition to the position references, the established C-CPP scheme also provides yaw references for each agent to ensure a field of view directed towards the structure surface.
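The two rules stated above, agent-to-branch assignment and a yaw reference that faces the structure, can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the function names are hypothetical, and the round-robin rule is only one assumed reading of "equally distributed".

```python
import math


def assign_agents(n_agents, n_branches):
    """Return, for each agent index, the index of the branch it covers.

    Assumption: with a single branch every agent shares it; with several
    branches the agents are spread round-robin, so branch loads differ by
    at most one agent.
    """
    if n_agents < 1 or n_branches < 1:
        raise ValueError("need at least one agent and one branch")
    if n_branches == 1:
        return [0] * n_agents
    return [agent % n_branches for agent in range(n_agents)]


def yaw_towards(position_xy, target_xy):
    """Yaw angle (rad) pointing from the MAV position to a surface point."""
    dx = target_xy[0] - position_xy[0]
    dy = target_xy[1] - position_xy[1]
    return math.atan2(dy, dx)
```

For example, two agents around a single tower both receive branch 0, while four agents around a two-branch structure alternate between branches 0 and 1.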
For the use of the C-CPP, initially the general case of an aerial platform equipped with a limited Field of View (FOV) sensor was considered, determined by an aperture angle $\alpha$ and a maximum range $r_{\max}$. Furthermore, $\Omega \in \mathbb{R}^{+}$ is the user-defined offset distance ($\Omega < r_{\max}$) from the infrastructure's target surface and $\lambda$ is the distance between each inspected plane. $\lambda$ is equal to $\beta \tan(\alpha/2)$, where the parameter $\beta \in [1, +\infty)$ represents the ratio of overlapping. The horizontal planes are defined as $\lambda_i$, with $i \in \mathbb{N}$. The 3D map of the infrastructure is provided as a set $S$ with a finite collection of points, denoted as $S = \{p_i\}$, with $p_i = [x_i, y_i, z_i] \in \mathbb{R}^3$. Furthermore, $C_j(x, y, z)$ with $j \in [1, m]$ are the points in each branch and $m$ is the overall number of branches in the structure. The proposed C-CPP method has been entirely implemented in MATLAB. The inputs for the method are a 3D approximate model of the object of interest and specific parameters, which are the number of agents ($n$), the offset distance from the object ($\Omega$), the FOV of the camera ($\alpha$), the desired velocity of the aerial robot ($V_d$) and the position controller sampling time ($T_s$). The generated paths are sent to the NEO platforms through the utilization of the ROS framework. A graphical overview of this C-CPP scheme is presented in Figure 6.
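To make the slicing idea concrete, the following Python sketch computes the heights of the horizontal planes from a point set and generates one ring of waypoints per plane. It is a simplified illustration and not the C-CPP algorithm itself: each slice is approximated by a circle (an assumption that only suits a tower-like branch, whereas the method in the text uses no shape simplification), the plane spacing is passed in directly, and all function and parameter names are hypothetical.

```python
import math


def slice_heights(points, spacing):
    """Heights of the horizontal slicing planes spanning the point set."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    z_min = min(p[2] for p in points)
    z_max = max(p[2] for p in points)
    count = int(math.floor((z_max - z_min) / spacing)) + 1
    return [z_min + i * spacing for i in range(count)]


def ring_waypoints(center_xy, radius, offset, z, n_points, phase=0.0):
    """Waypoints (x, y, z, yaw) on a circle kept `offset` away from a slice.

    Assumption: the slice is a circle of `radius` around `center_xy`.
    The yaw of every waypoint points back at the slice center.
    """
    r = radius + offset
    waypoints = []
    for k in range(n_points):
        theta = phase + 2.0 * math.pi * k / n_points
        x = center_xy[0] + r * math.cos(theta)
        y = center_xy[1] + r * math.sin(theta)
        yaw = math.atan2(center_xy[1] - y, center_xy[0] - x)
        waypoints.append((x, y, z, yaw))
    return waypoints
```

Two agents sharing a branch could use the same ring with `phase=0.0` and `phase=math.pi`, which places them on opposite sides of the structure.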

D. UWB INERTIAL ODOMETRY FRAMEWORK
UWB Radio Frequency (RF) communication is based on using a wide band of the RF spectrum, rather than a single frequency as a carrier wave radio does. The signal has the temporal representation of a pulse and, as a result, UWB is sometimes referred to as pulse radio. Due to the high center frequency (3.1 to 4.8 GHz and 6.0 to 10.6 GHz) and the spectral width of the pulse (499.2 to 1331.2 MHz), the pulses have good spatial resolution, which makes them ideal for time stamping RF packets, referred to as messages, with high accuracy. This property of accurate timestamps, together with good reference clocks, gives the ability to estimate the distance between two transceivers by exchanging 2 or more packets, and thus the distance estimation can be considered a byproduct of communication.
Furthermore, one major drawback of a carrier wave based radio is the problem of multipathing, where the carrier wave forms destructive interference with itself, effectively reducing the received signal strength or introducing an unknown phase shift. This problem is largely mitigated in the UWB radio, where the spatial length of each pulse is small enough for each pulse to be detected uniquely, which allows the receiver to reconstruct the pulse from multiple reflections. In a sense, the more reflections are available, the stronger the received signal is, in contrast to GPS, which can give highly misleading measurements when close to tall structures.
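As an illustration of how a distance can follow from exchanged, timestamped packets, the sketch below implements single-sided two-way ranging in Python. The text does not state which ranging protocol the UWB nodes use, so this protocol choice, the function name and the parameter names are assumptions; clock drift between the two transceivers is ignored.

```python
SPEED_OF_LIGHT = 299_792_458.0  # propagation speed in m/s


def twr_distance(t_round, t_reply):
    """Distance from a single-sided two-way ranging exchange.

    t_round: time measured by the initiator between sending its packet
             and receiving the response (seconds).
    t_reply: turnaround time measured by the responder between receiving
             the packet and sending the response (seconds).
    """
    if t_round < t_reply:
        raise ValueError("round-trip time cannot be shorter than reply time")
    time_of_flight = 0.5 * (t_round - t_reply)
    return SPEED_OF_LIGHT * time_of_flight
```

Since one nanosecond of timestamp error corresponds to roughly 0.3 m of range in this model, the accurate time stamping described above is what makes the approach practical.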
In Figure 4, the UWB node developed by LTU is depicted when mounted on the MAV. This hardware contains all the embedded electronics including the microprocessor, 3-axis accelerometer, 3-axis gyroscope, the UWB RF transceiver and the antenna to enable the UWB communication and localization, while this system is fully self contained and can directly be deployed for enabling full localization of the MAV state.
For a proper operation of the estimation framework, UWB transceivers with known and fixed positions, called anchors (while the transceiver on the MAV is called a tag), need to be spread out in the working area to act as known positions from which distances are measured for the later trilateration and fusion with the IMU, as described in [37]. This configuration is directly analogous to GPS, while here the "satellites" (anchors) are placed as needed within the operating volume, as conceptually presented in Figure 7. The state vector of the UWB-Inertial Odometry is formulated as shown in

$$x = \begin{bmatrix} p_U^T & u_U^T & q_{UI}^T & b_\omega^T & b_a^T \end{bmatrix}^T \qquad (1)$$

where $p_U, u_U \in \mathbb{R}^3$ refer to the position and velocity in the coordinate frame of the UWB anchor network, $q_{UI} \in SO(3)$ refers to the quaternion of the relative attitude between the IMU coordinate frame and the UWB coordinate frame, and $b_\omega, b_a \in \mathbb{R}^3$ are the biases for the gyroscope and the accelerometer respectively.
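The trilateration step mentioned above can be sketched in plain Python as a linearised least-squares problem. This is a generic textbook formulation, not the estimator of [37], which fuses the ranges with the IMU; the function names are hypothetical. Note that the linearisation needs at least four anchors that are not coplanar, which is stricter than the theoretical minimum.

```python
def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("anchor geometry is degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multilaterate(anchors, ranges):
    """Least-squares tag position from ranges to four or more anchors.

    Subtracting the first range equation from the others removes the
    quadratic term: 2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2.
    """
    if len(anchors) < 4 or len(anchors) != len(ranges):
        raise ValueError("need matching lists with at least four anchors")
    a0, d0 = anchors[0], ranges[0]
    norm0 = sum(c * c for c in a0)
    rows, rhs = [], []
    for ai, di in zip(anchors[1:], ranges[1:]):
        rows.append([2.0 * (ai[k] - a0[k]) for k in range(3)])
        rhs.append(sum(c * c for c in ai) - norm0 - di * di + d0 * d0)
    # Normal equations (A^T A) p = A^T b of the overdetermined system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    return solve3(ata, atb)
```

With noisy ranges the result is only a coarse estimate, which is one reason to fuse the individual range measurements with inertial data instead of using the trilateration output directly.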
The UWB-Inertial Odometry framework in this work considers the Error State Kalman Filter (ESKF) formulation proposed in [38], where the state is re-formulated as a nominal part and an error part ($x = x_n \oplus \delta x$), where the nominal part integrates the IMU measurements and the errors are observed through the UWB distance measurements. The error state is shown in

$$\delta x = \begin{bmatrix} \delta p_U^T & \delta u_U^T & \delta\theta^T & \delta b_\omega^T & \delta b_a^T \end{bmatrix}^T \qquad (2)$$

where $\delta q = \begin{bmatrix} 1 & \tfrac{1}{2}\delta\theta^T \end{bmatrix}^T$ is the minimal state representation using the small angle approximation of the error quaternion.
The error states are observed through the UWB distance measurements (Equation 3), which are a function of the nominal and error states and include the distance between the IMU and the center of the UWB antenna.
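For orientation, a range measurement of this kind is commonly written as below. This is a hedged sketch of a typical model and should not be read as the article's Equation 3: the anchor position $p_{A_i}$, the lever arm $p_{IA}$ from the IMU to the antenna center (expressed in the IMU frame), the rotation matrix $R(q_{UI})$ and the noise term $n_i$ are notation assumed here for illustration.

```latex
% Assumed range model for anchor i (illustrative, not the source's Eq. 3)
d_i = \left\lVert\, p_U + R(q_{UI})\, p_{IA} - p_{A_i} \,\right\rVert + n_i
% Sensitivity to the position error: the unit vector from anchor to antenna
\frac{\partial d_i}{\partial\, \delta p_U}
  = \frac{\left( p_U + R(q_{UI})\, p_{IA} - p_{A_i} \right)^T}
         {\left\lVert\, p_U + R(q_{UI})\, p_{IA} - p_{A_i} \,\right\rVert}
```

Under this model, each range constrains the position error only along the line of sight to its anchor, so the anchor geometry determines which directions of the error state are observable.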

E. SURFACE RECONSTRUCTION
As stated in the prequel, this work targets the application scenario of autonomous data acquisition by single or multiple MAVs, where the objective of the missions is the collection of high-resolution visual data of regions of interest and the generation of 3D surface models. All available data will be used afterwards by inspection experts to analyze and detect possible defects on their assets. To this end, each aerial platform is equipped with, but not limited to, a camera to record the required data from the infrastructure. During the navigation of the MAVs around the structure, the raw visual stream is directly available for defect assessment. Regarding the surface reconstruction, the main approach to process the data considers a monocular camera Structure from Motion (SfM) [39], where the MAVs fly around covering specific parts, with the aim to collaboratively process all the captured data into a global representation. The selection of monocular mapping is driven by the application scale and the object characteristics. Generally, the perception of depth using stereo cameras is bounded by the stereo baseline, which essentially reduces the configuration to monocular at far ranges; as a result, stereo algorithms cannot perform in cases with large structures and high altitudes. The employed SfM approach is an offline process that provides a sparse 3D reconstruction and accurate MAV poses by using different camera viewpoints, and consists of a massive optimization process. Finally, the data collected during the navigation mission are downsampled, since the camera frames contain redundant information and the processing has to finish within a reasonable time. The sparse point cloud is then passed to the Multi View Stereo (MVS) algorithm Clustering Views for Multi-view Stereo (CMVS) [40], which provides a densely reconstructed 3D model by clustering the image set into overlapping view clusters and applying MVS algorithms.
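The baseline argument above can be made explicit with the standard pinhole stereo relation. The symbols are assumptions introduced only for this illustration: $f$ is the focal length in pixels, $b$ the stereo baseline, $d$ the disparity in pixels and $z$ the depth.

```latex
% Depth from disparity for a rectified stereo pair
z = \frac{f\, b}{d}
% First-order depth error caused by a disparity error \delta d
\lvert \delta z \rvert \approx \frac{z^{2}}{f\, b}\, \lvert \delta d \rvert
```

Because the depth error grows with the square of the distance and inversely with the baseline, a short-baseline stereo rig observing a structure tens of meters away yields depth estimates dominated by disparity noise, which is consistent with treating the setup as effectively monocular at such ranges.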

F. SYSTEM SOFTWARE
The navigation system of the aerial platform is integrated within the ROS framework, where two main components provide autonomous flight: a UWB inertial odometry estimator, which uses a specific implementation of the ESKF based on the Multi Sensor Fusion Extended Kalman Filter (MSF) [41], and a linear Model Predictive Control (MPC) based position controller [42]. The sensor fusion node consists of an EKF that performs tight inertial fusion, using the hexacopter's IMU during the state propagation and the UWB range measurements during the filter correction step. The outputs of the UWB inertial odometry are the position and orientation (pose), the linear and angular velocity (twist) of the aerial robot, and the IMU biases. The MSF is a generic error state Kalman filter software package for sensor fusion that has the ability to handle delayed measurements while staying within the desired computational bounds. The linear MPC position controller [42] generates attitude and thrust references for the NEO's predefined low-level attitude controller, with the aim to have separation of concerns, as the high-level control and planning algorithms should have minimal knowledge of the low-level controllers. The overall functional schematic of the experimental setup is presented in Figure 8 and the system architecture is described in Algorithm 1.
The C-CPP method, described in Section II-C, has been entirely implemented in MATLAB. The inputs for the method are a 3D model of the infrastructure of interest and specific parameters, which are the number of agents ($n$), the offset distance from the object ($\Omega$), the FOV of the camera ($\alpha$), the desired velocity of the aerial robot ($V_d$) and the position controller sampling time ($T_s$). The generated paths are sent to the NEO platforms through the utilization of the ROS framework.

Algorithm 1
...
Onboard actuation commands u generation
5: Record onboard visual data
6: end while
7: return LAND
8: if LANDED then
9: Visual Data Post Processing for 3D map generation
10: end if

A. MISSION PRELIMINARIES
The presented aerial platform with the sensor systems, combined with the developed algorithmic components described in the previous section, constitutes the autonomous aerial system. The capabilities of the system have been publicly demonstrated for the case of a wind turbine infrastructure in Sweden, where the mission scenario was two-fold, targeting the coverage of two separate parts of the structure, namely the wind turbine tower and the wind turbine blades. The requirements for the system were to provide a complete coverage of the inspected parts autonomously, while storing all necessary visual data for further analysis. Although two agents were used for the specific case presented in this work, the presented inspection system can operate in either single-agent or multi-agent mode, depending on the application needs and the flying limitations of the MAVs.
The initial step for the deployment of the system was to set up the ground station for monitoring the operations and to fix 5 UWB anchors around the structure, with the specific coordinates presented in Table 1, which constitute the infrastructure needed for the localization system of each aerial platform. The number of anchors as well as their positions have been selected in a manner that guarantees UWB coverage around all parts of the wind turbine. From a theoretical point of view [37], only 3 anchors are needed; however, it is common that one anchor will be behind the wind turbine from the MAV's point of view, which gives rise to a minimum of 4 anchors to compensate, while a fifth anchor was added as redundancy. The resulting fixed anchor positions provide a local coordinate frame that guarantees repeatability of the system, with the significant ability to revisit the same point multiple times in case the data analysis shows issues that require further inspection. An important note for all the cases on the wind turbine and for the system in operation is that the blades are locked in a star position, as shown in Figure 7, which simplifies the 3D approximate modeling of the structure.
In the proposed architecture, all the processing necessary for the navigation of the MAVs is performed onboard, while the overview of the mission and the commands from the mission operators (inspectors) are exchanged over a WiFi link. The selection of WiFi is not a requirement and it can be replaced with the communication link of choice, e.g. 4G cellular communication. The UWB based inertial state estimation runs at the rate of the IMU, which in this case was 100 Hz. The generated coverage trajectory has been uploaded to the MAV before take-off and is followed as soon as the mission is started by the command of the operator. The paths have been followed autonomously, without any intervention from the operators on the site, and the collected data have been saved onboard, while after downloading the mission data the post-processing is performed in the ground station or in the cloud. The data provided by the system can be used for position-aware visual analysis by examining high-resolution frames, or they can be post-processed to generate 3D reconstructed models. The key feature to be highlighted from the task execution is that any detected fault can be fully linked with specific coordinates, which can be utilized in another round of inspections or for guiding the repair technician. The latter is a major contribution of the presented aerial system, since this is the fundamental information needed for enabling a safe and autonomous aerial inspection that has the potential to outperform the human-based ones.

B. WIND TURBINE VISUAL DATA ACQUISITION
For the specific case of wind turbines, the C-CCP generated paths were obtained with two autonomous agents in order to reduce the needed flight time and remain within the battery constrained flight time of the utilized MAV. However, due to the limited flight time of the MAVs in the field trials, the navigation problem was split into the tower part and the blade part, the specifics of which are presented in Table 2; both parts can be performed at the same time with more MAVs to reduce the mission time even further. A common characteristic of both cases is that the generated path for each MAV keeps a constant safety distance from the structure while keeping it in view of the visual sensors and maximizing the safety distance between agents, which results in the agents being on opposite sides of the wind turbine at all times. The area in which the field tests were performed is generally exposed to high wind, and while the tower part is protected from the wind by the forest, the blade part is above the tree line. Thus, the aerial platforms were specifically tuned to compensate for strong wind gusts, measured at up to 13 m/s; the tuning targeted the weight on angular rate in the MAV's controller, which was increased to significantly reduce the excessive angular movement.
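The geometric consequence of these two constraints can be sketched in a few lines of Python. The example below is not the C-CCP algorithm; it only illustrates, for a single horizontal slice of a cylindrical tower, how a constant safety distance yields a circular path and how a phase offset of π keeps two agents on opposite sides of the structure. The function names and all numeric values are hypothetical.

```python
import math


def ring_waypoints(center_xy, tower_radius, safety_distance, altitude,
                   n_points, phase=0.0):
    """Waypoints (x, y, z, yaw) on a circle at a constant distance from the
    tower surface. The yaw points the camera at the tower axis."""
    cx, cy = center_xy
    radius = tower_radius + safety_distance
    waypoints = []
    for k in range(n_points):
        theta = phase + 2.0 * math.pi * k / n_points
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        yaw = math.atan2(cy - y, cx - x)  # heading towards the tower axis
        waypoints.append((x, y, altitude, yaw))
    return waypoints


def opposite_rings(center_xy, tower_radius, safety_distance, altitude, n_points):
    """Two time-aligned rings, half a revolution apart, so that the agents
    stay on opposite sides of the tower at every waypoint index."""
    first = ring_waypoints(center_xy, tower_radius, safety_distance,
                           altitude, n_points, phase=0.0)
    second = ring_waypoints(center_xy, tower_radius, safety_distance,
                            altitude, n_points, phase=math.pi)
    return first, second


if __name__ == "__main__":
    # Hypothetical values: 2 m tower radius, 8 m safety distance, 30 m altitude.
    a, b = opposite_rings((0.0, 0.0), 2.0, 8.0, 30.0, 12)
    print(a[0], b[0])
```

In this simplified setting the separation between the two agents is always twice the ring radius, which is the largest separation two agents on the same ring can have.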

1) TOWER COVERAGE
In the specific case of the wind turbine base and tower coverage, the generated paths are of a circular shape, as depicted in Figure 9, which is the result of the constant safety distance from the structure imposed by the C-CCP algorithm. As can be seen from the tracked trajectories, the controllers perform well, with an RMSE of 0.5464 m. At the top of the trajectory a more significant error can be seen, which is induced by the specific MAV transitioning above the tree line, where a wind gust caused a deviation from the desired trajectory before the MAV compensated and finished its coverage trajectory.
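For readers who wish to reproduce such a tracking metric on their own logs, a minimal sketch of an RMSE computation is given below. It assumes that the reference and actual trajectories are available as time-aligned lists of 3D positions and that the error is the Euclidean distance per sample; the exact error definition behind the reported value is not stated here, so this is one plausible reading. The function name `tracking_rmse` is hypothetical.

```python
import math


def tracking_rmse(reference, actual):
    """Root-mean-square of the Euclidean position error between two
    time-aligned trajectories given as sequences of (x, y, z) tuples."""
    if not reference or len(reference) != len(actual):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((r - a) ** 2 for r, a in zip(ref_point, act_point))
        for ref_point, act_point in zip(reference, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a short excursion such as the gust-induced deviation at the top of the trajectory contributes disproportionately to the final value.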
From the reconstruction depicted in Figure 9, it can be seen that the base of the wind turbine, which is feature rich, provides a good reconstruction result, while as the MAV continues to higher altitudes the turbine tower loses texture due to its flat white color, causing the reconstruction algorithms to fail. However, the camera streams do have position and orientation for every frame, as depicted in Figure 9 for some instances, which allows a trained inspector to review the footage and determine whether there are spots that need extra inspection or repairs. For the reconstruction in Figure 9, the algorithms of [43] and [39] were used, the former for pre-processing the images to enhance their contrast and the latter as the SfM approach for providing the 3D model of the structure. The reconstruction took place on a PC with an i7-7700 CPU and 32 GB of RAM, and the processing lasted approximately 4 hours.

2) BLADE COVERAGE
As with the base and tower coverage, for which the C-CCP algorithm generated circular trajectories, a similar approach was followed for the blade case. Because this task is performed on the blade that points towards the ground, with its trailing edge towards the tower, the C-CCP algorithm would generate half-circle trajectories. However, in this case the same agent can inspect the final part of the tower by merging both tower and blade trajectories, as can be seen in Figure 10, while minimizing the needed flight time and demonstrating to its full extent the concept of aerial cooperative autonomous inspection. With the available flight time of the MAV, it is possible to inspect the blade with only one operating MAV, allowing the safety distance between agents to be adhered to by the separation of the inspected parts. However, during the blade coverage task, the tracking performance of the MAV degraded to an RMSE of 1.368 m, due to the constant exposure to wind gusts and the turbulence generated by the structure. As these effects cannot be measured until they are observed on the MAV, they reduced the overall observed tracking capabilities of the aerial platforms. The second effect of the turbulence was the excessive rolling and pitching of the MAV, which, due to the fixed mounting of the camera sensor, introduced significant motion blur in the captured video streams and points to the need for a gimbal to stabilize the camera and reduce the motion blur. Finally, as can be seen in the camera frames in Figure 10, there are no areas of high texture on the wind turbine tower or blades, which caused the 3D reconstruction to fail. However, the visual data captured is of high quality and suitable for review by an inspector.

FIGURE 9. Coverage paths followed by 2 agents with actual (solid) and reference paths (dashed) together with desired direction, which resulted in the depicted 3D reconstruction and sample camera frames of the base and tower to be used by the inspector.

IV. LESSONS LEARNED
Throughout the experimental trials for this application scenario, considerable experience was gained that assisted in the development and tuning of the utilized algorithms. Based on this experience, an overview of the lessons learned is provided in the sequel, with connections to the different algorithms utilized in the field.

A. MAV CONTROL
When performing trajectory tracking and position control experiments indoors in a dedicated laboratory, many disturbances that are significant in the field trials can be neglected; this is especially true for strong wind gusts and turbulence caused by the structure. In the case of indoor experimental trials, the MAV can be tuned aggressively to minimize the position tracking error, while in the full scale outdoor experiments this kind of tuning would produce excessive rolling and pitching, as the controllers try to fully compensate for the disturbances. This has the side effect of making the movements jerky and oscillatory, which reduces the operator's trust in the system, as it appears to be close to unstable. Conversely, if the controllers are tuned for smooth trajectory following, larger tracking errors have to be accepted. During the field trials, some wind gusts can even exceed the operational limits of the MAV, causing excessive errors in the trajectory tracking. To reduce this effect in the outdoor experiments, the controller's weight on angular rate was increased to significantly reduce the excessive movement. In general, the tuning of the high level control scheme for the trajectory tracking is a tedious task, and it was found to be extremely sensitive to the prevailing weather conditions.

B. PLANNING
The path planner provides a path that guarantees full coverage of the structure; however, in the field trials, due to high wind gusts, there are deviations between the performed trajectory and the reference. Thus, there is a need either for an online path planner that accounts for these drifts and re-plans the path, or for a system that is able to detect whether a specific part of the structure has been neglected and to provide extra trajectories to compensate. Additionally, due to the payload, the wind gusts and the low ambient temperature, the flight time was significantly shorter than the value expected from the MAV manufacturer. In the worst cases, this time was down to 5 minutes, which is a severe limitation that should be considered in the path planning and task assignment in order to select the correct number of agents for achieving full coverage of the infrastructure.
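A rough way to account for this limitation during task assignment is a back-of-the-envelope estimate of the number of agents from the coverage path length and the usable flight time. The Python sketch below is such an estimate under simplifying assumptions: constant cruise speed, equal split of the path among agents, and a fixed battery reserve. Only the 5 minute worst-case flight time comes from the field trials; the function name, the reserve fraction and the other values are hypothetical.

```python
import math


def agents_needed(path_length_m, cruise_speed_mps, flight_time_s,
                  reserve_fraction=0.2):
    """Smallest number of agents whose combined usable flight time covers
    the whole path, keeping a battery reserve for take-off and landing."""
    if path_length_m <= 0 or cruise_speed_mps <= 0 or flight_time_s <= 0:
        raise ValueError("path length, speed and flight time must be positive")
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_time_s = flight_time_s * (1.0 - reserve_fraction)
    range_per_agent_m = cruise_speed_mps * usable_time_s
    return math.ceil(path_length_m / range_per_agent_m)


if __name__ == "__main__":
    # Worst-case 5 minute flight time; 1000 m path and 1 m/s are hypothetical.
    print(agents_needed(1000.0, 1.0, 5 * 60))
```

Planning against the worst-case flight time rather than the manufacturer's figure is the conservative choice here, since an agent that lands early leaves part of the structure uncovered.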

C. SYSTEM SETUP
One of the most challenging issues when performing large scale infrastructure inspection is to keep a communication link with the agents performing the inspection, which is commonly used for monitoring the overall performance of the system. In this specific case, WiFi was the communication link of choice, mainly because it is simple and works as expected out of the box; however, it quickly became apparent that the link was unstable due to the height of the MAV or its occlusion behind the wind turbine tower. To mitigate this issue, a different communication link should be used, e.g. 4G cellular networks; while WiFi can be used to upload mission trajectories, it is not a reliable communication link at this scale.
Moreover, if it is desirable that the same mission can be executed again, the positions of the UWB anchors need to be preserved. One possible way to achieve this is to consider the UWB anchors as a supporting part of the infrastructure and have them permanently installed around the wind turbines, or to re-calibrate and consider the wind turbine as the origin, while only compensating for the rotation of the wind turbine depending on the mission setup.
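The second option amounts to a rigid transformation between the anchor frame and a frame attached to the wind turbine. A minimal sketch is given below, assuming that the rotation to be compensated is a pure yaw about the vertical axis and that the turbine origin and yaw in the anchor frame are known from the re-calibration. The function names are hypothetical.

```python
import math


def to_turbine_frame(point_xyz, turbine_origin_xyz, turbine_yaw_rad):
    """Express a point given in the anchor frame in a turbine-centred frame
    whose x axis is rotated by turbine_yaw_rad about the vertical axis."""
    dx = point_xyz[0] - turbine_origin_xyz[0]
    dy = point_xyz[1] - turbine_origin_xyz[1]
    dz = point_xyz[2] - turbine_origin_xyz[2]
    c, s = math.cos(turbine_yaw_rad), math.sin(turbine_yaw_rad)
    # Rotate by -yaw to undo the turbine rotation.
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def from_turbine_frame(point_xyz, turbine_origin_xyz, turbine_yaw_rad):
    """Inverse of to_turbine_frame: map a turbine-frame point back into the
    anchor frame, e.g. to replay a stored mission after re-calibration."""
    c, s = math.cos(turbine_yaw_rad), math.sin(turbine_yaw_rad)
    x = c * point_xyz[0] - s * point_xyz[1] + turbine_origin_xyz[0]
    y = s * point_xyz[0] + c * point_xyz[1] + turbine_origin_xyz[1]
    z = point_xyz[2] + turbine_origin_xyz[2]
    return (x, y, z)
```

Storing mission waypoints and detected fault coordinates in the turbine frame would let them be reused even if the anchors are placed differently on a later visit, provided the origin and yaw are re-estimated.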

D. 3D RECONSTRUCTION
Various visual sensors have been tested in the challenging case of the wind turbine. The most beneficial sensor proved to be the monocular camera system. More specifically, the fixed baseline of stereo cameras can limit the depth perception and eventually degenerate the stereo to monocular perception. The reconstruction performance can also vary slightly depending on the flying environment, due to differences in visual features; therefore a robust and reliable feature tracker that is invariant to rotations should be used. Another important factor for the reconstruction is the camera resolution, since it poses a trade-off between higher accuracy and higher computational cost. Additionally, the path followed around the structure affects the resulting 3D model, and in combination with the camera resolution it can change the reconstruction results. Generally, the cameras should be calibrated, and it is preferable to set manual focus and exposure so that the camera parameters remain constant for the whole dataset. SfM techniques require a large motion in rotation and depth among sequential frames to provide reliable motion estimation and reconstruction.
Moreover, a low cost LIDAR solution that was tested during the field trials failed to operate due to sunlight interfering with the range measurements. This sensor technology should be examined further with more tests, since it could be useful in obstacle avoidance and cross-section analysis algorithms.

E. LOCALIZATION
While UWB positioning was the main localization system in the presented approach, it should be noted that it should not operate stand-alone. In the case of infrastructure inspection, one reference system should not act as a single point of failure, and the aim should be to fuse as many sensors as possible. In the case of a wind turbine, GPS does not provide a reliable position until the MAV is at a significant height, whereas the UWB localization system works best at lower heights; hence the aim should be to fuse both and to rely on the sensor that performs best at the current height. Moreover, neither UWB localization nor GPS provides a robust heading estimate, and the wind turbine causes magnetic disturbances that cause the magnetometers to fail. In this case, visual inertial odometry is a robust solution for providing heading corrections, since the landscape can be used as a stable attitude reference.
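One simple way to realize such a height-dependent preference is inverse-variance weighting of the two position fixes, with noise levels that change with height. The Python sketch below illustrates the idea only; it was not part of the presented system. The linear noise models in `hypothetical_sigmas` are invented for illustration and would have to be replaced by values identified from field data, and all function names are hypothetical.

```python
def hypothetical_sigmas(height_m):
    """Return (sigma_uwb, sigma_gps) in metres for a given height.

    Invented models: UWB noise grows with height, while GPS noise shrinks
    with height down to a floor of 0.5 m.
    """
    height_m = max(0.0, height_m)
    sigma_uwb = 0.1 + 0.02 * height_m
    sigma_gps = max(0.5, 5.0 - 0.05 * height_m)
    return sigma_uwb, sigma_gps


def fuse_positions(uwb_xyz, gps_xyz, sigma_uwb, sigma_gps):
    """Inverse-variance weighted average of two position estimates."""
    w_uwb = 1.0 / sigma_uwb**2
    w_gps = 1.0 / sigma_gps**2
    total = w_uwb + w_gps
    return tuple((w_uwb * u + w_gps * g) / total
                 for u, g in zip(uwb_xyz, gps_xyz))


def fuse_at_height(uwb_xyz, gps_xyz, height_m):
    """Fuse both fixes, trusting the sensor that is expected to perform
    better at the current height more strongly."""
    sigma_uwb, sigma_gps = hypothetical_sigmas(height_m)
    return fuse_positions(uwb_xyz, gps_xyz, sigma_uwb, sigma_gps)
```

Compared with a hard switch between sensors at a fixed height, a weighted average avoids a position jump at the switching point, at the cost of requiring a noise model for each sensor.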

V. CONCLUSION
This work presents a framework for autonomous aerial visual data acquisition of a 3D infrastructure by utilizing multiple MAVs. To address this problem, the developed framework combined the fundamental tasks of path planning, localization and visual perception. Initially, a geometry-based path planner was employed for the collaborative coverage of complex structures, while the navigation of the platform has been performed through a localization component which provided accurate pose estimates of the MAVs by using a UWB-Inertial estimation scheme. Moreover, the defined task considered compressed visual data streaming and visual data post processing for 3D model building. The performance of the proposed framework has the significant merit of being experimentally evaluated in realistic outdoor large scale infrastructure inspection experiments.