Comparative Evaluation of Tracking Capability of Spatial Patterns on Defective Urban Solar Panels between Unmanned Aerial Vehicle Video Stream and Photomosaics

requires a specific altitude when inspecting defective urban solar panels to avoid obstacles such as high-rise buildings, trees, and telegraph poles. Therefore, autopilot-based thermal imaging has severe data redundancy because the non-solar panel area occupies more than 99% of the ground target. We aim to explore the tracking capability of a UAV video stream for defective urban solar panels by comparing its spatial and clustering patterns with those of autopilot-based photomosaics. The spatial patterns of distributions and clusters in defective solar panels have high similarity (80–100%) to those of autopilot-based photomosaics. The results of this study can serve as a valuable reference for video-stream-based thermal deficiency inspections of defective solar panels in urban areas.


Introduction
A solar panel defect can be broadly defined as any abnormality on the panel. Common defects include mismatch, cracks, discolorations, snail trails/tracks, and soiling. (1,2) These defects can decrease the power generation efficiency of solar modules. Solar panels convert photons into electricity by exciting electrons in the atoms of a semiconductor material. The energized electrons then generate an electric voltage and current, and the generated electricity is transmitted to the inverter. If the solar panels are short-circuited or malfunctioning, abnormal heat generation due to overloading occurs. This heat is manifested as thermal hot spots (also known as thermal deficiency) on the solar panels. Accordingly, thermal imaging is commonly utilized to detect defective solar panels by identifying the thermal deficiency.
Autopilot-based thermal imaging using still images is now a standard procedure, replacing in situ visual inspection and I-V curve tests because of its shorter time and higher cost efficiency. (3,4) Autopilot flight is conducted along predefined waypoints following a specific flight plan for the target area. Typically, urban solar panels are scattered and account for only 1% of the total roof area in a city. Urban solar panels take up only 10% of the rooftop surface in a standard single-family house with six or fewer panels (1 m width and 1.6 m height). (5) The target area covered by autopilot flight may be at least several hundred square meters (for example, 30 × 30 = 900 m²). The possible keypoints are obtained from fewer than four predefined waypoints. (6) Additionally, autopilot flight in an urban area must be at an altitude that avoids obstacles, such as high-rise buildings, trees, and telegraph poles, which increases the difficulty of securing sufficient keypoints.
Thus, autopilot-based thermal imaging has severe data redundancy, with the area not covered by solar panels occupying more than 99% of the ground target. This causes a shortfall of thermal markers on solar panels, resulting in matching failure or mismatch on a single solar panel during the construction of thermal photomosaics. Data redundancy may cause errors in exterior orientation parameters, such as direct distance measurements, angles, positions, and solar panel areas. (7) Furthermore, unnecessary targets may contaminate the thermal signatures from solar panels through the influence of their ambient light. (7,8) The problem of insufficient thermal markers can be solved by exclusively obtaining still images of solar panels, comprising 1% of the total roof area, with a high overlapping rate.
Video has unique properties, offering several advantages for securing sufficient and accurate keypoints for targets. Unlike the static imagery captured by autopilot flight with predefined waypoints, dynamic stereo coverage between individual frames can be accomplished via intensive overlapping within a single solar panel. (9)(10)(11) Video-based thermal imaging can capture the thermal signatures from specifically targeted objects with constant overlapping rates within the confined area. (9) This can compensate for the data redundancy that arises in traditional thermal images captured during autopilot flight owing to the scattered solar panels in urban environments.
Most studies on unmanned aerial vehicle (UAV)-borne video thermal imaging have evaluated its applicability to the real-time detection, classification, and tracking of objects, (12)(13)(14) such as in the field phenotyping of water stress (15) and fire monitoring. (16) Several studies have explored real-time mosaicking for detecting defective solar panels in large-scale solar farms with UAV video thermal imaging. (17,18) For example, Lafkih and Zaz proposed a live detection technique for shaded solar panels captured in UAV video frames based on the Otsu thresholding algorithm. (17) However, these studies involved capturing images containing multiple solar panels in each scene at a relatively high flight altitude to cover a wide area. The capability of UAV-borne video in tracking the defects of scattered urban solar panels has not been evaluated. Other studies have evaluated the suitability of UAV-borne video thermal imaging in terms of mapping accuracy (5) and thermal signature (19) and compared it with the autopilot-based imaging of solar panels. (2) However, to our knowledge, there have been no studies comparing the tracking capability of spatial patterns on defective urban solar panels between UAV video stream and photomosaic techniques. To serve as a complementary tool for the thermal deficiency inspection of scattered urban solar panels, video should adequately detect the spatial patterns on defective solar panels. Therefore, we aim to compare the spatial patterns of thermal anomalies from UAV video streams and photomosaics to evaluate the applicability of a video stream to the thermal deficiency inspection of urban solar panels.

Study area
The study area is in the southeastern part of South Korea at 35°50′54″ N latitude and 128°32′41″ E longitude (Fig. 1). It is in the Dalseo administrative district of the metropolitan city Daegu, the third most populous city in South Korea. Compared with other Korean cities, Daegu has low rainfall and abundant solar radiation, (20)(21)(22) making it suitable for solar power generation. The experimental target, Daegu Educational Training Institute (DETI), is in the Gamsam-dong residential area. It is characterized by districts and land mosaics, such as commercial and residential areas, schools, and parks, in the city center. Diverse tilts (25–88°) and azimuths (120°, 240°) are evident in the solar panels installed at DETI, which are scattered across 20% of its roof area. There is a sufficient number (645) of samples (solar panels) to ensure statistical significance. Therefore, the location is suitable for a comparative evaluation of UAV video and photomosaic techniques for the thermal deficiency inspection of urban rooftop solar panels.

Video-based thermal mosaic
The UAV video was recorded on 24 August 2020 (summer) at 13:00, when the solar elevation angle was the highest, to avoid shade and poor weather such as rainfall. The UAV thermal video of solar panels was recorded using a DJI Matrice 200 V2 quadcopter equipped with a DJI Zenmuse XT2 camera (Table 1).
In the case of the video, the CMOS sensor transmits the converted IR radiation as electrical signals into two different channels and processes a video frame image composed of two fields: (1) an odd field consisting of the thermal pixel values of odd-numbered lines and (2) an even field consisting of the thermal pixel values of even-numbered lines. (23) During this process of video frame formatting, the video frames are compressed and stored at lower spatial resolutions. Therefore, raw UAV thermal video frames do not contain geometric information. The DJI Matrice 200 V2 UAV and DJI Zenmuse XT2 camera provide telemetry data for full orientation (position and altitude) in subtitle format (SubRip Subtitle: SRT), along with the recorded thermal video. The time sync function of OSDK V3.8.1 embedded in the flight controller aligns the recording duration of the video, the GPS time, and the flight controller clock at 1 Hz. Thus, the SRT file provides second-by-second full orientation data consisting of a number indicating the sequence, Coordinated Universal Time (UTC), and all the orientation parameters (GPS coordinates, barometer altitude) acquired from the flight controller during the flight. We utilized FLIR Tools to extract thermal video imagery from the SEQ video file and built video mosaics using the extracted video frames. The full frame rate of DJI Zenmuse XT2 is 30 Hz (30 frames/s). However, some frames were blurred due to flight vibration and the low aperture of DJI Zenmuse XT2 (F/1.0). Therefore, the frames were sub-sampled at intervals of approximately 0.067 s (every 2nd frame, 15 frames/s, overlap: 99%), 0.133 s (every 4th frame, 7.5 frames/s, overlap: 99%), 1 s (1 frame/s, overlap: 97%), and 2 s (0.5 frame/s, overlap: 88%) to reduce noise and guarantee the overlap rate. Autopilot flight was performed along a double-grid path with a 95% overlap rate.
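The relationship between the sub-sampling step, the effective frame rate, and the frame interval can be checked with a short sketch. Only the 30 Hz native rate is taken from the text; the function names and the assumption that frames are kept at a fixed step are illustrative.

```python
NATIVE_FPS = 30.0  # full frame rate of the camera, as stated in the text


def sampling_plan(step, native_fps=NATIVE_FPS):
    """Return (effective frame rate, frame interval in s) when every
    `step`-th frame of the native stream is kept."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    rate = native_fps / step
    return rate, 1.0 / rate


def select_frames(n_frames, step):
    """Indices of the frames kept when sub-sampling n_frames at `step`."""
    return list(range(0, n_frames, step))


if __name__ == "__main__":
    # Steps that would yield 15, 7.5, 1, and 0.5 frames/s from a 30 Hz stream.
    for step in (2, 4, 30, 60):
        rate, interval = sampling_plan(step)
        print(f"step {step:>2}: {rate:g} frames/s, interval {interval:.3f} s")
```

A fixed step keeps the overlap between consecutive selected frames roughly constant as long as the flight speed is constant, which is the property the frame-interval comparison relies on.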
Mosaics of individual video frames and still photos were automatically created in Pix4dMapper through a sequence of processing steps, the last being (5) the building of a digital surface model (DSM), orthomosaic, and index. Pix4dMapper utilizes the structure from motion (SfM) technique, which infers 3D information from overlapping images. This method obtains the information required to construct 3D images, such as focal length, camera type, and image size, from a set of corresponding points in two or more images. SfM photogrammetry facilitates the fast, automated, and low-cost acquisition of 3D data using superimposed images without requiring the input of ground control point (GCP) information.
An SfM algorithm is applied to establish the camera exposure position and motion trajectory required to build a sparse point cloud. (24)(25)(26)(27)(28) The camera exposure position and motion trajectory are then used for camera calibration. Multiview stereo (MVS) is utilized to build a dense point cloud, and the DSM is generated using inverse distance weighting (IDW) interpolation. (29,30) Figure 2 presents the number of overlapping images used to build the point cloud. Green areas represent an overlap of at least five images for every pixel. Generally, the mosaics generated from autopilot and video frames are green except at their borders. The overlap ratios, keypoints, and matched keypoints are sufficient to generate high-quality results (Fig. 2).
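As an illustration of the distance-weighted interpolation used for DSM generation, the following is a minimal NumPy sketch. It is not the Pix4dMapper implementation; the power parameter of 2, the use of all sample points for every query, and the function name are assumptions made for the example.

```python
import numpy as np


def idw_interpolate(points_xy, values, query_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at each query location.

    points_xy : (n, 2) sample coordinates (e.g. dense point cloud x, y)
    values    : (n,)   sample values (e.g. point heights)
    query_xy  : (m, 2) locations of the DSM cells to fill
    """
    points_xy = np.asarray(points_xy, dtype=float)
    values = np.asarray(values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances between every query and every sample: shape (m, n).
    d = np.linalg.norm(query_xy[:, None, :] - points_xy[None, :, :], axis=2)

    # Closer samples get larger weights; eps avoids division by zero when a
    # query coincides with a sample, in which case that sample dominates.
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ values) / w.sum(axis=1)
```

For example, a query halfway between two samples with heights 10 and 20 receives equal weights and therefore the value 15, while a query on top of a sample reproduces that sample's height.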
To acquire the land surface temperature (LST) of individual solar panels, we set and numbered the locations and boundaries of individual solar panels through on-screen digitization with visual interpretation. The boundary of each solar panel is used to compute its mean LST. Table 2 shows the numbers of pixels and LST values detected within the solar panel boundaries presented in Fig. 1. The numbers of solar panels obtained from the autopilot-based mosaic (hereinafter referred to as a photomosaic) and video stream (hereinafter, video mosaic) are identical for the frame rates of 15, 7.5, and 1 frame/s. However, the numbers of pixels and solar panels in the video mosaic processed at 0.5 frame/s were lower than those obtained from the photomosaic and the video mosaics with shorter frame intervals (15, 7.5, and 1 frame/s) (Fig. 1). This is because of the lower image overlap (88%) at 0.5 frame/s (Fig. 2). The reduced overlap increases the standard deviation of the LSTs in the video mosaic for 0.5 frame/s to 1.15, indicating that the distribution of LSTs of solar panels in the video mosaic for 0.5 frame/s is different from that in the photomosaic.
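The per-panel statistics described above amount to a zonal mean over the digitized boundaries. A minimal sketch follows, assuming the boundaries have already been rasterized into an integer label image aligned with the LST mosaic (0 for background, a positive panel number elsewhere); the array layout and function name are hypothetical.

```python
import numpy as np


def mean_lst_per_panel(lst, panel_ids):
    """Return {panel_id: (pixel_count, mean_lst)} for every labelled panel.

    lst       : 2D array of land surface temperatures
    panel_ids : 2D integer array of the same shape; 0 marks non-panel pixels
    """
    lst = np.asarray(lst, dtype=float)
    panel_ids = np.asarray(panel_ids)
    if lst.shape != panel_ids.shape:
        raise ValueError("lst and panel_ids must have the same shape")

    stats = {}
    for pid in np.unique(panel_ids):
        if pid == 0:  # skip the background (non-panel area)
            continue
        pixels = lst[panel_ids == pid]
        stats[int(pid)] = (int(pixels.size), float(pixels.mean()))
    return stats
```

A panel that is missing from a mosaic simply has no label and does not appear in the result, which is how a lower panel count for a sparse mosaic would show up.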

Hot-spot analysis (Getis-Ord Gi*)
Spatial autocorrelation is based on Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." Getis-Ord Gi* is a spatial autocorrelation technique. It statistically measures the degree of spatial autocorrelation and tests the null hypothesis that the region of interest does not show any spatial pattern other than an accidental distribution. (31) We analyze the adjacent distance and patterns of the LST in malfunctioning solar panels caused by soil and dust in the photo- and video mosaics through hot-spot analysis. Then, we evaluate the distribution of the LSTs at malfunctioning solar panels to compare the spatial patterns of the LSTs in the photo- and video mosaics. Herein, the Getis-Ord Gi* statistic is calculated from the LSTs using spatial weights. Thus, the spatial clustering of high and low values can be determined from the calculated p-values and z-scores. The Getis-Ord Gi* statistic is calculated as (32)

$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}}, \qquad \bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2},$$

where x_j is the LST for point j, w_{i,j} is the spatial weight between points i and j, and n is the total number of LSTs. The Getis-Ord Gi* statistic, calculated by considering the distance between points, determines statistical significance via the z-test. In other words, the clustering of high and low values of LST is determined through statistical significance by considering the concentrations of the high and low values of LSTs. If the z-score is positive (+), the high values are spatially clustered, and if the z-score is negative (−), the low values are spatially clustered. Figure 3 and Table 3 present the results of the hot-spot analysis and high-low clustering analysis (Getis-Ord General G) of LST_p versus LST_15frames, LST_7.5frames, LST_1frame, and LST_0.5frame.
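The standard Getis-Ord Gi* z-score can be sketched in NumPy as follows. The binary fixed-distance-band weights (w = 1 within `band`, including the point itself, and 0 otherwise) are an assumption for illustration, since the weighting scheme is not specified in the text, and the function name is hypothetical.

```python
import numpy as np


def getis_ord_gi_star(x, coords, band):
    """Gi* z-score for every point, using binary distance-band weights.

    x      : (n,)   attribute values (e.g. mean LST per panel)
    coords : (n, 2) point coordinates (e.g. panel centroids)
    band   : distance threshold defining the neighbourhood

    Note: the result is undefined (division by zero) if all x are equal or
    if a neighbourhood contains all n points.
    """
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = x.size

    # Binary weights; d[i, i] = 0 so each point is its own neighbour,
    # which is what distinguishes Gi* from Gi.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= band).astype(float)

    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)

    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)

    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator
```

Positive scores mark points surrounded by high values and negative scores mark points surrounded by low values; significance is then judged against the standard normal distribution.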
The z-scores of high-low clustering indicate that LST_p, LST_15frames, LST_7.5frames, LST_1frame, and LST_0.5frame have highly clustered patterns (2.94–5.08) with a statistically significant p-value (0.00). In other words, the high values are more strongly clustered than the low values at this experimental site. However, LST_0.5frame shows a less clustered pattern than LST_15frames. LST_15frames, LST_7.5frames, and LST_1frame have mean z-scores in the hot spots (from 2.31 to 2.35) and cold spots (from −2.70 to −2.68) similar to those of LST_p (hot spot: 2.27, cold spot: −2.72) (Fig. 3, Table 3). LST_0.5frame has a similar mean z-score to LST_p in the hot spots but a very different value in the cold spots. Longer frame intervals produce larger differences in the mean z-scores in the cold spots, indicating lower LSTs (Table 3). The video mosaics with longer frame intervals tend to underestimate the LST compared with the autopilot-based photomosaics. This tendency may result from the time lag and vignetting effects. The thermal UAV video stream was taken 20 min after the autopilot imaging was conducted along the flight plan. LST is highest at 14:00 even though the solar elevation angle is lower than that at 13:00. The solar panels remain at an elevated temperature until 14:00, at which the peak LST is reached. (33) Thus, the LSTs of the solar panels are slightly higher than LST_p in the video mosaics. At the same time, the video mosaics show larger differences in the spatial distributions in both the hot and cold spots for longer frame intervals (Table 3).

Table 3 Results of hot-spot analysis (Getis-Ord Gi*) and high-low clustering analysis (Getis-Ord General G). Similarity is calculated by dividing the number of hot spots matched between the photo- and video mosaics by the number of hot spots detected from the photomosaic.
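The hot-spot similarity can be expressed as a simple set ratio. The sketch below assumes that similarity is the share of photomosaic hot spots that are also hot spots in the video mosaic, and that a panel counts as a hot spot when its z-score is at least 1.96 (a conventional 95% threshold, not a value taken from the study); the panel numbers and scores are made up.

```python
def hot_spot_ids(z_scores, z_crit=1.96):
    """Panel IDs whose Gi* z-score marks them as hot spots.

    z_scores : dict mapping panel ID -> z-score
    z_crit   : assumed significance threshold (illustrative only)
    """
    return {pid for pid, z in z_scores.items() if z >= z_crit}


def hotspot_similarity(photo_hot, video_hot):
    """Percentage of photomosaic hot spots that are matched in the video mosaic."""
    photo_hot, video_hot = set(photo_hot), set(video_hot)
    if not photo_hot:
        raise ValueError("the photomosaic contains no hot spots")
    matched = photo_hot & video_hot
    return 100.0 * len(matched) / len(photo_hot)


if __name__ == "__main__":
    # Hypothetical z-scores for five panels in each mosaic.
    photo = {1: 2.4, 2: 2.1, 3: 0.3, 4: -2.5, 5: 2.0}
    video = {1: 2.5, 2: 1.2, 3: 0.1, 4: -2.6, 5: 2.2}
    print(hotspot_similarity(hot_spot_ids(photo), hot_spot_ids(video)))
```

Dividing by the photomosaic count treats the photomosaic as the reference, so hot spots that appear only in the video mosaic do not raise the score.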
In the video mosaics, the longer the frame interval, the more heterogeneous the spatial patterns are compared with those obtained from the photomosaics. Figure 4 displays the spatial patterns of clusters that are classified with 1.5 °C LST intervals. Traditionally, clusters are classified with respect to values derived from the inherent characteristics of the data set. Thus, when applying a traditional clustering method, such as the hierarchical clustering, k-means, clustering large applications, or Ward algorithm, (34,35) the standard for classifying a cluster is deduced from the LSTs of each data set. Such data-dependent standards preclude an objective comparison of the spatial distributions of clusters between LST_p and LST_15frames, LST_7.5frames, LST_1frame, and LST_0.5frame. Accordingly, we utilize equal intervals and divide the range between the minimum and maximum LST_p values by five to classify five clusters, applying the same intervals to LST_15frames, LST_7.5frames, LST_1frame, and LST_0.5frame. Table 4 presents the results of the cluster analysis; the numbers of solar panels in the classified clusters and the similarity in the spatial location between the photo- and video mosaics are also indicated. This similarity is calculated on the basis of the number of matched solar panels classified into the same cluster in both the photo- and video mosaics. In the video mosaics processed with LST_15frames, LST_7.5frames, and LST_1frame, the spatial patterns of clusters are highly similar to those of LST_p in the photomosaic. LST_0.5frame has the lowest similarity of spatial patterns of clusters to LST_p in most clusters (38.4–66.7%), except for Cluster 5 (90.0%) (Table 4). From 13:00 to 14:00, the difference in LSTs is under 1 °C. (36) However, the differences between LST_p and LST_0.5frame range from −0.86 to 1.45 °C (Table 5). The differences are excessive for LST_0.5frame even when considering the time lag during shooting.
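The equal-interval classification and the per-cluster similarity can be sketched as follows. The sketch assumes that the class width is the range of the photomosaic LSTs (maximum minus minimum) divided by five, that the same class edges are reused for every video mosaic, and that similarity is the share of panels in each photomosaic cluster that fall into the same cluster in the video mosaic; all function names are hypothetical.

```python
import numpy as np


def equal_interval_edges(lst_ref, n_classes=5):
    """Class edges spanning the reference (photomosaic) LST range."""
    lo, hi = float(np.min(lst_ref)), float(np.max(lst_ref))
    return np.linspace(lo, hi, n_classes + 1)


def classify(lst, edges):
    """Assign each LST to a cluster 1..n_classes using shared edges.

    Only the interior edges are passed to np.digitize, so values below or
    above the reference range fall into the first or last cluster.
    """
    return np.digitize(np.asarray(lst, dtype=float), edges[1:-1]) + 1


def cluster_similarity(ref_classes, test_classes, n_classes=5):
    """{cluster: % of reference panels placed in the same cluster by the test mosaic}."""
    ref = np.asarray(ref_classes)
    test = np.asarray(test_classes)
    result = {}
    for c in range(1, n_classes + 1):
        in_ref = ref == c
        if in_ref.any():  # clusters without reference panels are skipped
            result[c] = 100.0 * float(np.mean(test[in_ref] == c))
    return result
```

Fixing the edges on the photomosaic keeps the cluster definitions identical across mosaics, so any disagreement reflects differences in the LSTs rather than differences in the classification rule.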

Results and Discussion
The UAV images have radially decreasing brightness away from the center. (37) This is termed the vignetting effect, which is caused by optical transmission problems. (38) The spatial transmissivity of an image with vignetting is normalized to a maximum value of 1. Typically, a camera captures more light in the center of an image than at its borders. Thus, the transmissivity of an image is 1 in the center and decreases toward the borders. In Pix4dMapper, a vignetting polynomial is applied by modeling the camera optics using the coefficients included in image headers to correct the vignetting effect. (39,40) During mosaicking, the matched points among the images are calculated using the mean values of the matched keypoints. Thus, results similar to those obtained when the image edges are excluded can be achieved. (41) The lower the image overlapping rate, the smaller the number of matched keypoints. Insufficient keypoints lead to biased results, including the vignetting effect. This tendency is clearly apparent in the results of this study: lower frame rates, that is, longer frame intervals (15→7.5→1→0.5 frame/s), have lower overlapping rates (99→99→97→88%) and fewer 3D densified keypoints per cubic meter (21.36→21.10→15.76→13.28 per m³) (Table 6). Hence, the low similarity of spatial patterns of clusters to LST_p in LST_0.5frame might be due to the lack of keypoints and the vignetting effect caused by the low overlapping rate (88%). The frame interval is strongly associated with the overlapping rate. When the frame rate decreased from 15 to 0.5 frame/s, the overlapping rate decreased from 99 to 88%. The video mosaics built with the longer video frame intervals have a lower similarity to the photomosaic. Therefore, video mosaics can be used to inspect the spatial patterns of defective inner-city solar panels provided that their overlapping rate is similar to that of photomosaics.
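The idea of a transmissivity that equals 1 at the image center and decreases toward the borders can be illustrated with a generic even-order radial polynomial. This is not the Pix4dMapper model, whose exact form and coefficients are not given here; the polynomial form, the normalization of the radius, and the coefficient value used in the example are assumptions.

```python
import numpy as np


def transmissivity(shape, coeffs):
    """Radial transmissivity map V(r) = 1 + c1*r^2 + c2*r^4 + ...

    r is the distance from the image centre, scaled so that the farthest
    corner has r = 1. Negative coefficients darken the borders.
    """
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx)
    r = r / r.max()

    v = np.ones(shape, dtype=float)
    for k, c in enumerate(coeffs, start=1):
        v += c * r ** (2 * k)
    return v


def correct_vignetting(image, coeffs):
    """Divide the image by the modelled transmissivity to flatten it."""
    image = np.asarray(image, dtype=float)
    return image / transmissivity(image.shape, coeffs)


if __name__ == "__main__":
    coeffs = (-0.2,)  # hypothetical: 20% signal loss at the corners
    flat_scene = np.full((5, 5), 100.0)
    observed = flat_scene * transmissivity(flat_scene.shape, coeffs)
    print(np.allclose(correct_vignetting(observed, coeffs), flat_scene))
```

When many overlapping frames cover a panel, border pixels of one frame are averaged with central pixels of others, so residual vignetting matters most when the overlap is low.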
However, we observed that a video mosaic with a lower overlapping rate than the photomosaic had fewer 3D densified keypoints per cubic meter. An insufficient number of keypoints leads to the generation of biased thermal signatures and exterior orientation parameters (distances, angles, positions, and areas of solar panels). Therefore, video mosaics must be built with frame intervals that yield overlapping rates matching or exceeding those achieved with autopilot to ensure the consistent quality of thermal signatures obtained from video mosaics and photomosaics.
The development stages for UAV autopilot can be divided as follows: (1) Level 0: nonautomated, (2) Level 1: automated assistance, (3) Level 2: monitored automation, (4) Level 3: conditional automation, and (5) Level 4: full automation. (6) The UAV autopilot used in this study was judged to correspond to Level 1 because the route flight was performed with each waypoint manually entered by the human pilot. Most of the solar panel monitoring using a UAV in this study depends on Level 1 technology. (42) In the full automation stage (Level 4), the autopilot allows the UAV to reach the desired destination without human intervention while recognizing and avoiding obstacles through its sensors. Therefore, the problems raised in this study could likely be solved if full automation (Level 4) is achieved in connection with the video sensor.

Conclusions
To our knowledge, this is the first study comparing the capabilities of tracking spatial patterns on defective urban solar panels between UAV video mosaics and autopilot-based photomosaics. We experimentally validated the spatial distribution of hot spots. Clusters of thermal signatures in video mosaics have the highest similarity (80–100%) to those in the photomosaic at the highest frame rate (15 frames/s). Even when a video mosaic is obtained with a shorter flight duration and smaller coverage area than a photomosaic, it can achieve the required performance in tracking thermal deficiency on targeted urban solar panels. The results of this study can serve as preliminary evidence for the applicability of video-based thermal imaging to thermal deficiency inspection on urban solar panels.