Mathematical and Computational Modeling of Inversion of Iron Content Mining in Tailings Reservoir Using Unmanned-aerial-vehicle-enabled Hyperspectral Imaging

1 China University of Geosciences (Beijing), 29 Xue Yuan Road, Haidian District, Beijing 100083, China
2 China Center for Resources Satellite Data and Application, 5 Feng Xian East Road, Haidian District, Beijing 100094, China
3 Guilin Tourism University, 26 Liang Feng Road, Yanshan District, Guilin, Guangxi 541006, China
4 Department of Automatic Control Engineering, Feng Chia University, 100 Wen Hua Road, Xitun District, Taichung 40724, Taiwan
5 International School of Technology Management, Feng Chia University, 100 Wen Hua Road, Xitun District, Taichung 40724, Taiwan
6 Guangxi Normal University (Yucai Campus), 15 Yu Cai Road, Qixing District, Guilin, Guangxi 530015, China
7 General Education Center, Feng Chia University, 100 Wen Hua Road, Xitun District, Taichung 40724, Taiwan

In this research, we focus on the detection and monitoring of iron content in mining areas, an important application of hyperspectral imaging (HSI) that can be used to assess the condition of the soil environment. Compared with traditional grid sampling and interpolation methods, unmanned aerial vehicle (UAV) hyperspectral inversion can quickly estimate iron content over a large area and produce thematic maps of iron concentration for a given mining area. In this paper, we propose a novel methodology for selecting the optimal model for the UAV hyperspectral inversion of iron content using mathematical and computational modeling. Through a cross-validation comparison of three regression models, the most suitable model for the inversion of soil iron content is identified. In addition, we analyze and compare the effects of different feature sets, namely, band selection, principal component analysis (PCA), and minimum noise fraction (MNF), on the model accuracy. Our experiments show that, among the inversion models and feature combinations tested, the partial least squares regression (PLSR) model combined with band selection, PCA feature extraction, and MNF feature extraction gives the highest inversion accuracy for iron concentration in the study area.

Introduction
The monitoring of iron concentration in iron mining areas plays a guiding role in environmental protection and land reclamation. Since mining engineering is a series of mineral separation processes, iron ore mining areas usually contain multiple intermediate product areas with different levels of intermediate products, such as ore mountains, mine tailings, dumping sites, and fine mining areas. Mine reclamation usually adopts different restoration methods for different mining areas. Monitoring the iron concentration in the soil of different iron mining areas is helpful for assessing the quality of the production environment, production efficiency, and ecological restoration effects.
Traditionally, extensive on-site sampling, chemical analysis, and interpolation have been used to study the concentrations of iron and other metals in the soil. (1) With the rapid development of hyperspectral remote sensing technology, constructing a regression model between hyperspectral data and metal concentration has greatly improved the efficiency and accuracy of soil metal concentration estimation. (2,3) Owing to the high spectral resolution, wide spectral range, and powerful feature extraction ability of hyperspectral data, point-based spectroscopy can be widely used to estimate the metal concentration in the soil. (4) However, point-based spectrometer data are difficult to apply to the inversion of metal concentration over a large study area. Although researchers can use multispectral images, such as those from Landsat and Sentinel, their spectral resolution is low, and they are not sensitive to changes in the concentration of metals in the soil. (5) With the vigorous development of robotics and unmanned aerial vehicle (UAV) technologies in the past two decades, hyperspectral imaging (HSI) technologies and UAV-based HSI applications provide unprecedented opportunities for remote-sensing-enabled metal concentration mapping, showing promise for rapid deployment and for efficient and accurate feature detection. UAV-HSI iron concentration inversion not only has the advantages of hyperspectral data, including excellent feature extraction capabilities, but also, as an image-based remote sensing technology, can cover a large area in a single acquisition. This technology is of great significance to environmental monitoring and evaluation.
In this work, we describe the inversion of soil iron concentration levels by producing a thematic map based on the obtained UAV hyperspectral images, which includes data collection, preprocessing, model selection and construction, and iron concentration prediction in the study areas.
This paper is divided into the following themes:
• First, the preprocessing of UAV hyperspectral big data is the key step of inversion. In this paper, a GPU-accelerated Python script is developed independently on the basis of a photo scan library; it automatically stitches the hyperspectral data and greatly improves the operating efficiency.
• Second, the cross-validation estimation method is used to evaluate, without bias, the performance of common regression models in the retrieval of soil iron content.
• Third, the generalization ability of the model is further optimized with feature extraction and band selection, which improves the accuracy of iron content inversion.
• Finally, we conclude the paper with some of the limitations and future directions of our research work.

Introduction of study area
The study site is located in a mine in Qian'an County, Tangshan City, Hebei Province. The metal deposit in Malanzhuang contains more than twenty types of minerals. The main minerals include iron, gold, copper, nickel, and zirconium, of which iron ore accounts for the largest proportion. In this study, we chose the tailings pond in the mining area as the site for data collection. The land type of the study area is relatively simple. The area measures about 330 m from east to west and about 240 m from north to south, with a total area of about 79,200 m², as shown in Fig. 1.
The main function of the tailings pond is to store the metals and nonmetals of the mine after sorting. Complete tailings treatment facilities guarantee environmental protection and sustainable mining. If the "waste residue" accumulated in the tailings pond is not properly treated, it will cause serious damage and pollution to the surrounding environment. Therefore, it is of great significance to monitor the iron content of the tailings pond and the soil around the tailings pond.

Data collection
In previous studies, Yang et al. (6) used an HSI instrument to collect radiation data, whereas Wei et al. (7) used Nano-HyperSpec sensors to collect hyperspectral data from a river. In this research, we used the Cubert UHD-185 (8) hyperspectral imaging camera mounted on a UAV. The camera parameters are shown in Table 1. The sensor provides 125 bands in the range of 450-950 nm at a spatial resolution of 21 cm. The flying height of the UAV was set to 178 m, and a total of 1022 hyperspectral data cubes of 50 × 50 pixels were obtained. The corresponding RGB images were obtained at a size of 1000 × 1000 pixels and a spatial resolution of 0.05 m. After hyperspectral image fusion, the total amount of hyperspectral image data reached 144 GB.

Field work and chemical analysis
At noon on June 27, 2018, UAV hyperspectral data collection was carried out on a tailings pond in Malanzhuang Iron Mine, Tangshan. At the same time, ground control points were deployed and soil samples were collected, as shown in Fig. 2. There are six control points (longitude and latitude coordinates measured by real-time kinematic (RTK) positioning) and 26 ground sampling points (soil samples obtained using a five-point sampling method).
After the field collection was completed, a third-party testing agency was commissioned to analyze the concentration of iron in these soil samples using inductively coupled plasma (ICP) technology.

Image stitching
On the basis of the photo scan library, a GPU-accelerated Python script was independently developed. The script automatically stitches the hyperspectral data, which greatly improves the operating efficiency. The specific implementation is as follows. First, the Python script stitches together all the images of one specific band, followed by the other bands, as specified by the user in the parameter setup. The algorithms used in this process mainly include the automatic mosaic line algorithm and a GPU-based SIFT algorithm, which together quickly realize the automatic stitching of massive hyperspectral data. Geocoding follows in the next step.

Geocoding
After the image stitching is completed, a second-order polynomial fitting method based on the six control points is used to geocode the entire stitched hyperspectral image. The model selection is based on the evaluation of partial least squares regression (PLSR), support vector regression (SVR), and artificial neural networks (ANNs) using cross-validation estimation. The derivation in this section gives the least squares solution for the polynomial parameters used in geocoding. The geocoding mainly uses multi-order nonlinear fitting, and the optimization method is the classic least squares indirect adjustment method.
The nearest neighbor index (NNI) was computed for both sets of geocoded points for each land type. The NNI is a common measure of spatial concentration and a component of other spatial statistics, such as nearest neighbor hierarchical clustering. All nearest neighbor distances were computed, and the NNI is defined as
\[ \mathrm{NNI} = \frac{D_o}{D_E}, \tag{1} \]
where $D_o$ is the average nearest neighbor distance for a dataset, computed as
\[ D_o = \frac{1}{n} \sum_{i=1}^{n} D_i, \tag{2} \]
where $D_i$ is the nearest neighbor distance for land point $i$ and $n$ is the number of land points in the dataset, as shown in Fig. 2. In addition, $D_E$ is the expected nearest neighbor distance for a point pattern exhibiting complete spatial randomness, which is defined as
\[ D_E = \frac{1}{2} \sqrt{\frac{A}{n}}, \tag{3} \]
where $A$ is the geographic area of the study site.
Assuming that n sets of control point spatial coordinates are given, there are n corresponding sets of image point coordinates, which are found using the ground point marks on the image. The two sets of coordinates are related as shown in Eq. (4), and a second-order polynomial is fitted to this relationship using the control point pairs 1, 2, 3, …, n.
The polynomial parameters are then obtained with the known least squares indirect adjustment formula. The parameters of c = g(x, y) are solved with the same formula after the corresponding substitution.
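As a sketch of the fitting step described above, one common way to write a second-order polynomial mapping and its least squares solution is given here. The symbols r and c (image row and column), the function name f, the coefficients a_j and b_j, the design matrix B, the observation vector l, and the residual vector V are assumptions introduced for illustration; the text itself only names c = g(x, y).

```latex
\begin{aligned}
r_i &= f(x_i, y_i) = a_0 + a_1 x_i + a_2 y_i + a_3 x_i^2 + a_4 x_i y_i + a_5 y_i^2, \\
c_i &= g(x_i, y_i) = b_0 + b_1 x_i + b_2 y_i + b_3 x_i^2 + b_4 x_i y_i + b_5 y_i^2,
\qquad i = 1, \dots, n, \\[4pt]
V &= B\hat{a} - l, \qquad
B = \begin{bmatrix}
1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_n & y_n & x_n^2 & x_n y_n & y_n^2
\end{bmatrix}, \qquad
l = \begin{bmatrix} r_1 \\ \vdots \\ r_n \end{bmatrix}, \\[4pt]
\hat{a} &= \left(B^{\mathsf{T}} B\right)^{-1} B^{\mathsf{T}} l .
\end{aligned}
```

Under this form, the coefficients of g are obtained by replacing l with the vector of column coordinates. Each coordinate has six coefficients, so six control points determine them exactly and leave no redundant observations with which to check the fit.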
For the calculation of the image point coordinates of the sampling point, it is assumed that there are m object coordinates as follows: (x 1 , y 1 ), (x 2 , y 2 ), …, (x m , y m ).
In this way, a one-to-one correspondence between the position of the ground point and the position of the pixel is established, and the image is geocoded into the WGS84 coordinate system, which can realize the automatic coding of hyperspectral images.
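The geocoding fit can be illustrated with a short, standard-library-only Python sketch. It assumes the full six-term second-order polynomial and solves the normal equations by Gaussian elimination; the function names, the control point coordinates, and the coefficient values are hypothetical and are not taken from the study. The coordinates are scaled to the unit square here because unscaled coordinates make the normal equations poorly conditioned.

```python
from typing import List, Sequence, Tuple


def design_row(x: float, y: float) -> List[float]:
    """Terms of a full second-order polynomial in (x, y)."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poly2(ground: Sequence[Tuple[float, float]],
              values: Sequence[float]) -> List[float]:
    """Least squares coefficients from the normal equations (B^T B) a = B^T l."""
    rows = [design_row(x, y) for x, y in ground]
    k = 6
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    btl = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve_linear(btb, btl)


def eval_poly2(coeffs: Sequence[float], x: float, y: float) -> float:
    """Evaluate the fitted polynomial at a ground coordinate."""
    return sum(c * t for c, t in zip(coeffs, design_row(x, y)))


# Hypothetical control points (scaled ground coordinates) and image rows.
ground = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (0.5, 0.5), (0.2, 0.7), (0.8, 0.3)]
true_row = [10.0, 200.0, -5.0, 3.0, 1.5, -2.0]  # made-up coefficients
image_rows = [eval_poly2(true_row, x, y) for x, y in ground]

row_coeffs = fit_poly2(ground, image_rows)
# Image row predicted for a sampling point at scaled coordinates (0.4, 0.6).
print(eval_poly2(row_coeffs, 0.4, 0.6))
```

The column coordinate would be fitted by calling `fit_poly2` a second time with the image columns as `values`. A production version would more likely use a QR-based solver than the normal equations.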

Model selection
In this research, three commonly used regression models, PLSR, SVR, and ANN, were applied to the estimation of iron concentration. PLSR (9) is a specific form of multivariate linear regression and is used here for the estimation of soil properties. SVR, which offers good generalization ability and robustness to noise, is becoming popular in the investigation of geophysical and chemical properties. (9,10) ANNs have proved efficient in establishing the nonlinear relationship between soil heavy metal concentration and remote sensing data and are widely used to estimate soil properties. (11)
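To make the idea behind PLSR concrete, the following is a minimal sketch of PLS regression with a single latent component and a single response (often called PLS1), written with the Python standard library only. It is an illustration under simplifying assumptions, not the configuration used in this study: the number of components, any scaling, and the function names and data are all hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def fit_pls1_one_component(
    X: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[float], float]:
    """Fit a one-component PLS1 model; return (coefficients, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: direction in band space with maximum covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("X and y are uncorrelated; no component can be built")
    w = [v / norm for v in w]

    # Scores: projection of each centered spectrum onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress y on the scores; with one component the prediction is t * q.
    q = sum(yc[i] * t[i] for i in range(n)) / sum(v * v for v in t)

    coef = [wj * q for wj in w]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept


def predict(coef: Sequence[float], intercept: float,
            x: Sequence[float]) -> float:
    """Predict the response for one feature vector."""
    return intercept + sum(c * v for c, v in zip(coef, x))


# Hypothetical two-band spectra and responses, for illustration only.
X_demo = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y_demo = [3.0, 6.0, 9.0, 12.0]
coef, intercept = fit_pls1_one_component(X_demo, y_demo)
print(predict(coef, intercept, [5.0, 10.0]))
```

With strongly collinear bands, as is typical of hyperspectral data, a few such components can summarize most of the covariance between spectra and response, which is one reason PLSR suits small sample sets such as the 26 soil samples here.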

Band selection and feature extraction
Magendran and Sanjeevi showed experimentally that iron has absorption peaks in the 700 to 870 nm range. (12) In Fig. 3, the spectral curve shows a relatively clear absorption peak in the 120-130 band. On the basis of these previous results, we used the 45 bands between 700 and 900 nm to build a model to improve the estimation accuracy of soil iron content. We compared the prediction accuracy of the PLSR model when using all spectral bands and when using the selected bands as inputs. In addition, feature extraction using principal component analysis (PCA) and minimum noise fraction (MNF) was performed on the input spectral data to build a better regression model. (13)
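Band selection by wavelength window can be sketched as follows. The band centre wavelengths in the demo are a small hypothetical list, not the sensor's actual band table, so the number of selected bands shown here says nothing about the 45 bands used in this study; the function names are also assumptions.

```python
from typing import List, Sequence


def select_bands(centres_nm: Sequence[float],
                 low_nm: float, high_nm: float) -> List[int]:
    """Indices of bands whose centre wavelength lies in [low_nm, high_nm]."""
    return [i for i, w in enumerate(centres_nm) if low_nm <= w <= high_nm]


def subset_spectrum(spectrum: Sequence[float],
                    indices: Sequence[int]) -> List[float]:
    """Keep only the selected bands of one pixel or sample spectrum."""
    return [spectrum[i] for i in indices]


# Hypothetical band centres (nm) and one made-up reflectance spectrum.
centres = [650.0, 700.0, 750.0, 870.0, 900.0, 950.0]
spectrum = [0.21, 0.25, 0.22, 0.19, 0.24, 0.27]

idx = select_bands(centres, 700.0, 900.0)
print(idx, subset_spectrum(spectrum, idx))
```

The same index list would be applied both to the sample spectra used for training and to every pixel of the image, so that the model sees identical inputs in both stages.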

Accuracy evaluation
By repeatedly splitting the data set into a training set and a test set and evaluating the model on the test set, cross-validation makes full use of the available samples and gives an approximately unbiased estimate of model performance. (14) In this work, we used cross-validation to design the experiments and evaluated the accuracy with the coefficient of determination (R2) and the root mean square error (RMSE).
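The evaluation procedure can be sketched in a few lines of standard-library Python. The sketch uses k-fold splitting and a simple one-feature linear fit as a stand-in model; the number of folds, the splitting scheme, the stand-in model, and the data are assumptions for illustration, since the text does not specify them.

```python
from math import sqrt
from statistics import median
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; y_true must not be constant."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Interleaved (train, test) index pairs; each sample is tested once."""
    folds = []
    for f in range(k):
        test = list(range(f, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        folds.append((train, test))
    return folds


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares line, used here only as a stand-in model."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(x: Sequence[float], y: Sequence[float],
                   k: int) -> List[Tuple[float, float]]:
    """Return (R2, RMSE) on the held-out samples of each fold."""
    scores = []
    for train, test in k_fold_indices(len(x), k):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [slope * x[i] + intercept for i in test]
        truth = [y[i] for i in test]
        scores.append((r2_score(truth, pred), rmse(truth, pred)))
    return scores


# Made-up single-feature data, for illustration only.
x_demo = [0.10, 0.15, 0.22, 0.28, 0.35, 0.41, 0.47, 0.55, 0.62, 0.70]
y_demo = [12.0, 15.5, 21.0, 24.0, 33.0, 36.5, 44.0, 49.0, 58.0, 63.5]
fold_scores = cross_validate(x_demo, y_demo, k=5)
print(median(s[0] for s in fold_scores), median(s[1] for s in fold_scores))
```

Reporting the median alongside the mean over folds, as the results tables do, reduces the influence of a single unfavorable split, which matters when only a few samples fall in each test fold.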

Map of iron element content
The best model and the best feature combination are found through cross-validation. Once the best-performing regression model and the most effective input features are determined, the best feature combination extracted from the entire hyperspectral image of the study area is input into the trained model with the highest coefficient of determination to generate the iron content map shown in Fig. 4.
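For a linear model such as PLSR, generating the map amounts to applying the fitted coefficients to the feature vector of every pixel. The sketch below shows this with nested Python lists; the function names, the tiny feature cube, and the coefficient values are hypothetical, and a real implementation would typically use array operations over the full image.

```python
from typing import List, Sequence


def predict_pixel(features: Sequence[float], coef: Sequence[float],
                  intercept: float) -> float:
    """Linear prediction for the feature vector of one pixel."""
    return intercept + sum(c * f for c, f in zip(coef, features))


def predict_map(cube: Sequence[Sequence[Sequence[float]]],
                coef: Sequence[float], intercept: float) -> List[List[float]]:
    """Apply the model to every pixel; cube is indexed [row][col][feature]."""
    return [[predict_pixel(px, coef, intercept) for px in row] for row in cube]


# Hypothetical 2 x 2 image with two features per pixel, and made-up coefficients.
cube = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
iron_map = predict_map(cube, coef=[2.0, 3.0], intercept=1.0)
print(iron_map)
```

The features fed to the model at this stage must be produced exactly as for the training samples, that is, with the same selected bands and the same PCA and MNF transforms.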

Model comparison
The model selection process mainly uses cross-validation to evaluate the three empirical models: PLSR, SVR, and ANN. Table 2 shows the accuracy statistics for the training and test data. The training coefficients of determination of PLSR and ANN are close to each other. However, the median coefficient of determination of PLSR on the test data set reaches 0.5954, which is much higher than that of ANN (0.4598). This shows that the ANN model overfits severely. For the SVR model, the training coefficient of determination is the lowest among the three models, whereas its test coefficient of determination is higher than that of ANN and lower than that of PLSR. On the test data set, the root mean square error of SVR is the smallest, with a median of 7424 mg/kg. The average root mean square error of PLSR on the test set is 7432 mg/kg, which is slightly higher than that of SVR but far lower than that of the ANN model. From the comparison of the coefficient of determination and the root mean square error, it can be seen that PLSR has the best generalization ability on this data set.

Band selection
Owing to the iron absorption peak in the 700 to 870 nm spectral range, we extracted the 45 bands in the 700 to 900 nm spectral range from the hyperspectral image as the model input for iron content inversion. To evaluate the impact of these 45 bands on the inversion of iron content, PLSR was used for model construction and accuracy evaluation. The accuracy results are shown in Table 3. The median values of the coefficient of determination and the root mean square error on the test set are 0.6311 and 5571 mg/kg, respectively. Compared with using all hyperspectral bands as the model input, band selection improves the accuracy of the model, with an average coefficient of determination of 0.5589.

Feature evaluation and selection
In the hyperspectral inversion of soil heavy metal elements, adding feature extraction can effectively reduce the effect of noise on the model. To assess the impact of feature extraction on the accuracy of the regression model, we chose several common feature extraction methods, mainly PCA and MNF, to construct the regression model and evaluate its accuracy.
In this paper, the spectral data after band selection and different combinations of PCA and MNF features are used to retrieve the soil iron concentration. Three combinations are considered: band selection + PCA, band selection + MNF, and band selection + PCA + MNF. Table 4 shows the coefficient of determination statistics obtained through PLSR for the three feature combinations. When PCA or MNF features are added separately to the spectral data after band selection, the accuracy does not improve, indicating that PCA or MNF alone does not help improve the model accuracy. However, when both PCA and MNF features are added to the spectral data after band selection, the median R2 and average R2 on the test set reach 0.7122 and 0.6087, respectively, which are higher than the median R2 (0.6311) and average R2 (0.5589) obtained using only band selection. Thus, PCA and MNF features help the model only when they are used together. This result is consistent with the feature importance theory; that is, features that are not useful by themselves can be useful when combined with other features. (15) Therefore, to ensure the highest inversion accuracy of the regression model, we used the data after band selection combined with the PCA and MNF features as the model input.
Similarly, distributed CNN technologies using the A3pviGrid architecture (16-18) were proposed and previously studied by Shankaranarayanan and other researchers. In our previous work, (19-24) the main innovations were twofold. First, the generative adversarial network (GAN) is based on a dense residual network, which fully learns the higher-level features of HSIs. Second, the loss function is modified using the Wasserstein distance with a gradient penalty, and the discriminant model of the network is changed to enhance the training stability.
Hyperspectral image data were obtained from airborne visible infrared sensors of an imaging spectrometer, and the performance of ResGAN was compared with those of two HSI classification methods. Our other research works support high-performance experiments, which were previously used and form the basis of this study.

Iron concentration diagram
The spectral data after band selection and feature extraction were used as the input to the optimal regression model to predict the iron concentration map of the study area, as shown in Fig. 4. It can be seen that the iron content of the dumping site in the upper left corner (green frame) is much lower than that of the sedimentation tank area in the upper right corner (red frame). In the lower left corner (yellow frame), the iron concentration is lowest along the road, which is caused by dust. The agreement with the land cover types supports the rationality and consistency of the iron content map. Although some shadow areas on the hyperspectral image affect the predicted texture, the iron concentration changes in the shadow areas are not obscured, and the boundaries of the high-concentration areas are still clearly visible.

Conclusion
In this study, we used UAV hyperspectral images combined with the necessary sampling data to invert the iron concentration of the Malanzhuang tailings pond. Three classical regression methods, namely, PLSR, ANN, and SVR, were used to invert the iron content, and the PLSR model performed better than the other two. In addition, the combination of band selection and feature extraction better retains the effective information in the spectral data, increasing the median coefficient of determination on the test set to 0.7122. Finally, the selected bands and the features extracted by PCA and MNF were used as the input of the PLSR model to invert the iron content in the study area. It can be seen from the iron content map that the spatial distribution of iron content is in good agreement with the land type. The experiments show that UAV-based hyperspectral remote sensing has great potential for metal concentration monitoring, and that reasonable model selection and feature extraction play an important role in the retrieval of iron concentration. The limitations of this research include error correction features, location errors, and resolution and interpolation issues. Our future work will focus on minimizing these errors and improving our model. The GPU-based processing could also be made faster using distributed approaches, which is beyond the scope of this paper and is left for future research.

Ri-hui Tan graduated from the School of Electrical Engineering of Guangxi University with bachelor's and master's degrees in automation. After graduation, she worked in the Tourism Data College of Guilin Tourism University as a lecturer. She is mainly engaged in big data and artificial intelligence involving the Guangxi tourism economy and in works related to tourism data.
IFaS Trier University, Germany. He has been actively involved in publishing several research papers and periodicals in top international conferences and journals. His research and teaching interests include big data in biomedical imaging, machine/deep learning, AI, high-performance grid computing, bioinformatics, applied social and cognitive psychology in education, game dynamics, material flow management, renewable energy systems, and policy and decision making. He has studied a diverse range of disciplines in engineering and social sciences and is fluent in a wide range of scholarly domains specializing in higher education and engineering. He serves as an editor and steering committee member on numerous research entities and regularly contributes reviews to the Journal of Supercomputing. He has recently received a grant for supercomputing research from the Ministry of Science and Technology, Taiwan, for his work on the X-ray detection of COVID-19 using supercomputing and HIS in this research supported by a Ministry of Science and Technology grant (MOST-1102222E035002).

Chih-Cheng
Xi Wang is the dean of the Teachers College for Vocational and Technical Education, Guangxi Normal University. Her interests are in vocational and technical education, teacher development, and tourism development and management.
Nan-Kai Hsieh received a Ph.D. degree from the Department of Mechanical Engineering at National Taiwan University, Taipei, Taiwan, in 2015. He has been an assistant professor at Feng Chia University since 2022. His work currently spans several professional fields, including prognostics and health management (PHM), mechatronics, robot vision, and the Internet of Things.
Sheng-Nan Tsai has been an assistant professor at Feng Chia University, Taiwan, since 2016. He earned his M.S. degree in 1998 and Ph.D. degree in 2016 from the School of Management, Fudan University. He has hosted over 40 academia-industry collaboration projects in the past five years and owns three patents. He is also a business mentor for more than 50 start-ups. His research interests include human-computer interaction, entrepreneurship, project management, and AIoT applications. (sntsai@o365.fcu.edu.tw)