Novel Embedded Smart Gateway Framework for Fruit/Vegetable Quality Classification

Even though a conventional screening system operates continuously, its low recognition rate causes misjudgments of fruit quality. In this work, we propose a new embedded gateway structure with simple implementation and an enhanced success rate for fruit and vegetable quality classification. We have combined an edge-computing embedded development platform with an artificial intelligence (AI) algorithm structure, where microprocessors are used to control the gateway switch of a conveyor, and this platform is used to build a small fruit/vegetable quality classification system, which has been implemented and tested. The system's hardware is controlled by different pulse widths. By combining the images from an image sensor with AI recognition algorithms, we have effectively enhanced the system's capability of fruit and vegetable quality recognition.


Introduction
According to a report estimating Taiwan's population from 2018 to 2065, the percentage of people over 65 years old will increase from 14.5% in 2018 to 19.9% in 2025. Moreover, by 2050, the percentage will increase to 38%. The rapid decrease in the young adult population is raising the average age of people employed in agriculture. The traditional manual fruit screening method relies heavily on human labor, has low efficiency, and places a strong physical demand on middle-aged workers.
Traditionally, fruit growers refer to the Fruit and Vegetable Quality Classification Standard and Packaging Specification Manual provided by the Agriculture and Food Agency of Taiwan for guidelines to manually grade fruit and vegetables before selling them at wholesale markets. In this study, we propose a novel embedded smart gateway framework that applies artificial intelligence (AI) and edge computing to fruit screening methods as an alternative to traditional manual screening. The proposed framework is expected to reduce the fruit error rate and quality disputes between fruit growers and consumers. It will also reduce the labor required for the fruit harvesting process. The automatic screening system designed in this study uses an image sensor to recognize whether the fruit has serious pest infestation or sunburn and pick out damaged fruit to maintain the quality of fruit on the market.

Literature Review
Apte et al. applied the You Only Look Once (YOLO) algorithm in a mobile application in 2017. (1) It was able to process images acquired by a mobile device. They subsequently modified the YOLO feature extraction network model to enhance the inference speed. Iandola et al. applied a fire module layer to reduce the number of parameters in the YOLO algorithm and increase the image detection speed. (2) In 2018, Redmon and Farhadi proposed the YOLO-v3 algorithm, an improvement over the original YOLO-v1 algorithm. (3) The YOLO-v3 algorithm was able to process 608 × 608 images on an NVIDIA Titan X graphics card at a speed of up to 20 frames per second.
Santad et al. proposed a system based on the YOLO algorithm to detect and analyze the relations between luggage and its owners. (4) The proposed system used camera images to enhance security surveillance.
In 2016, Liu et al. proposed a single-shot detector that combined object frame coordinates and classes in a single network architecture and used multiscale feature maps to obtain output predictions. The structure was implemented using a convolution neural network (CNN). (5) In the same year, Dubey discussed how to detect skin defects on fruit by creating a feature database of the external properties of fruit (color, size, shape, and texture). (6) The system first segmented the images and applied the speeded up robust features (SURF) algorithm to the segmented images. The features were then used to detect defects on the fruit.
Marimuthu and Roomi established a fuzzy model in 2017 to grade bananas as unripe, ripe, or over-ripe. The decision was determined by a framework with eight fuzzy rules created from a decision tree. (7) In the same year, Kamran and Pormah combined image processing and an artificial neural network to detect whether a cucumber's shape is ideal (cylindrical) or defective (curved or conical). A novel algorithm was implemented using MATLAB 2010a to preprocess and extract shape features from images. (8) However, these identification and classification systems are unsuitable for fruit classification since they can only perform simple classifications for the same type of fruit. (9,10) In 2018, Khaing et al. proposed a CNN based on the back-propagation algorithm and applied it to fruit classification, as well as a vision-based automatic classification system. (11,12) In the same year, Hossain et al. proposed a vision-based fruit classification framework using deep learning. The framework also used a CNN, where a finely tuned VGG-16 model showed outstanding accuracy. (13) Lu et al. used a six-layer CNN composed of convolutional, pooling, and fully connected layers for fruit classification. The system obtained an accuracy of 91.44%, which was higher than those of algorithms such as the voting-based support vector machine, wavelet entropy, and genetic algorithm. However, increasing the number of types of fruit decreased the accuracy. (14) Bochinski et al. proposed using the intersection over union (IOU) tracking algorithm for high-speed tracking. (15) The tracking method performs an IOU calculation between the target position detected in the original image and the target position in the next frame. If the IOU value is higher than an established threshold, the regions are considered related and determined to be the same object. This method can effectively track the same object. Chen et al. proposed the use of an embedded imaging system and AI to detect fruit quality; the accuracy rate was as high as 88% in experiments, thus demonstrating effective detection. (16)

System overview
Our newly proposed system with the embedded smart gateway framework was developed to satisfy the following research and development objectives:
A. Its automatic gateway control system is capable of classifying the fruit quality.
B. The neural network model of the system is embedded in a development board to perform model inference.
C. The system has a graphical interface to display the delivery of the fruit in real time.
D. The system is capable of using the graphical interface to collect data and evaluate the neural network model.
E. The system has a low fabrication cost.
The model combines AI and gateways for intelligent decision-making and classification. Fruit and vegetable image detection is performed through a camera sensor module. The AI model uses the detection data to determine the gateways' actions. The Keras framework is used for classification, and the computation time of building the model is evaluated. Keras is an open-source software library that provides a Python interface for artificial neural networks. Data collection was performed to obtain the data required for the neural network model, and an experiment was conducted using the collected data. Moreover, the graphical user interface of the system integrates the functions used for training the model. In summary, an effective procedure is proposed in this work to develop the fruit classification system.
The model is trained to search for functions that fit the features in the data. After evaluating the completed model, the model is integrated into the system architecture. The YOLO algorithm is implemented on a Jetson TX2 embedded platform to perform object detection, circle the fruit image, and perform image recognition using the CNN algorithm. On the basis of the recognition results, different pulse width modulation (PWM) signals are sent by an STM32 microcontroller development board to control the gateway automatically.

Hardware architecture
The system includes three pieces of essential hardware: a Jetson TX2 embedded platform development board, an STM32 microcontroller development board, and an MG996R servomotor. Figure 1 shows the hardware architecture of the system. First, the operator turns on the front gateway control switch. Then the front gateway delivers the fruit to the conveyor platform. Next, a camera on the Jetson TX2 development board captures images to track and recognize the fruit. Finally, the system automatically detects whether the fruit is about to leave the conveyor platform and sends the recognition results. Since the Jetson TX2 board has no PWM output and no transistor-transistor logic (TTL) RS232 UART output module, the UART signals are sent to the STM32 microprocessor module through an FT232 (USB-to-TTL) module. According to the received string, different PWM signals are sent to control the direction of the MG996R servomotor and to sort the fruit on the conveyor platform.
The computer is used to train the neural network model in advance, and the model is embedded in the Jetson TX2 development board. After the images from the camera on the development board are passed to the neural network model, the FT232 module sends the inference results to the STM32 microcontroller development board, which sends the PWM signals to the MG996R servomotor. Figure 2 shows a flow chart of the system operation. Object detection is used to detect whether or not the fruit has been delivered to the platform. The tracking and recognition function continues to work until the fruit leaves the conveyor platform and saves all the tracking results during the delivery process. When the fruit leaves the conveyor platform, all the recognition results are saved as the final answer to determine the quality of the fruit, and the gateway channel control is activated. Figure 3 shows a flow chart of the control of the front gateway. Two servomotors take turns performing lifting and lowering movements. When servomotor 2 of the gateway is lifted to release the first fruit onto the conveyor platform, servomotor 1 of the gateway is lowered so that the second fruit rolls forward to the release position. At this moment, servomotor 2 is lowered and servomotor 1 is lifted to buffer the momentum of the subsequent fruit. By repeating this action, fruits are consecutively rolled onto the conveyor platform.
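The alternating lift/lower sequence of the two front-gateway servomotors can be sketched as a small state machine (a minimal illustration in Python; the two-step cycle follows the description above, while the function and state names are our own):

```python
# Minimal sketch of the alternating front-gateway sequence: servomotor 2
# releases one fruit onto the conveyor while servomotor 1 buffers the next
# fruit, and the roles then swap. State and function names are illustrative.

def front_gate_cycle():
    """Yield (servo1, servo2) states for one release cycle."""
    # Servo 2 lifts to release the first fruit; servo 1 lowers so the
    # second fruit rolls forward to the release position.
    yield ("lowered", "lifted")
    # Servo 2 lowers and servo 1 lifts to buffer the momentum of the
    # subsequent fruit.
    yield ("lifted", "lowered")

def run_cycles(n):
    """Repeat the two-step cycle n times, returning the state sequence."""
    states = []
    for _ in range(n):
        states.extend(front_gate_cycle())
    return states
```

Repeating the cycle rolls fruits onto the conveyor one at a time, which is the buffering behavior described above.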

Tiny-YOLO
The Tiny-YOLO algorithm is a feature extraction network provided with the YOLO algorithm; its neural network architecture is lighter than that of the full YOLO. The Tiny-YOLO architecture is composed of convolution layers and max-pooling layers. A batch normalization (BN) layer is added after each convolution layer to normalize the parameters, help the model learn, and accelerate training. A leaky rectified linear unit (ReLU) is used as the activation function. Figure 4 shows the Tiny-YOLO architecture, from which we can see that the numbers of filters increase as powers of 2, while the max-pooling layers reduce the spatial dimensions of the input. In addition, a 1 × 1 convolution kernel is used at the end of the architecture for training. Since the Tiny-YOLO algorithm uses a two-layer feature pyramid network (FPN), the dimensions of the output layer are configured as output tensors with sizes of 13 × 13 and 26 × 26 to perform object detection.
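The 13 × 13 and 26 × 26 output grids follow from repeated stride-2 max-pooling. As a minimal illustration (assuming the common 416 × 416 YOLO input size, which the text does not state explicitly):

```python
# Illustrative computation of the Tiny-YOLO output grid sizes. A 416 x 416
# input (the common YOLO default; an assumption here) is halved by each
# stride-2 max-pooling layer. The two FPN output scales correspond to
# 5 and 4 halvings, giving 13 x 13 and 26 x 26 grids.

def output_grid(input_size, num_poolings):
    size = input_size
    for _ in range(num_poolings):
        size //= 2  # each stride-2 max-pooling halves the spatial dimension
    return size

coarse = output_grid(416, 5)  # deep output scale -> 13 x 13
fine = output_grid(416, 4)    # shallower FPN scale -> 26 x 26
```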

Uff model
The fruit classification recognition model (filename extension .h5) built with Keras is converted to a model file (filename extension .uff) usable by TensorRT using conversion software developed by NVIDIA. Model inference is then performed on the converted model file using C++.
As shown in Table 1, the original Keras model is written in Python, whereas the TensorRT model is written in C++. It can be seen that the use of C++ and the uff model enhances the inference speed by a factor of 2.5-2.7. The enhancement will be even greater if a deeper network architecture is used. Figure 5 shows the control of the gate that waits to receive the final fruit identification results.

Back gateway
Since the system is divided into three channels, it accepts three different codes and sends out three types of PWM signal. After the gate turns, it is returned to the center position before the next motor turns.
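The back-gateway logic above can be sketched as a mapping from the three received codes to servo pulse widths followed by a recentering step. The specific pulse-width values (microseconds) below are illustrative figures typical for an MG996R-class servo, not values taken from this work, and the channel codes are our own placeholders:

```python
# Sketch of the back-gateway control: three received codes select three
# channels, each driven by a different PWM pulse width, and the gate
# returns to center before the next turn. Pulse widths (us) and channel
# codes are illustrative assumptions, not the paper's actual values.

CENTER_US = 1500  # neutral (center) position
CHANNEL_PULSE_US = {
    "A": 1000,  # channel 1
    "B": 1500,  # channel 2
    "C": 2000,  # channel 3
}

def gate_sequence(code):
    """Return the pulse-width sequence for one sorting action:
    turn to the selected channel, then recenter."""
    if code not in CHANNEL_PULSE_US:
        raise ValueError(f"unknown channel code: {code!r}")
    return [CHANNEL_PULSE_US[code], CENTER_US]
```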

Object detection model
The architecture of the feature extraction network is modified for the YOLO algorithm. Therefore, the architectures in Table 2 were designed for comparison in the experiment. Note that the system uses the FPN framework to output coarse and fine features for the regression prediction of object locations. Therefore, the output dimension must be set to the output tensor of the corresponding category.

Experimental Results
We use the YOLO algorithm to experimentally evaluate the model, and two parameters are set for YOLO object detection: (1) the non-maximum suppression (NMS) IOU threshold is set to 0.45 and (2) the object confidence threshold is set to 0.3 to make the system work more smoothly and achieve better accuracy. Two indices are used for the evaluation, as discussed in this section. An example is first used to explain the definition of each index, followed by the experimental data.

Precision-recall (PR) curve
The IOU must be calculated before drawing a PR curve. In Eq. (1) and Fig. 6, A is the real annotated box and B is the predicted object box. The IOU represents the rate of overlap between the real object box and the predicted object box. Larger IOU values indicate better object detection capability. We refer to the model proposed by Padilla et al. for object prediction. (17) A graph is plotted after ranking the objects' confidence levels according to the model's predictions of the object type. The PR curve is plotted by calculating the precision and recall values from the accumulated true positive (Acc TP) and accumulated false positive (Acc FP) counts. Figures 7-13 show the PR curves for the neural model architectures used for comparison and the proposed architecture. In the figures, FRUIT AP represents the PR curve of good fruit and B_FRUIT AP represents the PR curve of bad fruit. In Figs. 7-9, the precision value decreases gradually as the recall value increases. The recall value for the Tiny-Batch-Dropout architecture reaches a very high level, indicating that all boxes containing the objects are circled, and it correctly determines the quality of fruit for the circled objects. The recall values for the Small-Mobilenet-v1 and Small-Mobilenet-v2 architectures are not significantly different, and these architectures have average performance. Figure 10 shows the PR curves for the Tiny-Nobatch-Have-Leaky architecture without the BN layer. The results of the judgment box are incorrect, and the performance is very poor even though the recall values can be improved.

Table 2
Architectures used for comparison.
Small-Mobilenet-v1, Small-Mobilenet-v2, Tiny-Batch-Dropout, Tiny-Nobatch-Have-Leaky, Tiny-SqueezeNet, Tiny-Octave
In Fig. 11, the PR curve for the Tiny-Octave architecture is based on the accuracy of detecting fruit quality. Since the architecture uses fewer parameters than the Small-Mobilenet-v2 and Small-Mobilenet-v1 architectures, the PR curves are rough with a downward slope. As a result, the performance of the model is poor and the accuracy is reduced by 10% compared with the two architectures. Figure 12 shows the PR curve for the Tiny-SqueezeNet architecture. Regardless of the quality of the fruit, there is a gradual decrease in precision as the recall increases, and the area under the curve (AUC) is quite small, indicating the poor performance of the model. Figure 13 shows the PR curve for the proposed Tiny-YOLO architecture. The curve for the good fruit is relatively smooth, which means that the model performs satisfactorily, and the overall accuracy is relatively high compared with that of the Tiny-SqueezeNet architecture. When the recall increases to 90%, the precision is still around 60%.
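The IOU of Eq. (1) and the accumulated-TP/FP construction of the PR curve can be sketched as follows (a minimal Python illustration; the (x1, y1, x2, y2) box format and all names are our own choices):

```python
# Sketch of the two computations behind Figs. 6-13: the IOU of Eq. (1)
# and the precision-recall points built from accumulated TP/FP counts.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)  # overlap / union

def pr_points(detections, num_ground_truth):
    """detections: (confidence, is_true_positive) pairs. Returns the
    (precision, recall) points in descending order of confidence."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    acc_tp = acc_fp = 0
    points = []
    for _, is_tp in detections:
        if is_tp:
            acc_tp += 1
        else:
            acc_fp += 1
        precision = acc_tp / (acc_tp + acc_fp)  # Acc TP / (Acc TP + Acc FP)
        recall = acc_tp / num_ground_truth     # Acc TP / all ground truths
        points.append((precision, recall))
    return points
```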

Mean average precision (mAP)
The mAP is calculated by summing the average precision (AP) for each class and dividing by the number of classes. In this work, the AP values are obtained by taking the maximum precision corresponding to each recall value, similar to the AUC of each class's PR curve. The PR curve is plotted from the precision and recall values computed from the accumulated TP and FP detections. Table 3 shows the mAP for the various neural network architectures. An evaluation model was used with the same training dataset and test set to compare the mAP values of training and testing. The results of testing performed on the Jetson TX2 Developer Edition in terms of the number of parameters required for the different architectures and the detection speed are given in Table 4. Most of the models have close to 50 layers. However, the Small-Mobilenet-v2 architecture has as many as 70 layers since it uses the Add layer twice. Both the Small-Mobilenet-v1 and Small-Mobilenet-v2 architectures have a reduced number of parameters owing to the use of 1 × 1 filters. The Tiny-SqueezeNet architecture has 19.2 times fewer parameters than the Tiny-YOLO architecture, and its computation time is reduced by over 0.02 s. Lastly, the performance of the Tiny-Octave architecture is average: it has a moderate number of parameters and an average operating speed.
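The AP and mAP computations described above can be sketched as follows (a minimal illustration; all-point interpolation over the PR points, taking the maximum precision at each recall level, is assumed from the description):

```python
# Sketch of the mAP computation: for each class, AP is the area under the
# PR curve using the maximum precision at each recall level (all-point
# interpolation); mAP is the mean of the per-class APs.

def average_precision(points):
    """points: (precision, recall) pairs from the accumulated TP/FP
    counts, ordered by descending confidence (recall nondecreasing)."""
    # Walk backwards so each precision becomes the max over recall >= r.
    interp = []
    max_p = 0.0
    for p, r in reversed(points):
        max_p = max(max_p, p)
        interp.append((max_p, r))
    interp.reverse()
    # Sum precision times the recall increment (area under the curve).
    ap, prev_r = 0.0, 0.0
    for p, r in interp:
        ap += p * (r - prev_r)
        prev_r = r
    return ap

def mean_average_precision(per_class_points):
    """per_class_points: dict mapping class name to its PR points."""
    aps = [average_precision(pts) for pts in per_class_points.values()]
    return sum(aps) / len(aps)
```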

Model training
An NVIDIA GTX 1070 Ti is used to train the models of the compared architectures with the batch size set to 16. The number of epochs is set to 100, where the first 50 epochs use a fixed learning rate and the weights of the first 20 network layers are frozen. For the subsequent 50 epochs, all network layers are set to be trainable, and the learning rate is adjusted dynamically to gradually lower values. The initial learning rate is set to 10^-3, and training is stopped if the new loss value is no longer lower than the previous loss value. Figure 14 shows the changes in the learning rates of the models during training. The learning rates are adjusted in the subsequent 50 epochs according to changes in the value of the loss function. The benefit of adjusting the learning rate is that the search for better parameters can continue even when the model is trapped near a local optimum. The Small-Mobilenet-v1 architecture uses a lower learning rate at the beginning and extends the number of training epochs to 86. On the other hand, the Tiny-Octave architecture is ineffective in adjusting the learning rate to reduce the loss function value. Figure 15 shows the loss function graphs for the training set of each model. The graphs of all models slowly flatten out without significant fluctuations in the training loss. However, the loss function of the Tiny-SqueezeNet architecture is the highest and its average accuracy is relatively poor. Similarly, the Small-Mobilenet-v1 architecture also requires a long training time. On the other hand, both the Small-Mobilenet-v2 and Tiny-YOLO architectures have relatively low loss function values. Figure 16 shows the loss function graphs for the validation set of each model. The loss values of all models tend to converge. Nevertheless, fluctuations occur for the Tiny-Nobatch-Have-Leaky architecture.
This is because model training becomes unstable without the BN layer, and the loss function is relatively high. The loss functions of the Small-Mobilenet-v1, Small-
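The second-phase schedule described above (a fixed initial rate, then dynamic reduction when the loss stops improving) can be sketched in plain Python. The reduction factor (0.1) and patience (3 epochs) below are our assumptions; Keras's ReduceLROnPlateau and EarlyStopping callbacks provide this behavior in practice:

```python
# Illustrative reduce-on-plateau learning-rate schedule: the rate starts
# at 1e-3 and is multiplied by `factor` after `patience` epochs without
# improvement in the loss. Factor and patience values are assumptions.

def schedule(losses, lr=1e-3, factor=0.1, patience=3):
    """Return the learning rate in effect after each observed epoch loss."""
    best = float("inf")
    stale = 0
    rates = []
    for loss in losses:
        if loss < best:
            best = loss       # loss improved; reset the plateau counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # plateau reached; reduce the learning rate
                stale = 0
        rates.append(lr)
    return rates
```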

Conclusion
In this work, we proposed a novel embedded smart gateway framework that combines the YOLO-v3 algorithm and the Tiny-YOLO neural network model. We performed object detection and compared the proposed framework with several other models in terms of classification performance. Furthermore, we built a model architecture to develop the proposed system using TensorFlow and the Keras framework. We also acquired all images used in this work. The proposed Tiny-YOLO architecture achieves an mAP of 75% for object detection.
Overall, we implemented a vision-based application on an embedded system to realize an automatic screening system for fruit on a conveyor platform. In the future, the proposed model can be adapted for a field-programmable gate array board by loading only a single classification model for a single type of fruit.