Quality inspection of electric motors right on the factory floor (using YOLO)!

Industrial quality inspection is necessary to ensure that potentially defective products are recognized early in the production process, which reduces costs and avoids product recalls.
Automated industrial quality inspection using machine vision reduces or removes the need for demanding and expensive manual inspection. Traditional machine vision methods are rule-based: an image of a product is processed with low-level image processing filters (edge and contour extraction, simple pattern detection), after which the extracted features are checked for correct size and correct relation to each other. Rule-based methods are very sensitive to all kinds of variation: lighting and shading, reflections, changes in position (translation, rotation etc.), noise, blurring etc., because they generally cannot extract higher-level semantic information from the image (which objects are present and where exactly they are). The resulting false negatives and false positives erode the benefit of automated inspection. It should be noted, however, that where a traditional rule-based approach works well, it should be preferred for its simplicity and explainability.
With the advances in machine learning and deep learning over recent decades, learning-based methods have found their way into actual industrial production. Deep learning, when implemented properly, is much less sensitive to such variations and can extract meaningful semantic information (object detection, object segmentation etc.).
Here we describe a deep learning-based solution that we developed for a client in the manufacturing industry and that is now used in production. The part that needed inspection is a stator[1], a common component of many machines. Certain parts of it must be in a precise position to avoid possibly serious damage further down the production line. See the images in Figures 1 and 2. This makes it a very good use case for an automated solution.

Figure 1. Image of a stator. Three pins need to be in precise positions, which are determined from a reference sample or a 3D model. Roughly, each pin top needs to be within a circle of radius 0.5 cm around its reference position.

Figure 2. Another example image of a stator. Wire can sometimes stick out and partially occlude the pins.


A traditional rule-based solution could be used here, but pin displacement, reflections and/or partial occlusion can cause lighting variations, which again leads to false positives and false negatives. We therefore opted for a more robust machine learning solution, framed as an object detection problem. Pins can then be detected not just from their bright top, but from local features of the entire pin. Since we are interested in the precise position of the top, the top is chosen as the center of the "object" bounding box. We also use separate detection classes for the three pin locations because of the specific viewing angle, which improves detection reliability. Importantly, in this case the relevant image regions could be cropped, which decreases processing time considerably and removes the need for a specialized accelerator (GPU), since larger input images generally increase processing time significantly.
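To make the geometry concrete, the logic above can be sketched in a few lines. This is an illustrative sketch, not the production code: the reference coordinates, pixel-per-centimeter scale and ROI size below are hypothetical placeholders (the real values come from a reference sample or 3D model and from camera calibration).

```python
import numpy as np

# Hypothetical values -- in practice these come from calibration and a
# reference sample or 3D model.
PX_PER_CM = 120.0
TOLERANCE_CM = 0.5
REFERENCE_TOPS = {"pin_left": (310, 142), "pin_mid": (628, 150), "pin_right": (944, 139)}

def crop_roi(image, center, size=256):
    """Crop a square region around a known pin location to speed up inference.
    (Assumes the ROI lies fully inside the image; edge handling omitted.)"""
    x, y = center
    half = size // 2
    return image[y - half:y + half, x - half:x + half]

def pin_top_from_box(box):
    """The pin top is annotated as the center of the (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def within_tolerance(detected_top, reference_top):
    """Check that the detected pin top lies within 0.5 cm of the reference."""
    dist_px = np.hypot(detected_top[0] - reference_top[0],
                       detected_top[1] - reference_top[1])
    return (dist_px / PX_PER_CM) <= TOLERANCE_CM
```

A sample passes inspection only if all three pins are detected and each detected top satisfies the tolerance check against its reference position.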
We used YOLO[2], a state-of-the-art family of real-time object detectors. Many improved YOLO variants have been developed over the years, with v6, v7 and v8 currently the most relevant in terms of accuracy and speed[3]. While YOLO is generally not the most accurate object detection model[4], it is accurate enough for an application like this, and its big advantage is speed, even though the real-time requirements in this project were not very strict.
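A YOLO model outputs many overlapping candidate boxes per object, so a standard post-processing step is confidence filtering followed by non-maximum suppression (NMS). Deployment frameworks provide this out of the box; the sketch below shows the idea in plain numpy, with threshold values that are common defaults rather than the project's actual settings.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop lower-scoring boxes
    that overlap it too much, and repeat. Returns kept indices."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= conf_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```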
No specialized hardware accelerator (GPU etc.) was available for this project, so the solution was developed to run on industrial computers with a CPU only. In such a setting, an inference runtime optimized for the target hardware should normally be used, with fast implementations of common operations in machine learning models (such as convolution), together with model compression and, if possible, quantization. Here, the OpenVINO[5] framework was used. The camera, lens and lighting were chosen based both on the available hardware setup and on the image requirements: the image must be of excellent quality and taken from a good position to guarantee that the algorithm can produce the correct output.
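Whatever runtime executes the model, the cropped camera image must first be brought into the layout the network expects. A typical YOLO-style preprocessing step is a "letterbox" resize to a fixed square input, followed by conversion to a normalized NCHW tensor. The sketch below illustrates this with plain numpy (using a nearest-neighbor resize to stay dependency-free; real pipelines use bilinear interpolation, e.g. via OpenCV); the 640-pixel input size and 114 padding value are common YOLO defaults, not necessarily this project's settings.

```python
import numpy as np

def letterbox(image, size=640, pad_value=114):
    """Resize an HxWx3 uint8 image to fit inside a size x size square while
    preserving aspect ratio, padding the rest with a constant gray value.
    Returns the padded image plus the scale and offset needed to map
    detections back to original image coordinates."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index lookup (illustration only).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (left, top)

def to_model_input(image):
    """HWC uint8 -> 1x3xHxW float32 in [0, 1], the usual YOLO input layout."""
    x = image.astype(np.float32) / 255.0
    return np.transpose(x, (2, 0, 1))[None, ...]
```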
After installing the hardware, images were collected under real conditions. Conveniently, projects like this do not need a large dataset: the conditions are controlled, so there is little variation, and only a few object types need to be detected. A training set on the order of 100 images (or several hundred) is therefore enough. Data augmentation is also a very important part of the development; it is used to synthetically add many common types of variation, most commonly rotation, translation, brightness changes and noise. Furthermore, it is enough to annotate only a relatively small number of images, as was done here: the model trained on them then accurately predicts labels (bounding boxes, in the case of detection) for the remaining images, after which all images are used to tune the final model. This greatly reduces manual annotation effort. The model was trained with transfer learning, starting from weights pretrained on a large dataset (COCO in this case). Because the dataset for transfer learning is very small, the backbone of the model can be kept frozen (unchanged), as we did here.
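Training frameworks usually provide these augmentations built in, but the idea is simple enough to sketch directly. The function below applies a random combination of the variations mentioned above (brightness change, translation, additive noise) to an image; the ranges are illustrative placeholders, and rotation is omitted to keep the sketch dependency-free. Note that when boxes are present, translations must be applied to the annotations as well.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image):
    """Randomly perturb an HxWx3 uint8 image for training-set augmentation."""
    out = image.astype(np.float32)
    # Random brightness change (+/- 20%).
    out *= rng.uniform(0.8, 1.2)
    # Random translation by up to 10 px. np.roll wraps around at the edges;
    # real pipelines pad or crop instead, and shift bounding boxes too.
    dx, dy = rng.integers(-10, 11, size=2)
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))
    # Additive Gaussian noise.
    out += rng.normal(0.0, 5.0, size=out.shape)
    return out.clip(0, 255).astype(np.uint8)
```

Applying such transforms with fresh random parameters at every epoch effectively multiplies a training set of a few hundred images many times over.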
Some general rules must be followed to make sure that the trained model generalizes well[6]. See Figures 3 and 4 for examples of detections.

Figure 3. Example of detection of objects of interest.

Figure 4. Example of detection of defective sample.


Generally, projects like this have many steps and components that all need to be done well, from hardware to data collection, software, and integration with other industrial equipment (PLCs etc.). This requires cooperation between all teams: hardware engineers, software engineers, machine learning / computer vision engineers, and industrial automation experts. Our part in this project was choosing the right hardware (camera, lens, lighting and their positioning), developing and deploying the vision solution, and integrating it with the PLC in cooperation with the client's automation expert. Another important note: deep learning projects generally need continuous monitoring and improvement, since no such solution can guarantee perfect precision and recall in all cases.


There are enormous possibilities for solutions like this in industrial automation. We also have other projects, either in development or in the Proof-of-Concept phase. Interested? Contact us at hello@protostar.ai!