Protostar Vision Box – Part 3: Bottle cap defect detection

Welcome back 👋


This is the third part of our series of blog posts on the Protostar Vision Box. Last week, we showcased the sensor suite. Today, we are going to showcase the defect detection algorithm: how it was trained and how it was implemented.



With this post, we continue the series of blogs about the Protostar Vision Box. We are very proud of this product, and we are happy to share some of the specs and algorithms incorporated in it. For those who don’t know, in this series we cover one specific case where our Vision Box controls a bottling line. During the bottle filling process, it can happen that a cap is not closed properly, and such bottles must be removed from the conveyor belt. This article explains the development of the system that solves this problem. Here are some examples of bottles with improperly closed caps that need to be removed from production.



To detect improperly closed bottle caps, we use two cameras, one on each side of the conveyor belt. You can see this setup in the picture below.




You can find more details about the hardware side of this solution in our previous post. Here, we will concentrate on the software part.


When we first encountered this problem, we thought that detecting improperly closed bottle caps could be solved with the template matching method. We soon realized, however, that production conditions are not always suited to it: the bottles are not always at the same distance from the camera, and they are often covered with water and milk.
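For readers unfamiliar with template matching, the idea can be sketched in a few lines. In practice one would use OpenCV’s cv2.matchTemplate; the toy NumPy version below (all names and data are illustrative, not our production code) slides a reference patch over the frame and scores each position. It also hints at why the method broke down for us: the score only rewards near-pixel-exact matches, so scale changes and liquid on the bottle ruin it.

```python
import numpy as np

def match_template_ssd(image, template):
    """Slide the template over the image and return the (row, col) of the
    best match by sum of squared differences (lower is better)."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = float("inf"), (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            score = float(np.sum((patch - template) ** 2))
            if score < best:
                best, best_pos = score, (r, c)
    return best_pos

# Toy example: a bright 2x2 "cap" embedded at row 1, column 2 of a dark frame.
frame = np.zeros((5, 6))
frame[1:3, 2:4] = 1.0
cap = np.ones((2, 2))
print(match_template_ssd(frame, cap))  # (1, 2)
```

If the bottle moves closer to the camera, the cap no longer matches the fixed-size template, which is exactly the failure mode described above.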


Example of bottles at different distances from the camera


Bottles covered with milk and water


Therefore, we decided to train a YOLOv6 model to detect improperly closed caps, a more robust approach than template matching. We chose YOLOv6 over the other versions of the YOLO model because this version is designed for industrial applications.


Data acquisition and annotation

The data acquisition process is an indispensable part of developing such a solution. For data acquisition we used our Protostar Vision Box together with the Protostar Platform, which we developed earlier precisely for automatic dataset collection: it combines collecting images directly from production, automatically storing them in the cloud, and annotating the collected dataset. Since we were training a model to recognize improperly closed bottle caps, we had to find a sufficient number of images of such caps, which was a challenge because they are relatively rare. For this reason, we collected a small initial set of such caps and trained a temporary YOLOv6 model, which made it much easier to search for images of improperly closed caps in datasets of several tens of thousands of images.
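The mining step can be illustrated with a small sketch. Assuming the temporary model’s detections are stored per image, a filter like the one below (the label names, confidence threshold, and data layout are hypothetical, not our production pipeline) reduces tens of thousands of images to a short candidate list for annotators to review:

```python
def select_candidates(predictions, label="improperly_closed", min_conf=0.3):
    """Return the image ids where the temporary model saw the rare class,
    so annotators review a small candidate set instead of the whole dump."""
    return [img_id for img_id, dets in predictions.items()
            if any(d["label"] == label and d["conf"] >= min_conf for d in dets)]

# Toy detections from the temporary model (names are illustrative).
preds = {
    "img_001.jpg": [{"label": "cap_ok", "conf": 0.91}],
    "img_002.jpg": [{"label": "improperly_closed", "conf": 0.55}],
    "img_003.jpg": [{"label": "improperly_closed", "conf": 0.12}],
}
print(select_candidates(preds))  # ['img_002.jpg']
```

A deliberately low threshold keeps recall high: it is cheaper for an annotator to discard a false candidate than to miss a rare defect.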


When manually annotating the images, we followed several good practices to make the resulting dataset as high-quality as possible. We defined annotation rules that all annotators had to adhere to, we reviewed each other’s annotations, and each annotation was checked by several annotators as well as by our automated annotation-checking algorithm.
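As an illustration of what an automated annotation check can look like, here is a minimal sketch with made-up rules (the thresholds and box format are illustrative, not our actual checker). It flags boxes that fall outside the image or are implausibly small:

```python
def check_annotation(box, img_w, img_h):
    """Return a list of problems found in one bounding-box annotation.
    box = (x1, y1, x2, y2) in pixels; rules here are illustrative."""
    x1, y1, x2, y2 = box
    problems = []
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        problems.append("box outside image or zero-sized")
    elif (x2 - x1) * (y2 - y1) < 25:
        problems.append("box suspiciously small (< 25 px^2)")
    return problems

print(check_annotation((10, 10, 60, 60), 256, 256))   # [] -> annotation passes
print(check_annotation((-5, 10, 60, 300), 256, 256))  # flagged
```

Checks like these catch mechanical slips (off-by-one exports, swapped coordinates) so human reviewers can focus on semantic errors.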



While image collection was still ongoing, we started test-training the model with the images already available to us; because of the dynamics of production, we could not wait until the dataset contained all cap types. In addition, some cap types had almost no defects at all, so collecting their images took longer. We tried to solve that problem with a synthetic dataset, but that approach did not give good enough results (you can read more about this in the last post of this series). For the cap types that were collected more slowly, we captured images in our office, where we recreated the setup from the factory.


Although it would have been better to combine cap color recognition, which was also required during cap inspection, and improperly-closed-cap detection into a single model, the uneven collection times for the different cap types made this impossible, so we decided to keep the two solutions separate. More about identifying the color of the cap will follow in our next post.


The first few test YOLOv6 models were trained on only two or three cap types, yet they achieved good results even for caps whose images were not in the training set. The goal was to generalize the model so that it would ignore the color of the cap and simply detect whether it was improperly closed.


It took many training iterations to find the optimal training hyperparameters. This process was made easier by our training computers, which sometimes ran several trainings at once: one with NVIDIA RTX A5000 and RTX 5000 graphics cards and a Ryzen Threadripper PRO 3975WX processor, and the other with two NVIDIA TITAN RTX graphics cards and an AMD Ryzen Threadripper 3970X processor. Both computers run Ubuntu 20.04.5 LTS.

The final model used in production achieves a prediction time of less than 20 ms on an industrial computer without a GPU; inference runs on an Intel Core i7-7700 processor. The input to this model is a 3-channel image of 256×256 pixels.
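To make the input format concrete, here is a minimal preprocessing sketch that turns a camera frame into the (1, 3, 256, 256) tensor layout YOLO-family models expect. The nearest-neighbour resize only keeps the example dependency-free; this is not our production code, where a proper resize (e.g. cv2.resize) would be used.

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 256) -> np.ndarray:
    """Nearest-neighbour resize to size x size, then convert HWC uint8 to
    normalised float32 CHW with a leading batch dimension."""
    h, w, _ = frame.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = frame[rows][:, cols]       # (size, size, 3)
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0
    return chw[None]                     # (1, 3, size, size)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(preprocess(frame).shape)  # (1, 3, 256, 256)
```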


Integration with PLC

In the end, we have to use the results from our model to actuate the physical bottle ejector. This is done by sending the counter number of a defective bottle to a PLC. The Protostar Vision Box has multiple sensors along the path that a bottle takes, and each sensor has its own counter in the PLC program, which helps us track bottles as they go through the Vision Box; we explained this in the previous post (in case you missed it 👀). With that solution, at every moment we know how many bottles have passed and where each bottle is located.

With that in mind, let’s look at how the information is sent. After inference is finished, the result is assigned 1 or 0 depending on the outcome. We also keep track of the bottles that arrive for visual inspection, so in theory our bottle counters should match across the board. Using snap7, a multi-platform Ethernet communication suite for interfacing natively with Siemens S7 PLCs, we can write directly into the PLC’s data blocks. The process itself is pretty simple: we match the result to the current bottle counter and connect to the PLC data block. Then we read the bytes containing the bad-bottle flag and edit them to mark the specific bottle as defective. Afterwards we write the data back and let the PLC program do the rest.
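The write-back step can be sketched roughly as follows, assuming python-snap7’s Client with its db_read/db_write calls; the data-block number, byte offset, and bit index here are hypothetical and depend on the actual PLC data-block layout.

```python
def set_defect_bit(buffer: bytearray, byte_index: int, bit_index: int) -> bytearray:
    """Set one bit in a status byte read from the PLC data block."""
    buffer[byte_index] |= 1 << bit_index
    return buffer

def flag_defective(client, db_number: int, byte_offset: int, bit_index: int) -> None:
    """Read one byte from the PLC data block, set the defect bit for the
    current bottle counter, and write it back.

    `client` is assumed to be a connected snap7.client.Client; all
    addressing parameters are placeholders for the real DB layout.
    """
    data = bytearray(client.db_read(db_number, byte_offset, 1))
    set_defect_bit(data, 0, bit_index)
    client.db_write(db_number, byte_offset, data)

# The pure bit manipulation can be exercised without a PLC:
buf = bytearray(b"\x00")
print(set_defect_bit(buf, 0, 3))  # bytearray(b'\x08')
```

Keeping the bit manipulation separate from the I/O makes the flagging logic testable without a live PLC on the bench.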


PLC Program

So how do we actually eject bottles? It all boils down to a simple FIFO queue: the first bottle that enters the box is also the first one that can potentially be ejected. As a bottle moves along the production line, we constantly keep track of it, first checking the liquid level; during that process we can also receive a signal from the PC that a particular bottle does not meet the production standards. Before the bottle arrives at the ejector, a final check is done to see whether that particular bottle counter has been flagged as defective at any of the previous check stations. If the bottle number flagged as defective matches the counter on the last sensor, the ejector is activated and the bottle is gently pushed off the line without interrupting the flow of the bottles behind it.
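The tracking logic can be sketched as a FIFO queue. This is a simplified Python illustration of the idea, not the actual PLC program:

```python
from collections import deque

class EjectorQueue:
    """FIFO of bottles between the entry sensor and the ejector.
    Bottles enter when the first sensor fires and leave at the ejector
    sensor; a defect flag received in flight marks a specific bottle."""

    def __init__(self):
        self.queue = deque()

    def bottle_entered(self, counter: int) -> None:
        self.queue.append({"counter": counter, "defective": False})

    def flag_defective(self, counter: int) -> None:
        """Called when the PC (or an earlier check station) reports a bad bottle."""
        for bottle in self.queue:
            if bottle["counter"] == counter:
                bottle["defective"] = True

    def bottle_at_ejector(self) -> bool:
        """Called when the last sensor fires; True means activate the ejector."""
        return self.queue.popleft()["defective"]

q = EjectorQueue()
for n in (1, 2, 3):
    q.bottle_entered(n)
q.flag_defective(2)
print([q.bottle_at_ejector() for _ in range(3)])  # [False, True, False]
```

Because bottles cannot overtake each other on the belt, the FIFO ordering alone is enough to match defect flags to physical bottles at the ejector.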

A simple block diagram showcasing a part of the PLC program.


That’s it for this week. Next week we will talk about cap color detection and how it was integrated with bad-cap detection. Stay tuned! 😄