Welcome to the last part of our series of blog posts for Protostar Vision Box. Now, after everything we’ve shown you so far, it’s time to share some of our mistakes and what we learned.
At the beginning of the project, guided by the motto "less is more," we decided to solve the color detection problem with a very simple approach. Before the color detector could start working, some preparation was needed: we selected one image of a bottle for each cap color and, in that image, chose a pixel row from the part where the cap is located. From the selected row we calculated the average value of each of the three image channels (RGB) and stored that data together with the name of the cap color. Once we had repeated this procedure for all cap types, the detector was ready for cap color detection, or at least we thought so…

The plan was that when a new bottle arrived, we would again take a pixel row from the cap region of the image, calculate the average value of each of the three channels, and compare them with the previously stored values. Whichever stored value, i.e., color, was most similar to the new one would determine the cap color. We soon realized that this solution is not robust enough for industrial application: there is often liquid from the filling process on the cap, the lighting is not always the same, and therefore the average channel values (RGB) are not consistent. Additionally, there is an expiration date printed on the cap, which is not the same color as the cap. For all these reasons, we were forced to improve our color detection algorithm, using more images and a larger image area from which to calculate the average pixel values. You can find more about the final solution for color detection here.
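The original nearest-color idea can be sketched roughly as follows. This is a minimal illustration, not our production code: the reference values, object names, and row index are hypothetical.

```python
import numpy as np

# Hypothetical reference values: the average (R, G, B) of one cap pixel row,
# collected once per cap color during the preparation step.
REFERENCE_COLORS = {
    "red":   np.array([180.0, 40.0, 35.0]),
    "blue":  np.array([30.0, 60.0, 150.0]),
    "white": np.array([210.0, 205.0, 200.0]),
}

def detect_cap_color(image: np.ndarray, cap_row: int) -> str:
    """Classify the cap color of `image` (H x W x 3, RGB) by comparing the
    average channel values of one pixel row against the stored references."""
    row_mean = image[cap_row].mean(axis=0)  # average R, G, B over the row
    # Pick the stored color whose average is closest (Euclidean distance).
    return min(
        REFERENCE_COLORS,
        key=lambda name: np.linalg.norm(row_mean - REFERENCE_COLORS[name]),
    )
```

As the text above explains, a single averaged row like this is exactly what breaks down under liquid on the cap, changing lighting, and the printed expiration date.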
Synthetic images created with 3D modelling
Blender – a 3D modelling software – was used to create synthetic images of bottles. We created 3D models of a bottle and a cap that closely copy a real bottle, then edited the cap model to produce a couple of examples of defective caps. A Python script manipulated the model, changed the bottle's rotation, and rendered images at a resolution close to that of the real images.
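A script of that kind might look something like the sketch below, which runs inside Blender. The object name, resolution, and output path are assumptions for illustration; only `rotation_angles` is plain Python, while `render_views` needs Blender's `bpy` module.

```python
import math

def rotation_angles(n_views: int) -> list[float]:
    """Evenly spaced rotations (radians) around the vertical axis,
    one per image to render."""
    return [i * 2 * math.pi / n_views for i in range(n_views)]

def render_views(n_views: int, out_dir: str = "//renders/") -> None:
    """Rotate the bottle model and render one image per angle.
    Must be run inside Blender, where `bpy` is available."""
    import bpy

    bottle = bpy.data.objects["Bottle"]   # assumed object name in the scene
    scene = bpy.context.scene
    scene.render.resolution_x = 1280      # assumed resolution, chosen to be
    scene.render.resolution_y = 720       # close to the real camera's images
    for i, angle in enumerate(rotation_angles(n_views)):
        bottle.rotation_euler[2] = angle  # rotate around the Z (vertical) axis
        scene.render.filepath = f"{out_dir}bottle_{i:04d}.png"
        bpy.ops.render.render(write_still=True)
```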
Real bottle (left) and synthetic bottle (right).
Using synthetic datasets in machine learning can increase model accuracy, but using only synthetic data can result in low performance, which is exactly what happened when we trained a model on synthetic images of bottles alone. Combining existing real data with synthetic data often produces models that are even more precise than models trained on real data only. Another reason for this model's low performance was that the synthetic dataset was not diverse enough when it comes to images of defective caps. Defective caps in real life come in various states, and covering those states with 3D modelling was very difficult.
As with everything related to hardware, there are bound to be some hiccups along the way. Our first taste of what was to come happened on our first trip to the factory, where we were supposed to deploy our solution, and, as you might imagine, nothing worked as it should. First, we had a lot of problems with bottles moving vertically during production, which broke our then-genius idea of using template matching for bottle cap inspection. Second, we didn't have access to the motor encoders, so we had no information about the speed of the transport belt, which in turn caused a lot of problems with correctly timing the bottle ejector. Third, the client hadn't clearly communicated their desire to have adjustable liquid levels based on the drink rather than on the bottle. The icing on the cake was that we were relatively unfamiliar with working in a factory environment: it is loud, you stand for long hours, and since it's the food industry and everything has to be clean, there is a lot of hydroxide in the air, which can be unpleasant. All of that compounded into one messy trip where nothing worked and nobody was happy. But these things happen, and we must learn from them. Each of our next trips was better than the last in terms of comfort, progress on the deployment, and stability of our system. We learned everything about factory deployment the hard way, but our next factory deployments will definitely be better because of what we already endured and don't want to repeat. One thing we regularly joke about in the office, and which might help you in the future: "The more sensors you have, the easier it is to do something."
Template matching and anomaly detection
You know the old saying: "Less is more." We thought we did too! We tried combining simple methods to get a simple solution; what we ended up with was the exact opposite. First, we used template matching to identify the area of interest in the image: the part where the bottle cap tightens around the top of the bottle. The idea was to give the anomaly detector just the area of interest, and this part worked really well. What we didn't expect was how complicated it would be to detect anomalies with such a simple approach. We gathered a few images of good bottles and annotated parts such as the cap and the cap + ring; these two labels were the basis of our approach. We singled out the bad bottles, leaving only the good ones. Then we calculated the bounding boxes, the ratio between them, and the minimum and maximum position of the ring. From this information we derived the minimum and maximum allowed value for each parameter. Afterwards, we put the bad bottles back into the dataset and ran the anomaly detector with our tuned parameters. And it worked – for a specific bottle, and only if the bottle stayed at the same distance from the camera (which it didn't). This meant that not only did we have to go through the same tedious process for each and every bottle type, we also had to worry about the distance from the camera. On top of all that, we also introduced human bias: someone had to tune the threshold parameters manually, so what counted as acceptable varied. As you can see, our "simple" approach turned out to be quite complex, and for no good reason, which is why we turned to an actually simple approach of just using YOLO and labeling bottles as good or bad.
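The threshold check at the heart of that approach can be sketched like this. The parameter names, box format `(x, y, width, height)`, and limit values are hypothetical; the real limits came from the annotated good bottles.

```python
def box_ratios(cap: tuple, cap_ring: tuple) -> tuple[float, float]:
    """Derive the parameters we thresholded: the height ratio between the
    'cap' and 'cap + ring' boxes, and how far the ring extends below the cap.
    Boxes are (x, y, width, height) with y growing downwards."""
    _, cap_y, _, cap_h = cap
    _, ring_y, _, ring_h = cap_ring
    height_ratio = cap_h / ring_h
    ring_position = (ring_y + ring_h) - (cap_y + cap_h)
    return height_ratio, ring_position

def is_anomalous(cap: tuple, cap_ring: tuple, limits: dict) -> bool:
    """`limits` maps a parameter name to its (min_allowed, max_allowed),
    computed beforehand from the good-bottle images. A bottle is flagged
    as anomalous if any parameter falls outside its allowed range."""
    ratio, ring_pos = box_ratios(cap, cap_ring)
    values = {"height_ratio": ratio, "ring_position": ring_pos}
    return any(not (limits[k][0] <= v <= limits[k][1]) for k, v in values.items())
```

Note how every number here is tied to one bottle type at one camera distance, and the limits themselves are hand-tuned – exactly the fragility described above.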
Example from area of interest.
Example of ratios from a good bottle.
And that’s it! We hope you enjoyed #PVBFridays and gained an understanding of how our Vision Box works. See you in the next blog series! 😀