Protostar Vision Box – Part 4: Roses are red, bottle caps are blue…

We continue our series of posts about Protostar Vision Box. This article explains our approach to bottle cap color recognition in a real industrial application. The problem matters because stray caps of the wrong color can appear during production, and bottles carrying them need to be separated from bottles with the correct cap color. The solution must be robust and precise as well as light and fast, since cap color recognition is just one of many inspections that need to be performed on every bottle that arrives on the conveyor belt.



We collected images of bottles directly at the production site with the Protostar Vision Box that we developed earlier. Collection continued until we had a sufficient number of images of each type of bottle cap produced in the factory, approximately a thousand images per cap color. We then annotated the caps in the images from the dataset. At first we annotated the images manually, but later a pre-trained model helped us with the annotation. Images containing improperly closed caps were discarded during annotation. Below, you can see one example of each cap color.




Our approach was to calculate the mean pixel value in each of the three channels (RGB) of the image. To make sure the calculation uses only pixels that belong to the bottle cap, we used the coordinates obtained by annotating the images. This region of interest is then reduced further, which lowers the chance of background pixels appearing in it. The image below shows the region on which the calculation is performed.




The following function calculates the mean pixel value for each channel of an image.
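As a minimal sketch of this step, assuming NumPy and an image stored as an H x W x 3 array: the function name `mean_channel_values`, the `(x_min, y_min, x_max, y_max)` box format, and the `shrink` factor of 0.5 are illustrative assumptions, not the values used in production.

```python
import numpy as np


def mean_channel_values(image, box, shrink=0.5):
    """Return the mean value of each channel inside a shrunken bounding box.

    image  -- H x W x 3 array (channel order is whatever the loader produced)
    box    -- (x_min, y_min, x_max, y_max) in pixels, e.g. from the annotation
    shrink -- fraction of the box width and height to keep (assumed value)
    """
    x_min, y_min, x_max, y_max = box
    # Trim an equal margin from every side so that background pixels near
    # the edges of the box are less likely to enter the calculation.
    margin_x = (x_max - x_min) * (1 - shrink) / 2
    margin_y = (y_max - y_min) * (1 - shrink) / 2
    x0, x1 = int(round(x_min + margin_x)), int(round(x_max - margin_x))
    y0, y1 = int(round(y_min + margin_y)), int(round(y_max - margin_y))

    roi = image[y0:y1, x0:x1]
    if roi.size == 0:
        raise ValueError("region of interest is empty")
    # Average over rows and columns, leaving one value per channel.
    return tuple(float(v) for v in roi.mean(axis=(0, 1)))
```

Shrinking around the box center keeps the sketch simple; any reduction that stays well inside the cap would serve the same purpose.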





Once the mean pixel values per channel have been calculated for every image in a set of images with the same bottle cap color, the algorithm calculates the minimum, mean and maximum RGB triplet values for that set. For example, suppose we have a green cap color set with 1000 images. We calculate the mean pixel value of the region of interest for every channel, and the result for one image is the triplet (234, 56, 142). This process is repeated for each image in the set, and we end up with a sequence of triplets: {(234, 56, 142), (230, 53, 152), (232, 53, 140), …, (232, 56, 141)}. From this sequence we calculate the minimum, mean and maximum values, which are represented by three triplets. We store those values and bind them to the color of the processed bottle cap. Below you can see what the result of this process looks like.
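The per-set statistics can be sketched in a few lines of plain Python. This sketch assumes the minimum, mean and maximum are taken per channel, and it uses only the four triplets spelled out above, so the numbers are illustrative; the function name `channel_stats` and the dictionary layout are hypothetical.

```python
def channel_stats(triplets):
    """Return per-channel minimum, mean and maximum triplets for one cap color."""
    if not triplets:
        raise ValueError("at least one triplet is required")
    channels = list(zip(*triplets))  # regroup the values by channel
    return {
        "min": tuple(min(c) for c in channels),
        "mean": tuple(sum(c) / len(c) for c in channels),
        "max": tuple(max(c) for c in channels),
    }


# Only the four triplets quoted in the text, not the full set of 1000.
green = [(234, 56, 142), (230, 53, 152), (232, 53, 140), (232, 56, 141)]
reference = {"green": channel_stats(green)}
```

Keying the stored statistics by color name makes the later lookup a simple iteration over the dictionary.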






Now we are ready to recognize the color of the cap on a new set of images. For each new image, we first detect the position of the cap using the pre-trained YOLO model, reduce that region as described in the previous section, and calculate the mean pixel value for each image channel. The color whose stored mean pixel values have the smallest Euclidean distance from the values of the current image is taken as the color of the cap. The following code snippet shows the functions responsible for finding the nearest color.
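A minimal sketch of the nearest-color lookup, using `math.dist` from the Python standard library for the Euclidean distance. The function name `nearest_color` and the reference values are hypothetical placeholders, not measurements from the factory.

```python
import math


def nearest_color(triplet, reference_means):
    """Return the color name whose stored mean triplet is closest to `triplet`."""
    if not reference_means:
        raise ValueError("no reference colors available")
    return min(
        reference_means,
        key=lambda name: math.dist(triplet, reference_means[name]),
    )


# Illustrative reference values only.
reference_means = {
    "red": (200.0, 40.0, 45.0),
    "green": (60.0, 170.0, 80.0),
    "blue": (40.0, 70.0, 190.0),
}

color = nearest_color((50, 80, 180), reference_means)
```

With only a handful of cap colors, a linear scan over the stored means is cheap, which fits the requirement that the check stay light and fast.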






Our cap color recognition solution has proven successful in this industrial application and fast enough to check every cap, which is challenging because the conveyor belt moves at high speed. The average processing time for one image with this algorithm in production is about 2 ms.


You may have wondered why we didn’t combine cap inspection and cap color recognition into one model. The reason is that, during development, we didn’t have a balanced dataset with which we could train such a model. A more detailed answer can be found in the previous post in this series.


Next up, we’ll go over things that didn’t go quite so smoothly 😅. Stay tuned for the next #PVBFridays!