Overview
For successful training of a machine learning model, a dataset of sufficient quality and size is necessary. Annotating a large dataset is a repetitive and tedious task that is prone to human error. Furthermore, the data collection process itself is sometimes complicated or even impossible; for example, a dataset containing images of people raises privacy issues. Generating and using synthetic data overcomes both the collection and the annotation obstacles.
Goals
Create systems for synthetic data generation that fit a specific use case. Use the generated large-scale data to train a machine learning model, then fine-tune the model on a small dataset of real data.
Solution
Use the 3D modeling software Blender and its embedded Python interpreter to write a script that automatically generates or modifies 3D models of the targeted objects or scenes. Because the generated 3D scene contains the exact size and location of every point in 3D space, precise annotations can be produced alongside the rendered images. Since the process is fully automated, datasets created this way are practically unlimited in size and are used to train a deep learning model. The model is then fine-tuned on a small set of real data or used for transfer learning. A minimal sketch of such a generation script is shown below.
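The sketch below illustrates the general idea of the generation loop inside Blender's embedded Python interpreter: randomize an object's pose, render an image, and project the object's 3D bounding box into camera space to get a 2D annotation. The object (a stand-in cube), paths, and dataset size are illustrative assumptions, not the project's actual script.

```python
# Run inside Blender's embedded Python interpreter.
import random
import bpy
from bpy_extras.object_utils import world_to_camera_view
from mathutils import Vector

OUTPUT_DIR = "/tmp/synthetic"   # assumed output location
NUM_IMAGES = 10                 # assumed dataset size for the sketch

scene = bpy.context.scene
camera = scene.camera           # assumes the scene already has a camera

for i in range(NUM_IMAGES):
    # Add a target object at a random location; here a default cube
    # stands in for the real target mesh.
    bpy.ops.mesh.primitive_cube_add(
        location=(random.uniform(-2, 2), random.uniform(-2, 2), 0)
    )
    obj = bpy.context.active_object

    # Render the image to disk.
    scene.render.filepath = f"{OUTPUT_DIR}/img_{i:04d}.png"
    bpy.ops.render.render(write_still=True)

    # Derive a 2D bounding-box annotation by projecting the object's
    # 3D bounding-box corners into normalized camera view coordinates.
    corners = [obj.matrix_world @ Vector(c) for c in obj.bound_box]
    projected = [world_to_camera_view(scene, camera, c) for c in corners]
    xs = [p.x for p in projected]
    ys = [p.y for p in projected]
    bbox = (min(xs), min(ys), max(xs), max(ys))  # normalized [0, 1] range

    with open(f"{OUTPUT_DIR}/img_{i:04d}.txt", "w") as f:
        f.write(",".join(f"{v:.6f}" for v in bbox))

    # Remove the object before the next iteration.
    bpy.data.objects.remove(obj, do_unlink=True)
```

Because every object pose is known exactly at render time, the annotations come for free and never suffer from human labeling error.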
Results
- Established a system for generating annotated data using 3D modeling and Python. Used the generated synthetic data to train a deep learning model, which was then fine-tuned on a small set of real data; a sketch of this fine-tuning step follows.
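The following is a hedged sketch of the fine-tuning step, assuming a PyTorch image classifier and an ImageFolder layout for the small real dataset; the project's actual model, task, and hyperparameters may differ.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Model pretrained on the large synthetic dataset (path and class count
# are assumptions for this sketch).
model = models.resnet18(num_classes=5)
model.load_state_dict(torch.load("synthetic_pretrained.pth"))

# Small real-image dataset in an ImageFolder layout (assumed path).
real_data = datasets.ImageFolder(
    "data/real",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
loader = DataLoader(real_data, batch_size=16, shuffle=True)

# Fine-tune with a small learning rate so features learned on synthetic
# data are adapted to real images rather than overwritten.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```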