Generating visual deep learning training data based on 3D scans (Paid)

Type: BA/PR
Persons: 1-2
Workgroup:
Description

Deep learning is a powerful tool for building visual classifiers. Many deep learning frameworks, such as Google's TensorFlow [1,2], dlib [3,4], or Torch/PyTorch [5], are open source and freely available. However, to work properly, these tools either require previously trained knowledge in the form of models, or must be trained with labeled data, which is often difficult to obtain. Models for common use cases, e.g., classifying persons or cars in images, and the training data on which these models are based are freely available for most of these tools [6,7]. However, no models are available for classifying more specific objects such as industrial robots, and creating them is expensive due to the huge amount of labeled training data required. In addition to using real-world training data, the synthetic generation of training data has been investigated [8]. Even though models trained on synthetic data can be less accurate than models trained on real-world data [9], they are very promising in terms of efficient, automatic training data generation. Using 3D scanning technologies, objects can be digitized and used to generate labeled training data. Additionally, synthetically generated training data raises far fewer privacy issues. In combination with the available training data for common use cases, such synthetic data can be used to train models that classify common as well as specialized objects.

This project aims to investigate and evaluate methods for:

  • easy and convenient 3D data acquisition of real-world objects.
  • generating renderings of objects in synthetic environments (e.g., Unity3D, Unreal Engine, Blender); see the rendering sketch after this list.
  • applying post-processing filters to simulate different environmental settings.
  • using synthetic and semi-synthetic training data for visual classification.
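
As a starting point for the rendering step, the following is a minimal sketch of how labeled views of a scanned object could be generated with Blender's Python API (bpy). All paths, the class name, and the number of views are placeholder assumptions, and the exact OBJ import operator differs between Blender versions; Unity3D or Unreal Engine could be scripted in a similar way.

    # Minimal Blender (bpy) sketch: render a scanned object from random viewpoints
    # and write one image plus a class label per view. Paths, the class name, and
    # the view count are placeholders; the OBJ import operator shown here is the
    # one used by Blender 2.8x-3.x and may differ in other versions.
    import bpy
    import csv
    import math
    import random

    MODEL_PATH = "/data/scans/robot.obj"   # cleaned 3D scan of the target object (assumption)
    OUTPUT_DIR = "/data/renderings"        # images and label file go here (assumption)
    CLASS_NAME = "industrial_robot"
    NUM_VIEWS = 200

    bpy.ops.import_scene.obj(filepath=MODEL_PATH)

    scene = bpy.context.scene
    scene.render.resolution_x = 640
    scene.render.resolution_y = 480
    camera = scene.camera

    rows = []
    for i in range(NUM_VIEWS):
        # Place the camera at a random position on a sphere around the object,
        # which is assumed to sit at the origin.
        radius = random.uniform(2.0, 5.0)
        theta = random.uniform(0.0, 2.0 * math.pi)
        phi = random.uniform(0.2, 0.5 * math.pi)
        camera.location = (radius * math.cos(theta) * math.sin(phi),
                           radius * math.sin(theta) * math.sin(phi),
                           radius * math.cos(phi))
        # Aim the camera at the origin.
        camera.rotation_euler = (-camera.location).to_track_quat('-Z', 'Y').to_euler()

        scene.render.filepath = f"{OUTPUT_DIR}/view_{i:04d}.png"
        bpy.ops.render.render(write_still=True)
        rows.append((f"view_{i:04d}.png", CLASS_NAME))

    # Simple label file that the training step can pick up later.
    with open(f"{OUTPUT_DIR}/labels.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

Randomizing lighting, materials, and background environments in the same loop would be the natural next step toward more varied training images.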

This project is supervised in cooperation with PKE Electronics. Students will receive 500 € for their work (provided the result works and is finished within 6 months).

Tasks

The result of this project should be a prototypical pipeline for semi-automatic training data generation. The goal of this pipeline is to minimize the amount of required user interaction while simultaneously maximizing the classification accuracy.

This pipeline should include the following items:

  • 3D geometry data acquisition and optimization of a target real-world object class using mobile 3D scanning technologies (mobile phones, mobile ToF sensors, mobile stereo vision sensors, ...)
  • Labeled image training data generation of the 3D model in synthetic environments using rendering engines like Unity3D or Unreal Engine
  • (Optional) Applying post-processing filters to generate different environmental settings (e.g., different seasons or weather) [10]
  • Organizing the resulting data for use in the training modules of Google TensorFlow, dlib, and Torch/PyTorch
  • Interfacing at least one deep learning framework to train a classification model (see the training sketch after this list)
  • Evaluating the classification results
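
To make the last three items more concrete, the following is a minimal sketch of how the renderings could be organized, used for training, and evaluated with PyTorch/torchvision. It assumes the images have been sorted into an ImageFolder-style layout (data/train/<class>/... and data/val/<class>/...) and fine-tunes an ImageNet-pretrained ResNet-18; the paths, the color-jitter stand-in for appearance variation, and the hyperparameters are assumptions, not part of the project specification. TensorFlow or dlib would be interfaced analogously.

    # Minimal PyTorch/torchvision sketch: train and evaluate a classifier on the
    # generated renderings. Assumes an ImageFolder layout (data/train/<class>/*.png
    # and data/val/<class>/*.png); paths and hyperparameters are placeholders.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    # Color jitter roughly stands in for the optional appearance variation
    # (different lighting/weather); a CycleGAN-style translation [10] could
    # replace it later.
    train_tf = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
        transforms.ToTensor(),
    ])
    val_tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

    train_set = datasets.ImageFolder("data/train", transform=train_tf)
    val_set = datasets.ImageFolder("data/val", transform=val_tf)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=32)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Start from an ImageNet-pretrained backbone and replace the final layer
    # with one output per object class in the synthetic dataset.
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(10):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Evaluation: plain top-1 accuracy on the held-out renderings.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                predictions = model(images).argmax(dim=1)
                correct += (predictions == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch}: val accuracy {correct / total:.3f}")

For the evaluation item, accuracy on held-out synthetic views is only a first check; the more interesting measurement is how well the trained model transfers to real photographs of the target objects.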

The focus of this work is on the acquisition and optimization of 3D data as well as the generation of renderings for training. The deep learning part is done in cooperation with PKE Electronics.

Requirements

  • Knowledge of rendering engines such as Unity3D or Unreal Engine
  • Knowledge of image processing
  • (Optional) Knowledge of deep learning

Environment

The focus of this project is on creating and evaluating a pipeline prototype rather than on implementing algorithms from scratch.

References

[1] https://www.tensorflow.org/
[2] M. Abadi et al., "TensorFlow: A system for large-scale machine learning", in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265-283.
[3] http://dlib.net/
[4] Davis E. King, "Dlib-ml: A machine learning toolkit", Journal of Machine Learning Research, vol. 10, pp. 1755-1758, 2009.
[5] https://github.com/pytorch/pytorch
[6] http://image-net.org/index
[7] https://github.com/tensorflow/models
[8] A. Rozantsev, "Vision-based detection of aircrafts and UAVs", p. 116, 2017.
[9] S. Hinterstoisser et al., "On Pre-Trained Image Features and Synthetic Images for Deep Learning", arXiv:1710.10710.
[10] https://github.com/junyanz/CycleGAN

Contact

For more information please contact Florian Rudolf or Michael Wimmer (wimmer@cg.tuwien.ac.at).