Generating 3D data for training neural networks for surveillance applications

Johannes Eschner

Abstract

As the demand for ever more capable computer vision systems has increased in recent years, there is a growing need for labeled ground-truth data for such systems. These ground-truth datasets are used for the training and evaluation of computer vision algorithms and are usually created by manually annotating images or image sequences with semantic labels. Synthetic video generation provides an alternative approach to the problem of generating labels. Here, the label data and the image sequences can be created simultaneously by utilizing a 3D render engine. Many of the existing frameworks for generating such synthetic datasets focus on the context of autonomous driving, where vast amounts of labeled input data are needed. This thesis presents an implementation of a synthetic data generation framework for evaluating tracking algorithms in the context of video surveillance. The framework uses a commercially available game engine as a renderer to generate synthetic video clips depicting different scenarios that can occur in a video surveillance setting. These scenarios include a multitude of interactions between different characters in a reconstructed environment. A collection of such synthetic clips is then compared to real videos by using both as input for two different tracking algorithms. While producing synthetic ground-truth data in real time using a game engine is less labor-intensive than manual annotation, the results of the evaluation show that both tracking algorithms perform better on real data. This suggests that the synthetic data produced by the framework is of limited suitability for evaluating tracking algorithms.

BibTeX

@bachelorsthesis{ESCHNER-2019-GDT,
  title =      "Generating 3D data for training neural networks for
               surveillance applications",
  author =     "Johannes Eschner",
  year =       "2019",
  abstract =   "As the demand for ever-more capable computer vision systems
               has been increasing in recent years, there is a growing need
               for labeled ground-truth data for such systems. These
               ground-truth datasets are used for the training and
               evaluation of computer vision algorithms and are usually
               created by manually annotating images or image sequences
               with semantic labels. Synthetic video generation provides an
               alternative approach to the problem of generating labels.
               Here, the label data and the image sequences can be created
               simultaneously by utilizing a 3D render engine. Many of the
               existing frameworks for generating such synthetic datasets
               focus on the context of autonomous driving, where vast amounts
               of labeled input data are needed. In this thesis an
               implementation of a synthetic data generation framework for
               evaluating tracking algorithms in the context of video
               surveillance is presented. This framework uses a
               commercially available game engine as a renderer to generate
               synthetic video clips that depict different scenarios that
               can occur in a video surveillance setting. These scenarios
               include a multitude of interactions of different characters
               in a reconstructed environment. A collection of such
               synthetic clips is then compared to real videos by using it
               as an input for two different tracking algorithms. While
               producing synthetic ground-truth data in real time using a
               game engine is less work intensive than manual annotation,
               the results of the evaluation show that both tracking
               algorithms perform better on real data. This suggests that
               the synthetic data coming from the framework is limited in
               its suitability for evaluating tracking algorithms.",
  month =      oct,
  address =    "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school =     "Research Unit of Computer Graphics, Institute of Visual
               Computing and Human-Centered Technology, Faculty of
               Informatics, TU Wien",
  keywords =   "neural networks",
  URL =        "https://www.cg.tuwien.ac.at/research/publications/2019/ESCHNER-2019-GDT/",
}