Information
- Publication Type: Master Thesis
- Workgroup(s)/Project(s): not specified
- Date: 2026
- TU Wien Library: AC17909432
- Second Supervisor: Stefan Neumann
- Open Access: yes
- First Supervisor: Manuela Waldner
- Pages: 106
- Keywords: Sampling, Human-AI Collaboration, Visual Analysis, Unstructured Data
Abstract
Analyzing unstructured data such as images is a major challenge for exploratory data analysis because of their high dimensionality. The data must first be transformed into embeddings, a lower-dimensional representation in which the data are more closely grouped. Sampling is then essential to make these datasets understandable for humans through visualization. This work explores how interactive systems can quickly provide representative and interpretable samples, even from large and unbalanced image datasets. Standard methods such as random sampling reach their limits here and often fail to capture rare classes, which biases interpretation. Within a standardized interaction protocol, various data-driven strategies (e.g., farthest sampling and Dα sampling) and model-aware strategies (e.g., min-margin and disagreement) are compared with random sampling. The goal is to investigate which strategies offer the best balance between fast class discovery, high model accuracy, and low latency within a defined interaction budget. The results show that data-driven methods are strong in the early stages of the iterative process, as they explore the data space and discover new classes faster. In contrast, model-aware methods offer advantages in later stages, as they refine the decision boundaries and efficiently increase accuracy as labeled data become available. The advantage of targeted sampling over random sampling is particularly evident with unbalanced datasets. Furthermore, the work shows that GPU acceleration reduces latency in the iterative cycle, keeping each selection step below the critical threshold of one second and enabling smooth interaction.
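To make the two strategy families named in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes that "farthest sampling" refers to greedy farthest-first traversal over the embeddings and that "min-margin" selects the items with the smallest gap between the two highest predicted class probabilities. The function names `farthest_first` and `min_margin` are hypothetical, and the code runs on the CPU with NumPy rather than on a GPU.

```python
import numpy as np


def farthest_first(embeddings, k, start=0):
    """Data-driven sketch: greedy farthest-first traversal (assumed reading of 'farthest sampling').

    Each new pick is the point whose distance to its nearest
    already-selected point is largest, so picks spread over the space.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    k = min(k, len(embeddings))  # cannot select more points than exist
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(embeddings - embeddings[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


def min_margin(probs, k):
    """Model-aware sketch: pick the k items with the smallest top-two probability gap.

    `probs` is an (n_items, n_classes) array of predicted class
    probabilities with at least two classes.
    """
    probs = np.asarray(probs, dtype=float)
    top2 = np.sort(probs, axis=1)[:, -2:]  # two largest values per row, ascending
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:k].tolist()
```

Under these assumptions, `farthest_first` needs no labels and suits the early exploration stage the abstract describes, while `min_margin` requires a trained classifier's probabilities and therefore only becomes useful once some labeled data are available.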
BibTeX
@mastersthesis{fitz-2026-isc,
title = "Interactive Sampling for Class Discovery in Unstructured
Data",
author = "Lukas Fitz",
year = "2026",
abstract = "Analyzing unstructured data such as images is a major
challenge for exploratory data analysis because of their
high dimensionality. The data must first be transformed into
embeddings, a lower-dimensional representation in which the
data are more closely grouped. Sampling is then essential to
make these datasets understandable for humans through
visualization. This work explores how interactive systems
can quickly provide representative and interpretable
samples, even from large and unbalanced image datasets.
Standard methods such as random sampling reach their limits
here and often fail to capture rare classes, which biases
interpretation. Within a standardized interaction protocol,
various data-driven strategies (e.g., farthest sampling and
Dα sampling) and model-aware strategies (e.g., min-margin
and disagreement) are compared with random sampling. The
goal is to investigate which strategies offer the best
balance between fast class discovery, high model accuracy,
and low latency within a defined interaction budget. The
results show that data-driven methods are strong in the
early stages of the iterative process, as they explore the
data space and discover new classes faster. In contrast,
model-aware methods offer advantages in later stages, as
they refine the decision boundaries and efficiently increase
accuracy as labeled data become available. The advantage of
targeted sampling over random sampling is particularly
evident with unbalanced datasets. Furthermore, the work
shows that GPU acceleration reduces latency in the iterative
cycle, keeping each selection step below the critical
threshold of one second and enabling smooth interaction.",
pages = "106",
address = "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
school = "Research Unit of Computer Graphics, Institute of Visual
Computing and Human-Centered Technology, Faculty of
Informatics, TU Wien",
keywords = "Sampling, Human-AI Collaboration, Visual Analysis,
Unstructured Data",
URL = "https://www.cg.tuwien.ac.at/research/publications/2026/fitz-2026-isc/",
}