Information
- Publication Type: Bachelor Thesis
- Workgroup(s)/Project(s):
- Date: June 2026
- Date (Start): December 2025
- Date (End): June 2026
- Matrikelnummer: 12225096
- First Supervisor: Johannes Eschner
Abstract
The majority of text-to-image generation systems rely on linear text prompts, which struggle to represent complex relationships. Building upon an existing prompting tree interface for image generation which allows the description of prompts using nodes and edges, this research investigates the enhancement of graph-based prompting through two contributions: automated edge-label generation and optimized graph visualization. The upgraded system lets users create graphs with an integrated Large Language Model (Qwen) or Masked Language Model (BERT) that automatically generates edge labels in the prompting tree. Furthermore, we introduce a secondary "overview layout" using the Dagre engine to address graph readability and label occlusion. The evaluation focuses on three aspects: the accuracy and efficiency of automated edge-label generation, its impact on image quality as measured by CLIP scores, and the effectiveness of the overview layout for graph visualization. Experiments demonstrate that BERT has faster inference times than Qwen and generates more reliable predictions. Although the impact of augmented prompts on final image quality varied, the proposed overview layout successfully optimized the distribution of graph elements and eliminated label occlusion.
Additional Files and Images
Weblinks
No further information available.
BibTeX
@bachelorsthesis{lehmann-2026-label,
title = "Automatic Edge Label Prediction for a Graph-Based Image
Generation Interface",
author = "Fabian Lehmann",
year = "2026",
abstract = "The majority of text-to-image generation systems rely on
linear text prompts, which struggle to represent complex
relationships. Building upon an existing prompting tree
interface for image generation which allows the description
of prompts using nodes and edges, this research investigates
the enhancement of graph-based prompting through two
contributions: automated edge-label generation and optimized
graph visualization. The upgraded system lets users create
graphs with an integrated Large Language Model (Qwen) or
Masked Language Model (BERT) that automatically generates
edge labels in the prompting tree.
Furthermore, we introduce a secondary ``overview layout''
using the Dagre engine to address graph readability and
label occlusion. The evaluation focuses on three aspects:
the accuracy and efficiency of automated edge-label
generation, its impact on image quality as measured by CLIP
scores, and the effectiveness of the overview layout for
graph visualization. Experiments demonstrate that BERT has
faster inference times than Qwen and generates more
reliable predictions. Although the impact of augmented
prompts on final image quality varied, the proposed overview
layout successfully optimized the distribution of graph
elements and eliminated label occlusion.",
month = jun,
address = "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
school = "Research Unit of Computer Graphics, Institute of Visual
Computing and Human-Centered Technology, Faculty of
Informatics, TU Wien",
URL = "https://www.cg.tuwien.ac.at/research/publications/2026/lehmann-2026-label/",
}