Information

Abstract

The majority of text-to-image generation systems rely on linear text prompts, which struggle to represent complex relationships. Building upon an existing prompting tree interface for image generation which allows the description of prompts using nodes and edges, this research investigates the enhancement of graph-based prompting through two contributions: automated edge-label generation and optimized graph visualization. By providing upgrades to the system, users can create graphs with an integrated Large Language Model (Qwen) or Masked Language Model (BERT) to automatically generate edge labels in the prompting tree. Furthermore, we introduce a secondary "overview layout" using the Dagre engine to address graph readability and label occlusion. The evaluation focuses on three aspects: the accuracy and efficiency of automated edge-label generation, its impact on image quality as measured by CLIP scores, and the effectiveness of the overview layout for graph visualization. Experiments demonstrate that BERT has faster inference times compared to Qwen and generates more reliable predictions. Although the impact of augmented prompts on final image quality varied, the proposed overview layout successfully optimized the distribution of graph elements and eliminated label occlusion.

BibTeX

@bachelorsthesis{lehmann-2026-label,
  title =      "Automatic Edge Label Prediction for a Graph-Based Image
               Generation Interface",
  author =     "Fabian Lehmann",
  year =       "2026",
  abstract =   "The majority of text-to-image generation systems rely on
               linear text prompts, which struggle to represent complex
               relationships. Building upon an existing prompting tree
               interface for image generation which allows the description
               of prompts using nodes and edges, this research investigates
               the enhancement of graph-based prompting through two
               contributions: automated edge-label generation and optimized
               graph visualization. By providing upgrades to the system,
               users can create graphs with an integrated Large Language
               Model (Qwen) or Masked Language Model (BERT) to
               automatically generate edge labels in the prompting tree.
               Furthermore, we introduce a secondary ``overview layout''
               using the Dagre engine to address graph readability and
               label occlusion. The evaluation focuses on three aspects:
               the accuracy and efficiency of automated edge-label
               generation, its impact on image quality as measured by CLIP
               scores, and the effectiveness of the overview layout for
               graph visualization. Experiments demonstrate that BERT has
               faster inference times compared to Qwen and generates more
               reliable predictions. Although the impact of augmented
               prompts on final image quality varied, the proposed overview
               layout successfully optimized the distribution of graph
               elements and eliminated label occlusion.",
  month =      jun,
  address =    "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school =     "Research Unit of Computer Graphics, Institute of Visual
               Computing and Human-Centered Technology, Faculty of
               Informatics, TU Wien",
  URL =        "https://www.cg.tuwien.ac.at/research/publications/2026/lehmann-2026-label/",
}