Visual Comparison of Natural Language Processing Pipelines | TU Wien

Information

Publication Type: Bachelor Thesis
Workgroup(s)/Project(s):
- Visual Information Foraging on the Desktop
Date: January 2019
Date (Start): April 2018
Date (End): January 2019
Matrikelnummer: 1425731
First Supervisor: Manuela Waldner

Abstract

Natural Language Processing comprises a variety of operations that can be applied on raw text to extract features. The sequence of operations is called NLP pipeline. However, the sequence and parameters of these individual operations differ between applications. In each step of the ongoing sequence, a single process is performed with a specialized task. Such a task can be the determination of the end of sentences or the removal of so-called stop words. There is no best-practice which combination is most effective and accurate to determine the descriptive features (key words) of a single document. The goal of this bachelor thesis is to compute the features of different variations of NLP pipelines and visualize them as basic word clouds. It is also important to know how the resulting word cloud of each pipeline is affected by varying the order of certain steps, adding steps or removing steps. The presented interface gives an overview of performance and similarity values of each computed pipeline.

Additional Files and Images

Additional images and videos

screenshot

Additional files

thesis

Weblinks

No further information available.

BibTeX

@bachelorsthesis{hromniak-2019-vcn,
  title =      "Visual Comparison of Natural Language Processing Pipelines",
  author =     "Patrick Hromniak",
  year =       "2019",
  abstract =   "Natural Language Processing comprises a variety of
               operations that can be applied on raw text to extract
               features. The sequence of operations is called NLP pipeline.
               However, the sequence and parameters of these individual
               operations differ between applications. In each step of the
               ongoing sequence, a single process is performed with a
               specialized task. Such a task can be the determination of
               the end of sentences or the removal of so-called stop words.
               There is no best-practice which combination is most
               effective and accurate to determine the descriptive features
               (key words) of a single document. The goal of this bachelor
               thesis is to compute the features of different variations of
               NLP pipelines and visualize them as basic word clouds. It is
               also important to know how the resulting word cloud of each
               pipeline is affected by varying the order of certain steps,
               adding steps or removing steps. The presented interface
               gives an overview of performance and similarity values of
               each computed pipeline.",
  month =      jan,
  address =    "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school =     "Research Unit of Computer Graphics, Institute of Visual
               Computing and Human-Centered Technology, Faculty of
               Informatics, TU Wien ",
  URL =        "https://www.cg.tuwien.ac.at/research/publications/2019/hromniak-2019-vcn/",
}