Information
- Publication Type: Bachelor Thesis
- Workgroup(s)/Project(s):
- Date: April 2018
- Date (Start): May 2017
- Date (End): April 2018
- Matriculation Number: 01426853
- First Supervisor: Manuela Waldner
Abstract
Having to read and understand many text documents and reports on a daily basis can be quite challenging. The intended audience for these reports has limited resources and wants to reduce the time spent reading them. A need therefore emerges for a tool that helps extract relevant information from reports and documents more quickly. These text documents are often unstructured and of varying length. They are written in English and are available from different sources (such as RSS feeds and text files). The aim of this project is to offer a tool that supports the process of analysing and understanding given texts. This is made possible by using natural language processing (NLP) and text visualization (TextVis). TextVis is already a well-known and frequently used solution. The project described here uses an NLP pipeline that serves as preprocessing for TextVis. To provide quick insight into the data, the user can choose topic extraction mechanisms such as Latent Dirichlet Allocation (LDA) or Non-negative Matrix Factorization (NMF) within this pipeline. A major challenge for TextVis is the configuration of the NLP pipeline, because it can be configured in many different ways and offers a wide range of parameters to choose from. To overcome this issue, this project provides a solution that enables users to easily configure and customize their own NLP pipeline. It is designed to encourage these users to experiment with different sequences of NLP operations and parameter configurations to find a solution that suits them best. To keep the software easy to use, it is implemented entirely with web technologies and is accessible in a common web browser. If selected, the resulting visualization emphasizes particular parts of the text based on a set of different factors. These factors can be topics, sentiments and part-of-speech-tagged words.
The focus of this work lies on a visual interface that enables and encourages users to adjust and optimize the underlying NLP pipeline (by selecting steps and setting parameters) and to compare the results. An evaluation based on user feedback showed that certain pipeline configurations work better for certain types of texts than others. Using the solution created within this work, users can adapt the tool to their needs and tweak it according to their requirements. There is, however, no universal configuration that works for all documents.
BibTeX
@bachelorsthesis{smiech-2018-tei,
  title =      "Configurable Text Exploration Interface with NLP for
               Decision Support",
  author =     "Martin Smiech",
  year =       "2018",
  abstract =   "Having to read and understand many text documents and
               reports on a daily basis can be quite challenging. The
               intended audience for these reports has limited resources
               and wants to reduce the time spent reading them. A need
               therefore emerges for a tool that helps extract relevant
               information from reports and documents more quickly. These
               text documents are often unstructured and of varying
               length. They are written in English and are available from
               different sources (such as RSS feeds and text files). The
               aim of this project is to offer a tool that supports the
               process of analysing and understanding given texts. This is
               made possible by using natural language processing (NLP)
               and text visualization (TextVis). TextVis is already a
               well-known and frequently used solution. The project
               described here uses an NLP pipeline that serves as
               preprocessing for TextVis. To provide quick insight into
               the data, the user can choose topic extraction mechanisms
               such as Latent Dirichlet Allocation (LDA) or Non-negative
               Matrix Factorization (NMF) within this pipeline. A major
               challenge for TextVis is the configuration of the NLP
               pipeline, because it can be configured in many different
               ways and offers a wide range of parameters to choose from.
               To overcome this issue, this project provides a solution
               that enables users to easily configure and customize their
               own NLP pipeline. It is designed to encourage these users
               to experiment with different sequences of NLP operations
               and parameter configurations to find a solution that suits
               them best. To keep the software easy to use, it is
               implemented entirely with web technologies and is
               accessible in a common web browser. If selected, the
               resulting visualization emphasizes particular parts of the
               text based on a set of different factors. These factors can
               be topics, sentiments and part-of-speech-tagged words. The
               focus of this work lies on a visual interface that enables
               and encourages users to adjust and optimize the underlying
               NLP pipeline (by selecting steps and setting parameters)
               and to compare the results. An evaluation based on user
               feedback showed that certain pipeline configurations work
               better for certain types of texts than others. Using the
               solution created within this work, users can adapt the tool
               to their needs and tweak it according to their
               requirements. There is, however, no universal configuration
               that works for all documents.",
  month =      apr,
  address =    "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school =     "Institute of Computer Graphics and Algorithms, Vienna
               University of Technology",
  URL =        "https://www.cg.tuwien.ac.at/research/publications/2018/smiech-2018-tei/",
}


