Visual Exploration of Biases in Word Embeddings

Master Thesis
Description

Word embeddings are learned feature representations of words in which semantically similar words have similar feature vectors. They are used to facilitate text and document querying, retrieval, and classification.
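
As an illustration of this similarity structure, the following minimal sketch queries a pretrained embedding. The use of gensim and the GloVe model are assumptions for the example only; the thesis is not tied to a particular embedding model or library.

    # Minimal sketch (illustration only, not part of the thesis setup).
    # Assumption: gensim is installed and the "glove-wiki-gigaword-50" model
    # can be downloaded via gensim-data.
    import gensim.downloader as api

    model = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe vectors

    # Semantically similar words have nearby feature vectors ...
    print(model.most_similar("doctor", topn=5))

    # ... which can be quantified, e.g., by cosine similarity.
    print(model.similarity("doctor", "nurse"))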

As word embeddings are learned from large text corpora, they also pick up biases inherent in the data, such as gender and racial biases. For example, the figure on the left shows a custom projection of selected words' feature vectors onto two manually defined axes (liberal to conservative and poor to rich), following Kozlowski et al., American Sociological Review 2019. Identifying and analyzing these biases is an important task in the social sciences.
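
Kozlowski et al. construct such axes from vector differences between word pairs that mark the two poles, and then project words onto them via cosine similarity. The sketch below follows that general idea; the GloVe model, the pole word pairs, and the example words are illustrative assumptions, not a prescribed setup.

    # Minimal sketch of an axis-based bias projection in the spirit of
    # Kozlowski et al. (2019). Assumptions: gensim with GloVe vectors; the
    # pole pairs and example words are illustrative choices only.
    import numpy as np
    import gensim.downloader as api

    model = api.load("glove-wiki-gigaword-50")

    def build_axis(pairs):
        # An axis is the averaged difference vector between its pole words.
        diffs = [model[a] - model[b] for a, b in pairs]
        axis = np.mean(diffs, axis=0)
        return axis / np.linalg.norm(axis)

    rich_poor = build_axis([("rich", "poor"),
                            ("wealthy", "impoverished"),
                            ("affluent", "destitute")])
    lib_cons = build_axis([("liberal", "conservative"),
                           ("progressive", "traditional")])

    def project(word, axis):
        # Cosine projection of a word vector onto the axis.
        v = model[word]
        return float(np.dot(v / np.linalg.norm(v), axis))

    for w in ["opera", "golf", "soccer", "ballet"]:
        print(w, round(project(w, rich_poor), 3), round(project(w, lib_cons), 3))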

Tasks

Existing solutions visualize biases of pre-selected words along pre-selected axes. In other words, they are used to visually confirm biases that researchers already expect to find in a text corpus. In this work, the student shall create a visual exploration interface that allows users to interactively explore potential biases learned by the word embedding.

Requirements

  • Strong interest in visualization, machine learning, and human-computer interaction
  • Solid programming skills, especially using Python and JavaScript
  • Prior experience with natural language processing and/or d3 is a plus
  • Openness to external collaboration

Environment

The project can build upon an existing solution for visualizing large-scale word embeddings, which uses a Python backend and a JavaScript / d3 frontend. Collaboration with linguistics researchers from Universität Wien is possible.
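
As a rough, non-binding sketch of how such a split could look, a Python backend might expose 2-D embedding coordinates as JSON that a d3 frontend then renders. Flask, scikit-learn, and the placeholder data below are assumptions, not a description of the existing system.

    # Rough sketch (not the existing system) of a Python backend serving
    # 2-D embedding coordinates as JSON for a d3 frontend to plot.
    # Assumptions: Flask and scikit-learn are available; vocabulary and
    # vectors below are placeholder data standing in for a real embedding.
    import numpy as np
    from flask import Flask, jsonify
    from sklearn.decomposition import PCA

    app = Flask(__name__)

    vocab = [f"word_{i}" for i in range(1000)]
    vectors = np.random.default_rng(0).normal(size=(1000, 300))

    @app.route("/projection")
    def projection():
        # Reduce the high-dimensional vectors to 2-D for plotting.
        coords = PCA(n_components=2).fit_transform(vectors)
        return jsonify([{"word": w, "x": float(x), "y": float(y)}
                        for w, (x, y) in zip(vocab, coords)])

    if __name__ == "__main__":
        app.run(port=5000)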

Responsible

For more information, please contact Manuela Waldner.