Interactive Visual Exploration of Diachronic Word Embeddings

Master Thesis

Description

Diachronic word embeddings, i.e., high-dimensional, time-dependent embeddings of textual information, can reveal how the meanings of words shift over time. For example, they could reveal that the primary meaning of the German word “Widerstand” shifted during the last century from an electrical context to resistance to Nazism [1]. Word embeddings can be visualized using dimensionality reduction (see, for example, TensorFlow’s Embedding Projector [2]), but such visualizations typically do not provide an expressive overview and do not reveal semantic changes over time.
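One common way to make such a shift visible is to compare a word's nearest neighbors in each time slice. The following is a minimal sketch of that idea, not the method of [1]: the three-dimensional vectors, the decade keys, and the helper names `cosine_similarity` and `nearest_neighbors` are all invented for illustration, whereas real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=2):
    """Return the k words most similar to `word` within one time slice."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Toy vectors, made up for this example (assumption: one embedding per decade).
CONTEXT_WORDS = {
    "strom": [0.8, 0.2, 0.1],
    "spannung": [0.85, 0.05, 0.1],
    "regime": [0.1, 0.9, 0.2],
    "verfolgung": [0.0, 0.8, 0.3],
}
EMBEDDINGS_BY_DECADE = {
    1900: {"widerstand": [0.9, 0.1, 0.0], **CONTEXT_WORDS},
    1950: {"widerstand": [0.2, 0.9, 0.1], **CONTEXT_WORDS},
}

for decade, embeddings in EMBEDDINGS_BY_DECADE.items():
    print(decade, nearest_neighbors("widerstand", embeddings))
```

In this toy data, the neighbors of “widerstand” move from electrical terms to political ones. Because neighbors are computed within each slice separately, this comparison does not require the slices to share a coordinate system.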

Tasks

The task of this thesis is to design, implement, and validate a comparative visualization [3] that supports interactive exploration of the semantic shifts uncovered by diachronic word embeddings. Providing an expressive and scalable overview of these changes is highly challenging, as such diachronic studies are usually based on tens of thousands to millions of words.
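One possible starting point for a scalable overview is to rank words by how far they moved between two time slices and to show only the strongest movers first. The sketch below illustrates this under two assumptions: the vectors of both slices live in a shared, aligned space, and the cosine distance between a word's first and last vector is an acceptable proxy for its shift. The two-dimensional values and the function names `cosine_distance` and `rank_by_shift` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_shift(first_slice, last_slice, top_k=2):
    """Return the top_k words with the largest distance between two slices.

    Only words present in both slices are compared.
    """
    shared_words = set(first_slice) & set(last_slice)
    shifts = {
        word: cosine_distance(first_slice[word], last_slice[word])
        for word in shared_words
    }
    return sorted(shifts, key=shifts.get, reverse=True)[:top_k]


# Toy vectors, made up for this example; assumed to be aligned across slices.
FIRST_SLICE = {"widerstand": [0.9, 0.1], "haus": [0.5, 0.5], "strom": [0.8, 0.2]}
LAST_SLICE = {"widerstand": [0.2, 0.9], "haus": [0.5, 0.5], "strom": [0.7, 0.3]}

print(rank_by_shift(FIRST_SLICE, LAST_SLICE))
```

Such a ranking only selects candidates for display; how to present them comparatively is the design question of the thesis.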

Requirements

  • Strong interest in human-computer interaction, visualization, natural language processing, and history
  • Very good programming skills
  • Experience with web technologies (JavaScript, D3.js, ...) and/or Python is strongly recommended

Environment

The goal is to develop an online interface that enables a broad audience to inspect language changes. The target platform is therefore the web, using visualization libraries such as d3.js or three.js. Experience with, and code for, visualizing word embeddings in JavaScript and Python is available in the group and can be extended. Pre-trained historical word vectors are available online [4].

 

[1] https://arxiv.org/pdf/1605.09096.pdf

[2] https://projector.tensorflow.org/

[3] https://journals.sagepub.com/doi/pdf/10.1177/1473871611416549

[4] https://nlp.stanford.edu/projects/histwords/

Responsible

For more information, please contact Manuela Waldner.