Word Embeddings Visualized as Image Collages

Student Project
Master Thesis
1

Description

Word embeddings are high-dimensional vector spaces in which words or phrases are represented as vectors such that semantically similar words lie close to each other. Word embeddings can be visualized through dimensionality reduction. For example, TensorFlow’s Embedding Projector [1] shows word vectors trained on a large news corpus. However, such visualizations do not provide a good overview of groups of semantically related words. Providing an expressive overview is challenging, because word embeddings typically contain tens of thousands to millions of words, some used frequently and others only rarely.
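The two ideas in the paragraph above, proximity between word vectors and projection to 2D for plotting, can be sketched in a few lines. This is a minimal illustration that assumes NumPy is available. The four-dimensional vectors are hypothetical toy values, not trained embeddings. Cosine similarity measures proximity. PCA is used here as one common dimensionality reduction method and is not necessarily what any particular tool applies by default.

```python
import numpy as np

# Hypothetical toy embeddings (4 dimensions). Real embeddings usually have
# hundreds of dimensions and are learned from a large text corpus.
words = ["cat", "dog", "car", "truck"]
vectors = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbor(word: str) -> str:
    """Return the other word whose vector is most similar to `word`."""
    i = words.index(word)
    scores = [
        (cosine_similarity(vectors[i], vectors[j]), words[j])
        for j in range(len(words))
        if j != i
    ]
    return max(scores)[1]

def pca_2d(x: np.ndarray) -> np.ndarray:
    """Project the rows of x onto their first two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

print(nearest_neighbor("cat"))  # dog
print(pca_2d(vectors).round(2))  # one 2D point per word, ready to plot
```

In the projected points, "cat" and "dog" land on one side of the first axis and "car" and "truck" on the other. With realistic vocabularies of many thousands of words, however, such a scatter plot quickly becomes a dense cloud of labels.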

Tasks

In this project, the student shall investigate whether groups of semantically similar words in embeddings can be visualized as collages of expressive images. Building on an existing implementation, the student should identify groups of semantically similar words that denote physical objects, and can therefore be depicted in images, and retrieve images that represent these semantic groups.
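The grouping step described in the Tasks can be sketched in simplified form. This sketch uses a greedy cosine-similarity threshold clustering, chosen only for illustration. It is not the method of the existing implementation, which this posting does not specify. The embeddings, the `threshold` value, and the helper names `group_words` and `representative` are all hypothetical. Each group's most central word (its medoid) is one plausible candidate query for retrieving a representative image. The image retrieval itself is not shown.

```python
import math

# Hypothetical toy embeddings for words that denote physical objects.
EMBEDDINGS = {
    "apple":  [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "pear":   [0.85, 0.15, 0.05],
    "chair":  [0.1, 0.9, 0.1],
    "table":  [0.0, 0.8, 0.2],
    "car":    [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def group_words(embeddings, threshold=0.9):
    """Greedy grouping (hypothetical helper): add each word to the first
    group whose seed word is at least `threshold` similar, else start a
    new group."""
    groups = []  # each group is a list of words; group[0] is its seed
    for word, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups

def representative(group, embeddings):
    """Medoid (hypothetical helper): the word with the highest summed
    similarity to all words in its group."""
    return max(
        group,
        key=lambda w: sum(cosine(embeddings[w], embeddings[o]) for o in group),
    )

groups = group_words(EMBEDDINGS)
print(groups)  # [['apple', 'banana', 'pear'], ['chair', 'table'], ['car']]
for group in groups:
    print(representative(group, EMBEDDINGS), "->", group)
```

A fixed similarity threshold is easy to reason about but sensitive to the chosen value. With real embeddings, a proper clustering algorithm and a filter for depictable, concrete nouns would likely be needed.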

For a master thesis, users’ impressions and understanding of image-based word embedding visualizations should additionally be compared formally with those of classic word embedding visualizations in a user study.

Requirements

  • Strong interest in human-computer interaction, visualization, and natural language processing
  • Very good programming skills
  • Experience with web technologies (JavaScript, d3.js, ...) and/or Python is an advantage

Environment

The target platform is the web, using visualization libraries such as d3.js or three.js. The group has experience and existing code for visualizing word embeddings with JavaScript and Python, which can be extended.


[1] TensorFlow Embedding Projector: https://projector.tensorflow.org/

Responsible

For more information, please contact Manuela Waldner.