Word embeddings map words or phrases to vectors in a high-dimensional space so that semantically similar words lie close to each other. Word embeddings can be visualized through dimensionality reduction. For example, TensorFlow’s Embedding Projector shows word vectors trained on a large news corpus. However, such visualizations do not provide a good overview of groups of semantically related words. Providing an expressive overview is challenging because such embeddings usually contain tens of thousands to millions of words, some used frequently and others rarely.
In this project, the student shall investigate whether groups of semantically similar words in an embedding can be visualized as a collage of expressive images. Building on an existing implementation, the student should identify groups of semantically similar physical objects that can be depicted in images, and retrieve images that represent these semantic groups.
For a master's thesis, the users’ impressions and understanding of image-based word embedding visualizations should additionally be compared formally with those of classic word embedding visualizations in a user study.
- Strong interest in human-computer interaction, visualization, and natural language processing
- Very good programming skills