Language models are trained on large text corpora that often include stereotypes. This can lead to direct or indirect bias in downstream applications. We present a method for interactive visual exploration of indirect multiclass bias learned by contextual word embeddings. We introduce a new indirect bias quantification score and present two interactive visualizations to explore interactions between multiple non-sensitive concepts (such as sports, occupations, and beverages) and sensitive attributes (such as gender or year of birth) based on this score.
You can try both visualizations online. Colors show the predicted association between each target and each attribute (orange = strong positive association, purple = strong negative association, white = no association).
Please be patient! The language models take some time to load.
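The color coding used in both views can be illustrated with a small diverging color scale. This is only a sketch: the score range of [-1, 1], the RGB values, and the function name `association_to_rgb` are assumptions for illustration, not the values used by the actual visualizations.

```python
# Hypothetical endpoint colors; the real tool may use different shades.
ORANGE = (230, 97, 1)     # strong positive association
PURPLE = (94, 60, 153)    # strong negative association
WHITE = (255, 255, 255)   # no association


def association_to_rgb(score, max_abs=1.0):
    """Map an association score to an RGB color on a diverging scale.

    Assumes scores are roughly symmetric around 0 and bounded by max_abs.
    """
    # Normalize and clamp to [-1, 1].
    t = max(-1.0, min(1.0, score / max_abs))
    # Pick the endpoint color by sign, then blend from white by magnitude.
    end = ORANGE if t >= 0 else PURPLE
    weight = abs(t)
    return tuple(round(w + (e - w) * weight) for w, e in zip(WHITE, end))
```

With this mapping, a score of 0 yields white, and scores approaching the positive or negative bound fade towards orange or purple, respectively.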
The table view shows the predicted associations between a selected target category and sensitive or non-sensitive attributes. Table sorting can be used to interactively explore direct and indirect bias.
- To sort attributes according to their predicted association with a target category (e.g., occupation "homemaker"), click the column title once.
- To additionally sort the targets (columns) according to their similarity to the selected target category in terms of the shown attribute, click the column title a second time.
- The same two-step sorting also works for attributes (rows): click a row title once, then a second time.
- Table sorting is persistent. When changing targets or attributes, the sorting remains active.
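The two sorting steps described above can be sketched as follows. The association values are made-up toy numbers, and cosine similarity is used only as a stand-in, since the similarity measure the table actually uses is not specified here. The function names are hypothetical.

```python
import math

# Toy association table (hypothetical values): rows = attributes, columns = targets.
attributes = ["male", "female"]
targets = ["engineer", "homemaker", "nurse"]
table = [
    [0.6, -0.7, -0.4],   # male
    [-0.5, 0.8, 0.6],    # female
]


def sort_rows_by_column(table, col):
    """First click: order rows by their association with the selected target."""
    return sorted(range(len(table)), key=lambda i: table[i][col], reverse=True)


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the tool's similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sort_columns_by_similarity(table, col):
    """Second click: order columns by similarity to the selected target."""
    columns = list(zip(*table))  # transpose rows into columns
    ref = columns[col]
    return sorted(range(len(columns)),
                  key=lambda j: cosine(columns[j], ref), reverse=True)


selected = targets.index("homemaker")
row_order = sort_rows_by_column(table, selected)
col_order = sort_columns_by_similarity(table, selected)
print([attributes[i] for i in row_order])
print([targets[j] for j in col_order])
```

In this toy data, targets whose association profile across the attributes resembles that of "homemaker" end up next to it, which is the kind of pattern the second sorting step is meant to surface.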
The scatterplot view shows how similar the categories of a selected target are to each other with respect to a selected sensitive or non-sensitive attribute. Color coding can be changed interactively to explore how attribute levels interact with the target categories.
- Each dot represents one category of the selected target (e.g., a type of sport). Hover over a dot to reveal its category name.
- The proximity of two target items represents their similarity based on the selected attribute (e.g., traits associated with the people doing these sports).
- The color represents the association of each target item with a selected attribute level (e.g., female).
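The idea that proximity encodes similarity can be illustrated with pairwise distances between association profiles. This sketch assumes each target item is described by a vector of associations with the levels of the selected attribute. The numbers are invented, Euclidean distance is an assumption, and the dimensionality reduction that produces the actual 2D layout is not specified here and therefore not shown.

```python
import math

# Hypothetical association profiles: one value per level of the selected
# attribute (e.g., three traits) for each target item (e.g., a sport).
trait_profiles = {
    "yoga": [0.7, -0.2, 0.1],
    "pilates": [0.6, -0.1, 0.2],
    "boxing": [-0.5, 0.8, -0.3],
}


def distance(u, v):
    """Euclidean distance between two association profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(name, profiles):
    """Return the item whose profile is closest to that of `name`."""
    others = (other for other in profiles if other != name)
    return min(others, key=lambda other: distance(profiles[name], profiles[other]))


print(nearest_neighbor("yoga", trait_profiles))
```

Items with a small distance in this sense would be expected to appear close together in the scatterplot, while the dot color is driven separately by the association with a single chosen attribute level.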
For more information, please contact Manuela Waldner.