Gender Representation in German Plays
by Sandra Densch-Glazov, Leonie Wichers, Kyung Yun Choi and Benedikt Schuh
Introduction
The following Jupyter Notebook generates visualizations that provide a starting point for analyzing gender relations and gender distribution in a selected drama from the German corpus of the DraCor dataset. DraCor is a showcase for the concept of Programmable Corpora. It revolves around an API that provides data extracted from the project's TEI-encoded corpora of plays in (mostly) European languages.
To generate the visualization, execute the following four code cells in order.
Step 0: Preparation
This code cell imports the required libraries, defines helper functions, and requests the corpus metadata from the API. Executing the cell is all that is needed.
import pandas as pd
import altair as alt
import ipywidgets as widgets
import networkx as nx
import nx_altair as nxa
from pydracor import *
import requests
def minmaxWords(values):
    # return the smallest and largest value as a (min, max) tuple
    maxw = max(values)
    minw = min(values)
    return (minw, maxw)


def set_character_name_and_size(graphR, graphO):
    # copy word counts from the co-occurrence graph (graphO) onto the
    # nodes of the relations graph (graphR) and derive size and speech share
    words = nx.get_node_attributes(graphO, 'Number of spoken words')
    minWord, maxword = minmaxWords(words.values())
    sumWords = sum(words.values())
    for node_iterator in graphR.nodes:
        node = graphR.nodes[node_iterator]
        node['Name'] = node['label']
        node['Spoken words'] = graphO.nodes[node_iterator]['Number of spoken words']
        # scale node size relative to the character with the most words
        node['Size'] = graphO.nodes[node_iterator]['Number of spoken words'] / maxword * 200 + 25
        node['Speech Percentage'] = round((node['Spoken words'] / sumWords * 100), 2)
    return graphR


def relation_name_mapping():
    # map the relation keys used in the data to display labels
    relation = pd.DataFrame(
        {'Relation': ['parent_of', 'lover_of', 'related_with', 'associated_with', 'siblings', 'spouses', 'friends']}
    )
    relation_name_mapping = {
        'parent_of': 'Parent-child',
        'lover_of': 'Lovers',
        'related_with': 'Related',
        'associated_with': 'Associated',
        'siblings': 'Siblings',
        'spouses': 'Spouses',
        'friends': 'Friends'
    }
    relation['Relation_Display'] = relation['Relation'].map(relation_name_mapping)
    return relation


def gender_name_mapping():
    # map the gender keys used in the data to display labels
    gender = pd.DataFrame({'Gender': ['MALE', 'FEMALE', 'UNKNOWN']})
    gender_name_mapping = {
        'MALE': 'Male',
        'FEMALE': 'Female',
        'UNKNOWN': 'Unknown',
    }
    gender['Gender_Display'] = gender['Gender'].map(gender_name_mapping)
    return gender


def get_words_by_gender(nodes):
    # sum the spoken words per gender; the result order matches the
    # row order in gender_name_mapping(): male, female, unknown
    female_words = 0
    male_words = 0
    unknown_words = 0
    for node_iterator in nodes:
        node = nodes[node_iterator]
        if node['Gender'] == 'FEMALE':
            female_words += node['Number of spoken words']
        elif node['Gender'] == 'MALE':
            male_words += node['Number of spoken words']
        elif node['Gender'] == 'UNKNOWN':
            unknown_words += node['Number of spoken words']
    return [male_words, female_words, unknown_words]


def chunked_title(title):
    # split a long title into lines of roughly 70 characters
    chunked_title = []
    current_chunk = ""
    for word in title.split():
        if len(current_chunk) + len(word) <= 70:
            current_chunk += f"{word} "
        else:
            chunked_title.append(current_chunk.strip())
            current_chunk = f"{word} "
    chunked_title.append(current_chunk.strip())
    return chunked_title

german_corpus = Corpus('ger')
german_metadata = pd.DataFrame(german_corpus.metadata())
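To make the scaling in set_character_name_and_size concrete, the following sketch applies the same two formulas to a plain dictionary instead of graph nodes. The helper name scale_nodes and the character names and word counts are invented for illustration; they are not part of the notebook or the corpus.

```python
def scale_nodes(word_counts):
    """Return display size and speech share per character.

    Uses the same formulas as set_character_name_and_size:
    size = words / max_words * 200 + 25
    share = words / total_words * 100, rounded to two decimals
    """
    max_words = max(word_counts.values())
    total_words = sum(word_counts.values())
    return {
        name: {
            "Size": words / max_words * 200 + 25,
            "Speech Percentage": round(words / total_words * 100, 2),
        }
        for name, words in word_counts.items()
    }


# Invented word counts, not taken from any play in the corpus
example = {"Character A": 600, "Character B": 300, "Character C": 100}
scaled = scale_nodes(example)
for name, values in scaled.items():
    print(name, values)
```

With this scaling, the character with the most spoken words always receives a size of 225, and a character without any spoken words would still be drawn with the minimum size of 25.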
Step 1: Choose a play
After the following code is executed, a dropdown menu appears below from which any drama can be selected.
You can also jump to a specific drama by typing the first letters of its title while the dropdown menu is focused.
If the visualization has already been generated and you would like to select a different drama, select the appropriate drama from the dropdown menu and execute Steps 2 and 3 again.
dropdown_items = dict(zip(german_metadata['title'], german_metadata['id']))
dropdown_items = dict(sorted(dropdown_items.items()))
dropdown = widgets.Dropdown(
    options=dropdown_items,
    description='Select play:',
)
dropdown
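The dropdown options are a title-to-id dictionary sorted by title. The sketch below reproduces that construction on invented titles and ids (the values are placeholders, not real DraCor ids) and shows one consequence of using titles as keys: if two plays share a title, only the last one remains selectable. The variable unique_options is a hypothetical way to avoid this and is not used in the notebook.

```python
# Placeholder metadata; the notebook takes these columns from german_metadata
titles = ["Play C", "Play A", "Play B", "Play A"]
ids = ["id-1", "id-2", "id-3", "id-4"]

# Same construction as in the cell above: zip, then sort by title
options = dict(sorted(dict(zip(titles, ids)).items()))
print(options)
# {'Play A': 'id-4', 'Play B': 'id-3', 'Play C': 'id-1'}

# Hypothetical alternative: append the id to the label so keys stay unique
unique_options = dict(sorted((f"{t} ({i})", i) for t, i in zip(titles, ids)))
print(list(unique_options))
```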
Step 2: Request network metrics from API
This cell requests the relation network and the co-occurrence network of the selected play from the API, both as GraphML. This may take a while.
play_id = dropdown.value
play = Play(play_id)
try:
    relations_graphml = play.relations_graphml()
    # networkX doesn't support mix of directed+undirected Graphs & nx_altair's arrows look broken
    # workaround: make graph undirected
    relations_graphml = relations_graphml.replace('directed="true"', 'directed="false"')
    cooccurence_graphml = play.graphml()
except requests.HTTPError:
    relations_graphml = None
    cooccurence_graphml = None
    print('The API does not contain a relationship network to visualize for this play. Please choose another one.')
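The effect of the string replacement in the workaround can be checked on a small example. The sketch below uses a hand-written GraphML fragment (an assumption about the general shape of such a file, not actual API output) and the standard library's XML parser to list the directed attribute of each edge before and after the replacement. The helper name directed_flags is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written GraphML fragment for illustration; not actual API output
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <node id="c"/>
    <edge source="a" target="b" directed="true"/>
    <edge source="b" target="c" directed="false"/>
  </graph>
</graphml>"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def directed_flags(graphml_text):
    """Return the value of the directed attribute for every edge."""
    root = ET.fromstring(graphml_text)
    return [edge.get("directed") for edge in root.findall(".//g:edge", NS)]


before = directed_flags(SAMPLE)
# Same replacement as in the cell above
patched = SAMPLE.replace('directed="true"', 'directed="false"')
after = directed_flags(patched)

print(before)  # ['true', 'false']
print(after)   # ['false', 'false']
```

A side effect of this workaround is that the direction of asymmetric relations such as parent_of is no longer encoded in the graph; the visualization shows only that the relation exists between two characters.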
Step 3: Visualize data
When executed, this code cell generates the visualization of gender distribution and relations for the chosen play and displays it below the cell.
Important information for working with the visualization:
There are two filters on the left-hand side:
Gender Filter: Select the gender(s) you would like to have displayed.
Relation Filter: Select the relation(s) you would like to have displayed.
Additional information in tooltips:
Node tooltip: shows the name of the character, the number of spoken words, and the character's share of all spoken words in percent
Pie chart tooltips: show the number of characters (or spoken words) per gender and the corresponding percentage
A circular layout is used for the character-relation network because it yields a more readable and more clearly structured network. The arrangement of the nodes relies solely on the order in which the characters are listed in the data source and does not encode any structure of the play.
if relations_graphml is not None:
    ############################## Network Chart ##############################
    # parse graphs
    relations_graph = nx.parse_graphml(relations_graphml)
    cooccurence_graph = nx.parse_graphml(cooccurence_graphml)

    # add name, word count, size and speech share to the nodes
    relations_graph = set_character_name_and_size(relations_graph, cooccurence_graph)

    # define the graph layout
    layout = nx.circular_layout(relations_graph)

    # draw base graph with nx_altair
    base = nxa.draw_networkx(
        relations_graph,
        pos=layout,
        node_tooltip=['Name', 'Spoken words', 'Speech Percentage'],
        node_color='lightgray',
        edge_color='Relation',
        node_size='Size',
        width=4
    )
    # get the edge layer
    edges = base.layer[0]
    # get the node layer
    nodes = base.layer[1]

    # define relation filter
    relation = relation_name_mapping()
    relation_selection = alt.selection_point(fields=['Relation'], toggle="true")
    relation_color = alt.condition(
        relation_selection,
        alt.Color('Relation:N', legend=None),
        alt.value('lightgray')
    )
    relation_filter = alt.Chart(
        relation,
        title=alt.TitleParams('Filter relation', anchor='start')
    ).mark_rect(cursor='pointer').encode(
        y=alt.Y('Relation_Display', title=''),
        color=relation_color
    ).add_params(relation_selection)

    # encode relation as edge color and add relationship filter
    edges = edges.encode(color=relation_color).transform_filter(relation_selection)

    # define gender filter
    gender = gender_name_mapping()
    gender_selection = alt.selection_point(fields=['Gender'], toggle="true")
    gender_color = alt.condition(
        gender_selection,
        alt.Color('Gender:N', legend=None),
        alt.value('lightgray')
    )
    gender_shape = alt.Shape('Gender:N', legend=None)
    gender_filter = alt.Chart(
        gender,
        title=alt.TitleParams('Filter gender', anchor='start')
    ).mark_point(
        size=300,
        cursor='pointer',
        filled=True,
        opacity=1
    ).encode(
        y=alt.Y('Gender_Display', title=''),
        color=gender_color,
        shape=gender_shape
    ).add_params(gender_selection)

    # encode gender as node shape+color and add gender filter
    nodes = nodes.encode(
        color=gender_color,
        fill=gender_color,
        shape=gender_shape
    ).add_params(gender_selection)

    # layer network chart
    network_chart = (edges + nodes).properties(
        width=400,
        height=400
    )
    network_chart_with_filters = ((gender_filter & relation_filter) | network_chart)

    ############################## Pie Charts ##############################
    # count characters by gender
    play_metadata = german_metadata[german_metadata["id"] == play_id].reset_index()
    speakers = play_metadata[['num_of_speakers_male', 'num_of_speakers_female', 'num_of_speakers_unknown']]
    numOfSpeakers = play_metadata.at[0, 'num_of_speakers']
    gender['Characters'] = speakers.loc[0, :].values.tolist()
    gender['Percentage of Characters'] = round(gender['Characters'] / numOfSpeakers * 100, 2)
    gender_distribution_pie_chart = alt.Chart(gender, title='Number of characters by gender').mark_arc().encode(
        theta='Characters',
        color=alt.Color('Gender:N', legend=None),
        tooltip=['Characters', 'Percentage of Characters']
    ).properties(
        width=200,
        height=200
    )

    # aggregate spoken words by gender
    gender['Spoken words'] = get_words_by_gender(cooccurence_graph.nodes)
    wordcountStage = play_metadata.at[0, 'word_count_sp']
    gender['Percentage of spoken words'] = round(gender['Spoken words'] / wordcountStage * 100, 2)
    spoken_words_pie_chart = alt.Chart(gender, title='Number of spoken words by gender').mark_arc().encode(
        theta='Spoken words',
        color=alt.Color('Gender:N', legend=None),
        tooltip=['Spoken words', 'Percentage of spoken words']
    ).properties(
        width=200,
        height=200
    )
    stacked_pie_charts = (gender_distribution_pie_chart & spoken_words_pie_chart)

    ############################## Final Chart ##############################
    title = chunked_title(f"Gender distribution and relations in \"{dropdown.label}\"")
    final_chart = (network_chart_with_filters | stacked_pie_charts)
    final_chart = final_chart.configure_view(
        strokeWidth=0  # remove border
    ).configure_axis(
        domainOpacity=0  # remove axis
    ).properties(
        title=alt.TitleParams(
            title,
            anchor='middle',
            fontSize=20
        )
    )
else:
    final_chart = 'no visualization available'
final_chart
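The remark that the circular arrangement depends only on the order of the characters can be illustrated without any graph library. The following is a minimal sketch of the idea behind a circular layout, assuming evenly spaced angles starting at 0; it does not reproduce the networkx implementation, and the function name circular_positions and the single-letter character names are hypothetical.

```python
import math


def circular_positions(names, radius=1.0):
    """Place names evenly on a circle, in list order, starting at angle 0."""
    n = len(names)
    positions = {}
    for index, name in enumerate(names):
        angle = 2 * math.pi * index / n
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# The same four characters, listed in two different orders
order_a = circular_positions(["A", "B", "C", "D"])
order_b = circular_positions(["B", "A", "C", "D"])

print(order_a["A"])  # (1.0, 0.0): first in the list, so placed at angle 0
print(order_b["A"])  # second in the list, so placed a quarter turn further
```

Swapping two characters in the input list swaps their positions on the circle, while the relations between them stay the same. Proximity of two nodes in the chart therefore carries no meaning about the play.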
Source: German Drama Corpus provided by the Drama Corpus (DraCor) Project as of 08.03.2024. Licensed under CC0.
Fischer, Frank, et al. (2019). Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. In Proceedings of DH2019: “Complexities”, Utrecht University, doi:10.5281/zenodo.4284002.