Words in a Play

installing dependencies

Installing spaCy (a Python library for natural language processing) together with pydracor (a Python client for the DraCor drama corpus API) and their dependencies is a bit arduous, but should work as follows:

!pip install spacy pydracor
# !pip install spacy-transformers  # optional: check first whether it is really needed, as it pulls in many dependencies
Requirement already satisfied: spacy in /opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages (3.7.4)
Successfully installed pydracor-2.0.0

Afterwards, download a trained English pipeline; en_core_web_sm is the small one (see the hints on selecting a model):

!python -m spacy download en_core_web_sm
Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
Successfully installed en-core-web-sm-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
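Before moving on, it can help to confirm what actually got installed. The following is a minimal sketch using only the standard library (importlib.metadata and importlib.util); the helper names installed_version and model_importable are hypothetical, and the check only confirms that the packages are present, not that the model version is compatible with the installed spaCy.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def model_importable(module_name):
    """Return True if a top-level module (e.g. a spaCy model package) can be found."""
    return util.find_spec(module_name) is not None


# print the versions of the packages installed above (or "not installed")
for dist in ["spacy", "pydracor", "en_core_web_sm"]:
    print(dist, installed_version(dist) or "not installed")

print("en_core_web_sm importable:", model_importable("en_core_web_sm"))
```

If model_importable("en_core_web_sm") is False, spacy.load("en_core_web_sm") will fail as well, so the download step needs to be repeated (and, in Jupyter, the kernel possibly restarted).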

load play from DraCor

import pydracor

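# look the play up by its DraCor name (pydracor searches the play lists of all corpora for it)
play = pydracor.Play(play_name="a-midsummer-night-s-dream")
play.spoken_text()  # the spoken text of the play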
When this page was built, the cell above did not finish and was stopped with a KeyboardInterrupt. According to the traceback, the time was spent in network requests: to resolve play_name, pydracor's DraCor.play_id_to_field requests the play list of every corpus on DraCor (through Corpus(corpus_name).corpus_info()), and the interruption came during an SSL handshake inside requests.get. Looking a play up by name can therefore be slow, and all cells below depend on play.
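One way to avoid the corpus-wide lookup is to request the spoken text of a single play directly from the DraCor REST API, with a timeout so that a slow server raises an error instead of hanging. This is a minimal sketch using only the standard library; the base URL, the endpoint path and the corpus name "shake" are assumptions about the current DraCor API (v1), not taken from this notebook, and should be checked against the API documentation on dracor.org. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: base URL and path layout of the DraCor API (v1); check the API docs
DRACOR_API = "https://dracor.org/api/v1"


def spoken_text_url(corpus, play, base=DRACOR_API):
    """Build the (assumed) URL of the spoken-text endpoint for one play."""
    return "{}/corpora/{}/plays/{}/spoken-text".format(
        base, urllib.parse.quote(corpus), urllib.parse.quote(play)
    )


def fetch_spoken_text(corpus, play, timeout=30):
    """Download the spoken text as a str; raises on HTTP errors or after `timeout` seconds."""
    url = spoken_text_url(corpus, play)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# ASSUMPTION: "shake" is the name of the Shakespeare corpus on DraCor
# text = fetch_spoken_text("shake", "a-midsummer-night-s-dream")
```

If the request succeeds, the resulting string can be passed to nlp() in place of play.spoken_text().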

tokenise text and detect parts of speech

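import spacy

# load the small English pipeline downloaded above
nlp = spacy.load("en_core_web_sm")

# equivalent alternative: import the installed model package and load it directly
# import en_core_web_sm
# nlp = en_core_web_sm.load()

# run the pipeline (tokenisation, tagging, lemmatisation, ...) over the play's spoken text
doc = nlp(play.spoken_text())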
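spaCy processes a text in one go and, by default, refuses texts longer than nlp.max_length (1,000,000 characters) with a ValueError. A single play is usually well below that limit, but a concatenation of several plays may not be. The following is a minimal sketch of a hypothetical helper, chunk_text, that splits a text at line breaks into pieces of at most max_chars characters; the default of 100,000 is an arbitrary choice.

```python
def chunk_text(text, max_chars=100_000):
    """Split text at line breaks into chunks of at most max_chars characters.

    Lines longer than max_chars are cut into max_chars-sized pieces.
    Joining the returned chunks gives back the original text.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # hard-split any single line that is too long on its own
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)]
        for piece in pieces:
            # start a new chunk if this piece would overflow the current one
            if current and size + len(piece) > max_chars:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(piece)
            size += len(piece)
    if current:
        chunks.append("".join(current))
    return chunks
```

The chunks can then be processed with docs = list(nlp.pipe(chunk_text(text))), and spacy.tokens.Doc.from_docs(docs) merges them back into a single Doc. Splitting at line breaks keeps most verse lines intact, but a sentence that crosses a chunk boundary may be tagged slightly differently than it would be in one piece.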

text and parts of speech

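# (text, coarse part-of-speech tag, lemma) for the first 50 tokens; remove [:50] to list the whole play
print([(w.text, w.pos_, w.lemma_) for w in doc[:50]])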
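The token list also contains punctuation and whitespace tokens, which spaCy tags PUNCT and SPACE and which also show up in the part-of-speech counts below. A minimal sketch of a hypothetical filter that drops them from (text, pos, lemma) triples like the ones printed above; which tags to skip is a choice, not a requirement.

```python
# tags treated as "not a word" here; PUNCT and SPACE are Universal POS tags used by spaCy
SKIP_TAGS = frozenset({"PUNCT", "SPACE"})


def word_triples(triples, skip=SKIP_TAGS):
    """Keep only the (text, pos, lemma) triples whose POS tag is not in skip."""
    return [(text, pos, lemma) for text, pos, lemma in triples if pos not in skip]


# usage with a spaCy doc (not run here):
# words = word_triples((w.text, w.pos_, w.lemma_) for w in doc)
```

spaCy tokens also carry the boolean attributes is_punct and is_space, so [w for w in doc if not (w.is_punct or w.is_space)] is a close alternative; is_punct is a lexical property, though, and may occasionally disagree with the PUNCT tag.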

most frequent parts of speech

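from collections import Counter

def count_words(doc, word_type):
    """Count the lemmas of all tokens whose coarse POS tag (w.pos_) equals word_type."""
    cnt = Counter()
    for w in doc:
        if w.pos_ == word_type:
            # count lemmas rather than w.text, so inflected forms of a word are counted together
            cnt[w.lemma_] += 1
    return cnt

def print_top(words, n):
    """Print the n most common entries of a Counter as 'count<TAB>entry'."""
    for w, cnt in words.most_common(n):
        print(cnt, w, sep='\t')

# frequency of each coarse POS tag in the play
word_types = Counter(w.pos_ for w in doc)

print_top(word_types, 10)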
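Raw counts are hard to compare between plays of different length. A minimal sketch that reports each entry's share of all counted items instead; the helpers top_shares and print_top_share are hypothetical and work on any Counter, such as word_types above.

```python
from collections import Counter


def top_shares(counter, n):
    """Return the n most common items as (item, count, percent of the total)."""
    total = sum(counter.values())
    if total == 0:
        return []
    return [(item, cnt, 100 * cnt / total) for item, cnt in counter.most_common(n)]


def print_top_share(counter, n):
    """Print 'percent<TAB>count<TAB>item' for the n most common items."""
    for item, cnt, pct in top_shares(counter, n):
        print(f"{pct:5.1f}%", cnt, item, sep="\t")


# usage with the counts above (not run here):
# print_top_share(word_types, 10)
```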

most frequent words per part of speech

for word_type in ["NOUN", "VERB", "ADJ"]:
    print("---", word_type, "---")
    print_top(count_words(doc, word_type), 10)
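The loop above walks through the whole document once per part of speech. A minimal sketch that collects the lemma counts for all tags in a single pass instead; the helper count_lemmas_by_pos is hypothetical and takes (pos, lemma) pairs, so a spaCy doc would be passed as ((w.pos_, w.lemma_) for w in doc).

```python
from collections import Counter, defaultdict


def count_lemmas_by_pos(pairs):
    """Map each POS tag to a Counter of its lemmas, built from (pos, lemma) pairs."""
    by_pos = defaultdict(Counter)
    for pos, lemma in pairs:
        by_pos[pos][lemma] += 1
    return by_pos


# usage with a spaCy doc and print_top from above (not run here):
# by_pos = count_lemmas_by_pos((w.pos_, w.lemma_) for w in doc)
# for word_type in ["NOUN", "VERB", "ADJ"]:
#     print("---", word_type, "---")
#     print_top(by_pos[word_type], 10)
```

Because by_pos is a defaultdict, asking for a tag that never occurred simply yields an empty Counter, so print_top prints nothing for it rather than raising a KeyError.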