SPAR.txt, a cheap shallow parsing approach for regulatory texts

Ruben Kruiper*, Ioannis Konstas, Alasdair Gray, Farhad Sadeghineko, Richard Watson, Bimal Kumar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPAR.txt, and train a sequence tagger that achieves a 79.93 F1-score on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWEs) are discovered with reasonable accuracy (70.3%).
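
As an illustration of how an F1-score for a sequence tagger can be computed, the following is a minimal sketch that assumes BIO-style tags and exact-match, span-level scoring. The tag names ("obj", "act") and the helper functions are hypothetical; the abstract does not state the tag scheme or the exact scoring procedure used for SPAR.txt, and this sketch does not handle discontiguous spans.

```python
from typing import List, Set, Tuple

# A span is (label, start index, exclusive end index).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from one BIO-tagged sentence.

    Assumption: tags look like "B-x", "I-x" or "O". An "I-" tag that
    does not continue an open span of the same label is ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span unless this tag continues it.
        if label is not None and tag != f"I-{label}":
            spans.add((label, start, i))
            start, label = None, None
        # Open a new span on every "B-" tag.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching spans."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one of two gold spans is recovered.
gold = [["B-obj", "I-obj", "O", "B-act"]]
pred = [["B-obj", "I-obj", "O", "O"]]
score = span_f1(gold, pred)
```

In this example precision is 1.0 and recall is 0.5, so the score is 2/3. Scores such as the one reported in the abstract are usually stated on a 0 to 100 scale.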

Original language: English
Title of host publication: Proceedings of the Natural Legal Language Processing Workshop 2021
Editors: Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro
Pages: 129-143
Number of pages: 15
ISBN (Electronic): 9781954085985
Publication status: Published - 10 Nov 2021
Externally published: Yes
Event: 3rd Natural Legal Language Processing Workshop - Punta Cana, Dominican Republic
Duration: 10 Nov 2021 - 10 Nov 2021
https://nllpw.org/workshop/nllp-2021/ (Link to event website)

Workshop

Workshop: 3rd Natural Legal Language Processing Workshop
Abbreviated title: NLLP 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 10/11/21 - 10/11/21

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computational Theory and Mathematics
  • Software
