Conference paper, 2019

A Surface-Syntactic UD Treebank for Naija

Bernard Caron, Marine Courtin, Kim Gerdes, Sylvain Kahane


This paper presents a syntactic treebank for spoken Naija, an English pidgincreole, which is rapidly spreading across Nigeria. The syntactic annotation is developed in the Surface-Syntactic Universal Dependency annotation scheme (SUD) (Gerdes et al., 2018) and automatically converted into UD. We present the workflow of the treebank development for this under-resourced language. A crucial step in the syntactic analysis of a spoken language consists in manually adding a markup onto the transcription, indicating the segmentation into major syntactic units and their internal structure. We show that this so-called "macrosyntactic" markup improves parsing results. We also study some iconic syntactic phenomena that clearly distinguish Naija from English.
Main file
syntaxfest.A Surface-Syntactic UD Treebank for Naija.pdf (438.99 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02270530, version 1 (25-08-2019)




Bernard Caron, Marine Courtin, Kim Gerdes, Sylvain Kahane. A Surface-Syntactic UD Treebank for Naija. TLT 2019, Treebanks and Linguistic Theories, Syntaxfest, Aug 2019, Paris, France. ⟨hal-02270530⟩