BIM/SAHCOBA corpus: Syntactically Annotated Historical Corpus in Basque

Please use the following text to cite this item:
Ainara Estarrona; Izaskun Etxeberria; Ricardo Etxepare; Ander Soraluze and Manuel Padilla-Moyano, 2026, BIM/SAHCOBA corpus: Syntactically Annotated Historical Corpus in Basque, Dspace HiTZ Zentroa, https://hdl.handle.net/20.500.14614/43.
Date issued
2026-06-17
Size
600,000 tokens
Description
Basque in the Making (BIM): A Historical Look at a European Language Isolate and Syntactically Annotated Historical Corpus in Basque (SAHCOBA) are two projects for the construction of a morphosyntactically annotated historical corpus of Basque. The corpus will include both part-of-speech and syntactic annotation, together with a rich metadata structure. The database will make it possible to search the annotated corpus by words, lemmas, grammatical categories, sequences of grammatical categories, and specific structural configurations. The BIM project aims to collect the most significant works from the 15th century to the mid-18th century (Archaic and Old Basque), while the SAHCOBA project aims to extend the corpus from the mid-18th century to the mid-20th century (Early and Late Modern Basque), when standard Basque appeared. BIM and SAHCOBA are interdisciplinary projects that bring together experts in linguistics and natural language processing.
This item is publicly available.
Files in this item
Name
BZENTROA-BIM.zip
Size
2.79 MB
Format
application/zip
MD5
f7f1fb550c7c5a0d5548e114fecb2dad
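The MD5 value above can be used to check that a downloaded copy of the archive is intact. The following is a minimal Python sketch, not part of the original record: the function names md5_of_file and matches_published_md5 are hypothetical, and the local path to the downloaded file is an assumption the user must supply.

```python
import hashlib

# MD5 checksum as published in this record for BZENTROA-BIM.zip.
EXPECTED_MD5 = "f7f1fb550c7c5a0d5548e114fecb2dad"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that it need not fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path).lower() == expected.lower()
```

For example, matches_published_md5("BZENTROA-BIM.zip") returns True only if the local file (assumed here to sit in the working directory) has the digest listed above. A mismatch usually points to an incomplete or corrupted download.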