Journal

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity

Measuring diachronic language distance using perplexity. Application to English, Portuguese and Spanish.

The objective of this work is to establish a corpus-driven methodology for automatically quantifying diachronic language distance between chronological periods of several languages. We apply a perplexity-based measure to written text representing different historical periods of three languages: European English, European Portuguese and European Spanish. For this purpose, we have built historical corpora for each period, compiled from different open corpus sources containing texts as close as possible to their original spelling. The results of our experiments show that a diachronic

Cross-lingual Diachronic Distance: Application to Portuguese and Spanish

Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool

Lately, discourse structure has received considerable attention due to the benefits its application brings to several NLP tasks, such as opinion mining, summarization, question answering and text simplification, among others.

Neural Machine Translation of clinical texts between long distance languages

Objective: To analyze techniques for machine translation of electronic health records (EHRs) between long distance languages, using Basque and Spanish as a reference. We studied distinct configurations of neural machine translation systems and used different methods to overcome the lack of a bilingual corpus of clinical texts or health records in Basque and Spanish.

Multi-label clinical document classification: Impact of label-density

LINGUATEC: Desarrollo de recursos lingüísticos para avanzar en la digitalización de las lenguas de los Pirineos

The aim of the project is to develop, test and disseminate new resources, new tools and innovative linguistic applications to improve the level of digitalization of Aragonese, Basque and Occitan.

Spelling Normalisation of Basque Historical Texts

EUSKOR: End-to-end coreference resolution system for Basque