
Measuring Language Distance of Isolated European Languages

Phylogenetics is a subfield of historical linguistics that aims to classify a group of languages by their distances within a rooted tree representing their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European tree. Although phylogenetic links cannot be established for these languages with basic strategies, the distances between them and the rest can be calculated with simple corpus-based techniques and natural language processing methods. The objective of this

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study

This is a summary of the PhD thesis written by Uxoa Iñurrieta under the supervision of Dr. Gorka Labaka and Dr. Itziar Aduriz. Full title of the PhD thesis in Basque: "Izena+aditza Unitate Fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala" ("Noun+verb phraseological units from Spanish to Basque: analysis and computational treatment"). The defense was held in San Sebastian on November 29, 2019. The doctoral committee was composed of Ricardo Etxepare (Centre National de la Recherche Scientifique), Margarita Alonso (Universidad de Coruña) and Miren Azkarate (University of the Basque Country).

Technology, the Digital Context and Digital Competences in Education

Technological development never stops. The collection of many kinds of data (and, to some extent, of knowledge) appears to have become a business, and it is increasingly concentrated in the hands of large private companies. This kind of data collection and development can put our digital identity (and other aspects of our identity) at risk, and overall it has widened the digital divide, because the opportunities and resources available to the public sphere and to society at large are shrinking.

A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity

The aim of this paper is to apply a corpus-based methodology, built on the perplexity measure, to automatically calculate the cross-lingual language distance between historical periods of three languages. The three historical corpora were compiled with spelling kept as close to the original as possible and balanced between fiction and non-fiction.
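The abstract does not spell out how perplexity becomes a distance. The following is a minimal sketch of one plausible reading, under assumptions not taken from the paper: a character trigram model with add-one smoothing is trained on the corpus of one language and period, its perplexity is measured on the corpus of another, and the distance is the mean of the two cross-perplexities so that it is symmetric. The names `CharNgramModel` and `language_distance`, the n-gram order and the smoothing method are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams, left-padded with spaces."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Hypothetical character n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: str, n: int = 3):
        self.n = n
        grams = char_ngrams(corpus, n)
        self.ngram_counts = Counter(grams)
        # Each n-gram adds one count to its (n-1)-character context.
        self.context_counts = Counter(g[:-1] for g in grams)
        # Training alphabet (plus padding space) and one slot for unseen characters.
        self.vocab_size = len(set(corpus) | {" "}) + 1

    def prob(self, gram: str) -> float:
        """Smoothed P(last char | preceding n-1 chars)."""
        num = self.ngram_counts[gram] + 1
        den = self.context_counts[gram[:-1]] + self.vocab_size
        return num / den

    def perplexity(self, text: str) -> float:
        """exp of the mean negative log-probability of the text's n-grams."""
        grams = char_ngrams(text, self.n)
        if not grams:
            raise ValueError("text must be non-empty")
        log_sum = sum(math.log(self.prob(g)) for g in grams)
        return math.exp(-log_sum / len(grams))


def language_distance(corpus_a: str, corpus_b: str, n: int = 3) -> float:
    """Assumed symmetric distance: mean of the two cross-perplexities."""
    pp_ab = CharNgramModel(corpus_a, n).perplexity(corpus_b)
    pp_ba = CharNgramModel(corpus_b, n).perplexity(corpus_a)
    return (pp_ab + pp_ba) / 2
```

In use, each argument would be the full text of one language at one historical period, and a lower value means the two are closer. Because character-level perplexity reacts directly to orthography and to corpus size, a measure like this depends on how the corpora are spelled and balanced.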


