Journal

Large Scale Linguistic Processing of Tweets to Understand Social Interactions among Speakers of Less Resourced Languages: The Basque Case

Social networks like Twitter are increasingly important in the creation of new ways of communication. They have also become useful tools for social and linguistic research due to the massive amounts of public textual data available. This is particularly important for less resourced languages, as it allows current natural language processing techniques to be applied to large amounts of unstructured data. In this work, we study the linguistic and social aspects of young and adult people’s behaviour based on their tweets’ contents and the social relations that arise from them.

Measuring the Effect of Different Types of Unsupervised Word Representations on Medical Named Entity Recognition

LIHLITH: Learning to Interact with Humans by Lifelong Interaction with Humans

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages

This paper presents a successful domain adaptation of a general neural machine translation (NMT) system using a bilingual corpus created with captions for images in Wikimedia Commons for the Spanish-Basque and English-Irish pairs.

Keywords: Machine Translation, Low-resource languages, Bilingual corpora, Language resources from Wikipedia

Interpretable Deep Learning to Map Diagnostic Texts to ICD10 Codes

Background: Automatic extraction of morbid diseases or conditions contained in death certificates is a critical process, useful for billing, epidemiological studies and comparison across countries. Because these clinical documents are written in ordinary natural language, automatic coding is difficult: spontaneous terms often diverge strongly from standard reference terminology such as the International Classification of Diseases (ICD).

Survey on Evaluation Methods for Dialogue Systems

Smoothing dense spaces for improved relation extraction between drugs and adverse reactions

Literal occurrences of Multiword Expressions: rare birds that cause a stir

Multiword expressions (MWEs) can have both idiomatic and literal occurrences. For instance, "pulling strings" can be understood either as making use of one’s influence or literally. Distinguishing these two cases has been addressed in linguistic and psycholinguistic studies, and is also considered one of the major challenges in MWE processing. We suggest that literal occurrences should be considered in both semantic and syntactic terms, which motivates their study in a treebank.