Syntax-Morphosyntax

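Testu-corpusen informazio morfosintaktikoaren etiketatze automatikoa hizkuntz ezagutzan oinarrituz: zenbait arazo, hainbat erronka // Automatic tagging of morphosyntactic information in text corpora based on linguistic knowledge: some problems, several challenges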

After years of work on disambiguating Basque corpora tagged at the morphosyntactic level, in this article we report on several difficulties encountered along the way and, along with that, explain the need to reconsider several criteria. The context being computational linguistics, the methodology we have used is that of rule-based grammars, that is, one carried out by drawing on linguistic information.

Morfeus+: Word Parsing in Basque beyond Morphological Segmentation

This work describes the formalization of a word structure grammar that represents the complex morphological and morphosyntactic information embedded within the word forms of an agglutinative language (Basque). It gives a comprehensive linguistic description of the main morphological phenomena, such as affixation, derivation, and composition, and also models both standard and non-standard words. We have identified the relevant issues to be addressed in the representation of such a grammar.

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study

This is a summary of the PhD thesis written by Uxoa Iñurrieta under the supervision of Dr. Gorka Labaka and Dr. Itziar Aduriz. Full title of the PhD thesis in Basque: "Izena+aditza Unitate Fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala". The defense was held in San Sebastian on November 29, 2019. The doctoral committee consisted of Ricardo Etxepare (Centre National de la Recherche Scientifique), Margarita Alonso (Universidad de Coruña) and Miren Azkarate (University of the Basque Country).

Annotation guidelines for the Fact-Ita Bank Negation corpus

Fact-Ita Bank for FactA@EVALITA 2016 has been enriched with a new level of annotation, namely negation cues, their scope and their focus. Here we present the guidelines for negation information annotation.

Aditza+izena Unitate Fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala // Verb+Noun Multiword Expressions: A linguistic analysis for identification and translation

Multiword Expressions (MWEs) are idiomatic word combinations that are particular to each language. For Natural Language Processing (NLP) tools to produce quality results, such expressions must be handled properly, but that task involves several difficulties; among others, the lack of word-for-word translatability. In this thesis, we carried out a linguistic analysis of verb+noun MWEs, to help address two important problems they raise in the field of NLP: first, identifying MWEs automatically in corpora, and second, those MWEs between Spanish and Basque

The DISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective Detection

In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection. In this paper we review the data included in the task, which cover 2.6 million manually annotated tokens from 15 datasets in 10 languages, survey and compare submitted systems and report on system performance on each task for both annotated and plain-tokenized versions of the data.

Literal occurrences of Multiword Expressions: rare birds that cause a stir

Multiword expressions can have both idiomatic and literal occurrences. For instance, pulling strings can be understood either as making use of one’s influence, or literally. Distinguishing these two cases has been addressed in linguistic and psycholinguistic studies, and is also considered one of the major challenges in MWE processing. We suggest that literal occurrences should be considered in both semantic and syntactic terms, which motivates their study in a treebank.

Unitate Fraseologikoen agerpen literalak, urre baina urri // Literal occurrences of Multiword Expressions, golden but scarce

Many multiword expressions can be understood both idiomatically and literally. For example, the Basque expression ziria sartu can have two meanings depending on the context: to deceive someone, or literally to insert a peg somewhere. In this work, we report on a corpus-based multilingual study, and we show, first, that such word combinations are very rarely used literally in practice, and second, that the idiomatic-literal distinction
