Measuring diachronic language distance using perplexity. Application to English, Portuguese and Spanish.

The objective of this work is to establish a corpus-driven methodology for automatically quantifying diachronic language
distance between chronological periods of several languages. We apply a perplexity-based measure to written text representing
different historical periods of three languages: European English, European Portuguese and European Spanish. For this
purpose, we have built historical corpora for each period, compiled from different open corpus sources
containing texts as close as possible to their original spelling. The results of our experiments show that a diachronic
language distance based on perplexity detects the linguistic evolution that had already been explained by the historians
of the three languages. Notably, it is an unsupervised multilingual method that only needs
raw corpora organized by periods.
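
As a rough illustration of how a perplexity-based distance between two periods might be computed, the sketch below trains a character n-gram model on the text of one period and measures its perplexity on the text of another. The abstract does not specify the model, so the trigram order, the add-one smoothing, the symmetric averaging, and all function names are assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def train_char_ngrams(text, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    padded = " " * (n - 1) + text
    ngrams = Counter(padded[i:i + n] for i in range(len(text)))
    contexts = Counter(padded[i:i + n - 1] for i in range(len(text)))
    return ngrams, contexts, set(text)


def perplexity(model, text, n=3):
    """Perplexity of `text` under `model`, with add-one smoothing (an assumption)."""
    if not text:
        raise ValueError("text must not be empty")
    ngrams, contexts, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen characters
    padded = " " * (n - 1) + text
    log_sum = 0.0
    for i in range(len(text)):
        gram = padded[i:i + n]
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + v)
        log_sum += math.log(prob)
    return math.exp(-log_sum / len(text))


def period_distance(text_a, text_b, n=3):
    """Hypothetical symmetric distance: mean of the two cross-perplexities."""
    model_a = train_char_ngrams(text_a, n)
    model_b = train_char_ngrams(text_b, n)
    return (perplexity(model_a, text_b, n) + perplexity(model_b, text_a, n)) / 2
```

With one raw text per period, calling `period_distance` on every pair of periods would give a distance matrix for a language. Higher values indicate that a model of one period predicts the other period's spelling less well.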

Authors (IXA members): 
Authors: 
José Ramom Pichel, Pablo Gamallo, Iñaki Alegria

Year: 
2019
Publication place: 

Natural Language Engineering

ISBN: 
ISSN 1351-3249 (Print), 1469-8110 (Online)
