Integrating Croatian into Concepticon: a Corpus-Based Frequency Mapping of Croatian Vocabulary

Authors

DOI:

https://doi.org/10.15475/calcip.2026.1.6

Keywords:

Concepticon, Croatian, word frequency, dataset

Abstract

This study presents a frequency-derived Croatian wordlist mapped to Concepticon concept sets, based on the most frequent nouns, verbs, and adjectives extracted from the hrWaC web corpus. The resulting dataset links corpus-based Croatian vocabulary to Concepticon's cross-linguistic framework and includes lexicalizations in nine additional languages for each mapped item.

Published

2026-06-08

How to Cite

Integrating Croatian into Concepticon: a Corpus-Based Frequency Mapping of Croatian Vocabulary. (2026). Computer-Assisted Language Comparison in Practice: Tutorials on Computational Approaches to the History and Diversity of Languages, 9(1), 55–64. https://doi.org/10.15475/calcip.2026.1.6