Integrating Croatian into Concepticon: a Corpus-Based Frequency Mapping of Croatian Vocabulary
DOI:
https://doi.org/10.15475/calcip.2026.1.6
Keywords:
Concepticon, Croatian, word frequency, dataset
Abstract
This study presents a Croatian frequency-derived wordlist mapped to Concepticon concept sets, based on the most frequent nouns, verbs, and adjectives extracted from the hrWaC web corpus. The resulting dataset connects corpus-based Croatian vocabulary to Concepticon's cross-linguistic framework and includes lexicalizations from nine additional languages for each mapped item.
Published
2026-06-08
Issue
Section
Data Note
License
Copyright (c) 2026 Copyright remains with the author.

This work is licensed under a Creative Commons Attribution 4.0 International License.
As a general rule, all articles in this journal are published under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.
How to Cite
Integrating Croatian into Concepticon: a Corpus-Based Frequency Mapping of Croatian Vocabulary. (2026). Computer-Assisted Language Comparison in Practice: Tutorials on Computational Approaches to the History and Diversity of Languages, 9(1), 55-64. https://doi.org/10.15475/calcip.2026.1.6