Dictionary of Croatian Idioms

A born-digital, corpus-driven, open-access dictionary of Croatian idioms

Project goal

The goal of the project is to create an open-access dictionary of Croatian idioms based on data from a large electronic corpus. The resulting dictionary will give a wide range of users and researchers access to current idiomatic usage in Croatian.

Corpus

The dictionary is based on the Croatian web corpus hrWaC (1.2 billion words). Using a large electronic corpus to compile a dictionary is in line with one of the key principles of modern-day lexicography: we can obtain reliable linguistic data by observing language in use.

18 April

CLASSLA-Express 2.0: Workshops on using CLARIN.SI corpora, resources and AI tools in language research

The workshops focus on testing how large language models (LLMs) perform linguistic tasks and on determining which tasks are better suited to traditional corpus-based research methods.
The workshops will be held between April and November 2025 in Klagenfurt, Zagreb, Graz, Rijeka and Bled.
Website: https://www.clarin.si/info/k-centre/workshops/classla-express/
The CLASSLA-Express team: Ivana Filipović Petrović, Jelena Parizoska, Taja Kuzman and Nikola Ljubešić

17 April

Corpus and AI-based research of Phraseme Constructions

Zagreb Training School
Convenors: Jelena Parizoska and Ivana Filipović Petrović
Faculty of Humanities and Social Sciences, University of Zagreb, 15 April 2025
COST Action 22115 – A Multilingual Repository of Phraseme Constructions in Central and Eastern European Languages (PhraConRep)

11 October

Filipović Petrović, Ivana; Beliga, Slobodan. “Lexicographic treatment of idioms and large language models: what will rise to the surface?”

Workshop Large Language Models and Lexicography. The 21st EURALEX International Congress Lexicography and Semantics, Cavtat, Croatia, 8 October 2024

29 September

Parizoska, Jelena; Filipović Petrović, Ivana. “Poredbeni frazemi u korpusu i digitalnom frazeološkom rječniku [Similes in the corpus and online dictionary of idioms]”

International conference Izzivi in priložnosti frazeologije, tudi v novem, digitalnem svetu, University of Ljubljana, Slovenia, 20 September 2024

28 September

Round table on the use of Large Language Models (LLMs) in corpus linguistic research

Ivana Filipović Petrović and Jelena Parizoska participated in a panel discussion on the use of Large Language Models (LLMs) in corpus linguistic research. The event was part of the pre-conference program at the Language Technologies and Digital Humanities Conference (JTDH 2024) held in Ljubljana on 18 September 2024.

19 October

Ljubešić, Nikola; Kuzman, Taja; Filipović Petrović, Ivana; Parizoska, Jelena; Osenova, Petya. “CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route”

CLARIN Annual Conference 2024, Barcelona, Spain, 16 October 2024