Dictionary of Croatian Idioms

A born-digital, corpus-driven, open-access dictionary of Croatian idioms

Project goal

The goal of the project is to create an open-access dictionary of Croatian idioms based on data from a large electronic corpus. The resulting dictionary will give a wide range of users and researchers access to current idiomatic usage in Croatian.

Corpus

The dictionary is based on the Croatian web corpus hrWaC (1.2 billion words). Using a large electronic corpus to compile a dictionary is in line with one of the key principles of modern-day lexicography: we can obtain reliable linguistic data by observing language in use.

11 October

Filipović Petrović, Ivana; Kocijan, Kristina. “Creating the Dataset of Croatian Verbal Idioms – Automatic Identification in a Corpus and Lexicographic Implementation”

The 21st EURALEX International Congress Lexicography and Semantics, Cavtat, Croatia, 11 October 2024

11 October

Filipović Petrović, Ivana; Beliga, Slobodan. “Lexicographic treatment of idioms and large language models: what will rise to the surface?”

Workshop Large Language Models and Lexicography. The 21st EURALEX International Congress Lexicography and Semantics, Cavtat, Croatia, 8 October 2024

29 September

Parizoska, Jelena; Filipović Petrović, Ivana. “Poredbeni frazemi u korpusu i digitalnom frazeološkom rječniku [Similes in the corpus and online dictionary of idioms]”

International conference Izzivi in priložnosti frazeologije, tudi v novem, digitalnem svetu, University of Ljubljana, Slovenia, 20 September 2024

28 September

Round table on the use of Large Language Models (LLMs) in corpus linguistic research

Ivana Filipović Petrović and Jelena Parizoska participated in a panel discussion on the use of Large Language Models (LLMs) in corpus linguistic research. The event was part of the pre-conference program at the Language Technologies and Digital Humanities Conference (JTDH 2024) held in Ljubljana on 18 September 2024.

28 March

CLASSLA-Express workshops

The CLASSLA-Express workshop series focused on using CLARIN.SI corpora in language research. The workshops were held between April and September 2024 in Zagreb, Rijeka, Belgrade, Skopje, Sofia and Ljubljana.
Website: https://www.clarin.si/info/k-centre/workshops/classla-express/
The CLASSLA-Express team: Ivana Filipović Petrović, Jelena Parizoska, Taja Kuzman and Nikola Ljubešić

19 June

Filipović Petrović, Ivana. 5th Summer Datathon on Linguistic Linked Open Data (June 11-16, 2023).

Ivana Filipović Petrović participated in the 5th Summer Datathon on Linguistic Linked Open Data. This edition was supported by the Nexus Linguarum COST Action. Ivana’s team received the Best Miniproject Award.