A born-digital, corpus-driven, open-access dictionary of Croatian idioms
The goal of the project is to create an open-access dictionary of Croatian idioms based on data from a large electronic corpus. The resulting dictionary will serve as a gateway for a large number of users and researchers to current idiomatic usage of Croatian.
The dictionary is based on the Croatian web corpus hrWaC (1.2 billion words). Using a large electronic corpus to compile a dictionary is in line with one of the key principles of modern-day lexicography: we can obtain reliable linguistic data by observing language in use.
Filipović Petrović, Ivana; Kocijan, Kristina. “Creating the Dataset of Croatian Verbal Idioms – Automatic Identification in a Corpus and Lexicographic Implementation”
The 21st EURALEX International Congress Lexicography and Semantics, Cavtat, Croatia, 11 October 2024
Filipović Petrović, Ivana; Beliga, Slobodan. “Lexicographic treatment of idioms and large language models: what will rise to the surface?”
Workshop Large Language Models and Lexicography. The 21st EURALEX International Congress Lexicography and Semantics, Cavtat, Croatia, 8 October 2024
Parizoska, Jelena; Filipović Petrović, Ivana. “Poredbeni frazemi u korpusu i digitalnom frazeološkom rječniku [Similes in the corpus and online dictionary of idioms]”
International conference Izzivi in priložnosti frazeologije, tudi v novem, digitalnem svetu, University of Ljubljana, Slovenia, 20 September 2024
Round table on the use of Large Language Models (LLMs) in corpus linguistic research
Ivana Filipović Petrović and Jelena Parizoska participated in a panel discussion on the use of Large Language Models (LLMs) in corpus linguistic research. The event was part of the pre-conference program at the Language Technologies and Digital Humanities Conference (JTDH 2024) held in Ljubljana on 18 September 2024.
CLASSLA-Express workshops
The CLASSLA-Express workshop series focused on using CLARIN.SI corpora in language research. The workshops were held between April and September 2024 in Zagreb, Rijeka, Belgrade, Skopje, Sofia and Ljubljana.
Website: https://www.clarin.si/info/k-centre/workshops/classla-express/
The CLASSLA-Express team: Ivana Filipović Petrović, Jelena Parizoska, Taja Kuzman and Nikola Ljubešić
Filipović Petrović, Ivana. 5th Summer Datathon on Linguistic Linked Open Data, 11-16 June 2023
Ivana Filipović Petrović participated in the 5th Summer Datathon on Linguistic Linked Open Data. This edition was supported by the Nexus Linguarum COST Action. Ivana’s team received the Best Miniproject Award.