Project summary

The goal of the project is to create an open-access online dictionary of Croatian idioms based on corpus data. The resulting dictionary will serve as a gateway to current Croatian idiomatic usage for a large number of users – native speakers and learners of Croatian as well as researchers.

The dictionary is based on the Croatian web corpora hrWaC (1.2 billion words) and CLASSLA-web.hr (2.1 billion words), searched using Sketch Engine. Using corpora to compile dictionaries is in line with one of the key principles of modern-day lexicography: we can obtain reliable data by observing language in use. The Online Dictionary of Croatian Idioms is a corpus-driven dictionary: it represents what we have learned about idioms, their meanings and their usage in the two corpora. The dictionary has been compiled by searching for co-occurrences of words that have figurative meanings. Two main criteria were applied for the inclusion of an idiom and its variant forms: how typical a word combination is and how frequently it occurs in the corpora.
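The two inclusion criteria can be illustrated with a minimal sketch. It assumes that typicality is measured with logDice, the association score shown in Sketch Engine word sketches; the summary itself does not name the measure used. The function names and both thresholds (`min_freq`, `min_score`) are hypothetical values chosen only for illustration.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy is the co-occurrence frequency of the two words,
    f_x and f_y are their individual corpus frequencies.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def is_candidate(f_xy: int, f_x: int, f_y: int,
                 min_freq: int = 10, min_score: float = 5.0) -> bool:
    """Apply both criteria: frequency and typicality.

    The thresholds are hypothetical, not the project's actual cut-offs.
    """
    return f_xy >= min_freq and log_dice(f_xy, f_x, f_y) >= min_score


# A combination that is both frequent and typical passes.
print(is_candidate(50, 100, 100))
# A frequent enough combination of two very common words does not.
print(is_candidate(10, 1_000_000, 1_000_000))
```

A score of this kind rewards combinations whose words occur together much more often than their individual frequencies would suggest, which is why both criteria are checked separately.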

The Online Dictionary of Croatian Idioms has been compiled in Lexonomy, a dictionary-writing and publishing system. The dictionary boasts a number of features characteristic of e-lexicography. Thus, each idiom is a headphrase in its own right and the entries are arranged alphabetically according to the first word of the expression. On entering a specific word in the search box, the search yields all the idioms containing it. Furthermore, each idiom can be found by typing any of its components. Entries may also be accessed via hyperlinks.
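The search behaviour described above, where any component word retrieves every idiom containing it, can be sketched as a simple inverted index. This is only an illustration under stated assumptions, not a description of how Lexonomy implements search; the function names and the sample entries are hypothetical and are not taken from the dictionary.

```python
from collections import defaultdict


def build_index(idioms):
    """Map each lowercased component word to the set of idioms containing it."""
    index = defaultdict(set)
    for idiom in idioms:
        for word in idiom.lower().split():
            index[word].add(idiom)
    return index


def search(index, word):
    """Return all idioms containing the word, in alphabetical order."""
    return sorted(index.get(word.lower(), set()))


# Illustrative sample entries (hypothetical, not the dictionary's headphrases).
SAMPLE_IDIOMS = [
    "baciti oko na koga",
    "držati jezik za zubima",
    "imati oko za što",
]

index = build_index(SAMPLE_IDIOMS)
print(search(index, "oko"))
```

A real implementation would also need to handle inflected forms, since Croatian words change shape within idioms; the sketch matches exact word forms only.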

In addition to the standard dictionary features, the Online Dictionary of Croatian Idioms presents real usage to its users in three novel ways: (1) text boxes with detailed explanations of usage, (2) text boxes showing how speakers play with idioms and change them deliberately, and (3) in some entries, cross-references to other idioms with similar and/or opposite meanings.

The Online Dictionary of Croatian Idioms is part of ELEXIS – European Lexicographic Infrastructure. The project was funded by the Foundation of the Croatian Academy of Sciences and Arts, the Zergollern – Čupak Foundation, and the Ministry of Science and Education of the Republic of Croatia.