European Statistics Awards on Web Intelligence

The first European Statistics Awards on web intelligence have paved the way for more accurate Web Intelligence Hub (WIH) data. With more than 112,000 online job advertisements analyzed by participants from 17 countries, the competition tackled the important challenge of data deduplication: avoiding double counting of the same online job advertisement scraped from different web portals. Congratulations to our winners, who demonstrated exceptional analytical skills. Stay tuned for more updates and help spread the word about the competition.

date:  23/04/2024

Eurostat has recently announced the results of the first European Statistics Awards on web intelligence, showcasing advancements in online data analytics. In response to the Online Job Advertisement (OJA) deduplication challenge, participants proposed innovative methods that help improve the quality of statistics based on web data.

The competition's overall objective is to foster innovation across various data domains and data production stages. The web intelligence challenge addressed the critical issue of identifying duplicate job advertisements, so that identical job postings are not counted twice. Competitors used algorithms written in R and Python to analyze a complex, multilingual dataset of over 112,000 job advertisements from approximately 400 European websites. The dataset was enriched with intentional duplicates across languages to simulate real-world scenarios.
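To make the deduplication task concrete, here is a minimal Python sketch of one common approach: comparing advertisements by the overlap of their word sequences (Jaccard similarity on word shingles). It is an illustration only, not a method used by the organizers or by any team. The sample advertisements, the function names, the shingle size and the 0.5 threshold are all assumptions chosen for the example. This approach only catches near-identical text in the same language, so it would not detect the cross-language duplicates that the challenge dataset contained.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts are represented by a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(ads, threshold=0.5):
    """Return pairs of ad ids whose shingle sets reach the similarity threshold.

    Compares every pair, so this is only practical for small collections.
    """
    sets = {ad_id: shingles(text) for ad_id, text in ads.items()}
    pairs = []
    for id_a, id_b in combinations(sorted(sets), 2):
        if jaccard(sets[id_a], sets[id_b]) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical sample data: a1 and a2 are the same ad scraped from two portals.
ads = {
    "a1": "Data analyst wanted in Berlin, full time, SQL and Python required.",
    "a2": "Data Analyst wanted in Berlin - full time. SQL and Python required!",
    "a3": "Experienced nurse for night shifts at a hospital in Athens.",
}
duplicate_pairs = find_duplicates(ads)
print(duplicate_pairs)
```

Comparing every pair of advertisements scales quadratically, so a dataset of the size used in the competition would need a more scalable candidate-selection step before any pairwise comparison.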

Interest in the competition was impressive, with 69 teams and 137 individuals from 17 countries participating in the European Statistics Awards on Web Intelligence. The best teams solving this difficult data processing challenge were rewarded in three categories: Accuracy, for those who identified the largest number of duplicates; AccuracyPlus, for the discovery of extra cases of potential duplicates (going beyond the intentional duplicates); and Reproducibility, for the solutions most suitable for regular data production.

Our winners, representing countries across Europe, demonstrated exceptional skill and innovation. 'TwoTired' from Germany took first prize for the Accuracy Award, while 'Smrek' from Slovakia won the AccuracyPlus Award. 'TheDeDuplicators', a team from Germany and Greece, took the top spot for the Reproducibility Award.

We look forward to the next challenge, which is coming up soon: classifying online job advertisements by occupation. Learn more at: European Statistics Awards - Web Intelligence - Deduplication Challenge | Announcements (statistics-awards.eu).
