European Commission

European Statistics Awards to reward best algorithms for identifying OJA duplicates

The European Statistics Awards Programme (https://statistics-awards.eu/competitions/4) aims to discover promising methodologies for processing content from the World Wide Web in order to extract valuable data for statistical and analytical purposes.

date:  21/08/2023

The goal of the Online Jobs Advertisements (OJA) Deduplication Challenge is to determine whether published job advertisements are potential double entries, also known as "duplicates", so that the same job advertisement is not counted multiple times. The challenge is based on a dataset provided by the Web Intelligence Hub (WIH).

Why focus on deduplication? Well, it is important that duplicated entries are identified and removed from calculations in order to produce meaningful statistics.

Given the data collection method and the size of the datasets, we need to develop an efficient and robust automated solution for this purpose, based on scripted methodologies and algorithms.
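
To give a flavour of what such an automated solution might look like, here is a minimal sketch in Python. It is not the method used in the challenge or by any participant: it assumes that each advertisement is just an identifier plus free text, and it flags two advertisements as potential duplicates when their sets of three-word sequences ("shingles") overlap strongly, measured by Jaccard similarity. The function names, the sample advertisements and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences in a text (lower-cased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts are represented by a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicate_pairs(ads, threshold=0.5):
    """Return pairs of ad ids whose shingle sets have similarity >= threshold.

    `ads` maps a (hypothetical) ad id to its text. The threshold is an
    illustrative assumption, not a value taken from the challenge.
    """
    sets = {ad_id: shingles(text) for ad_id, text in ads.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


# Made-up sample data for illustration only.
sample_ads = {
    "a": "Senior data engineer wanted in Berlin, full time position",
    "b": "Senior data engineer wanted in Berlin - full time position!",
    "c": "Part time barista needed for a cafe in Lisbon",
}
duplicate_pairs = find_duplicate_pairs(sample_ads)
print(duplicate_pairs)
```

Comparing every pair of advertisements, as this sketch does, becomes slow on large datasets. A real solution would likely need to narrow down the candidate pairs first, for example by grouping advertisements by employer or location before comparing them.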

This challenge is therefore stimulating innovation in the area of web intelligence for statistics. It also aims to benchmark different approaches to help process and analyse OJA using state-of-the-art technologies.

Several awards are up for grabs in the challenge, including two linked to accuracy and one for reproducibility. Applications and entries for the Deduplication Challenge have already closed, but stay tuned for the announcement of the winners on 20 October 2023!

The European Statistics Awards is also planning to launch a new challenge linked to classification. Keep an eye out for more details about this challenge, which is expected to launch after the summer.

Thank you for reading!