Eurostat is pleased to announce the Web Intelligence Competition as part of the European Statistics Awards Program. The competition aims at stimulating innovation in the area of Web Intelligence for European statistics. 

The Web Intelligence DEDUPLICATION CHALLENGE will focus on identifying potential duplicate job postings on websites, a basic condition for producing high-quality statistics from online job advertisements. 

  • The competition will be launched in the second half of December 2022.
  • Registration for participation will be open until 1 March 2023.
  • The deadline for submissions will be 31 March 2023 or 16 April 2023, depending on the chosen award category. Teams can compete in three different categories.

 

Poster: Web Intelligence Competition - Deduplication challenge


The Challenge 

Online job advertisements are typically published on job posting websites, company websites and job seeker portals. They advertise a job opening and try to attract potential job seekers. Job advertisements typically contain information about the company offering a job, a description of the job, requirements for skills and competences of the potential candidate, and benefits of a job.

Companies often publish job advertisements on different web portals. Web portals might also republish job advertisements found elsewhere. To be able to calculate meaningful statistics based on data from online job advertisements, duplicate advertisements must be identified and removed. Given the specific method of data collection and the large volume of data, an efficient and robust automated solution needs to be developed.  
 
The Web Intelligence DEDUPLICATION CHALLENGE aims at identifying potential duplicates from a large set of job advertisements in order to avoid the risk of overestimating the number of jobs available on the job market.  
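
To make the task concrete, here is a minimal sketch of one common way to flag near-duplicate texts: compare the sets of three-word sequences ("shingles") of two advertisements using Jaccard similarity. This is an illustration only, not the competition's baseline algorithm or evaluation method. The function names, the sample advertisements and the 0.6 threshold are all assumptions made for this example.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_duplicates(ads, threshold=0.6):
    """Return (id, id) pairs whose shingle similarity reaches the threshold.

    `ads` maps an advertisement id to its text. The threshold is a
    hypothetical value chosen for this example, not an official one.
    """
    sets = {ad_id: shingles(text) for ad_id, text in ads.items()}
    pairs = []
    for id_a, id_b in combinations(sorted(sets), 2):
        if jaccard(sets[id_a], sets[id_b]) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Invented sample advertisements, for illustration only.
ads = {
    1: "Data analyst wanted in Luxembourg. Experience with Python and SQL required.",
    2: "Data analyst wanted in Luxembourg! Experience with Python and SQL is required.",
    3: "Experienced pastry chef for a busy hotel kitchen in Vienna.",
}

print(find_duplicates(ads))  # [(1, 2)]
```

Comparing every pair of advertisements grows quadratically with the size of the dataset, so a sketch like this would not scale to a large collection as written. It only shows the kind of similarity decision a deduplication method has to make.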
 
The evaluation of the results will focus on the approach used to identify duplicates (the methods and algorithms developed) and their robustness for general use on identically structured, randomly selected datasets. As part of the evaluation process, the script submitted by participants will be tested on a dataset specified by the evaluation panel and compared to the baseline algorithm. 
 
The competition is open to teams of up to five persons. Each team member must meet the competition’s eligibility requirements and sign a non-disclosure agreement before the team can access the dataset and make a valid submission. 
 
This is a great opportunity to apply your knowledge of Web Intelligence to a real-world situation and win:

  • €10 000 for the first place in the Accuracy award, 
  • €3 000 for the first place in the AccuracyPlus award,
  • €10 000 for the first place in the Reproducibility award. 

By taking first place in all three categories, your team could win up to €23 000.
 
Nine prizes will be awarded:

  • 3 for Accuracy, 
  • 3 for AccuracyPlus, i.e. solutions identifying duplicates not found by the organising committee, 
  • 3 for Reproducibility, i.e. the most reproducible and scalable solutions for regular production. 

 

Award            First place   Second place   Third place
Accuracy         €10 000       €4 000         €3 000
AccuracyPlus     €3 000        €2 000         €1 000
Reproducibility  €10 000       €4 000         €3 000

 

The teams compete on two dimensions: 

  • the ability of their model to provide accurate estimates
  • the potential of their methods to be replicated and extended to European statistical production. 


During the competition, a team can make up to 10 successful submissions, competing for the Accuracy and AccuracyPlus awards. To be eligible for the Reproducibility award, the teams should also submit a detailed description of the methodology used. This is counted as a separate submission. 

Teams will also have the opportunity to disseminate their work via the Eurostat communication channels and may be given a chance to present their solutions at events organised by Eurostat.  

The Web Intelligence Competition is part of the European Statistics Awards Program, which runs until the end of 2025. We plan to organise four annual rounds of competition on Web Intelligence. 

For more information

To learn more about the competition, the timing, the evaluation, and the awards, please visit the European Statistics Awards website.

Stay tuned for more information and follow us via our social media: Facebook, Instagram, Twitter, and LinkedIn.

If you have any queries, please visit our contact us page.