Big Data tools for data scientists

 

Course Leader

Marco PUTS

Target Group

IT professionals who support statisticians with big data infrastructure, whether local big data clusters or cloud solutions, and with the engineering of big data processing. Methodologists and statisticians with a strong IT background who are expected to handle big data infrastructures and unstructured data.

Entry Qualifications

  • Sound command of English. Participants should be able to make short interventions and to participate actively in discussions.
  • The participants should be computer literate and able to programme in R and/or Python

Objective(s)

  • Learn how to extract relevant information for statistical purposes from huge amounts of data

Contents

  • Big data clusters;
  • Cloud computing;
  • Hadoop and MapReduce;
  • Analyzing data in Hadoop with SQL: Hive;
  • Distributed programming with Spark;
  • NoSQL databases;
  • Techniques and tools for extracting data from the web

Expected Outcome

Participants will have a broad overview of modern, state-of-the-art techniques for managing and analyzing big data, and of the associated tools and infrastructure.

Training Methods

  • Presentations and lectures
  • Exchange of views/experiences on national practices
  • Exercises
 

Required Reading

None

Suggested Reading

Required Preparation

Participants should have at least basic programming knowledge, especially in Python and R. Knowledge of relational databases is strongly recommended.

Trainer(s)/Lecturer(s)

Marco PUTS (CBS Netherlands)

Martijn TENNEKES (CBS Netherlands)

Bjoern Ole MUSSMANN (CBS Netherlands)

Donato SUMMA (ISTAT)

 

Practical Information

When

  • 05-10-2021
  • 12-10-2021
  • 19-10-2021
  • 26-10-2021

Duration

4 sessions

Where

ONLINE

Organiser

ICON-INSTITUT Public Sector GmbH

Application via National Contact Point

Deadline: 09.08.2021