Overview

Contents:

  • Data types and their characteristics
  • Common functions of data science infrastructures
  • Storage, compute, and cloud infrastructures for data science
  • Concept of a data lake
  • Data pre-processing methods and selected tools
  • Time series and graph data, the respective DBMS, and query languages
  • Data analytics platforms
  • Data presentation and visualization
  • Data science workflows and selected infrastructure components

Key Information

Name                 Value
Contact              Philipp Wieder
Venue                online
Time                 Mondays and Fridays, 10:15 - 11:45
Language
Module               B.Inf.1231.Mp: Infrastrukturen für Data Science
SWS                  4
ECTS                 6
Presence time        56 hrs
Independent study    124 hrs

Learning Outcome

Upon completion of the course, students

  • understand the basic functions of data science infrastructures and their significance.
  • understand basic data types and their specifics.
  • understand the most important technical infrastructures for storing and processing data locally and in the cloud as well as their advantages and disadvantages in relation to data science applications.
  • can apply the concept of the data lake to basic data science problems.
  • are able to apply the different steps of data pre-processing to selected data sets.
  • can identify the characteristics of time series and graph data and are able to recall the functions of DBMSs designed for their processing.
  • can present the basic tasks of data analysis platforms and can describe them using examples.
  • can apply methods and tools for the presentation and visualisation of data.
  • can model basic data science workflows and are able to transfer their knowledge to basic data science projects.

Prerequisites

Python and basic database knowledge

Examination

In-class written exam (90 min.) or oral exam (approx. 30 min.)

Examination prerequisites:

Students complete 50% of the homework exercises.

Examination requirements:

Through the examination, students demonstrate that they are able to describe basic functions of (cloud-based) data science infrastructures as well as to specify and identify basic data types. Students can also prove their understanding of data lakes and can apply their knowledge of MapReduce and Hadoop in that particular context. They can analyse basic data pre-processing problems and sketch common solutions. Students can show that they understand time series and graph data as well as the corresponding DBMS and that they can present common tasks of data analysis platforms. Through the examination, students also demonstrate their ability to select appropriate methods for visualising data and show that they are able to create basic data science workflows.

Responsible

Prof. Dr. Philipp Wieder