Block Course: Big Data
The course is already full; it is no longer possible to join.
In recent years, new scalable infrastructure for storage and data analysis has been developed which has become known under the name Big Data. This type of technology has become indispensable for storing and analysing data in all kinds of application areas, from business to science.
The goal of this course is a basic understanding of Big Data technologies such as Hadoop, Hive, and Pig. Students will learn the fundamentals needed to handle this kind of data and to start working on large-scale problems.
Knowledge of elementary programming concepts will be helpful, as will familiarity with the Java language and infrastructure (for example, knowing what a jar is, how the classpath works, and how to use build tools like Maven). Be aware that a lack of such knowledge will increase the time the class demands.
Roughly, the following topics will be covered:
- Basic Hadoop and HDFS infrastructure.
- Writing MapReduce jobs.
- MapReduce streaming for interfacing with scripting languages like Python.
- Apache Hive and Pig for basic data extraction.
- Next-generation Big Data infrastructure like Spark and Flink.
- Basic feature extraction and transformation.
- Computing statistics like counts and trends.
A TUBIT account is needed to log in to our computers and to access ISIS!
If you have your own laptop and access to eduroam, it is possible to participate without a TUBIT account.
This course is an elective course in the Machine Learning I module.