Big Data & Hadoop with HDInsight


Learn how to build Hadoop solutions with HDInsight and develop applications that manage large volumes of data.


Duration: 40 hours

Level: 300


After completing this course, attendees will be able to:

  • Describe and understand what Big Data is.
  • Understand and master Hadoop.
  • Understand Big Data problems.
  • Install and configure HDInsight.
  • Understand and master HDFS, Map/Reduce and YARN.
  • Load and transform data using Hive, Pig and Sqoop.
  • Write applications with Map/Reduce and YARN.
  • Understand Data Science and Machine Learning.

“Very knowledgeable and good energy. Kept the class moving and engaged.”

WHEN and WHERE is this course running?

What does this course cover?

In this course, attendees learn how to build Hadoop solutions with HDInsight and develop applications that manage large volumes of data.

This course introduces students to the management of large volumes of data and to the installation and provisioning of Azure HDInsight. It covers most of the components in HDInsight to provide an in-depth understanding of Big Data and Hadoop with Microsoft HDInsight and Microsoft SQL Server. Attendees will gain job skills for developing solutions that handle large volumes of data.

Who is this course designed for?

This course is aimed at data and Business Intelligence professionals, managers, and any data management professional who wants to learn about the new tools for advanced data management.

Pre-requisites: What do you need to know?

Before attending this course, it is recommended that participants have at least basic experience in databases, data mining or Business Intelligence.

Attendees should know the SQL language.

All our courses can be delivered privately and tailored to your team's specific needs.

Contact us

Course outline

Module 01: Introduction to Big Data (2 hours)
  • Modern Data Warehouse Architectures
  • Microsoft Vision of Big Data
  • Big Data and Data Science
  • Hadoop 2.0: Ecosystem
  • Choosing Hadoop 2.0: use cases
  • Microsoft Analytics Platform System

Module 02: Hadoop 2.0 Basic Concepts (1 hour)
  • HDFS
  • Map/Reduce
  • Hadoop Ecosystem
  • HDInsight API

Lab 02: Running HDInsight samples (30 minutes)

Module 03: Azure HDInsight (1 hour)
  • Cloud
  • Functions and Limitations
  • Azure Blob Storage

Lab 03: HDInsight in Azure (30 minutes)

Module 04: Hadoop Distributed File System (2 hours)
  • Understanding HDFS
  • Storing data in HDFS
  • HDFS details

Lab 04: HDFS (1 hour)

Module 05: Execution Frameworks: From Map/Reduce to YARN (2 hours)
  • Map/Reduce Architecture
  • YARN Architecture
  • Run a Map/Reduce job on YARN
  • Executing Hadoop programs

Lab 05: Execution of Map/Reduce programs (1 hour)

Module 06: HDFS development and Map/Reduce (2 hours)
  • Framework for developing YARN & Map/Reduce applications
  • Java, C#
  • Developing Map/Reduce solutions
  • HDInsight SDK

Lab 06: Developing Hadoop applications (1 hour)

Module 07: Hive (3 hours)
  • Introduction to Hive
  • Moving data to and from Hive
  • Data manipulation with Hive
  • Hive features
  • Stinger Initiative
  • Apache Tez
  • Hive ODBC and Excel integration

Lab 07: Hive (90 minutes)

Module 08: Pig (2 hours)
  • Loading data
  • Transforming data
  • Dump

Lab 08: Pig (30 minutes)

Module 09: Sqoop (1 hour)
  • Introduction to Sqoop
  • Transferring data

Lab 09: Sqoop (30 minutes)

Module 10: Azure Machine Learning (2 hours)
  • Basic Machine Learning
  • Machine Learning Scenarios
  • What is Azure Machine Learning?
  • Azure Machine Learning components

Lab 10: Playing with Azure Machine Learning Samples (1 hour)

Module 11: Creating and Deploying Machine Learning Models (4 hours)
  • Create an Azure Machine Learning workspace
  • Azure Machine Learning Studio
  • Getting Data
  • Creating and Running experiments
  • Publishing experiments: Web Services Deployment

Lab 11: Creating your first experiment with Azure Machine Learning Studio (1 hour)

Optional 1: Visualizing Big Data with Power BI (1 hour)
  • Access Hadoop data from Power BI
  • Visualization & Integration

WHEN and WHERE is this course running next?

This course may be scheduled in more than one region. Please check availability in your country.

Check dates