Big Data & Hadoop with HDInsight
Learn how to implement Hadoop solutions with HDInsight and build applications that manage large volumes of data.
Duration: 40 hours
After completing this course, attendees will be able to:
- Describe and understand what Big Data is.
- Understand and master the skills of Hadoop.
- Understand Big Data problems.
- Install and configure HDInsight.
- Understand and master HDFS, Map/Reduce and YARN.
- Load and transform data using Hive / Pig / Sqoop.
- Write applications with Map/Reduce and YARN.
- Understand Data Science and Machine Learning.
“Very knowledgeable and good energy. Kept the class moving and engaged.”
What does this course cover?
In this course, attendees will learn how to implement Hadoop solutions with HDInsight and build applications that manage large volumes of data.
This course introduces students to the management of large volumes of data and to the installation and provisioning of Azure HDInsight. It covers most of the components inside HDInsight to provide an in-depth understanding of Big Data and Hadoop with Microsoft HDInsight and Microsoft SQL Server. Attendees will gain practical job skills for developing solutions that handle large volumes of data.
Who is this course designed for?
This course is aimed at data and Business Intelligence professionals and managers, and at any data management professional who wants to learn about the new tools for advanced data management.
Pre-requisites: What do you need to know?
Before attending this course, it is recommended that participants have at least basic experience in databases, data mining or Business Intelligence.
Attendees should know the SQL language.
All our courses can be offered as private deliveries and tailored to your team's specific needs.
Module 01: Introduction to Big Data (2 hours)
- Modern Data Warehouse Architectures
- Microsoft Vision of Big Data
- Big Data and Data Science
- Hadoop 2.0: Ecosystem
- Choosing Hadoop 2.0: use cases
- Microsoft Analytics Platform System
Module 02: Hadoop 2.0 Basic Concepts (1 hour)
- Hadoop Ecosystem
- HDInsight API
Lab 02: Running HDInsight samples (30 minutes)
Module 03: Azure HDInsight (1 hour)
- Functions and Limitations
- Azure Blob Storage
Lab 03: HDInsight in Azure (30 minutes)
Module 04: Hadoop Distributed File System (2 hours)
- Understanding HDFS
- Storing data in HDFS
- HDFS details
Lab 04: HDFS (1 hour)
Module 05: Execution Frameworks: From Map/Reduce to YARN (2 hours)
- Map/Reduce Architecture
- YARN Architecture
- Run a Map/Reduce job on YARN
- Executing Hadoop programs
Lab 05: Execution of Map/Reduce programs (1 hour)
Module 06: HDFS development and Map/Reduce (2 hours)
- Frameworks for developing YARN and Map/Reduce applications
- Java, C#
- Developing Map/Reduce solutions
- HDInsight SDK
Lab 06: Developing Hadoop applications (1 hour)
Module 07: HIVE (3 hours)
- Introduction to HIVE
- Movement of data from/to HIVE
- Data Manipulation with HIVE
- HIVE features
- Stinger Initiative
- Apache Tez
- HIVE ODBC and Excel Integration
Lab 07: HIVE (90 minutes)
Module 08: PIG (2 hours)
- Loading data
- Transforming data
Lab 08: Pig (30 minutes)
Module 09: SQOOP (1 hour)
- Introduction to SQOOP
- Transferring data
Lab 09: SQOOP (30 minutes)
Module 10: Azure Machine Learning (2 hours)
- Basic Machine Learning
- Machine Learning Scenarios
- What is Azure Machine Learning?
- Azure Machine Learning components
Lab 10: Playing with Azure Machine Learning Samples (1 hour)
Module 11: Creating and Deploying Machine Learning Models (4 hours)
- Create an Azure Machine Learning workspace
- Azure Machine Learning Studio
- Getting Data
- Creating and Running experiments
- Publishing experiments: Web Services Deployment
Lab 11: Creating your first experiment with Azure Machine Learning Studio (1 hour)
Optional 1: Visualizing Big Data with Power BI (1 hour)
- Access Hadoop data from Power BI
- Visualization & Integration