OD20775

MOC on Demand: Performing Data Engineering on Microsoft HDInsight

  • Course Price: $675
  • Audience: IT Professionals
  • Portfolio: MOC-ON-DEMAND
  • Related Exams:
  • Related Certifications:

Description

About MOC on Demand
MOC On Demand from OakTree puts a massive catalog of Microsoft courses online, right at your fingertips, from anywhere, at any time. Each MOC On Demand course is the perfect blend of video, text, and lab-style instruction, with knowledge checks throughout so that students can gauge their comprehension. Taking official Microsoft courses online has never been so simple.

Basic Course Package: $675.00 or 2 voucher days
Package Includes: Online Course, 90-day access to the course and labs.
*Does not include digital courseware.

Plus Course Package: $950.00 or 3 voucher days
Package Includes: Online Course, 90-day access to the course and labs, digital courseware.

Premium Course Package: $1250.00 or 4 voucher days
Package Includes: Online Course, 180-day access to the course and labs, digital courseware.

Registration
Once you register for this course, our friendly training staff will reply to your request within 1 business day to verify the course package of your choice. After a member of our staff has verified your payment details, you will receive your login credentials to begin taking the online Microsoft course.
___________________________________________________________________________________________________________________________________________

About this course
The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.

Audience profile
The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

At course completion
After completing this course, students will be able to:

  • Explain how Microsoft R
  • Transform and clean big data sets

Prerequisites
In addition to their professional experience, students who attend this course should have:
  • Programming experience using R and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.
  • Working knowledge of relational databases.

Course Outline

Module 1: Getting Started with HDInsight

This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
Lessons
  • Big Data
  • Hadoop
  • MapReduce
  • HDInsight
After completing this module, students will be able to:
  • Describe big data.
  • Describe Hadoop.
  • Describe MapReduce.
  • Describe HDInsight.

Module 2: Deploying HDInsight Clusters
At the end of this module, students will be able to deploy HDInsight clusters.
Lessons
  • HDInsight cluster types
  • Managing HDInsight Clusters
  • Managing HDInsight Clusters with PowerShell
After completing this module, students will be able to:
  • Describe HDInsight cluster types.
  • Describe the creation, management, and deletion of HDInsight clusters with the Azure portal.
  • Describe the creation, management, and deletion of HDInsight clusters with PowerShell.

Module 3: Authorizing Users to Access Resources
This module covers permissions and how to assign them.
Lessons
  • Non-domain-joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters
After completing this module, students will be able to:
  • Describe how to authorize user access to objects.
  • Describe how to authorize users to execute code.
  • Describe how to manage domain-joined HDInsight clusters.

Module 4: Loading Data into HDInsight
This module covers loading data into HDInsight.
Lessons
  • HDInsight Storage
  • Data loading tools
  • Performance and reliability
After completing this module, students will be able to:
  • Describe HDInsight storage configurations and architectures.
  • Describe options for loading data into HDInsight.
  • Describe the benefits of compression and pre-processing in HDInsight.

Module 5: Troubleshooting HDInsight
This module describes how to troubleshoot HDInsight.
Lessons
  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite
After completing this module, students will be able to:
  • Analyze HDInsight logs.
  • Analyze YARN logs.
  • Analyze heap dumps.
  • Use Operations Management Suite to monitor resources.

Module 6: Implementing Batch Solutions
This module describes how to implement batch solutions.
Lessons
  • Apache Hive storage
  • Querying with Hive and Pig
  • Operationalize HDInsight
After completing this module, students will be able to:
  • Describe Apache Hive storage.
  • Query data using Hive and Pig.
  • Operationalize HDInsight.

Module 7: Design Batch ETL Solutions for Big Data with Spark
This module describes how to design batch ETL solutions for big data with Spark.
Lessons
  • What is Spark?
  • ETL with Spark
  • Spark performance
After completing this module, students will be able to:
  • Describe Spark and when to use it.
  • Describe the use of ETL with Spark.
  • Analyze Spark performance.

Module 8: Analyze Data with Spark SQL
This module describes how to analyze data with Spark SQL.
Lessons
  • Implement interactive queries
  • Perform exploratory data analysis
After completing this module, students will be able to:
  • Implement interactive queries.
  • Perform exploratory data analysis.

Module 9: Analyze Data with Hive and Phoenix
This module describes how to analyze data with Hive and Phoenix.
Lessons
  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
After completing this module, students will be able to:
  • Implement interactive queries with interactive Hive.
  • Perform exploratory data analysis using Hive.
  • Perform interactive processing by using Apache Phoenix.

Module 10: Stream Analytics
This module introduces Azure Stream Analytics.
Lessons
  • Stream Analytics
  • Process streaming data with Stream Analytics
  • Managing Stream Analytics jobs
After completing this module, students will be able to:
  • Describe Stream Analytics and its capabilities.
  • Process streaming data with Stream Analytics.
  • Manage Stream Analytics jobs.

Module 11: Spark Streaming Using the DStream API
This module introduces the DStream API and describes how to create Spark structured streaming applications.
Lessons
  • DStream
  • Create Spark structured streaming applications
  • Persistence and visualization
After completing this module, students will be able to:
  • Explain DStream.
  • Create Spark structured streaming applications.
  • Describe persistence and visualization.

Module 12: Develop Big Data Real-Time Processing Solutions with Apache Storm
This module explains how to develop big data real-time processing solutions with Apache Storm.
Lessons
  • Persist long-term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm
After completing this module, students will be able to:
  • Persist long-term data.
  • Stream data with Storm.
  • Create Storm topologies.
  • Configure Apache Storm.
