CBT Nuggets - Apache Hadoop With Garth Schulte
The data revolution is upon us and Hadoop is THE leading Big Data platform. Fortune 500 companies are using it for storing and analyzing extremely large datasets, while other companies are realizing its potential and preparing their budgets for future Big Data positions. It's the elephant in Big Data's room!
Recommended skills:
Familiarity with Ubuntu Linux
Recommended equipment:
Ubuntu Linux 12.04 LTS operating system
Related job functions:
Big Data architects
Big Data administrators
Big Data developers
This series will get you up to speed on Big Data and Hadoop. Topics include how to install, configure and manage single-node and multi-node Hadoop clusters, configure and manage HDFS, write MapReduce jobs and work with many of the projects around Hadoop such as Pig, Hive, HBase, Sqoop, and ZooKeeper. Topics also include configuring Hadoop in the cloud and troubleshooting a multi-node Hadoop cluster.
Hadoop Series Introduction 00:24:54
Welcome to Hadoop! This Nugget explains the challenges BIG DATA poses, how Hadoop was designed to solve them, and the value unstructured data can bring to companies of all sectors and sizes. We'll also cover the state of data, how companies use BIG DATA, and a high-level overview of Hadoop and its core technologies.
Hadoop Technology Stack 00:19:37
The sheer number of technologies around Hadoop is enough to make even the bravest IT souls cringe. This Nugget is here to save the day! We'll cover the core, essential and upcoming Hadoop projects and see what a basic Hadoop implementation looks like.
Hadoop Distributed File System (HDFS) 00:23:49
This Nugget dives into the architecture and internal workings of HDFS! We'll cover all of the HDFS node types along with their responsibilities in a Hadoop cluster, talk about single and multi-rack cluster topologies, rack awareness, and how HDFS handles block management. We'll also cover some of the major user and administration tools used to interact with HDFS.
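As a rough illustration of block management, the sketch below splits a file into fixed-size blocks and counts how many replicas the cluster would hold. The 64 MB block size and replication factor of 3 are assumptions (common Hadoop 1.x defaults, both configurable), and `plan_blocks` is a hypothetical helper, not part of any HDFS API.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size: 64 MB
REPLICATION = 3                # assumed replication factor


def plan_blocks(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (block_count, last_block_bytes, total_replicas) for a file."""
    if file_size <= 0:
        return 0, 0, 0
    # Integer ceiling division: a partial trailing block still counts as a block.
    block_count = -(-file_size // block_size)
    last_block_bytes = file_size - (block_count - 1) * block_size
    return block_count, last_block_bytes, block_count * replication


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks, last, replicas = plan_blocks(200 * mb)
    print(f"{blocks} blocks, last block {last // mb} MB, {replicas} replicas")
```

Under these assumptions a 200 MB file occupies four blocks (three full, one 8 MB) and twelve block replicas spread across the cluster.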
Introduction to MapReduce 00:25:08
The world of distributed data processing and programming starts here! This Nugget begins by covering Hadoop's MapReduce architecture to see how the JobTracker and TaskTracker work together to serve up data stored across a cluster. We'll also cover MapReduce's internal phases and see how data flows through a MapReduce pipeline. This Nugget ends with a live demonstration that executes a MapReduce job across a local Hadoop cluster.
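To make the data flow concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word count. It only mimics the flow of key/value pairs; it does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group every emitted value under its key, sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big elephant"]))
```

In a real cluster the map and reduce calls run as separate tasks on different nodes, and the shuffle moves data between them over the network.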
Installing Apache Hadoop (Single Node) 00:29:56
Installation time, w00t! This Nugget will walk you through an installation of Hadoop in pseudo-distributed mode to create a single-node Hadoop cluster. We'll cover how to install and configure SSH, Java and Hadoop; how to start, verify and stop all of the Hadoop daemons; and time-saving tips along the way to make it as painless as possible.
Installing Apache Hadoop (Multi Node) 00:23:24
This Nugget will show you how to break out from a single node cluster to a fully distributed multi-node cluster. We'll cover what cluster configurations look like, how to configure master/slave nodes, and how to properly start and stop a multi-node cluster, along with some common cluster "stuff." Good times!
Troubleshooting, Administering and Optimizing Hadoop 00:38:58
This Nugget will get you up to speed on troubleshooting and tuning a cluster. We'll cover the troubleshooting process, walk through a demonstration on how to reproduce and fix a common installation issue and see where to look when issues occur. We'll also cover common administration tasks and walk through a demo on how to benchmark and tune a cluster using TeraGen/TeraSort.
Managing HDFS 00:25:26
Data! Data! Get your data here! We'll cover where to find sample data, big and small, for your Hadoop clusters, and how to push that data into HDFS and manage it with utilities such as dfsadmin and fsck. We'll also cover the upgrade process and how to configure rack awareness.
MapReduce Development 00:36:08
Take a journey through the magical land of MapReduce development! This Nugget will cover the development process and go through multiple live demonstrations on how to code, test, build, and run a MapReduce job on local filesystem data and against HDFS data in a live cluster.
Introduction to Pig 00:31:49
This Nugget will cover Pig, the data flow scripting language of Hadoop! You'll learn how Pig is a simple abstraction on top of MapReduce to quickly and easily write queries against HDFS data. We'll get the basics and components of Pig down, get familiar with the Pig Latin language, install and configure Pig, and see it in action with a demo. Oink Oink!
Developing with Pig 00:36:18
This Nugget is chock-full of Pig demonstrations! We'll cover how to load, store, filter, group, aggregate, and sort HDFS data interactively using Pig Latin within the grunt shell. We'll follow that up with batch processing by analyzing and executing complete Pig scripts where we cover how to join, combine and split our data flows, as well as write and implement our own custom user-defined functions.
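For readers new to data flow languages, the filter, group, aggregate, and sort steps can be sketched in plain Python as below. This is an analogy only, not Pig Latin or a Pig API; the `(user, category, amount)` record layout and the `total_by_category` helper are hypothetical.

```python
from collections import defaultdict


def total_by_category(rows, min_amount):
    """Filter rows, group them by category, sum each group, sort by total."""
    # Filter: keep only rows whose amount meets the threshold.
    kept = [row for row in rows if row[2] >= min_amount]
    # Group: collect the amounts under each category.
    groups = defaultdict(list)
    for _user, category, amount in kept:
        groups[category].append(amount)
    # Aggregate: one total per category.
    totals = [(category, sum(amounts)) for category, amounts in groups.items()]
    # Sort: largest total first.
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("ann", "books", 12.0),
        ("bob", "games", 30.0),
        ("cy", "books", 5.0),
        ("di", "games", 8.0),
        ("ed", "music", 20.0),
    ]
    for category, total in total_by_category(sample, min_amount=8.0):
        print(category, total)
```

Each step takes a collection and hands a new one to the next step, which is the same shape a Pig script gives to a data flow over HDFS data.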
Introduction to Hive 00:25:14
This Nugget covers Hive, the SQL of Hadoop! You'll learn how Hive is another simple abstraction on top of MapReduce that provides us with a familiar way to access HDFS data. We'll cover the components and architecture of Hive to see how it stores data in table-like structures over HDFS data. Also covered are the basics of HiveQL, the SQL-like query language used to query those structures, and installation and configuration of Hive. We'll end this Nugget with a live demo to see Hive and HiveQL in action. BZZZT!
Developing with Hive 00:27:39
This Nugget is chock-full of Hive demonstrations! We'll cover how to create external, internal, and partitioned Hive tables, load data from the local filesystem as well as the distributed filesystem (HDFS), set up dynamic partitioning, create views, and manage indexes.
Introduction to HBase 00:27:53
This Nugget covers HBase, the low-latency way of getting small, specific pieces of data out of Hadoop! We'll start with what HBase is all about and highlight the differences between row-oriented and column-oriented data stores. Also, we'll get familiar with the architecture of HBase, get it up and running in our Hadoop cluster, and even see it in action by creating, loading, and dropping an HBase table. Woohoo HBase!
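The row versus column distinction is easier to see with a tiny example. The sketch below pivots a handful of row-oriented records into a column-oriented layout. It is a simplification for illustration: HBase itself groups columns into column families and stores sparse cells, and both the sample records and `to_columns` are hypothetical.

```python
def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


if __name__ == "__main__":
    # Row-oriented: all fields of one record sit together.
    rows = [
        {"id": 1, "name": "ann", "city": "Oslo"},
        {"id": 2, "name": "bob", "city": "Rome"},
    ]
    # Column-oriented: all values of one field sit together.
    columns = to_columns(rows)
    print(columns["city"])
```

Reading a single field across many records touches one list in the column layout, whereas the row layout has to visit every record.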
Developing with HBase 00:24:35
Learn how to work with HBase data in this Nugget! We'll talk about the many ways to load and access HBase data, see how to configure a fully distributed HBase cluster, load data into an HBase table using Pig, query an HBase table using Hive, as well as pull a record of data out in real-time by starting and using a REST server.
Introduction to ZooKeeper 00:24:51
Learn how to coordinate distributed applications with ZooKeeper in this Nugget! We'll cover what ZooKeeper is, its architecture and internal data storage, and learn how to bring up an ensemble of ZooKeeper servers and store data within those servers.
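The internal data storage mentioned above is a hierarchical namespace of nodes (znodes) addressed by slash-separated paths, each able to hold a small payload. The toy in-memory model below sketches that idea only; `ZNodeTree` is a hypothetical class, not a ZooKeeper client API, and it ignores replication, watches, and ephemeral nodes.

```python
class ZNodeTree:
    """Toy in-memory model of a hierarchical, path-addressed node store."""

    def __init__(self):
        self._nodes = {"/": b""}  # the root node always exists

    def create(self, path, data=b""):
        """Create a node; its parent must already exist."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._nodes:
            raise ValueError(f"{path} already exists")
        self._nodes[path] = data

    def get(self, path):
        """Return the payload stored at a node."""
        return self._nodes[path]

    def children(self, path):
        """Return the names of the direct children of a node, sorted."""
        prefix = path.rstrip("/") + "/"
        names = []
        for node in self._nodes:
            suffix = node[len(prefix):]
            if node != "/" and node.startswith(prefix) and "/" not in suffix:
                names.append(suffix)
        return sorted(names)


if __name__ == "__main__":
    tree = ZNodeTree()
    tree.create("/app")
    tree.create("/app/config", b"mode=test")
    print(tree.children("/app"), tree.get("/app/config"))
```

A real ensemble keeps a replicated copy of this tree on every server, which is what lets distributed applications agree on shared state.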
Introduction to Sqoop 00:24:23
This Nugget will show you how to transfer data between Hadoop and relational database systems using Sqoop. We'll cover how to get Sqoop installed and configured, import data from a MySQL server into HDFS, import data from SQL Server into Hive, and export data from Hadoop into a SQL Server instance. Sqadoosh!
Local Hadoop: Cloudera CDH VM 00:17:16
This Nugget will show you how to get up and running with Hadoop and its projects quickly! Cloudera's quick-start virtual machine is a great way to jump in and start learning Hadoop without the hassle of fully configuring a Hadoop cluster. We'll cover what Cloudera CDH is all about and some of the unique tools it offers, how to obtain Cloudera's quickstart VM, get it up and running, and take a tour of the VM, including Cloudera Manager and Hue.
Cloud Hadoop: Amazon EMR 00:21:35
This Nugget will show you how to use Amazon Web Services (AWS) Elastic MapReduce (EMR) to run fully managed Hadoop jobs in the cloud! We'll cover what EMR is and see how it's built on top of EC2 for processing and S3 for storage. We'll also transfer data directly from HDFS to an S3 bucket and create an EMR job flow to process the data stored inside of S3. (side note: all of these acronyms combined = AWSEMREC2S3!)
Cloud Hadoop: Microsoft HDInsight 00:18:26
Are you a Microsoft shop looking for a Hadoop solution that seamlessly integrates with your existing technology stack? Look no further! This Nugget will get you up to speed on HDInsight, Microsoft's take on Hadoop both locally and in the cloud. We'll cover HDInsight from the Windows Azure cloud and see how to get a fully configured Hadoop cluster up and running in minutes. We'll also take a tour of the HDInsight web portal, run a few samples and even RDP into our cluster. Who knew elephants could fly?!
More Info: http://www.cbtnuggets.com/it-training-videos/course/cbtn_hadoop