Machine Learning with Apache Spark Foundation

Code: WA2610
Duration: 1 day
Difficulty level:
  • Beginner
19 500 Ft
(Gross price: 24 765 Ft)


To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data with its distributed compute capability and built-in machine learning library.
This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for machine learning. The course is supplemented by a variety of hands-on labs that help attendees reinforce their theoretical knowledge of the material.

This product is delivered as a voucher. After ordering, the voucher(s) will be available in your Dashboard (myLeapest). You will also receive the required software setup for installation within 48 hours of purchase.


The following contents are included with this product. For any questions about these contents, please contact the seller of this product.

Machine Learning with Apache Spark - Lecture eBook (English, en-US)

Machine Learning with Apache Spark - Lab Guide


Who should attend

Data Scientists, Business Analysts, Software Developers, IT Architects


Prerequisites

Participants should have a general knowledge of statistics and programming.



Course Outline

Chapter 1.

Machine Learning Algorithms

Supervised vs Unsupervised Machine Learning

Supervised Machine Learning Algorithms

Unsupervised Machine Learning Algorithms

Choose the Right Algorithm

Life-cycles of Machine Learning Development

Classifying with k-Nearest Neighbors (SL)

k-Nearest Neighbors Algorithm

The Error Rate

Decision Trees (SL)

Random Forests

Unsupervised Learning Type: Clustering

K-Means Clustering (UL)

K-Means Clustering in a Nutshell

Regression Analysis

Logistic Regression



Chapter 2.

Introduction to Functional Programming

What is Functional Programming (FP)?

Terminology: Higher-Order Functions

Terminology: Lambda vs Closure

A Short List of Languages that Support FP

FP with Java

FP with JavaScript

Imperative Programming in JavaScript

The JavaScript map (FP) Example

The JavaScript reduce (FP) Example

Using reduce to Flatten an Array of Arrays (FP) Example

The JavaScript filter (FP) Example

Common Higher-Order Functions in Python

Common Higher-Order Functions in Scala

Elements of FP in R
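
As a rough illustration of the higher-order functions this chapter covers, the sketch below shows map, filter, and reduce in Python, including the use of reduce to flatten a list of lists, plus a small closure. The variable and function names (numbers, nested, make_multiplier) are made up for this example and are not taken from the course material.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every element
squares = list(map(lambda n: n * n, numbers))

# filter: keep only the elements for which the predicate is true
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce: fold the whole sequence into a single value
total = reduce(lambda acc, n: acc + n, numbers, 0)

# reduce can also flatten a list of lists by concatenating them one by one
nested = [[1, 2], [3], [4, 5]]
flat = reduce(lambda acc, xs: acc + xs, nested, [])


def make_multiplier(factor):
    """Return a closure that remembers `factor` from the enclosing scope."""
    def multiply(n):
        return n * factor
    return multiply


triple = make_multiplier(3)
```

Here make_multiplier is a higher-order function because it returns a function, and triple is a closure because it keeps access to factor after make_multiplier has returned.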



Chapter 3.

Introduction to Apache Spark

What is Apache Spark

A Short History of Spark

Where to Get Spark?

The Spark Platform

Spark Logo

Common Spark Use Cases

Languages Supported by Spark

Running Spark on a Cluster

The Driver Process

Spark Applications

Spark Shell

The spark-submit Tool

The spark-submit Tool Configuration

The Executor and Worker Processes

The Spark Application Architecture

Interfaces with Data Storage Systems

Limitations of Hadoop's MapReduce

Spark vs MapReduce

Spark as an Alternative to Apache Tez

The Resilient Distributed Dataset (RDD)

Spark Streaming (Micro-batching)

Spark SQL

Example of Spark SQL

Spark Machine Learning Library

GraphX

Spark vs R



Chapter 4.

The Spark Shell

The Spark Shell UI

Spark Shell Options

Getting Help

The Spark Context (sc) and SQL Context (sqlContext)

The Shell Spark Context

Loading Files

Saving Files

Basic Spark ETL Operations



Chapter 5.

Spark Machine Learning Library

What is MLlib?

Supported Languages

MLlib Packages

Dense and Sparse Vectors

Labeled Point

Python Example of Using the Labeled Point Class

LIBSVM format

An Example of a LIBSVM File

Loading LIBSVM Files

Local Matrices

Example of Creating Matrices in MLlib

Distributed Matrices

Example of Using a Distributed Matrix

Classification and Regression Algorithm
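
To make the LIBSVM and sparse/dense vector topics in this chapter more concrete, here is a minimal plain-Python sketch that does not use Spark. It assumes the usual LIBSVM text layout of a label followed by one-based index:value pairs; the helper names parse_libsvm_line and to_dense are hypothetical and exist only for this example.

```python
def parse_libsvm_line(line):
    """Parse '<label> <index>:<value> ...' into (label, sparse features).

    LIBSVM files use one-based feature indices; they are shifted to
    zero-based here, which is also what MLlib does when loading such files.
    """
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)
    return label, features


def to_dense(features, size):
    """Expand a sparse {index: value} mapping into a dense list of length `size`."""
    return [features.get(i, 0.0) for i in range(size)]


# Sample line invented for illustration: label 1, features 1 and 4 are non-zero
label, sparse = parse_libsvm_line("1 1:0.5 4:2.0")
dense = to_dense(sparse, 5)
```

The sparse form stores only the non-zero entries, which is why it is preferred when most feature values are zero.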




Chapter 6.

Text Mining

What is Text Mining?

The Common Text Mining Tasks

What is Natural Language Processing (NLP)?

Some of the NLP Use Cases

Machine Learning in Text Mining and NLP

Machine Learning in NLP

TF-IDF


The Feature Hashing Trick


Example of Stemming

Stop Words

Popular Text Mining and NLP Libraries and Packages
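
The TF-IDF and feature hashing topics listed in this chapter can be sketched in plain Python as follows. This is only an illustration under stated assumptions: zlib.crc32 is a stand-in hash chosen so that results are reproducible, the IDF uses the smoothed form log((N + 1) / (df + 1)) described in the Spark MLlib documentation, and the sample documents and function names are invented for this example.

```python
import math
import zlib
from collections import Counter


def hashed_term_frequencies(tokens, num_features):
    """Hashing trick: map each token to a bucket and count tokens per bucket."""
    counts = Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in tokens)
    return [counts.get(i, 0) for i in range(num_features)]


def inverse_document_frequencies(tf_vectors):
    """Smoothed IDF per bucket: log((N + 1) / (df + 1))."""
    n_docs = len(tf_vectors)
    n_features = len(tf_vectors[0])
    idf = []
    for j in range(n_features):
        # df = number of documents in which bucket j is non-empty
        df = sum(1 for vector in tf_vectors if vector[j] > 0)
        idf.append(math.log((n_docs + 1) / (df + 1)))
    return idf


# Invented, already tokenized sample documents
docs = [
    ["spark", "is", "fast"],
    ["spark", "runs", "on", "clusters"],
    ["text", "mining", "with", "spark"],
]
NUM_FEATURES = 16

tf = [hashed_term_frequencies(doc, NUM_FEATURES) for doc in docs]
idf = inverse_document_frequencies(tf)
tfidf = [[freq * weight for freq, weight in zip(vector, idf)] for vector in tf]
```

A term that occurs in every document, such as "spark" here, gets an IDF of zero and so contributes nothing to the final vectors. With a small number of buckets, different terms may collide in the same bucket, which is the known trade-off of the hashing trick.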


Lab Exercises

Lab 1. Learning the Lab Environment

Lab 2. The Spark Shell

Lab 3. Using Random Forests for Classification with Spark MLlib

Lab 4. Using k-means Algorithm from MLlib

Lab 5. Text Classification with Spark ML Pipeline



Course Agenda

Applied Data Science and Business Analytics

Machine Learning Algorithms, Techniques and Common Analytical Methods

Apache Spark Introduction

Spark’s MLlib Machine Learning Library


This Apache Spark training course has 3 hands-on labs, outlined below. The labs cover the spark-submit tool as well as the Apache Spark shell, and allow you to practice the following skills:

Lab 1 - Using the spark-submit Tool

Spark offers developers two ways of running their applications:
  • Using the spark-submit tool
  • Using the Spark Shell

In this lab, we will review what is involved in using the spark-submit tool.
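
For orientation, a small Python application that could be launched with spark-submit might look like the sketch below. It is not the lab's actual code: the word-count task, the file name word_count.py, the input file, and the local[2] master setting are all assumptions made for illustration.

```python
import sys

# Hypothetical invocation (script and input names are placeholders):
#   spark-submit --master local[2] word_count.py input.txt


def tokenize(line):
    """Split a line of text into lowercase words."""
    return line.lower().split()


def main(input_path):
    # Imported here so tokenize() stays usable on a machine without Spark.
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCountSketch")
    try:
        counts = (
            sc.textFile(input_path)
            .flatMap(tokenize)
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
        )
        for word, count in counts.collect():
            print(word, count)
    finally:
        # Always release the cluster resources held by the application.
        sc.stop()


# Run the job only when an input path is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Unlike the Spark Shell, which creates a context for you, an application submitted this way creates and stops its own SparkContext.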


Lab 2 - The Apache Spark Shell

The interactive development environment in Spark is provided by the Spark Shell (also known as a REPL: Read/Eval/Print Loop tool), which is available for Scala and Python developers (Java is not yet supported). The lab instructions apply to the Scala version of the Spark Shell.


Lab 3 - Using Random Forests for Classification with Spark MLlib

In this lab, we will learn how to use the Random Forests implementation from Spark's machine learning library, MLlib, to perform object classification. The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression. In our work we will use the Python version of the library, which provides an API similar to those implemented in Scala and Java. We will also use the spark-submit tool to submit the application from the command line rather than typing commands in the Spark Shell.
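
A training script of the kind this lab describes could be sketched as follows, using the RDD-based MLlib API in Python. The input layout (one CSV row per example, label first), the two-class setting, the number of trees, and the 70/30 split are assumptions chosen for illustration and do not come from the lab guide.

```python
import sys


def parse_row(line):
    """Turn 'label,f1,f2,...' into (label, [f1, f2, ...])."""
    values = [float(x) for x in line.split(",")]
    return values[0], values[1:]


def main(input_path):
    # Imported here so parse_row() stays usable on a machine without Spark.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import RandomForest

    sc = SparkContext(appName="RandomForestSketch")
    try:
        rows = sc.textFile(input_path).map(parse_row)
        data = rows.map(lambda row: LabeledPoint(row[0], row[1]))
        training, test = data.randomSplit([0.7, 0.3], seed=42)

        # Assumed settings: binary labels, no categorical features, 10 trees.
        model = RandomForest.trainClassifier(
            training,
            numClasses=2,
            categoricalFeaturesInfo={},
            numTrees=10,
            seed=42,
        )

        # Compare predicted labels with true labels on the held-out split.
        predictions = model.predict(test.map(lambda p: p.features))
        labels_and_predictions = test.map(lambda p: p.label).zip(predictions)
        errors = labels_and_predictions.filter(lambda lp: lp[0] != lp[1]).count()
        print("Test error rate:", errors / float(max(test.count(), 1)))
    finally:
        sc.stop()


# Hypothetical invocation: spark-submit random_forest_sketch.py data.csv
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The same trainClassifier call has a regression counterpart, trainRegressor, which reflects the point that Random Forests can serve both tasks.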
