Számalk Oktatási és Informatikai Zrt.
Address: 1119 Budapest, Fejér Lipót u. 70.
E-mail: training@szamalk.hu
Phone: +36 1 491 8974
To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data with its distributed compute capability and built-in machine learning library.
This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for machine learning. The course is supplemented by a variety of hands-on labs that help attendees reinforce their theoretical knowledge of the material.
This product is delivered as a voucher. After ordering, the voucher(s) will be available in your Dashboard (myLeapest). You will also receive the required software setup for installation within 48 hours of purchase.
The following contents are included with this product. For any questions about these contents, please contact the Seller of this product.
Machine Learning with Apache Spark - Lecture eBook (English, en-US)
Machine Learning with Apache Spark - Lab Guide
Data Scientists, Business Analysts, Software Developers, IT Architects
Participants should have general knowledge of statistics and programming.
Course Outline
Chapter 1.
Machine Learning Algorithms
Supervised vs Unsupervised Machine Learning
Supervised Machine Learning Algorithms
Unsupervised Machine Learning Algorithms
Choose the Right Algorithm
Life-cycles of Machine Learning Development
Classifying with k-Nearest Neighbors (SL)
k-Nearest Neighbors Algorithm
The Error Rate
Decision Trees (SL)
Random Forests
Unsupervised Learning Type: Clustering
K-Means Clustering (UL)
K-Means Clustering in a Nutshell
Regression Analysis
Logistic Regression
Summary
Chapter 2.
Introduction to Functional Programming
What is Functional Programming (FP)?
Terminology: Higher-Order Functions
Terminology: Lambda vs Closure
A Short List of Languages that Support FP
FP with Java
FP with JavaScript
Imperative Programming in JavaScript
The JavaScript map (FP) Example
The JavaScript reduce (FP) Example
Using reduce to Flatten an Array of Arrays (FP) Example
The JavaScript filter (FP) Example
Common Higher-Order Functions in Python
Common Higher-Order Functions in Scala
Elements of FP in R
Summary
Chapter 3.
Introduction to Apache Spark
What is Apache Spark
A Short History of Spark
Where to Get Spark?
The Spark Platform
Spark Logo
Common Spark Use Cases
Languages Supported by Spark
Running Spark on a Cluster
The Driver Process
Spark Applications
Spark Shell
The spark-submit Tool
The spark-submit Tool Configuration
The Executor and Worker Processes
The Spark Application Architecture
Interfaces with Data Storage Systems
Limitations of Hadoop's MapReduce
Spark vs MapReduce
Spark as an Alternative to Apache Tez
The Resilient Distributed Dataset (RDD)
Spark Streaming (Micro-batching)
Spark SQL
Example of Spark SQL
Spark Machine Learning Library
GraphX
Spark vs R
Summary
Chapter 4.
The Spark Shell
The Spark Shell UI
Spark Shell Options
Getting Help
The Spark Context (sc) and SQL Context (sqlContext)
The Shell Spark Context
Loading Files
Saving Files
Basic Spark ETL Operations
Summary
Chapter 5.
Spark Machine Learning Library
What is MLlib?
Supported Languages
MLlib Packages
Dense and Sparse Vectors
Labeled Point
Python Example of Using the LabeledPoint Class
LIBSVM Format
An Example of a LIBSVM File
Loading LIBSVM Files
Local Matrices
Example of Creating Matrices in MLlib
Distributed Matrices
Example of Using a Distributed Matrix
Classification and Regression Algorithms
Clustering
Summary
Chapter 6.
Text Mining
What is Text Mining?
The Common Text Mining Tasks
What is Natural Language Processing (NLP)?
Some of the NLP Use Cases
Machine Learning in Text Mining and NLP
Machine Learning in NLP
TF-IDF
The Feature Hashing Trick
Stemming
Example of Stemming
Stop Words
Popular Text Mining and NLP Libraries and Packages
Summary
Lab Exercises
Lab 1. Learning the Lab Environment
Lab 2. The Spark Shell
Lab 3. Using Random Forests for Classification with Spark MLlib
Lab 4. Using k-means Algorithm from MLlib
Lab 5. Text Classification with Spark ML Pipeline
Course Agenda
Applied Data Science and Business Analytics
Machine Learning Algorithms, Techniques and Common Analytical Methods
Apache Spark Introduction
Spark’s MLlib Machine Learning Library
This Apache Spark training course has 3 hands-on labs that are outlined at the bottom of this page. The labs cover the spark-submit tool as well as the Apache Spark shell, and allow you to practice the following skills:
Lab 1 - Using the spark-submit Tool
Spark offers developers two ways of running their applications:
Using the spark-submit tool
Using the Spark Shell
In this lab, we will review what is involved in using the spark-submit tool.
Lab 2 - The Apache Spark Shell
Spark's interactive development environment is the Spark Shell (also known as a REPL: Read/Eval/Print Loop tool), which is available to Scala and Python developers (Java is not yet supported).
The lab instructions apply to the Scala version of the Spark Shell.
Lab 3 - Using Random Forests for Classification with Spark MLlib
In this lab, we will learn how to use the Random Forests implementation in Spark's Machine Learning library, MLlib, to perform object classification.
The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression. In our work we will use the Python version of the library, which provides an API similar to those implemented in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.