Programming with Big Data in R Training Course
Big Data is a term that refers to solutions for storing and processing very large data sets. Initially developed by Google, these Big Data solutions have evolved and inspired other similar projects, many of which are available as open source. R is a popular programming language in the financial industry.
Course Outline
Introduction to Programming Big Data with R (pbdR)
- Setting up your environment to use pbdR
- Scope and tools available in pbdR
- Packages commonly used with Big Data alongside pbdR
Message Passing Interface (MPI)
- Using MPI from pbdR
- Parallel processing
- Point-to-point communication
- Send Matrices
- Summing Matrices
- Collective communication
- Summing Matrices with Reduce
- Scatter / Gather
- Other MPI communications
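As a taste of the topics above, a minimal sketch of collective communication with the pbdMPI package might look like the following (this assumes pbdMPI is installed and the script is launched with an MPI runner such as mpirun; it is an illustration, not course material):

```r
# Sketch: summing matrices across processes with pbdMPI (collective communication).
# Run with: mpirun -np 2 Rscript this_script.R
library(pbdMPI)
init()

# Each rank builds its own 2x2 matrix, filled with its rank number + 1
m <- matrix(comm.rank() + 1, nrow = 2, ncol = 2)

# Collective communication: element-wise sum of the matrices over all ranks
total <- allreduce(m, op = "sum")
comm.print(total)   # printed once, from rank 0 by default

finalize()
```

With two processes, each element of `total` is the sum of the per-rank values (here 1 + 2 = 3), illustrating the "Summing Matrices with Reduce" topic.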
Distributed Matrices
- Creating a distributed diagonal matrix
- SVD of a distributed matrix
- Building a distributed matrix in parallel
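A rough sketch of the distributed-matrix workflow, assuming the pbdDMAT package (the `as.ddmatrix` call distributes a matrix held on rank 0 across the process grid):

```r
# Sketch: creating a distributed diagonal matrix and taking its SVD with pbdDMAT.
# Run with: mpirun -np 4 Rscript this_script.R
library(pbdDMAT)
init.grid()

# Distribute a 100x100 diagonal matrix from rank 0 across the process grid
dx <- as.ddmatrix(diag(1:100))

# SVD uses the same interface as base R's svd(), but computes in parallel
s <- svd(dx)
comm.print(head(s$d))

finalize()
```

The appeal of pbdDMAT is that familiar base-R operations (`svd`, `%*%`, `t`, and so on) keep their usual syntax while operating on data spread over many processes.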
Statistics Applications
- Monte Carlo Integration
- Reading Datasets
- Reading on all processes
- Broadcasting from one process
- Reading partitioned data
- Distributed Regression
- Distributed Bootstrap
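As an illustration of the Monte Carlo Integration topic, a sketch of the classic pi estimation split across MPI ranks with pbdMPI (assuming pbdMPI is installed and the script is run under mpirun):

```r
# Sketch: Monte Carlo estimation of pi, with samples divided across MPI ranks.
# Run with: mpirun -np 4 Rscript this_script.R
library(pbdMPI)
init()

n <- 100000                               # samples per process
comm.set.seed(seed = 1234, diff = TRUE)   # independent stream on each rank

x <- runif(n)
y <- runif(n)
hits <- sum(x^2 + y^2 <= 1)               # points inside the quarter circle

total.hits <- reduce(hits, op = "sum")    # collect counts on rank 0
comm.print(4 * total.hits / (n * comm.size()))

finalize()
```

Each process draws its own sample, and a single `reduce` combines the partial counts, which is the same pattern the distributed regression and bootstrap topics build on.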
Open Training Courses require 5+ participants.
Testimonials (2)
The subject matter and the pace were perfect.
Tim - Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
Course - Programming with Big Data in R
Michael the trainer is very knowledgeable and skillful on the subject of Big Data and R. He is very flexible and quickly customizes the training to meet clients' needs. He is also very capable of solving technical and subject matter problems on the go. Fantastic and professional training!
Xiaoyuan Geng - Ottawa Research and Development Center, Science Technology Branch, Agriculture and Agri-Food Canada
Course - Programming with Big Data in R
Upcoming Courses
Related Courses
Unified Batch and Stream Processing with Apache Beam
14 Hours
Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.
In this instructor-led, live training (onsite or remote), participants will learn how to implement the Apache Beam SDKs in a Java or Python application that defines a data processing pipeline for decomposing a big data set into smaller chunks for independent, parallel processing.
By the end of this training, participants will be able to:
- Install and configure Apache Beam.
- Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
- Execute pipelines across multiple environments.
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- This course will be available in Scala in the future. Please contact us to arrange.
Data Vault: Building a Scalable Data Warehouse
28 Hours
In this instructor-led, live training in Peru, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
Apache Flink Fundamentals
28 Hours
This instructor-led, live training in Peru (online or onsite) introduces the principles and approaches behind distributed stream and batch data processing, and walks participants through the creation of a real-time, data streaming application in Apache Flink.
By the end of this training, participants will be able to:
- Set up an environment for developing data analysis applications.
- Understand how Apache Flink's graph-processing library (Gelly) works.
- Package, execute, and monitor Flink-based, fault-tolerant, data streaming applications.
- Manage diverse workloads.
- Perform advanced analytics.
- Set up a multi-node Flink cluster.
- Measure and optimize performance.
- Integrate Flink with different Big Data systems.
- Compare Flink capabilities with those of other big data processing frameworks.
Confluent KSQL
7 Hours
This instructor-led, live training in Peru (online or onsite) is aimed at developers who wish to implement Apache Kafka stream processing without writing code.
By the end of this training, participants will be able to:
- Install and configure Confluent KSQL.
- Set up a stream processing pipeline using only SQL commands (no Java or Python coding).
- Carry out data filtering, transformations, aggregations, joins, windowing, and sessionization entirely in SQL.
- Design and deploy interactive, continuous queries for streaming ETL and real-time analytics.
Apache NiFi for Administrators
21 Hours
In this instructor-led, live training in Peru (onsite or remote), participants will learn how to deploy and manage Apache NiFi in a live lab environment.
By the end of this training, participants will be able to:
- Install and configure Apache NiFi.
- Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes.
- Automate dataflows.
- Enable streaming analytics.
- Apply various approaches for data ingestion.
- Transform Big Data into business insights.
Apache NiFi for Developers
7 Hours
In this instructor-led, live training in Peru, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi.
By the end of this training, participants will be able to:
- Understand NiFi's architecture and dataflow concepts.
- Develop extensions using NiFi and third-party APIs.
- Develop their own custom Apache NiFi processors.
- Ingest and process real-time data from disparate and uncommon file formats and data sources.
Python and Spark for Big Data (PySpark)
21 Hours
In this instructor-led, live training in Peru, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Learn how to use Spark with Python to analyze Big Data.
- Work on exercises that mimic real world cases.
- Use different tools and techniques for big data analysis using PySpark.
Spark Streaming with Python and Kafka
7 Hours
This instructor-led, live training in Peru (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming features in processing and analyzing real-time data.
By the end of this training, participants will be able to use Spark Streaming to process live data streams for use in databases, filesystems, and live dashboards.
Introduction to Data Visualization with Tidyverse and R
7 Hours
The Tidyverse is a collection of versatile R packages for cleaning, processing, modeling, and visualizing data. Some of the packages included are: ggplot2, dplyr, tidyr, readr, purrr, and tibble.
In this instructor-led, live training, participants will learn how to manipulate and visualize data using the tools included in the Tidyverse.
By the end of this training, participants will be able to:
- Perform data analysis and create appealing visualizations
- Draw useful conclusions from various datasets of sample data
- Filter, sort and summarize data to answer exploratory questions
- Turn processed data into informative line plots, bar plots, histograms
- Import and filter data from diverse data sources, including Excel, CSV, and SPSS files
Audience
- Beginners to the R language
- Beginners to data analysis and data visualization
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
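As an example of the kind of pipeline practiced in this course, a short dplyr/ggplot2 sketch using R's built-in mtcars data set (an illustration only, not course material):

```r
# Sketch: filter, group, summarize, and plot with the Tidyverse.
library(dplyr)
library(ggplot2)

by_cyl <- mtcars %>%
  filter(hp > 90) %>%                  # keep cars above 90 horsepower
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))      # average fuel economy per cylinder count

# Bar plot of mean MPG by number of cylinders
ggplot(by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean MPG")
```

The same filter-summarize-plot pattern carries over directly to data imported from Excel, CSV, or SPSS files with the readr and haven packages.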