This 2-day Advanced Analytics for Structured Data using AWS course provides a technical introduction to understanding and creating digital data supply chains for advanced analytics with AWS.

Note: This is an independent presentation and is NOT an official Amazon Web Services Education Partner delivery.


What you'll learn

•    Navigate the AWS Console for key areas discussed in this class
•    Utilize AWS for data processing and data management
•    Describe patterns for handling structured data with AWS services
•    Understand the usage of Amazon Elastic MapReduce (EMR)
•    Understand the facilities provided by Elastic MapReduce (EMR)
•    Identify the facilities provided by Apache Airflow for workflow
•    Outline the facilities provided by AWS Glue (Data Catalog)
•    Describe the facilities provided by Aurora MySQL
•    Define the facilities provided by S3 – Simple Storage Service
•    Understand the facilities provided by Informatica Cloud (ICS)
•    Identify the features and functions of AWS Lambda
•    Describe the features of Hive, HiveQL, and the Hive CLI
•    Discuss file formats used in Advanced Analytics
•    Understand how Amazon Athena is used across varied data sources

Advanced Analytics for Structured Data using AWS

Course Code: GTBD10
Duration: 2 days
Course Fee: €1,185.00
Accreditation: N/A

Target Audience

This is a general introductory course for anyone who wants a technical introduction to understanding and creating digital data supply chains for advanced analytics with AWS.

Attendee Requirements

  • A basic understanding of coding, the AWS console, and cloud computing is helpful.

Course Outline

Chapter 1. Advanced Analytics with AWS

•        What are advanced analytics?
•        Introduction to AWS services for Analytics
•        AWS Public Data Sets
•        Forces and Trends in Cloud Analytics
•        Data Storage Platforms
•        Data Lifecycle and Events
•        What is JSON?

Chapter 2. Elastic MapReduce

•        What is Amazon EMR?
•        Getting started with EMR
•        EMR planning
•        Running Hadoop Applications for data processing
•        Hive and EMR
•        Spark and EMR
•        Kinesis and EMR
•        ETL with EMR
•        AWS CLI and EMR
•        AWS Console Walkthrough: EMR

Chapter 3. AWS Glue

•        What is Glue?
•        How Glue works
•        AWS Glue Console
•        Getting started with Glue
•        Security management
•        Glue Data Catalog
•        Authoring with Glue
•        Auto-population and schema inference
•        Events and monitoring
•        Troubleshooting
•        ETL with Glue
•        Glue Application Programming Interface (API)
•        AWS Console Walkthrough: Glue

Chapter 4. Apache Airflow

•        What is Apache Airflow?
•        Introduction to Apache Airflow components
•        Visualizing DAGs
•        Authoring DAGs
•        Performance Insights
•        Performance Graphs
•        Airflow Features
•        Use Cases
•        Workflow Table Stakes
•        Incubation of Airflow
•        Airflow at Work

Chapter 5. Amazon Aurora

•        What is Amazon RDS?
•        Introduction to Aurora
•        MySQL and Aurora compatibility
•        Service-oriented Architecture and RDS
•        Data replication
•        Fully managed
•        Shared accountability
•        Data encryption at rest and in motion
•        Aurora as a meta store
•        AWS Console Walkthrough: Aurora

Chapter 6. Introduction to Informatica Cloud (ICS)

•        What is Informatica Cloud?
•        Integration Platform as a Service
•        Cloud-native migration and ICS
•        Use cases for Informatica Cloud
•        Cloud Connectors
•        ICS Connectors
•        Informatica Cloud Options
•        Citizen developers and ICS
•        Secure Agent
•        Cloud Integration Hub
•        ICS Console Walkthrough

Chapter 7. S3 – Simple Storage Service

•        What is S3?
•        Introduction to S3
•        Storage
•        Replication
•        CAP Theorem
•        Data Consistency
•        Buckets
•        Amazon Resource Name (ARN)
•        Resource Sharing
•        Versioning
•        Lifecycle
•        Security in S3
•        Use cases for S3
•        AWS Console Walkthrough: S3

Chapter 8. AWS Lambda

•        What is Lambda?
•        Introduction to Serverless Computing
•        What can you do with Lambda?
•        Lambda services
•        Triggering for digital data supply chain
•        Data processing with Lambda and Glue
•        Managed analytics pipeline with Lambda
•        AWS Console Walkthrough: Lambda

Chapter 9. Hive

•        What is Hive?
•        Hive's value proposition
•        Hive's Main Sub-Systems
•        Hive Features
•        The "Classic" Hive Architecture
•        The New Hive Architecture
•        HiveQL
•        Where are the Hive tables located?
•        Hive Command-line Interface (CLI)
•        The Beeline Command Shell
•        Differences and considerations for Hive on Amazon EMR
•        Configuring an External Metastore for Hive
•        Use the Hive JDBC Driver
•        Hive release history
•        Hive Walkthrough

Chapter 10. Hive CLI

•        Hive Command-line Interface (CLI)
•        The Hive Interactive Shell
•        Running Host OS Commands from the Hive Shell
•        Interfacing with HDFS from the Hive Shell
•        Hive in Unattended Mode
•        The Hive CLI Integration with the OS Shell
•        Executing HiveQL Scripts
•        Comments in Hive Scripts
•        Variables and Properties in Hive CLI
•        Setting Properties in CLI
•        Example of Setting Properties in CLI
•        Hive Namespaces
•        Using the SET Command
•        Setting Properties in the Shell
•        Setting Properties for the New Shell Session
•        Setting Alternative Hive Execution Engines
•        The Beeline Shell
•        Connecting to the Hive Server in Beeline
•        Beeline Command Switches
•        Beeline Internal Commands

Chapter 11. Hive DDL

•        Hive Data Definition Language
•        Creating Databases in Hive
•        Using Databases
•        Creating Tables in Hive
•        Supported Data Type Categories
•        Common Numeric Types
•        String and Date / Time Types
•        Miscellaneous Types
•        Example of the CREATE TABLE Statement
•        Working with Complex Types
•        Table Partitioning
•        Table Partitioning on Multiple Columns
•        Viewing Table Partitions
•        Row Format
•        Data Serializers / Deserializers
•        File Format Storage
•        File Compression
•        More on File Formats
•        The EXTERNAL DDL Parameter
•        Example of Using EXTERNAL
•        Creating an Empty Table
•        Dropping a Table
•        Table / Partition(s) Truncation
•        Alter Table/Partition/Column
•        Views
•        Create View Statement
•        Why Use Views?
•        Restricting Amount of Viewable Data
•        Examples of Restricting Amount of Viewable Data
•        Creating and Dropping Indexes
•        Describing Data

Chapter 13. Hive DML

•        Hive Data Manipulation Language (DML)
•        Using the LOAD DATA statement
•        Example of Loading Data into a Hive Table
•        Loading Data with the INSERT Statement
•        Appending and Replacing Data with the INSERT Statement
•        Examples of Using the INSERT Statement
•        Multi Table Inserts
•        Multi Table Inserts Syntax
•        Multi Table Inserts Example

Chapter 14. Amazon Athena

•        What is Amazon Athena?
•        Athena in context
•        Athena Policy
•        Athena Data Sources
•        Connectivity
•        Getting started with Athena

Chapter 15. High Performance File System Formats

•        Why file systems for Advanced Analytics?
•        Columnar Data Storage
•        Introduction to ORC
•        Introduction to Parquet
•        Creating ORC and Parquet from CSV with Hive
•        Converting Text to ORC Data Format

Chapter 16. Introduction to Monitoring in AWS

•        Evolution of monitoring in AWS Cloud
•        What is CloudWatch?
•        What is CloudTrail?
•        What is AWS Config?
•        Event-driven models
•        Notifications driving events
•        Serverless computing
•        Introduction to Lambda

Lab Exercises

    Lab 1. Learning the AWS Management Console
    Lab 2. Managing Keys for Secure Connection
    Lab 3. Using S3 Through Management Console
    Lab 4. Managing IAM Users
    Lab 5. Getting Started with the EC2 Service
    Lab 6. Using AWS Lambda
    Lab 7. Using S3 and Aurora MySQL in AWS Lambda

 

Ways to Attend
  • Attend a public course, if one is available. Please check our schedule, or register your interest in joining a course in your area.
  • Private onsite team training is also available; please contact us to discuss. We can customise this course to suit your business requirements.
