This course will enable you to understand the challenges and benefits of Big Data, as well as the technologies used to implement it. You'll learn how to integrate massive volumes of structured and unstructured data with an ETL tool, then analyze them using statistical models and dynamic dashboards.
Training at your location, our location or remotely
Ref. BID
5 days (35 hours)
Teaching objectives
At the end of the training, the participant will be able to:
Understand the concepts and benefits of Big Data with respect to business challenges
Understand the technological ecosystem needed to carry out a Big Data project
Acquire the technical skills to manage massive, unstructured, complex data flows
Implement statistical analysis models to address business needs
Learn about a data visualization tool for reporting dynamic analyses
Intended audience
Data miners, statistical researchers, developers, project managers, business intelligence consultants.
Prerequisites
Basic knowledge of relational models, statistics, programming languages, and Business Intelligence concepts.
Course schedule
Understanding the concepts and challenges of Big Data
Origins and definition of Big Data.
Key figures in the international and French markets.
The challenges of Big Data: ROI, organization, data privacy.
An example of Big Data architecture.
Big Data technologies
Description of the architecture and components of the Hadoop platform.
Storage methods (NoSQL, HDFS).
Operating principles of MapReduce, Spark, Storm, etc.
Leading distributions on the market (Hortonworks, Cloudera, MapR, Amazon Elastic MapReduce, IBM BigInsights).
Installing a Hadoop platform.
Technologies for the data scientist.
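To illustrate the MapReduce operating principle covered in this module, here is a minimal in-memory sketch in plain Python of the classic word-count job. It only mimics the map, shuffle, and reduce phases on a single machine; it does not use Hadoop, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over a list of text lines (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data flows"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'flows': 1, 'ideas': 1}
```

On a real cluster, the framework runs many mappers and reducers in parallel over HDFS blocks and performs the shuffle over the network; the developer supplies only the mapper and reducer logic.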
Exercise
Installing a Hadoop Big Data platform (via Cloudera QuickStart or other software).
Operating principles of the Hadoop Distributed File System (HDFS).
Importing external data into HDFS.
Writing SQL queries with Hive.
Using Pig to process the data.
Using an ETL to industrialize the creation of massive data flows.
Overview of Talend for Big Data.
Exercise
ETL principles (Talend, etc.).
Managing massive data streams (NiFi, Kafka, Spark, Storm, etc.).
Exercise
Implementing massive data flows
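The extract-transform-load principle can be sketched without Talend in a few lines of Python. This is a minimal illustration under stated assumptions: the CSV sample is invented, an in-memory SQLite database stands in for the target warehouse, and the helper names (`extract`, `transform`, `load`, `run_pipeline`) are hypothetical rather than part of any ETL product.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real pipeline would read files, an API, or HDFS.
RAW_CSV = """customer_id,country,amount
1,fr, 120.50
2,FR,80
3,de,
4,DE,42.0
"""


def extract(text):
    """Extract: parse raw CSV text into one dictionary per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize country codes, cast amounts, reject incomplete rows."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop records with no amount
        clean.append((int(row["customer_id"]), row["country"].strip().upper(), float(amount)))
    return clean


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text):
    """Run extract -> transform -> load into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline(RAW_CSV)
    query = "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
    for country, total in conn.execute(query):
        print(country, total)
```

Tools such as Talend let you design comparable steps graphically and run them at a much larger scale.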
Big Data Analytics techniques and methods
Machine learning: a component of artificial intelligence.
Discovering the three families of algorithms: regression, classification, and clustering.
Data preparation, feature engineering.
Generating models in R or Python.
Ensemble learning.
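As an example of the clustering family, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm) on 2-D points. It is for illustration only: the function name `kmeans` and its parameters are hypothetical, and in practice you would use a library implementation such as scikit-learn's `KMeans` in Python or the built-in `kmeans()` function in R.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    centroids, clusters = kmeans(data, k=2)
    print(centroids)
```

The result depends on the initial centroids, which is why library implementations typically run several initializations and keep the best one.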
Exercise
Setting up analyses with the tools studied.
Takeaways.
Summary of best practices.
Bibliography.
Practical details
Hands-on work
Set up a Hadoop platform and its basic components, use an ETL to manage the data, create analysis modules and dashboards.
Customer reviews
4 / 5
Customer reviews are based on end-of-course evaluations. The score is calculated from all evaluations within the past year. Only reviews with a textual comment are displayed.