Get the program brochure
Check out the program and fee details in our brochure
PGP-Data Science & Analytics
Kickstart your career in Data Science | Learn In-demand Tools & Languages
Application closes 1st Jul 2024
Key highlights of the Data Science course
- Industry-ready curriculum
- 10 years of excellence
- 200+ successful batches
- 1:1 mentorship
- Dedicated Career Support
- 150+ hours of learning content
- Certificate from Great Lakes Executive Learning
Skills you will learn
- Python
- Data Mining
- Tableau
- Machine Learning
- SQL
- ChatGPT
Our alumni work at top companies
Curriculum
Unit 1
Data Science Foundations
Introduction to Data Science and AI (Self-Paced)
Gain an understanding of how Data Science has evolved over time, its applications across industries, the mathematics and statistics behind it, and the life cycle of building data-driven solutions.
- The fascinating history of Data Science
- Transforming Industries through Data Science
- The Math and Stats underlying the technology
- Navigating the Data Science Lifecycle
Python for Data Science - 4 Weeks
Python Programming:
Python is a widely used, high-level, interpreted programming language with a simple, easy-to-learn syntax that emphasizes code readability. This module covers the fundamentals of Python programming and the first steps in organizing data with Python.
- Variables and Datatypes
- Data Structures
- Conditional and Looping Statements
- Functions
Python for Data Science:
NumPy is a Python package for mathematical and scientific computing and involves working with arrays and matrices. Pandas is a fast, powerful, flexible, and simple-to-use open-source library in Python to manipulate and analyse data. This module will cover these important libraries and provide a deep understanding of how to use them to explore data.
- NumPy arrays and functions
- Accessing and modifying NumPy arrays
- Saving and loading NumPy arrays
- Pandas Series (Creating, Accessing, and Modifying Series)
- Pandas DataFrames (Creating, Accessing, Modifying, and Combining DataFrames)
- Pandas Functions
- Saving and loading datasets using Pandas
Python for Visualization:
Matplotlib is a library for creating static, animated, and interactive visualisations, whereas Seaborn is a Matplotlib-based data visualisation library in Python.
This module will give you a deep understanding of how to explore data sets using Matplotlib and Seaborn.
- Histogram, Boxplots and Bar graphs
- Line Plot, Scatterplot, and Lmplot
- Jointplot, Violin Plot, and Strip Plot
- Swarm Plot, Catplot, and Pairplot
- Heatmaps, Plotly, and Customizing of Plots
Exploratory Data Analysis (Deep Dive)
Exploratory Data Analysis, or EDA, is the process of examining and visualizing data to uncover patterns, extract meaningful insights, and support storytelling. This module provides deep insight into how to conduct EDA using Python and use the resulting insights to drive business decisions.
- Data overview
- Univariate analysis
- Bivariate/Multivariate analysis
- Missing value treatment
Introduction to SQL - 4 Weeks
Querying Data With SQL
SQL is a widely used querying language for efficiently managing and manipulating relational databases. This module provides an essential foundation for understanding and working with relational databases. Participants will explore the fundamentals of setting up MySQL, including installation and configuration, gain insight into the principles of database management and Structured Query Language (SQL), and learn how to fetch and filter data using SQL queries, enabling them to extract valuable insights from large datasets efficiently.
- Getting set up with MySQL
- Introduction to DB and SQL
- Fetching data in SQL
- Filtering data in SQL
SQL In-Built Functions
SQL offers a wide range of numeric, string, and date functions. This module provides a comprehensive exploration of these functions for data manipulation and analysis, and participants will gain proficiency in using them to perform advanced calculations, string manipulations, and date operations. Additionally, participants will discover the significance of aggregating data using SQL functions, enabling them to summarize and analyze large datasets effectively.
- Numeric Functions in SQL
- String Functions in SQL
- Date Functions in SQL
- Aggregating data in SQL
Advanced Querying
SQL joins are used to combine data from multiple tables effectively and window functions enable performing complex analytical tasks such as ranking, partitioning, and aggregating data within specified windows. Subqueries allow one to nest queries within other queries. This module will equip participants with advanced techniques for querying and analyzing relational databases to extract and manipulate data dynamically.
- Joins in SQL
- Window functions in SQL
- Subqueries
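For a taste of the querying techniques named above, here is a minimal sketch. It is an illustration only, not course material: it uses Python's built-in sqlite3 module instead of MySQL so that it runs without a database server, and the customers and orders tables and their rows are invented for the example. It shows a join with aggregation and a subquery, and leaves window functions out.

```python
import sqlite3

# Hypothetical tables and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

# Join: total order amount per customer. The inner join drops customers
# who have no orders (Chen, in this data).
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

# Subquery: orders whose amount is above the average order amount.
above_average = conn.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

conn.close()

print(totals)         # [('Asha', 200.0), ('Ben', 50.0)]
print(above_average)  # [(1, 120.0)]
```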
Unit 2
Data Science Techniques
Inferential Statistics - 4 Weeks
Inferential Statistics Foundations
Inferential statistics is pivotal in statistical analysis and decision-making and involves drawing conclusions about populations based on samples. This module will introduce learners to the common probability distributions and how they are used to make statistically-sound, data-driven decisions.
- Experiments, Events, and Definition of Probability
- Introduction to Inferential Statistics
- Introduction to Probability Distributions (Random Variable, Discrete and Continuous Random Variables, Probability Distributions)
- Binomial Distribution
- Normal Distribution
- z-score
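As a small illustration of the normal distribution and z-score topics listed above, the sketch below uses Python's standard statistics module. The population (mean 100, standard deviation 15) and the value 130 are assumptions chosen for the example.

```python
from statistics import NormalDist

# Hypothetical population: scores with mean 100 and standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

value = 130
# z-score: how many standard deviations the value lies from the mean.
z_score = (value - scores.mean) / scores.stdev
# Probability of observing a value at or below 130 under the normal model.
share_below = scores.cdf(value)

print(f"z-score: {z_score:.2f}")                     # z-score: 2.00
print(f"share at or below {value}: {share_below:.4f}")
```

A z-score of 2 means the value sits two standard deviations above the mean, which under a normal model places it above roughly 97.7% of the population.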
Estimation and Hypothesis Testing
Estimation involves determining likely values for population parameters from sample data, while hypothesis testing provides a framework for drawing conclusions from sample data about the broader population. This module covers the important concepts of the central limit theorem and estimation theory that are vital for statistical analysis, and the framework for conducting hypothesis tests.
- Sampling
- Central Limit Theorem
- Estimation
- Introduction to Hypothesis Testing (Null and Alternative hypothesis, Type-I and Type-II errors, alpha, critical region, p-value)
- Hypothesis Formulation and Performing a Hypothesis Test
- One-tailed and Two-tailed Tests
- Confidence Intervals and Hypothesis Testing
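The hypothesis-testing framework outlined above can be sketched in a few lines. This is a minimal example of one possible test (a two-tailed, one-sample z-test with a known population standard deviation), not the program's own material. The function name one_sample_z_test and all numbers are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesized_mean, sigma, n):
    """Return (z statistic, two-tailed p-value), assuming sigma is known."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Two-tailed: probability of a result at least this extreme in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical scenario: null hypothesis says the population mean is 50.
alpha = 0.05  # significance level (accepted Type-I error rate)
z, p_value = one_sample_z_test(sample_mean=52, hypothesized_mean=50, sigma=10, n=100)
reject_null = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

With these assumed numbers the z statistic is 2.0 and the p-value is about 0.0455, so the null hypothesis is rejected at the 5% level but would not be at the 1% level.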
Common Statistical Tests
Hypothesis tests assess the validity of a claim or hypothesis about a population parameter through statistical analysis. This module introduces learners to the hypothesis tests most commonly used in Data Science and how to choose the right test for a given business claim depending on its context.
- Common Statistical Tests
- Test for one mean
- Test for equality of means (known standard deviation)
- Test for equality of means (Equal and unknown std dev)
- Test for equality of means (Unequal and unknown std dev)
- Test of independence
- One-way ANOVA
Predictive Modeling - 5 Weeks
Intro to Supervised Learning - Linear Regression
Machine Learning (ML), a subset of Artificial Intelligence (AI), focuses on developing algorithms capable of learning patterns in data and making predictions without being explicitly programmed to do so. Linear Regression is one of the most popular supervised ML algorithms; it models the linear relationship between input variables and a numeric target. This module introduces participants to ML and explores how linear regression can be used for predictive analysis.
- Introduction to learning from data
- Simple and Multiple Linear Regression
- Evaluating a regression model
- Pros and Cons of Linear Regression
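To make the idea of fitting and evaluating a regression model concrete, here is a minimal sketch of simple linear regression using the ordinary least-squares formulas in plain Python. The helper names and the five data points are hypothetical; in practice a library such as Scikit-Learn or Statsmodels would typically be used instead.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data that roughly follows y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_simple_linear_regression(x, y)
r2 = r_squared(x, y, slope, intercept)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```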
Linear Regression Assumptions and Statistical Inference
The linear regression algorithm has a set of assumptions that need to be satisfied for the model to be statistically valid and for inferences to be drawn from it. This module walks participants through these assumptions, how to check them, what to do in case they are violated, and the statistical inferences that can be drawn based on the model's output.
- Statistician vs ML Practitioner
- Linear Regression Assumptions
- Statistical Inferences from a Linear Regression Model
Machine Learning-1 - 3 Weeks
Logistic Regression
Logistic regression is a statistical modeling technique primarily used for modeling the probability of binary outcomes, and it finds applications in various fields such as medicine, finance, and manufacturing. This module covers the theory behind the logistic regression model, how to assess its performance, and how to draw statistical inferences from it.
- Introduction to Logistic Regression
- Interpretation from a Logistic Regression model
- Changing the threshold of a Logistic Regression model
- Evaluation of a classification model
- Pros and Cons
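The effect of changing the classification threshold, one of the topics above, can be seen in a short sketch. The log-odds values below are hypothetical stand-ins for the output of an already fitted logistic regression model; no model is trained here.

```python
from math import exp


def sigmoid(z):
    """Map a log-odds score to a probability between 0 and 1."""
    return 1 / (1 + exp(-z))


def classify(probabilities, threshold):
    """Label an observation 1 when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical log-odds for four observations (assumed model output).
log_odds = [-2.0, -0.5, 0.2, 1.5]
probabilities = [sigmoid(z) for z in log_odds]

default_labels = classify(probabilities, threshold=0.5)
lenient_labels = classify(probabilities, threshold=0.3)

print(default_labels)  # [0, 0, 1, 1]
print(lenient_labels)  # [0, 1, 1, 1]
```

Lowering the threshold from 0.5 to 0.3 flips the second observation to the positive class, which generally trades more false positives for fewer false negatives.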
Naive-Bayes, KNN
Bayes' Rule is an important topic in probabilistic reasoning and decision-making. Distance metrics offer a handy way of measuring similarity between data points. This module provides participants with a comprehensive understanding of the Bayes Rule and Naive Bayes algorithm, its assumptions, different distance metrics, the K-Nearest Neighbors (KNN) algorithm, and its practical applications in classification and regression tasks.
- Bayes Rule
- Naive Bayes Algorithm
- Distance Metrics
- KNN Algorithm
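As an illustration of the distance metrics and KNN topics above, here is a minimal classification sketch in plain Python (it does not cover Naive Bayes). The training points, labels, and function names are invented for the example, and math.dist requires Python 3.8 or later.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points


def manhattan(p, q):
    """Manhattan (city-block) distance, an alternative metric."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_predict(train, query, k=3, metric=dist):
    """Predict the majority label among the k training points nearest to query."""
    neighbours = sorted(train, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled points forming two well-separated groups.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_predict(train, (2, 2)))                    # A
print(knn_predict(train, (6, 5), metric=manhattan))  # B
```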
Decision Tree
Decision trees are supervised ML algorithms that utilize a hierarchical structure for decision making and can be used for both classification and regression problems. This module dives into how a decision tree can be used to model complex, non-linear data and how to improve the performance of decision trees using pruning techniques.
- Introduction to Decision Tree
- How a Decision Tree is built
- Methods of pruning a Decision Tree
- Different impurity measures
- Regression Trees
- Pros and Cons
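Two common impurity measures, Gini impurity and entropy, can be computed in a few lines. The sketch below is illustrative only: the function names and the tiny label lists are assumptions, and it shows how a split is scored rather than how a full tree is built.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a balanced binary node."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical node with a balanced mix of two classes.
mixed = ["yes", "yes", "no", "no"]

print(gini(mixed))                                   # 0.5
print(entropy(mixed))                                # 1.0
print(weighted_gini(["yes", "yes"], ["no", "no"]))   # 0.0
```

A tree-building procedure would typically prefer the split with the lowest weighted impurity; here the split separates the classes perfectly, so its impurity is zero.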
Machine Learning-2 - 4 Weeks
Bagging and Random Forest
Random forest is a popular ensemble learning technique that comprises several decision trees, each trained on a subset of the data. The outputs of the individual trees are then aggregated into a single prediction, which typically improves predictive performance over a single tree. This module will explore how to train a random forest model to solve complex business problems.
- Introduction to Ensemble Techniques
- Bagging
- Random Forests
Boosting:
Boosting models are robust ensemble models that comprise several sub-models, each developed sequentially to correct the errors made by the previous one. This module will cover essential boosting algorithms like AdaBoost and XGBoost that are widely used in the industry for accurate and robust predictions.
- Introduction to Boosting
- Bagging vs. Boosting
- Different boosting techniques - AdaBoost, Gradient Boosting, XGBoost
- Stacking
Model Tuning
Model tuning is a crucial step in developing ML models and focuses on improving the performance of a model using different techniques like feature engineering, imbalance handling, regularization, and hyperparameter tuning to tweak the data and the model. This module covers the different techniques to tune the performance of an ML model to make it robust and generalized.
- K-fold cross validation
- Oversampling and Undersampling
- Regularization
- Data Leakage
- Hyperparameter Tuning
- GridSearchCV and RandomizedSearchCV
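To show what K-fold cross validation does under the hood, here is a minimal sketch that only generates the train and validation index splits. The function name k_fold_indices is hypothetical, and the folds are contiguous and unshuffled for clarity; in practice data is often shuffled first, and a library utility would normally be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


# Hypothetical dataset of 10 rows split into 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))

for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each row lands in the validation set exactly once, so every observation is used for both training and evaluation across the five rounds.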
Unsupervised Learning
K-Means Clustering
K-means clustering is a popular unsupervised ML algorithm that is used for identifying patterns in unlabeled data and grouping it. This module dives into the working of the algorithm and the important points to keep in mind when implementing it in practical scenarios.
- Introduction to Clustering
- Types of Clustering
- K-means Clustering
- Importance of Scaling
- Silhouette Score
- Visual Analysis of Clustering
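The working of K-means can be sketched as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The example below is a minimal plain-Python version of that loop (Lloyd's algorithm) under simplifying assumptions: the points and starting centroids are invented, the features are already on comparable scales, and the starting centroids are fixed rather than chosen at random.

```python
from math import dist


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _centroid in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])

print(centroids)
print(clusters)
```

If one feature were measured on a much larger scale than the other, it would dominate the distance calculation, which is why scaling appears in the topic list above.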
Hierarchical Clustering and PCA
Hierarchical clustering organizes data into a tree-like structure of nested clusters, while dimensionality reduction techniques are used to transform data into a lower-dimensional space while retaining the most important information in it. This module covers the business applications of hierarchical clustering and how to reduce the dimension of data using PCA to aid in visualization and feature selection of multivariate datasets.
- Hierarchical Clustering
- Cophenetic Correlation
- Introduction to Dimensionality Reduction
- Principal Component Analysis
Unit 3
Visualization and Insights
Data Visualization using Tableau (Self-Paced)
- Introduction to Data Visualization
- Introduction to Tableau
- Basic Charts and Dashboards
- Descriptive Statistics, Dimensions and Measures
- Visual Analytics
- Dashboard Design & Principles
- Advanced Design Components/Principles
- Special Chart Types
- Case Study: Hands-On using Tableau
- Integrate Tableau with Google Sheets
Unit 4
Capstone Project
You will get your hands dirty with a real-time project under the guidance of industry experts. The capstone project lasts 4 weeks, during which you will apply everything you have learned, from the Data Science foundations to Visualization and everything in between. Successful completion of the project will earn you a post-graduate certificate in data science and analytics.
Upskill from Great Lakes
Earn a PG certificate in Data Science & Analytics
Ranked among India's top 10 business schools, Great Lakes is highly regarded for its analytics programs. A certification from Great Lakes Executive Learning ensures industry credibility and acceptance, providing a robust foundation for your career advancement.
* Image for illustration only. Certificate subject to change.
- Top Standalone Institution, by Outlook India
- In One Year Programs, by Business World
- Top B-Schools, by Business India
Industry relevant syllabus
Learn top in-demand tools
Delve deep into Data Science with our program, mastering significant skills and employing powerful tools.
- Python
- Tableau
- Knime
- NumPy
- SQL
- Pandas
- Seaborn
- Matplotlib
- Statsmodels
- Scikit-Learn
Our faculty
Meet our expert faculty, professionals with a passion for Data Science and deep knowledge of the field
Industry experts
Introducing our dedicated mentors and experienced industry insiders devoted to guiding learners on their Data Science career journey.
- Satish Raghavendran: Vice President, Deloitte
- Manish Gupta: Senior Applied Scientist, Microsoft
- Sreevasan P S: Data Science Practitioner, AI/ML Mentor, Ex-Cognizant
- Balaji Sundararaman: Mentor - Data Science, ML, AI and Analytics at Great Learning
- Udayakumar Devaraj: Senior Data Scientist, WNS
Advanced Career Support
- 1:1 Career Sessions: Engage one-on-one with industry experts for valuable insights and guidance.
- Interview Preparation: Gain insights into recruiter expectations.
- Resume & LinkedIn Profile Review: Showcase your strengths impressively.
- E-Portfolio: Create a professional portfolio demonstrating your skills and expertise.
Program Fees
Program Fees: 1,800 USD
Flexible payment options available
Benefits of learning from us
- 150+ hours of online content
- Personalised mentorship sessions
- Dedicated career support
- 8+ languages & tools
- Doubt-Solving with Expert Industry mentors
- Proactive Program Support
- Certificate of completion from Great Lakes
Application process
Our admissions close once the requisite number of participants enroll for the upcoming batch. Apply early to secure your seat.
1. Fill in the application form
Apply by filling in a simple online application form.
2. Interview process
Go through a screening call with the Admission Director’s office.
3. Join the program
An offer letter will be sent to selected candidates. Secure your seat by paying the admission fee.
Post-Graduate Program in Data Science & Analytics