Data Science Bootcamp Curriculum

Curriculum Highlights

Curriculum Overview

Module: Python for Data Science

Python Basics

Introduction to Python: history, features, and advantages
Expressions and operators: arithmetic, assignment, comparison, and logical
Understanding the type() function and type inference
Introduction to data structures: lists, tuples, and dictionaries

Working with arithmetic operators: addition, subtraction, multiplication, division, modulus, and exponentiation
Using comparison operators: equal to, not equal to, greater than, less than, etc.
Logical operators: and, or, and not
Exploring advanced data types: sets and string manipulation.

Expressions, Conditional Statements & For Loop

Evaluating expressions: operator precedence and associativity
Introduction to conditional statements: if, elif, and else
Executing code based on conditionals.
Understanding the flow of control in conditional statements
Iteration using the for loop: range(), iteration over lists, and strings.
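
As a small illustration of the topics in this unit, the sketch below combines if, elif, and else with for loops over range() and over a string. The function name classify and the FizzBuzz-style labels are illustrative choices, not part of the course material.

```python
def classify(n):
    """Return a label for n using if / elif / else."""
    if n % 15 == 0:
        return "fizzbuzz"
    elif n % 3 == 0:
        return "fizz"
    elif n % 5 == 0:
        return "buzz"
    else:
        return str(n)


# Iterate over a range of integers (1 to 15 inclusive).
labels = []
for i in range(1, 16):
    labels.append(classify(i))

# Iterate over the characters of a string and count the vowels.
vowels = 0
for ch in "iteration":
    if ch in "aeiou":
        vowels += 1
```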

While loop, Break and Continue Statements, and Nested Loops

Working with while loop: syntax, conditions, and examples
Combining loops and conditionals
Using the break statement to exit loops prematurely.
Utilizing the continue statement to skip iterations.
Implementing nested loops for complex iterations
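
The following sketch shows one way the statements in this unit can work together: a while loop with a nested for loop that uses break, and a separate loop that uses continue. The helper names first_primes and sum_of_odds are hypothetical and chosen only for this example.

```python
def first_primes(count):
    """Collect the first `count` primes using a while loop and a nested for loop."""
    primes = []
    candidate = 2
    while len(primes) < count:
        is_prime = True
        for p in primes:              # nested loop over primes found so far
            if p * p > candidate:
                break                 # no larger divisor needs checking
            if candidate % p == 0:
                is_prime = False
                break                 # a divisor was found, stop early
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return primes


def sum_of_odds(numbers):
    """Add up the odd numbers, skipping even ones with continue."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            continue                  # skip this iteration for even values
        total += n
    return total
```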

Functions

Introduction to functions: purpose, advantages, and best practices
Defining and calling user-defined functions
Parameters and arguments: positional, keyword, and default values
Return statement and function output.
Variable scope and lifetime
Function documentation and code readability
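
A minimal sketch of the ideas listed above: a documented function with a positional parameter and two default values, called with positional and keyword arguments, plus a second function whose local variable does not leak into the caller's scope. The names describe and scale are illustrative assumptions.

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Build a greeting string.

    name is required; greeting and punctuation fall back to default values.
    """
    return f"{greeting}, {name}{punctuation}"


def scale(values, factor=2):
    """Return a new list with every value multiplied by factor."""
    scaled = [v * factor for v in values]   # `scaled` exists only inside scale()
    return scaled


default_call = describe("Ada")                          # positional, defaults used
keyword_call = describe("Ada", greeting="Hi")           # keyword argument
reordered_call = describe(punctuation=".", name="Ada")  # keywords in any order
doubled = scale([1, 2, 3])
```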

Exception Handling and File Handling

Understanding exceptions: errors, exceptions, and exception hierarchy
Handling exceptions using try-except blocks: handling specific exceptions, multiple exceptions, and else and finally clauses.
Raising exceptions and creating custom exception classes
File handling in Python: opening, reading, writing, and closing files.
Working with different file modes and file objects
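
To make the exception and file-handling topics concrete, here is a hedged sketch that writes a file in "w" and "a" modes, reads it back in "r" mode, and handles errors with try, except, else, and finally. EmptyFileError, read_numbers, and safe_total are hypothetical names introduced for this example.

```python
import os
import tempfile


class EmptyFileError(Exception):
    """Custom exception (hypothetical) raised when a file has no data lines."""


def read_numbers(path):
    """Read one number per line; raise EmptyFileError if nothing is found."""
    with open(path, "r") as f:        # the with block closes the file automatically
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        raise EmptyFileError(f"{path} contains no data")
    return [float(line) for line in lines]


log = []


def safe_total(path):
    """Return the sum of the numbers in path, or a fallback value on error."""
    try:
        numbers = read_numbers(path)
    except FileNotFoundError:         # a specific built-in exception
        result = None
    except (EmptyFileError, ValueError):  # several exceptions in one clause
        result = 0.0
    else:                             # runs only when no exception occurred
        result = sum(numbers)
    finally:                          # always runs
        log.append(path)
    return result


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "numbers.txt")
    with open(data_path, "w") as f:   # "w" creates or truncates the file
        f.write("1.5\n2.5\n")
    with open(data_path, "a") as f:   # "a" appends to the existing content
        f.write("4\n")
    total = safe_total(data_path)
    missing = safe_total(os.path.join(tmp, "missing.txt"))
```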

Module: Mathematics

Mathematical Concepts

Linear Algebra:

- Vectors, matrices, operations, and their applications in data science.
- Introduction to numpy for matrix operations.

Introduction to numpy:

- Why numpy is the preferred library for numerical operations in Python.
- Basic numpy operations: creating arrays, array indexing, array slicing.
- Matrix operations using numpy: matrix multiplication, finding determinants, solving linear equations.
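
The operations named above can be sketched as follows, assuming numpy is installed. The matrix A and vector b are made-up values used only for illustration.

```python
import numpy as np

# Create a 2x2 matrix and a right-hand-side vector.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

first_row = A[0]            # indexing: first row
second_column = A[:, 1]     # slicing: every row, second column

product = A @ A             # matrix multiplication
det = np.linalg.det(A)      # determinant: 2*3 - 1*1 = 5
x = np.linalg.solve(A, b)   # solve the linear system A x = b
```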

Probability and Statistics

Probability Theory:

- Basic probability, rules, and distributions (normal, binomial, Poisson).
- Random variables, expectations, and variance.

Descriptive Statistics:

- Measures of central tendency (mean, median, mode).
- Measures of variability (range, variance, standard deviation).
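
As a quick illustration of these measures, the sketch below uses Python's built-in statistics module on a small made-up sample. It computes the population variance and standard deviation; the sample versions (statistics.variance and statistics.stdev) divide by n - 1 instead of n.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability
value_range = max(data) - min(data)
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation
```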

Introduction to pandas and scipy

- Using pandas for data manipulation and summary statistics.
- scipy for performing hypothesis tests and building confidence intervals.
- Visualization with seaborn and matplotlib to understand data distributions and relationships.

Probability and Statistics II

Inferential Statistics:

- Sampling distributions and the Central Limit Theorem.
- Confidence intervals and hypothesis testing.

Correlation and Regression:

- Scatter plots, correlation coefficients.
- Simple linear regression analysis.

Introduction to Seaborn and Matplotlib

- Visualization of correlation matrices and scatter plots to identify relationships between variables.
- Advanced plotting techniques such as pair plots, heatmaps, and regression plots to extract deeper insights from data.

Module: Exploratory Data Analysis (EDA) and Machine Learning

Introduction and Missing Value Analysis

Introduction to Exploratory Data Analysis (EDA)
Importance of EDA in data analysis
Steps involved in EDA
Handling missing values: identification, analysis, and treatment strategies
Imputation techniques for missing values

Data Consistency, Binning, and Outlier Analysis

Data consistency checks using fuzzy logic
Binning and discretization techniques for continuous variables
Outlier detection and analysis methods
Handling outliers: techniques for treatment or removal

Feature Selection and Data Wrangling

Importance of feature selection in EDA
Feature selection techniques: filter methods, wrapper methods, and embedded methods
Data wrangling: cleaning and transforming data for analysis
Handling categorical variables: encoding techniques

Inference, Hypothesis Testing, and Visualization

Inference and hypothesis testing in EDA
Common statistical tests: t-test, chi-square test, ANOVA, etc.
Visualization techniques for EDA: histograms, box plots, scatter plots, etc.
Hands-on practical session for complete EDA using a dataset

Module: Machine Learning and Deep Learning

Overview of Machine Learning

Supervised, unsupervised, and reinforcement learning.
Machine learning workflow from data collection to model deployment.
Introduction to Python libraries essential for ML (Scikit-learn, TensorFlow, PyTorch)

Supervised Learning and Linear Regression

Linear Regression: Concept, loss function, evaluation metrics (MSE, R²)
Building a Linear Regression model in scikit-learn
Visualizing predictions vs actuals
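
The course builds this model in scikit-learn; the sketch below instead uses plain Python to show what the fit and the two evaluation metrics compute for a single feature. The data points and the function names fit_line, mse, and r_squared are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(ys, preds):
    """Mean squared error: the average squared residual."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r_squared(ys, preds):
    """R²: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
error = mse(ys, preds)
score = r_squared(ys, preds)
```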

Machine Learning Performance Metrics and Naive Bayes

Evaluation metrics for classification problems: accuracy, precision, recall, F1 score, etc.
Introduction to Naive Bayes algorithm and its applications
Implementing Naive Bayes for classification tasks
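
To show how the classification metrics listed above relate to one another, here is a minimal sketch that derives them from the confusion counts for a binary problem. The labels are made up, and classification_metrics is a hypothetical helper, not a library function.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```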

Logistic Regression, SVM, Decision Trees, and Random Forests

Logistic Regression: theory, interpretation, and applications
Support Vector Machines (SVM): concepts, kernels, and use cases
Decision Trees: construction, pruning, and interpretability
Random Forests: ensemble learning and feature importance
Bagging and Boosting: techniques for improving model performance

Clustering Introduction, Partitioning Algorithms, and Cluster Evaluation

Introduction to clustering: unsupervised learning technique
Partitioning algorithms: K-means, K-medoids
Hierarchical clustering: agglomerative and divisive approaches
Density-based clustering: DBSCAN, OPTICS
Cluster evaluation metrics: silhouette coefficient, Davies-Bouldin index

Regression and Evaluation of Regression Methods

Introduction to regression analysis
Linear regression: assumptions, interpretation, and model evaluation
Evaluation metrics for regression: mean squared error, R-squared, etc.
Other regression methods: polynomial regression, ridge regression, lasso regression

Introduction to Deep Learning

Overview of deep learning, its importance in computer vision, key concepts, and architectures.
Code-along session: building a deep neural network from scratch

Deep Learning Hyperparameter Tuning

Strategies for optimizing hyperparameters like learning rate, batch size, and regularization to improve model performance.

Module: Data Warehousing and ETL (Extract, Transform, Load)

Conceptual Overview

Understand the distinct roles and responsibilities of data scientists and data engineers within projects.

Data Sourcing with Python

Extracting data from sources like PostgreSQL, MongoDB, APIs, web scraping, and IoT devices.

Databricks Tour

Get introduced to Databricks as a platform for big data processing and machine learning.

Big Data Handling for Data Science

Process large-scale datasets using Apache Spark within a distributed environment.

MLflow

Learn to track experiments and ensure reproducibility in machine learning workflows.

Module: Computer Vision

Introduction to Convolutional Neural Networks (CNNs)

Explanation of CNNs, their architecture, and their role in image processing.
Code-along session on Convolutional Neural Networks

Building Custom Image Classification Models

Step-by-step guide to creating and training a custom image classifier using a CNN.

Transfer Learning and Introduction to Object Detection

Introduction to transfer learning, its applications, and an overview of object detection techniques.

Hands-on with YOLO Object Detection

Practical session on using the YOLO (You Only Look Once) algorithm for object detection.

Custom Training YOLO model

Detailed guidance on training a YOLO model with a custom dataset for specific object detection tasks.

Using State-of-the-Art Models for Real-World Applications

Exploring and implementing advanced models in computer vision for practical use cases.

Introduction to OpenCV

Introduction to OpenCV, its libraries, and its importance in computer vision tasks.

Image Pre-processing and Pre-built Algorithms in OpenCV

Hands-on session on image pre-processing techniques and using built-in algorithms in OpenCV.

Advanced Guided Project with OpenCV

Capstone project in which students apply the techniques they have learned in a guided project using OpenCV.

Module: Natural Language Processing

Introduction to NLP and Text Normalization

Overview of Natural Language Processing (NLP)
Techniques for text normalization: lowercasing, punctuation removal, etc.
Basics of tokenization and stopword removal

Text Representation and Tokenization

Introduction to vectors in NLP: Bag of Words and Count Vectorizer
Basics of tokenization and stopword removal

Stemming, Lemmatization, and N-gram Language Models

Understanding and applying stemming and lemmatization
Introduction to N-gram language models
Introduction to vectors in NLP: Bag of Words, Count Vectorizer, and TF-IDF
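
A minimal sketch of two ideas from the units above, using only the standard library: extracting n-grams from a token list and building Bag of Words count vectors over a shared vocabulary. The documents are made up, whitespace splitting stands in for a real tokenizer, and the helper names are assumptions for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


docs = ["The cat sat", "The cat ate the fish"]
vocab, vectors = bag_of_words(docs)
bigrams = ngrams(tokenize(docs[1]), 2)
```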

Markov Models and Language Model Evaluation

Basics of Markov models in NLP
Overview of Text Classification
Techniques for evaluating language models: probability smoothing and performance metrics
Introduction to Naive Bayes and Sentiment Classification

Advanced Classifiers and Vector Semantics

Generative vs. discriminative classifiers
Understanding vector semantics and embeddings
Introduction to neural word embeddings: Word2Vec and GloVe

Deep Learning for NLP and Sequence Models

Overview of deep learning techniques for NLP
Applying supervised and unsupervised learning methods to NLP tasks
Exploring sequences of words in NLP tasks

LLM, Multimodal and Agents

Transformers and Large Language Models

Understanding the architecture and mechanisms of transformers
Overview of large language models (LLMs) and their capabilities
Hands-on: Fine-tuning transformers and LLMs for NLP applications

Training and Fine-Tuning LLMs

Techniques for training large language models
Fine-tuning pre-trained LLMs for specific tasks

Multimodal Integration

Exploring multimodal models and their integration
Working with open-source models for diverse applications
Utilizing cloud APIs for deploying and integrating multimodal solutions

LLM Agents

Introduction to LLM agents
Use cases and applications of LLM agents
Techniques for designing and deploying LLM agents for various tasks

Multimodal Generative AI - Agents

Introduction to Generative AI and Multimodal Models

Overview of Generative AI and its impact on various industries.
Understanding multimodal models and how they differ from single-modality models.
Exploring the OpenAI API: Capabilities and applications for text, speech, and image-based AI.

Speech Synthesis and Conversational Agents

Objective: Dive into text-to-speech (TTS) and speech-to-text (STT) models and explore conversational AI with agent frameworks.
Overview of text and voice synthesis technologies.
Introduction to ElevenLabs and HeyGen: Creating high-quality audio and speech synthesis.
Using OpenAI API for speech synthesis and understanding its capabilities.

Advanced Agent Development with LangChain

Basics of LangChain and its role in agent-based AI.
Building, customizing, and deploying agents using LangChain.
Exploring integration of LangChain with other APIs for multimodal interactions, including OpenAI API for image and text generation.

Building Complex Agent-Based Applications with CrewAI, Synthesia, and OpenAI

Introduction to CrewAI: Building intelligent agents for various tasks.
Synthesia for video and visual content generation in AI-driven applications.
Using OpenAI API to generate multimodal content (text, images, and speech).
Building a final project: Integrate multiple APIs to create a multimodal generative application.

Module: MLOps and Deployment

Introduction to MLOps and AI/NLP Fundamentals

Overview of MLOps and its importance in the AI lifecycle
Current trends in AI
Setting up the development environment

Deep Dive into Machine Learning Models for NLP

Understanding NLP models (Llama 2, GPT, Mistral, etc.)
Introduction to Hugging Face Transformers and Datasets
Hands-on: Building a simple NLP model with Hugging Face

Introduction to FastAPI for ML Model Deployment

Basics of API development with FastAPI
Deploying a simple ML model with FastAPI
Hands-on: Creating your first ML API with FastAPI

Advanced FastAPI Features for Production-Ready APIs

Authentication and authorization in FastAPI
Hands-on: Enhancing your ML API with advanced features

Docker Image Creation and Application Containerization

Introduction to Docker and its role in application deployment
Hands-on: Creating Docker images for NLP applications
Converting existing applications into Docker containers for scalable deployment

LangGraph and Advanced Model Integration

Introduction to LangGraph for building graph-based NLP applications
Exploring advanced integration scenarios with LangGraph and LangChain
Hands-on: Designing graph-based workflows for NLP tasks

Deploying ML Models on Google Cloud

Overview of Google Cloud Platform (GCP) for ML
Introduction to Google Cloud Run
Hands-on: Deploying your Dockerized FastAPI application on GCP with LangSmith monitoring

