Data Science Bootcamp Curriculum
Curriculum Highlights
Python for Data Science
EDA, ETL
Machine Learning
NLP, LLMs
Computer Vision
Curriculum Overview
Python Basics
- Introduction to Python: history, features, and advantages
- Expressions and operators: arithmetic, assignment, comparison, and logical
- Understanding the type() function and type inference
- Introduction to data structures: lists, tuples, and dictionaries
- Working with arithmetic operators: addition, subtraction, multiplication, division, modulus, and exponentiation
- Using comparison operators: equal to, not equal to, greater than, less than, etc.
- Logical operators: and, or, and not
- Exploring advanced data types: sets and string manipulation.
Expressions, Conditional Statements & For Loop
- Evaluating expressions: operator precedence and associativity
- Introduction to conditional statements: if, elif, and else
- Executing code based on conditionals.
- Understanding the flow of control in conditional statements
- Iteration using the for loop: range(), iteration over lists, and strings.
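The topics in this unit can be sketched in a few lines. The example below is illustrative only: the function name `classify` and the fizz/buzz labels are assumptions chosen for the demonstration, not part of the course material.

```python
def classify(n):
    """Return a label for n using an if / elif / else chain."""
    if n % 15 == 0:
        return "fizzbuzz"
    elif n % 3 == 0:
        return "fizz"
    elif n % 5 == 0:
        return "buzz"
    else:
        return str(n)


# Iterate with a for loop over range(); range(1, 16) yields 1 through 15.
labels = []
for n in range(1, 16):
    labels.append(classify(n))

# Operator precedence: ** binds tighter than *, which binds tighter than +.
precedence_demo = 2 + 3 * 2 ** 2   # evaluated as 2 + (3 * (2 ** 2))

# Associativity: ** groups right to left.
assoc_demo = 2 ** 3 ** 2           # evaluated as 2 ** (3 ** 2)

# A for loop can also iterate directly over the characters of a string.
vowel_count = 0
for ch in "iteration":
    if ch in "aeiou":
        vowel_count += 1
```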
While loop, Break and Continue Statements, and Nested Loops
- Working with while loop: syntax, conditions, and examples
- Combining loops and conditionals
- Using the break statement to exit loops prematurely.
- Utilizing the continue statement to skip iterations.
- Implementing nested loops for complex iterations
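As a rough illustration of how while, break, continue, and a nested loop fit together, here is a minimal sketch. The function `first_primes` is a hypothetical example, not taken from the course.

```python
def first_primes(count):
    """Collect the first `count` primes using while, break, and continue."""
    primes = []
    candidate = 1
    while len(primes) < count:
        candidate += 1
        if candidate > 2 and candidate % 2 == 0:
            continue  # skip even numbers greater than 2
        is_prime = True
        for p in primes:  # nested loop over the primes found so far
            if p * p > candidate:
                break  # no divisor larger than sqrt(candidate) is needed
            if candidate % p == 0:
                is_prime = False
                break  # found a divisor, stop checking early
        if is_prime:
            primes.append(candidate)
    return primes
```

Calling `first_primes(5)` walks the candidates 2, 3, 4, 5, ... and stops as soon as five primes have been collected.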
Functions
- Introduction to functions: purpose, advantages, and best practices
- Defining and calling user-defined functions
- Parameters and arguments: positional, keyword, and default values
- Return statement and function output.
- Variable scope and lifetime
- Function documentation and code readability
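The points above (parameters, defaults, return values, scope, docstrings) could be illustrated along these lines. The names `describe`, `add_local`, and `total` are assumptions made for this sketch.

```python
def describe(name, role="student", *, shout=False):
    """Return a one-line description of a person.

    name  -- positional parameter (required)
    role  -- parameter with a default value
    shout -- keyword-only flag; upper-cases the result when True
    """
    text = f"{name} is a {role}"
    return text.upper() if shout else text


total = 10  # module-level (global) name


def add_local(x):
    """Show that assignment inside a function creates a local name."""
    total = x + 1  # local `total`; the global `total` is not modified
    return total
```

`describe("Ada")` uses the default role, `describe("Ada", "mentor")` passes it positionally, and `describe("Ada", role="mentor", shout=True)` passes both by keyword.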
Exception Handling and File Handling
- Understanding exceptions: errors, exceptions, and exception hierarchy
- Handling exceptions using try-except blocks: handling specific exceptions, multiple exceptions, and else and finally clauses.
- Raising exceptions and creating custom exception classes
- File handling in Python: opening, reading, writing, and closing files.
- Working with different file modes and file objects
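A compact sketch of how exception handling and file handling interact is shown below. `EmptyFileError`, `read_numbers`, and `demo` are hypothetical names introduced only for illustration, and the temporary file stands in for real data.

```python
import os
import tempfile


class EmptyFileError(Exception):
    """Custom exception: raised when a file holds no numeric lines."""


def read_numbers(path):
    """Read one float per line, skipping blank or malformed lines."""
    numbers = []
    with open(path, "r", encoding="utf-8") as handle:  # closed automatically
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                value = float(line)
            except ValueError:
                continue  # malformed line: ignore it
            else:
                numbers.append(value)  # runs only if no exception was raised
    if not numbers:
        raise EmptyFileError(f"no numeric lines in {path}")
    return numbers


def demo():
    """Write a small file in "w" mode, read it back, and always clean up."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("1.5\noops\n2.5\n")
        return read_numbers(path)
    finally:
        os.remove(path)  # runs whether or not an exception occurred
```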
Mathematical Concepts
- Linear Algebra:
- Vectors, matrices, operations, and their applications in data science.
- Introduction to numpy for matrix operations.
- Introduction to numpy:
- Why numpy is the preferred library for numerical operations in Python.
- Basic numpy operations: creating arrays, array indexing, array slicing.
- Matrix operations using numpy: matrix multiplication, finding determinants, solving linear equations.
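The numpy operations listed above might look like the following in practice. This is a minimal sketch that assumes numpy is installed; the matrix `A` and vector `b` are arbitrary example values.

```python
import numpy as np

# Creating arrays
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Indexing and slicing
first_row = A[0]        # row 0
second_col = A[:, 1]    # every row, column 1

# Matrix multiplication with the @ operator
product = A @ A

# Determinant of A
det = np.linalg.det(A)

# Solve the linear system A x = b
x = np.linalg.solve(A, b)
```

For this system (2x + y = 3, x + 3y = 5), the solution can be checked by substituting `x` back into `A @ x`.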
Probability and Statistics
- Probability Theory:
- Basic probability, rules, and distributions (normal, binomial, Poisson).
- Random variables, expectations, and variance.
- Descriptive Statistics:
- Measures of central tendency (mean, median, mode).
- Measures of variability (range, variance, standard deviation)
- Introduction to pandas and scipy
- Using pandas for data manipulation and summary statistics.
- scipy for performing hypothesis tests and building confidence intervals.
- Visualization with seaborn and matplotlib to understand data distributions and relationships.
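The descriptive measures named above can be computed with very little code. The unit uses pandas and scipy; the sketch below swaps in Python's standard-library `statistics` module so it runs without extra installs, and the sample `data` is made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample, not course data

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability
value_range = max(data) - min(data)
population_variance = statistics.pvariance(data)  # divides by n
population_stdev = statistics.pstdev(data)
sample_variance = statistics.variance(data)       # divides by n - 1
```

The distinction between `pvariance` and `variance` mirrors the population versus sample formulas: the same sum of squared deviations is divided by n in one case and n - 1 in the other.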
Probability and Statistics II
- Inferential Statistics:
- Sampling distributions and the Central Limit Theorem.
- Confidence intervals and hypothesis testing.
- Correlation and Regression:
- Scatter plots, correlation coefficients.
- Simple linear regression analysis.
- Introduction to Seaborn and Matplotlib
- Visualization of correlation matrices and scatter plots to identify relationships between variables.
- Advanced plotting techniques such as pair plots, heatmaps, and regression plots to extract deeper insights from data.
Introduction and Missing Value Analysis
- Introduction to Exploratory Data Analysis (EDA)
- Importance of EDA in data analysis
- Steps involved in EDA
- Handling missing values: identification, analysis, and treatment strategies
- Imputation techniques for missing values
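One of the simplest treatment strategies, mean imputation, can be sketched as follows. This assumes missing entries are represented as `None`; the helper names `count_missing` and `impute_mean` are hypothetical.

```python
def count_missing(values):
    """Identify how many entries are missing (represented here as None)."""
    return sum(1 for v in values if v is None)


def impute_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason other strategies (median, model-based) are often compared against it.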
Data Consistency, Binning, and Outlier Analysis
- Data consistency checks using fuzzy logic
- Binning and discretization techniques for continuous variables
- Outlier detection and analysis methods
- Handling outliers: techniques for treatment or removal
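Two of the techniques above, equal-width binning and outlier detection, can be sketched with the standard library. The 1.5 x IQR rule used here is one common convention among several, and the function names and sample values are assumptions for illustration.

```python
import statistics


def bin_index(value, low, high, n_bins):
    """Equal-width binning: map value to a bin index in [0, n_bins - 1]."""
    width = (high - low) / n_bins
    idx = int((value - low) / width)
    return min(max(idx, 0), n_bins - 1)  # clamp values at or beyond the edges


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 14, 95]  # made-up sample
outliers = iqr_outliers(readings)
```

Whether a flagged value should be removed, capped, or kept depends on the domain; the function only identifies candidates.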
Feature Selection and Data Wrangling
- Importance of feature selection in EDA
- Feature selection techniques: filter methods, wrapper methods, and embedded methods
- Data wrangling: cleaning and transforming data for analysis
- Handling categorical variables: encoding techniques
Inference, Hypothesis Testing, and Visualization
- Inference and hypothesis testing in EDA
- Common statistical tests: t-test, chi-square test, ANOVA, etc.
- Visualization techniques for EDA: histograms, box plots, scatter plots, etc.
- Hands-on practical session for complete EDA using a dataset
Module: Machine Learning and Deep Learning
Overview of Machine Learning
- Supervised, unsupervised, and reinforcement learning.
- Machine learning workflow from data collection to model deployment.
- Introduction to Python libraries essential for ML (Scikit-learn, TensorFlow, PyTorch)
Supervised Learning and Linear Regression
- Linear Regression: Concept, loss function, evaluation metrics (MSE, R²)
- Building a Linear Regression model in scikit-learn
- Visualizing predictions vs actuals
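To make the loss function and metrics concrete, here is a minimal sketch of simple linear regression using the closed-form least-squares solution in plain Python. The unit itself builds the model in scikit-learn; this hand-written version is a stand-in meant only to show what MSE and R² measure. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # generated from y = 2x + 1, so the fit is exact
slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
```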
Machine Learning Performance Metrics and Naive Bayes
- Evaluation metrics for classification problems: accuracy, precision, recall, F1 score, etc.
- Introduction to Naive Bayes algorithm and its applications
- Implementing Naive Bayes for classification tasks
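The classification metrics listed above all derive from the four confusion-matrix counts. A minimal sketch for the binary case follows; the function name and the example labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; libraries differ in how they handle that edge case.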
Logistic Regression, SVM, Decision Trees, and Random Forests
- Logistic Regression: theory, interpretation, and applications
- Support Vector Machines (SVM): concepts, kernels, and use cases
- Decision Trees: construction, pruning, and interpretability
- Random Forests: ensemble learning and feature importance
- Bagging and Boosting: techniques for improving model performance
Clustering Introduction, Partitioning Algorithms, and Cluster Evaluation
- Introduction to clustering: unsupervised learning technique
- Partitioning algorithms: K-means, K-medoids
- Hierarchical clustering: agglomerative and divisive approaches
- Density-based clustering: DBSCAN, OPTICS
- Cluster evaluation metrics: silhouette coefficient, Davies-Bouldin index
Regression and Evaluation of Regression Methods
- Introduction to regression analysis
- Linear regression: assumptions, interpretation, and model evaluation
- Evaluation metrics for regression: mean squared error, R-squared, etc.
- Other regression methods: polynomial regression, ridge regression, lasso regression
Introduction to Deep Learning
- Overview of deep learning, its importance in computer vision, key concepts, and architectures.
- Code-along session: building a deep neural network from scratch
Deep Learning Hyperparameter Tuning
- Strategies for optimizing hyperparameters like learning rate, batch size, and regularization to improve model performance.
Conceptual Overview
- Understand the distinct roles and responsibilities of data scientists and data engineers within projects.
Data Sourcing with Python
- Extracting data from sources like PostgreSQL, MongoDB, APIs, web scraping, and IoT devices.
Databricks Tour
- Get introduced to Databricks as a platform for big data processing and machine learning.
Big Data Handling for Data Science
- Process large-scale datasets using Apache Spark within a distributed environment.
MLflow
- Learn to track experiments and ensure reproducibility in machine learning workflows
Introduction to Convolutional Neural Networks (CNNs)
- Explanation of CNNs, their architecture, and their role in image processing.
- Code-along session on Convolutional Neural Networks
Building Custom Image Classification Models
- Step-by-step guide to creating and training a custom image classifier using a CNN.
Transfer Learning and Introduction to Object Detection
- Introduction to transfer learning, its applications, and an overview of object detection techniques.
Hands-on with YOLO Object Detection
- Practical session on using the YOLO (You Only Look Once) algorithm for object detection.
Custom Training YOLO model
- Detailed guidance on training a YOLO model with a custom dataset for specific object detection tasks.
Using State-of-the-Art Models for Real-World Applications
- Exploring and implementing advanced models in computer vision for practical use cases.
Introduction to OpenCV
- Introduction to OpenCV, its libraries, and its importance in computer vision tasks.
Image Pre-processing and Pre-built Algorithms in OpenCV
- Hands-on session on image pre-processing techniques and using built-in algorithms in OpenCV.
Advanced Guided Project with OpenCV
- Capstone project where students apply learned techniques in a guided project using OpenCV.
Introduction to NLP and Text Normalization
- Overview of Natural Language Processing (NLP)
- Techniques for text normalization: lowercasing, punctuation removal, etc.
- Basics of tokenization and stopword removal
Text Representation and Tokenization
- Introduction to vectors in NLP: Bag of Words and Count Vectorizer
- Basics of tokenization and stopword removal
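A Bag of Words representation can be sketched with the standard library alone. The stopword list below is a tiny illustrative set, not a real stopword corpus, and the function names are hypothetical; in practice a library vectorizer would typically be used.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "of"}  # tiny illustrative list


def normalize(text):
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Split normalized text on whitespace and drop stopwords."""
    return [tok for tok in normalize(text).split() if tok not in STOPWORDS]


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["The cat sat.", "The cat saw the dog, and the dog ran!"]
vocab, vectors = bag_of_words(docs)
```

Each vector has one position per vocabulary term, so word order is discarded; only counts remain.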
Stemming, Lemmatization, and N-gram Language Models
- Understanding and applying stemming and lemmatization
- Introduction to N-gram language models
- Introduction to vectors in NLP: Bag of Words, Count Vectorizer, and TF-IDF
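The N-gram language model idea from this unit can be illustrated with a maximum-likelihood bigram estimate. This sketch uses no smoothing, and the toy sentence and function names are assumptions chosen for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, prev, word):
    """Estimate P(word | prev) as count(prev, word) / count(prev, *)."""
    bigram_counts = Counter(ngrams(tokens, 2))
    context_count = sum(count for (first, _), count in bigram_counts.items()
                        if first == prev)
    if context_count == 0:
        return 0.0  # unseen context; a real model would apply smoothing
    return bigram_counts[(prev, word)] / context_count


tokens = "i like tea i like coffee i drink tea".split()
p_like_given_i = bigram_probability(tokens, "i", "like")
p_tea_given_like = bigram_probability(tokens, "like", "tea")
```

In the toy corpus, "i" is followed by "like" twice and "drink" once, which is where the estimate for P(like | i) comes from.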
Markov Models and Language Model Evaluation
- Basics of Markov models in NLP
- Overview of Text Classification
- Techniques for evaluating language models: probability smoothing and performance metrics
- Introduction to Naive Bayes and Sentiment Classification
Advanced Classifiers and Vector Semantics
- Generative vs. discriminative classifiers
- Understanding vector semantics and embeddings
- Introduction to neural word embeddings: Word2Vec and GloVe
Deep Learning for NLP and Sequence Models
- Overview of deep learning techniques for NLP
- Applying supervised and unsupervised learning methods to NLP tasks
- Exploring word sequences in NLP tasks
Transformers and Large Language Models
- Understanding the architecture and mechanisms of transformers
- Overview of large language models (LLMs) and their capabilities
- Hands-on: Fine-tuning transformers and LLMs for NLP applications
Training and Fine-Tuning LLMs
- Techniques for training large language models
- Fine-tuning pre-trained LLMs for specific tasks
Multimodal Integration
- Exploring multimodal models and their integration
- Working with open-source models for diverse applications
- Utilizing cloud APIs for deploying and integrating multimodal solutions
LLM Agents
- Introduction to LLM agents
- Use cases and applications of LLM agents
- Techniques for designing and deploying LLM agents for various tasks
Introduction to Generative AI and Multimodal Models
- Overview of Generative AI and its impact on various industries.
- Understanding multimodal models and how they differ from single-modality models.
- Exploring the OpenAI API: Capabilities and applications for text, speech, and image-based AI.
Speech Synthesis and Conversational Agents
- Objective: Dive into text-to-speech (TTS) and speech-to-text (STT) models and explore conversational AI with agent frameworks.
- Overview of text and voice synthesis technologies.
- Introduction to ElevenLabs and HeyGen: Creating high-quality audio and speech synthesis.
- Using OpenAI API for speech synthesis and understanding its capabilities
Advanced Agent Development with LangChain
- Basics of LangChain and its role in agent-based AI.
- Building, customizing, and deploying agents using LangChain.
- Exploring integration of LangChain with other APIs for multimodal interactions, including OpenAI API for image and text generation.
Building Complex Agent-Based Applications with CrewAI, Synthesia, and OpenAI
- Introduction to CrewAI: Building intelligent agents for various tasks.
- Synthesia for video and visual content generation in AI-driven applications.
- Using OpenAI API to generate multimodal content (text, images, and speech).
- Building a final project: Integrate multiple APIs to create a multimodal generative application.
Introduction to MLOps and AI/NLP Fundamentals
- Overview of MLOps and its importance in the AI lifecycle
- Current trends in AI
- Setting up the development environment
Deep Dive into Machine Learning Models for NLP
- Understanding NLP models (Llama 2, GPT, Mistral, etc.)
- Introduction to Hugging Face Transformers and Datasets
- Hands-on: Building a simple NLP model with Hugging Face
Introduction to FastAPI for ML Model Deployment
- Basics of API development with FastAPI
- Deploying a simple ML model with FastAPI
- Hands-on: Creating your first ML API with FastAPI
Advanced FastAPI Features for Production-Ready APIs
- Authentication and authorization in FastAPI
- Hands-on: Enhancing your ML API with advanced features
Docker Image Creation and Application Containerization
- Introduction to Docker and its role in application deployment
- Hands-on: Creating Docker images for NLP applications
- Converting existing applications into Docker containers for scalable deployment
LangGraph and Advanced Model Integration
- Introduction to LangGraph for building graph-based NLP applications
- Exploring advanced integration scenarios with LangGraph and LangChain
- Hands-on: Designing graph-based workflows for NLP tasks
Deploying ML Models on Google Cloud
- Overview of Google Cloud Platform (GCP) for ML
- Introduction to Google Cloud Run
- Hands-on: Deploying your Dockerized FastAPI application on GCP with LangSmith monitoring