Data Science Bootcamp Curriculum
Curriculum Highlights
Basics of Data Science
Python for AI
Machine Learning
NLP and LLMs
Computer Vision
Curriculum Overview
Basic Data and Statistical Concepts
- Data Literacy
- Statistical Foundations
- Descriptive Statistics
- Statistical Inference and Sampling Techniques
- Regression Analysis
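The descriptive statistics and regression topics above can be illustrated with a short sketch that uses only Python's standard `statistics` module. The helper names `describe` and `fit_line` and the sample data are hypothetical. `fit_line` computes an ordinary least squares line for a single predictor.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Slope = covariance-style cross sum divided by the spread of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical example data: hours studied and exam scores
hours = [1, 2, 2, 3, 4, 5]
scores = [52, 55, 57, 61, 66, 70]
summary = describe(hours)
intercept, slope = fit_line(hours, scores)
```

The regression fit gives a positive slope here, meaning scores tend to rise with hours studied in this toy data. A fit on its own says nothing about causation or about whether the regression assumptions hold.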
Data Cleaning, Preparation, and Management
- Data relationships, data shapes, and primary/unique keys and identifiers
- Basic data cleaning
- Data extraction from external sources
- Sorting, Filtering, and Merging Data
- Common Excel formulas and functions
Further Concepts in Data Cleaning
- Excel Functions for Data Cleaning (e.g., LEFT, RIGHT, CONCATENATE)
- Conditional Formatting
- Data Validation and Error Checking
Basic Data Processing
- IF and Nested IF statements
- Absolute and Relative cell references
Data Analysis with Intermediate Excel
- Processing large datasets
- Find and Find & Replace
- Advanced functions (VLOOKUP, HLOOKUP, INDEX, MATCH)
Introduction to Pivot Tables
- Navigating the Pivot Table interface
- Creating basic Pivot Tables
- Field Settings and layout options
- Working with pivot table data
Advanced Data Analysis with Pivot Tables
- Advanced pivot table functions
- Data slicing
- Pivot charts and visualizations
- Advanced Pivot table visualizations
Further Data Visualization in Excel
- Intermediate and Advanced Excel Charting Techniques
- Excel Add-ins for advanced visualization
- Data Analysis and Visualization in Google Sheets
Data Warehousing Overview
- Introduction to Data Warehousing
  - Definition and purpose
  - Key concepts: OLTP vs. OLAP
  - What is ETL?
- Overview of Microsoft Tools for Data Warehousing
  - Introduction to Excel, SQL Server, SSIS, and SSRS
Data Warehousing Terminology and Concepts
- Understanding Data Warehousing Terminology
  - Fact tables, dimension tables, star and snowflake schemas
- Exploring the differences between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing)
  - Use cases and examples
Installing and Setting Up SQL Server and SSMS
- Installation of Microsoft SQL Server and SQL Server Management Studio (SSMS)
  - Guided installation process
  - Overview of the SSMS interface and features
- Introduction to SQL Server databases
  - Creating and managing databases in SSMS
Introduction to Microsoft Business Intelligence Tools
- Installing and using SSIS (SQL Server Integration Services) and SSRS (SQL Server Reporting Services)
  - Overview of SSIS and SSRS interfaces
  - Exploring the capabilities of each tool
- Basic concepts of ETL (Extract, Transform, Load) with SSIS
Data Extraction and Transformation with SSIS
- Introduction to flat file formats (CSV, TXT) and Excel workbooks as data sources
- Data extraction using SSIS
  - Connecting to various data sources
  - Extracting data from flat files and databases
- Data cleaning and transformation using SSIS
  - Applying basic transformations (Data Conversion, Conditional Split, etc.)
  - Handling common data quality issues
Data Preparation with Microsoft Excel
- Cleaning CSV files using Microsoft Excel
  - Removing duplicates, handling missing values, data formatting
- Preparing data for loading into SQL Server
  - Validating data and exporting it for SSIS processes
Building ETL Workflows with SSIS
- Creating efficient ETL workflows with SSIS
  - Designing ETL packages
  - Avoiding common errors in ETL processes
- Best practices for ETL process optimization
  - Error handling, logging, and performance tuning
SQL-Based Data Analysis and Reporting with SSRS
- Introduction to SSRS and report creation
  - Overview of report types and their purposes
  - Designing and building basic reports
- Practicing data analysis with SQL queries in SSRS
  - Writing SQL queries to retrieve and analyze data
  - Integrating SQL queries with SSRS reports
Advanced Data Transformation Techniques with SSIS
- Implementing complex transformations in SSIS
  - Using advanced SSIS components (Lookup, Merge Join, etc.)
  - Data enrichment and aggregation techniques
Installing SQL Workbench
Introduction to SQL
- Overview of Database Management Systems (DBMS)
- Basic SQL syntax and structure
- Flow of SQL commands
- Data Types in SQL
- Running SQL queries with the SELECT statement
- Retrieving data from tables
Filtering, Sorting, and Aggregating Data
- Using the WHERE clause to filter data based on conditions
- Sorting query results using the ORDER BY clause
- Limiting the number of results with the LIMIT clause
- Using aggregate functions for summary statistics
- Grouping query results using the GROUP BY clause
- Filtering grouped data with the HAVING clause
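The clauses above can be combined in a single query. This minimal sketch runs one against an in-memory SQLite database through Python's built-in `sqlite3` module. The `sales` table and its rows are hypothetical. Other database systems differ in some details; for example, SQL Server uses `TOP` rather than `LIMIT`.

```python
import sqlite3

# Hypothetical sales table used only for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "A", 120.0),
        ("North", "B", 80.0),
        ("South", "A", 200.0),
        ("South", "B", 50.0),
        ("East", "A", 30.0),
        ("East", "B", 20.0),
    ],
)

query = """
SELECT region,
       COUNT(*)    AS n_orders,
       SUM(amount) AS total
FROM sales
WHERE amount > 25            -- filter individual rows before grouping
GROUP BY region              -- one output row per region
HAVING SUM(amount) > 100     -- filter whole groups after aggregation
ORDER BY total DESC          -- largest totals first
LIMIT 1                      -- keep only the top group
"""
rows = conn.execute(query).fetchall()
conn.close()
```

WHERE removes the 20.0 row before aggregation. HAVING then drops East, whose remaining total is 30.0. ORDER BY and LIMIT leave only South.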
Table Joins and Case Statements
- Case statements
- Understanding relationships between tables
- Using joins to combine data from multiple tables
Advanced Query Techniques
- Further practice with different types of joins
- Working with subqueries
- Working with derived tables
- Common table expressions (CTEs)
Window Functions and Data Modification
- Working with window functions
- Ranking data using RANK and DENSE_RANK
- Modifying table structure with ALTER and RENAME, and table data with INSERT, UPDATE, and DELETE
- Using UNION, INTERSECT, and EXCEPT to combine query results
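The difference between RANK and DENSE_RANK only shows up when values are tied. This sketch uses Python's `sqlite3` module with a hypothetical `scores` table. It assumes the bundled SQLite library is version 3.25 or newer, since window functions were added to SQLite in that release.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ana", 90), ("Ben", 85), ("Cai", 90), ("Dee", 70)],  # Ana and Cai are tied
)

query = """
SELECT student,
       score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,        -- leaves a gap after ties
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk   -- no gap after ties
FROM scores
ORDER BY score DESC, student
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Both tied students get rank 1. Ben then gets RANK 3, because RANK skips a position for the tie, but DENSE_RANK 2.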
Creating New Tables
- Further practice with window functions
- Creating new tables
- Inserting values into new tables
- Modifying and updating new tables
Advanced Topics and Artificial Intelligence
- Working with Primary Keys
- Auto-increment columns
- Updating Tables
- Indexing
- Using AI in SQL programming
Date Variables and Artificial Intelligence
- Dealing with date variables in SQL data
- Setting variables in SQL
- Using AI in SQL programming
- Wrapping up
Introduction to Power BI / Connecting & Shaping Data
- Download and install Power BI Desktop, and adjust the settings.
- Understand the role that Power BI plays within the broader Microsoft ecosystem
- Explore core components of the Power BI Desktop interface
- Review the business intelligence workflow.
- Explore Power BI’s query editor and understand the role that Power Query plays in the larger BI workflow
- Introduce different types of connectors and connectivity modes available for getting data into Power BI
- Review tools for checking data quality and key profiling metrics like column distribution, empty values, errors, and outliers
- Transform tables using text, numerical and date/time tools, pivot and group records, and create new conditional columns
- Practice combining, modifying, and refreshing queries
Data Modeling
- Understand the basic principles of data modeling, including normalization, fact & dimension tables, and common schemas
- Create table relationships using primary and foreign keys, and discuss different types of relationship cardinality
- Configure report filters and trace filter context as it flows between related tables in the model
- Explore data modeling options like hierarchies, data categories, and hidden fields
DAX
- Introduce DAX fundamentals and learn when to use calculated columns and measures
- Understand the difference between row context and filter context, and how they impact DAX calculations
- Learn DAX formula syntax, basic operators and common function categories (math, logical, text, date/time, filter, etc.)
- Explore nested functions, and more complex topics like iterators and time intelligence patterns
Data Visualization
- Review frameworks and best practices for visualizing data and designing effective reports and dashboards
- Explore tools and techniques for inserting, formatting and filtering visuals in the Power BI Report view
- Add interactivity using tools like bookmarks, slicer panels, parameters, tooltips, and report navigation
- Learn how to configure row-level security with user roles
- Optimize reports for mobile viewing using custom layouts
Introduction to Spatial Analysis
- Understanding Spatial Data: Definition, significance, and examples of spatial data.
- Spatial Data Types: Differentiating between vector and raster data.
- Applications of Spatial Analysis: Overview of how spatial analysis is used in various fields such as environmental science, urban planning, and public health.
- Introduction to GIS: Understanding Geographic Information Systems and their role in spatial analysis.
Introduction to QGIS
- Getting Started with QGIS: Installing QGIS and getting familiar with the interface.
- Basic Operations: Opening and viewing spatial data, navigating the map canvas, and managing layers.
- Data Import and Export: How to import and export different spatial data formats in QGIS.
Creating Shapefiles (Point, Line, and Polygon)
- Understanding Shapefiles: The structure and components of shapefiles.
- Creating New Shapefiles: Step-by-step guide to creating point, line, and polygon shapefiles.
- Editing Shapefiles: Adding, modifying, and deleting features in shapefiles.
Learning Basic Cartography
- Map Elements: Understanding essential map elements (title, legend, scale, north arrow).
- Design Principles: Color theory, symbolization, and layout design for clear and effective map-making.
- Creating Maps: Practical exercises to design maps using QGIS.
Learning Basic Vector Analysis Tools
- Spatial Queries: Selecting features based on spatial relationships (e.g., proximity, intersection).
- Buffer Analysis: Creating buffers around features and understanding their applications.
- Overlay Analysis: Performing spatial operations like intersect, union, and difference.
Creating Heatmaps
- Understanding Heatmaps: Concept and applications of heatmaps in representing data density.
- Generating Heatmaps in QGIS: Step-by-step guide to creating heatmaps from point data.
- Customization: Adjusting parameters to refine the heatmap visualization.
Using Google Maps to Create Maps in QGIS
- Integrating Google Maps: Methods to use Google Maps as basemaps in QGIS projects.
- Georeferencing: Aligning external map images with spatial data using georeferencing tools.
- Practical Exercise: Creating a map project in QGIS with Google Maps as the base layer.
Introduction to Raster Data
- Understanding Raster Data: Definition, structure, and examples of raster data.
- Raster Analysis Tools: Introduction to basic raster operations (e.g., map algebra, reclassification).
- Working with Satellite Imagery: Basics of accessing and analyzing satellite imagery in QGIS.
Real-world Examples of GIS in Various Industries and Fields
- Environmental Management: Use cases of GIS in conservation, pollution monitoring, and resource management.
- Urban Planning: Applications in zoning, infrastructure development, and traffic management.
- Public Health: Spatial analysis in epidemiology, access to healthcare facilities, and health outcome mapping.
- Agriculture: Precision farming, crop yield analysis, and soil mapping.
Introduction to Business Intelligence
- Concepts and Importance of BI: Understanding BI, its components, and its importance in decision-making.
- BI vs. Data Science: Differentiating BI from Data Science and understanding their intersection.
- Overview of BI Tools: Introduction to popular BI tools and technologies (e.g., Tableau, Power BI).
Data Warehousing and ETL Processes
- Data Warehousing Concepts: Understanding data warehousing, data marts, and the role they play in BI.
- ETL Processes: Introduction to Extract, Transform, Load (ETL) processes and tools.
- Hands-on ETL Project: Practical exercise involving data extraction, data cleansing, transformation, and loading into a data warehouse.
SQL for Business Intelligence
- Advanced SQL Queries: Writing complex SQL queries for data analysis and reporting.
- SQL for Data Manipulation: Techniques for data manipulation and preparation for BI applications.
- Hands-on SQL Project: Using SQL to solve a business problem and prepare data for analysis.
Data Visualization and Dashboarding
- Principles of Data Visualization: Best practices for designing effective and informative visualizations.
- Introduction to Tableau/Power BI: Getting started with a BI tool, setting up, and basic functionalities.
- Dashboard Creation: Designing and developing interactive dashboards for business reporting.
Analytical Reporting and Decision Making
- Creating Reports: Techniques for creating comprehensive and insightful reports.
- Storytelling with Data: Learning how to tell a compelling story using data visualizations and reports.
- Decision Making with BI: Understanding how to use BI reports and dashboards to make informed business decisions.
Advanced BI Tools and Techniques
- Predictive Analytics in BI: Introduction to incorporating predictive analytics into BI for forecasting.
- Real-Time BI: Understanding real-time BI and analytics for dynamic decision-making.
- Hands-on Project with Advanced Tools: Applying predictive analytics and real-time data in BI projects.
Implementing BI Solutions
- BI Strategy and Implementation: Planning and executing a BI project from start to finish.
- Managing BI Projects: Best practices for managing BI projects and ensuring their success.
- Case Study: Analysis of a successful BI implementation in a business.
Business Intelligence in Practice
- Industry-Specific BI Applications: Exploring how BI is used in different industries such as finance, healthcare, retail, and more.
- Emerging Trends in BI: Discussion on AI, machine learning in BI, and future directions.
- Group Project: Developing a BI solution for a real-world business problem.
Installation
- Introduction to the Python programming language and its applications
- Setting up the Python environment: installation of Python and necessary libraries
- Configuring the development environment: IDEs, text editors, and Jupyter Notebook
Python Basics
- Introduction to Python: history, features, and advantages
- Expressions and operators: arithmetic, assignment, comparison, and logical
- Understanding the type() function and dynamic typing
- Introduction to data structures: lists, tuples, and dictionaries
Python Basics, Continued
- Recap of Python basics
- Working with arithmetic operators: addition, subtraction, multiplication, division, modulus, and exponentiation
- Using comparison operators: equal to, not equal to, greater than, less than, etc.
- Logical operators: and, or, and not
- Exploring more data types: sets and string manipulation
Expressions, Conditional Statements & For Loop
- Evaluating expressions: operator precedence and associativity
- Introduction to conditional statements: if, elif, and else
- Executing code based on conditionals.
- Understanding the flow of control in conditional statements
- Iteration with the for loop: range() and iterating over lists and strings
While loop, Break and Continue Statements, and Nested Loops
- Working with while loop: syntax, conditions, and examples
- Combining loops and conditionals
- Using the break statement to exit loops prematurely.
- Utilizing the continue statement to skip iterations.
- Implementing nested loops for complex iterations
Functions
- Introduction to functions: purpose, advantages, and best practices
- Defining and calling user-defined functions
- Parameters and arguments: positional, keyword, and default values
- Return statement and function output.
- Variable scope and lifetime
- Function documentation and code readability
Exception Handling and File Handling
- Understanding exceptions: errors, exceptions, and exception hierarchy
- Handling exceptions using try-except blocks: handling specific exceptions, multiple exceptions, and else and finally clauses.
- Raising exceptions and creating custom exception classes
- File handling in Python: opening, reading, writing, and closing files.
- Working with different file modes and file objects
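The exception-handling and file-handling topics above fit together in one small sketch. It shows a custom exception class, specific-before-general except clauses, else and finally, and reading and writing a file with `with`. The exception name `NegativeAmountError`, the helpers `parse_amount` and `load_amounts`, and the sample file contents are hypothetical.

```python
import os
import tempfile


class NegativeAmountError(ValueError):
    """Hypothetical custom exception for amounts that must not be negative."""


def parse_amount(text):
    """Convert text to a float; raise NegativeAmountError for negative values."""
    value = float(text)  # raises ValueError for non-numeric text
    if value < 0:
        raise NegativeAmountError(f"negative amount: {value}")
    return value


def load_amounts(path):
    """Read one amount per line, counting lines that fail to parse."""
    amounts = []
    problems = {"negative": 0, "invalid": 0}
    with open(path, "r", encoding="utf-8") as f:  # "with" closes the file automatically
        for line in f:
            try:
                value = parse_amount(line.strip())
            except NegativeAmountError:
                # The more specific subclass must be caught before ValueError
                problems["negative"] += 1
            except ValueError:
                problems["invalid"] += 1
            else:
                # Runs only when the try block raised no exception
                amounts.append(value)
    return amounts, problems


# Write a sample file in "w" (write) mode, read it back, then clean up
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("10\n-5\nabc\n2.5\n")
    amounts, problems = load_amounts(path)
finally:
    # Runs whether or not an exception occurred above
    os.remove(path)
```

Because `NegativeAmountError` subclasses `ValueError`, listing `except ValueError` first would swallow the custom exception. That is why the more specific clause comes first.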
Python Modules: NumPy and Matplotlib
- Introduction to the NumPy module: features and applications
- Working with multidimensional arrays: creation, indexing, slicing, and reshaping
- Performing element-wise operations: arithmetic, logical, and statistical
- Overview of the Matplotlib module: data visualization and plotting
- Customizing plots: line properties, markers, colors, labels, and legends
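The NumPy half of this module can be sketched as follows. The sketch covers creating, reshaping, indexing, and slicing arrays, plus element-wise and statistical operations. Plotting with Matplotlib is left out, to keep the example dependent on NumPy alone.

```python
import numpy as np

# Create the values 0..11 and reshape them into a 3 x 4 two-dimensional array
a = np.arange(12).reshape(3, 4)

# Indexing and slicing
row_1 = a[1]               # second row: [4, 5, 6, 7]
last_two_cols = a[:, -2:]  # every row, last two columns (shape 3 x 2)

# Element-wise arithmetic and logical operations
doubled = a * 2
is_even = a % 2 == 0       # boolean array with the same shape as a

# Statistical operations along an axis
col_means = a.mean(axis=0)  # mean of each column
row_sums = a.sum(axis=1)    # sum of each row

# Boolean indexing selects matching elements into a flat 1-D array
evens = a[is_even]
```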
Mathematics for Data Science
- Vectors and Matrices: Definition, addition, scalar multiplication.
- Matrix Multiplication: Concept and application in data transformation and neural networks.
- Eigenvalues and Eigenvectors: Importance in dimensionality reduction techniques like PCA
- Derivatives and Gradients: Understanding how changes in input affect changes in output.
- Partial Derivatives and the Gradient Vector: Application in gradient descent and optimization.
- Introduction to Optimization: Concept of loss functions and how gradients are used to minimize them.
- Probability Theory: Basics and conditional probability.
- Bayes' Theorem: Importance in machine learning for making predictions.
- Descriptive Statistics: Mean, median, mode, variance, and standard deviation.
- Distributions: Normal distribution and its significance in machine learning.
- Beyond Gradient Descent: Introduction to stochastic gradient descent and mini-batch gradient descent.
- Regularization Techniques: L1 (Lasso) and L2 (Ridge) regularization to prevent overfitting.
- Introduction to Convex Optimization: Understanding convex functions and their relevance to machine learning optimization.
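Several ideas from this module meet in one small example: a loss function, its partial derivatives, gradient descent, and L2 (ridge) regularization. This is a sketch under stated assumptions. It fits a single-feature linear model y = w * x + b with full-batch gradient descent, and the learning rate, epoch count, and penalty strength are hypothetical choices. Stochastic and mini-batch variants would compute the same gradients on random subsets of the data at each step.

```python
def ridge_gradient_descent(xs, ys, lam=0.0, lr=0.05, epochs=2000):
    """Fit y = w * x + b approximately, by full-batch gradient descent.

    Loss: L(w, b) = (1/n) * sum((w*x_i + b - y_i)**2) + lam * w**2
    The L2 (ridge) penalty lam * w**2 is applied to the slope only.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the loss with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs)) + 2 * lam * w
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # exactly y = 2x + 1

w_plain, b_plain = ridge_gradient_descent(xs, ys)           # no penalty
w_ridge, b_ridge = ridge_gradient_descent(xs, ys, lam=1.0)  # penalized slope
```

Without the penalty, the fit recovers w = 2 and b = 1. With lam=1.0, setting both partial derivatives to zero gives w = 4/3 and b = 7/3 for this data, which shows how the L2 term shrinks the slope toward zero. The squared-error loss here is convex, which is why gradient descent with a small enough step size reaches the single minimum.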
Introduction and Missing Value Analysis
- Introduction to Exploratory Data Analysis (EDA)
- Importance of EDA in data analysis
- Steps involved in EDA
- Handling missing values: identification, analysis, and treatment strategies
- Imputation techniques for missing values
Data Consistency, Binning, and Outlier Analysis
- Data consistency checks using fuzzy matching
- Binning and discretization techniques for continuous variables
- Outlier detection and analysis methods
- Handling outliers: techniques for treatment or removal
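One common outlier-detection method is the interquartile-range (IQR) rule. This sketch applies it and then shows two treatments, capping at the fences and removal. The readings are hypothetical, and k = 1.5 is a conventional default rather than a universal rule. The code uses `statistics.quantiles`, which requires Python 3.8 or newer.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the interquartile-range (IQR) rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def cap_outliers(values, k=1.5):
    """Treat outliers by capping them at the fences instead of dropping them."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # hypothetical readings
outliers = find_outliers(data)                    # detection
capped = cap_outliers(data)                       # treatment by capping
removed = [v for v in data if v not in outliers]  # treatment by removal
```

Capping keeps the row count unchanged, which matters when other columns in the same rows are still useful. Removal is simpler, but it discards those rows entirely.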
Feature Selection and Data Wrangling
- Importance of feature selection in EDA
- Feature selection techniques: filter methods, wrapper methods, and embedded methods
- Data wrangling: cleaning and transforming data for analysis
- Handling categorical variables: encoding techniques
Inference, Hypothesis Testing, and Visualization
- Inference and hypothesis testing in EDA
- Common statistical tests: t-test, chi-square test, ANOVA, etc.
- Visualization techniques for EDA: histograms, box plots, scatter plots, etc.
- Hands-on practical session for complete EDA using a dataset
Machine Learning Performance Metrics and Naive Bayes
- Evaluation metrics for classification problems: accuracy, precision, recall, F1 score, etc.
- Introduction to Naive Bayes algorithm and its applications
- Implementing Naive Bayes for classification tasks
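The classification metrics listed above all come from counts of true positives, false positives, and false negatives. This sketch computes them for a binary problem. The function name and the label lists are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]  # hypothetical model predictions
metrics = classification_metrics(y_true, y_pred)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75. The same function can score a Naive Bayes classifier's predictions.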
Logistic Regression, SVM, Decision Trees, and Random Forests
- Logistic Regression: theory, interpretation, and applications
- Support Vector Machines (SVM): concepts, kernels, and use cases
- Decision Trees: construction, pruning, and interpretability
- Random Forests: ensemble learning and feature importance
- Bagging and Boosting: techniques for improving model performance
Clustering Introduction, Partitioning Algorithms, and Cluster Evaluation
- Introduction to clustering: an unsupervised learning technique
- Partitioning algorithms: K-means, K-medoids
- Hierarchical clustering: agglomerative and divisive approaches
- Density-based clustering: DBSCAN, OPTICS
- Cluster evaluation metrics: silhouette coefficient, Davies-Bouldin index
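K-means, the first partitioning algorithm above, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. This NumPy sketch implements that loop (Lloyd's algorithm) and reports inertia, the within-cluster sum of squared distances, as a simple measure of cluster compactness. The two-blob data set and the function names are hypothetical. Library implementations add smarter initialization and multiple restarts.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    labels = assign(X, centroids)  # final labels consistent with final centroids
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, inertia = kmeans(X, k=2)
```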
Regression and Evaluation of Regression Methods
- Introduction to regression analysis
- Linear regression: assumptions, interpretation, and model evaluation
- Evaluation metrics for regression: mean squared error, R-squared, etc.
- Other regression methods: polynomial regression, ridge regression, lasso regression.
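The regression evaluation metrics named above take only a few lines each. This sketch computes MSE, MAE, and R-squared from hypothetical true and predicted values. R-squared compares the model's squared error with that of always predicting the mean.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3, 5, 7, 9]           # hypothetical observed values
y_pred = [2.5, 5.0, 7.5, 9.0]   # hypothetical model predictions
scores = {"mse": mse(y_true, y_pred), "mae": mae(y_true, y_pred), "r2": r_squared(y_true, y_pred)}
```

Here the residual sum of squares is 0.5 and the total sum of squares is 20, so R-squared is 0.975.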
Introduction to NLP and Text Normalization
- Overview of Natural Language Processing (NLP)
- Techniques for text normalization: lowercasing, punctuation removal, etc.
Text Representation and Tokenization
- Introduction to text vectors in NLP: Bag of Words and CountVectorizer
- Basics of tokenization and stopword removal
Stemming, Lemmatization, and N-gram Language Models
- Understanding and applying stemming and lemmatization
- Introduction to N-gram language models
Markov Models and Language Model Evaluation
- Basics of Markov models in NLP
- Techniques for evaluating language models: probability smoothing and performance metrics
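The N-gram, smoothing, and evaluation topics can be tied together with a bigram language model. This sketch uses add-one (Laplace) smoothing and is evaluated with perplexity, one standard language-model metric. The toy corpus and function names are hypothetical. The code also assumes every evaluated word appears in the training vocabulary, because unknown-word handling is omitted.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their contexts, with sentence start/end markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # every token that can be predicted, including the end marker
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        vocab.update(tokens[1:-1])
        contexts.update(tokens[:-1])                  # tokens that precede another token
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous, next) pairs
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(sentence, contexts, bigrams, vocab):
    """exp(-average log-probability per predicted token); lower is better."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(p, w, contexts, bigrams, vocab))
        for p, w in zip(tokens[:-1], tokens[1:])
    )
    return math.exp(-log_prob / (len(tokens) - 1))


corpus = ["the cat sat", "the dog sat"]  # hypothetical toy corpus
contexts, bigrams, vocab = train_bigram(corpus)
```

Smoothing gives the unseen bigram ("cat", "the") a small nonzero probability, so a scrambled sentence gets a finite but higher perplexity than a sentence seen in training.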
Text Classification Fundamentals
- Overview of Text Classification
- Introduction to Naive Bayes and Sentiment Classification
Advanced Classifiers and Vector Semantics
- Generative vs. discriminative classifiers
- Understanding vector semantics and embeddings: TF-IDF theory and vector similarity
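TF-IDF weighting and cosine similarity can be shown in a short standard-library sketch. It uses one simple variant: term frequency normalized by document length, multiplied by log(N / document frequency). Libraries often use smoothed variants, so their numbers will differ. The documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] / len(toks) * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
vocab, vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # share "the", "sat", "on"
sim_02 = cosine(vectors[0], vectors[2])  # no shared words
```

Exact word matching treats "cat" and "cats" as unrelated, so the first and third documents score 0. This is one of the limitations that motivates dense embeddings such as Word2Vec and GloVe.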
Neural Word Embeddings and Sequence Models
- Introduction to neural word embeddings: Word2Vec and GloVe
- Modeling word sequences in NLP tasks
Transformers and Large Language Models
- Overview of transformers and their impact on NLP
- Introduction to large language models (LLMs) and their applications.
Introduction to Deep Learning
- Overview of deep learning, its importance in computer vision, key concepts, and architectures.
- Code-along session: building a deep neural network from scratch
Deep Learning Hyperparameter Tuning
- Strategies for optimizing hyperparameters like learning rate, batch size, and regularization to improve model performance.
Introduction to Convolutional Neural Networks (CNNs)
- Explanation of CNNs, their architecture, and their role in image processing.
- Code-along session on convolutional neural networks
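The core operation of a convolutional layer is a small kernel slid across an image. At each position the layer multiplies the overlapping values and sums them, then usually applies a nonlinearity such as ReLU. This NumPy sketch shows that operation with no padding and stride 1. The tiny image and the hand-picked vertical-edge kernel are hypothetical; in a real CNN the kernel weights are learned during training. Like most deep learning frameworks, the sketch computes cross-correlation, meaning the kernel is not flipped.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation, as in CNN layers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Rectified linear unit: negative responses become 0."""
    return np.maximum(x, 0)


# A 5 x 5 image with a vertical edge: dark left columns, bright right columns
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hand-picked vertical-edge kernel (a CNN would learn its kernels instead)
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)
feature_map = relu(conv2d(image, kernel))
```

The 3 x 3 feature map responds strongly where the kernel overlaps the dark-to-bright transition and gives 0 over the uniform bright region. This is the kind of local pattern detector that stacked CNN layers build on.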
Building Custom Image Classification Models
- Step-by-step guide to creating and training a custom image classifier using a CNN.
Transfer Learning and Introduction to Object Detection
- Introduction to transfer learning, its applications, and an overview of object detection techniques.
Hands-on with YOLO Object Detection
- Practical session on using the YOLO (You Only Look Once) algorithm for object detection.
Custom Training a YOLO Model
- Detailed guidance on training a YOLO model with a custom dataset for specific object detection tasks.
Using State-of-the-Art Models for Real-World Applications
- Exploring and implementing advanced models in computer vision for practical use cases.
Introduction to OpenCV
- Introduction to OpenCV, its modules, and its importance in computer vision tasks.
Image Pre-processing and Pre-built Algorithms in OpenCV
- Hands-on session on image pre-processing techniques and using built-in algorithms in OpenCV.
Advanced Guided Project with OpenCV
- Capstone project where participants apply learned techniques in a guided project using OpenCV.
Introduction to MLOps and AI/NLP Fundamentals
- Overview of MLOps and its importance in the AI lifecycle
- Current trends in AI
- Setting up the development environment
Deep Dive into Machine Learning Models for NLP
- Understanding NLP models (Llama 2, GPT, Mistral, etc.)
- Introduction to Hugging Face Transformers and Datasets
- Hands-on: Building a simple NLP model with Hugging Face
Introduction to FastAPI for ML Model Deployment
- Basics of API development with FastAPI
- Deploying a simple ML model with FastAPI
- Hands-on: Creating your first ML API with FastAPI
Advanced FastAPI Features for Production-Ready APIs
- Authentication and authorization in FastAPI
- Hands-on: Enhancing your ML API with advanced features
Introduction to Docker for AI Applications
- Basics of Docker and containerization
- Building Docker images for AI/ML applications
- Hands-on: Containerizing your FastAPI application
Leveraging LangChain and LangSmith for Enhanced NLP Applications
- Introduction to LangChain and its ecosystem
- Overview of LangSmith for debugging, testing, evaluating, and monitoring LLM applications
- Hands-on: Integrating LangChain with your NLP models and using LangSmith for enhanced capabilities
Advanced Model Deployment with Hugging Face and LangChain
- Integrating Hugging Face models for advanced NLP capabilities
- Exploring LangChain for building complex NLP applications
- Hands-on: Deploying a Hugging Face model via FastAPI with LangSmith integration
Deploying ML Models on Google Cloud
- Overview of Google Cloud Platform (GCP) for ML
- Introduction to Google Cloud Run
- Hands-on: Deploying your Dockerized FastAPI application on GCP with LangSmith monitoring.
Email Writing
- Basics of Professional Email Communication: Structure, tone, and etiquette.
- Writing Effective Subject Lines: Techniques to ensure your emails are opened.
- Emails for Networking: Approaching professionals and mentors in data science/AI.
- Follow-up Emails: Strategies for following up without being intrusive.
Report Writing + Presentations
- Structure of a Data Science Report: Elements including abstract, methodology, results, and conclusion.
- Visualizing Data: Incorporating charts, graphs, and other visual tools to enhance comprehension.
- Creating Engaging Presentations: Tips for PowerPoint, storytelling, and engaging your audience.
- Presentation Skills: Delivering your message confidently, handling Q&A sessions.
LinkedIn Optimization
- Building a Professional Profile: Key components of a LinkedIn profile for data science/AI professionals.
- Networking Strategies: Connecting with industry professionals and joining relevant groups.
- Content Sharing and Creation: Establishing thought leadership by sharing insights, articles, and engaging with community content.
Resume/CV Writing
- Tailoring Your Resume for Data Science/AI: Highlighting relevant skills, projects, and experiences.
- Action Verbs and Quantifiable Achievements: Demonstrating impact in previous roles or projects.
- Design and Layout: Making your resume/CV visually appealing and easy to read.
Cover Letter
- Structure of a Cover Letter: Introduction, body, and closing.
- Customizing Your Message: Researching the company and role to personalize content.
- Highlighting Fit and Value: Articulating how your skills and experiences align with the job requirements.
Freelancing
- Getting Started with Freelancing: Platforms for data science/AI freelancers, setting up a profile.
- Finding Projects and Clients: Strategies to secure freelance work and build a portfolio.
- Pricing Your Services: Understanding market rates and value-based pricing.
- Client Management: Communicating effectively and managing expectations.
Kaggle for Data Science
- Introduction to Kaggle: Overview of the platform, competitions, datasets, and notebooks.
- Participating in Competitions: Tips for success, collaboration, and learning from the community.
- Building a Portfolio: Using Kaggle to showcase your skills and projects to potential employers.
GitHub
- Why GitHub for Data Scientists: Importance of version control and code sharing.
- Creating and Managing Repositories: Best practices for organizing and documenting projects.
- Collaborating on Projects: Contributing to open-source projects and collaborating with others.
- GitHub as a Portfolio: Presenting your work and contributions to potential employers.
How to Crack Data Science Interviews
- Understanding the Interview Process: Types of interviews (technical, behavioral, case studies).
- Preparing for Technical Interviews: Common questions, coding challenges, and statistical questions.
- Behavioral Interview Preparation: Crafting your story, STAR method for responses.
- Mock Interviews: Practicing with peers or mentors to gain confidence.
Global Market Understanding
- Data Science/AI Trends: Understanding global trends and emerging technologies.
- Cultural Competence: Working in multicultural teams and serving diverse user bases.
- Regulatory Environment: Overview of data privacy laws and ethical considerations in different regions.
AI Product Development
- From Idea to Product: Ideation, validation, and development processes.
- User-Centric Design: Incorporating user feedback and UX/UI principles.
- Product Management for AI: Unique challenges in managing AI projects, iteration, and deployment.
- Metrics and Performance: Evaluating the success and impact of AI products.
Storytelling Using Data
- Principles of Data Storytelling: Crafting narratives that resonate with your audience.
- Visual Narrative Techniques: Using data visualizations effectively in your story.
- Engaging Presentations: Combining data, visuals, and narrative for impactful presentations.
Intro to Data Commons
- Understanding Data Commons: Concept, importance, and examples.
- Accessing and Contributing to Data Commons: Guidelines and best practices.
- Leveraging Data Commons: How data scientists can use these resources for research and development.