Python for Data Science and Machine Learning Bootcamp

Instructor: Jose Marcial Portilla
Platform: Udemy
Duration: 25 hours, 165 videos
Level: Beginner to Intermediate
Topics: Python (numpy, pandas, matplotlib, seaborn, scikit-learn), SQL, Data Analysis, Machine Learning, Big Data, PySpark, AWS EC2.

Overview

This intensive bootcamp covers SQL, data analysis, data visualization, machine learning, and big data tools such as Spark. It progresses from introductory Python and data manipulation with NumPy and Pandas to machine learning with Scikit-learn, neural networks, and big data applications. The course is designed to build foundational skills, work through real-world data problems, and give hands-on experience with popular data science tools.

Course Content

Sections 1-4: Setup and Introduction

Topics: Installing Anaconda, setting up Jupyter, and a Python crash course.
Objective: Set up a robust data analysis environment and gain an introduction to Python basics.

Section 5: NumPy Essentials

Topics: Arrays, indexing, operations, and exercises to develop core NumPy skills.
Objective: Handle numerical data efficiently and prepare for data manipulation with Pandas.
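
The array topics in this section can be illustrated with a short sketch. The array values and variable names below are made up for illustration and are not taken from the course notebooks.

```python
import numpy as np

# A small 2-D array: 3 rows by 4 columns holding the values 0..11.
arr = np.arange(12).reshape(3, 4)

# Indexing and slicing: first row, last column, and the top-left 2x2 block.
first_row = arr[0]
last_col = arr[:, -1]
block = arr[:2, :2]

# Boolean masking keeps only the elements that satisfy a condition.
evens = arr[arr % 2 == 0]

# Element-wise operations and aggregations run without explicit Python loops.
doubled = arr * 2
col_means = arr.mean(axis=0)

print(first_row, last_col, block, evens, doubled, col_means, sep="\n")
```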

Sections 6-7: Data Manipulation with Pandas

Topics: DataFrames, handling missing data, groupby, merging, concatenation, operations, input/output, and exercises.
Objective: Learn the full scope of data wrangling techniques needed for real-world data analysis.
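
A minimal sketch of three of the topics above (missing data, groupby, and merging), assuming two tiny hypothetical tables. The column names, store labels, and city names are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "amount": [100.0, np.nan, 80.0, 120.0],
})
# Hypothetical lookup table used for the merge.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Lisbon", "Porto"]})

# Missing data: fill the gap with the mean of the observed amounts.
sales["amount"] = sales["amount"].fillna(sales["amount"].mean())

# groupby: total amount per store.
totals = sales.groupby("store", as_index=False)["amount"].sum()

# merge: attach the city to each store total.
report = totals.merge(stores, on="store", how="left")
print(report)
```

Filling with the mean is only one option; dropping rows with `dropna()` or filling with a fixed value may suit other datasets better.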

Sections 8-12: Data Visualization Techniques

Matplotlib: Basic plotting and exercises.
Seaborn: In-depth exploration of distribution plots, regression plots, matrix plots, and more.
Pandas Built-in: Quick data visualizations using Pandas.
Plotly & Cufflinks: Interactive visualizations for dashboards or websites.
Geographical Plotting: Using maps for data visualization with examples (e.g., 2014 World Power Consumption, 2012 Election Data).
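
As a rough illustration of basic Matplotlib plotting, the sketch below draws two line plots side by side using the object-oriented API. The data is synthetic, and the `Agg` backend line is an assumption that lets the snippet run as a plain script without a display; inside Jupyter it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only.
x = np.linspace(0, 5, 11)
y = x ** 2

# One figure with two axes, side by side.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot(x, y, color="tab:blue", label="y = x^2")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].set_title("Quadratic")
axes[0].legend()

axes[1].plot(y, x, color="tab:red", linestyle="--")
axes[1].set_xlabel("y")
axes[1].set_ylabel("x")
axes[1].set_title("Inverse view")

# Avoid overlapping labels between the two panels.
fig.tight_layout()
```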

Section 13: Data Analysis Capstone Projects

911 Calls Project: Exploring call distribution by type and creating heatmaps.
Finance Project: Analyzing stock prices and exploring financial crisis impacts.
Objective: Practice exploratory data analysis (EDA) with complex datasets and tackle real-life questions.
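
The kind of aggregation behind a call heatmap can be sketched as follows. The five rows and the column names `timeStamp` and `reason` are hypothetical stand-ins, not the real 911 dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the 911 calls data.
calls = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2016-01-04 08:15", "2016-01-04 08:40", "2016-01-04 17:05",
        "2016-01-05 08:30", "2016-01-06 17:45",
    ]),
    "reason": ["EMS", "Traffic", "EMS", "Fire", "Traffic"],
})

# Derive the hour and weekday name from the timestamp.
calls["hour"] = calls["timeStamp"].dt.hour
calls["day"] = calls["timeStamp"].dt.day_name()

# Call distribution by type.
by_reason = calls["reason"].value_counts()

# Day-by-hour matrix of call counts: the table a heatmap would display.
day_hour = calls.groupby(["day", "hour"]).size().unstack(fill_value=0)
print(by_reason)
print(day_hour)
```

A table shaped like `day_hour` can be passed to `seaborn.heatmap` to render the heatmap itself.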

Sections 14-17: Machine Learning Basics

Introduction to Machine Learning: Supervised learning and evaluation metrics for classification and regression.
Linear Regression: Theory, exercises, and a project on eCommerce customer behavior.
Cross-validation: Techniques and bias-variance trade-off.
Logistic Regression: Theory, exercises, and an advertising data prediction project.
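
A minimal sketch of a linear regression workflow with a train/test split and cross-validation in Scikit-learn. The data is synthetic (generated from known coefficients plus noise), so it only illustrates the mechanics, not the course's eCommerce project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data: y depends linearly on two features, plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.5, size=200)

# Hold out 30% of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R^2 on the held-out set

# 5-fold cross-validation gives a less split-dependent estimate.
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

print("coefficients:", model.coef_)
print("test R^2:", test_r2)
print("mean CV R^2:", cv_r2.mean())
```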

Sections 18-20: Advanced Machine Learning Techniques

K Nearest Neighbors (KNN): Theory, EDA, standardization, optimal K-values, and a KNN project.
Decision Trees & Random Forests: Project on loan repayment prediction with LendingClub data.
Support Vector Machines (SVM): SVM theory, application with the Iris dataset, and evaluation.
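
Standardization and the search for a good K can be sketched as below. Two assumptions are worth flagging: the Iris dataset is used here for KNN only because it ships with Scikit-learn, and K is chosen by cross-validated accuracy, which may differ from the selection method used in the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Try odd values of K and record the mean 5-fold accuracy for each.
scores = {}
for k in range(1, 16, 2):
    # Scaling inside the pipeline means each fold is standardized
    # using only its own training portion.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("best K:", best_k, "accuracy:", round(scores[best_k], 3))
```

KNN relies on distances, so features on larger scales would otherwise dominate the neighbor search; that is why the scaler sits in front of the classifier.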

Sections 21-24: Unsupervised Learning and NLP

K-Means Clustering: Clustering universities and evaluating results.
Principal Component Analysis (PCA): Dimensionality reduction on the Breast Cancer dataset.
Natural Language Processing (NLP): Classifying Yelp reviews with NLP pipeline.
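
A minimal sketch of PCA on the Breast Cancer dataset bundled with Scikit-learn, reducing its 30 features to 2 components. Scaling before PCA and the choice of two components are assumptions made for the illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()

# Standardize first so that no single feature dominates the variance.
X_scaled = StandardScaler().fit_transform(data.data)

# Project the 30 original features onto the top 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("original shape:", data.data.shape)
print("reduced shape:", X_2d.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The two columns of `X_2d` can then be scatter-plotted and colored by `data.target` to see how well the classes separate.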

Section 25: Neural Networks and Deep Learning

Topics: Introduction to artificial neural networks (ANNs), activation functions, cost functions, backpropagation, TensorFlow, and Keras.
Objective: Build, train, and evaluate neural networks using TensorFlow and Keras.
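
The course builds networks with TensorFlow and Keras. The sketch below deliberately uses plain NumPy instead, only to make the listed concepts (activation function, cost function, backpropagation) visible in a few lines. The layer size, learning rate, and iteration count are arbitrary choices, and the XOR data is a toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets: a toy problem that needs a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    """Activation function squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 4 units, one output unit.
W1 = rng.normal(0, 1, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0, 1, size=(4, 1))
b2 = np.zeros((1, 1))
lr = 0.5
losses = []

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Cost function: mean squared error.
    losses.append(float(np.mean((p - y) ** 2)))

    # Backpropagation: apply the chain rule layer by layer.
    d_p = 2 * (p - y) / len(X)
    d_z2 = d_p * p * (1 - p)          # sigmoid derivative at the output
    d_W2 = h.T @ d_z2
    d_b2 = d_z2.sum(axis=0, keepdims=True)
    d_h = d_z2 @ W2.T
    d_z1 = d_h * h * (1 - h)          # sigmoid derivative at the hidden layer
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= lr * d_W1
    b1 -= lr * d_b1
    W2 -= lr * d_W2
    b2 -= lr * d_b2

print("first loss:", losses[0], "last loss:", losses[-1])
```

Keras hides the backward pass behind `model.fit`, but the same forward pass, cost, and gradient update happen underneath.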

Section 26: Big Data and Spark

Topics: Big Data overview, AWS EC2 setup, SSH from a Mac, PySpark, and RDD transformations and actions.
Objective: Gain hands-on experience with big data technologies and distributed computing.

Course Projects

Real-World Capstone Projects

911 Calls Project: Analyzing and visualizing emergency call data.
Finance Data Project: EDA on stock prices during the financial crisis.
Linear Regression Project: Advising an eCommerce company on mobile app vs. website focus.
Logistic Regression Project: Predicting ad-clicking behavior.
Random Forest Project: Loan repayment prediction with LendingClub data.
NLP Project: Sentiment analysis on Yelp reviews.
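
As an illustration of what an NLP classification pipeline can look like in Scikit-learn, the sketch below chains a bag-of-words step, TF-IDF weighting, and a Naive Bayes classifier. The six reviews and their star labels are invented, and the choice of classifier is an assumption rather than a record of the project's final model.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up reviews with made-up star ratings (5 = positive, 1 = negative).
reviews = [
    "great food and friendly service",
    "amazing pizza, great staff",
    "loved the friendly atmosphere",
    "terrible service and cold food",
    "awful pizza, rude staff",
    "horrible wait and rude service",
]
stars = [5, 5, 5, 1, 1, 1]

pipeline = Pipeline([
    ("bow", CountVectorizer()),         # text -> token counts
    ("tfidf", TfidfTransformer()),      # counts -> TF-IDF weights
    ("classifier", MultinomialNB()),    # weights -> predicted rating
])

pipeline.fit(reviews, stars)
predictions = pipeline.predict(["great friendly staff", "rude and awful"])
print(predictions)
```

Wrapping the steps in a `Pipeline` keeps the vectorizer fitted on training text only, so new reviews are transformed with the same vocabulary.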

Learning Objectives

Set up an environment for data analysis and machine learning.
Master data wrangling with Pandas and data visualization with Matplotlib, Seaborn, and Plotly.
Gain foundational machine learning skills with Scikit-learn, including supervised and unsupervised algorithms.
Develop practical experience with neural networks using TensorFlow and Keras.
Understand big data basics and work with Spark.

Key Tools & Libraries

Python libraries: NumPy, Pandas, Matplotlib, Seaborn, Plotly, Scikit-learn, TensorFlow, Keras
Big Data: PySpark, AWS EC2, SSH setup
Additional: Cufflinks, which connects Plotly to Pandas DataFrames for interactive visualizations

Problems and Solutions Faced

The bootcamp's exercises and projects raised several practical challenges, each listed here with the approach used to address it:

Data Quality: Handling missing values and inconsistent formats in large datasets.
Computational Limitations: Using PySpark and AWS EC2 instances for large-scale data.
Model Complexity: Employing cross-validation and hyperparameter tuning to enhance model accuracy and minimize overfitting.
Big Data Challenges: Acquiring skills for distributed data processing with Spark.
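
Cross-validated hyperparameter tuning, mentioned under Model Complexity, can be sketched with `GridSearchCV`. The dataset (Iris), the decision tree model, and the parameter grid are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate settings; shallower trees and larger leaves limit overfitting.
param_grid = {
    "max_depth": [1, 2, 3, None],
    "min_samples_leaf": [1, 5],
}

# Every combination is scored with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Because each setting is judged on held-out folds rather than on the training data, a model that merely memorizes the training rows does not win the search.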

Repository Contents

Folders:
Big-data-and-spark
Data-capstone-projects
Decision-trees-and-random-forest
Geographical-plotting
K-means-clustering
K-nearest-neighbors
Linear-regression
Logistic-regression
Natural-language-processing
Pandas
Recomender-systems
TensorFlow

Files:
Matplotlib Advanced Concepts.ipynb
Matplotlib Exercises.ipynb
Numpy Exercises.ipynb
Plotly and Cufflinks.ipynb
Principal Component Analysis.ipynb
Python Crash Course Exercises.ipynb
README.md
Seaborn Exercises.ipynb
Support Vector Machines Project.ipynb


santi-souza/python-datascience-machinelearning
