From 45982a86807115ecbc9856ab850ca42cd3334622 Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Mon, 6 Jan 2025 20:51:43 +0530 Subject: [PATCH 1/7] Create Heart-Disease-detection-using-ML --- .../projects/Heart-Disease-detection-using-ML | 76 +++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 docs/ML/projects/Heart-Disease-detection-using-ML diff --git a/docs/ML/projects/Heart-Disease-detection-using-ML b/docs/ML/projects/Heart-Disease-detection-using-ML new file mode 100644 index 00000000..659b5cbd --- /dev/null +++ b/docs/ML/projects/Heart-Disease-detection-using-ML @@ -0,0 +1,76 @@ +# Heart-Disease-Detection-using-ML + +## Overview +Heart disease is one of the leading causes of death worldwide. Early detection of heart disease can significantly improve patient outcomes by enabling timely intervention. This project leverages machine learning techniques to predict the likelihood of heart disease based on various medical attributes and patient data. + +## Features +- Preprocessing of medical data +- Multiple machine learning models for classification +- Evaluation metrics for model performance +- Visualization of results + +## Technologies Used +- Python +- Machine Learning libraries: scikit-learn, TensorFlow, or PyTorch +- Data manipulation: pandas, numpy +- Data visualization: matplotlib, seaborn + +## Installation + +1. Clone the repository: + ```bash + git clone https://github.com/yourusername/Heart-Disease-detection-using-ML.git + ``` +2. Navigate to the project directory: + ```bash + cd Heart-Disease-detection-using-ML + ``` +3. Install required dependencies: + ```bash + pip install -r requirements.txt + ``` + +## Dataset +This project uses a publicly available heart disease dataset from [Kaggle](https://www.kaggle.com/) or [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/45/heart+disease). Ensure you download the dataset and place it in the `data` folder. 
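Once the dataset is in the `data` folder, a quick sanity check can confirm it is readable before any preprocessing. The following is a minimal sketch, not part of the project's scripts: the file name `data/heart.csv` and the helper `summarize_csv` are hypothetical, and it assumes a CSV with a header row. It treats empty cells and `?` as missing, since the raw UCI files mark missing values with `?`.

```python
import csv
from pathlib import Path

# Hypothetical location: the README only says to place the dataset in `data`.
DATA_PATH = Path("data") / "heart.csv"


def summarize_csv(path, missing_markers=("", "?")):
    """Return (column_names, row_count, rows_with_missing) for a CSV with a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        row_count = 0
        rows_with_missing = 0
        for row in reader:
            row_count += 1
            # A cell counts as missing if it is absent (None) or matches a marker.
            if any(
                value is None
                or (isinstance(value, str) and value.strip() in missing_markers)
                for value in row.values()
            ):
                rows_with_missing += 1
    return columns, row_count, rows_with_missing


if __name__ == "__main__":
    if DATA_PATH.exists():
        cols, n_rows, n_missing = summarize_csv(DATA_PATH)
        print(f"{n_rows} rows, {len(cols)} columns, {n_missing} rows with missing values")
    else:
        print(f"Dataset not found at {DATA_PATH}; download it first.")
```

The sketch uses only the standard library so it runs before dependencies are installed; in the project itself the same check would more naturally be done with pandas.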
+ +## Notebook +This is notebook of the following project [Kaggle](https://www.kaggle.com/code/nandagopald2004/notebookc2da436efa) + +## Usage + +1. **Data Preprocessing**: Run the preprocessing script to clean and normalize the dataset. + ```bash + python preprocess.py + ``` +2. **Model Training**: Train the model using the provided script. + ```bash + python train_model.py + ``` +3. **Evaluation**: Evaluate the trained model on the test set. + ```bash + python evaluate_model.py + ``` +4. **Visualization**: Visualize the results and performance metrics. + ```bash + python visualize_results.py + ``` + + +## Contribution +Contributions are welcome! Please follow these steps: +1. Fork the repository. +2. Create a new branch for your feature or bug fix. +3. Commit your changes. +4. Push to your branch. +5. Open a pull request. + +## License +This project is licensed under the MIT License. See the LICENSE file for more details. + +## Acknowledgements +- The dataset providers for making this research possible. +- Open-source libraries and contributors. + +## Contact +For questions or feedback, please contact [NANDA GOPAL.D] at [nandagopalng2004@gmail.com]. 
+ From 4be02aa486b7cb990981d13df46fa270eebed3b3 Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Mon, 6 Jan 2025 20:56:35 +0530 Subject: [PATCH 2/7] Rename Heart-Disease-detection-using-ML to Heart_Disease_detection_using_ML --- ...isease-detection-using-ML => Heart_Disease_detection_using_ML} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/ML/projects/{Heart-Disease-detection-using-ML => Heart_Disease_detection_using_ML} (100%) diff --git a/docs/ML/projects/Heart-Disease-detection-using-ML b/docs/ML/projects/Heart_Disease_detection_using_ML similarity index 100% rename from docs/ML/projects/Heart-Disease-detection-using-ML rename to docs/ML/projects/Heart_Disease_detection_using_ML From 5e0c2e838c068b05d9b336716d6e024ab5d1b65b Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Tue, 7 Jan 2025 21:33:44 +0530 Subject: [PATCH 3/7] Update Heart_Disease_detection_using_ML --- docs/ML/projects/Heart_Disease_detection_using_ML | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ML/projects/Heart_Disease_detection_using_ML b/docs/ML/projects/Heart_Disease_detection_using_ML index 659b5cbd..312ff164 100644 --- a/docs/ML/projects/Heart_Disease_detection_using_ML +++ b/docs/ML/projects/Heart_Disease_detection_using_ML @@ -34,7 +34,7 @@ Heart disease is one of the leading causes of death worldwide. Early detection o This project uses a publicly available heart disease dataset from [Kaggle](https://www.kaggle.com/) or [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/45/heart+disease). Ensure you download the dataset and place it in the `data` folder. 
## Notebook -This is notebook of the following project [Kaggle](https://www.kaggle.com/code/nandagopald2004/notebookc2da436efa) +This is notebook of the following project [Kaggle](https://www.kaggle.com/code/nandagopald2004/heart-disease-detection-using-ml) ## Usage From 37168e69f0742d1b578ff9516054afdb49af253b Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Tue, 7 Jan 2025 22:38:00 +0530 Subject: [PATCH 4/7] Update Heart_Disease_detection_using_ML --- .../projects/Heart_Disease_detection_using_ML | 351 ++++++++++++++---- 1 file changed, 278 insertions(+), 73 deletions(-) diff --git a/docs/ML/projects/Heart_Disease_detection_using_ML b/docs/ML/projects/Heart_Disease_detection_using_ML index 312ff164..c0abb298 100644 --- a/docs/ML/projects/Heart_Disease_detection_using_ML +++ b/docs/ML/projects/Heart_Disease_detection_using_ML @@ -1,76 +1,281 @@ -# Heart-Disease-Detection-using-ML - -## Overview -Heart disease is one of the leading causes of death worldwide. Early detection of heart disease can significantly improve patient outcomes by enabling timely intervention. This project leverages machine learning techniques to predict the likelihood of heart disease based on various medical attributes and patient data. - -## Features -- Preprocessing of medical data -- Multiple machine learning models for classification -- Evaluation metrics for model performance -- Visualization of results - -## Technologies Used -- Python -- Machine Learning libraries: scikit-learn, TensorFlow, or PyTorch -- Data manipulation: pandas, numpy -- Data visualization: matplotlib, seaborn - -## Installation - -1. Clone the repository: - ```bash - git clone https://github.com/yourusername/Heart-Disease-detection-using-ML.git - ``` -2. Navigate to the project directory: - ```bash - cd Heart-Disease-detection-using-ML - ``` -3. 
Install required dependencies: - ```bash - pip install -r requirements.txt - ``` - -## Dataset -This project uses a publicly available heart disease dataset from [Kaggle](https://www.kaggle.com/) or [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/45/heart+disease). Ensure you download the dataset and place it in the `data` folder. - -## Notebook + + + + + +# Project Title +Heart-Disease-Detection-using-ML + +### AIM +The aim of this project is to develop a reliable and efficient machine learning-based system for the early detection and diagnosis of heart disease. By leveraging advanced algorithms, the system seeks to analyze patient data, identify significant patterns, and predict the likelihood of heart disease, thereby assisting healthcare professionals in making informed decisions. + + +### DATASET LINK +This project uses a publicly available heart disease dataset from [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/45/heart+disease) + + +### NOTEBOOK LINK This is notebook of the following project [Kaggle](https://www.kaggle.com/code/nandagopald2004/heart-disease-detection-using-ml) -## Usage - -1. **Data Preprocessing**: Run the preprocessing script to clean and normalize the dataset. - ```bash - python preprocess.py - ``` -2. **Model Training**: Train the model using the provided script. - ```bash - python train_model.py - ``` -3. **Evaluation**: Evaluate the trained model on the test set. - ```bash - python evaluate_model.py - ``` -4. **Visualization**: Visualize the results and performance metrics. - ```bash - python visualize_results.py - ``` - - -## Contribution -Contributions are welcome! Please follow these steps: -1. Fork the repository. -2. Create a new branch for your feature or bug fix. -3. Commit your changes. -4. Push to your branch. -5. Open a pull request. - -## License -This project is licensed under the MIT License. See the LICENSE file for more details. 
- -## Acknowledgements -- The dataset providers for making this research possible. -- Open-source libraries and contributors. - -## Contact -For questions or feedback, please contact [NANDA GOPAL.D] at [nandagopalng2004@gmail.com]. + +### LIBRARIES NEEDED + + + +??? quote "LIBRARIES USED" + + - pandas + - numpy + - scikit-learn + - matplotlib + - seaborn + +--- + +### DESCRIPTION + + + + + +!!! info "What is the requirement of the project?" + + - Write the answer here in simple bullet points. + +??? info "Why is it necessary?" + + - Write the answer here in simple bullet points. + +??? info "How is it beneficial and used?" + + - Write the answer here in simple bullet points. + +??? info "How did you start approaching this project? (Initial thoughts and planning)" + + - Write the answer here in simple bullet points. + +??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)." + + - Write the answer here in simple bullet points. + + +--- + +### EXPLANATION + +#### DETAILS OF THE DIFFERENT FEATURES + + +Age: Patient's age in years. + +Sex: Gender of the patient (1 = male; 0 = female). + +Chest Pain Type (cp): Categorized as: + +0: Typical angina +1: Atypical angina +2: Non-anginal pain +3: Asymptomatic +Resting Blood Pressure (trestbps): Measured in mm Hg upon hospital admission. + +Serum Cholesterol (chol): Measured in mg/dL. + +Fasting Blood Sugar (fbs): Indicates if fasting blood sugar > 120 mg/dL (1 = true; 0 = false). + +Resting Electrocardiographic Results (restecg): + +0: Normal +1: Having ST-T wave abnormality (e.g., T wave inversions and/or ST elevation or depression > 0.05 mV) +2: Showing probable or definite left ventricular hypertrophy by Estes' criteria +Maximum Heart Rate Achieved (thalach): Peak heart rate during exercise. + +Exercise-Induced Angina (exang): Presence of angina induced by exercise (1 = yes; 0 = no). + +Oldpeak: ST depression induced by exercise relative to rest. 
+ +Slope of the Peak Exercise ST Segment (slope): + +0: Upsloping +1: Flat +2: Downsloping +Number of Major Vessels Colored by Fluoroscopy (ca): Ranges from 0 to 3. + +Thalassemia (thal): + +1: Normal +2: Fixed defect +3: Reversible defect +Target: Diagnosis of heart disease (0 = no disease; 1 = disease). + +--- + +#### PROJECT WORKFLOW + +### 1.Problem Definition + +Identify the objective: To predict the presence or absence of heart disease based on patient data. +Define the outcome variable (target) and input features. + +### 2.Data Collection + +Gather a reliable dataset, such as the Cleveland Heart Disease dataset, which includes features relevant to heart disease prediction. + +### 3.Data Preprocessing + +Handle missing values: Fill or remove records with missing data. +Normalize/standardize data to ensure all features have comparable scales. +Encode categorical variables like sex, cp, and thal using techniques like one-hot encoding or label encoding. + +### 4.Exploratory Data Analysis (EDA) + +Visualize data distributions using histograms, boxplots, or density plots. +Identify relationships between features using correlation matrices and scatterplots. +Detect and handle outliers to improve model performance. + +### 5.Feature Selection + +Use statistical methods or feature importance metrics to identify the most relevant features for prediction. +Remove redundant or less significant features. + +### 6.Data Splitting + +Divide the dataset into training, validation, and testing sets (e.g., 70%-15%-15%). +Ensure a balanced distribution of the target variable in all splits. + +### 7.Model Selection + +Experiment with multiple machine learning algorithms such as Logistic Regression, Random Forest, Decision Trees, Support Vector Machines (SVM), and Neural Networks. +Select models based on the complexity and nature of the dataset. + +### 8.Model Training + +Train the chosen models using the training dataset. 
+Tune hyperparameters using grid search or random search techniques. + +### 9.Model Evaluation + +Assess models on validation and testing datasets using metrics such as: +Accuracy +Precision, Recall, and F1-score +Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC). +Compare models to identify the best-performing one. + +### 10.Deployment and Prediction + +Save the trained model using libraries like joblib or pickle. +Develop a user interface (UI) or API for end-users to input data and receive predictions. + +### 11.Iterative Improvement + +Continuously refine the model using new data or advanced algorithms. +Address feedback and optimize the system based on real-world performance. + + + +#### PROJECT TRADE-OFFS AND SOLUTIONS + + +=== "Trade Off 1" + - Accuracy vs. Interpretability + - Complex models like Random Forests or Neural Networks offer higher accuracy but are less interpretable compared to simpler models like Logistic Regression. +=== "Trade Off 2" + - Overfitting vs. Generalization + - Models with high complexity may overfit the training data, leading to poor generalization on unseen data. +--- + + +!!! success "Project workflow" + + ``` mermaid + graph LR + A[Start] --> B{Error?}; + B -->|Yes| C[Hmm...]; + C --> D[Debug]; + D --> B; + B ---->|No| E[Yay!]; + ``` + +??? tip "Visualizations and EDA of different features" + + === "Image Topic" + ![img](images/.png "a title") + +??? example "Model performance graphs" + + === "Image Topic" + ![img](images/.png "a title") + +--- + +### MODELS USED AND THEIR EVALUATION METRICS + + +| Model | Score | +|------------|----------| +| Logistic Regression | 88% | +| K-Nearest Neighbors Classifier | 68% | +| Random Forest Classifier | 86% | + +--- + +### CONCLUSION + +#### KEY LEARNINGS + + +1. Data Insights +Understanding Healthcare Data: Learned how medical attributes (e.g., age, cholesterol, chest pain type) influence heart disease risk. 
+Data Imbalance: Recognized the challenges posed by imbalanced datasets and explored techniques like SMOTE and class weighting to address them. +Importance of Preprocessing: Gained expertise in handling missing values, scaling data, and encoding categorical variables, which are crucial for model performance. + +2. Techniques Mastered +Exploratory Data Analysis (EDA): Applied visualization tools (e.g., histograms, boxplots, heatmaps) to uncover patterns and correlations in data. +Feature Engineering: Identified and prioritized key features using statistical methods and feature importance metrics. +Modeling: Implemented various machine learning algorithms, including Logistic Regression, Random Forest, Gradient Boosting, and Support Vector Machines. +Evaluation Metrics: Learned to evaluate models using metrics like Precision, Recall, F1-score, and ROC-AUC to optimize for healthcare-specific goals. +Hyperparameter Tuning: Used grid search and random search to optimize model parameters and improve performance. +Interpretability Tools: Utilized SHAP and feature importance analysis to explain model predictions. + +3. Skills Developed +Problem-Solving: Addressed trade-offs such as accuracy vs. interpretability, and overfitting vs. generalization. +Critical Thinking: Improved decision-making on model selection, preprocessing methods, and evaluation strategies. +Programming: Strengthened Python programming skills, including the use of libraries like scikit-learn, pandas, matplotlib, and TensorFlow. +Collaboration: Enhanced communication and teamwork when discussing medical insights and technical challenges with domain experts. +Time Management: Balanced experimentation with computational efficiency, focusing on techniques that maximized impact. +Ethical Considerations: Gained awareness of ethical issues like ensuring fairness in predictions and minimizing false negatives, which are critical in healthcare applications. + +4. 
Broader Understanding +Interdisciplinary Knowledge: Combined expertise from data science, healthcare, and statistics to create a meaningful application. +Real-World Challenges: Understood the complexities of translating machine learning models into practical tools for healthcare. +Continuous Learning: Learned that model development is iterative, requiring continuous refinement based on feedback and new data. + +#### USE CASES + + +=== "Application 1" + + **Clinical Decision Support Systems (CDSS)** + + - ML models can be integrated into Electronic Health Record (EHR) systems to assist doctors in diagnosing heart disease. The model can provide predictions based on patient data, helping clinicians make faster and more accurate decisions. +=== "Application 2" + + **Early Screening and Risk Assessment** + + - Patients can undergo routine screening using a heart disease detection system to assess their risk level. The system can predict whether a patient is at high, moderate, or low risk, prompting early interventions or lifestyle changes. From 08c6ebd87a1ac81f3e86718b25196edb2674e8fa Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Wed, 8 Jan 2025 20:44:16 +0530 Subject: [PATCH 5/7] Update Heart_Disease_detection_using_ML i have removed all unnecessary parts of documentation --- .../projects/Heart_Disease_detection_using_ML | 67 ++----------------- 1 file changed, 5 insertions(+), 62 deletions(-) diff --git a/docs/ML/projects/Heart_Disease_detection_using_ML b/docs/ML/projects/Heart_Disease_detection_using_ML index c0abb298..69c65af9 100644 --- a/docs/ML/projects/Heart_Disease_detection_using_ML +++ b/docs/ML/projects/Heart_Disease_detection_using_ML @@ -1,8 +1,3 @@ - - - - - # Project Title Heart-Disease-Detection-using-ML @@ -19,10 +14,7 @@ This is notebook of the following project [Kaggle](https://www.kaggle.com/code/n ### LIBRARIES NEEDED - - -??? 
quote "LIBRARIES USED" - pandas - numpy @@ -33,7 +25,7 @@ This is notebook of the following project [Kaggle](https://www.kaggle.com/code/n --- ### DESCRIPTION - - - - - -!!! info "What is the requirement of the project?" - - - Write the answer here in simple bullet points. - -??? info "Why is it necessary?" - - - Write the answer here in simple bullet points. - -??? info "How is it beneficial and used?" - - - Write the answer here in simple bullet points. - -??? info "How did you start approaching this project? (Initial thoughts and planning)" - - - Write the answer here in simple bullet points. - -??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)." - - - Write the answer here in simple bullet points. - - ---- ### EXPLANATION @@ -126,7 +91,7 @@ Target: Diagnosis of heart disease (0 = no disease; 1 = disease). --- #### PROJECT WORKFLOW - + ### 1.Problem Definition Identify the objective: To predict the presence or absence of heart disease based on patient data. @@ -189,7 +154,7 @@ Address feedback and optimize the system based on real-world performance. #### PROJECT TRADE-OFFS AND SOLUTIONS - + === "Trade Off 1" - Accuracy vs. Interpretability @@ -200,28 +165,6 @@ Address feedback and optimize the system based on real-world performance. --- -!!! success "Project workflow" - - ``` mermaid - graph LR - A[Start] --> B{Error?}; - B -->|Yes| C[Hmm...]; - C --> D[Debug]; - D --> B; - B ---->|No| E[Yay!]; - ``` - -??? tip "Visualizations and EDA of different features" - - === "Image Topic" - ![img](images/.png "a title") - -??? example "Model performance graphs" - - === "Image Topic" - ![img](images/.png "a title") - ---- ### MODELS USED AND THEIR EVALUATION METRICS @@ -237,7 +180,7 @@ Address feedback and optimize the system based on real-world performance. ### CONCLUSION #### KEY LEARNINGS - + 1. 
Data Insights Understanding Healthcare Data: Learned how medical attributes (e.g., age, cholesterol, chest pain type) influence heart disease risk. @@ -266,7 +209,7 @@ Real-World Challenges: Understood the complexities of translating machine learni Continuous Learning: Learned that model development is iterative, requiring continuous refinement based on feedback and new data. #### USE CASES - + === "Application 1" From f7812623f13a80f399d668d7e806591cab5256d6 Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Thu, 9 Jan 2025 12:46:20 +0530 Subject: [PATCH 6/7] Rename Heart_Disease_detection_using_ML to Heart_Disease_detection_using_ML.md I have added .md extension --- ...ase_detection_using_ML => Heart_Disease_detection_using_ML.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/ML/projects/{Heart_Disease_detection_using_ML => Heart_Disease_detection_using_ML.md} (100%) diff --git a/docs/ML/projects/Heart_Disease_detection_using_ML b/docs/ML/projects/Heart_Disease_detection_using_ML.md similarity index 100% rename from docs/ML/projects/Heart_Disease_detection_using_ML rename to docs/ML/projects/Heart_Disease_detection_using_ML.md From 87f4d1ea6bcaeee4b39abe57bfaebb75a6823832 Mon Sep 17 00:00:00 2001 From: "NANDA GOPAL.D" Date: Thu, 9 Jan 2025 12:55:31 +0530 Subject: [PATCH 7/7] Update and rename Heart_Disease_detection_using_ML.md to Heart_Disease_Detection_Model.md I have changed project title as per instructions --- ..._detection_using_ML.md => Heart_Disease_Detection_Model.md} | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) rename docs/ML/projects/{Heart_Disease_detection_using_ML.md => Heart_Disease_Detection_Model.md} (99%) diff --git a/docs/ML/projects/Heart_Disease_detection_using_ML.md b/docs/ML/projects/Heart_Disease_Detection_Model.md similarity index 99% rename from docs/ML/projects/Heart_Disease_detection_using_ML.md rename to docs/ML/projects/Heart_Disease_Detection_Model.md index 69c65af9..a25c7bff 100644 --- 
a/docs/ML/projects/Heart_Disease_detection_using_ML.md +++ b/docs/ML/projects/Heart_Disease_Detection_Model.md @@ -1,5 +1,4 @@ -# Project Title -Heart-Disease-Detection-using-ML +# Heart Disease Detection Model ### AIM The aim of this project is to develop a reliable and efficient machine learning-based system for the early detection and diagnosis of heart disease. By leveraging advanced algorithms, the system seeks to analyze patient data, identify significant patterns, and predict the likelihood of heart disease, thereby assisting healthcare professionals in making informed decisions.