Merge pull request #583 from ghousiya47/Weather-Analysis-#571
Weather analysis #571
abhisheks008 authored Feb 14, 2024
2 parents ea285df + d658f5d commit 8fdc00a
Showing 16 changed files with 3,827 additions and 0 deletions.
48 changes: 48 additions & 0 deletions Weather Analysis/Dataset/README.md
@@ -0,0 +1,48 @@
# Weather Analysis Dataset

The dataset used here is taken from Kaggle. You can download the file from [Weather Analysis and Prediction](https://www.kaggle.com/datasets/mastmustu/weather-analysis).

## About the dataset

The data contains day-wise weather attributes from 2009 to July 2020. The CSV file has 22 columns and 3902 entries (rows).

**Columns Description**:

- Date
- Average temperature (°F)
- Average humidity (%)
- Average dewpoint (°F)
- Average barometer (in)
- Average windspeed (mph)
- Average gustspeed (mph)
- Average direction (°deg)
- Rainfall for month (in)
- Rainfall for year (in)
- Maximum rain per minute
- Maximum temperature (°F)
- Minimum temperature (°F)
- Maximum humidity (%)
- Minimum humidity (%)
- Maximum pressure
- Minimum pressure
- Maximum windspeed (mph)
- Maximum gust speed (mph)
- Maximum heat index (°F)
- Month
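
A quick way to confirm the row and column counts stated above is to load the CSV with pandas and inspect its shape and missing values. The sketch below is only illustrative: it reads a tiny in-memory sample instead of the Kaggle file, and the column names (`date`, `avg_temp_f`, `avg_humidity_pct`, `rain_month_in`) are hypothetical stand-ins, not the real headers.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the Kaggle CSV; real headers may differ.
SAMPLE_CSV = """date,avg_temp_f,avg_humidity_pct,rain_month_in
2009-01-01,37.8,35,0.0
2009-01-02,43.2,32,0.0
2009-01-03,,40,0.1
"""


def summarize(csv_source):
    """Load a CSV and return its shape plus per-column missing-value counts."""
    df = pd.read_csv(csv_source, parse_dates=["date"])
    return df.shape, df.isna().sum().to_dict()


shape, missing = summarize(io.StringIO(SAMPLE_CSV))
print(shape)    # (rows, columns)
print(missing)  # number of missing cells in each column
```

To run the same check on the downloaded dataset, pass the path of the CSV file to `summarize` and adjust the date column name to match the file.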
Binary file added Weather Analysis/Images/distribution plot 1.png
Binary file added Weather Analysis/Images/distribution plot 2.png
Binary file added Weather Analysis/Images/distribution plot 3.png
3,659 changes: 3,659 additions & 0 deletions Weather Analysis/Model/Weather_Analysis.ipynb

Large diffs are not rendered by default.

111 changes: 111 additions & 0 deletions Weather Analysis/README.md
@@ -0,0 +1,111 @@
<h1>Weather Analysis</h1>

**GOAL**

To build a machine learning model that predicts the average rainfall per month from given atmospheric conditions such as temperature, humidity, dewpoint, pressure, and windspeed.

**DATASET**

https://www.kaggle.com/datasets/mastmustu/weather-analysis

**DESCRIPTION**

To analyze the Weather Analysis dataset and to build and train models on its different features and variables.

The dataset is a CSV file with 3902 entries and 22 columns.

**Columns Description**:

- Date
- Average temperature (°F)
- Average humidity (%)
- Average dewpoint (°F)
- Average barometer (in)
- Average windspeed (mph)
- Average gustspeed (mph)
- Average direction (°deg)
- Rainfall for month (in)
- Rainfall for year (in)
- Maximum rain per minute
- Maximum temperature (°F)
- Minimum temperature (°F)
- Maximum humidity (%)
- Minimum humidity (%)
- Maximum pressure
- Minimum pressure
- Maximum windspeed (mph)
- Maximum gust speed (mph)
- Maximum heat index (°F)
- Month


### Visualization and EDA of different attributes:

<img alt="Distribution" src="./Images/distribution plot 1.png">

<img alt="Distribution" src="./Images/distribution plot 2.png">

<img alt="Regression" src="./Images/avg barometer vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg dewpoint vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg humidity vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg temp vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg windspeed vs rainfall per mnth.png">

<img alt="Regression" src="./Images/max temp vs rainfall per mnth.png">

<img alt="Regression" src="./Images/month vs rainfall per month.png">


**MODELS USED**

| Model | MSE_train | R2_train | MSE_test | R2_test |
|---------------------------|-----------|----------|-----------|-----------|
|Random Forest Regression | 0.0126 | 0.965291 | 0.082938 | 0.773470 |
|XGBoost Regression | 0.0056 | 0.984504 | 0.089369 | 0.755905 |
|Decision Tree | 0.58e-34 | 1.000000 | 0.144070 | 0.606500 |
|Ridge Regression | 3.58e-34 | 1.000000 | 0.144070 | 0.606500 |
|Linear Regression | 0.274 | 0.243614 | 0.281541 | 0.231021 |
|Elastic Net Regression | 2.94e-01 | 0.190594 | 0.302724 | 0.173166 |
|Neural Network Regression | 0.358 | 0.076272 | 0.405645 |-0.107945 |
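
For readers unfamiliar with the column headings, the sketch below shows how MSE and R-square are conventionally defined, written in plain Python on made-up numbers. It is not taken from the notebook, which may compute these values differently (for example with scikit-learn's metric functions). It also shows why R-square can be negative, as in the Neural Network row: the score drops below zero whenever the predictions fit worse than always predicting the mean of the target.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up target values, used only to illustrate the three regimes.
y_true = [0.0, 1.0, 2.0, 3.0]

perfect = r2(y_true, y_true)                  # predictions equal the targets
mean_only = r2(y_true, [1.5, 1.5, 1.5, 1.5])  # always predict the mean
reversed_fit = r2(y_true, [3.0, 2.0, 1.0, 0.0])  # worse than the mean

print(perfect, mean_only, reversed_fit)
print(mse(y_true, [3.0, 2.0, 1.0, 0.0]))
```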


**WHAT I HAVE DONE**

* Loaded the dataset, which is in CSV format.
* It has 3902 entries (rows) and 22 columns.
* Checked for missing values and cleaned the data accordingly.
* Analyzed the data, found insights, and visualized them.
* Explored how individual columns relate to the target variable using plotting libraries.
* Trained different models on the dataset and saved their metrics (MSE and R-square) into a dataframe.
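
The last step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the features and target are synthetic NumPy arrays rather than the weather data, only two of the listed models are shown, and the split ratio and hyperparameters (`test_size=0.2`, `n_estimators=50`, `random_state=42`) are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned weather features and the rainfall target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed subset of the models listed in the table.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=50, random_state=42
    ),
}

# Fit each model and collect train/test metrics as one row per model.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    rows.append(
        {
            "Model": name,
            "MSE_train": mean_squared_error(y_train, pred_train),
            "R2_train": r2_score(y_train, pred_train),
            "MSE_test": mean_squared_error(y_test, pred_test),
            "R2_test": r2_score(y_test, pred_test),
        }
    )

results = pd.DataFrame(rows)
print(results)
```

Keeping both train and test metrics in one dataframe makes it easy to compare models side by side and to spot large train/test gaps.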


**LIBRARIES NEEDED**

1. Pandas
2. Matplotlib
3. scikit-learn
4. NumPy
5. XGBoost
6. TensorFlow
7. Keras
8. SciPy
9. Seaborn



**CONCLUSION**

- Random Forest and XGBoost Regression show promising performance, with lower MSE and higher R-square values on the test set than any other model.
- Decision Tree Regression achieved a perfect R-square on the training set, but its R-square on the test set is only about 0.6, indicating overfitting.
- The deep neural network (NN) has a high MSE, a negative R-square on the test set, and an R-square close to zero on the training set, suggesting poor performance on both sets.
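
One simple way to make the overfitting comparison concrete is to compute the gap between training and test R-square for each model. The sketch below does this in plain Python using values copied from the table above for a few of the models. Treating the gap as an overfitting indicator is a rough heuristic added here for illustration, not an analysis from the notebook.

```python
# (R2_train, R2_test) pairs copied from the results table.
r2_scores = {
    "Random Forest Regression": (0.965291, 0.773470),
    "XGBoost Regression": (0.984504, 0.755905),
    "Decision Tree": (1.000000, 0.606500),
    "Linear Regression": (0.243614, 0.231021),
    "Neural Network Regression": (0.076272, -0.107945),
}

# A large positive gap means the model fits the training data
# much better than unseen data.
gaps = {
    name: round(train - test, 6) for name, (train, test) in r2_scores.items()
}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:.6f}")

largest_gap_model = max(gaps, key=gaps.get)
```

A small gap alone does not mean a model is good: Linear Regression has the smallest gap here but low R-square on both sets, which points to underfitting rather than overfitting.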


**YOUR NAME**

*Ghousiya Begum*

[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/ghousiya-begum-a9b634258/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/ghousiya47)
9 changes: 9 additions & 0 deletions Weather Analysis/requirements.txt
@@ -0,0 +1,9 @@
numpy==1.19.2
pandas==1.4.3
matplotlib==3.7.1
scikit-learn~=1.0.2
scipy==1.5.0
seaborn==0.10.1
xgboost~=1.5.2
tensorflow==2.4.1
keras==2.4.0
