Merge pull request #583 from ghousiya47/Weather-Analysis-#571
Weather analysis #571
Showing 16 changed files with 3,827 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Weather Analysis Dataset | ||
|
||
The dataset used here is taken from Kaggle. You can download the file from [Weather Analysis and Prediction](https://www.kaggle.com/datasets/mastmustu/weather-analysis). | ||
|
||
## About the dataset | ||
|
||
The data contains daily weather attributes from 2009 to July 2020. The CSV file has 22 columns and 3902 entries (rows). | ||
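
A quick sanity check of the stated size can be sketched as follows. This is a minimal sketch using only the standard library: `csv_shape` is a hypothetical helper, the file name `weather.csv` is an assumption (the downloaded file may be named differently), and the inline sample values are made up for illustration.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV stream with a header row."""
    reader = csv.reader(handle)
    header = next(reader)                     # first line holds the column names
    n_rows = sum(1 for row in reader if row)  # skip completely empty lines
    return n_rows, len(header)


# Tiny inline sample standing in for the real file (values are made up).
sample = io.StringIO(
    "Date,Average temperature (°F),Average humidity (%)\n"
    "2009-01-01,37.8,35\n"
    "2009-01-02,43.2,32\n"
)
print(csv_shape(sample))  # (2, 3)

# With the downloaded file (the name "weather.csv" is an assumption):
# with open("weather.csv", newline="", encoding="utf-8") as f:
#     print(csv_shape(f))  # the README states 3902 rows and 22 columns
```

With pandas, `pandas.read_csv(path).shape` reports the same pair of numbers.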
|
||
**Columns Description**: | ||
|
||
- Date | ||
- Average temperature (°F) | ||
- Average humidity (%) | ||
- Average dewpoint (°F) | ||
|
||
- Average barometer (in) | ||
|
||
- Average windspeed (mph) | ||
|
||
- Average gustspeed (mph) | ||
|
||
- Average direction (°deg) | ||
|
||
- Rainfall for month (in) | ||
|
||
- Rainfall for year (in) | ||
|
||
- Maximum rain per minute | ||
|
||
- Maximum temperature (°F) | ||
|
||
- Minimum temperature (°F) | ||
|
||
- Maximum humidity (%) | ||
|
||
- Minimum humidity (%) | ||
|
||
- Maximum pressure | ||
|
||
- Minimum pressure | ||
|
||
- Maximum windspeed (mph) | ||
|
||
- Maximum gust speed (mph) | ||
|
||
- Maximum heat index (°F) | ||
|
||
- Month |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
<h1>Weather Analysis</h1> | ||
|
||
**GOAL** | ||
|
||
To build a machine learning model that predicts the average rainfall per month from given atmospheric conditions such as temperature, humidity, dewpoint, pressure, and windspeed. | ||
|
||
**DATASET** | ||
|
||
[https://www.kaggle.com/datasets/mastmustu/weather-analysis] | ||
|
||
**DESCRIPTION** | ||
|
||
To analyze the Weather Analysis dataset, then build and train models on its features. | ||
|
||
The dataset is a single CSV file with 3902 entries (rows) and 22 columns. | ||
|
||
**Columns Description**: | ||
|
||
- Date | ||
- Average temperature (°F) | ||
- Average humidity (%) | ||
- Average dewpoint (°F) | ||
- Average barometer (in) | ||
- Average windspeed (mph) | ||
- Average gustspeed (mph) | ||
- Average direction (°deg) | ||
- Rainfall for month (in) | ||
- Rainfall for year (in) | ||
- Maximum rain per minute | ||
- Maximum temperature (°F) | ||
- Minimum temperature (°F) | ||
- Maximum humidity (%) | ||
- Minimum humidity (%) | ||
- Maximum pressure | ||
- Minimum pressure | ||
- Maximum windspeed (mph) | ||
- Maximum gust speed (mph) | ||
- Maximum heat index (°F) | ||
- Month | ||
|
||
|
||
### Visualization and EDA of different attributes: | ||
|
||
<img alt="Distribution" src="./Images/distribution plot 1.png"> | ||
|
||
<img alt="Distribution" src="./Images/distribution plot 2.png"> | ||
|
||
<img alt="Regression" src="./Images/avg barometer vs rainfall per mnth.png"> | ||
|
||
<img alt="Regression" src="./Images/avg dewpoint vs rainfall per mnth.png"> | ||
|
||
<img alt="Regression" src="./Images/avg humidity vs rainfall per mnth.png"> | ||
|
||
<img alt="Regression" src="./Images/avg temp vs rainfall per mnth.png"> | ||
|
||
<img alt="Regression" src="./Images/avg windspeed vs rainfall per mnth.png"> | ||
|
||
<img alt="Regression" src="./Images/max temp vs rainfall per mnth.png"> | ||
|
||
<img alt="Regression" src="./Images/month vs rainfall per month.png"> | ||
|
||
|
||
**MODELS USED** | ||
|
||
| Model | MSE_train | R2_train | MSE_test | R2_test | | ||
|---------------------------|-----------|----------|-----------|-----------| | ||
|Random Forest Regression | 0.0126 | 0.965291 | 0.082938 | 0.773470 | | ||
|XGBoost Regression | 0.0056 | 0.984504 | 0.089369 | 0.755905 | | ||
|Decision Tree | 0.58e-34 | 1.000000 | 0.144070 | 0.606500 | | ||
|Ridge Regression | 3.58e-34 | 1.000000 | 0.144070 | 0.606500 | | ||
|Linear Regression | 0.274 | 0.243614 | 0.281541 | 0.231021 | | ||
|Elastic Net Regression | 2.94e-01 | 0.190594 | 0.302724 | 0.173166 | | ||
|Neural Network Regression | 0.358 | 0.076272 | 0.405645 |-0.107945 | | ||
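
For readers unfamiliar with the two metrics in the table, here is a minimal from-scratch sketch of how MSE and R-square are defined. It is not the code used in this project; scikit-learn's `mean_squared_error` and `r2_score` compute the same quantities. The toy values are illustrative only and do not come from the dataset.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]            # toy targets, not values from the dataset
print(r2(y, y))                     # 1.0: perfect predictions
print(r2(y, [2.5] * 4))             # 0.0: always predicting the mean
print(r2(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than predicting the mean
```

The last case shows why a negative R-square, as in the Neural Network row, is possible: the model's squared error exceeds that of a constant prediction equal to the mean of the targets.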
|
||
|
||
**WHAT I HAVE DONE** | ||
|
||
* Loaded the dataset, which is in CSV format. | ||
* It has 3902 entries (rows) and 22 columns. | ||
* Checked for missing values and cleaned the data accordingly. | ||
* Analyzed the data, found insights, and visualized them. | ||
* Explored the relationship between individual columns and the target variable using plotting libraries. | ||
* Trained different models on the dataset and saved their evaluation metrics (MSE and R-square) in a DataFrame. | ||
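
The missing-value check in the steps above can be sketched as follows. This is a standard-library illustration, not the notebook's actual code: `missing_counts` is a hypothetical helper, the set of tokens treated as missing is an assumption, and the sample rows are made up.

```python
import csv
import io


def missing_counts(handle, missing=("", "NA", "NaN")):
    """Count missing cells per column in a CSV stream with a header row."""
    reader = csv.DictReader(handle)
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            value = row.get(name)  # None when a row has too few fields
            if value is None or value.strip() in missing:
                counts[name] += 1
    return counts


# Made-up sample: one empty humidity cell and one "NA" rainfall cell.
sample = io.StringIO(
    "Date,Average humidity (%),Rainfall for month (in)\n"
    "2009-01-01,35,0.00\n"
    "2009-01-02,,0.00\n"
    "2009-01-03,38,NA\n"
)
print(missing_counts(sample))
# {'Date': 0, 'Average humidity (%)': 1, 'Rainfall for month (in)': 1}
```

On a pandas DataFrame, `df.isna().sum()` gives the equivalent per-column count.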
|
||
|
||
**LIBRARIES NEEDED** | ||
|
||
1. Pandas | ||
2. Matplotlib | ||
3. scikit-learn | ||
4. NumPy | ||
5. XGBoost | ||
6. TensorFlow | ||
7. Keras | ||
8. SciPy | ||
9. Seaborn | ||
|
||
|
||
|
||
**CONCLUSION** | ||
|
||
- Random Forest and XGBoost Regression show the most promising performance, with the lowest test MSE (about 0.08 to 0.09) and the highest test R-square (about 0.76 to 0.77), along with low error on the training set. | ||
- Decision Tree Regression achieved a perfect R-square of 1.0 on the training set but only about 0.61 on the test set, indicating overfitting. | ||
- The neural network regression model has the highest MSE on both sets, an R-square near zero on the training set (0.08), and a negative R-square on the test set (-0.11), indicating poor performance on both training and test data. | ||
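
The comparison behind these conclusions can be reproduced from the R-square columns of the MODELS USED table. The sketch below only re-reads the reported numbers and does not retrain anything; `rank_by_test_r2` and `overfit_gap` are hypothetical helper names.

```python
# (R2_train, R2_test) copied from the MODELS USED table.
scores = {
    "Random Forest Regression": (0.965291, 0.773470),
    "XGBoost Regression": (0.984504, 0.755905),
    "Decision Tree": (1.000000, 0.606500),
    "Ridge Regression": (1.000000, 0.606500),
    "Linear Regression": (0.243614, 0.231021),
    "Elastic Net Regression": (0.190594, 0.173166),
    "Neural Network Regression": (0.076272, -0.107945),
}


def rank_by_test_r2(scores):
    """Sort model names from best to worst test R-square."""
    return sorted(scores, key=lambda name: scores[name][1], reverse=True)


def overfit_gap(scores):
    """Train-minus-test R-square per model; a large gap suggests overfitting."""
    return {name: round(train - test, 4) for name, (train, test) in scores.items()}


ranking = rank_by_test_r2(scores)
gaps = overfit_gap(scores)
print(ranking[0])             # Random Forest Regression
print(gaps["Decision Tree"])  # 0.3935
```

A small gap alone does not make a model good: Linear Regression has a gap of about 0.01 but a test R-square of only 0.23, so both the gap and the absolute test score need to be read together.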
|
||
|
||
**YOUR NAME** | ||
|
||
*Ghousiya Begum* | ||
|
||
[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/ghousiya-begum-a9b634258/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/ghousiya47) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
numpy==1.19.2 | ||
pandas==1.4.3 | ||
matplotlib==3.7.1 | ||
scikit-learn~=1.0.2 | ||
scipy==1.5.0 | ||
seaborn==0.10.1 | ||
xgboost~=1.5.2 | ||
tensorflow==2.4.1 | ||
keras==2.4.0 |
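
The pins above use two operators: `==` requires that exact version, while `~=` is the "compatible release" operator (for example, `~=1.0.2` accepts `1.0.2` and later `1.0.x` releases). As a rough sketch, the snippet below parses these lines and reports what is installed in the current environment. `parse_pins` and `installed_version` are hypothetical helpers, and the parser only handles the simple `name==version` / `name~=version` form used in this file.

```python
import re
from importlib import metadata

REQUIREMENTS = """\
numpy==1.19.2
pandas==1.4.3
matplotlib==3.7.1
scikit-learn~=1.0.2
scipy==1.5.0
seaborn==0.10.1
xgboost~=1.5.2
tensorflow==2.4.1
keras==2.4.0
"""

# Matches "name==version" or "name~=version"; other specifier forms are rejected.
PIN = re.compile(r"^([A-Za-z0-9_.-]+)\s*(==|~=)\s*([0-9][0-9A-Za-z.]*)$")


def parse_pins(text):
    """Parse simple pinned requirement lines into {name: (operator, version)}."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PIN.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {line!r}")
        name, operator, version = match.groups()
        pins[name.lower()] = (operator, version)
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


pins = parse_pins(REQUIREMENTS)
for name, (operator, version) in pins.items():
    print(f"{name}: wanted {operator}{version}, installed {installed_version(name)}")
```

The printed result depends on the environment, so it is a diagnostic aid rather than a replacement for installing from the requirements file with pip.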