Solar Correlation Map Plot in R language

A port of the Python library that visualizes correlation between explanatory and dependent variables, as well as between pairs of explanatory variables. The source code of the original Python library is available in its GitHub repository (see References).

Overview

This tool makes it easy to see correlations between the dependent variable and the explanatory (input) variables. It also visualizes correlation among the explanatory variables themselves, which helps reduce the feature space by replacing a set of inter-correlated variables with a single one.

The level of correlation between the dependent and explanatory variables is visualized as a set of orbits around the "sun" (the dependent variable). The closer a variable's orbit is to the "sun", the higher its Pearson correlation coefficient with the dependent variable.

Inter-correlated explanatory variables are visualized as "moons" around a specific planet. We define inter-correlated variables as explanatory variables with a Pearson correlation coefficient higher than 0.8. For comparison, a Pearson coefficient above 0.5 is usually considered a strong correlation.

Whether the correlation between an explanatory variable and the dependent variable is positive or negative is shown by the color of the variable's name. Green marks a positive correlation (the dependent variable tends to rise as the variable rises) and red marks a negative correlation (the dependent variable tends to fall as the variable rises).
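The rules above can be sketched in a few lines of base R. This is only an illustration and is not taken from src/sol_corr_map.R: it uses the built-in mtcars data set with mpg as a stand-in dependent variable, and it assumes that orbits are ranked by absolute correlation and that the 0.8 threshold applies to the signed coefficient, exactly as worded above. The variable names (dv_corr, orbit_order, moon_pairs, label_colour) are hypothetical.

```r
# Illustrative sketch only; the actual script may work differently.
dataset <- mtcars   # built-in example data, not the project's data
dv <- "mpg"         # stand-in dependent variable (the "sun")

# Pearson correlation matrix of all columns.
corr <- cor(dataset, method = "pearson")

# Correlation of each explanatory variable with the dependent variable.
explanatory <- setdiff(colnames(corr), dv)
dv_corr <- corr[explanatory, dv]

# Assumption: stronger absolute correlation means an orbit closer to the sun.
orbit_order <- names(sort(abs(dv_corr), decreasing = TRUE))

# Pairs of explanatory variables with a coefficient above 0.8 ("moons").
# upper.tri() keeps each pair once and skips the diagonal.
ex_corr <- corr[explanatory, explanatory]
high <- which(ex_corr > 0.8 & upper.tri(ex_corr), arr.ind = TRUE)
moon_pairs <- data.frame(
  var1 = explanatory[high[, "row"]],
  var2 = explanatory[high[, "col"]],
  r = ex_corr[high]
)

# Label color by the sign of the correlation with the dependent variable.
label_colour <- ifelse(dv_corr >= 0, "green", "red")

print(orbit_order)
print(moon_pairs)
print(label_colour)
```

Printing moon_pairs is a quick way to spot groups of variables that could be replaced by a single representative before modelling.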

Usage

From the command line, in the project root directory:

Rscript src/sol_corr_map.R -d CSV_FILE_PATH -v DV

where:

  • CSV_FILE_PATH: the path to the CSV file with the data (the file must have a header row)
  • DV: the name of the dependent variable in the provided data set (a column name)

The resulting plot will be saved as Rplots.pdf in the working directory.

From an R script file

Source src/sol_corr_map.R in your script and call the plotSolarCorrelation function. See the source file for its input parameters.

Example

Download the Boston Housing prices data set and create a CSV file from the provided housing.data and housing.names files: use the column names listed in housing.names as the header row for the data in housing.data.
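One way to assemble that CSV file is sketched below. It assumes housing.data is the usual whitespace-separated file with 14 columns and that it was downloaded to the current directory; the helper name build_housing_csv and both paths are hypothetical, and the column names are typed in by hand from housing.names rather than parsed from it.

```r
# Column names in the order they are described in housing.names.
housing_columns <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                     "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV")

# Hypothetical helper: read the whitespace-separated data file and
# write it back out as a CSV file with a header row.
build_housing_csv <- function(data_path, csv_path) {
  housing <- read.table(data_path, header = FALSE, col.names = housing_columns)
  write.csv(housing, csv_path, row.names = FALSE)
  invisible(housing)
}

# Assumed download location; adjust to wherever housing.data was saved.
if (file.exists("housing.data")) {
  dir.create("data", showWarnings = FALSE)
  build_housing_csv("housing.data", "data/housing.csv")
}
```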

Save it in the data directory and run the following command:

Rscript src/sol_corr_map.R -d data/housing.csv -v MEDV

The resulting plot:

[Image: solar correlation map for the Boston Housing data set with MEDV as the dependent variable]

References

  1. Original Python library
  2. A new visualization to beautifully explore correlations