Some distance metrics added (#558)
* add : metrics (`KoppenI`, `KoppenII`, `KuderRichardson`, `KuhnsI`, `KuhnsII`) added.

* log : changes logged.

* update : document for distance updated.

* test : tests updated.

* fix : `KoppenI` fixed.

* fix : tests fixed.

* fix : `KuderRichardson_calc` calculation fixed.

* update : `Distance` notebook fixed.

* doc : Distance.ipynb updated

* fix : KoppenI_calc function updated

* fix : tests updated

* fix : autopep8

* doc : Distance.ipynb updated

* doc : minor bug in document fixed

---------

Co-authored-by: sepandhaghighi <[email protected]>
sadrasabouri and sepandhaghighi authored Oct 5, 2024
1 parent 3f6975f commit fd220a5
Showing 6 changed files with 421 additions and 6 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]
### Added
- 5 new distance/similarity
1. KoppenI
2. KoppenII
3. KuderRichardson
4. KuhnsI
5. KuhnsII
- `feature_request.yml` template
- `config.yml` for issue template
- `SECURITY.md`
280 changes: 279 additions & 1 deletion Document/Distance.ipynb
@@ -2773,6 +2773,278 @@
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Köppen I"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Köppen I correlation [[38]](#ref38)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$sim_{KoppenI} =\n",
"\\frac{\\frac{2 \\times TP+FP+FN}{2}.\\frac{2 \\times TN+FP+FN}{2} - \\frac{FP+FN}{2}}\n",
"{\\frac{2 \\times TP+FP+FN}{2}.\\frac{2 \\times TN+FP+FN}{2}}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 0.96875, 1: 0.9368421052631579, 2: 0.9300699300699301}"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.distance(metric=DistanceType.KoppenI)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li><span style=\"color:red;\">Notice </span> : new in <span style=\"color:red;\">version 4.1</span> </li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Köppen II"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Köppen II correlation [[38]](#ref38)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$sim_{KoppenII} =\n",
"TP + \\frac{FP + FN}{2}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 4.0, 1: 2.5, 2: 5.5}"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.distance(metric=DistanceType.KoppenII)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li><span style=\"color:red;\">Notice </span> : new in <span style=\"color:red;\">version 4.1</span> </li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## KuderRichardson"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kuder & Richardson correlation [[39]](#ref39)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$corr_{KuderRichardson} =\n",
"\\frac{4 \\times (TP \\times TN - FP \\times FN)}\n",
"{(TP+FP)(FN+TN) + (TP+FN)(FP+TN) + 2(TP \\times TN - FP \\times FN)}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 0.8076923076923077, 1: 0.4067796610169492, 2: 0.2891566265060241}"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.distance(metric=DistanceType.KuderRichardson)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li><span style=\"color:red;\">Notice </span> : new in <span style=\"color:red;\">version 4.1</span> </li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## KuhnsI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kuhns I correlation [[40]](#ref40)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$corr_{KuhnsI} =\n",
"\\frac{2 \\times \\delta(TP + FP, TP + FN)}\n",
"{N}\n",
"$$\n",
"\n",
"$$\n",
"\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 0.2916666666666667, 1: 0.08333333333333333, 2: 0.08333333333333333}"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.distance(metric=DistanceType.KuhnsI)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li><span style=\"color:red;\">Notice </span> : new in <span style=\"color:red;\">version 4.1</span> </li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## KuhnsII"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kuhns II correlation [[40]](#ref40)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$corr_{KuhnsII} =\n",
"\\frac{\\delta(TP + FP, TP + FN)}\n",
"{\\max(TP + FP, TP + FN)}\n",
"$$\n",
"\n",
"$$\n",
"\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 0.35, 1: 0.16666666666666666, 2: 0.08333333333333333}"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.distance(metric=DistanceType.KuhnsII)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li><span style=\"color:red;\">Notice </span> : new in <span style=\"color:red;\">version 4.1</span> </li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -2856,7 +3128,13 @@
"\n",
"<blockquote id=\"ref36\">36- M. G. Kendall, \"A new measure of rank correlation,\" <i>Biometrika</i>, vol. 30, no. 1/2, pp. 81-93, 1938.</blockquote>\n",
"\n",
"<blockquote id=\"ref37\">37- R. N. Kent and S. L. Foster, \"Direct observational procedures: Methodological issues in naturalistic settings,\" <i>Handbook of behavioral assessment</i>, pp. 279-328, 1977.</blockquote>"
"<blockquote id=\"ref37\">37- R. N. Kent and S. L. Foster, \"Direct observational procedures: Methodological issues in naturalistic settings,\" <i>Handbook of behavioral assessment</i>, pp. 279-328, 1977.</blockquote>\n",
"\n",
"<blockquote id=\"ref38\">38- W. Köppen, \"In Repertorium für Meteorologie,\" <i>Akademiia Nauk</i>, pp. 189–238, 1870.</blockquote>\n",
"\n",
"<blockquote id=\"ref39\">39- G. F. Kuder and M. W. Richardson, \"The theory of the estimation of test reliability,\" <i>Psychometrika</i>, pp. 151–160, 1937.</blockquote>\n",
"\n",
"<blockquote id=\"ref40\">40- J. L. Kuhns, \"Statistical Association Methods for Mechanized Documentation,\" <i>National Bureau of Standards Miscellaneous Publication</i>, pp. 33-40, 1964.</blockquote>"
]
}
],
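Review note: the five formulas documented in the notebook cells above can be checked by hand from the per-class counts. The sketch below is a plain-Python transcription of those formulas, not the library's own `*_calc` functions, whose source is not shown in this diff. The function names and the example counts (TP=3, TN=7, FP=2, FN=0) are assumptions; the counts were picked because they reproduce the class `0` values in the notebook outputs. Zero denominators are not handled.

```python
from math import isclose


def koppen_i(tp, tn, fp, fn):
    """Köppen I similarity, transcribed from the notebook formula."""
    a = (2 * tp + fp + fn) / 2
    d = (2 * tn + fp + fn) / 2
    return (a * d - (fp + fn) / 2) / (a * d)


def koppen_ii(tp, tn, fp, fn):
    """Köppen II similarity: TP plus half of the disagreements."""
    return tp + (fp + fn) / 2


def kuder_richardson(tp, tn, fp, fn):
    """Kuder & Richardson correlation, transcribed from the notebook formula."""
    cross = tp * tn - fp * fn
    denominator = (tp + fp) * (fn + tn) + (tp + fn) * (fp + tn) + 2 * cross
    return 4 * cross / denominator


def delta(tp, tn, fp, fn):
    """Observed TP minus the TP expected from the two marginals."""
    n = tp + tn + fp + fn
    return tp - (tp + fp) * (tp + fn) / n


def kuhns_i(tp, tn, fp, fn):
    """Kuhns I correlation: 2 * delta / N."""
    n = tp + tn + fp + fn
    return 2 * delta(tp, tn, fp, fn) / n


def kuhns_ii(tp, tn, fp, fn):
    """Kuhns II correlation: delta over the larger marginal."""
    return delta(tp, tn, fp, fn) / max(tp + fp, tp + fn)


# Hypothetical counts that reproduce the class-0 notebook outputs.
counts = dict(tp=3, tn=7, fp=2, fn=0)

assert isclose(koppen_i(**counts), 0.96875)
assert isclose(koppen_ii(**counts), 4.0)
assert isclose(kuder_richardson(**counts), 0.8076923076923077)
assert isclose(kuhns_i(**counts), 0.2916666666666667)
assert isclose(kuhns_ii(**counts), 0.35)
```

Kuhns I and Kuhns II share the same `delta` term and differ only in the normaliser, which is why it is factored into one helper here.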
6 changes: 3 additions & 3 deletions Otherfiles/notebook_to_html.py
@@ -22,7 +22,7 @@
"Example8"]

MAIN_DOCS_LIST = ["Distance",
"Document"]
"Document"]

NOTEBOOK_EXTENSION = ".ipynb"

@@ -61,7 +61,7 @@
nb = nbformat.read(f, as_version=4)
ep.preprocess(
nb, {
'metadata': {
'metadata': {
'path': OUTPUT_FOLDER_PATH}})
with open(notebook_copy_path, 'w', encoding='utf-8') as f:
nbformat.write(nb, f)
@@ -89,7 +89,7 @@
nb = nbformat.read(f, as_version=4)
ep.preprocess(
nb, {
'metadata': {
'metadata': {
'path': OUTPUT_FOLDER_PATH}})
with open(notebook_copy_path, 'w', encoding='utf-8') as f:
nbformat.write(nb, f)
10 changes: 10 additions & 0 deletions Test/verified_test.py
@@ -406,6 +406,16 @@
>>> assert isclose(cm2.distance(metric=DistanceType.KentFosterI)[1], -0.23529411764705888, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm1.distance(metric=DistanceType.KentFosterII)[1], -0.0012804097311239404, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm2.distance(metric=DistanceType.KentFosterII)[1], -0.002196997436837158, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm1.distance(metric=DistanceType.KoppenI)[1], 0.9993589743589744, abs_tol=ABS_TOL, rel_tol=REL_TOL) # normalizer: None
>>> assert isclose(cm2.distance(metric=DistanceType.KoppenI)[1], 0.9991825772172593, abs_tol=ABS_TOL, rel_tol=REL_TOL) # normalizer: None
>>> assert isclose(cm1.distance(metric=DistanceType.KoppenII)[1], 4.0, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm2.distance(metric=DistanceType.KoppenII)[1], 5.5, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm1.distance(metric=DistanceType.KuderRichardson)[1], 0.6643835616438356, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm2.distance(metric=DistanceType.KuderRichardson)[1], 0.5285677463699631, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm1.distance(metric=DistanceType.KuhnsI)[1], 0.005049979175343606, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm2.distance(metric=DistanceType.KuhnsI)[1], 0.005004425239483548, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm1.distance(metric=DistanceType.KuhnsII)[1], 0.49489795918367346, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> assert isclose(cm2.distance(metric=DistanceType.KuhnsII)[1], 0.32695578231292516, abs_tol=ABS_TOL, rel_tol=REL_TOL)
>>> mlcm = MultiLabelCM(actual_vector=[{"cat", "bird"}, {"dog"}], predict_vector=[{"cat"}, {"dog", "bird"}], classes=["cat", "dog", "bird"]) # Verified Case -- (http://bitly.ws/GNq2)
>>> mlcm.actual_vector_multihot
[[1, 0, 1], [0, 1, 0]]
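Review note: the new assertions in this hunk compare floats with `isclose` and two tolerances. For finite values, `math.isclose` accepts a pair when the absolute difference is no larger than the greater of `abs_tol` and `rel_tol` times the larger magnitude. The sketch below spells that rule out; the tolerance values and the `close` helper are hypothetical, since the real `ABS_TOL` and `REL_TOL` are defined outside this diff.

```python
from math import isclose

# Hypothetical tolerances; the project's actual values are not shown here.
ABS_TOL = 1e-12
REL_TOL = 0


def close(a, b, abs_tol, rel_tol):
    """Hand-written form of the comparison math.isclose makes for finite floats."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


# Kuhns II for class 0 in the notebook, recomputed from delta / max marginal.
computed = 1.75 / 5

assert isclose(computed, 0.35, abs_tol=ABS_TOL, rel_tol=REL_TOL)
assert close(computed, 0.35, ABS_TOL, REL_TOL)

# A difference far above the tolerance is rejected by both forms.
assert not isclose(computed, 0.36, abs_tol=ABS_TOL, rel_tol=REL_TOL)
assert not close(computed, 0.36, ABS_TOL, REL_TOL)
```

With `rel_tol` set to zero the check reduces to a pure absolute comparison, which suits metrics bounded near the unit interval but is stricter in relative terms for larger values such as Köppen II.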