ILPD-Data-Mining

  • Implemented and compared the performance of classification algorithms (decision tree, K-nearest neighbors, logistic regression, and random forest) for classifying patients with liver problems in a clinical data set.

  • Experimented with feature selection to identify the best features for each algorithm.

  • Performed data normalization using different methods (Min-Max, z-score).

  • Performed N-fold cross-validation on the data set.

  • Compared precision, recall and F-score of the algorithms.

  • The data set contains 10 variables: age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT, and Alkphos.
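The normalization and evaluation steps listed above can be sketched in plain Python. This is a minimal illustration, not the code used in this repository: the function names `min_max`, `z_score`, and `precision_recall_f1` are hypothetical, and the z-score here assumes the population standard deviation.

```python
from math import sqrt


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and divide by the population std deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Min-Max scaling keeps every feature in the same bounded range, which tends to matter for distance-based methods such as K-nearest neighbors, while z-score scaling is less sensitive to a single extreme value stretching the range.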

| Data Set Characteristics | Number of Instances | Area | Attribute Characteristics | Number of Attributes | Date Donated | Associated Tasks |
| --- | --- | --- | --- | --- | --- | --- |
| Multivariate | 583 | Life | Integer, Real | 10 | 2012-05-21 | Classification |

Data Set Information:

  • This data set contains 416 liver patient records and 167 non-liver patient records. The data set was collected from the northeast of Andhra Pradesh, India. Selector is a class label used to divide the records into two groups (liver patient or not). The data set contains 441 male patient records and 142 female patient records.

  • Any patient whose age exceeded 89 is listed as being of age "90".

Attribute Information:

  1. Age Age of the patient
  2. Gender Gender of the patient
  3. TB Total Bilirubin
  4. DB Direct Bilirubin
  5. Alkphos Alkaline Phosphatase
  6. Sgpt Alanine Aminotransferase
  7. Sgot Aspartate Aminotransferase
  8. TP Total Proteins
  9. ALB Albumin
  10. A/G Ratio Albumin and Globulin Ratio
  11. Selector field used to split the data into two sets (labeled by the experts)
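The attribute order above can be turned into a small loader. The sketch below makes several assumptions that this README does not state: the data arrives as headerless comma-separated text, gender is spelled `Male`/`Female`, empty fields mean a missing measurement, and `Selector` is an integer code. The column identifier `AG_Ratio`, the function `parse_ilpd`, and the two rows in `SAMPLE` are invented for illustration and are not real patient records.

```python
import csv
import io

# Column names follow the attribute list above; "AG_Ratio" is a
# hypothetical identifier standing in for "A/G Ratio".
COLUMNS = [
    "Age", "Gender", "TB", "DB", "Alkphos",
    "Sgpt", "Sgot", "TP", "ALB", "AG_Ratio", "Selector",
]


def parse_ilpd(text):
    """Parse headerless CSV text into a list of typed row dictionaries."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = {}
        for name, value in zip(COLUMNS, record):
            value = value.strip()
            if name == "Gender":
                # Assumed encoding: Male -> 1, anything else -> 0.
                row[name] = 1 if value.lower() == "male" else 0
            elif name in ("Age", "Selector"):
                row[name] = int(value)
            else:
                # Empty field is treated as a missing measurement.
                row[name] = float(value) if value else None
        rows.append(row)
    return rows


# Invented rows for illustration only (not taken from the data set).
SAMPLE = (
    "50,Male,1.0,0.3,200,30,35,6.5,3.2,1.0,1\n"
    "40,Female,0.8,0.2,180,20,25,7.0,3.6,,2\n"
)

rows = parse_ilpd(SAMPLE)
```

Keeping missing values as `None` rather than dropping the row leaves the choice of imputation or removal to a later step.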