Breast-cancer-prediction-ML-Python

Predict whether a breast cancer sample is malignant or benign using the Breast Cancer data set.
Dataset - Breast Cancer Wisconsin (Original) Data Set
This code demonstrates logistic regression on the dataset and uses gradient descent to minimize the binary cross entropy (BCE).

Dataset description

  1. Sample code number: id number
  2. Clump Thickness: 1 - 10
  3. Uniformity of Cell Size: 1 - 10
  4. Uniformity of Cell Shape: 1 - 10
  5. Marginal Adhesion: 1 - 10
  6. Single Epithelial Cell Size: 1 - 10
  7. Bare Nuclei: 1 - 10
  8. Bland Chromatin: 1 - 10
  9. Normal Nucleoli: 1 - 10
  10. Mitoses: 1 - 10
  11. Class: (2 for benign, 4 for malignant)
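
As a rough illustration of the columns above, the sketch below reads a few rows in the raw file's layout with pandas, names the columns, drops incomplete rows, and maps the class labels 2 and 4 to 0 and 1. The short column names, the in-memory sample, and the use of `?` as the missing-value marker are assumptions for illustration. The first two rows match the values that appear in the web service example later in this README; the last two rows are made up.

```python
import io
import pandas as pd

# Hypothetical short column names for the 11 fields listed above.
COLUMNS = [
    "id", "clump_thickness", "cell_size", "cell_shape", "adhesion",
    "epithelial_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]

# Tiny in-memory stand-in for the headerless data file.
SAMPLE = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1111111,8,4,5,1,2,?,7,3,1,4
1222222,8,10,10,8,7,10,9,7,1,4
"""

def load_data(source):
    # Treat "?" as a missing value (assumption), then drop incomplete rows.
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    df = df.dropna().drop(columns="id")
    # Logistic regression expects 0/1 targets: 2 (benign) -> 0, 4 (malignant) -> 1.
    df["class"] = df["class"].map({2: 0, 4: 1})
    return df.astype(int)

df = load_data(io.StringIO(SAMPLE))
print(df.shape)
print(df["class"].tolist())
```

To use the real data, pass the path of the downloaded data file to `load_data` instead of the `StringIO` sample.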

Libraries required

  1. numpy
    pip install numpy
  2. pandas
    pip install pandas
  3. random
    Part of the Python standard library, so no installation is needed
  4. seaborn
    pip install seaborn

Logistic regression algorithm

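  • Use the sigmoid activation function - $\sigma(z) = \frac{1}{1 + e^{-z}}$, which gives the prediction $\hat{y} = \sigma(X\theta^T)$
  • Remember the gradient descent formula for linear regression, $\theta := \theta - \alpha \frac{\partial\, \text{MSE}}{\partial \theta}$, where the mean squared error was used. We cannot use the mean squared error here, because combined with the sigmoid it is no longer convex in $\theta$, so replace it with some other error $E$
  • Gradient Descent - Logistic regression - $\theta := \theta - \alpha \frac{\partial E}{\partial \theta}$
  • Conditions for E:
    1. Convex or as convex as possible
    2. Should be a function of $\theta$
    3. Should be differentiable
  • So use, Entropy = $-\sum_i p_i \log p_i$
  • As the entropy cannot take both $\hat{y}$ and $y$, use the cross entropy instead, $CE = -\sum_i y_i \log \hat{y}_i$
  • So add 2 cross entropies, CE 1 = $-y \log \hat{y}$ and CE 2 = $-(1 - y) \log(1 - \hat{y})$. We get Binary Cross entropy (BCE) = $-y \log \hat{y} - (1 - y) \log(1 - \hat{y})$, averaged over the $m$ training samples
  • So now our formula becomes, $\theta := \theta - \alpha \frac{\partial\, \text{BCE}}{\partial \theta}$
  • Using simple chain rule we obtain, $\theta := \theta - \frac{\alpha}{m} (\hat{y} - y)^T X$
  • Now apply Gradient Descent with this formula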
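
One way to convince yourself of the chain-rule result is a numerical gradient check. The sketch below compares the analytic BCE gradient, (1/m) times X transposed times (prediction minus label), which is the expression `grad_descent` uses in the Code section, against central finite differences. The data is synthetic, theta is a plain 1-D vector rather than the 1 x n matrix used later, and all function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(X, y, theta):
    # Mean binary cross entropy for predictions sigmoid(X @ theta).
    pred = sigmoid(X @ theta)
    return float(np.mean(-y * np.log(pred) - (1 - y) * np.log(1 - pred)))

def bce_gradient(X, y, theta):
    # Chain-rule result: (1/m) * X^T (sigmoid(X theta) - y).
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

def numerical_gradient(X, y, theta, eps=1e-6):
    # Central finite differences, one parameter at a time.
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (bce(X, y, theta + step) - bce(X, y, theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
# Synthetic data (assumption): a bias column plus 3 random features, random 0/1 labels.
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(scale=0.1, size=4)

analytic = bce_gradient(X, y, theta)
numeric = numerical_gradient(X, y, theta)
print(np.max(np.abs(analytic - numeric)))
```

If the two gradients agree to within a small tolerance, the update rule is differentiating the same cost that `bce` computes.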

Code

  1. Data preprocessing
    Load the data and remove rows with empty values. As we are using logistic regression, replace the class labels 2 and 4 with 0 and 1.
  2. sns.pairplot(df)
    Create pairwise graphs for the features.
  3. Do Principal component analysis for simplified learning.
  4. full_data=np.matrix(full_data)
    x0=np.ones((full_data.shape[0],1))
    data=np.concatenate((x0,full_data),axis=1)
    print(data.shape)
    theta=np.zeros((1,data.shape[1]-1))
    print(theta.shape)
    print(theta)

    Convert the data to a matrix and prepend a column of ones (the bias term) to it. Also create a zero matrix as the initial theta.
  5. test_size=0.2
    X_train=data[:-int(test_size*len(full_data)),:-1]
    Y_train=data[:-int(test_size*len(full_data)),-1]
    X_test=data[-int(test_size*len(full_data)):,:-1]
    Y_test=data[-int(test_size*len(full_data)):,-1]

    Create the train-test split: the last 20% of the rows are held out for testing.
  6. def sigmoid(Z):
      return 1/(1+np.exp(-Z))

    def BCE(X,y,theta):
      pred=sigmoid(np.dot(X,theta.T))
      mcost=-np.array(y)*np.array(np.log(pred))-np.array((1-y))*np.array(np.log(1-pred))
      return mcost.mean()

    Define the sigmoid function described above and the BCE cost.
  7. def grad_descent(X,y,theta,alpha):
      h=sigmoid(X.dot(theta.T))
      loss=h-y
      dj=(loss.T).dot(X)
      theta -= (alpha/(len(X))*dj)
      return theta
    cost=BCE(X_train,Y_train,theta)
    print("cost before: ",cost)
    theta=grad_descent(X_train,Y_train,theta,alpha)
    cost=BCE(X_train,Y_train,theta)
    print("cost after: ",cost)

    Define the gradient descent step, along with the learning rate alpha and the number of epochs. Then test gradient descent with a single iteration.
  8. def logistic_reg(epoch,X,y,theta,alpha):
      for ep in range(epoch):
        #update theta
        theta=grad_descent(X,y,theta,alpha)
        #calculate new loss
        if ((ep+1)%1000 == 0):
          loss=BCE(X,y,theta)
          print("Cost function ",loss)
      return theta

    theta=logistic_reg(epoch,X_train,Y_train,theta,alpha)

    Define the logistic regression with gradient descent code.
  9. print(BCE(X_train,Y_train,theta))

    print(BCE(X_test,Y_test,theta))

    Finally, evaluate the cost on the training and test sets.
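
The BCE values above measure the cost but not how many samples are classified correctly. As a possible complement, the sketch below turns the sigmoid output into 0/1 labels with a 0.5 threshold and reports accuracy. The helpers `predict` and `accuracy`, the threshold, and the toy arrays are assumptions and not part of the original code.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def predict(X, theta, threshold=0.5):
    # Probability of the positive class (malignant = 1), then a hard 0/1 label.
    prob = np.asarray(sigmoid(np.dot(X, theta.T))).ravel()
    return (prob >= threshold).astype(int)

def accuracy(X, y, theta, threshold=0.5):
    # Fraction of rows whose predicted label equals the true label.
    y_true = np.asarray(y).ravel().astype(int)
    return float(np.mean(predict(X, theta, threshold) == y_true))

# Toy stand-ins (assumptions): a bias column plus one feature, and a hand-picked theta.
X_demo = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
Y_demo = np.array([[0], [0], [1], [0]])
theta_demo = np.array([[0.0, 1.5]])

print(predict(X_demo, theta_demo))
print(accuracy(X_demo, Y_demo, theta_demo))
```

With the variables from the steps above, `accuracy(X_test, Y_test, theta)` would give the share of held-out samples classified correctly.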

Now we are done with the code 😀

The Algorithm as a web service

Python 3+

import urllib.request
import urllib.error  # HTTPError is caught below
import json

data = {
        "Inputs": {
                "input1":
                [
                    {
                            '1': "4",   
                            '2': "7",   
                            '3': "3",   
                            '5': "5",   
                            '1000025': "1002945",   
                            '1 (2)': "4",   
                            '1 (3)': "5",   
                            '1 (4)': "10",   
                            '1 (5)': "2",   
                            '1 (6)': "1",   
                            '2 (2)': "2",   
                    }
                ],
        },
    "GlobalParameters":  {
    }
}

body = str.encode(json.dumps(data))

url = 'https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger'
api_key = 'abc123' # Replace this with the API key for the web service
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read().decode("utf8", 'ignore')))
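
The keys in the request body look like column names auto-generated from the first row of the headerless data file. This is an inference from the values, not something stated by the service. Under that assumption, the hypothetical helper `build_payload` below turns one dataset row, given in the column order of the dataset description, into the request body used above.

```python
import json

# Service column names in the dataset's column order (id, nine features, class).
# Assumption: they were generated from the first data row 1000025,5,1,1,1,2,1,3,1,1,2,
# with repeated values renamed to "1 (2)", "1 (3)", and so on.
SERVICE_COLUMNS = ["1000025", "5", "1", "1 (2)", "1 (3)", "2",
                   "1 (4)", "3", "1 (5)", "1 (6)", "2 (2)"]

def build_payload(row):
    # row: 11 values in dataset order; the sample request sends them as strings.
    if len(row) != len(SERVICE_COLUMNS):
        raise ValueError("expected %d values, got %d" % (len(SERVICE_COLUMNS), len(row)))
    record = {name: str(value) for name, value in zip(SERVICE_COLUMNS, row)}
    return {"Inputs": {"input1": [record]}, "GlobalParameters": {}}

payload = build_payload([1002945, 5, 4, 4, 5, 7, 10, 3, 2, 1, 2])
body = json.dumps(payload).encode("utf-8")
print(payload["Inputs"]["input1"][0])
```

The resulting `body` can be passed to `urllib.request.Request` in place of the hand-written dictionary.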

Python 2

import urllib2
import json

data = {
        "Inputs": {
                "input1":
                [
                    {
                            '1': "4",   
                            '2': "7",   
                            '3': "3",   
                            '5': "5",   
                            '1000025': "1002945",   
                            '1 (2)': "4",   
                            '1 (3)': "5",   
                            '1 (4)': "10",   
                            '1 (5)': "2",   
                            '1 (6)': "1",   
                            '2 (2)': "2",   
                    }
                ],
        },
    "GlobalParameters":  {
    }
}

body = str.encode(json.dumps(data))

url = 'https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger'
api_key = 'abc123' # Replace this with the API key for the web service
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib2.Request(url, body, headers)

try:
    response = urllib2.urlopen(req)

    result = response.read()
    print(result)
except urllib2.HTTPError, error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read())) 

R

library("RCurl")
library("rjson")

# Accept SSL certificates issued by public Certificate Authorities
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

h = basicTextGatherer()
hdr = basicHeaderGatherer()

req =  list(
    Inputs = list(
            "input1"= list(
                list(
                        '1' = "4",
                        '2' = "7",
                        '3' = "3",
                        '5' = "5",
                        '1000025' = "1002945",
                        '1 (2)' = "4",
                        '1 (3)' = "5",
                        '1 (4)' = "10",
                        '1 (5)' = "2",
                        '1 (6)' = "1",
                        '2 (2)' = "2"
                    )
            )
        ),
        GlobalParameters = setNames(fromJSON('{}'), character(0))
)

body = enc2utf8(toJSON(req))
api_key = "abc123" # Replace this with the API key for the web service
authz_hdr = paste('Bearer', api_key, sep=' ')

h$reset()
curlPerform(url = "https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger",
httpheader=c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
postfields=body,
writefunction = h$update,
headerfunction = hdr$update,
verbose = TRUE
)

headers = hdr$value()
httpStatus = headers["status"]
if (httpStatus >= 400)
{
print(paste("The request failed with status code:", httpStatus, sep=" "))

# Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
print(headers)
}

print("Result:")
result = h$value()
print(fromJSON(result))

C#

// This code requires the Nuget package Microsoft.AspNet.WebApi.Client to be installed.
// Instructions for doing this in Visual Studio:
// Tools -> Nuget Package Manager -> Package Manager Console
// Install-Package Microsoft.AspNet.WebApi.Client

using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

namespace CallRequestResponseService
{
    class Program
    {
        static void Main(string[] args)
        {
            InvokeRequestResponseService().Wait();
        }

        static async Task InvokeRequestResponseService()
        {
            using (var client = new HttpClient())
            {
                var scoreRequest = new
                {
                    Inputs = new Dictionary<string, List<Dictionary<string, string>>> () {
                        {
                            "input1",
                            new List<Dictionary<string, string>>(){new Dictionary<string, string>(){
                                            {
                                                "1", "4"
                                            },
                                            {
                                                "2", "7"
                                            },
                                            {
                                                "3", "3"
                                            },
                                            {
                                                "5", "5"
                                            },
                                            {
                                                "1000025", "1002945"
                                            },
                                            {
                                                "1 (2)", "4"
                                            },
                                            {
                                                "1 (3)", "5"
                                            },
                                            {
                                                "1 (4)", "10"
                                            },
                                            {
                                                "1 (5)", "2"
                                            },
                                            {
                                                "1 (6)", "1"
                                            },
                                            {
                                                "2 (2)", "2"
                                            },
                                }
                            }
                        },
                    },
                    GlobalParameters = new Dictionary<string, string>() {
                    }
                };

                const string apiKey = "abc123"; // Replace this with the API key for the web service
                client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue( "Bearer", apiKey);
                client.BaseAddress = new Uri("https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger");

                // WARNING: The 'await' statement below can result in a deadlock
                // if you are calling this code from the UI thread of an ASP.Net application.
                // One way to address this would be to call ConfigureAwait(false)
                // so that the execution does not attempt to resume on the original context.
                // For instance, replace code such as:
                //      result = await DoSomeTask()
                // with the following:
                //      result = await DoSomeTask().ConfigureAwait(false)

                HttpResponseMessage response = await client.PostAsJsonAsync("", scoreRequest);

                if (response.IsSuccessStatusCode)
                {
                    string result = await response.Content.ReadAsStringAsync();
                    Console.WriteLine("Result: {0}", result);
                }
                else
                {
                    Console.WriteLine(string.Format("The request failed with status code: {0}", response.StatusCode));

                    // Print the headers - they include the request ID and the timestamp,
                    // which are useful for debugging the failure
                    Console.WriteLine(response.Headers.ToString());

                    string responseContent = await response.Content.ReadAsStringAsync();
                    Console.WriteLine(responseContent);
                }
            }
        }
    }
}

More about the project

  1. My Medium article on the same topic - here
  2. My research paper on this - here
  3. Another must-read paper on the same topic - here

Other algorithms for the same project by me

  1. Multiclass Neural Networks
  2. Random Forest classifier
    Project

About me

Rishit Dagli
Website
LinkedIn