Build a Logistic Regression Algorithm From Scratch and Apply It to a Data Set

Predict whether a breast tumor is malignant or benign using the Breast Cancer Wisconsin data set.

Data set - Breast Cancer Wisconsin (Original) Data Set
This tutorial demonstrates logistic regression on the data set and uses gradient descent to minimize the binary cross-entropy (BCE).

Step 1: Prerequisites

  1. Knowledge of Python
  2. Familiarity with linear regression and gradient descent
  3. Installed libraries
    • numpy
    • pandas
    • seaborn
    • random
  4. A link to the full code on GitHub is included at the end

Step 2: About the Data Set

  1. Sample code number: id number
  2. Clump Thickness: 1 - 10
  3. Uniformity of Cell Size: 1 - 10
  4. Uniformity of Cell Shape: 1 - 10
  5. Marginal Adhesion: 1 - 10
  6. Single Epithelial Cell Size: 1 - 10
  7. Bare Nuclei: 1 - 10
  8. Bland Chromatin: 1 - 10
  9. Normal Nucleoli: 1 - 10
  10. Mitoses: 1 - 10
  11. Class: (2 for benign, 4 for malignant)

Step 3: Logistic Regression Algorithm


  • Use the sigmoid activation function: σ(z) = 1 / (1 + e^(-z)), where z = θ·x, so the prediction is ŷ = σ(θ·x)
  • Recall the gradient descent formula for linear regression, which minimizes the mean squared error (MSE). MSE is not a suitable loss for logistic regression (with the sigmoid it becomes non-convex), so we replace it with some error function E
  • Gradient descent for logistic regression: θ := θ - α ∂E/∂θ
  • Conditions for E:

    1. Convex, or as close to convex as possible
    2. Should be a function of the prediction ŷ and the label y
    3. Should be differentiable
  • So use entropy: H = -Σ p log(p)
  • As we can't use both ŷ and y in plain entropy, use cross-entropy: CE = -y log(ŷ)
  • So add 2 cross-entropies, CE1 = -y log(ŷ) and CE2 = -(1 - y) log(1 - ŷ). We get binary cross-entropy: BCE = -[y log(ŷ) + (1 - y) log(1 - ŷ)]
  • So now our update formula becomes θ := θ - α ∂BCE/∂θ
  • Using the simple chain rule we obtain ∂BCE/∂θ = (1/m) Xᵀ(ŷ - y), where m is the number of training examples
  • Now apply gradient descent with this formula
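The chain-rule result above can be sanity-checked numerically. The sketch below (the function names and the small random data are mine, not part of the tutorial) compares the analytic gradient (1/m) Xᵀ(σ(Xθ) - y) against a central finite-difference estimate of the BCE:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(X, y, theta):
    # Binary cross-entropy averaged over the m examples
    pred = sigmoid(X @ theta)
    return -(y * np.log(pred) + (1 - y) * np.log(1 - pred)).mean()

def bce_grad(X, y, theta):
    # Analytic gradient from the chain rule: (1/m) X^T (sigmoid(X theta) - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

# Tiny synthetic problem (values are arbitrary, just for the check)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=8).astype(float)
theta = rng.normal(size=3)

analytic = bce_grad(X, y, theta)

# Central finite differences: perturb each theta_j by +/- eps
eps = 1e-6
numeric = np.array([
    (bce(X, y, theta + eps * np.eye(3)[j]) - bce(X, y, theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

If the two gradients agree to within numerical precision, the derivation is correct.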

Step 4: Code

  1. Data preprocessing
    Load the data and remove rows with empty values. Since logistic regression expects labels of 0 and 1, replace the class values 2 and 4 with 0 and 1.
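As a sketch of this step (in the real tutorial the data is loaded from the UCI file with pd.read_csv; here a tiny hand-made frame stands in so the steps are runnable, and the column names are mine):

```python
import pandas as pd

# Stand-in for the loaded data set; the UCI file marks missing values with '?'
df = pd.DataFrame({
    "bare_nuclei": ["1", "?", "10"],   # '?' = missing value
    "class": [2, 4, 4],                # 2 = benign, 4 = malignant
})

df = df.replace("?", pd.NA).dropna()              # remove rows with empty values
df["bare_nuclei"] = df["bare_nuclei"].astype(int) # the column was read as strings
df["class"] = df["class"].replace({2: 0, 4: 1})   # relabel 2/4 as 0/1
print(df["class"].tolist())  # [0, 1]
```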

  2. sns.pairplot(df)
    Create pairwise plots of the features with seaborn to inspect their relationships.

  3. Apply principal component analysis (PCA) to reduce the feature dimensionality and simplify learning.

  4. full_data=np.matrix(full_data)
    x0=np.ones((full_data.shape[0],1))
    data=np.concatenate((x0,full_data),axis=1)
    print(data.shape)
    theta=np.zeros((1,data.shape[1]-1))
    print(theta.shape)
    print(theta)
    Convert the data to a matrix and prepend a column of ones (the bias term x0). Also create a zero matrix for the initial theta.

  5. test_size=0.2

    X_train=data[:-int(test_size*len(full_data)),:-1]
    Y_train=data[:-int(test_size*len(full_data)),-1]
    X_test=data[-int(test_size*len(full_data)):,:-1]
    Y_test=data[-int(test_size*len(full_data)):,-1]
    Create the train-test split: the last 20% of rows are held out as the test set (shuffle the data first if it is ordered).

  6. def sigmoid(Z):
         return 1/(1+np.exp(-Z))

     def BCE(X,y,theta):
         pred=sigmoid(np.dot(X,theta.T))
         mcost=-np.array(y)*np.array(np.log(pred))-np.array(1-y)*np.array(np.log(1-pred))
         return mcost.mean()
     Define the sigmoid function and the BCE as described above.

  7. def grad_descent(X,y,theta,alpha):
         h=sigmoid(X.dot(theta.T))
         loss=h-y
         dj=(loss.T).dot(X)
         theta -= (alpha/len(X))*dj
         return theta

     alpha=0.01   # learning rate (an example value; tune as needed)
     cost=BCE(X_train,Y_train,theta)
     print("cost before: ",cost)
     theta=grad_descent(X_train,Y_train,theta,alpha)
     cost=BCE(X_train,Y_train,theta)
     print("cost after: ",cost)
     Define the gradient descent update and choose a learning rate alpha. Then test gradient descent with one iteration: the cost should decrease.

  8. def logistic_reg(epoch,X,y,theta,alpha):
         for ep in range(epoch):
             # update theta
             theta=grad_descent(X,y,theta,alpha)
             # calculate and report the new loss every 1000 epochs
             if ((ep+1)%1000 == 0):
                 loss=BCE(X,y,theta)
                 print("Cost function ",loss)
         return theta

     epoch=10000   # number of epochs (an example value; tune as needed)
     theta=logistic_reg(epoch,X_train,Y_train,theta,alpha)
     Define the logistic regression training loop: run gradient descent for the chosen number of epochs, printing the loss every 1000 epochs.

  9. print(BCE(X_train,Y_train,theta))

    print(BCE(X_test,Y_test,theta))
    Finally, evaluate the BCE on both the training and test sets.
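The BCE tells you the loss, but for classification it is also useful to check accuracy by thresholding the sigmoid output at 0.5. A minimal sketch (the helper name `accuracy` and the demo data below are mine, not part of the tutorial; apply the same helper to X_test and Y_test):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def accuracy(X, y, theta):
    # Threshold the predicted probabilities at 0.5 to get class labels
    probs = sigmoid(np.dot(X, theta.T))
    preds = (probs >= 0.5).astype(int)
    return (preds == np.array(y).reshape(preds.shape)).mean()

# Tiny hand-made example: a theta that separates negative from positive features
X_demo = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])  # bias + one feature
y_demo = np.array([[0], [0], [1], [1]])
theta_demo = np.array([[0.0, 3.0]])
print(accuracy(X_demo, y_demo, theta_demo))  # 1.0
```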

Now we are done with the code.

Step 5: Additional Reading

1. Multiclass Neural Networks
2. Random Forest classifier


GitHub

Rishit Dagli

Website
LinkedIn
