# Build Logistic Regression Algorithm From Scratch and Apply It on Data set


## Introduction: Build Logistic Regression Algorithm From Scratch and Apply It on Data set

Make predictions for breast cancer, malignant or benign, using the Breast Cancer data set.

Data set: Breast Cancer Wisconsin (Original) Data Set.
This tutorial and its code demonstrate logistic regression on the data set, using gradient descent to minimize the BCE (binary cross-entropy) loss.

## Step 1: Prerequisites

1. Knowledge of Python
2. Familiarity with linear regression and gradient descent
3. Installed libraries:
• numpy
• pandas
• seaborn
• random (ships with Python's standard library)
4. I have also included the GitHub link to the code at the end

## Step 2: About the Data Set

1. Sample code number: id number
2. Clump Thickness: 1 - 10
3. Uniformity of Cell Size: 1 - 10
4. Uniformity of Cell Shape: 1 - 10
5. Marginal Adhesion: 1 - 10
6. Single Epithelial Cell Size: 1 - 10
7. Bare Nuclei: 1 - 10
8. Bland Chromatin: 1 - 10
9. Normal Nucleoli: 1 - 10
10. Mitoses: 1 - 10
11. Class: (2 for benign, 4 for malignant)

## Step 3: Logistic Regression Algorithm

• Use the sigmoid activation function: $\hat{y} = \sigma(z) = \frac{1}{1+e^{-z}}$, where $z = \theta \cdot x$
• Remember the gradient descent formula for linear regression, $\theta := \theta - \alpha \frac{\partial E}{\partial \theta}$, where the error $E$ was the mean squared error. We cannot use mean squared error here (with the sigmoid it gives a non-convex loss), so we replace it with some other error
• Gradient Descent for logistic regression keeps the same update rule, but with a new $E$
• Conditions for E:

1. Convex, or as convex as possible
2. Should be a function of $\hat{y}$ and $y$
3. Should be differentiable

• So use entropy: $H = -\sum p \log(p)$
• As we can't use both $\hat{y}$ and $y$ in a plain entropy, use the cross-entropy $CE = -y \log(\hat{y})$
• So add 2 cross-entropies, $CE_1 = -y \log(\hat{y})$ and $CE_2 = -(1-y) \log(1-\hat{y})$. We get the binary cross-entropy $BCE = -\left[y \log(\hat{y}) + (1-y) \log(1-\hat{y})\right]$
• So now our formula becomes $\theta := \theta - \alpha \frac{\partial (BCE)}{\partial \theta}$
• Using the simple chain rule we obtain $\frac{\partial (BCE)}{\partial \theta} = \frac{1}{m} X^T (\hat{y} - y)$
• Now apply gradient descent with this formula
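The chain-rule gradient can be sanity-checked numerically. This sketch (all data randomly generated for the check, not from the data set) compares the analytic gradient X.T @ (sigmoid(X @ theta) - y) / m against a finite-difference estimate of the BCE:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(X, y, theta):
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=8).astype(float)
theta = rng.normal(size=3)

# Analytic gradient from the chain rule: X^T (y_hat - y) / m
analytic = X.T @ (sigmoid(X @ theta) - y) / len(X)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (bce(X, y, theta + eps * np.eye(3)[j]) - bce(X, y, theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```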

## Step 4: Code

1. Data preprocessing
Load the data and remove rows with empty values. As we are using logistic regression, replace the class labels 2 and 4 with 0 and 1.
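The preprocessing described above might look like the sketch below. The column names are assumptions, and three real UCI rows are inlined so the snippet runs standalone; the tutorial itself would read the full breast-cancer-wisconsin.data file instead.

```python
import io
import numpy as np
import pandas as pd

# Sample rows from the UCI file, inlined for a self-contained example.
# Missing values in this data set appear as "?" in the Bare Nuclei column.
raw = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1002945,5,4,4,5,7,10,3,2,1,2\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
)
cols = ["id", "clump_thickness", "cell_size", "cell_shape", "adhesion",
        "epithelial_size", "bare_nuclei", "chromatin", "nucleoli",
        "mitoses", "class"]
df = pd.read_csv(raw, names=cols)

# Turn "?" into NaN, drop those rows, and restore an integer dtype.
df = df.replace("?", np.nan).dropna()
df["bare_nuclei"] = df["bare_nuclei"].astype(int)

# Map the class labels 2/4 to 0/1 for logistic regression.
df["class"] = df["class"].map({2: 0, 4: 1})
print(df["class"].tolist())  # [0, 0] after the "?" row is dropped
```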

2. sns.pairplot(df)
Create pairwise plots of the features.

3. Perform principal component analysis (PCA) to reduce the features for simpler learning.
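The tutorial does not show the PCA code itself; below is one minimal NumPy sketch (centering plus SVD) of what this step can look like. The `pca` function name and the random stand-in data are illustrative, not from the original code.

```python
import numpy as np

def pca(X, k):
    """Project X onto its first k principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # scores in the reduced space

# Stand-in data: 100 samples, 9 features (like the data set's measurements).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 9))
Z = pca(X, 4)
print(Z.shape)  # (100, 4)
```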

4. full_data=np.matrix(full_data)
x0=np.ones((full_data.shape[0],1))
data=np.concatenate((x0,full_data),axis=1)
print(data.shape)
theta=np.zeros((1,data.shape[1]-1))
print(theta.shape)
print(theta)
Convert the data to a matrix and concatenate a column of ones (the bias term) with the complete data matrix. Also make a zero matrix for the initial theta.

5. test_size=0.2

X_train=data[:-int(test_size*len(full_data)),:-1]
Y_train=data[:-int(test_size*len(full_data)),-1]
X_test=data[-int(test_size*len(full_data)):,:-1]
Y_test=data[-int(test_size*len(full_data)):,-1]
Create the train-test split.
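The slicing above takes the last 20% of rows as the test set; if the data file happens to be ordered, shuffling the rows first gives a fairer split. A sketch of that variant, with random stand-in data in place of the real matrix:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(100, 11))      # stand-in for the real data matrix

# Shuffle the rows before slicing so the test set is not just the file's tail.
data = data[rng.permutation(len(data))]

test_size = 0.2
n_test = int(test_size * len(data))
X_train, Y_train = data[:-n_test, :-1], data[:-n_test, -1]
X_test, Y_test = data[-n_test:, :-1], data[-n_test:, -1]
print(X_train.shape, X_test.shape)  # (80, 10) (20, 10)
```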

6. def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def BCE(X,y,theta):
    pred=sigmoid(np.dot(X,theta.T))
    mcost=-np.array(y)*np.array(np.log(pred))-np.array(1-y)*np.array(np.log(1-pred))
    return mcost.mean()
Define the code for the sigmoid function as mentioned, and the BCE.
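A quick way to check these two functions: with theta all zeros every prediction is 0.5, so the BCE must equal ln 2 ≈ 0.693 regardless of the labels. The tiny data below is made up just for this check.

```python
import numpy as np

def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def BCE(X, y, theta):
    pred = sigmoid(np.dot(X, theta.T))
    mcost = -np.array(y)*np.log(pred) - np.array(1-y)*np.log(1-pred)
    return mcost.mean()

X = np.array([[1.0, 2.0, 3.0], [1.0, 5.0, 1.0]])
y = np.array([[0.0], [1.0]])
theta = np.zeros((1, 3))          # all-zero theta -> every prediction is 0.5
print(round(BCE(X, y, theta), 4))  # 0.6931
```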

7. def grad_descent(X,y,theta,alpha):
    h=sigmoid(X.dot(theta.T))
    loss=h-y
    dj=(loss.T).dot(X)
    theta -= (alpha/len(X))*dj
    return theta

cost=BCE(X_train,Y_train,theta)
print("cost before: ",cost)
theta=grad_descent(X_train,Y_train,theta,alpha)
cost=BCE(X_train,Y_train,theta)
print("cost after: ",cost)
Define the gradient descent update (named `grad_descent` here) with the learning rate alpha, and also test the gradient descent by 1 iteration: the cost should drop after the update.
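The before/after check can be reproduced on a toy example: one gradient descent step should lower the BCE. The data, labels, and learning rate below are illustrative, not from the data set.

```python
import numpy as np

def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def BCE(X, y, theta):
    pred = sigmoid(np.dot(X, theta.T))
    return (-np.array(y)*np.log(pred) - np.array(1-y)*np.log(1-pred)).mean()

def grad_descent(X, y, theta, alpha):
    h = sigmoid(X.dot(theta.T))
    loss = h - y
    dj = (loss.T).dot(X)           # gradient: (y_hat - y)^T X
    theta -= (alpha/len(X))*dj
    return theta

# Toy data: bias column plus one feature, labels roughly separable.
X = np.array([[1.0, -2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([[0.0], [1.0], [1.0]])
theta = np.zeros((1, 2))

before = BCE(X, y, theta)
theta = grad_descent(X, y, theta, alpha=0.1)
after = BCE(X, y, theta)
print(after < before)  # True
```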

8. def logistic_reg(epoch,X,y,theta,alpha):
    for ep in range(epoch):
        # update theta with one gradient descent step
        theta=grad_descent(X,y,theta,alpha)
        # calculate and report the new loss every 1000 epochs
        if ((ep+1)%1000 == 0):
            loss=BCE(X,y,theta)
            print("Cost function ",loss)
    return theta

theta=logistic_reg(epoch,X_train,Y_train,theta,alpha)
Define the logistic regression training loop, which repeatedly applies the gradient descent update from step 7 (named `grad_descent` here).

9. print(BCE(X_train,Y_train,theta))

print(BCE(X_test,Y_test,theta))
Finally, test the code by printing the BCE on the train and test sets.

Now we are done with the code.
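Beyond printing the BCE, you can also measure accuracy by thresholding the sigmoid output at 0.5. A sketch, with illustrative weights and data rather than the trained theta:

```python
import numpy as np

def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def predict(X, theta):
    # Class 1 when the predicted probability is at least 0.5
    return (sigmoid(np.dot(X, theta.T)) >= 0.5).astype(int)

# Illustrative test rows (bias column + one feature) and weights.
X_test = np.array([[1.0, -1.0], [1.0, 2.0], [1.0, 3.0], [1.0, -2.0]])
Y_test = np.array([[0], [1], [1], [0]])
theta = np.array([[0.0, 1.5]])

acc = (predict(X_test, theta) == Y_test).mean()
print(acc)  # 1.0
```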

Rishit Dagli
