Extension: Multiple Logistic Regression

2.11. Extension: Multiple Logistic Regression

In multiple linear regression we have multiple input variables. These linear regression models are of the form

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_N x_N\]

Just as with multiple linear regression, logistic regression can take multiple input variables. These logistic models, whose output \(y\) is the predicted probability of class 1, are of the form

\[y = \cfrac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_N x_N)}}\]
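To make the formula concrete, here is a minimal sketch that evaluates it for a single sample with two input variables. The helper name `logistic` and the coefficient values are made up for illustration; they are not fitted to any dataset.

```python
import math


def logistic(beta_0, betas, xs):
    """Evaluate 1 / (1 + e^-(beta_0 + beta_1*x_1 + ... + beta_N*x_N))."""
    z = beta_0 + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration
beta_0 = -10.0
betas = [0.5, 0.1]

# One sample with x_1 = 6 and x_2 = 77
y = logistic(beta_0, betas, [6, 77])
print(y)  # a value between 0 and 1
```

Whatever the coefficients are, the output always lies strictly between 0 and 1, which is why it can be read as a probability.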

Let’s consider the following dataset:

| Time Spent Studying (hours) | Assignment Mark (%) | Exam (Fail 0/Pass 1) |
|---|---|---|
| 4.5 | 73 | 1 |
| 8 | 89 | 1 |
| 1.5 | 65 | 0 |
| 3.5 | 66 | 1 |
| 5.5 | 67 | 1 |
| 3 | 65 | 0 |
| 6.5 | 92 | 1 |

In this case we have 2 pieces of information about each student:

  • The amount of time the student spent studying

  • The student's assignment mark

Our goal is to predict whether the student will pass or fail the exam.

[Figure: study_mark_classification.png]

As with multiple linear regression, the main change in the code is that the input variable x is now a 2D array. It has \(n\) rows, one for each sample in the dataset, and 2 columns, one for each of our 2 input variables.
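As a quick illustration of that shape, the sketch below types in the seven rows from the table above by hand (rather than reading them from a file) and prints the array shapes.

```python
import numpy as np

# Rows from the table above: [hours studied, assignment mark]
x = np.array([
    [4.5, 73],
    [8.0, 89],
    [1.5, 65],
    [3.5, 66],
    [5.5, 67],
    [3.0, 65],
    [6.5, 92],
])

# Exam result for each row (0 = fail, 1 = pass)
y = np.array([1, 1, 0, 1, 1, 0, 1])

print(x.shape)  # (7, 2): 7 samples, 2 input variables
print(y.shape)  # (7,): one class label per sample
```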

Below is a complete example, including a prediction for a test student who studied 6 hours and has an assignment mark of 77.

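```python
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np

# Load the data: x has one row per student and two columns, y holds the classes
data = pd.read_csv("pass_fail_assignment.csv")
x = data[["Time Spent Studying (hours)", "Assignment Mark (%)"]].to_numpy()
y = data["Exam Result"].to_numpy()

# Fit the logistic regression model to both input variables
logistic_reg = LogisticRegression()
logistic_reg.fit(x, y)

# Test student: studied 6 hours, assignment mark of 77
x_test = np.array([[6, 77]])
print("predicted result: {}".format(logistic_reg.predict(x_test)))
```

Output

```
predicted result: [1]
```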
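Besides the predicted class, a fitted scikit-learn `LogisticRegression` also exposes the coefficients (`intercept_` for \(\beta_0\), `coef_` for \(\beta_1, \beta_2\)) and the predicted probabilities (`predict_proba`). The sketch below is only illustrative: it assumes the training data is exactly the seven rows from the table, typed in by hand, so the numbers may differ from those obtained with the CSV file. Setting `max_iter=1000` is a choice made here to give the solver more room to converge on unscaled inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumption: the training data is just the seven rows from the table
x = np.array([
    [4.5, 73], [8.0, 89], [1.5, 65], [3.5, 66],
    [5.5, 67], [3.0, 65], [6.5, 92],
])
y = np.array([1, 1, 0, 1, 1, 0, 1])

# max_iter raised from the default as a precaution (our choice)
model = LogisticRegression(max_iter=1000)
model.fit(x, y)

# Fitted coefficients of the logistic model
print("beta_0:", model.intercept_)
print("beta_1, beta_2:", model.coef_)

# Probabilities for the test student, ordered as [P(fail), P(pass)]
x_test = np.array([[6, 77]])
proba = model.predict_proba(x_test)
print("P(fail), P(pass):", proba)
```

The two probabilities in each row sum to 1, and `.predict` returns the class with the larger one.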
Code Challenge: Extension: Multiple Logistic Regression

In addition to the hours of sunshine, we also have humidity information. Let’s add this to our logistic regression model, then evaluate the new model on the test data shown below by calculating its accuracy.

| Sunshine | Humidity | RainTomorrow |
|---|---|---|
| 2.2 | 56 | 0 |
| 5.0 | 70 | 0 |
| 7.3 | 54 | 0 |
| 3.2 | 77 | 1 |
| 1.7 | 82 | 1 |
| 4.1 | 39 | 0 |
| 5.5 | 47 | 1 |

Instructions

  1. Copy and paste your code from Predicting With A Logistic Regression Model using rain.csv.

  2. Update the training data

    • Modify x so that it contains both the 'Sunshine' and 'Humidity' columns.

    • Don’t modify y.

  3. Fit the LogisticRegression model to the augmented training data

  4. Using the table above, add the Humidity data as another column to your test data

  5. Use .predict to predict whether it will rain the following day on the test data

  6. Create an array storing the actual classes of each test sample (you can copy this from Extension: Further Classification Metrics).

  7. Calculate the accuracy on the test data

  8. Print the accuracy
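For step 4, one possible way to build the two-column test array is sketched below. It uses `np.column_stack`; the variable names `sunshine_test` and `humidity_test` are assumptions made for this illustration, and your own code may organise the test data differently.

```python
import numpy as np

# Test values copied from the table above
sunshine_test = np.array([2.2, 5.0, 7.3, 3.2, 1.7, 4.1, 5.5])
humidity_test = np.array([56, 70, 54, 77, 82, 39, 47])

# Place the two 1D arrays side by side as columns of a 2D array
x_test = np.column_stack((sunshine_test, humidity_test))
print(x_test.shape)  # (7, 2)
```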

Your output should look like this:

Accuracy: X.XX

Calculating Accuracy

To calculate the accuracy:

  1. Create a variable to hold the total number of correct predictions

  2. Loop over each prediction

    • If the prediction matches the real value then increase the number of correct predictions by 1

  3. Divide the number of correct predictions by the length of the test set
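The steps above can be sketched as follows. The actual classes come from the RainTomorrow column of the test table, but the predictions here are made up purely to show the calculation; they are not the output of any fitted model.

```python
# Actual classes from the RainTomorrow column of the test table
y_test = [0, 0, 0, 1, 1, 0, 1]

# Hypothetical predictions, made up for illustration (not real model output)
y_pred = [0, 0, 0, 1, 1, 0, 0]

# Count how many predictions match the actual class
correct = 0
for predicted, actual in zip(y_pred, y_test):
    if predicted == actual:
        correct += 1

# Accuracy is the fraction of test samples predicted correctly
accuracy = correct / len(y_test)
print("Accuracy: {:.2f}".format(accuracy))  # Accuracy: 0.86
```

With these made-up predictions, 6 of the 7 samples match, giving an accuracy of about 0.86.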
