Lung Cancer Detection using Convolutional Neural Network on the Cainvas Platform

Sep 23, 2021

Introduction

The World Health Organization (WHO) states that cancer is a leading cause of death worldwide, accounting for nearly 10 million deaths in 2020 alone. Of all cancer types, lung cancer was the most common cause of cancer death in 2020, accounting for about 1.80 million deaths.

Lung cancer occurs when cells in the lungs divide uncontrollably instead of dying off normally. The resulting tumors can reduce a person’s ability to breathe and can spread to other parts of the body.

In this article, we focus on classifying lung CT scans as Benign, Malignant or Normal. We implement this use case by building a Convolutional Neural Network on the Cainvas platform.

The dataset

The dataset used in this article can be fetched from here.

The IQ-OTH/NCCD lung cancer dataset contains 1190 images representing CT scan slices from 110 cases, grouped into three classes: normal, benign and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign and 55 as normal.

In this article, we will use TensorFlow, an open-source library that provides tools and resources for building machine learning models, and Keras, a high-level interface to TensorFlow for developing deep learning models, to create a convolutional neural network that predicts the class of each case from its lung CT scan images.

The entire code will be written on the Cainvas platform’s Notebook Server, which offers better performance and makes it easier to later scale the model for use on edge devices.

Setting Up the Platform

You can create an account on the Cainvas website here.

After creating an account, log in to the platform and go to the Dashboard section to open the Notebook Server.

Importing the necessary libraries

We will use common libraries such as NumPy and Matplotlib, and we will use OpenCV (cv2) together with Matplotlib to read the images and display them in the notebook.

The other imports are TensorFlow and Keras, which we use to preprocess the data and to create and train the convolutional neural network.

Loading the dataset

The Cainvas platform lets us upload datasets to the platform. Uploaded datasets can then be loaded into the notebook and used flexibly to build the model without any hassle.

To upload your dataset, head to the Pailette section, which supports uploading files, images, videos and even sensor data.

In this article, we upload the dataset as a zip file. After the upload, the file's URL can be used in the notebook to fetch it: click the Uploads section to view the uploaded files, then click the Copy URL button to copy the file's URL.

We can pass this URL to the !wget command to download the file into our notebook, and then extract it with !unzip -qo filename.zip (-q runs unzip in quiet mode and -o overwrites existing files without prompting).
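If you would rather do this step in plain Python than with the `!` shell escapes, the standard library can fetch and extract the archive. This is a sketch: `fetch_and_unzip` is a hypothetical helper, and the URL in the usage comment is a placeholder for the one copied from the Uploads section.

```python
import urllib.request
import zipfile


def fetch_and_unzip(url: str, zip_path: str = "dataset.zip", extract_to: str = ".") -> list:
    """Download a zip archive (like !wget) and extract it, overwriting
    existing files (like unzip -o). Returns the archive's member names."""
    # Save the remote file to zip_path
    urllib.request.urlretrieve(url, zip_path)
    # Extract all members; extractall overwrites files that already exist
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_to)
        return zf.namelist()


# Hypothetical usage: paste the URL copied from the Cainvas Uploads section
# names = fetch_and_unzip("https://<your-uploaded-file-url>")
```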

We can then open one of the images to check that the dataset has been loaded successfully.

Preparing the data

The training data is split across three folders: Benign, Malignant and Normal. We will use the ImageDataGenerator offered by Keras to prepare the data and infer each image's label from the folder it sits in. The generator also lets us split the training data into training and validation subsets.
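The generator setup might be sketched as follows. The helper name, the `Dataset` folder, the 20% validation split and the batch size of 16 are assumptions (the batch size matches the 16 label rows shown below, and the 64×64 grayscale target size is inferred from the model summary later in the article). Note that `ImageDataGenerator` is deprecated in recent TensorFlow versions in favor of `tf.keras.utils.image_dataset_from_directory`.

```python
import tensorflow as tf


def make_generators(data_dir: str, img_size: int = 64, batch_size: int = 16,
                    val_split: float = 0.2):
    """Build training and validation generators from a folder-per-class layout.

    Labels are inferred from the sub-folder names and returned as one-hot
    vectors because class_mode="categorical".
    """
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255,            # scale pixel values to [0, 1]
        validation_split=val_split,   # hold out a fraction of each class
    )
    common = dict(
        target_size=(img_size, img_size),
        color_mode="grayscale",       # single channel, matching a 1-channel model input
        class_mode="categorical",     # one-hot labels
        batch_size=batch_size,
    )
    train_gen = datagen.flow_from_directory(data_dir, subset="training", shuffle=True, **common)
    val_gen = datagen.flow_from_directory(data_dir, subset="validation", shuffle=False, **common)
    return train_gen, val_gen


# Hypothetical usage ("Dataset" is a placeholder folder name):
# train_gen, val_gen = make_generators("Dataset")
# images, labels = next(train_gen)   # labels has shape (16, 3)
```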

We can now check the labels inferred from the folder structure of our training data.

array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.]], dtype=float32)

We receive one-hot encoded vectors because the labels are categorical. The position of the 1 in each vector indicates the class of the corresponding CT scan (Benign, Malignant or Normal).
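To make the mapping concrete, the sketch below decodes the first rows of the batch above with `np.argmax`. It assumes the index order Benign=0, Malignant=1, Normal=2, which is what `flow_from_directory` produces for those folder names because it sorts class folders alphanumerically.

```python
import numpy as np

# Assumed class order (alphanumeric order of the folder names)
CLASS_NAMES = ["Benign", "Malignant", "Normal"]


def decode_one_hot(labels: np.ndarray) -> list:
    """Map each one-hot row to its class name via the index of the 1."""
    return [CLASS_NAMES[i] for i in np.argmax(labels, axis=1)]


# First four rows of the batch shown above
batch = np.array([[0., 1., 0.],
                  [0., 1., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.]], dtype=np.float32)
print(decode_one_hot(batch))  # ['Malignant', 'Malignant', 'Malignant', 'Normal']
```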

Creating the Model

As mentioned, we will create a Convolutional Neural Network to predict the correct class from the CT scan images. We use 3 Conv2D layers, each followed by a MaxPool2D layer, to extract features from the images, then a Flatten layer and three fully connected (Dense) layers with 32, 64 and 32 units. The activation function used is ReLU. The output layer has three neurons, one for each class (Benign, Malignant, Normal), with a Softmax activation function.
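The architecture could be written in Keras roughly as below. This is a reconstruction from the model summary rather than the author's code: the 64×64×1 input, `pool_size=3` on the second pooling layer and `padding="same"` on the third convolution are inferred from the reported output shapes (29 → 9 and 9 → 9), and together they reproduce the reported 57,987 parameters.

```python
import tensorflow as tf


def build_model(img_size: int = 64, num_classes: int = 3) -> tf.keras.Model:
    """CNN whose output shapes and parameter counts match the summary.

    pool_size=3 (second pooling) and padding="same" (third conv) are
    assumptions inferred from the summary's output shapes.
    """
    L = tf.keras.layers
    return tf.keras.Sequential([
        tf.keras.Input(shape=(img_size, img_size, 1)),        # 64x64 grayscale slice
        L.Conv2D(32, 3, activation="relu"),                   # -> 62x62x32
        L.MaxPool2D(2),                                       # -> 31x31x32
        L.Conv2D(64, 3, activation="relu"),                   # -> 29x29x64
        L.MaxPool2D(3),                                       # -> 9x9x64
        L.Conv2D(32, 3, padding="same", activation="relu"),   # -> 9x9x32
        L.MaxPool2D(2),                                       # -> 4x4x32
        L.Flatten(),                                          # -> 512
        L.Dense(32, activation="relu"),
        L.Dense(64, activation="relu"),
        L.Dense(32, activation="relu"),
        L.Dense(num_classes, activation="softmax"),           # Benign / Malignant / Normal
    ])
```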

The summary of the model created above is as follows:

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_32 (Conv2D)           (None, 62, 62, 32)        320       
_________________________________________________________________
max_pooling2d_32 (MaxPooling (None, 31, 31, 32)        0         
_________________________________________________________________
conv2d_33 (Conv2D)           (None, 29, 29, 64)        18496     
_________________________________________________________________
max_pooling2d_33 (MaxPooling (None, 9, 9, 64)          0         
_________________________________________________________________
conv2d_34 (Conv2D)           (None, 9, 9, 32)          18464     
_________________________________________________________________
max_pooling2d_34 (MaxPooling (None, 4, 4, 32)          0         
_________________________________________________________________
flatten_7 (Flatten)          (None, 512)               0         
_________________________________________________________________
dense_25 (Dense)             (None, 32)                16416     
_________________________________________________________________
dense_26 (Dense)             (None, 64)                2112      
_________________________________________________________________
dense_27 (Dense)             (None, 32)                2080      
_________________________________________________________________
dense_28 (Dense)             (None, 3)                 99        
=================================================================
Total params: 57,987
Trainable params: 57,987
Non-trainable params: 0
_________________________________________________________________

We will use Early Stopping so that training stops if the monitored metric stops improving for a set number of epochs. This makes the training process more efficient and helps avoid overfitting.
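A possible callback configuration is sketched below. The article does not say which quantity is monitored, so `val_loss`, `patience=5` and `restore_best_weights=True` are assumptions.

```python
import tensorflow as tf


def make_early_stopping(patience: int = 5) -> tf.keras.callbacks.EarlyStopping:
    """Stop training once validation loss hasn't improved for `patience` epochs."""
    return tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",          # assumed metric; the article doesn't name it
        patience=patience,           # epochs to wait without improvement
        restore_best_weights=True,   # roll back to the best epoch's weights
    )
```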

Compiling and Training the Model

We will compile the model with Adam as the optimizer and categorical cross-entropy as the loss function, and train it for up to 50 epochs with the early-stopping callback. The accuracy, loss, val_accuracy and val_loss of each epoch are stored in the training history so that we can plot them later.
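Compiling and training might look like the following sketch. `compile_and_train` is a hypothetical helper; it assumes the training and validation data are Keras-compatible inputs (for example, the ImageDataGenerator outputs described earlier) and that any callbacks, such as early stopping, are passed in.

```python
import tensorflow as tf


def compile_and_train(model, train_data, val_data, callbacks=(), epochs=50):
    """Compile with Adam + categorical cross-entropy, then train.

    Returns history.history: a dict of per-epoch lists under the keys
    'loss', 'accuracy', 'val_loss' and 'val_accuracy'.
    """
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    history = model.fit(
        train_data,
        validation_data=val_data,
        epochs=epochs,                    # upper bound; early stopping may end sooner
        callbacks=list(callbacks),
    )
    return history.history
```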

Accuracy & Loss

We will now plot the model performance at each epoch during the training phase.
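One way to draw these curves with Matplotlib is sketched below; `plot_history` is a hypothetical helper that takes the `history.history` dictionary returned by Keras.

```python
import matplotlib.pyplot as plt


def plot_history(hist: dict):
    """Plot training vs. validation accuracy and loss for each epoch."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))
    epochs = range(1, len(hist["loss"]) + 1)

    ax_acc.plot(epochs, hist["accuracy"], label="train")
    ax_acc.plot(epochs, hist["val_accuracy"], label="validation")
    ax_acc.set(title="Accuracy", xlabel="Epoch", ylabel="Accuracy")
    ax_acc.legend()

    ax_loss.plot(epochs, hist["loss"], label="train")
    ax_loss.plot(epochs, hist["val_loss"], label="validation")
    ax_loss.set(title="Loss", xlabel="Epoch", ylabel="Loss")
    ax_loss.legend()

    fig.tight_layout()
    return fig  # call plt.show() in the notebook to display it
```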

Testing the Model

We will test the model by predicting the first 5 images of the validation split held out from the training data.

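We will first get the model's predictions on this unseen data. These are softmax probabilities for the three classes rather than strict one-hot vectors; the class with the highest probability is the prediction.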

We will then print the actual and predicted classes side by side, with each index mapped to its class name.
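The dictionaries printed below could be produced with a helper like this one. `label_dict` is hypothetical, and the class order assumes the alphanumeric folder ordering used by `flow_from_directory`. Taking `argmax` works for both the one-hot labels and the softmax outputs.

```python
import numpy as np

CLASS_NAMES = ["Benign", "Malignant", "Normal"]  # assumed alphanumeric folder order


def label_dict(rows: np.ndarray) -> dict:
    """Turn softmax outputs or one-hot labels into {sample_index: class_name}."""
    return {i: CLASS_NAMES[k] for i, k in enumerate(np.argmax(rows, axis=1))}


# Hypothetical usage: `model` is the trained network and `images`, `labels`
# a batch from the validation split (names are placeholders).
# probs = model.predict(images[:5])   # shape (5, 3), each row sums to 1
# print("ACTUAL:", label_dict(labels[:5]))
# print("PREDICTIONS:", label_dict(probs))
```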

ACTUAL: {0: 'Malignant', 1: 'Benign', 2: 'Malignant', 3: 'Normal', 4: 'Normal'}
PREDICTIONS: {0: 'Malignant', 1: 'Benign', 2: 'Malignant', 3: 'Normal', 4: 'Normal'}

We will now visualize the predictions for better insight.

Visualized predictions of the first 5 images with actual and predicted classes
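A figure like this can be drawn with a short Matplotlib helper. `show_predictions` is hypothetical; it takes the images plus lists of actual and predicted class names (such as those printed above) and colors each title green for a match and red otherwise.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_predictions(images: np.ndarray, actual: list, predicted: list):
    """Show each image with its actual (A) and predicted (P) class as the title."""
    n = len(images)
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    for ax, img, a, p in zip(axes[0], images, actual, predicted):
        ax.imshow(np.squeeze(img), cmap="gray")  # drop the channel axis for display
        ax.set_title(f"A: {a}\nP: {p}", color="green" if a == p else "red")
        ax.axis("off")
    fig.tight_layout()
    return fig
```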

Conclusion

In this article, we saw how to predict the class of a lung CT scan, and thereby detect the presence of cancer, using a convolutional neural network created on the Cainvas platform. We observed the capabilities of artificial intelligence through a simple use case showing how it can be employed to automate healthcare systems and improve patient outcomes.

The Cainvas platform provides a one-stop solution for creating deep learning models, which can also be compiled into edge-device-friendly models for use in your IoT projects. The platform offers various other tools and resources to guide your next deep learning IoT project.