Lung cancer classification using convolutional neural network and DenseNet

ABSTRACT


INTRODUCTION
Cancer is a condition characterized by uncontrolled cell proliferation, and it can damage a number of body organs. According to [1], about one million new cancer cases are reported each year. Cancer can develop in the lung, breast, colon, prostate, and many other parts of the body. Its causes vary and include genetic factors, exposure to carcinogens, unhealthy lifestyles, and environmental factors [2]. In 2019, lung cancer was a deadly burden in Poland: about 20% of the country's population is affected by lung cancer, with 9.9% of women and 16.1% of men diagnosed [3]. Lung cancer is one of the most common diseases and a leading cause of death [4]. It begins in the bronchial mucosa or the lung glands. As the cancer cells proliferate and spread, they seriously harm the patient's respiratory system and impede oxygen exchange [5]. Lung cancer is usually associated with both active and passive smoking, but not all cases can be attributed to smoking. Additional risk factors include asbestos exposure, air pollution, ancestry, and certain viral infections such as the human papillomavirus (HPV) [6].
Early identification and correct classification are the best ways to improve the prognosis and treatment of lung cancer, but neither is easy. For example, radiologists find it difficult to distinguish between benign and malignant lung nodules of small to intermediate size (5 to 15 mm) on computed tomography (CT) [7]. Information technology and artificial intelligence have made significant advances here by providing classification techniques that can accurately identify the type and stage of lung cancer. Nevertheless, timely diagnosis remains a major burden on cancer specialists in clinical practice, so artificial intelligence methods for early detection and classification are especially needed [8].
Lung cancer cases are rising, the population keeps growing, and adequate clinical services are lacking, so artificial intelligence is especially useful in this area. Numerous recent studies have used techniques such as Artificial Neural Networks (ANN) [9], Random Forest [10], and Convolutional Neural Networks (CNN) [11] to develop effective and precise lung cancer classification systems. These approaches characterize lung cancer by extracting features, even at very high resolution, from clinical data, genomic data, or medical images [12]. Larger datasets and technological advances have made these methods among the most accurate for classifying lung cancer. Within data analysis, classification is often used to extract models that describe data classes and predict future data trends [13].
In earlier experiments, a multidimensional, region-based, fully convolutional network (mRFCN) [14] was used to classify lung nodules, with features extracted from CT images. This algorithm achieved a classification accuracy of 97.91%. Although this accuracy is fairly high, it still needs to be improved to support appropriate therapy. Using mRFCN also has several disadvantages: it requires relatively large datasets, relies on complex models with numerous convolutional layers and multi-round functions, uses significant memory, and is highly sensitive to hyperparameter changes [15].
Convolutional Neural Networks (CNN) are among the artificial intelligence approaches most frequently used to classify lung cancer [16]. A CNN is an artificial neural network architecture designed primarily to analyze image data. For lung cancer, a CNN can analyze intricate patterns in lung X-rays and precisely pinpoint the presence of cancer [17]. Several studies have successfully used CNNs on medical images of the lungs to detect cancer [18]-[20], distinguish between benign and malignant tumors, and forecast patient prognosis. DenseNet [21] is a CNN design with dense connections between layers. Within a dense block, each layer is directly connected to every layer that precedes it, which gives a reliable, interconnected flow of information. This allows DenseNet to learn more robust feature representations and to avoid the performance degradation that affects deeper neural networks [22].
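The dense connectivity described above can be written as x_l = H_l([x_0, x_1, ..., x_(l-1)]), where [...] is channel-wise concatenation. The NumPy sketch below illustrates only this connectivity pattern and is not the DenseNet used in this study. It rests on simplifying assumptions: H_l is a random 1x1 channel-mixing map followed by ReLU (the original DenseNet uses batch normalization, ReLU, and convolution), and the feature-map size, channel count, and growth rate are hypothetical values.

```python
import numpy as np


def dense_block(x, num_layers, growth_rate, rng):
    """Toy DenseNet-style block: every layer sees all earlier feature maps.

    x has shape (height, width, channels). H_l is simplified to a random 1x1
    channel-mixing map followed by ReLU; the real DenseNet uses BN-ReLU-Conv.
    """
    features = [x]
    for _ in range(num_layers):
        inputs = np.concatenate(features, axis=-1)       # [x_0, x_1, ..., x_(l-1)]
        weights = rng.standard_normal((inputs.shape[-1], growth_rate))
        new_maps = np.maximum(inputs @ weights, 0.0)      # H_l: 1x1 mixing + ReLU
        features.append(new_maps)                         # each layer adds growth_rate channels
    return np.concatenate(features, axis=-1)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8, 16))   # hypothetical 8x8 feature map with 16 channels
out = dense_block(x0, num_layers=4, growth_rate=12, rng=rng)
print(out.shape)  # (8, 8, 64), i.e. 16 + 4 * 12 channels
```

Each layer adds growth_rate channels, so a block with L layers outputs k0 + L × growth_rate channels. The block input also survives unchanged in the first channels, which is what lets features and gradients flow directly through the block.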
Overall, combining CNN and DenseNet for lung cancer classification shows significant promise for improving the precision, reliability, and effectiveness of early diagnosis and classification. This approach has been used in earlier research to detect lung cancer, distinguish between benign and malignant tumors, and estimate patient prognosis from lung images. This research aims to produce a classification system that accurately distinguishes between malignant and benign lung cases, helping physicians with early identification and improving lung cancer therapy.

METHOD
Figure 1 is a flowchart of the proposed method. As Figure 1 shows, the input passes through two stages before the model is trained and its accuracy is measured: preprocessing and construction of the CNN model.

Model Architecture
Figure 2 shows the architecture of the model.

Creating Hyperparameters
The first step is to specify the hyperparameters to be used: BATCH_SIZE, IMAGE_SIZE, EPOCHS, and CHANNELS. Each has a specific function. BATCH_SIZE controls how much data is fed to the model at a time [23], and IMAGE_SIZE specifies the size to which each image is resized [24]. EPOCHS sets how many passes over the training data are performed to reach accurate results [25]. Finally, CHANNELS specifies the number of color channels in each image, which is three here.
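Below is a minimal sketch of how these hyperparameters might be declared and used. IMAGE_SIZE = 256 follows the resizing step reported in the results, and CHANNELS = 3 follows the text above. BATCH_SIZE = 32 and EPOCHS = 50 are placeholder assumptions because the paper does not report them. The helper steps_per_epoch is hypothetical and shows how the batch size determines the number of updates per epoch.

```python
import math

# Assumed values: IMAGE_SIZE (256, from the resizing step) and CHANNELS (3) follow
# the text; BATCH_SIZE and EPOCHS are placeholders, not values reported in the paper.
BATCH_SIZE = 32
IMAGE_SIZE = 256
EPOCHS = 50
CHANNELS = 3

# Shape of one input image and of one mini-batch fed to the model.
INPUT_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, CHANNELS)
BATCH_SHAPE = (BATCH_SIZE,) + INPUT_SHAPE


def steps_per_epoch(num_train_images: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of mini-batches needed to pass over the training set once."""
    return math.ceil(num_train_images / batch_size)


print(BATCH_SHAPE)                      # (32, 256, 256, 3)
print(steps_per_epoch(1575))            # 50
print(steps_per_epoch(1575) * EPOCHS)   # 2500 weight updates in total
```

With about 1575 training images (roughly 76% of the 2073 images), a batch size of 32 gives 50 updates per epoch. The last batch is partially filled, which is why the count rounds up.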

Preprocessing
This preprocessing stage enhances image quality and relevance before further analysis and processing [26] and makes image classification easier. A total of 2073 images are sorted into three groups: normal cases, benign cases, and malignant cases. After this split into categories, the images stored in IMG are read with cv2.imread. Next, each image's original size is obtained with np.shape(). If the shape has more than two dimensions, the image has a color channel. The loaded image's height, width, and number of color channels are then read from cv2.imread(...).shape. The image data are counted by color channel and displayed by class [27]. This facilitates data visualization and ensures that the image data match the supplied labels.
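The shape check described above can be sketched as follows. This is an illustration under assumptions and not the authors' code. The helper describe_shape and the sample (label, shape) pairs are hypothetical stand-ins for the values cv2.imread(path).shape would return. By default, cv2.imread loads images as three-channel BGR arrays (cv2.IMREAD_COLOR), even for grayscale files, and it returns None when a file cannot be read. A two-dimensional shape therefore appears only when images are loaded with a flag such as cv2.IMREAD_GRAYSCALE.

```python
from collections import Counter


def describe_shape(shape):
    """Return (height, width, channels) for an image shape as reported by ndarray.shape.

    A 2-D shape (H, W) means a single-channel image; a shape with more than two
    dimensions (H, W, C) carries an explicit color-channel axis.
    """
    if len(shape) > 2:
        height, width, channels = shape[:3]
    else:
        height, width = shape
        channels = 1
    return height, width, channels


# Hypothetical (label, shape) pairs standing in for cv2.imread(path).shape results.
samples = [
    ("normal", (512, 512, 3)),
    ("benign", (512, 512, 3)),
    ("malignant", (512, 512)),
]

per_class = Counter(label for label, _ in samples)                       # images per class
channel_counts = Counter(describe_shape(shape)[2] for _, shape in samples)  # images per channel count
print(dict(per_class))
print(dict(channel_counts))
```

Counting per class and per channel count in this way makes mismatched labels or unexpected image formats visible before training starts.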

Rescaling
Rescaling resizes each image to the required size, which is defined by the image-size variable [28]. It also scales the pixel values with a linear transformation that maps the pixel intensities into the range 0 to 1 [29]. The rescaled results can be seen in Figure 3.
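The linear transformation can be sketched with NumPy as min-max scaling, which maps the smallest pixel value in an image to 0 and the largest to 1. The exact formula is an assumption because the paper does not give it. A common fixed-range alternative for 8-bit images is to divide by 255, which is what the Keras Rescaling(1./255) layer does. The function name and the 2 × 2 sample image are hypothetical.

```python
import numpy as np


def min_max_rescale(image):
    """Linearly map pixel values so the darkest pixel becomes 0 and the brightest 1."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:                      # flat image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


ct_slice = np.array([[0, 64], [128, 255]], dtype=np.uint8)   # hypothetical 2x2 image
print(min_max_rescale(ct_slice))
```

Min-max scaling adapts to each image's own intensity range, while dividing by 255 keeps the same mapping for every image. For 8-bit images that already span 0 to 255, both give identical results.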

Train Test Split Data
The dataset is divided into subsets to make subsequent model processing easier. Before the division, the data are shuffled: we set the shuffle parameter to True with a seed value of 12, so the order of the data is randomized but reproducible.
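The shuffled split can be sketched in plain Python. The paper does not name the library it used (scikit-learn's train_test_split, for example, exposes shuffle and random_state parameters). The hypothetical shuffled_split helper below therefore only mirrors the described behavior: shuffle with seed 12, then cut the data using the 76% / 4% / 20% proportions reported in the results. Integers stand in for the 2073 images.

```python
import random


def shuffled_split(items, train_frac=0.76, val_frac=0.04, seed=12):
    """Shuffle reproducibly, then cut into train / validation / test partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)        # same seed -> same order on every run
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]            # remainder (about 20%) is held out
    return train, val, test


train, val, test = shuffled_split(range(2073))
print(len(train), len(val), len(test))  # 1575 82 416
```

Because the seed is fixed, rerunning the split yields the same partitions, which keeps training and evaluation reproducible. Truncating the fractions to whole images puts any remainder in the test set.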

Callback
A callback is a function that is invoked automatically at set points while the model is being trained [30]. For example, to stop training once accuracy reaches 99.92%, a callback can check the accuracy after each epoch and stop training automatically when that threshold is reached.
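Below is a framework-free sketch of such an accuracy-threshold callback. In tf.keras, the same idea is usually written as a subclass of tf.keras.callbacks.Callback whose on_epoch_end method sets self.model.stop_training = True. Here a plain flag replaces the model so the logic can run on its own. The class name, the run_training loop, and the per-epoch accuracy values are hypothetical. Only the 99.92% threshold comes from the text.

```python
class StopAtAccuracy:
    """Stand-in for a Keras-style callback that halts training at a target accuracy."""

    def __init__(self, target=0.9992):
        self.target = target
        self.stop_training = False   # plays the role of model.stop_training in tf.keras

    def on_epoch_end(self, epoch, logs):
        accuracy = logs.get("accuracy", 0.0)
        if accuracy >= self.target:
            print(f"Epoch {epoch + 1}: accuracy {accuracy:.4f} reached the target, stopping.")
            self.stop_training = True


def run_training(accuracy_per_epoch, callback):
    """Toy training loop: each entry is a (hypothetical) accuracy logged after one epoch."""
    for epoch, accuracy in enumerate(accuracy_per_epoch):
        callback.on_epoch_end(epoch, {"accuracy": accuracy})
        if callback.stop_training:
            return epoch + 1          # number of epochs actually run
    return len(accuracy_per_epoch)


epochs_run = run_training([0.80, 0.95, 0.9993, 0.9995], StopAtAccuracy())
print(epochs_run)  # 3
```

The check runs at the end of each epoch, so training stops after the first epoch whose logged accuracy meets the threshold rather than partway through an epoch.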

Train-Test Model
In this training, we combine the CNN and DenseNet architectures to obtain better results than previous research. The CNN uses a Conv2D layer with 32 filters and a (3, 3) kernel, followed by a MaxPooling2D layer with a (2, 2) pool size. We also added five Conv2D layers with 64 filters each. After these layers come a Flatten layer and Dense layers: a Dense layer with 128 units and ReLU activation, and an output Dense layer with softmax activation.
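The following sketch makes the layer description concrete by tracing feature-map shapes and parameter counts through a Conv2D/MaxPooling2D stack. It uses the formulas that apply to Keras layers with default settings: 'valid' padding, stride 1 for convolution, and a stride equal to the pool size for pooling. The exact stack is an assumption. The text mentions five 64-filter Conv2D layers, while the model summary reports four Conv2D and four MaxPool2D layers, so the sketch follows the summary (one 32-filter block and three 64-filter blocks) and assumes three output classes. The helper names are hypothetical.

```python
def conv2d(shape, filters, kernel=3):
    """Conv2D with 'valid' padding and stride 1: returns (output shape, parameter count)."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters          # weights plus one bias per filter
    return (h - kernel + 1, w - kernel + 1, filters), params


def maxpool2d(shape, pool=2):
    """MaxPooling2D with stride equal to the pool size and 'valid' padding (no parameters)."""
    h, w, c = shape
    return ((h - pool) // pool + 1, (w - pool) // pool + 1, c), 0


def dense(units_in, units_out):
    """Parameter count of a fully connected layer (weights plus biases)."""
    return (units_in + 1) * units_out


def summarize(input_size=256, channels=3, num_classes=3):
    """Trace a hypothetical 4 x (Conv2D + MaxPooling2D) stack, then Flatten and two Dense layers."""
    shape = (input_size, input_size, channels)
    total = 0
    for filters in (32, 64, 64, 64):
        shape, p = conv2d(shape, filters)
        total += p
        shape, _ = maxpool2d(shape)
    flat = shape[0] * shape[1] * shape[2]                   # Flatten
    total += dense(flat, 128) + dense(128, num_classes)     # Dense(128, relu) + Dense(softmax)
    return shape, flat, total


print(summarize(256))  # ((14, 14, 64), 12544, 1699395)
print(summarize(128))  # ((6, 6, 64), 2304, 388675)
```

The Dense(128) layer holds most of the parameters. Shrinking the input from 256 to 128 pixels per side, as the model summary describes, reduces the flattened vector from 12,544 to 2,304 values and the total parameter count by roughly a factor of four.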

RESULTS AND DISCUSSIONS
Preprocessing Result
Combining the multi-layer Convolutional Neural Network (CNN) and DenseNet algorithms yields the results for classifying lung cancer images. The preprocessing step, which defines the resize and rescale operations, forms the foundation of the model architecture for this classification. More intricate feature extraction is then performed by repeated Conv2D and MaxPooling2D layers, with the number of filters increasing from 32 to 64 (64 in each subsequent Conv2D layer). The flatten layer then transforms the output of the preceding layer into a 1D vector that serves as the input to the dense layers. A dense layer then links the extracted features to the relevant class labels, and the number of units in the final dense layer equals the number of classes to be classified. Examples of the classified lung images are shown below.
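The final step turns the output dense layer's scores into a class decision, and it can be sketched as a softmax followed by an argmax. The logits and the label order are hypothetical. Alphabetical order is used because Keras utilities such as image_dataset_from_directory assign class indices in alphanumeric order of the subdirectory names, but the paper does not state its label encoding.

```python
import math

CLASS_NAMES = ["benign", "malignant", "normal"]   # hypothetical label order


def softmax(logits):
    """Turn raw scores from the final Dense layer into class probabilities."""
    m = max(logits)                                # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.5, 3.0, -1.0]          # hypothetical outputs of a 3-unit Dense layer
probs = softmax(logits)
predicted = CLASS_NAMES[probs.index(max(probs))]
print([round(p, 3) for p in probs], predicted)
```

Because the softmax outputs sum to 1, they can be read as the model's confidence in each of the three classes. The predicted label is simply the class with the highest probability.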
Figures 4, 5, and 6 show the three classes: normal cases (Figure 4), benign cases (Figure 5), and malignant cases (Figure 6). Figure 4 depicts a homogeneous lung with uniform margins, low density, and no lesions or tumors. Figure 5 depicts a homogeneous lesion or mass with regular edges and low density, whereas Figure 6 depicts an inhomogeneous lesion or mass with jagged edges and high density. Figure 7 shows the labeling of the dataset using these three labels.

Resize and Rescaling Results
To simplify modeling of the images in the dataset, we downscaled them from 512 × 512 to 256 × 256 pixels. Figure 8 compares the image sizes before and after resizing.
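The downscaling from 512 × 512 to 256 × 256 can be sketched with NumPy by averaging each non-overlapping 2 × 2 block of pixels. This is an assumption because the paper does not state the interpolation method. In OpenCV, cv2.resize(image, (256, 256), interpolation=cv2.INTER_AREA) performs area-based resampling (dsize is given as (width, height)). The random array stands in for a real CT image.

```python
import numpy as np


def downscale_by_2(image):
    """Halve height and width by averaging each non-overlapping 2x2 block of pixels."""
    h, w = image.shape[:2]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Split rows and columns into (block index, offset within block), then average the offsets.
    blocks = image.reshape(h // 2, 2, w // 2, 2, *image.shape[2:])
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)
ct = rng.integers(0, 256, size=(512, 512, 3)).astype(np.float32)  # stand-in for a CT image
small = downscale_by_2(ct)
print(small.shape)  # (256, 256, 3)
```

Averaging, rather than keeping every second pixel, uses all of the original pixels. This reduces aliasing when fine structures such as small nodules are downscaled.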

Train Test Split Result
The train-test split produced the intended division of the dataset: 76% of the data for training, 4% for validation, and 20% for testing, as shown in the dataset distribution chart below. Figures 10 and 11 show that the model classifies lung cancer with high accuracy and without significant overfitting. Using the Convolutional Neural Network (CNN) method with added DenseNet layers, the model reached a training accuracy of 99.49%, exceeding the 97% accuracy reported in previous research.
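The paper does not describe how the reported proportions were produced. One possible explanation, stated here only as an assumption, is a two-stage split: 20% of the data is held out for testing, and then 5% of the remaining 80% is set aside for validation. Applied to the 2073 images, the proportions correspond approximately to the counts below. Exact numbers depend on how the implementation rounds.

```latex
\begin{aligned}
\text{train}      &= 0.80 \times 0.95 = 0.76, & 0.76 \times 2073 &\approx 1575 \text{ images}\\
\text{validation} &= 0.80 \times 0.05 = 0.04, & 0.04 \times 2073 &\approx 83 \text{ images}\\
\text{test}       &= 0.20,                    & 0.20 \times 2073 &\approx 415 \text{ images}
\end{aligned}
```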

Comparison Performance Analysis
A comparison of the accuracy of the proposed method with previous methods can be seen in Table 1.
Table 1. Comparison performance analysis
Method          Accuracy
MRFCN [14]      97%
CNN             99.49%

One of the objectives of this research is to improve on previous work. As Table 1 shows, the mRFCN method achieved an accuracy of 97%, while the CNN method with added DenseNet layers raised the accuracy to 99.49%. Coupling a CNN with DenseNet layers is therefore effective for this task.
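The two accuracies in Table 1 can also be compared as error rates. This is simple arithmetic on the reported figures. It assumes the figures are directly comparable, which holds only if both methods were evaluated on similar data and splits.

```latex
\begin{aligned}
\text{error}_{\text{mRFCN}}        &= 100\% - 97\% = 3\%\\
\text{error}_{\text{CNN+DenseNet}} &= 100\% - 99.49\% = 0.51\%\\
\text{absolute gain}               &= 99.49\% - 97\% = 2.49 \text{ percentage points}\\
\text{relative error reduction}    &= \frac{3 - 0.51}{3} = 0.83 \;(83\%)
\end{aligned}
```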

CONCLUSION
This research shows that combining CNN and DenseNet designs classifies lung cancer more accurately than MRFCN. In previous studies, MRFCN achieved an accuracy of only 97%. Our approach, which modifies and revamps it with CNN and DenseNet, reached 99.49%. This is because both architectures are built to process and extract information from image data. Convolution layers, which extract information from small portions of the image, allow the CNN to learn hierarchical image features. DenseNet is one of the most complex and advanced CNN topologies, and in its architecture, layers within the same block are directly connected to one another. We hope the findings of this study will help in the early detection of lung cancer and speed up its treatment.

Figure 1. Flowchart of lung cancer classification modeling procedures using CNN and DenseNet

Figure 2. Summary of model layers. The model summary shows 8 convolution and pooling layers (4 Conv2D and 4 MaxPool2D layers) in addition to 1 input and 1 output layer. The model begins by resizing the 256 × 256 image to 128 × 128 to obtain accurate results on the training data.

Figure 5. Lung cancer