Enhancing cirrhosis detection: A deep learning approach with convolutional neural networks

ABSTRACT


INTRODUCTION
Cirrhosis is a serious liver condition with a high prevalence worldwide. It is increasingly prevalent in both developing and developed countries and remains a prominent contributor to adult mortality [1]. Advances in technology now offer practical solutions to many everyday problems, and people are expected to make effective use of them [2]. Early detection of cirrhosis is crucial because it significantly affects patient outcomes, especially in advanced stages. Patients with cirrhosis who require intensive care have a higher risk of mortality than those without cirrhosis [3]. Detecting cirrhosis accurately is essential, and deep transfer learning (DTL) has shown potential for identifying cirrhosis from conventional T2-weighted MRI scans [4]. DTL, using a convolutional neural network (CNN) pre-trained on a comprehensive database of natural images, has achieved expert-level classification accuracy for liver cirrhosis detection [5], [6]. Non-invasive blood tests, such as FibroTest and ActiTest, can also be used to estimate the occurrence of cirrhosis, with a prevalence of 32% observed in individuals with chronic hepatitis C [7]. Early detection and improved outcomes for cirrhosis patients require concerted efforts and preventive measures.
The development of cirrhosis without obvious initial symptoms emphasizes the need for sophisticated detection methods. Cirrhosis is a chronic disease that progresses slowly over years or decades, often without noticeable symptoms in the early stages [8]. This delayed presentation makes it challenging to diagnose cirrhosis in its early stages, leading to an underestimation of its prevalence [9]. Early detection is crucial because newer research has shown that early cirrhosis may be reversible [10]. Therefore, sophisticated detection methods are needed to identify cirrhosis before it progresses to advanced stages and causes irreversible damage to the liver. Non-invasive markers, such as the fibrosis index (FI) and the combination of protein induced by vitamin K antagonist-II (PIVKA-II) with alpha-fetoprotein (AFP), have shown promise in detecting hepatocellular carcinoma (HCC) in the early stages among individuals with cirrhotic livers [11], [12]. These markers can help identify patients who require further evaluation and intervention, allowing for timely management and potentially improving outcomes.
Conventional diagnostic techniques in medical imaging have limitations in identifying complex patterns. Comparing patient groups with healthy individuals using binary or multi-class classifiers often leads to inconclusive or conflicting findings [13]. These methods struggle to model the compounding factors of diseases and the comorbidity of multiple diseases [14]. Conversely, multi-label classifiers can effectively represent the co-occurrence of diseases by assigning subjects to two or more labels [15]. In addition, traditional computer vision methods used in medical image processing are not as powerful as modern deep learning approaches for representation learning [16]. Generative adversarial networks, although practical, can be difficult to train successfully on highly varied datasets [17]. Therefore, there is a need to explore semi-supervised learning and its use in medical information retrieval problems.
Deep learning with CNNs is considered a promising approach to cirrhosis detection due to its ability to outperform traditional methods and provide accurate predictions. It offers several advantages compared to traditional techniques. Firstly, advanced deep learning architectures, like deep neural networks (DNNs), can utilize a large number of variables and features, including laboratory measurements and diagnoses, to improve prediction accuracy [18]. Secondly, CNN models can effectively identify and diagnose diseases in plants by analyzing leaf images, achieving high accuracy rates and reducing training time [19], [20]. Thirdly, deep learning techniques, like CNN-ELM, can classify human activities accurately without the need for handcrafted features, enhancing the performance of human activity recognition systems [21]. Additionally, deep CNN models, combined with ensemble learning, have shown robustness and better performance in crack detection, surpassing traditional image processing methods [22]. Lastly, deep learning CNN models can assess the liver fibrosis stage using CT images and provide visual explanations of diagnostic decisions, improving transparency and understanding [23].
CNNs can contribute to automating the process of extracting features from medical imagery in several ways. Firstly, CNNs can be used to recognize and classify features in medical images, such as motor imagery and EEG signals [24]. Secondly, CNNs can be utilized to analyze the encoding of scale information in medical images, allowing for the extraction of scale-covariant features [25]. Additionally, CNNs can be trained in a parallelized manner to reduce training time and efficiently train CNN models for medical imaging tasks [5]. Furthermore, CNNs can leverage non-expert annotations as a source of weak annotation to guide network learning, improving segmentation performance in medical image analysis [26]. Overall, CNNs provide a powerful tool for automating feature extraction in medical imagery, enabling more efficient and accurate analysis and diagnosis.
Fast and accurate detection of cirrhosis is significant in preventing further liver damage and improving patient outcomes. Cirrhosis-induced sarcopenia, characterized by muscle loss and weakness, has a detrimental effect on patients awaiting transplantation [27]. The Child-Turcotte-Pugh (CTP) score and the Model for End-Stage Liver Disease (MELD) score are frequently employed prognostic instruments for individuals with cirrhosis [28]. Nevertheless, recently emerging biochemical models and imaging technologies are striving to enhance the accuracy of the MELD score [29]. The liver frailty index (LFI), which relies on easily measurable clinical parameters, can effectively detect cirrhosis-induced sarcopenia [3]. Sarcopenia affects the frequency of decompensation events, the length of hospitalization, and mortality, and its assessment can help prioritize patients on transplant lists [30]. Early detection of sarcopenia and evaluation of the complete spectrum of muscle mass, strength, and function are essential for effective clinical practice. Overall, fast and accurate cirrhosis detection allows for timely intervention and personalized management, leading to better patient outcomes.
The literature review presents a comprehensive analysis of the application of deep learning (DL) and traditional machine learning (ML) techniques in the detection of cirrhosis, indicating a significant shift towards advanced DL models, particularly CNNs, due to their superior performance in medical image analysis and disease detection compared to conventional ML methods [3], [30]. CNNs have outperformed other models in classifying liver cirrhosis with high accuracy, even exceeding expert human assessment [17], and have shown promise in predicting complications such as hepatocellular carcinoma [31], [32] and portal hypertension [33]. The review highlights the growing trend of employing CNNs for feature extraction and classification tasks in healthcare [33], [34] while acknowledging the advancements in traditional ML algorithms like Random Forest and XGBoost, which still demonstrate excellence in certain diagnostic tasks [35], [36]. Despite the progress, a gap remains in the ease of clinical application and accessibility of these advanced ML techniques, with traditional models facing challenges due to their complexity and limitations in handling modern datasets [37], [38]. The literature suggests the need for further research in enhancing the interpretability and clinical integration of DL models, as well as in exploring the potential of ML in non-invasive prognostic modeling and diagnosis using a variety of medical data sources [5], [30]. This gap indicates a direction for future research: improving the applicability of DL models in clinical settings and developing algorithms that can provide more accessible and interpretable diagnostic tools for cirrhosis and other liver diseases.
Comparing CNN with established machine learning techniques such as SVM, KNN, Random Forest, Decision Tree, and XGBoost in the context of cirrhosis detection is important to determine the best-performing algorithm for this specific task. The rationale behind this comparison is to evaluate the accuracy and performance of each algorithm in detecting cirrhosis, which can help in improving early diagnosis and treatment of the disease. By comparing these techniques, researchers can identify the strengths and weaknesses of each algorithm and determine which one is most suitable for cirrhosis detection. This comparison can also provide insights into the potential of CNN to outperform traditional machine learning techniques in this particular domain [31], [32].

METHOD
Proposed Model
Our proposed model is a CNN tailored for 1-dimensional data. At its core, a CNN uses a hierarchy of layers to process and transform input data, in this case a sequence of values with specific padding, through convolution, pooling, and fully connected layers, to produce a binary output.
Convolution Layer (Conv1D): The first layer is a convolutional layer with 128 filters of size 3. Each filter slides along the input sequence and applies the following operation to produce the feature map F, as given in equation (1):
$$F = \mathrm{ReLU}(W * X + b) \tag{1}$$
where X is the input, W represents the filter weights, b is the bias, * denotes the convolution operation, and ReLU(x) = max(0, x) is the activation function that adds non-linearity.
MaxPooling Layer (MaxPooling1D): This layer simplifies the output by taking the maximum value in a region, reducing the dimensionality and retaining only the most significant features.
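The convolution-plus-ReLU step and the max pooling step described above can be illustrated with a minimal NumPy sketch. It uses a single filter on a single-channel toy sequence; the function names, the example values, and the pooling window of 2 are assumptions made for illustration and are not taken from the study's code.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Slide one filter w over sequence x (no padding), add bias b, apply ReLU."""
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)

def max_pool1d(f, pool_size=2):
    """Keep the maximum of each non-overlapping window (pool_size is assumed)."""
    n = len(f) // pool_size  # a trailing partial window is dropped
    return f[:n * pool_size].reshape(n, pool_size).max(axis=1)

# Toy input sequence and a single filter of size 3 (illustrative values only)
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
w = np.array([1.0, 0.0, -1.0])

feature_map = conv1d_relu(x, w, b=0.5)  # -> [1.5, 3.5, 0.0, 0.0]
pooled = max_pool1d(feature_map)        # -> [3.5, 0.0]
```

In the proposed model the same operation would be carried out by 128 filters in parallel, each producing its own feature map.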
Dropout: To prevent overfitting, the Dropout layer randomly sets a portion of the feature detectors to 0 during training. A dropout rate of 0.3 means that 30% of the nodes are turned off. This process can be represented in equation (2):
$$F_{\text{drop}} = F \odot M \tag{2}$$
where F is the feature map after pooling, $\odot$ denotes element-wise multiplication, and M is a mask matrix in which 30% of the values are 0 and 70% are 1. This sequence of layers (convolution, pooling, dropout) is repeated two more times with larger numbers of filters (256 and 512), each time abstracting the data further and capturing more complex patterns. Finally, the model is compiled with the Adam optimizer, an adaptive learning rate optimizer, and 'binary_crossentropy' as the loss function, which is appropriate for a binary classification problem. The accuracy metric is used to evaluate the model's performance.
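As a hedged illustration of the dropout mask and the binary cross-entropy loss mentioned above, the following NumPy sketch may help. The function names are hypothetical. Each element is dropped independently with probability 0.3, so the share of zeros is only approximately 30% in any single mask. Libraries such as Keras additionally rescale the surviving activations by 1/(1 - rate) during training; that rescaling is exposed here as an optional flag and is off by default to match the mask formulation in the text.

```python
import numpy as np

def dropout(f, rate=0.3, rng=None, rescale=False):
    """Zero each element of f independently with probability `rate`."""
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(f.shape) >= rate).astype(f.dtype)  # 1 = keep, 0 = drop
    out = f * mask
    if rescale:  # optional inverted-dropout scaling
        out = out / (1.0 - rate)
    return out, mask

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

# Illustrative usage on synthetic values
features = np.ones(10000)
dropped, mask = dropout(features, rate=0.3, rng=np.random.default_rng(0))
keep_fraction = mask.mean()                         # close to 0.7
loss = binary_crossentropy([1, 0], [0.5, 0.5])      # equals ln(2)
```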

Dataset
This research employs the Cirrhosis Prediction Dataset from Kaggle as its data source, encompassing 276 observations across 19 attributes, which cover a range of medical and demographic information. The attributes are N_Days, Status, Drug, Age, Sex, Ascites, Hepatomegaly, Spiders, Edema, Bilirubin, Cholesterol, Albumin, Copper, Alk_Phos, SGOT, Tryglicerides, Platelets, Prothrombin, and Stage. The study aims to predict the severity of cirrhosis in patients based on a binary target variable with two classes, "severe" and "very severe", derived from the Stage attribute as described in the Preprocessing section. This dataset lays the groundwork for the construction and evaluation of various machine learning algorithms, including CNN and several baseline algorithms, to predict the cirrhosis status of individuals from the available attributes. The dataset is partitioned into a training set constituting 80% (220 observations) and a testing set comprising 20% (56 observations). In the training set, 150 observations are categorized as "severe" and 70 as "very severe"; in the testing set, 38 observations fall under the "severe" category and 18 under the "very severe" category.
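The split sizes reported above can be checked with a short sketch. The shuffling procedure, the seed, and the function name are assumptions for illustration; the text does not state how the observations were assigned to each subset.

```python
import random

def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * (1 - test_fraction))  # 276 * 0.8 = 220.8 -> 220
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(276)

# Share of the "severe" class in each subset, from the counts reported in the text
train_severe_share = 150 / 220
test_severe_share = 38 / 56
```

Both subsets contain roughly 68% "severe" observations, so the reported class balance is similar in training and testing.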

Preprocessing
Several data preprocessing steps are performed before the data is used for model training. Incomplete rows are removed using .dropna(), values in categorical columns are converted to numerical values using .replace(), and numeric data is normalized using MinMaxScaler. Additionally, class labels in the 'Stage' column are transformed into 1 (positive) if the stage is 4 and 0 (negative) otherwise, thus converting the task into a binary classification problem. After preprocessing, the data is ready for training the machine learning models: CNN, SVM, Decision Tree, KNN, GNB, and GBoost. The performance evaluation results of these models are then displayed in the form of precision-recall plots.
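The preprocessing steps above can be sketched with pandas as follows. This is an illustrative approximation and not the study's code: the function name, the category mapping, and the toy rows are hypothetical; Series.map is used for the categorical encoding in place of .replace(); and the min-max scaling is written out by hand, which corresponds to the default [0, 1] range of MinMaxScaler.

```python
import pandas as pd

def preprocess(df, categorical_maps, label_col="Stage", positive_stage=4):
    """Drop incomplete rows, encode categoricals, binarize Stage, min-max scale."""
    df = df.dropna().copy()
    for col, mapping in categorical_maps.items():
        df[col] = df[col].map(mapping).astype(float)
    # Stage 4 becomes the positive class (1); every other stage becomes 0
    y = (df[label_col] == positive_stage).astype(int)
    X = df.drop(columns=[label_col]).astype(float)
    span = X.max() - X.min()
    X = (X - X.min()) / span.where(span != 0, 1.0)  # avoid division by zero
    return X, y

# Toy rows using a few of the dataset's column names (values are made up)
toy = pd.DataFrame({
    "Age": [50, 60, 70, 40],
    "Sex": ["F", "M", "F", "M"],
    "Bilirubin": [1.0, None, 3.0, 2.0],
    "Stage": [4, 3, 2, 4],
})
X, y = preprocess(toy, {"Sex": {"F": 0, "M": 1}})
```

The second toy row is discarded because of its missing Bilirubin value, which mirrors how incomplete observations are removed before training.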

Comparison of Methods
This research applies CNN as the primary algorithm for modeling and classifying the data while also comparing it with five baseline algorithms, namely SVM, Decision Tree, KNN, GNB, and GBoost. By using CNN, the research aims to automatically extract essential features from the data through deep learning. The baseline algorithms serve as reference points for evaluating how well CNN performs in the classification task. Comparing the outcomes of these algorithms provides a deeper understanding of the effectiveness of CNN in solving the classification problem that is the focus of this research and can assist in selecting the best model for this purpose.
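A baseline comparison of this kind could be set up with scikit-learn roughly as follows. This is a sketch under stated assumptions: the data is synthetic (generated with make_classification, not the cirrhosis dataset), all models use default hyperparameters, and GBoost is assumed to correspond to GradientBoostingClassifier, which the text does not confirm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same number of rows and 18 input features
X, y = make_classification(n_samples=276, n_features=18, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

baselines = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "GNB": GaussianNB(),
    "GBoost": GradientBoostingClassifier(random_state=0),  # assumed mapping
}

# Fit each baseline on the same split and record its test accuracy
results = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
```

Because the data here is synthetic, the resulting accuracies say nothing about the study's findings; the sketch only shows how all baselines can be evaluated on an identical split.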

Training and Evaluations
The model training process involves 200 epochs (training iterations) with a batch size of 64. The model's performance is assessed using the Confusion Matrix and the Classification Report. The Confusion Matrix measures the model's performance in classifying data by calculating the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) in equation (3). The Classification Report provides further information about evaluation metrics such as precision, recall, F1-score, and accuracy, along with their corresponding mathematical formulas as listed in equations (4), (5), (6), and (7).
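The evaluation metrics named above follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1-score = 2 · precision · recall / (precision + recall), and accuracy = (TP + TN) / (TP + TN + FP + FN). A minimal sketch of how they can be computed from predictions is given below; the function names and the example labels are hypothetical and are not results from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up labels for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)   # (TP, TN, FP, FN) = (3, 3, 1, 1)
metrics = classification_metrics(*counts)   # every metric equals 0.75 here
```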

RESULTS AND DISCUSSIONS
Training Process
In this section, we present two crucial graphs for evaluating model performance. The first figure, Figure 1, displays the training accuracy (accuracy on the training data) in blue and the validation accuracy (accuracy on the validation data) in orange. This figure provides an overview of how well the model can learn patterns from the training data and how effectively it can generalize its results to unseen data.

Model Performance
In this section, we evaluate our model's performance using key metrics and analysis. Two tables, Table 1 and Table 2, are presented to give a comprehensive view of effectiveness across multiple machine learning algorithms. Table 1 shows the confusion matrix, a critical tool for assessing classification performance. It tabulates the counts of TP, FP, FN, and TN obtained from the algorithms: CNN, SVM, Decision Tree, KNN, GNB, and GBoost. This matrix shows how well the models classify instances correctly and where their predictions go wrong. Table 2 presents the classification report, offering a more comprehensive assessment of each algorithm's performance. It includes accuracy, precision, recall, and F1-score for each algorithm: CNN, SVM, Decision Tree, KNN, GNB, and GBoost. These metrics provide a holistic perspective on the algorithms' capabilities, encompassing their ability to make correct classifications, manage false positives and negatives, and balance precision and recall. These tables play a pivotal role in our analysis, aiding in the selection of the most suitable machine learning algorithm for the task at hand while highlighting the strengths and weaknesses of each approach. The insights gained from these metrics enable us to make informed decisions and optimize our model's performance for the research focus, cirrhosis prediction.

Summarization of Key Findings
The problem addressed in this research is to construct and evaluate various machine learning algorithms, including CNN, SVM, Decision Tree, KNN, GNB, and GBoost, for the task of predicting severe or very severe cirrhosis in patients from a collection of medical and demographic characteristics. The main findings from this evaluation reveal that CNN demonstrates strong performance with an accuracy of 84%, indicating its effectiveness in distinguishing between patients with very severe cirrhosis and severe cirrhosis. Additionally, CNN exhibits the highest precision and F1-score for the 'Severe' class, showcasing its ability to classify cases of cirrhosis severity correctly. On the other hand, SVM shows lower performance, particularly in terms of recall, indicating its difficulty in identifying actual cases of cirrhosis. Decision Tree, KNN, GNB, and GBoost also exhibit varying levels of performance, with Decision Tree having the lowest overall accuracy. These findings underscore the importance of selecting a suitable algorithm for cirrhosis prediction, with CNN emerging as a promising choice based on the evaluation metrics considered.

Interpretation of The Result
The results of this study offer valuable perspectives on the effectiveness of different machine learning algorithms for cirrhosis prediction based on medical and demographic attributes. Several patterns and relationships among the data can be identified from the evaluation. Firstly, CNN emerged as the top-performing algorithm, achieving an accuracy of 84%, which exceeded expectations. It demonstrated the ability to effectively distinguish between patients with cirrhosis and those without, particularly in terms of correctly classifying non-cirrhotic cases (high precision). This highlights the significance of leveraging deep learning techniques for medical diagnosis tasks. Secondly, SVM, Decision Tree, KNN, GNB, and GBoost displayed varying degrees of performance, with SVM and GBoost falling short of expectations in terms of recall and F1-score, indicating challenges in identifying true positive cirrhosis cases.

Implications of The Research
This research is highly relevant and has implications within the realm of medical diagnosis and machine learning. Firstly, the study underscores the practical applicability of CNN in the analysis of medical images and the prediction of diseases. The robust performance of CNN in cirrhosis prediction indicates its potential for aiding healthcare professionals in early and accurate diagnosis. This finding aligns with prior literature emphasizing the growing role of deep learning techniques in the analysis of images in the medical field. Secondly, the varying performance of alternative algorithms, such as SVM, Decision Tree, KNN, GNB, and GBoost, highlights the importance of algorithm selection in healthcare applications. These results corroborate the current understanding that the choice of algorithm is pivotal in achieving accurate medical diagnoses. The research contributes new insights by showcasing that CNN, initially designed for image-related tasks, can be successfully adapted to tabular medical data for disease prediction. This cross-domain application widens the horizon for utilizing deep learning in medical research beyond just medical imaging.
Additionally, the study emphasizes the significance of not only achieving high accuracy but also considering other metrics like recall, precision, and F1-score, especially in healthcare contexts where false negatives or false positives can have critical consequences. These insights serve as a valuable guide for future research and advancements in machine learning models for medical diagnosis. Overall, this research contributes to the evolving field of healthcare technology by demonstrating the potential of CNN and reinforcing the importance of thoughtful algorithm selection in medical decision support systems.

Limitations of The Research
The findings of this research deserve consideration in light of certain limitations. Firstly, the dataset used for cirrhosis prediction consists of 276 observations, which, although substantial, may not capture the full spectrum of clinical variability in cirrhosis cases. A more extensive and more varied dataset might offer a more thorough grasp of the model's performance. Additionally, the study focused on a specific set of medical and demographic attributes, and the inclusion of additional relevant features, such as genetic markers or lifestyle factors, could enhance the predictive capabilities of the models. Moreover, the research primarily evaluated algorithmic performance based on accuracy, precision, recall, and F1-score, but further investigation into the clinical impact and interpretability of model decisions could offer valuable insights for medical practitioners. Despite these limitations, the study's results remain valid for addressing the research questions as they provide valuable comparative insights into the effectiveness of machine learning algorithms in predicting cirrhosis. The limitations serve as avenues for future research to explore more extensive datasets, additional features, and the clinical applicability of the models, ultimately contributing to advancements in medical diagnosis.

Recommendations for Future Research
Based on the findings of this research, several practical implementations and avenues for future research can be recommended. Firstly, the high accuracy and predictive capabilities of the CNN model suggest its potential for real-world clinical deployment. Future research could focus on integrating the CNN model into healthcare systems to assist healthcare professionals in cirrhosis diagnosis. Secondly, investigating the effectiveness of ensemble methods that combine multiple machine learning algorithms, including CNN, SVM, Decision Tree, KNN, GNB, and GBoost, could lead to enhanced predictive performance by leveraging the strengths of individual algorithms. Thirdly, exploring feature engineering techniques to identify and incorporate additional relevant features for cirrhosis prediction, such as genetic markers or lifestyle data, may further improve model accuracy.
Additionally, developing methods for explaining the decisions made by machine learning models, particularly CNN, in the context of medical diagnosis can enhance the trust and acceptance of these models among healthcare practitioners. Furthermore, conducting studies to assess the clinical significance and economic efficiency of using machine learning models for cirrhosis prediction in real healthcare settings is essential. Lastly, expanding the dataset, including longitudinal data, and applying the knowledge gained to predict other medical conditions or diseases can contribute to the broader application of machine learning in healthcare. These recommendations provide a roadmap for practical implementations and future research directions that can advance medical diagnosis through machine learning and, ultimately, improve patient care.
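One simple form of the ensemble idea mentioned above is hard majority voting over the class predictions of several models. The sketch below is illustrative only: the function name and the prediction lists are hypothetical, and the study itself does not evaluate any ensemble.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Return, for each sample, the label predicted by most models."""
    n_samples = len(predictions_per_model[0])
    return [
        Counter(model[i] for model in predictions_per_model).most_common(1)[0][0]
        for i in range(n_samples)
    ]

# Made-up binary predictions from three hypothetical models on four samples
model_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
ensemble = majority_vote(model_outputs)  # -> [1, 1, 1, 0]
```

Using an odd number of models avoids ties in the binary case.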

CONCLUSION
Several key conclusions can be drawn from the results and discussion of this study. Firstly, the CNN model demonstrates strong accuracy and predictive capabilities in cirrhosis prediction, indicating its potential for practical clinical deployment and for assisting healthcare professionals with early diagnosis. Secondly, the comparative analysis of different machine learning algorithms underscores the significance of thoughtful algorithm selection in medical diagnosis, with CNN emerging as the top performer among the algorithms tested. Furthermore, the study highlights the potential for improving model accuracy through feature engineering, such as incorporating additional relevant attributes like genetic markers or lifestyle data. The need for interpretable AI models, particularly in medical contexts, is emphasized, necessitating the development of methods to explain model decisions.
Additionally, the research recommends evaluating the clinical significance and cost-effectiveness of deploying machine learning models in real healthcare settings. Finally, future research directions encompass exploring ensemble methods, dataset expansion, longitudinal analysis, and cross-disease predictions, further advancing the utilization of machine learning in healthcare. These findings contribute significantly to the field of medical diagnosis using machine learning, offering valuable insights for future research and practical healthcare applications.

Figure 1. Training and validation accuracy in the training process

Meanwhile, the second figure, Figure 2, visualizes the training loss (loss on the training data) in blue and the validation loss (loss on the validation data) in orange. This figure helps us understand how well the model reduces errors during training and measures its performance on data that was not used for training. By examining both of these figures, we can identify overfitting or underfitting and optimize the model to achieve better results in the classification task, which is the focus of this research.
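One way to read such curves programmatically is to locate the epoch with the lowest validation loss and to track the gap between validation and training loss. The following sketch uses made-up loss values and hypothetical function names; it is a simple heuristic for spotting overfitting, not the procedure used in the study.

```python
def best_epoch(val_loss):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__)

def generalization_gap(train_loss, val_loss):
    """Per-epoch difference between validation and training loss."""
    return [v - t for t, v in zip(train_loss, val_loss)]

# Made-up curves: training loss keeps falling while validation loss turns upward
train_loss = [0.70, 0.55, 0.45, 0.38, 0.30, 0.24]
val_loss = [0.72, 0.60, 0.52, 0.50, 0.55, 0.61]

stop_at = best_epoch(val_loss)                  # -> 3
gaps = generalization_gap(train_loss, val_loss)
overfitting = gaps[-1] > gaps[stop_at]          # a widening gap suggests overfitting
```

A gap that keeps widening after the best epoch indicates that further training mainly fits the training data; persistently high values of both losses would instead point to underfitting.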

Figure 2. Training and validation loss in the training process

Table 2. Classification Report