Aim: To develop a simple and reliable scoring system for colorectal cancer risk assessment that takes into account the effectiveness of different screening options.
Background: Colorectal cancer (CRC) is one of the leading causes of cancer-related mortality worldwide. Because the disease is preventable, accurate risk assessment plays a critical role in early detection and prevention strategies.
Methods: This study analyzed data from 3,465 individuals: patients; their first-, second-, and third-degree relatives; and volunteers participating in a national CRC screening program. The dataset reflects nine years of prevention efforts targeting at-risk populations. Five machine learning algorithms (Logistic Regression, Naïve Bayes, Neural Networks, Random Forest, and Support Vector Machines [SVM]) were applied to the data. Model performance was evaluated using cross-validation, ROC curves, F1 score, classification accuracy (CA), and overall accuracy metrics.
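As an illustration of the evaluation metrics named above, the following minimal sketch computes classification accuracy, F1 score, and the area under the ROC curve for a binary classifier. It is not the study's pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = CRC, 0 = no CRC.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 for turning scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(f"CA  = {accuracy(toy_labels, toy_preds):.3f}")   # CA  = 0.667
print(f"F1  = {f1_score(toy_labels, toy_preds):.3f}")   # F1  = 0.667
print(f"AUC = {roc_auc(toy_labels, toy_scores):.3f}")   # AUC = 0.889
```

In a cross-validated comparison, these metrics would typically be computed on each held-out fold and then averaged for each model.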
Results: Of the 3,465 participants, 2,240 were diagnosed with colorectal cancer, 60 were relatives of affected patients, and 1,086 were non-patient relatives. Among CRC patients, 1,979 were over the age of 40. Logistic Regression and Naïve Bayes achieved the highest AUC values (0.910 and 0.907, respectively), with corresponding accuracies of 84.5% and 82.8%; Logistic Regression also demonstrated the highest overall accuracy. The proposed scoring system incorporated variables such as age, sex, diabetes, inflammatory bowel disease (IBD), smoking, alcohol consumption, tobacco use, BMI, family history, and five key symptoms: rectal bleeding, abdominal pain, unintended weight loss, changes in bowel habits, and anemia. The highest score was assigned to individuals over 60 years of age, and a score of 2 was assigned for a family history involving more than two affected relatives.
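An additive scoring system of this kind can be sketched as follows. Only two details come from the abstract: age over 60 carries the highest score, and a family history of more than two affected relatives scores 2. Every other point value, the `Person` structure, and the function name are hypothetical placeholders, not the published weights. Sex and BMI are omitted because the abstract does not say how they are scored.

```python
from dataclasses import dataclass

# Point values. AGE_OVER_60_POINTS is a HYPOTHETICAL number; the abstract only
# says this item carries the highest score. FAMILY_HISTORY_POINTS is from the
# abstract. DEFAULT_POINTS is a HYPOTHETICAL placeholder for every other item.
AGE_OVER_60_POINTS = 3
FAMILY_HISTORY_POINTS = 2
DEFAULT_POINTS = 1

# Binary items listed in the abstract (the string keys are assumptions).
RISK_FACTORS = frozenset({"diabetes", "ibd", "smoking", "alcohol", "tobacco"})
SYMPTOMS = frozenset({
    "rectal_bleeding", "abdominal_pain", "weight_loss",
    "bowel_habit_change", "anemia",
})


@dataclass
class Person:
    age: int
    affected_relatives: int = 0
    risk_factors: frozenset = frozenset()
    symptoms: frozenset = frozenset()


def risk_score(person: Person) -> int:
    """Sum the points for every item that applies to this person."""
    score = 0
    if person.age > 60:
        score += AGE_OVER_60_POINTS
    if person.affected_relatives > 2:
        score += FAMILY_HISTORY_POINTS
    # Unrecognized keys are ignored rather than scored.
    score += DEFAULT_POINTS * len(person.risk_factors & RISK_FACTORS)
    score += DEFAULT_POINTS * len(person.symptoms & SYMPTOMS)
    return score


example = Person(
    age=65,
    affected_relatives=3,
    risk_factors=frozenset({"diabetes"}),
    symptoms=frozenset({"rectal_bleeding", "anemia"}),
)
print(risk_score(example))  # 3 + 2 + 1 + 2 = 8 under the placeholder weights
```

Because the score is a plain sum, it can be computed by hand at the point of care. Cut-offs separating risk categories are not given in the abstract and are therefore left out of this sketch.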
Conclusion: A scoring system based on easily measurable variables offers a practical and efficient tool for widespread clinical use. Such systems can assist healthcare professionals in streamlining risk assessment and improving early identification of high-risk individuals.