Background: Nonalcoholic fatty liver disease (NAFLD) is a growing global health burden, closely linked to rising rates of obesity, diabetes, and metabolic syndrome. This study introduces a machine learning framework for diagnosing NAFLD, classifying disease severity, and stratifying risk using routine clinical data. The model is designed to improve early detection and risk prediction and to support evidence-based clinical decisions. Because it relies on predictive analytics over routinely collected variables, the approach is scalable and can flag high-risk patients for personalized intervention. This data-driven strategy aims to improve NAFLD management by extracting more value from standard healthcare records, with both clinical and operational benefits. Methods: This study examined 181 patients across NAFLD disease stages. The dataset was compiled from February 2010 to January 2019 at Eheim University Hospital and comprised general volunteers diagnosed with or without fatty liver by histopathological evaluation of liver biopsy samples. Predictive features were identified with forward selection and mutual information and then used in classification models (e.g., random forest) to assess steatosis severity. Explainable AI (XAI) techniques were applied to improve model interpretability. The combination of feature selection, machine learning, and XAI was designed to yield an accurate, clinically actionable evaluation of NAFLD severity. Results: The XGBoost classifier with forward feature selection achieved a classification accuracy of 69.23% ± 5.5% for steatosis severity. Interpretability analysis identified age, body mass index (BMI), high-density lipoprotein (HDL), low-density lipoprotein (LDL), hemoglobin A1c (HbA1c), and glutamate pyruvate transaminase (GPT) as the most influential variables across the three severity classes. In addition, GPT, age, BMI, HDL, HbA1c, LDL, triglycerides, and cholesterol were critical to model performance, underscoring their diagnostic relevance to NAFLD progression.
These findings suggest that these variables are useful for clinical assessment and risk stratification. Conclusion: This study developed a machine learning model for NAFLD diagnosis and severity stratification using routine clinical data. Readily accessible biomarkers predicted disease severity, which could help gastroenterologists initiate early intervention. Because it relies on routinely collected measurements, this approach is cost-effective and could reduce healthcare costs while improving outcomes through precision medicine. Implementing such predictive tools in clinical practice could optimize resource allocation and improve long-term NAFLD management. The framework supports timely diagnosis and targeted therapy, advancing patient-centered care.
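The Methods combine a mutual-information filter, forward feature selection, and a cross-validated severity classifier. The following is a minimal, standard-library-only Python sketch of that kind of pipeline, not the authors' implementation. All function names, the number of quantile bins, the fold count, and the stopping rule are hypothetical choices; a simple nearest-centroid classifier is swapped in for XGBoost so the example has no third-party dependencies; and accuracy is reported as mean ± standard deviation over folds on the assumption (not stated in the abstract) that the reported ±5.5% is of that kind. The demo data are synthetic, not patient records.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def quantile_bins(values, n_bins=4):
    """Discretize a continuous feature into quantile-bin indices 0..n_bins-1."""
    ranked = sorted(values)
    cuts = [ranked[len(ranked) * k // n_bins] for k in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]


def mutual_information(x, y):
    """I(X;Y) in nats for two equal-length sequences of discrete labels."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def rank_by_mutual_information(X, y, n_bins=4):
    """Feature indices sorted by decreasing MI with the severity label."""
    cols = [list(c) for c in zip(*X)]
    scores = [mutual_information(quantile_bins(c, n_bins), y) for c in cols]
    return sorted(range(len(cols)), key=lambda j: -scores[j])


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (NOT XGBoost): assign each row to the closest class mean."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(r, c):
        return sum((a - b) ** 2 for a, b in zip(r, c))

    return [min(centroids, key=lambda lab: sq_dist(row, centroids[lab]))
            for row in test_X]


def cv_accuracy(X, y, features, k=5, seed=0):
    """Mean and SD of k-fold accuracy using only the listed feature indices."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for i in range(k):
        test = idx[i::k]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        tr_X = [[X[j][f] for f in features] for j in train]
        te_X = [[X[j][f] for f in features] for j in test]
        # z-score with training-fold statistics only, to avoid leakage
        mu = [mean(c) for c in zip(*tr_X)]
        sd = [stdev(c) or 1.0 for c in zip(*tr_X)]

        def scale(rows):
            return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

        preds = nearest_centroid_predict(scale(tr_X), [y[j] for j in train], scale(te_X))
        scores.append(sum(p == y[j] for p, j in zip(preds, test)) / len(test))
    return mean(scores), stdev(scores)


def forward_select(X, y, max_features=5, k=5):
    """Greedy wrapper selection: add the feature that most improves CV accuracy."""
    selected, best = [], (0.0, 0.0)
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        trials = [(cv_accuracy(X, y, selected + [f], k), f) for f in remaining]
        (acc, sd), f = max(trials, key=lambda t: t[0][0])
        if acc <= best[0]:
            break  # stop once no remaining feature raises mean accuracy
        selected.append(f)
        remaining.remove(f)
        best = (acc, sd)
    return selected, best


if __name__ == "__main__":
    # Hypothetical synthetic data: feature 0 tracks a 3-level severity label,
    # feature 1 is pure noise.
    rng = random.Random(1)
    y = [i % 3 for i in range(90)]
    X = [[10 * lab + rng.gauss(0, 1), rng.gauss(0, 1)] for lab in y]
    print("MI ranking:", rank_by_mutual_information(X, y))
    feats, (acc, sd) = forward_select(X, y)
    print("selected:", feats, f"accuracy {acc:.2%} ± {sd:.2%}")
```

In practice one would likely replace the stand-in classifier with `xgboost.XGBClassifier` and could use scikit-learn's `mutual_info_classif` and `SequentialFeatureSelector(direction="forward")` for the two selection steps; the explicit loop above only makes the greedy add-one-feature-at-a-time logic visible.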