Introduction: The principal objective of this paper is to propose an automated classification system that uses machine learning techniques and data fusion to identify BI-RADS categories with high precision.
Methods: In this research, a novel method for automatically determining the BI-RADS category from text reports is proposed. First, a mammography vocabulary is used to select keywords from the medical text reports. Features are then extracted with Word2Vec and TF-IDF and combined with data from Hospital Information System (HIS) reports. Several classifiers, namely convolutional neural networks (CNN), multilayer perceptron (MLP), decision tree (DT), and k-nearest neighbor (k-NN), are compared in terms of accuracy with and without HIS data.
Results: The results confirm that the proposed approach, Word2Vec combined with TF-IDF and integrated with HIS data, markedly improves the accuracy of medical text classification. The Word2Vec output vectors were used for BI-RADS classification with and without TF-IDF, and with and without HIS integration, for the CNN, MLP, DT, and k-NN classifiers. Performance was compared using accuracy, precision (positive predictive value), sensitivity, negative predictive value, and F1 score. The best accuracy, 98.74%, was obtained by the proposed method with the MLP classifier; without HIS data, the same classifier reached 92.23%.
Conclusion: Combining Word2Vec with TF-IDF increases text classification accuracy, and adding patients' medical history, which is important in diagnosing disease, improves it further. Classification should therefore not rely on medical reports alone; other clinical information and patient history should also be used. Using HIS data alongside medical text reports can improve BI-RADS classification and benefit the diagnosis and treatment process.
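The abstract does not specify how the Word2Vec and TF-IDF features were combined or how HIS data were encoded. The sketch below illustrates the feature pipeline described in the Methods under stated assumptions. It assumes one common combination, a TF-IDF-weighted average of word vectors, followed by early fusion (simple concatenation) with numeric HIS features. The vocabulary, the 3-dimensional word vectors, the HIS fields (scaled age, family-history flag), and all function names are hypothetical stand-ins; a real system would train Word2Vec on the report corpus and feed the fused vectors to a classifier such as an MLP, which is not shown.

```python
import math
from collections import Counter

# Hypothetical toy mammography vocabulary; a real lexicon is much larger.
VOCABULARY = {"mass", "spiculated", "calcification", "benign", "normal"}

# Hypothetical 3-d word vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "mass": [0.9, 0.1, 0.0],
    "spiculated": [0.8, 0.0, 0.2],
    "calcification": [0.2, 0.7, 0.1],
    "benign": [0.0, 0.2, 0.9],
    "normal": [0.0, 0.1, 1.0],
}
DIM = 3


def select_keywords(report, vocabulary):
    """Keep only the report tokens that appear in the mammography vocabulary."""
    tokens = report.lower().replace(",", " ").replace(".", " ").split()
    return [t for t in tokens if t in vocabulary]


def idf_weights(docs):
    """Smoothed inverse document frequency for each term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_weighted_embedding(doc, idf, vectors, dim):
    """Average the word vectors in doc, each weighted by its TF-IDF score."""
    counts = Counter(doc)
    total = sum(counts.values())
    acc = [0.0] * dim
    weight_sum = 0.0
    for term, count in counts.items():
        vec = vectors.get(term)
        if vec is None:  # no embedding for this term: skip it
            continue
        w = (count / total) * idf.get(term, 0.0)
        acc = [a + w * v for a, v in zip(acc, vec)]
        weight_sum += w
    return [a / weight_sum for a in acc] if weight_sum > 0 else acc


def fuse_with_his(text_vec, his_features):
    """Early fusion (assumed): append numeric HIS features to the text vector."""
    return list(text_vec) + list(his_features)


reports = [
    "Spiculated mass, mass in upper outer quadrant.",
    "Benign calcification noted.",
    "Normal study.",
]
# Hypothetical HIS features per patient: [age / 100, family-history flag].
his = [[0.62, 1.0], [0.45, 0.0], [0.38, 0.0]]

docs = [select_keywords(r, VOCABULARY) for r in reports]
idf = idf_weights(docs)
features = [
    fuse_with_his(tfidf_weighted_embedding(d, idf, WORD_VECTORS, DIM), h)
    for d, h in zip(docs, his)
]
for f in features:
    print([round(x, 3) for x in f])
```

Each fused vector has the three text dimensions followed by the two HIS fields; dropping the `fuse_with_his` step corresponds to the "without HIS" setting compared in the abstract.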