Cecilia Besa

RL1, Publisher: Link>


Pablo Pino, Denis Parra, Claudio Lagos, Cecilia Besa


We address the task of automatically generating a medical report from chest X-rays. Many authors have proposed deep learning models to solve this task, but they focus mainly on improving NLP metrics, such as BLEU and CIDEr, which are not suitable to measure clinical correctness in clinical reports. In this work, we propose CNN-TRG, a Template-based Report Generation model that detects a set of abnormalities and verbalizes them via fixed sentences, which is much simpler than other state-of-the-art NLG methods and achieves better results in medical correctness metrics. We benchmark our model in the IU X-ray and MIMIC-CXR datasets against naive baselines as well as deep learning-based models, by employing the Chexpert labeler and MIRQI as clinical correctness evaluations, and NLP metrics as secondary evaluation. We also provide further evidence indicating that traditional NLP metrics are not suitable for this task by presenting their lack of robustness in multiple cases. We show that slightly altering a template-based model can increase NLP metrics considerably while maintaining high clinical performance. Our work contributes by a simple but effective approach for chest X-ray report generation, as well as by supporting a model evaluation focused primarily on clinical correctness metrics and secondarily on NLP metrics.

27 visualizaciones Ir a la publicación

2022, Publisher: ACM Computing Surveys, Link>


Sergio Uribe, Daniel Capurro, Pablo Pino, Pablo Messina, Marcelo Andia, Denis Parra, Cristian Tejos, Claudia Prieto, Cecilia Besa, Álvaro Soto


Every year physicians face an increasing demand of image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.

29 visualizaciones Ir a la publicación