A Corpus for Evidence Based Medicine Summarisation

Diego Mollá Aliod, María Elena Santiago Martínez

Abstract

Background

Automated text summarisers that find the best clinical evidence reported in collections of medical literature are of potential benefit for the practice of Evidence Based Medicine (EBM). Research and development of text summarisers for EBM, however, is impeded by the lack of corpora to train and test such systems.

Aims

To produce a corpus for research in EBM summarisation.

Method

We sourced the “Clinical Inquiries” section of the Journal of Family Practice (JFP) and obtained a sizeable sample of questions and evidence-based summaries. We further processed the summaries by combining automated techniques, human annotation, and crowdsourcing to identify the PubMed IDs of the references.

Results

The corpus has 456 questions, 1,396 answer components, 3,036 answer justifications, and 2,908 references.

Conclusion

The corpus is now available to the research community at http://sourceforge.net/projects/ebmsumcorpus.
