A Corpus for Evidence Based Medicine Summarisation

Diego Mollá Aliod, María Elena Santiago Martínez



Automated text summarisers that find the best clinical evidence reported in collections of medical literature are of potential benefit for the practice of Evidence Based Medicine (EBM). Research and development of text summarisers for EBM, however, is impeded by the lack of corpora to train and test such systems.


The aim of this work is to produce a corpus for research in EBM summarisation.


We drew on the “Clinical Inquiries” section of the Journal of Family Practice (JFP) to obtain a sizeable sample of questions and evidence-based summaries. We then processed the summaries with a combination of automated techniques, human annotation, and crowdsourcing to identify the PubMed IDs of the references.


The corpus has 456 questions, 1,396 answer components, 3,036 answer justifications, and 2,908 references.
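
To give a rough sense of the corpus shape, the reported totals can be turned into per-question averages. The sketch below is only arithmetic on the four counts above; the variable and function names are illustrative, and the resulting ratios are not figures stated by the authors. In particular, it assumes nothing about whether the same reference is counted more than once across questions.

```python
# Totals as reported in the abstract.
counts = {
    "questions": 456,
    "answer_components": 1396,
    "answer_justifications": 3036,
    "references": 2908,
}


def per_question(totals):
    """Return the mean number of each item type per question."""
    n_questions = totals["questions"]
    return {
        name: value / n_questions
        for name, value in totals.items()
        if name != "questions"
    }


averages = per_question(counts)
for name, value in averages.items():
    print(f"{name}: {value:.2f} per question")
```

On these totals, each question has on average about three answer components and between six and seven justifications and references.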


The corpus is now available for the research community at http://sourceforge.net/projects/ebmsumcorpus.
