Extractive Summarisation of Medical Documents

Abeed Sarker, Diego Molla, Cecile Paris

Abstract

Background

Evidence-Based Medicine (EBM) practice requires practitioners to extract evidence from published medical research when answering clinical queries. Because this process is time-consuming, there is strong motivation for systems that automatically summarise medical documents and help practitioners find relevant information.

Aim

The aim of this work is to propose an automatic, query-focused extractive summarisation approach that selects informative sentences from medical documents.

Method

We use a corpus specifically designed for summarisation in the EBM domain. Approximately half of the corpus is used to derive statistics associated with the best possible extractive summaries, taking into account factors such as sentence position, sentence length, sentence content, and the type of query posed. Using these statistics, we evaluate our approach on a separate set. The quality of the generated summaries is evaluated automatically with ROUGE, a widely used tool for evaluating automatic summaries.
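The abstract does not say how the statistics are computed or combined. The following is only a minimal sketch under our own assumptions. It learns one such statistic from a training split: how often sentences in each relative-position bucket appear in the best extractive (oracle) summaries. It then uses that statistic to rank sentences in a held-out document, and scores the result with a simplified ROUGE-1 recall. The function names, the five-bucket position scheme and the whitespace tokenisation are hypothetical illustrations, not the authors' method. The sketch ignores the length, content and query-type factors, and the ROUGE tool itself supports options (e.g. stemming) that are not modelled here.

```python
from collections import Counter

def position_bucket(index, n_sentences, n_buckets=5):
    """Map a sentence index to one of n_buckets relative-position bins (hypothetical scheme)."""
    return min(n_buckets - 1, index * n_buckets // n_sentences)

def learn_position_prior(training_docs, n_buckets=5):
    """Estimate P(sentence is in the oracle summary | position bucket).

    training_docs: list of (sentences, oracle_indices) pairs, where oracle_indices
    is the set of sentence indices forming the best extractive summary.
    """
    seen, selected = Counter(), Counter()
    for sentences, oracle in training_docs:
        for i in range(len(sentences)):
            b = position_bucket(i, len(sentences), n_buckets)
            seen[b] += 1
            if i in oracle:
                selected[b] += 1
    # Counter returns 0 for buckets that never contained an oracle sentence.
    return {b: selected[b] / seen[b] for b in seen}

def summarise(sentences, prior, k=3, n_buckets=5):
    """Pick the k sentences with the highest positional prior, in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: prior.get(position_bucket(i, len(sentences), n_buckets), 0.0),
        reverse=True,  # Python's sort stays stable, so ties keep document order
    )
    return [sentences[i] for i in sorted(ranked[:k])]

def rouge_1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / max(1, sum(ref.values()))

# Toy usage: two training documents with their oracle sentence indices.
train = [
    (["s0", "s1", "s2", "s3", "s4"], {0, 4}),
    (["t0", "t1", "t2", "t3", "t4"], {0, 1}),
]
prior = learn_position_prior(train)
print(prior)  # {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0, 4: 0.5}
print(summarise(["a", "b", "c", "d", "e"], prior, k=2))  # ['a', 'b']
print(rouge_1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```

Splitting the data so that statistics are learned on one part and evaluated on another, as the abstract describes, keeps the positional prior from being fitted to the documents it is scored on.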

Results

Our summarisation approach outperforms all baselines (ROUGE score of the best baseline: 0.1594; of our approach: 0.1653). Further improvements are achieved when query types are taken into account.
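The size of the margin over the best baseline can be made explicit from the two reported scores. This is simple arithmetic and does not by itself indicate statistical significance:

```latex
\Delta_{\text{abs}} = 0.1653 - 0.1594 = 0.0059,
\qquad
\Delta_{\text{rel}} = \frac{0.0059}{0.1594} \approx 0.037 \;(\approx 3.7\%)
```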

Conclusion

The quality of extractive summarisation in the medical domain can be significantly improved by incorporating domain knowledge and statistics derived from a specialised corpus. Such techniques can therefore be applied for content selection in end-to-end summarisation systems.
