Algorithmic Generation of Grammar Simplification Rules Using Large Corpora

AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:72-81. eCollection 2019.

Abstract

There is often a mismatch between patients' literacy levels and the difficulty of educational materials. In response, we are developing an online medical text simplification editor. In this paper, we describe generating grammar simplification rules from a large parallel corpus (N=141,500) containing original sentences and their simplified variants. We algorithmically identified grammatical transformations between sentences (N=26,600) and used distributional characteristics in two corpora to select the transformations with the broadest application and the least ambiguity. This resulted in a top set of 146 rules. Two experts evaluated 20 representative rules reflecting 4 characteristics (long/short and weak/strong), each with 5 example sentences. Generally, we found that the rules are helpful for guiding simplification. On a 5-point Likert scale (5=best), stronger rules scored higher for ease of application (4.11), overall helpfulness (4.40), and usefulness of examples (4.05). Rule length did not affect the expert scores. The grammar simplification rules are being integrated into our text editor.
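
The selection step described above (prefer transformations that apply broadly and with little ambiguity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the pattern representation or the scoring, so the part-of-speech-style patterns, the function name `rank_transformations`, the `min_support` threshold, and the "consistency" measure below are all assumptions made for illustration, and the sketch uses a single toy list rather than two corpora.

```python
from collections import Counter

def rank_transformations(pairs, min_support=2):
    """Rank (source_pattern, target_pattern) pairs.

    Hypothetical scoring, assumed for illustration only:
      support     = how often the exact source -> target pair occurs
      consistency = share of the source pattern's occurrences that map
                    to this target (1.0 means the source is unambiguous)
    """
    pair_counts = Counter(pairs)

    # Total occurrences of each source pattern, across all of its targets.
    source_counts = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_counts[src] += n

    ranked = []
    for (src, tgt), n in pair_counts.items():
        if n < min_support:
            continue  # too rare to count as broadly applicable
        ranked.append((src, tgt, n, n / source_counts[src]))

    # Least ambiguous first, then most frequent.
    ranked.sort(key=lambda r: (-r[3], -r[2]))
    return ranked

# Toy data: invented patterns, not taken from the paper's corpus.
toy_pairs = (
    [("VBN by NP", "NP VBD")] * 3
    + [("VBN by NP", "NP VBZ")]
    + [("NP , which VP ,", "NP VP")] * 2
    + [("RB , S", "S")]
)

for src, tgt, support, consistency in rank_transformations(toy_pairs):
    print(f"{src!r} -> {tgt!r}: support={support}, consistency={consistency:.2f}")
```

In this toy run, the pattern seen only once is dropped by the support threshold, and the relative-clause pattern ranks above the passive pattern because its source always maps to the same target, even though the passive pair is more frequent.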