Active Learning of Predefined Models for Information Extraction: Selecting Regular Expressions from Examples


We consider the problem of constructing a regular expression for information extraction automatically, based only on examples of the desired extraction behavior. We describe an active learning framework that is not aimed at synthesizing a solution from scratch, but rather is aimed at selecting a solution from a set of more than 3000 solutions that have already proven useful in a broad range of practical applications. The user provides only one example of desired extraction and then interactively annotates text snippets selected by the system. The system constructs such queries based on uncertainty sampling, i.e., by selecting the snippet on which it is most uncertain at each learning step. The resulting framework allows solving many practical extraction problems quickly and simply.

Fuzzy Systems and Data Mining