Journal

Designing Automatically a Representation for Grammatical Evolution

A long-standing problem in Evolutionary Computation is how to choose an appropriate representation for the solutions. In this work we investigate the feasibility of synthesizing a representation automatically, for the large class of problems …

How Phishing Pages Look Like?

Recent phishing campaigns increasingly target specific, small populations of users and have increasingly short life spans. There is thus an urgent need to develop defense mechanisms that do not rely on any form of blacklisting or …

HearthBot: An Autonomous Agent Based on Fuzzy ART Adaptive Neural Networks for the Digital Collectible Card Game HearthStone

Digital collectible card games such as HearthStone, which are partially observable games based on alternating turns, have been the most played card games in recent years. The main challenge is the creation of strategies capable of subduing the enemy's …

Active Learning of Regular Expressions for Entity Extraction

We consider the automatic synthesis of an entity extractor, in the form of a regular expression, from examples of the desired extractions in an unstructured text stream. This is a long-standing problem for which many different approaches have been …

Can A Machine Replace Humans In Building Regular Expressions? A Case Study

Regular expressions are routinely used in a variety of different application domains. Building a regular expression involves a considerable amount of skill, expertise and creativity. In this work we investigate whether a machine may surrogate these …

Inference of Regular Expressions for Text Extraction from Examples

A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this …

Predicting the Effectiveness of Pattern-based Entity Extractor Inference

An essential component of any workflow leveraging digital data consists in the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine generates an entity extractor …

Regex-based Entity Extraction with Active Learning and Genetic Programming

We consider the long-standing problem of the automatic generation of regular expressions for text extraction, based solely on examples of the desired behavior. We investigate several active learning approaches in which the user annotates only one …

Data Quality Challenge: Toward a tool for string processing by examples

Many data-related activities at organizations of all sizes are concerned with low-level string processing, such as format transformation and validation, data cleaning, substring extraction and classification, and so on. Problems of this sort occur …

Automatic Synthesis of Regular Expressions from Examples

We propose a system for the automatic generation of regular expressions for text-extraction tasks. The user describes the desired task only by means of a set of labeled examples. The generated regexes may be used with common engines such as those …