An important topic in artificial intelligence concerns the ability of computer programs to learn from examples. When looking at human learning, one can observe a lot of different learning tasks: learning to walk, learning to speak, learning to recognize words and, more generally, several forms of behavior learning. Formalizing all these concepts is far from being trivial and multiple scientific domains, which focus on different aspects of the problem, have naturally emerged over the past sixty years.

Supervised Learning focuses on learning a function given a set of examples of inputs and associated outputs. Most of the work in this field dealt with continuous outputs (i.e. regression) or discrete outputs (i.e. classification). During the last decade, statistical machine learning has reached a high level of maturity to solve these problems with efficient and theoretically well-founded algorithms. This success story has led researchers to move on more advanced learning problems.

Classification and regression respectively consist in predicting simple discrete or continuous outputs. In many different fields such as biology, natural language processing, image processing or chemistry, data may be naturally described as structured objects like sequences, trees, lattices or graphs. This data, which does not fit in the classical learning frameworks, has recently witnessed a surge of interest in the machine learning community. Supervised learning problem where the outputs are structured objects is known as learning with structured outputs or structured prediction (SP). SP has a lot of real-world applications, including prediction of protein structure in bio-informatics, image understanding, speech processing, handwriting recognition and several natural language processing tasks such as part-of-speech tagging, named entity extraction, sentence parsing or automatic translation.

The two major components of any structured prediction system are called learning and inference. Given an input, inference is the process of predicting an output, i.e. selecting the best output among the set of possible outputs. Given the set of training examples, learning aims at finding the best way to perform inference in order to maximize the quality of predictions.

Most existing approaches for structured prediction rely on the idea of learning a compatibility function between inputs and outputs. Given such a compatibility function, predictions can be performed by searching the most output which maximizes its compatibility with a given input. For example, given a sequence of handwritten letters, prediction aims at finding the sequence of recognized characters that is the most compatible with the input. Inference thus involves the resolution of a combinatorial search problem in the space of possible outputs. This search problem can be solved efficiently for simple data structures such as sequences, thanks to dynamic programming algorithms. However, with more complex data, this problem quickly becomes intractable, which severely limits the applicability of the corresponding SP approaches.

Instead of modeling what a good output looks like and then searching for the best output, a much simpler approach can be adopted by directly modeling how to build the good output. In this approach, called incremental structured prediction, the predictions are built incrementally: components are added one at a time to a current partial output. Inference then consists in exploring a prediction space defined by states (partial outputs) and actions (choosing a component to be added to the current state) until a complete output is built. For example, in handwritten word recognition, partial outputs are partially recognized words and actions may consist in adding one recognized letter to the partial output. Inference then consists in selecting actions until reaching a fully recognized word.

My PhD is essentially composed of generalizations of previous work on incremental SP according to three directions:

  • Generality of inference procedures While previous work focused on learning to select partial outputs in a prediction space, this thesis focuses on the more general problem of learning to perform learning-based decision in a general inference procedure. We consider inference procedure that may perform any sequence of operations on any number of variables that may be relevant to the process of constructing a prediction. In this thesis, we formalize and experiment inference procedures that are much more complicated than those given in the literature.
  • Generality of supervision assumptions Previous work in structured prediction relied on the assumptions that the decision performed during inference can be supervised locally and immediately. These are strong assumptions that are not verified in multiple applications. In order to alleviate this assumption, this thesis introduces reward-based supervision as a new mean of learning inference procedures. Reward-based supervision leads to a new family of methods for structured prediction: approximate reinforcement learning algorithms. These general-purpose algorithms make it possible to deal with a strictly larger range of structured prediction applications than previous works.
  • Generality of problems Although most of this thesis is about structured prediction, the learning approaches that we develop can be relevant to other fields. As an example, this thesis includes a prospective work on the learning-for-search problem, which consists in using machine learning to improve the efficiency of combinatorial search algorithms.

Here are some keywords of my research interests:

  • Manipulation of sequences, e.g. sequences of words, sequences of written characters, ...
  • Manipulation of trees, e.g. XML and HTML trees.
  • Dependency Parsing structures (natural language analysis).
  • Generic Feature Generators, for transforming structures into sparse description vectors.
  • Very large Markov Decision Processes.
  • Structured/Relationnal State and Actions.
  • Approximated Reinforcement Learning algorithms.
  • A user interface for navigating in Markov Decision Processes, Representations and Features.
  • Incremental Structure Prediction algorithms, e.g. LaSO and Searn.
  • Learning for search.
  • Stacked Learning.
  • Applications: Sequence Labeling, Dependency Parsing, XML Structure Mapping, simple Reinforcement Learning applications, ...