Explanations of Optimal Policies for Markov Decision Processes (M/F)

Offer details

General information

Host organization

The CEA is a major research organization, serving citizens, the economy and the State.

It provides concrete solutions to their needs in four main areas: energy transition, digital transition, technologies for the medicine of the future, and defense and security, all built on a foundation of fundamental research. For more than 75 years, the CEA has been committed to serving the scientific, technological and industrial sovereignty of France and Europe, for a present and a future that are better controlled and safer.

Located at the heart of regions equipped with very large research infrastructures, the CEA has a wide range of academic and industrial partners in France, in Europe and internationally.

The CEA's 20,000 employees share three fundamental values:

• A sense of responsibility
• Cooperation
• Curiosity



Unit description

Among other activities, the research teams of the CEA LIST Software Safety and Security Laboratory (LSL)
design and implement automated analyses in order to make software systems more trustworthy,
to exhaustively detect their vulnerabilities, to guarantee conformity to their specifications,
and to accelerate their certification.
The lab recently extended its activities to the topic of AI trustworthiness
and created a new research group: AISER (Artificial Intelligence Safety, Explainability and Robustness).

Position description


Mathematics, scientific information, software



Offer title

Explanations of Optimal Policies for Markov Decision Processes (M/F)

Internship topic

Explanations of Optimal Policies for Markov Decision Processes

Contract duration (in months)

5 to 6

Offer description

Markov Decision Processes (MDPs) are a class of models used by intelligent agents to reason about their environment. An MDP describes the environment and the (stochastic) effects of the agent's actions. Given an MDP, an intelligent agent can determine its optimal course of action (also known as a policy) in order to reach its goal with minimal cost or maximal reward.
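To make the notion of an optimal policy concrete, here is a minimal sketch of value iteration, one standard way to solve a small MDP. The toy model, its numbers, the discount factor and all names (`value_iteration`, `q_value`, `JUMP`, `STEP_BACK`) are illustrative assumptions, not part of the internship material. The agent pays a cost of 1 per step until it reaches the goal; jumping from the start state rarely succeeds, whereas jumping after a step back usually does.

```python
# Hypothetical toy MDP (all numbers are assumptions chosen for illustration).
# States: 0 = start, 1 = one step back, 2 = goal (absorbing).
JUMP, STEP_BACK = 0, 1

# P[s][a] is a list of (probability, next_state) pairs.
P = [
    [[(0.3, 2), (0.7, 0)], [(1.0, 1)]],  # start: risky jump, or step back
    [[(0.9, 2), (0.1, 0)], [(1.0, 1)]],  # behind: jump with a run-up, or wait
    [[(1.0, 2)], [(1.0, 2)]],            # goal: nothing changes any more
]
# R[s][a] is the immediate reward: -1 per step until the goal is reached.
R = [[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0]]


def q_value(P, R, V, s, a, gamma):
    """Expected discounted return of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])


def value_iteration(P, R, gamma=0.95, tol=1e-10, max_iter=100_000):
    """Return (V, policy): optimal state values and a greedy optimal policy."""
    states = range(len(P))
    V = [0.0] * len(P)
    for _ in range(max_iter):
        # Bellman optimality update applied to every state.
        V_new = [max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
                 for s in states]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (ties go to action 0).
    policy = [max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
              for s in states]
    return V, policy


V, policy = value_iteration(P, R)
print("optimal action in the start state:",
      "step back" if policy[0] == STEP_BACK else "jump")
```

In this toy model the optimal action in the start state is to step back, which is exactly the kind of decision that may look illogical to a user at first sight.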

The AI may propose actions that appear illogical to a human user in the current state. The apparent irrationality of such a decision might stem from an error in the MDP, from a bug in the AI, or from a clever strategy that the user did not anticipate (such as taking a step back in order to make a better jump). To distinguish between these scenarios, the AI needs to provide explanations that justify or shed light on its decisions.

There is no single definition of what an explanation is. In this work, we are interested in "contrastive explanations" that answer the question:

"What is the minimal change to the MDP that would modify the optimal policy?"

These answers allow the human user to better understand what aspects of the environment affect the decision.
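As a rough illustration of such a contrastive query, the sketch below restricts the allowed change to a single parameter of a toy MDP: the success probability of jumping directly from the start state. It then searches by bisection for the smallest value of that parameter at which the optimal action in the start state switches from stepping back to jumping. The toy model, the choice of parameter, the bisection approach and all names (`toy_mdp`, `solve`, `smallest_flipping_probability`) are assumptions made for this example; the general problem, where any part of the MDP may change, is the subject of the internship.

```python
JUMP, STEP_BACK = 0, 1


def toy_mdp(p_direct):
    """Hypothetical 3-state MDP; p_direct = success chance of jumping from the start."""
    P = [
        [[(p_direct, 2), (1.0 - p_direct, 0)], [(1.0, 1)]],  # state 0: start
        [[(0.9, 2), (0.1, 0)], [(1.0, 1)]],                  # state 1: one step back
        [[(1.0, 2)], [(1.0, 2)]],                            # state 2: goal (absorbing)
    ]
    R = [[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0]]             # cost of 1 per step
    return P, R


def solve(P, R, gamma=0.95, tol=1e-10, max_iter=100_000):
    """Value iteration; returns (V, greedy policy)."""
    def q(V, s, a):
        return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])

    states = range(len(P))
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [max(q(V, s, a) for a in range(len(P[s]))) for s in states]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    policy = [max(range(len(P[s])), key=lambda a: q(V, s, a)) for s in states]
    return V, policy


def smallest_flipping_probability(lo=0.3, hi=1.0, eps=1e-6):
    """Bisection on p_direct.

    Assumes stepping back is optimal at lo, jumping is optimal at hi, and that
    a higher p_direct never makes the direct jump less attractive.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        _, policy = solve(*toy_mdp(mid))
        if policy[0] == JUMP:
            hi = mid  # the policy has already flipped: look for a smaller change
        else:
            lo = mid  # stepping back is still optimal: a larger change is needed
    return hi


p_star = smallest_flipping_probability()
print(f"jumping directly becomes optimal once its success chance reaches about {p_star:.3f}")
```

An answer of this form tells the user that stepping back is recommended only because the direct jump is believed to be unreliable, and how far that belief would have to move before the recommendation changes. Bisection works here only because a single monotone parameter is varied; with many parameters the search space grows quickly.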


During this internship, you will develop methods for contrastive explanations of MDPs. You will determine how user queries about policies can be formalised, develop algorithms to solve the resulting problems, and propose heuristics to tame their computational complexity. We expect that a Branch & Bound procedure will be required, and that clever evaluation procedures will be necessary to avoid exploring exponentially large search spaces.

In practice, the internship will be split into several subtasks as follows:

  • Review the existing scientific literature on explanations for MDPs.
  • Formalise (i.e. translate into mathematical language) natural queries that a user might want answered.
  • Build benchmarks for testing the algorithms.
  • Propose algorithms and test their scalability.

Job location

France, Auvergne-Rhône-Alpes, Isère (38)



Candidate criteria

Degree being prepared

Bac+5 - Engineering school degree (Master's level)

Recommended background

Computer Science

Possibility of continuing with a PhD
