Apprenticeship - Computer science, data science, artificial intelligence, or a related field (M/F)
As research and innovation in low-carbon processes and sustainable solutions accelerate, IFPEN plays a major role as a committed player in the threefold ecological, energy and digital transition, as an institute open to society, and as a trusted third party for public authorities.
In this context, the Physics and Analysis Division produces a large volume of analyses and characterizations in several fields, including various types of spectroscopy and microscopy. These data, and the specialists' analyses of them, are stored in a wide variety of forms, such as databases and Microsoft Office files. Over the years, numerous experiments have generated a large volume of heterogeneous documents (reports, notes, meeting summaries) containing valuable information on protocols, parameters, and results. These archives remain difficult to exploit for understanding experimental dynamics and contextualizing new studies.
To address this challenge, the Digital and Science Technology team aims to develop a hybrid AI architecture combining language models and deep learning to turn these archives into a tool for analysis and trend identification. The project has two objectives:
1) Structuring past experiments
Use large language models (LLMs) to extract and organize key information from historical documents and provide semantic search and navigation tools to efficiently explore experimental history.
2) Analyzing trends with AI
Leverage deep learning and AI techniques (embeddings, knowledge graphs, Graph Neural Networks, clustering) to analyze trends and dynamics, and to relate historical experiments to new or ongoing studies.
This project combines natural language processing, deep learning, and knowledge graphs to structure and analyze experimental history, providing insight into trends and into the context of new experiments. You will work closely with data scientists and with researchers in analysis, in a stimulating environment.
Techniques applied during the apprenticeship
- Large Language Models (LLMs) for information extraction and structuring from heterogeneous documents
- Hybrid representations combining semantic embeddings and knowledge graphs
- Deep learning methods for similarity analysis (embedding-based similarity, metric learning), clustering, and trend analysis
- Graph-based learning approaches (e.g., Graph Neural Networks) to model relationships and experimental dynamics
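To give candidates a concrete idea of the embedding-based similarity mentioned in the list above, here is a minimal sketch, not the project's actual method. It assumes each archived document has already been converted into an embedding vector; the report names, the three-dimensional vectors, and the helper `rank_by_similarity` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, archive):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in archive.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings of archived reports (toy values, not real data).
archive = {
    "report_A": [0.9, 0.1, 0.0],
    "report_B": [0.1, 0.8, 0.3],
    "report_C": [0.7, 0.3, 0.1],
}

# Hypothetical embedding of a new study to place in its historical context.
new_study = [0.8, 0.2, 0.0]

ranking = rank_by_similarity(new_study, archive)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice the vectors would come from a language model and have hundreds of dimensions, and the ranking would be one input among others (knowledge graph links, clustering) for relating a new study to past experiments.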
Apprentice engineer in computer science, data science, artificial intelligence, or a related field