PhD on Deep Reinforcement Learning for Online Combinatorial Optimization

Research / Academic

This project contributes both to developing novel algorithms and to applying them. Online decision-making, i.e., repeated decision-making as uncertainty is gradually revealed, is becoming increasingly common, yet our understanding of suitable algorithms is still in its infancy. We aim to develop novel online learning algorithms, e.g., based on deep reinforcement learning, that leverage real-time data to improve decision-making. A qualified candidate will develop new deep reinforcement learning algorithms and apply them to a variety of use cases.

A qualified candidate will tackle all stages of algorithm development, ranging from modeling the online counterparts of well-known combinatorial problems in a way that permits efficient problem solving (including uncertainty, human behavior, and reward functions) to algorithm design (architecture of the neural networks, exploration-exploitation tradeoff, updating of the online policy, etc.).

We consider several use cases for evaluating the performance of the developed algorithms:

  • Online stacking: 3D objects are placed into or onto a load carrier using a robot arm. When placing the first objects, the shape, weight, and other characteristics of later items are not yet known. The goal is to place each item such that a stable and dense stack of objects emerges.
  • Online rebalancing: To efficiently operate bike-sharing systems, operators must 'rebalance' bikes before demand is realized, deciding on how many vehicles to place at which station. Customer behavior further complicates this decision-making process since customers walk to nearby stations, use alternative modes of transport, and travel in groups.
  • Online fleet charging: With the increasing electrification of truck fleets (also in response to tightening EU regulations), logistics service providers (LSPs) must decide when and where to charge their trucks. Compared to almost any other application of energy usage, LSPs have the unique opportunity to balance energy usage across larger areas. Ultimately, adapting the charging decisions will also impact the routing decisions, turning currently static decision-making into its online counterpart.

TU/e and the team

The PhD students will be supervised by dr. Layla Martin, dr. Mehrdad Mohammadi and dr. Willem van Jaarsveld. Mehrdad and Layla are both assistant professors focusing on optimization and machine learning algorithms in Industry 4.0 and operations research for transport and logistics, respectively, and Willem is an associate professor for machine learning in operations management.

The project, the supervisors, and eventually also the PhD students are embedded in TU/e's Operations, Planning, Accounting, and Control (OPAC) group. OPAC uses methods from operations research and operations management on a wide variety of problems, and currently hosts around 50 PhD students from various backgrounds.

The supervisors are actively involved in the European Supply Chain Forum, a leading platform for collaboration between industry and academia on supply chain challenges. The project draws upon long-standing collaborations with industry. The PhD student will collaborate with several companies throughout the project, e.g., with ASML, Den Hartogh, Ewals, and Vanderlande.

Target profile

We are looking for PhD students with a keen interest in using artificial intelligence for better decision-making. Candidates should have prior experience in optimization, operations research, decision support, and/or artificial intelligence, for example demonstrated by a suitable master's degree (e.g., Computer Science, Industrial Engineering, Econometrics, Applied Mathematics). Ideally, applicants have already finished their master's degree or expect to finish soon, to allow a project start in the upcoming academic year.

Candidates must have experience in programming, using languages such as C++ or Java. Additionally, experience in Python is beneficial.

Prior experience in implementing artificial intelligence techniques (such as deep reinforcement learning) for decision-making is considered a plus. Ideally, candidates have experience with mathematical modeling and implementation in CPLEX or Gurobi, implementing efficient heuristics, and/or machine learning.


  • A Master's degree in Computer Science, Operations Research, Industrial Engineering, Econometrics, Applied Mathematics, or a related field
  • Strong analytical and mathematical skills
  • Programming experience in object-oriented languages such as C++ or Java
  • Excellent verbal and written communication skills in English

Salary and Benefits:

A meaningful job in a dynamic and ambitious university, in an interdisciplinary setting and within an international network. You will work on a beautiful, green campus within walking distance of the central train station. In addition, we offer you:

  • Full-time employment for four years, with an intermediate evaluation (go/no-go) after nine months. You will spend 10% of your employment on teaching tasks.
  • Salary and benefits (such as a pension scheme, paid pregnancy and maternity leave, partially paid parental leave) in accordance with the Collective Labour Agreement for Dutch Universities, scale P (min. € 2770,-, max. € 3539,-).
  • A year-end bonus of 8.3% and annual vacation pay of 8%.
  • High-quality training programs and other support to grow into a self-aware, autonomous scientific researcher. At TU/e we challenge you to take charge of your own learning process.
  • An excellent technical infrastructure, on-campus children's day care and sports facilities.
  • An allowance for commuting, working from home and internet costs.
  • A Staff Immigration Team and a tax compensation scheme (the 30% facility) for international candidates.

Work Hours:

38 hours per week


De Rondom 70