March 8, 2026
AI in Energy Management: why we believe rule-based EMS will give way to Deep Reinforcement Learning
Reinout de Jongh

The energy transition has fundamentally changed the market. Electricity prices are now driven less by demand than by the weather. Sunny afternoons cause surpluses and grid congestion, while sunsets lead to scarcity and price spikes. For companies with assets like solar panels, batteries, and charging stations, this volatility offers enormous trading opportunities. However, these opportunities also come with risks to the grid connection and to operational reliability. In this complex, data-heavy environment, traditional 'rule-based' software is reaching its limits.

Why rule-based EMS falls short in a dynamic energy market

Until recently, energy management was relatively straightforward. A traditional (rule-based) Energy Management System (EMS) operates based on simple, human-programmed 'if-then' rules. For example: “If the electricity price is high, then discharge the battery” or “If the solar panels are generating, then turn on the charging stations.” This is transparent and works excellently for simple processes.
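To make this concrete, here is a minimal Python sketch of the two example rules. It is an illustration only: the `SiteState` fields, the price threshold and the action labels are assumptions made for this example, not taken from any real EMS.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    price_eur_mwh: float  # current electricity price
    solar_kw: float       # current solar generation


# Hypothetical threshold; in practice this would be set by hand per site.
HIGH_PRICE_EUR_MWH = 150.0


def rule_based_actions(state: SiteState) -> list[str]:
    """Return the actions triggered by fixed, human-written if-then rules."""
    actions = []
    # Rule 1: if the electricity price is high, discharge the battery.
    if state.price_eur_mwh >= HIGH_PRICE_EUR_MWH:
        actions.append("discharge_battery")
    # Rule 2: if the solar panels are generating, turn on the charging stations.
    if state.solar_kw > 0:
        actions.append("enable_chargers")
    return actions


print(rule_based_actions(SiteState(price_eur_mwh=200.0, solar_kw=35.0)))
# ['discharge_battery', 'enable_chargers']
```

Every rule is easy to read and audit, which is the strength of this approach. The rules do not interact, though: nothing here weighs a high price against, say, a fleet that must be charged within the hour.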

However, we're seeing that the reality behind the meter is now much more complex. What should the software do if the sun is shining and the electricity price is negative, but heavy machinery unexpectedly starts up and the vehicle fleet needs to be fully charged in an hour?

For every conceivable exception or weather change, a programmer must manually add a new rule. In practice, we see that the software encounters three fundamental limitations here:

  • Safety margins reduce returns: The real world is unpredictable. Because it's impossible to write a perfect rule for every unique combination of circumstances (prices, weather, local consumption), designers are forced to work with broad safety margins. For example, the system always keeps a large portion of the battery charged as a precaution. This is good for operational reliability, but often completely unnecessary, meaning the battery misses out on potential revenue.
  • No adaptive learning capability: The software only does what a human has pre-programmed. It doesn't independently discover new correlations between weather forecasts, consumption peaks, and market movements. And that's a problem, because the market is changing rapidly and could look completely different tomorrow due to decisions by, for example, TenneT or the government.
  • Unmanageable operational maintenance burden: Every business location is unique in terms of grid connection, battery capacity, and consumption profile. A rule-based system requires manual calibration of rules for each new installation. What works at location A is not transferable to location B. This makes scaling inefficient and expensive.
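The first limitation, safety margins reducing returns, can be illustrated with simple arithmetic. The sketch below uses purely hypothetical figures (battery size, reserve fraction and price spread are all assumptions) and ignores round-trip losses, degradation and grid fees. It gives an upper bound on what is missed in a cycle where the reserve turned out not to be needed.

```python
def missed_revenue_per_cycle(capacity_kwh, reserve_fraction, spread_eur_per_kwh):
    """Revenue forgone in one charge/discharge cycle because of a fixed reserve.

    Simplification: ignores round-trip losses, degradation and grid fees.
    """
    reserved_kwh = capacity_kwh * reserve_fraction
    return reserved_kwh * spread_eur_per_kwh


# Hypothetical site: 1,000 kWh battery, 30% always held back,
# 0.10 EUR/kWh difference between the buy and the sell price.
missed = missed_revenue_per_cycle(1000, 0.30, 0.10)
print(f"Missed per cycle: {missed:.2f} EUR")  # Missed per cycle: 30.00 EUR
```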

These limitations led us to an important insight: to achieve ultimate scalability – and guarantee the best performance for our customers – we need to look beyond traditional predictions and optimizations. This led us to (Deep) Reinforcement Learning. Many people immediately label this 'AI', but we are still cautious about that.

Integrated Control with Deep Reinforcement Learning (DRL)

Deep Reinforcement Learning (DRL) is interesting to us because it addresses the shortcomings of rule-based systems through a completely different approach: it learns by interacting with its environment. Instead of following rigid rules, a virtual 'agent' makes decisions, observes the outcome, and receives a reward for good results – such as higher trading profits or better peak shaving. Through trial-and-error in a safe, virtual simulation, the agent independently discovers which strategies work best in the long term.

A DRL algorithm rapidly discovers complex relationships, including those hidden from the human eye. This is what makes this technology so powerful.

We train the model in a simulation of reality. During this training process, the agent receives a reward for making a profit and a penalty for incurring a loss. When determining the next action, the algorithm takes in all relevant real-time data, such as weather and market forecasts. These forecasts always contain uncertainties. While a rigid rule-based system often struggles with such unpredictability, a DRL agent learns in the simulation to account for these margins of uncertainty. For example, the agent learns that a specific combination of solar intensity, anticipated price peaks, and the margin of error in the weather report makes a particular action worthwhile. By running this simulation millions of times, the algorithm continuously improves its ability to recognize data patterns and discovers which actions lead to the best results.
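The training loop described above can be sketched in a few dozen lines of Python. Note the simplifications: this toy uses tabular Q-learning, a much simpler relative of DRL that stores values in a lookup table instead of a neural network, and it is not the model discussed in this article. The price forecast, battery size, noise level and hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical hourly price forecast in EUR/MWh: cheap around midday (solar
# surplus), expensive in the early evening. Not real market data.
FORECAST = [60] * 6 + [90] * 4 + [20] * 6 + [150] * 5 + [80] * 3
MAX_SOC = 4           # battery holds 0..4 units of 1 MWh
ACTIONS = (-1, 0, 1)  # discharge, idle, charge (one unit per hour)


def step(soc, action, price):
    """Apply one action and return (next_soc, reward)."""
    new_soc = min(MAX_SOC, max(0, soc + action))
    delta = new_soc - soc    # energy actually moved; 0 if the action was infeasible
    reward = -delta * price  # selling earns money (reward), buying costs money (penalty)
    return new_soc, reward


def greedy(q, hour, soc):
    """Pick the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q[(hour, soc, a)])


def train(episodes=5000, alpha=0.1, epsilon=0.1, noise=10.0, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: (hour, soc, action) -> value estimate
    for _ in range(episodes):
        soc = 0
        for hour in range(24):
            # Epsilon-greedy: mostly exploit, sometimes try a random action.
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else greedy(q, hour, soc)
            # The realized price deviates from the forecast by a random error.
            price = FORECAST[hour] + rng.gauss(0.0, noise)
            next_soc, reward = step(soc, action, price)
            # Value of the best follow-up action; zero after the last hour.
            future = 0.0 if hour == 23 else max(q[(hour + 1, next_soc, a)] for a in ACTIONS)
            q[(hour, soc, action)] += alpha * (reward + future - q[(hour, soc, action)])
            soc = next_soc
    return q


def evaluate(q):
    """Profit of the learned greedy policy on the noise-free forecast day."""
    soc, profit = 0, 0.0
    for hour in range(24):
        soc, reward = step(soc, greedy(q, hour, soc), FORECAST[hour])
        profit += reward
    return profit


q_table = train()
print(f"Learned policy profit on forecast day: {evaluate(q_table):.0f} EUR")
```

Nobody tells this agent to buy at midday and sell in the evening. It only ever sees the hour, the state of charge and the resulting profit or loss, and the pattern emerges from repeated simulation. A DRL system replaces the lookup table with a neural network so that it can handle far richer inputs, such as continuous weather and market forecasts.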

This eliminates broad, inefficient safety margins, allows the system to automatically adapt to changing consumption patterns, and makes scaling to new locations much easier as manual calibration is no longer required.

To be clear: while this falls under the broad umbrella of 'AI', we are explicitly not talking about models like ChatGPT or Claude that autonomously make unpredictable decisions. We are referring to a specific agent that we train ourselves, which makes decisions solely based on hard data, current forecasts, and the defined physical environment.

Outlook: Operational Safety and Control over AI

The current energy landscape has become too complex and dynamic for traditional rule-based systems, especially once you factor in local considerations such as specific energy taxes and grid management costs. Deep Reinforcement Learning (DRL) offers a powerful alternative: it enables an EMS to learn autonomously, discover hidden patterns, and respond directly to changing market conditions and local consumption. However, deploying self-learning algorithms to control critical physical infrastructure (such as a heavy-duty business connection) raises important design questions.

How do you guarantee that an algorithm constantly seeking rewards never takes operational risks that could jeopardize the grid connection? And since a DRL model is typically a closed system: how do you ensure that in practice it still communicates securely with third-party software?

Part 2 of this series discusses making the 'black box' of AI in energy management transparent, safely constraining DRL systems, and thereby maintaining control over operational reliability.
