Smart Grid Breakthrough: Hierarchical AI Controls Voltage in Solar and EV Networks
As the global energy transition accelerates, the integration of renewable energy and electric vehicles (EVs) into power distribution systems has become both a necessity and a challenge. While photovoltaic (PV) panels and EVs offer a pathway to decarbonization, their rapid deployment introduces significant operational complexities for utility networks. Fluctuating solar output and unpredictable EV charging patterns can lead to voltage instability, threatening grid reliability. In a groundbreaking study published in Power System Protection and Control, a team of researchers from Shandong University has unveiled an innovative solution that leverages advanced artificial intelligence to manage these challenges with unprecedented efficiency.
The research, led by QI Xianglong, CHEN Jian, ZHAO Haoran, ZHANG Wen, and ZHANG Keyu from the Key Laboratory of Power System Intelligent Dispatch and Control at Shandong University, presents a novel multi-time scale cooperative voltage control strategy based on hierarchical deep reinforcement learning (DRL). This sophisticated approach addresses the core issue of voltage deviation in modern distribution networks by intelligently coordinating a diverse set of controllable resources, from large-scale solar inverters to the charging behavior of thousands of individual electric vehicles.
The challenge facing modern power grids is multifaceted. Traditional distribution networks were designed for a one-way flow of electricity from centralized power plants to passive consumers. The advent of distributed energy resources (DERs) like rooftop solar has turned many consumers into “prosumers,” capable of both consuming and generating power. This bidirectional flow, particularly when solar generation peaks during midday, can cause voltage levels to rise above safe operating limits, a phenomenon known as overvoltage. Conversely, during periods of high demand, such as early evening when solar generation fades and EVs return home, voltage can drop too low, causing undervoltage.
The situation is exacerbated by the charging habits of EV owners. Without intervention, a large number of EVs charging simultaneously in the evening can create a new peak in electricity demand, further straining the grid and worsening voltage fluctuations. This confluence of factors creates a dynamic and highly complex control problem that traditional methods struggle to manage effectively.
Conventional voltage control strategies often rely on physical devices such as on-load tap changers (OLTCs) on transformers and switched capacitor banks (SCs). These tools are effective but have limitations. They are typically slow-responding, designed for adjustments over minutes or hours, and their operation is often based on local measurements, lacking a holistic view of the entire network. Furthermore, these strategies are predominantly “model-based,” meaning they require precise mathematical models of the grid’s physical behavior. As networks grow more complex with the addition of millions of new data points from smart meters and inverters, these models become increasingly difficult to maintain and solve in real time. The computational burden can be overwhelming, and any inaccuracy in the model can lead to suboptimal or even destabilizing control actions. This reliance on complex models and extensive communication infrastructure has been a significant bottleneck in achieving truly adaptive and resilient grid management.
The Shandong University team recognized that a paradigm shift was needed. Instead of relying on a top-down, model-driven approach, they turned to data-driven artificial intelligence. Their solution is built on the foundation of deep reinforcement learning, a branch of machine learning where an “agent” learns optimal decision-making strategies through trial and error by interacting with a simulated environment. In this context, the agent is a software controller, and the environment is a digital twin of the power distribution network. The agent performs actions—such as adjusting a real-time price signal or commanding a solar inverter to produce reactive power—and receives feedback in the form of a “reward” based on how well those actions maintain stable voltage levels across all network nodes.
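To make the reward feedback concrete, here is a minimal illustrative sketch in Python, not the paper's exact formulation: a reward that penalizes each node's deviation from the nominal 1.0 per-unit voltage and adds an extra penalty for nodes that leave the safe band. The penalty weight and band limits are assumptions for illustration.

```python
import numpy as np

# Illustrative sketch (assumed, not the paper's exact reward): penalize
# total deviation from the 1.0 p.u. nominal voltage, plus a large penalty
# for each node outside the safe band [0.93, 1.07] p.u.
def voltage_reward(voltages, v_min=0.93, v_max=1.07, violation_penalty=10.0):
    v = np.asarray(voltages, dtype=float)
    deviation = np.abs(v - 1.0).sum()               # total deviation from nominal
    violations = np.sum((v < v_min) | (v > v_max))  # count of out-of-band nodes
    return -(deviation + violation_penalty * violations)
```

A perfectly flat voltage profile earns the highest possible reward (zero), so the agent is steered toward actions that flatten voltage across the network.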
The true innovation of their work lies in its hierarchical and multi-time scale design. The researchers understood that not all controllable resources respond at the same speed. Solar inverters and static var compensators (SVCs) can adjust their output in seconds or minutes, making them ideal for rapid, fine-grained corrections. In contrast, influencing the charging behavior of EV users is a much slower process, measured in hours, as it depends on human decision-making and the availability of vehicles at charging stations. Attempting to control both with a single, monolithic algorithm would be inefficient and could lead to conflicting signals.
To address this, the team designed a two-layered control architecture. The upper layer operates on a longer time scale, making decisions every hour. Its primary tool is the real-time electricity price. By dynamically adjusting the price of electricity, this layer provides an economic incentive for EV owners to shift their charging to times when it is most beneficial for the grid. For example, if solar generation is high during midday, the system can lower the price, encouraging EVs to charge then and help absorb the excess renewable energy. Conversely, if a voltage drop is anticipated in the evening, the price can be increased to discourage charging and prevent overloading. This layer treats the aggregate EV charging load as a single, coarse-grained control variable.
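The upper layer's price logic can be sketched roughly as follows. This is a hypothetical simplification of the behavior described above, not the paper's optimization: the thresholds, base price, and multipliers are illustrative assumptions.

```python
# Hypothetical sketch of the upper layer's hourly logic: nudge the price
# down when surplus solar is forecast (to attract EV charging) and up when
# a demand peak is forecast (to defer it). All parameter values are
# illustrative, not from the paper.
def hourly_price(base_price, forecast_net_load_mw,
                 surplus_threshold=-0.5, peak_threshold=2.0,
                 discount=0.7, surcharge=1.3):
    """Net load = demand minus solar; negative values mean surplus generation."""
    if forecast_net_load_mw < surplus_threshold:   # midday solar surplus
        return base_price * discount
    if forecast_net_load_mw > peak_threshold:      # evening demand peak
        return base_price * surcharge
    return base_price
```

In the paper, the DRL agent learns this mapping from data rather than following fixed thresholds, but the economic intent is the same: the price signal steers aggregate EV charging toward grid-friendly hours.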
The lower layer operates on a much shorter time scale, making decisions every 15 minutes. It directly controls the fast-acting physical devices: the PV inverters and SVCs. These devices are used to inject or absorb reactive power, a form of electricity that does not perform useful work but is essential for maintaining voltage stability. By precisely managing the reactive power output of these devices, the lower layer can make immediate, fine-tuned adjustments to counteract rapid voltage fluctuations that occur due to sudden changes in solar output or load. This layer uses a multi-agent DRL approach, where each inverter and SVC has its own intelligent agent. These agents are trained in a centralized manner, allowing them to learn how to cooperate for the overall good of the network, but they execute their control actions independently using only local measurements. This “centralized training with decentralized execution” (CTDE) framework is crucial, as it ensures the system remains robust and functional even if communication between devices is lost.
A critical component of this strategy is the modeling of human behavior. The success of the upper layer depends on accurately predicting how EV users will respond to price signals. The researchers developed a sophisticated probabilistic model that accounts for the diverse travel patterns of EV owners. They categorized users based on their primary purpose for travel—commuting to work, returning home, or leisure activities—each with distinct arrival and departure times at charging stations. By using statistical distributions to model these patterns, along with the state of charge of the vehicle’s battery, the model can predict the “window of opportunity” for charging. The model then calculates the financial incentive (the difference between the cost of charging now versus at a cheaper time) and uses this to determine the probability that a user will respond to a price signal and change their charging behavior. This allows the system to make informed predictions about the aggregate impact of its pricing decisions.
The research team rigorously tested their strategy on a modified IEEE 33-node distribution network, a standard benchmark in power systems research. The simulation included three 1.5 MW solar farms, three SVCs, switched capacitor banks, and three EV charging stations, each serving 700 vehicles. Real-world solar and load data from Belgium was used to ensure the simulation reflected realistic conditions. The results were compelling. The hierarchical DRL strategy successfully kept all node voltages within the safe operating range of 0.93 to 1.07 per unit throughout the day, effectively eliminating both overvoltage and undervoltage conditions.
To demonstrate the superiority of their approach, the researchers compared it against several alternative control strategies. A baseline strategy that used only fast-acting devices (PV and SVCs) without any price signals for EVs resulted in significant voltage deviations, particularly during the evening peak. A strategy that used a fixed, day-ahead time-of-use price for EVs improved the situation but was still unable to handle the dynamic fluctuations of the day. A strategy that attempted to control all devices on the same hourly time scale performed poorly, highlighting the importance of matching the control action to the response speed of the resource. Even particle swarm optimization (PSO), a traditional optimization algorithm commonly used for such problems, was outperformed by the DRL approach.
The hierarchical DRL strategy achieved the lowest average voltage deviation, reducing it by 38% compared to the baseline and by 16-18% compared to other advanced strategies. This translates to a more stable, reliable, and efficient grid. The implications of this research are far-reaching. For utilities, it offers a powerful new tool to manage the increasing penetration of renewables and EVs without the need for massive and costly infrastructure upgrades. It can defer or even eliminate the need for new transformers or power lines. For consumers, it enables greater participation in the energy market through dynamic pricing, potentially lowering their electricity bills while contributing to grid stability. For policymakers, it provides a technological pathway to achieve higher renewable energy targets with greater confidence in grid reliability.
The work also represents a significant step forward in the application of AI to critical infrastructure. Unlike many AI systems that are “black boxes,” the hierarchical structure of this controller provides a degree of interpretability. Engineers can understand the separation of concerns between the long-term economic incentives and the short-term physical control. The use of the CTDE framework ensures that the system is not reliant on a single point of failure in the communication network, making it inherently more resilient. The fact that the system is “model-free” is another major advantage. It does not require a perfect, up-to-date model of the entire grid, which is a constant challenge as new devices are added and network topology changes. Instead, it learns the optimal control policy directly from data, making it highly adaptable to evolving conditions.
The potential applications of this technology extend beyond voltage control. The same hierarchical DRL framework could be applied to other grid management problems, such as frequency regulation, congestion management, or optimizing the charging of a fleet of EVs for a ride-sharing service. It could also be integrated with microgrid controllers to manage local energy communities with high levels of self-consumption. As the energy landscape continues to evolve, the ability to make intelligent, real-time decisions across multiple time scales will be paramount. The research by QI Xianglong and his colleagues at Shandong University provides a robust and scalable blueprint for the intelligent, self-optimizing power grids of the future.
This study is a testament to the power of interdisciplinary research, combining deep expertise in power systems engineering with cutting-edge advances in artificial intelligence. It moves beyond theoretical exploration to present a practical, validated solution to one of the most pressing challenges in the modern energy sector. By harnessing the power of AI to orchestrate the complex dance of electrons across a network of solar panels, EVs, and traditional loads, this work paves the way for a more sustainable, resilient, and consumer-centric energy system. The future of the smart grid is not just connected; it is intelligent, adaptive, and learning from every interaction.
QI Xianglong, CHEN Jian, ZHAO Haoran, ZHANG Wen, ZHANG Keyu, Key Laboratory of Power System Intelligent Dispatch and Control, Shandong University. Power System Protection and Control, DOI: 10.19783/j.cnki.pspc.240122