To address the long-term operational challenges of space environment monitoring buoys under extreme Arctic conditions (polar day–night alternation, ultra-low temperatures that often drop below -40°C, strong winds, and highly variable renewable energy availability), this paper proposes an energy management optimization method based on deep reinforcement learning (DRL). A comprehensive buoy system model is constructed that integrates renewable energy sources (photovoltaic panels and wind turbines), lithium-ion battery power supply units, lead–acid battery energy storage units, and multi-sensor load units (including ionospheric detectors, geomagnetic instruments, high-precision attitude sensors, temperature and humidity sensors, and satellite communication modules). Critically, the model incorporates Arctic-specific environmental constraints, particularly low-temperature-induced battery efficiency degradation, which can reduce energy storage performance by 30–50% in extreme cold, as well as dynamic fluctuations in solar irradiance (zero during polar night) and wind speed (frequent gusts exceeding 25 m/s). To balance operational stability and energy efficiency, the reward function is designed to minimize unsupplied energy while strictly ensuring the functional integrity of multi-sensor monitoring tasks, with penalty terms for both energy surplus (wasted renewable generation) and energy deficit (compromised sensing operations). Using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm on the MATLAB simulation platform, comparative experiments evaluate two energy storage configurations for long-term Arctic observation: a photovoltaic–lithium battery–lead–acid battery system and a hybrid photovoltaic–wind–lithium battery–lead–acid battery system.
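The reward structure described above (penalties on both surplus and deficit, with storage capacity reduced by cold) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the names `RewardWeights`, `derated_capacity_kwh`, and `step_reward`, the penalty weights, the linear derating curve, and the single lumped storage unit are all assumptions made for illustration, since the abstract gives no formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical penalty weights; the abstract does not state values."""
    deficit: float = 10.0  # unsupplied energy (compromised sensing) weighs more
    surplus: float = 1.0   # curtailed renewable generation weighs less


def derated_capacity_kwh(nominal_kwh: float, temp_c: float) -> float:
    """Assumed linear cold derating (illustrative, not from the paper).

    No loss at or above 0 degC, 40 % loss at or below -40 degC (a value
    inside the 30-50 % range quoted in the abstract), linear in between.
    """
    frac = min(max(-temp_c / 40.0, 0.0), 1.0)
    return nominal_kwh * (1.0 - 0.4 * frac)


def step_reward(generation_kwh, load_kwh, soc_kwh, capacity_kwh,
                w=RewardWeights()):
    """Return (reward, new_soc_kwh) for one time step of one storage unit."""
    # Cold can shrink usable capacity below the energy currently stored.
    soc_kwh = min(soc_kwh, capacity_kwh)
    net = generation_kwh - load_kwh
    if net >= 0.0:
        # Excess generation charges the battery; the remainder is wasted.
        charged = min(net, capacity_kwh - soc_kwh)
        surplus, deficit = net - charged, 0.0
        new_soc = soc_kwh + charged
    else:
        # Shortfall is covered by discharge; the remainder is unsupplied.
        discharged = min(-net, soc_kwh)
        surplus, deficit = 0.0, -net - discharged
        new_soc = soc_kwh - discharged
    reward = -(w.deficit * deficit + w.surplus * surplus)
    return reward, new_soc


if __name__ == "__main__":
    cap = derated_capacity_kwh(5.0, temp_c=-40.0)
    print(cap, step_reward(0.0, 1.0, 0.4, cap))
```

Weighting the deficit more heavily than the surplus reflects the abstract's priority on keeping the sensors running; the actual ratio would have to be tuned or taken from the full paper.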
Based on 180 days of real Arctic environmental data (irradiance, wind speed, temperature) sourced from the NSF Data Center, the results demonstrate that the proposed method dynamically optimizes the charging/discharging strategies of the energy storage systems and the power allocation of the supply units in real time. It not only reduces lithium battery energy demand by 87.5% (from 61.44 kWh to 7.685 kWh) compared to photovoltaic-only systems but also extends the buoy’s continuous operational duration to over six months, overcoming the 60-day limit that traditional solar-powered buoys face once polar night begins. Additionally, the TD3 algorithm’s twin critic networks and target policy smoothing mechanism enhance scheduling intelligence and robustness, effectively mitigating the uncertainty of renewable energy in extreme environments. This research provides a robust technical solution for overcoming energy bottlenecks in polar monitoring equipment, reduces the operational costs and environmental risks associated with overreliance on lithium batteries, and offers new insights for energy management in intelligent polar observation systems operating in harsh, dynamic environments.
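The two TD3 mechanisms named above, twin critics and target policy smoothing, both act on the critic's learning target. A minimal Python sketch of that target for a scalar action (for example, a normalized charge/discharge command) follows. The function name `td3_target`, the callable interfaces, and the hyperparameter defaults are assumptions for illustration; the noise settings are values commonly used with TD3, not values reported in this work.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done,
               target_actor, target_critic_1, target_critic_2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Compute the TD3 critic target for one transition (scalar action).

    target_actor(state) -> action and target_critic_i(state, action) -> Q
    are hypothetical callables standing in for the target networks.
    """
    # Target policy smoothing: perturb the target action with clipped noise
    # so the critics are not fitted to sharp peaks in the value estimate.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       action_low, action_high)
    # Twin critics: take the smaller estimate to curb overestimation bias.
    q_next = min(target_critic_1(next_state, next_action),
                 target_critic_2(next_state, next_action))
    # No bootstrapping from terminal states.
    return reward + gamma * (0.0 if done else q_next)


if __name__ == "__main__":
    actor = lambda s: 0.25
    q1 = lambda s, a: 1.0 + a
    q2 = lambda s, a: 2.0 - a
    print(td3_target(0.5, None, False, actor, q1, q2, gamma=0.9))
```

The third TD3 ingredient, delayed policy updates, lives in the training loop rather than in the target: the actor and the target networks are updated only once every few critic updates.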
An Energy Management Optimization Method for Arctic Space Environment Monitoring Buoys Based on Deep Reinforcement Learning
Published: 07 May 2026
by MDPI
in The 3rd International Online Conference on Energies
session Energy and Environment. Sustainable Transition
Abstract:
Keywords: Deep reinforcement learning; Arctic space environment monitoring buoy; Energy management; Twin delayed deep deterministic policy gradient (TD3); Multi-sensor
