Published Papers
2022
In recent years, deep reinforcement learning has attracted significant attention from both industry and academia and has been used to solve a variety of complex problems. In this paper, an actor-critic algorithm, twin delayed deep deterministic policy gradient (TD3), is applied to the inverted pendulum control problem. To this end, a novel dynamic, improved reward function is presented, designed around the dynamic behaviour of the pendulum. The experimental results show that the combination of TD3 and the proposed reward function outperforms other methods previously applied to this problem in terms of performance specifications and reward stabilization.
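The paper does not reproduce its exact reward here, but the idea of a reward that adapts to the pendulum's dynamic state can be illustrated with a minimal sketch. All weights, the upright threshold, and the function name below are illustrative assumptions, not the published formulation.

```python
import numpy as np

def pendulum_reward(theta, theta_dot, torque,
                    w_angle=1.0, w_vel=0.1, w_torque=0.001):
    """Hypothetical state-dependent reward for inverted-pendulum balancing.

    The penalty weights shift with the pendulum's dynamic state: far from
    upright the angle term dominates, while near upright the velocity and
    torque terms are emphasized to encourage a smooth, stable hold.
    """
    # distance from the upright position, wrapped to [-pi, pi)
    angle_err = np.abs(((theta + np.pi) % (2 * np.pi)) - np.pi)
    near_upright = angle_err < 0.2  # ~11 degrees; illustrative threshold

    if near_upright:
        # fine stabilization phase: punish residual motion and control effort more
        return -(w_angle * angle_err**2
                 + 10 * w_vel * theta_dot**2
                 + 10 * w_torque * torque**2)

    # swing-up / recovery phase: drive the angle error down first
    return -(w_angle * angle_err**2 + w_vel * theta_dot**2 + w_torque * torque**2)
```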
2023
Soft actor-critic (SAC) is an off-policy deep reinforcement learning algorithm that learns using the principle of maximum entropy regularization and updates its policy stochastically. The combination of the actor-critic formulation with off-policy learning is what allows SAC to attain state-of-the-art performance. This paper suggests several methods to enhance the performance of SAC. First, a multi-step look-ahead is introduced to reduce the estimation error of the Q-values, followed by a dual actor network and delayed updates of the target and policy networks, which minimize the errors propagated during training. The resulting Improved SAC is compared with the original SAC in terms of the maximum rewards obtained. Simulation results on the benchmark control tasks of the MuJoCo environments show the effectiveness of the suggested methods.
Read more: Improved Soft Actor-Critic: Reducing Bias and Estimation Error for Fast Learning
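One of the ideas above, the multi-step look-ahead for the Q-value target, can be sketched as an n-step soft Bellman target. This is a minimal sketch, not the paper's implementation: the `actor.sample` interface, the two-target-critic call signatures, and the argument names are assumptions.

```python
import torch

def n_step_soft_target(rewards, next_state, done, actor,
                       target_critic_1, target_critic_2,
                       alpha, gamma=0.99):
    """Illustrative n-step soft Bellman target: accumulate n observed rewards,
    then bootstrap with the minimum of two target critics minus the entropy
    term (assumed interfaces; names are hypothetical)."""
    n = len(rewards)
    # discounted sum of the n observed rewards
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    with torch.no_grad():
        next_action, log_prob = actor.sample(next_state)
        # clipped double-Q: take the smaller of the two target critics
        q_next = torch.min(target_critic_1(next_state, next_action),
                           target_critic_2(next_state, next_action))
        soft_value = q_next - alpha * log_prob
    return g + (1.0 - done) * (gamma ** n) * soft_value
```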
2024
In this paper, a new secondary voltage control system for islanded AC microgrids is introduced. This system utilizes a multi-agent framework to maintain voltage stability and ensure proportional distribution of current. An advanced off-policy deep reinforcement learning technique, specifically the twin delayed deep deterministic policy gradient (TD3), is integrated as the secondary controller within the AC microgrid configuration. A specialized reward function is developed to correct the discrepancies in current distribution and stabilize voltage levels, optimizing the performance of the TD3 algorithm. This control mechanism is structured in a distributed manner, allowing each agent to communicate and share vital control information. Simulation results on a three distributed-generator (DG), AC inverter-based microgrid system validate the accuracy of this novel multi-agent TD3-based secondary voltage controller. These results demonstrate improved performance compared to those reported in previous studies.
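A per-agent reward that trades off voltage restoration against proportional current sharing, as described above, might look like the sketch below. The signal names, weights, and per-unit sharing formulation are illustrative assumptions, not the paper's actual reward.

```python
import numpy as np

def secondary_control_reward(v_meas, v_ref, i_out, i_rated,
                             neighbor_i_out, neighbor_i_rated,
                             w_v=1.0, w_i=0.5):
    """Hypothetical per-agent reward: penalize deviation of the local bus
    voltage from its reference and the mismatch between this DG's per-unit
    output current and those of its communicating neighbors."""
    voltage_error = (v_meas - v_ref) ** 2
    my_share = i_out / i_rated
    # average squared difference in per-unit loading versus each neighbor
    sharing_error = np.mean([(my_share - ni / nr) ** 2
                             for ni, nr in zip(neighbor_i_out, neighbor_i_rated)])
    return -(w_v * voltage_error + w_i * sharing_error)
```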
2025
In recent years, Reinforcement Learning (RL) has emerged as a promising approach for addressing control problems in complex, nonlinear environments. The ability of RL algorithms to autonomously learn optimal control policies without requiring an explicit system model makes them particularly suited for real-time applications in fields like robotics and industrial automation. This paper investigates the application of RL algorithms for controlling a magnetic levitation (maglev) system, which presents significant challenges due to its nonlinear dynamics. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the integration of Prioritized Experience Replay (PER) are employed to enhance learning efficiency. The impact of varying action noise (random perturbations added to actions) on exploration and system performance is explored; results demonstrate that TD3 with PER achieves superior stability, faster convergence, and reduced rise and settling times without overshoot. These findings emphasize the importance of tailored exploration strategies in optimizing control policies for complex environments, highlighting the potential of advanced RL techniques for real-time control applications.
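The PER component mentioned above is commonly implemented by sampling transitions in proportion to their absolute TD error. The sketch below is a minimal, list-backed version (not the paper's implementation, and not the usual sum-tree structure); the class name, hyperparameters, and buffer interface are assumptions for illustration.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay sketch, assuming priorities
    are set from absolute TD errors after each critic update."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition):
        # new transitions get the current maximum priority so they are seen at least once
        max_p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size, beta=0.4):
        # sampling probability proportional to priority^alpha
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # importance-sampling weights to correct the sampling bias
        weights = (len(self.data) * p[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(float(e)) + self.eps
```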