
Investigating the Hyperparameters of Reinforcement Learning Effects on Orbit Raising Maneuver

Paper ID (COI): AEROSPACE22_075
Published in: the 22nd International Conference of the Iranian Aerospace Society, 1402 SH (2023-2024)
Authors:

Hamed Soleymani - Ph.D. student, Iran University of Science and Technology, School of New Technologies
Majid Bakhtiari - Assistant Professor, Iran University of Science and Technology, School of New Technologies
Kamran Daneshjou - Professor, Iran University of Science and Technology, School of Mechanical Engineering

Abstract:
In recent years, significant advancements in the field of artificial intelligence have prompted space research, particularly in orbital missions, to increasingly embrace these methods, with a specific focus on machine learning. In this research, the dynamics of a circular, in-plane, low-thrust orbit transfer, described by the equinoctial differential equations, serve as the environment for agent interaction, over a continuous space whose variables are the six equinoctial orbital elements of a spacecraft, and a model-free Actor-Critic algorithm is implemented. The action, defined as a thrust vector, is applied to the environment under a policy, and the agent is trained by the Actor-Critic algorithm to perform the LEO-to-GEO low-thrust transfer. The effects of hyperparameters such as the discount factor, the learning rate, and the number of nodes in the actor and critic networks are investigated in this scenario. It is shown that increasing the discount factor and the learning rate helps the trained agent operate accurately in the environment of the orbital transfer problem, while increasing the number of nodes in the neural network lengthens the agent's training time. As the discount factor approaches 1, the agent searches the environment further for other possible optimal policies. After two training processes, the trained agent can be used in different cases with dynamics similar to the main problem, with no need to adjust or re-simulate the parameters and dynamics of the problem.
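
As a rough illustration of the setup the abstract describes, the sketch below wires the three hyperparameters under study (discount factor, learning rate, and the number of nodes in the actor and critic networks) into a minimal one-step actor-critic loop in Python. This is a sketch under stated assumptions, not the paper's implementation: the EquinoctialTransferEnv class, its stand-in dynamics and reward, and all numeric values are hypothetical placeholders, whereas the paper integrates the full equinoctial differential equations.

import numpy as np
import torch
import torch.nn as nn

class EquinoctialTransferEnv:
    """Hypothetical placeholder environment: the state holds six equinoctial
    elements (p, f, g, h, k, L) and the action is a 3D thrust vector. A real
    implementation would integrate the equinoctial variational equations."""
    def reset(self):
        self.state = np.array([7000.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # rough LEO-like start
        return self.state.copy()

    def step(self, thrust):
        # Stand-in dynamics: nudge the semi-latus rectum by the thrust magnitude.
        self.state[0] += 1e-2 * np.linalg.norm(thrust)
        # Stand-in reward: negative distance to a GEO-like target radius (km).
        reward = float(-abs(42164.0 - self.state[0]) * 1e-4)
        done = bool(self.state[0] >= 42164.0)
        return self.state.copy(), reward, done

def mlp(in_dim, out_dim, hidden_nodes):
    # hidden_nodes is the network-width hyperparameter varied in the study.
    return nn.Sequential(nn.Linear(in_dim, hidden_nodes), nn.Tanh(),
                         nn.Linear(hidden_nodes, out_dim))

gamma, lr, hidden_nodes = 0.99, 1e-3, 64     # the three hyperparameters investigated
actor = mlp(6, 3, hidden_nodes)              # six elements in -> mean thrust vector out
critic = mlp(6, 1, hidden_nodes)             # six elements in -> state-value estimate out
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=lr)

env = EquinoctialTransferEnv()
for episode in range(10):
    state, done, steps = env.reset(), False, 0
    while not done and steps < 200:
        s = torch.as_tensor(state, dtype=torch.float32)
        dist = torch.distributions.Normal(actor(s), 0.1)   # Gaussian policy, fixed std
        action = dist.sample()
        next_state, reward, done = env.step(action.numpy())
        s_next = torch.as_tensor(next_state, dtype=torch.float32)
        # One-step TD target; gamma is the discount factor studied in the paper.
        target = reward + gamma * critic(s_next).detach() * (0.0 if done else 1.0)
        advantage = target - critic(s)
        # Policy loss (advantage-weighted log-probability) plus critic TD loss.
        loss = -dist.log_prob(action).sum() * advantage.detach() + advantage.pow(2)
        opt.zero_grad(); loss.backward(); opt.step()
        state, steps = next_state, steps + 1

In this toy loop, pushing gamma toward 1 makes the temporal-difference target weigh long-horizon returns more heavily, consistent with the paper's observation that the agent searches further for alternative policies, while widening hidden_nodes raises the per-step cost and, as the paper reports, the training time.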

Keywords:
Low-Thrust – Equinoctial orbital elements – Reinforcement Learning – Actor-Critic networks – Agent

Paper page and full-text download: https://civilica.com/doc/2058606/