Novel Dynamic Decomposition-Based Multi-Objective Evolutionary Algorithm Using Reinforcement Learning Adaptive Operator Selection (DMOEA/D-SL)



Document title: Novel Dynamic Decomposition-Based Multi-Objective Evolutionary Algorithm Using Reinforcement Learning Adaptive Operator Selection (DMOEA/D-SL)
Journal: Computación y sistemas
Database:
System number: 000607921
ISSN: 1405-5546
Authors: [names missing] (affiliation indices: 1, 1, 1, 1, 1, 1, 2, 1)
Institutions: 1 Tecnológico Nacional de México, Instituto Tecnológico de Ciudad Madero, Mexico
2 Tecnológico Nacional de México, Instituto Tecnológico de León, Mexico
Year:
Period: Apr-Jun
Volume: 28
Issue: 2
Pages: 739-749
Country: Mexico
Language: English
Abstract: In static multi-objective optimization, adaptive selection of genetic operators has been studied in various works, including multi-armed bandit-based and probability-based methods. For dynamic multi-objective optimization, such work is very hard to find. The defining characteristic of dynamic multi-objective optimization is that its problems do not remain static: their objective functions and constraints change over time. Adaptive operator selection is responsible for choosing the best variation operator at a given time during the run of a multi-objective evolutionary algorithm. This work proposes incorporating a new adaptive operator selection method into a Dynamic Multi-objective Evolutionary Algorithm Based on Decomposition; the resulting algorithm is called DMOEA/D-SL. The new method is based on a reinforcement learning algorithm called State-Action-Reward-State-Action Lambda, or SARSA(λ). SARSA(λ) trains an agent in an environment to make sequential decisions and learn to maximize an accumulated reward over time; here, the decision is which operator to apply at a given moment. Eight dynamic multi-objective benchmark problems were used as test instances to evaluate the algorithm's performance. Each problem produces five Pareto fronts. Three metrics were used: Inverted Generational Distance, Generalized Spread, and Hypervolume. The non-parametric Wilcoxon test was applied at a 5% significance level to validate the results.
Keywords: Adaptive, Operator, Selection, Dynamic, Multi-objective, Optimization
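The abstract describes SARSA(λ) as the learner that picks a variation operator at each step. The sketch below shows a generic tabular SARSA(λ) with accumulating eligibility traces and epsilon-greedy selection, used as an operator selector. It is not the authors' implementation: the class name, the operator names ("sbx", "de", "polynomial"), the state label, the reward values, and all hyperparameters are hypothetical, since the record does not specify them.

```python
import random
from collections import defaultdict


class SarsaLambdaSelector:
    """Tabular SARSA(lambda) agent whose actions are variation operators.

    All names and default hyperparameters here are illustrative assumptions.
    """

    def __init__(self, operators, alpha=0.1, gamma=0.9, lam=0.8,
                 epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.lam = lam          # trace decay (the "lambda" in SARSA(lambda))
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, operator) -> estimated value
        self.e = defaultdict(float)  # (state, operator) -> eligibility trace

    def select(self, state):
        """Epsilon-greedy choice among the available operators."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda op: self.q[(state, op)])

    def update(self, state, op, reward, next_state, next_op):
        """One SARSA(lambda) step with accumulating traces."""
        delta = (reward
                 + self.gamma * self.q[(next_state, next_op)]
                 - self.q[(state, op)])
        self.e[(state, op)] += 1.0
        for key in list(self.e):
            self.q[key] += self.alpha * delta * self.e[key]
            self.e[key] *= self.gamma * self.lam

    def reset_traces(self):
        """Clear traces, e.g. when the environment changes (an assumption)."""
        self.e.clear()


def run_demo(steps=200):
    """Toy loop: the reward table stands in for a real quality measure."""
    agent = SarsaLambdaSelector(["sbx", "de", "polynomial"])
    toy_reward = {"sbx": 0.2, "de": 1.0, "polynomial": 0.0}  # hypothetical
    state = "stable"  # hypothetical single state
    op = agent.select(state)
    for _ in range(steps):
        reward = toy_reward[op]
        next_state = "stable"
        next_op = agent.select(next_state)
        agent.update(state, op, reward, next_state, next_op)
        state, op = next_state, next_op
    return agent
```

In a real dynamic optimizer, the reward would presumably come from a measured improvement after applying the operator, and the state would encode some feature of the search; both are left abstract here because the record gives no details.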