Training intelligent adversaries using self-play

In the latest release of the ML-Agents Toolkit (v0.14), we have added a self-play feature that provides the capability to train competitive agents in adversarial games (as in zero-sum games, where one agent’s gain is exactly the other agent’s loss). In this blog post, we provide an overview of self-play and demonstrate how it enables stable and effective training on the Soccer demo environment in the ML-Agents Toolkit.

The Tennis and Soccer example environments of the Unity ML-Agents Toolkit pit agents against one another as adversaries. Training agents in this type of adversarial scenario can be quite challenging. In fact, in previous releases of the ML-Agents Toolkit, reliably training agents in these environments required significant reward engineering. In version 0.14, we have enabled users to train agents in games via reinforcement learning (RL) from self-play, a mechanism fundamental to a number of the highest-profile results in RL, such as OpenAI Five and DeepMind’s AlphaStar. Self-play uses the agent’s current and past ‘selves’ as opponents. This provides a naturally improving adversary against which our agent can gradually improve using traditional RL algorithms. The fully trained agent can be used as competition for advanced human players.
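To make the self-play loop concrete, here is a minimal, framework-agnostic sketch of the idea in Python. This is not ML-Agents code: Policy, train_step, and the constants WINDOW, SNAPSHOT_EVERY, and LATEST_SELF_RATIO are hypothetical names with illustrative values, assuming past ‘selves’ are stored as frozen snapshots in a fixed-size window and the opponent for each update is either the current self or a sampled snapshot.

```python
import copy
import random

class Policy:
    """Placeholder for a trainable policy (e.g., a PPO network)."""
    def act(self, observation):
        return random.choice([0, 1, 2])  # stub action

def train_step(learner, opponent):
    """Placeholder: play learner vs. opponent, then update the learner
    with an ordinary RL algorithm (e.g., PPO). Only the learner trains."""
    pass

learner = Policy()
snapshots = [copy.deepcopy(learner)]  # pool of past 'selves'
WINDOW = 10                # keep only the most recent snapshots
SNAPSHOT_EVERY = 20_000    # steps between freezing a new snapshot
LATEST_SELF_RATIO = 0.5    # fraction of games against the current self

for step in range(1, 200_001):
    # The adversary is either the agent's current self or a past snapshot.
    if random.random() < LATEST_SELF_RATIO:
        opponent = learner
    else:
        opponent = random.choice(snapshots)

    train_step(learner, opponent)

    # Periodically freeze a copy of the learner and retire the oldest
    # snapshots, so the opponent pool tracks the learner's current strength.
    if step % SNAPSHOT_EVERY == 0:
        snapshots.append(copy.deepcopy(learner))
        snapshots = snapshots[-WINDOW:]
```

Bounding the pool to a recent window keeps sampled opponents close to the learner’s current skill, which is the same matchmaking intuition behind the tennis analogy that follows.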

Self-play provides a learning environment analogous to how humans structure competition. For example, a human learning to play tennis would train against opponents of similar skill level because an opponent that is too strong or too weak is not as conducive to learning the game. From the standpoint of improving one’s skills, it would be far more valuable for a beginner-level tennis player to compete against other beginners than, say, against a newborn child or Novak Djokovic. The former couldn’t return the ball, and the latter wouldn’t serve them a ball they could return. When the beginner has achieved sufficient strength, they move on to the next tier of tournament play to compete with stronger opponents.  

In this blog post, we give some technical insight into the dynamics of self-play as well as provide an overview of our Tennis and Soccer example environments that have been refactored to showcase self-play.


