In the latest feat of artificial intelligence (AI), researchers have taught AI agents to play Ms. Pac-Man - and sometimes to play it better than humans. The study, performed by Istvan Szita and Andras Lorincz from the Department of Information Systems at Eotvos University in Hungary, showed that AI agents can successfully be taught how to strategize through reinforcement learning. The researchers hope that teaching agents to play Ms. Pac-Man will be an ideal means of exploring what artificial intelligence is still missing.
The researchers explain that games are ideal test environments for reinforcement learning (RL). Since the late 1950s, RL has been tested in classical games, such as checkers, backgammon, and chess. Since the 2000s, researchers have begun testing RL on modern computer games, such as the role-playing game Baldur's Gate, the strategy game Wargus, and Tetris.
Szita and Lorincz chose Ms. Pac-Man for their study because the game enabled them to test a variety of teaching methods. In the original Pac-Man, released in 1980, players must eat dots, avoid being eaten by four ghosts, and score big points by eating flashing ghosts. A player's movements therefore depend heavily on the movements of the ghosts. However, the ghosts' routes are deterministic, enabling players to find patterns and predict future movements.
In Ms. Pac-Man, on the other hand, the ghosts' routes are randomized, so that players can't figure out an optimal action sequence in advance. This means that players must constantly watch the ghosts' movements, and make decisions based on their observations. Because players receive points for eating dots, ghosts, and fruit, a player's decisions directly influence their future decisions (based on whether they scored points or lost a life).
Szita and Lorincz took a hybrid approach to teaching AI agents how to successfully play the game. They used the "cross-entropy method" for the learning process, and rule-based policies to guide how the agent should transform its observations into the best action.
The researchers had agents play 50 games using different RL methods. They found that policies learned with the cross-entropy method outperformed hand-crafted policies. As they explained, the basic idea of the cross-entropy method is to select the most successful actions and then modify the distribution of actions so that it becomes more peaked around those selected actions.
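The select-the-best-then-sharpen loop described above can be sketched on a toy problem. This is a minimal illustration of the cross-entropy method, not the paper's actual implementation: the `score` function here is a hypothetical stand-in for the agent's average game score, and the parameter values are made up.

```python
import random
import statistics

def score(x):
    # Hypothetical stand-in for "average game score of a policy with
    # parameter x"; it peaks at x = 3.0. The real objective in the study
    # was the agent's Ms. Pac-Man score over many games.
    return -(x - 3.0) ** 2

def cross_entropy_optimize(n_iters=30, pop=50, elite_frac=0.2, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 5.0              # initial sampling distribution
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        samples = [rng.gauss(mu, sigma) for _ in range(pop)]
        # Keep only the best-performing samples (the "most successful" ones)
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the distribution around the elite: with each iteration it
        # becomes more peaked around high-scoring parameter values
        mu = statistics.mean(elite)
        sigma = statistics.stdev(elite) + 1e-3  # small floor avoids collapse
    return mu

best = cross_entropy_optimize()
```

After 30 iterations the sampling distribution has concentrated near the optimum, so `best` lands close to 3.0.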
During the game, the AI agent must decide which way to go, and these decisions are governed by rule-based policies. When the agent has to make a decision, she checks her rule list, starting with the rules with the highest priority. In Ms. Pac-Man, ghost avoidance has the highest priority, because ghosts will eat her. The next rule says that if there is an edible ghost on the board, the agent should chase it, because eating ghosts yields the most points.
One rule that the researchers found to be surprisingly effective was that the agent should not turn back if all directions are otherwise equally good. This rule prevents Ms. Pac-Man from retracing paths where the dots have already been eaten, which earns no points.
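A priority-ordered rule list like the one described above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual rule encoding: the observation format and the names `toward_ghost`, `toward_edible_ghost`, and `reverse` are assumptions made for the example.

```python
def choose_direction(obs):
    """Pick a move by checking rules in priority order.

    obs is a dict (hypothetical format) with:
      "legal": list of legal directions,
      "toward_ghost": directions leading toward a dangerous ghost,
      "toward_edible_ghost": directions leading toward an edible ghost,
      "reverse": the direction that would turn Ms. Pac-Man back.
    """
    # Rule 1 (highest priority): avoid ghosts -- never move toward one
    # unless there is no other choice.
    safe = [d for d in obs["legal"] if d not in obs["toward_ghost"]]
    candidates = safe or obs["legal"]
    # Rule 2: if an edible ghost is on the board, chase it, since eating
    # ghosts yields the most points.
    for d in candidates:
        if d in obs["toward_edible_ghost"]:
            return d
    # Rule 3: otherwise do not turn back, so Ms. Pac-Man avoids retracing
    # corridors whose dots are already eaten.
    forward = [d for d in candidates if d != obs["reverse"]]
    return (forward or candidates)[0]

obs = {
    "legal": ["up", "down", "left"],
    "toward_ghost": {"up"},
    "toward_edible_ghost": set(),
    "reverse": "down",
}
print(choose_direction(obs))  # -> "left"
```

With "up" blocked by a ghost and "down" being a reversal, the only direction that passes every rule is "left".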
AI agents that learned with the most successful policy averaged 8186 points over 50 games. The average score of five humans, who each played 10 games, was 8064. In the non-learning experiment, in which agents used random policies, the average score was just 676. Other policies produced scores that fell within this range.
While the AI agents showed they could hold their own against human players, the researchers noticed that humans sometimes used different tactics. For example, humans sometimes lured the ghosts close to Ms. Pac-Man before making them edible, so that all of them were nearby and could be eaten quickly. The researchers noted that this strategy never evolved in the AI experiments. Humans also tracked the time remaining in the period for eating ghosts and approximated the future positions of ghosts - both abilities that the AI agents did not demonstrate.