https://www.youtube.com/watch?v=paO6oD6p-4c ----- So, first off, I'm speaking as an AI programmer with only a passing familiarity with StarCraft. Anyway, an important thing to note here is how reinforcement learning works: the AI's only incentive is to win the game. This may seem weird to humans, but "end the game immediately once you're guaranteed victory" does not inherently follow from "win the game."

I would posit that, when it has a better economy and more units, the AI might have nearly a 100% chance of victory whether it attacks now or later. Let's say that as long as it plays generally well and maintains some level of harassment, its opponent literally can't win. From the AI's perspective, there would be absolutely no difference between "win now" and "win later," because both contain the only thing it cares about: "win the game." It might effectively choose between those options at random, because as far as the AI is concerned there is no advantage to one over the other. Changing that would take the programmers hard-coding a preference for shorter games over longer ones (which I don't believe was done), or some other factor learned during training.

For example, here are some things it might have learned: the opponent might make a comeback (win now), having more units increases the chance of victory (win later), and sometimes over-committing results in defeat (win later). So it's conceivable all these things combine to give it a slight preference for longer games, because in some cases it's safer to play that way. I think this argument is even more interesting in the context of an imperfect-information game like StarCraft (as opposed to chess or Go, where you have complete knowledge of your opponent's actions).
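To make the "win now vs. win later" point concrete, here's a minimal sketch (my own illustration, not AlphaStar's actual reward setup) of why a sparse terminal reward gives the agent no reason to prefer a short win over a long one, assuming no discounting or per-step penalty:

```python
# Hypothetical sketch: with a sparse, undiscounted terminal reward (+1 for a
# win, 0 otherwise), a fast win and a slow win have identical return, so the
# agent learns no preference for ending the game sooner.

def episode_return(rewards, gamma=1.0):
    """Discounted sum of rewards; gamma=1.0 means no time preference."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Two trajectories that both end in victory (+1): one short, one long.
win_now   = [0, 0, 1]        # attack immediately, win on step 3
win_later = [0] * 20 + [1]   # harass and stall, win on step 21

print(episode_return(win_now))    # 1.0
print(episode_return(win_later))  # 1.0 -- indistinguishable to the agent

# Only a discount factor below 1 (or a per-step cost) would break the tie
# in favor of the quick win:
print(episode_return(win_now, gamma=0.99) >
      episode_return(win_later, gamma=0.99))  # True
```

So unless the training setup discounts future reward or penalizes game length, "win now" and "win later" really do have identical value to the agent.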
It's always difficult to comment on neural networks (some interesting interpretability techniques are emerging, but they're still notoriously black boxes), and I've only skimmed parts of DeepMind's posts, but I think this could just reflect a lack of planning or foresight on the part of the AI. Because it has imperfect information, it's always possible that the other player has a giant army hidden somewhere on the map. Ruling that out requires deductive reasoning and foresight based on observations of the other player's buildings and units throughout the game.

I don't have evidence to back this up, but I wouldn't be surprised if AlphaStar's "thoughts" were closer to "I am winning this fight; keep fighting but don't over-commit in case there's a hidden army" rather than "fight to win, because based on my past observations it's impossible for there to be a hidden army sizable enough to defeat me." AlphaStar isn't incentivized to "predict whether my opponent has enough units to defeat me"; it's incentivized to "win," which doesn't necessarily produce the same sub-goals and information prioritization that humans have.

Sorry for the long comment, I just find this super interesting, and thanks for casting these so a non-SC player can appreciate what's going on and what makes sense from a human perspective. I just wanted to point out that it's very easy to fall into the trap of assuming something should be obvious, when that's only true given our human perspective.