Infinite Tic-Tac-Toe — RL Policy Animation
A lightweight demonstration of a heuristic RL-inspired policy. X tends to build threats; O spreads and blocks.
Policy Notes
X: threat-seeker (open-three bias)
O: reactive spreader / blocker
The policy uses heuristic scoring to imitate RL behavior: distance features, threat windows, local density, and softmax selection over the resulting scores.
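As a rough illustration of how such a heuristic could be put together, here is a minimal Python sketch. It is not the code behind the animation. The board representation (a dict from `(x, y)` to `'X'` or `'O'`), the function names, the window length of 5, and every weight are assumptions chosen for readability.

```python
import math
import random

# The four line directions on a grid: horizontal, vertical, two diagonals.
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def candidates(board, radius=1):
    """Empty cells within `radius` of any stone; the origin on an empty board."""
    if not board:
        return [(0, 0)]
    cells = set()
    for (x, y) in board:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                cell = (x + dx, y + dy)
                if cell not in board:
                    cells.add(cell)
    return sorted(cells)


def window_score(board, cell, player, length=5):
    """Threat-window feature: the largest number of `player` stones in any
    length-`length` window through `cell` that contains no opposing stone."""
    best = 0
    for dx, dy in DIRECTIONS:
        for offset in range(length):
            sx, sy = cell[0] - offset * dx, cell[1] - offset * dy
            marks = [board.get((sx + i * dx, sy + i * dy)) for i in range(length)]
            if any(m is not None and m != player for m in marks):
                continue  # window is blocked by the other side
            best = max(best, sum(m == player for m in marks))
    return best


def local_density(board, cell, radius=2):
    """Number of stones (either side) within Chebyshev distance `radius`."""
    return sum(
        1 for (x, y) in board
        if max(abs(x - cell[0]), abs(y - cell[1])) <= radius
    )


def nearest_own_distance(board, cell, player):
    """Distance feature: Chebyshev distance to the nearest own stone, 0 if none."""
    own = [p for p, mark in board.items() if mark == player]
    if not own:
        return 0
    return min(max(abs(x - cell[0]), abs(y - cell[1])) for (x, y) in own)


def score(board, cell, player):
    """Combine the features; the weights are illustrative assumptions."""
    opponent = 'O' if player == 'X' else 'X'
    build = window_score(board, cell, player)    # extend own lines
    block = window_score(board, cell, opponent)  # deny opponent lines
    density = local_density(board, cell)
    dist = nearest_own_distance(board, cell, player)
    if player == 'X':
        # Threat-seeker: favor building, stay close to own stones.
        return 2.0 * build + 1.0 * block + 0.1 * density - 0.2 * dist
    # Reactive spreader / blocker: favor blocking, avoid crowded areas.
    return 1.0 * build + 2.0 * block - 0.3 * density + 0.2 * dist


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; `temperature` must be positive."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_move(board, player, temperature=1.0, rng=random):
    """Sample one candidate cell with probability given by softmax(score)."""
    cells = candidates(board)
    probs = softmax([score(board, c, player) for c in cells], temperature)
    return rng.choices(cells, weights=probs, k=1)[0]
```

A low temperature makes the choice nearly greedy, while a higher one gives more varied moves. For example, `choose_move({(0, 0): 'X', (1, 0): 'X', (2, 0): 'X'}, 'O', temperature=0.05)` will almost always pick one of the two cells that cap the row of three.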