kapitals-pi & SEN: x̄ - > Infinite Tic-Tac-Toe

Friday, November 14, 2025

Infinite Tic-Tac-Toe — RL Policy Animation

A lightweight demonstration of a heuristic RL-inspired policy. X tends to build threats; O spreads and blocks.

Policy Notes

X: threat-seeker (open-three bias)
O: reactive spreader / blocker
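
As a rough illustration of what a "threat window" feature could look like, here is a minimal Python sketch. It is not the demo's actual code: the name `best_window`, the five-cell window length, and the representation of the unbounded board as sets of coordinates are all assumptions made for this example.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]

# Four line directions: horizontal, vertical, and the two diagonals.
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def best_window(own: Set[Cell], opp: Set[Cell], cell: Cell, length: int = 5) -> int:
    """Largest count of `own` stones in any window of `length` cells that
    contains `cell` and holds no `opp` stone.

    `length=5` is an assumed window size; the candidate `cell` is expected
    to be empty, so it does not add to the count itself.
    """
    x, y = cell
    best = 0
    for dx, dy in DIRECTIONS:
        for offset in range(length):
            # The window starts `offset` steps behind the candidate cell.
            window = [
                (x + (i - offset) * dx, y + (i - offset) * dy)
                for i in range(length)
            ]
            if any(c in opp for c in window):
                continue  # a blocked window cannot become a line
            best = max(best, sum(c in own for c in window))
    return best


xs = {(0, 0), (1, 0)}
os_ = {(5, 5)}
print(best_window(xs, os_, (2, 0)))  # X stones already lined up through (2, 0)
```

Under these assumptions, X would rank empty cells by `best_window(xs, os_, cell)` to extend its own lines, and O could rank cells by the same quantity to find the square most worth blocking.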

The demo uses heuristic scoring, not a learned policy, to imitate RL behavior. Each candidate move is scored from distance features, threat windows, and local density, and the move is then chosen by softmax selection.
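
The selection step can be sketched as follows. This is a hedged example, not the demo's implementation: `local_density`, `softmax_probs`, `sample_move`, the radius, and the temperature parameter are hypothetical names and values chosen for illustration, and the scores passed in stand for whatever weighted mix of features the policy uses.

```python
import math
import random
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


def local_density(stones: Set[Cell], cell: Cell, radius: int = 2) -> int:
    """Number of stones within `radius` (Chebyshev distance) of `cell`.
    The radius is an assumed value."""
    x, y = cell
    return sum(1 for sx, sy in stones if max(abs(sx - x), abs(sy - y)) <= radius)


def softmax_probs(scores: Dict[Cell, float], temperature: float = 1.0) -> Dict[Cell, float]:
    """Turn heuristic scores into a probability for each move."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {m: math.exp((s - peak) / temperature) for m, s in scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def sample_move(scores: Dict[Cell, float], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> Cell:
    """Sample one move; a lower temperature makes the choice greedier."""
    rng = rng or random.Random()
    probs = softmax_probs(scores, temperature)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


stones = {(0, 0), (1, 0), (4, 4)}
candidates = [(2, 0), (1, 1), (6, 6)]
# Here the score is only the local density; a fuller policy would add
# distance and threat-window terms with its own weights.
scores = {c: float(local_density(stones, c)) for c in candidates}
print(softmax_probs(scores))
print(sample_move(scores, temperature=0.5, rng=random.Random(0)))
```

Sampling from the softmax instead of always taking the top score keeps play varied from run to run, which is one plausible reason a heuristic policy would look RL-like in an animation.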