Crusoe in Cogniland: Emergence and Control of Latent Reasoning in Deep RL

This project investigates how diverse behavioral policies emerge in Deep Reinforcement Learning agents, and how they can be modulated, within the procedurally generated Cogniland environment. By training agents on survival and navigation tasks, the research aims to identify latent internal structures that correspond to distinct strategies (such as risk-taking versus energy conservation) and to analyze those structures using mechanistic interpretability. Ultimately, the study seeks to develop a framework for online behavioral control, in which lightweight constraint networks steer an agent’s reasoning and decision-making during inference without requiring additional fine-tuning. The work connects artificial world models with biological spatial reasoning and addresses core challenges in AI alignment and safety.
