From Introspective Prompting to Recursive Belief Attribution and Collective Mind.
Large Language Models (LLMs) | Cognitive Science | Recursive Reasoning | Computational Social Science
This Honors Thesis, completed at the University of Sydney, provides a technical foundation for evaluating the capacity of Large Language Models (LLMs) to approximate complex human social reasoning.
The research is divided into two parts: an extension of the HI-TOM benchmark to test the Recursive Theory of Mind (ToM) up to fourth-order belief structures, and the development of a novel Collective Mind benchmark to evaluate multi-agent coordination under relational constraints. By testing models like o3-mini and GPT-3.5-turbo, the study identifies a "reasoning collapse" as social complexity increases, offering a diagnostic tool for assessing AI reliability in collaborative human environments.
While LLMs excel at pattern matching, they often struggle with metacognition, the ability to reflect on mental states. This thesis interrogates whether LLMs possess a structured understanding of social logic or whether they simply "hallucinate" social intelligence from statistical regularities in language.
The first part examines how well LLMs track nested belief states (e.g., "A thinks that B believes that C is unaware of X").
The Benchmark: I extended the HI-TOM benchmark to evaluate belief attribution up to fourth-order depth.
The Methodology: I conducted a comparative analysis of two "Introspective Prompting" strategies—Chain of Thought (CoT) and Program of Thought (PoT).
Technical Innovation: I adapted PoT prompting to scaffold social reasoning using Python-like pseudocode and belief-chain tracking, the first known application of this method to higher-order ToM tasks.
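The exact prompting templates from the thesis are not reproduced here; the sketch below only illustrates the core idea behind this kind of PoT scaffolding, namely that a nested belief question can be reduced to executable belief-chain tracking. The toy world model, the names Move and nested_belief, and the simplifying assumptions (no deceptive communication, and every agent knows who witnessed each move) are illustrative choices for this sketch, not the thesis's implementation.

    from dataclasses import dataclass

    # Minimal world model for a HI-TOM-style story: an object is moved between
    # containers, and each move is witnessed by whichever agents are present.
    @dataclass
    class Move:
        obj: str
        to: str          # container the object ends up in
        witnesses: set   # agents present when the move happens

    def nested_belief(events, obj, chain):
        """Where does chain[0] think chain[1] thinks ... the object is?

        Simplifying assumption: with no deception, the chain's shared picture
        of the world only updates on moves that every agent in the chain saw.
        """
        location = None
        for event in events:
            if event.obj == obj and all(agent in event.witnesses for agent in chain):
                location = event.to
        return location

    # Classic false-belief setup, extended one level deeper.
    story = [
        Move("apple", "basket", {"Anna", "Ben", "Chloe"}),   # everyone is present
        Move("apple", "box",    {"Anna", "Ben"}),            # Chloe has left the room
        Move("apple", "drawer", {"Anna"}),                   # Ben has also left
    ]

    print(nested_belief(story, "apple", ["Anna"]))                   # drawer (1st-order)
    print(nested_belief(story, "apple", ["Anna", "Ben"]))            # box (2nd-order)
    print(nested_belief(story, "apple", ["Anna", "Ben", "Chloe"]))   # basket (3rd-order)

In the PoT setup described above, the model writes this kind of trace as Python-like pseudocode inside the prompt rather than having it executed, which is one reason the gains over CoT were limited without external code execution (see the findings below).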
The second part of the thesis introduces a first-of-its-kind framework for evaluating Collective Mind—the ability of multiple agents to form and maintain shared mental models.
Relational Variables: Unlike dyadic ToM tasks, this benchmark stress-tests models using triadic scenarios built from three key variables (a configuration sketch follows this list):
Valence: The emotional rapport (positive, neutral, negative) between agents.
Power: Hierarchical asymmetry (seniority vs. peer status).
Visibility: Whether information and relationships are public or private.
Social Predictability: The model is tasked with predicting agent actions based on these complex relational configurations.
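As a concrete, purely illustrative example of how such scenarios can be parameterised, the sketch below enumerates triadic configurations of the three variables. The value sets, the names Relation and triadic_scenarios, and the full-factorial sweep are assumptions made for this sketch, not the benchmark's actual schema.

    from dataclasses import dataclass
    from itertools import combinations, product

    # Hypothetical encodings of the three relational variables; the benchmark's
    # actual value sets and wording may differ.
    VALENCE = ("positive", "neutral", "negative")   # emotional rapport between two agents
    POWER = ("senior", "peer")                      # standing of agent `a` relative to `b`
    VISIBILITY = ("public", "private")              # is the relationship known to the third agent?

    @dataclass(frozen=True)
    class Relation:
        a: str
        b: str
        valence: str
        power: str
        visibility: str

    def triadic_scenarios(agents=("Ava", "Ben", "Caro")):
        """Yield every assignment of (valence, power, visibility) to the three
        pairwise relations in a triad, i.e. a full factorial design."""
        pairs = list(combinations(agents, 2))
        per_pair = list(product(VALENCE, POWER, VISIBILITY))
        for assignment in product(per_pair, repeat=len(pairs)):
            yield [
                Relation(a, b, valence, power, visibility)
                for (a, b), (valence, power, visibility) in zip(pairs, assignment)
            ]

    # Each configuration can then be rendered into a natural-language vignette and
    # the model asked to predict an agent's action, e.g. whether Ava shares a piece
    # of private information with Ben in front of Caro.
    scenarios = list(triadic_scenarios())
    print(len(scenarios))      # 12 ** 3 = 1728 relational configurations
    print(scenarios[0][0])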
Recursion Limits: Model accuracy declines sharply as reasoning depth increases; models handle first-order social dynamics, but their performance becomes fragile at third- and fourth-order belief states.
Prompting Effectiveness: While PoT prompting provides a more stable structure for nested beliefs, its overall accuracy improvements over CoT were limited without external code execution.
Collective Blind Spots: Models are sensitive to individual social variables such as Power and Visibility, but interaction effects between these variables frequently lead to inconsistent reasoning.
This research represents a technical deep-dive into the cognitive boundaries of AI.
The full technical report covering the development of the Collective Mind benchmark and the PoT prompting evaluation.
Overall summary of the thesis.
This thesis provides a research-based foundation for evaluating AI agents in sophisticated collaborative environments. It demonstrates my ability to conduct original research at the intersection of Computer Science and Cognitive Psychology, and an understanding of where exactly AI reasoning breaks down in high-stakes human systems.