During a high-intensity 4-day practicum at the Harvard Graduate School of Education, we collaborated with Imagine Learning (a leading US K-12 publisher) to solve a critical problem:
How can AI responsibly generate high-quality, classroom-ready tasks that develop "Durable Skills"?
Using Evidence-Centered Design (ECD), we successfully prototyped a Custom GPT that translates abstract communication competencies into precise, observable instructional activities for Grades 9-12.
Initially, our scope was a broad K-12 framework. However, early testing revealed that LLMs tend to drift into "generic pedagogy" when the target audience is too wide.
The Decision: We narrowed our scope to Grades 9-12 and focused on a single, high-impact construct: "Communicating to be Understood."
This allowed us to build a more robust logic gate for the AI, ensuring the output was sophisticated enough for high school students while maintaining rigorous alignment with the America Succeeds framework.
To ensure the AI output wasn't just "flavor text," we built a four-stage engineering pipeline:
Construct Definition: We operationalized Communication not just as "talking," but as the sophisticated exchange of thoughts with acute audience awareness.
Student & Evidence Models: We defined the specific observable behaviors (e.g., adapting tone for a specific stakeholder) that prove a student is competent.
Task Models: We designed explicit constraints; for example, requiring the AI to build tasks for pairs or small groups so that the "communication" evidence stayed attributable to individual students.
Prompt Engineering: We synthesized these models into a multi-layered System Prompt, acting as a "Pedagogical Guardrail" for the LLM.
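The four stages above feed a single layered System Prompt. A minimal sketch of how such an assembly might look, assuming a fixed priority order; the layer names and wording are illustrative, not the actual practicum prompt:

```python
# Hypothetical ECD layers feeding the "Pedagogical Guardrail" prompt.
# The text of each layer is paraphrased from the design models for
# illustration only.
ECD_LAYERS = {
    "construct": (
        "Target construct: 'Communicating to be Understood' -- the "
        "exchange of thoughts with acute audience awareness (Grades 9-12)."
    ),
    "evidence": (
        "Every task must elicit observable behaviors, e.g. adapting "
        "tone for a specific stakeholder."
    ),
    "task_constraints": (
        "Design tasks for pairs or small groups so communication "
        "evidence stays attributable to individual students."
    ),
    "output_format": (
        "Include ready-to-use materials: a student handout and a "
        "teacher-facing rubric."
    ),
}

def build_system_prompt(layers):
    """Concatenate guardrail layers in a fixed priority order."""
    order = ["construct", "evidence", "task_constraints", "output_format"]
    return "\n\n".join(f"[{name.upper()}]\n{layers[name]}" for name in order)

print(build_system_prompt(ECD_LAYERS))
```

Keeping the layers as separate, ordered blocks makes it easy to revise one model (say, the task constraints) without disturbing the rest of the prompt.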
One of my primary contributions was Constraint Engineering. We discovered that without explicit, step-by-step logic, the GPT would "spill outward," creating interactions that were fun but impossible for a teacher to grade.
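One way to think about that step-by-step logic is as a post-generation checklist. A minimal sketch of such a rule-based gate, assuming a generated task is represented as a dictionary; the field names and thresholds are hypothetical:

```python
# Hypothetical "logic gate": rule checks that flag generated tasks a
# teacher could not grade. Field names (group_size, observable_behaviors,
# materials) are illustrative assumptions, not the practicum's schema.

def check_task(task):
    """Return a list of constraint violations for a generated task."""
    violations = []
    # Evidence must stay attributable: pairs or small groups only.
    if not 2 <= task.get("group_size", 0) <= 4:
        violations.append("group_size must be 2-4 for attributable evidence")
    # The task must name at least one observable, gradable behavior.
    if not task.get("observable_behaviors"):
        violations.append("no observable behaviors listed")
    # Reduce teacher burden: ready-to-use materials must be included.
    if "handout" not in task.get("materials", []):
        violations.append("missing ready-to-use handout")
    return violations

# A task that "spilled outward" trips every rule:
bad_task = {"group_size": 12, "observable_behaviors": [], "materials": []}
print(check_task(bad_task))
```

Feeding violations like these back into the prompt is one cheap way to keep the model inside the gradable space.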
Key Technical Insights:
Avoiding Pedagogical Drift: AI needs a "prioritization logic" to know which learning objectives matter most in a task.
Reducing Teacher Burden: We refined the bot instructions to ensure the output included ready-to-use materials (handouts, prompts), reducing the "work" a teacher has to do to implement the AI's idea.
Iterative Prompting: We used a "Human-in-the-loop" approach, testing 20+ iterations of the same task to find where the AI logic broke down.
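The human-in-the-loop testing above can be sketched as a simple iteration log: regenerate the same task many times, record a reviewer's verdict on each attempt, and tally where the logic broke. The function names and toy stand-ins below are hypothetical; `generate_task` would wrap a real LLM call:

```python
from collections import Counter

def run_iterations(generate_task, review, n=20):
    """Tally reviewer-flagged failure modes across n generations.

    generate_task: callable producing one candidate task per call.
    review: human (or stand-in) judgment returning "ok" or a failure label.
    """
    failures = Counter()
    for i in range(n):
        task = generate_task(i)
        verdict = review(task)
        if verdict != "ok":
            failures[verdict] += 1
    return failures

# Toy stand-ins for demonstration only:
fake_generate = lambda i: {"id": i}
fake_review = lambda t: "ungradable" if t["id"] % 3 == 0 else "ok"
print(run_iterations(fake_generate, fake_review))
```

Tallying failure modes rather than individual failures is what makes 20+ iterations informative: the most frequent label points at the weakest layer of the prompt.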
Below are the presentation slides and a full Technical Brief outlining our research, design pivots, and final outcomes.
"The HGSE students delivered while providing a blueprint for the process. The practicum will feed innovation at IL... providing growth for emerging educational leaders and innovators." — Imagine Learning Leadership
Outcome: Delivered a Proof of Concept (PoC) that demonstrated how LLMs, when constrained by Learning Science, can generate evidence-rich tasks that are far superior to standard "off-the-shelf" AI responses.