Talk · April 2026

AI-assisted
engineering

Building a context system that works across sessions, tools, and team members.

Guidelines · Knowledge · Tasks

Overview

What we'll cover

The framework

1. The problem — why sessions forget everything
2. Step 1 — single guidelines file, generated in minutes
3. Step 2 — folder breakdown, skills, task-based planning
4. Step 3 — vendor-agnostic layer with AGENTS.md
5. Step 4 — harness engineering: hooks, verification, steering

Also covered

6. Knowledge management — cross-repo wiki maintained by the LLM
7. Research backing — four independent sources, same conclusions
8. Before / after — what actually changes day to day
9. Demo — live walkthrough on library-service

The problem

Every session starts from scratch

  • Stack, gotchas, and conventions need re-explaining each time.
  • Three tools on one repo produce three different contexts.
  • Non-obvious discoveries (race conditions, framework traps, rejected designs) land in Slack and disappear.
  • Guidelines written once don't get updated from real implementation experience.

The output quality of a coding assistant is determined entirely by the context it receives at the start of each session.

Why context engineering matters

LLMs generate by probability, not by rule

A transformer computes a probability distribution over the next token, then samples from it. The same prompt run twice can produce different outputs.

For engineering tasks — follow this coding standard, never use this pattern — probabilistic sampling produces inconsistent results without steering.

Context files shift the distribution toward better outputs. Harnesses verify the output didn't drift.

Next token after: new ___Map<>() in an @ApplicationScoped bean

transformer-explainer — token probability live

Where we are

"We are currently in a Fuck Around and Find Out phase of AI Engineering."

— Mario Zechner, pi.dev

There is no hard standard yet. That's a good thing.

You are free to form whatever workflow fits your team. And when a standard does emerge, migrating is low-cost — AI can do the restructuring.

The answer

Four levels, adopted as needed.

1. Single guidelines file. Start here. Two minutes to generate.
2. Folder breakdown. When the single file grows past ~80 lines.
3. Vendor-agnostic layer. When the team uses more than one AI tool.
4. Harness engineering. When advice isn't enough — verification catches what prompts miss.

Each level builds on the last. Skipping ahead is fine; stopping at Level 1 is fine.

Step 1

Every tool already has a guidelines file

| Tool              | File                            |
|-------------------|---------------------------------|
| Claude Code       | CLAUDE.md                       |
| Junie (JetBrains) | .junie/guidelines.md            |
| Cursor            | .cursor/rules/                  |
| GitHub Copilot    | .github/copilot-instructions.md |
| Windsurf          | .windsurfrules                  |
| Any tool          | AGENTS.md — AAIF standard       |

The file is read at the start of every session. If it exists and contains the right information, the assistant starts already knowing the codebase's constraints.

Step 1 · The prompt

Ask the AI to generate it from the codebase

prompt
Read this codebase and write a CLAUDE.md that captures:
  - How to build and run the project
  - The architecture (only what isn't obvious from the code)
  - Coding conventions and non-obvious constraints
  - Anything a developer would need that can't be inferred

Keep it under 100 lines. Only include non-inferable information.

Takes about two minutes. The file doesn't have to be written from scratch.

Step 1 · Example output

For a Java / Quarkus library API

CLAUDE.md
# library-service

## Build & Run
./gradlew quarkusDev   # live reload on :8080

## Stack
Java 21 · Quarkus 3 · JAX-RS · CDI · Jackson · Gradle 8

## Architecture
LibraryResource → LibraryService → LibraryStore (ConcurrentHashMap)
Records: Book, Loan — immutable, use with*() helpers for state transitions.

## Coding Standards
- NEVER use plain HashMap in @ApplicationScoped beans — always ConcurrentHashMap
- Tests: Given-When-Then, no eq() matchers in Mockito

## Known Gotchas
- Borrow availability: use ConcurrentHashMap.replace() — NOT get-then-put.
  Plain put() after get() has a race condition under concurrent requests.
- JAX-RS paths: @Path("/search") BEFORE @Path("/{id}").
  If /{id} comes first, "search" matches as an ID — no compile-time error.
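The borrow gotcha is exactly the kind of non-inferable detail worth spelling out. As a hedged illustration, assuming a Book record with an available flag and a withAvailable() helper (names invented for this sketch, not taken from the real codebase):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative domain record: immutable, state change via a with*() helper.
record Book(String id, boolean available) {
    Book withAvailable(boolean a) { return new Book(id, a); }
}

class BorrowSketch {
    static final Map<String, Book> store = new ConcurrentHashMap<>();

    // RACY: two threads can both read available == true and both "borrow".
    // The second put() silently overwrites the first (lost update).
    static boolean borrowUnsafe(String id) {
        Book b = store.get(id);
        if (b == null || !b.available()) return false;
        store.put(id, b.withAvailable(false));
        return true;
    }

    // SAFE: replace(key, expected, updated) is atomic. It succeeds only if
    // the stored value is still the one we read, so exactly one caller wins.
    static boolean borrowSafe(String id) {
        Book b = store.get(id);
        return b != null && b.available()
            && store.replace(id, b, b.withAvailable(false));
    }
}
```

Note that replace() compares with equals(), and records give value equality, so the check-and-swap works without extra locking.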

Step 1 · What to include

Inferable information can be brief.
Non-inferable needs the full detail.

ETH Zurich (Feb 2026): human-curated context files improve task success by 4%; LLM-generated files reduce it by 3%. The difference is that human files contain only what the AI cannot read from source.

| Brief or one line          | Write in full detail                                                 |
|----------------------------|----------------------------------------------------------------------|
| Stack: Java 21, Quarkus 3  | ConcurrentHashMap.replace() for borrow — what breaks without it, why |
| Tests in JUnit 5 + Mockito | JAX-RS path ordering trap — the exact failure mode, no compile error |
| Records in domain/         | NEVER eq() in Mockito — the constraint and the reason                |

That's Step 1. A single file, generated once, immediately useful.

As the codebase evolves, the file grows: testing standards, role definitions, task workflows, new gotchas. At around 80 lines, every session loads all of it — including the parts that aren't relevant.

Step 2 is the answer.

The single file, six weeks later

~150 lines. Every session loads all of it.

CLAUDE.md
# library-service

## Build & Run ...
## Stack ...
## Architecture ...
## Coding Standards ...         ← 15 lines
## Testing Standards ...        ← 20 lines
## Known Gotchas ...            ← 12 lines
## Task Workflow ...            ← 30 lines
## Roles ...                    ← 25 lines
## Workflow Prompts ...         ← 20 lines
## Tech Debt ...                ← 10 lines
  • Fixing a bug in LibraryService doesn't need the role definitions — they load anyway.
  • Testing standards are copy-pasted across three repos. One standard change means three updates.
  • Stale sections: TASK-002 completed, tech debt entry never removed.

Step 2

Folder Breakdown

One concern per file. Load only what the session needs. Shared standards live in one place.

Step 2

CLAUDE.md becomes a manifest

structure
CLAUDE.md (20 lines — @-imports only)
    @.ai/knowledge.md          ← tribal knowledge, loaded every session
    @.ai/context/stack.md      ← build, run, dependencies

.ai/skills/                    ← domain knowledge, workflow phases, and roles
    java-standards/            → /java-standards
    testing-standard/          → /testing-standard
    pre-flight/                → /pre-flight TASK-001
    build-plan/                → /build-plan TASK-001
    execute/                   → /execute TASK-001 "Stage 1"
    close-task/                → /close-task TASK-001
    architect/                 → /architect

.ai/hooks/verify-build.sh      ← quality gate
.ai/tasks/TASK-001/            ← per-task working memory

Run bash .ai/setup.sh once. All skills become native slash commands for Claude Code, Junie, and Cline from the same source.
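The deck doesn't show setup.sh itself, but the Claude Code half can be sketched as follows, assuming slash commands are read from .claude/commands/ (that directory name, and the equivalent Junie/Cline wiring, are assumptions):

```shell
# setup_skills: expose each .ai/skills/<name>/SKILL.md as a /<name> slash
# command by linking it into the tool's command directory.
setup_skills() {
  local root="$1"
  mkdir -p "$root/.claude/commands"
  local skill name
  for skill in "$root"/.ai/skills/*/SKILL.md; do
    [ -e "$skill" ] || continue          # skip if no skills exist yet
    name="$(basename "$(dirname "$skill")")"
    # Absolute target so the symlink survives being read from anywhere.
    ln -sf "$(realpath "$skill")" "$root/.claude/commands/$name.md"
  done
}
```

Because the links point at one source, editing a skill file updates the command for every tool that was wired up the same way.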

Step 2 · Skills

Domain knowledge, workflows, and roles in one format

.ai/skills/testing-standard/SKILL.md
---
name: testing-standard
description: Testing standards — Given-When-Then, Mockito patterns, domain
             record deserialization tests. Load when writing or reviewing any test.
---

- NEVER use `eq()` matchers in Mockito — use raw values directly
- ALWAYS use ConcurrentHashMap.replace() for availability updates
- NEVER skip the layer boundary: Resource → Service → Store

Structure: Given / When / Then
New domain records require a JSON deserialization test.
  • The name field becomes the /slash-command.
  • The description controls auto-loading — right skill for the task, no explicit invocation needed.
  • Skill files are capped at 50 lines. One concern per file.

Step 2 · Workflow skills

A self-refreshing phase, not a saved message

.ai/skills/pre-flight/SKILL.md
---
name: pre-flight
description: Deep codebase analysis before planning. Always run before build-plan.
---

## Live context
- Recent commits: !`git log --oneline -5`       ← runs at invocation, always current
- Modified files: !`git diff --name-only HEAD`

## Instructions
1. Load the task folder (.ai/tasks/[TASK-ID]/)
2. Trace method chains affected by this task
3. Cross-check knowledge.md — flag applicable gotchas
4. Identify risks: concurrency, layer boundaries, records needing tests
5. Update findings.md — do NOT implement anything

Task ID: $ARGUMENTS

! runs shell at invocation. $ARGUMENTS takes the ticket ID. One file serves every ticket.

Step 2 · Additional benefit

Task-Based Planning

The same folder structure that holds skills and context can hold per-task working memory — making multi-session work resumable without re-narration.

Task workflow

Session = stateless.
Task folder = stateful.

Each AI session loads the task folder fresh. Decisions made two days ago, rejected approaches, constraints discovered mid-implementation — all in the files, not the conversation history.

structure
.ai/tasks/TASK-001/
├── context.md    ← what and why (set at kickoff, static after)
├── task-plan.md  ← phases, design decisions, files to touch
├── checklist.md  ← executable steps with stage checkboxes
├── findings.md   ← discoveries during work (append-only, raw)
├── tracker.md    ← Current Focus updated every session
└── summary.md    ← written before the knowledge flush
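Scaffolding that folder is mechanical enough to script. A sketch of what the /new-task phase might start with (file names taken from the structure above; the real skill also fills the files in from the codebase scan):

```shell
# new_task: scaffold .ai/tasks/<TASK-ID>/ with the six working-memory files.
new_task() {
  local dir=".ai/tasks/$1"
  mkdir -p "$dir"
  local f
  for f in context task-plan checklist findings tracker summary; do
    # Never clobber an existing file: resuming a task must be safe.
    [ -e "$dir/$f.md" ] || printf '# %s / %s\n' "$1" "$f" > "$dir/$f.md"
  done
}
```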

Task workflow · Five phases

One skill per phase

| Phase       | What happens                                                                                            |
|-------------|---------------------------------------------------------------------------------------------------------|
| /new-task   | Scan codebase, create task folder with 6 files.                                                         |
| /pre-flight | Trace method chains, cross-check knowledge.md, identify risks, update findings.md.                      |
| /build-plan | Read findings, write task-plan.md and checklist.md. Pauses for confirmation before any code is written. |
| /execute    | Implement one stage, check off items, run quality gate.                                                 |
| /close-task | Audit checklist, write summary.md, flush to knowledge.md, draft PR description.                         |

The task is not closed until the flush is done.

Task workflow · Zero-warmup resumption

tracker.md — Current Focus

tracker.md
## Current Focus
**Last action**: Pre-flight complete — borrow race condition documented in findings.md
**Next action**: Run /build-plan TASK-001 to generate staged plan
**Open decision**: Should partial failure be rolled back? (non-blocking — log as tech debt)

| Active Role | Architect    |
| Stage       | Pre-planning |

Session-start prompt: "Load tracker.md, checklist.md, findings.md for TASK-001. Resume from Current Focus."

Task workflow · Knowledge lifecycle

Every task makes the next one slightly easier

lifecycle
Task starts         → task folder created (context.md, findings.md)
     ↓
Pre-flight finds race condition
                    → findings.md (written at time of discovery)
     ↓
Implementation confirms the fix
                    → findings.md updated
     ↓
Task closes         → summary.md written → flush to knowledge.md
     ↓
knowledge.md loaded at every future session in this repo
     ↓
Wiki ingest         → promoted to wiki/gotcha/
                       available across all repos and all future engineers

Steps 1 and 2 work well for one tool.

When the team uses Claude Code, Junie, and Cursor on the same codebase, each tool reads its own config file. The context diverges. One engineer's tool sees the updated rule; another's does not.

Step 3 consolidates this into one source.

Step 3

Three tools, three copies of the same context

Before

CLAUDE.md
  — 150 lines of context

.junie/guidelines.md
  — same 150 lines,
    maintained separately

.cursor/rules/
  — same 150 lines,
    different format

Three copies. When one drifts, engineers get different suggestions from different tools.

After

AGENTS.md
  — canonical source
    (≤150 lines)

CLAUDE.md
  — adapter: @-imports

.junie/guidelines.md
  — adapter: @-imports

.ai/
  — the actual content

Update .ai/knowledge.md once. Every tool picks it up.

Step 3 · The standard

December 2025: Linux Foundation forms AAIF

Anthropic, OpenAI, Google, Microsoft, and AWS donate AGENTS.md as the cross-tool canonical standard. All major AI coding tools now read it.

AGENTS.md is the canonical entry point. Tool-specific files are thin adapters that reference .ai/ — they don't duplicate it.

structure
[repo]/
├── AGENTS.md                  ← canonical (≤150 lines, non-inferable only)
├── CLAUDE.md                  ← adapter: @-imports from .ai/
├── .junie/guidelines.md       ← adapter: @-imports from .ai/
└── .ai/                       ← the actual source of truth

Step 3 · AGENTS.md

≤150 lines. Non-inferable only.

AGENTS.md
# library-service — Agent Context

## Build
./gradlew clean build test
Quality gate: .ai/hooks/verify-build.sh

## Non-obvious constraints
- NEVER plain HashMap in @ApplicationScoped — ALWAYS ConcurrentHashMap
- Borrow flip: NEVER get-then-put — ALWAYS ConcurrentHashMap.replace()
- JAX-RS: @Path("/search") BEFORE @Path("/{id}") — or "search" matches as an ID
- Records are immutable — use with*() helpers, never add setters

## Task workflow
Folders: .ai/tasks/[TASK-ID]/ — context.md, checklist.md, findings.md,
         tracker.md, task-plan.md, summary.md
Workflow: /new-task → /pre-flight → /build-plan → /execute → /close-task

## Full context
.ai/knowledge.md, .ai/context/, .ai/skills/

Recap

The full picture

1. Single file. Ask the AI to generate it from the codebase. Works today.
2. Folder breakdown. knowledge.md, context/, skills/, tasks/. Each file one concern, loaded on demand.
3. Vendor-agnostic. AGENTS.md canonical + tool adapters. One source, no drift between tools.
4. Harness engineering. Automated hooks, verification loops. Advice becomes enforcement.

Step 1 is better than nothing. Step 2 is better when the file grows. Step 3 is better when the team uses multiple tools.

Step 4

Harness Engineering

When advice isn't enough. Verification catches what prompts miss. Transition from structured context to automated enforcement.

Harness Engineering

Coding Agent = Model + Guide + Harness

Layer 1

Model

Base intelligence

  • Claude, Gemini, GPT-4
  • Not configurable by you
  • Improves with new releases

Layer 2

Guide

Before execution

  • AGENTS.md
  • skills/
  • knowledge.md
  • task folder context

Layer 3

Harness

During & after execution

  • Hooks (shell scripts)
  • Lint scripts
  • Quality gates
  • Verification checks

The Guide shifts the probability distribution before execution. The Harness verifies the output didn't drift after. Both are fully under your control.

Harness Engineering

Verification beats advice

Advice (Guide layer)

# knowledge.md

- NEVER use single-implementation
  interfaces
- NEVER skip the layer boundary:
  Resource → Service → Store

The agent reads this. It may still drift under a long session or a complex task.

Verification (Harness layer)

# .ai/hooks/check-style.sh

if grep -rn "interface.*Service" src/ \
     | grep -v "//.*ok"; then
  echo "Single-impl interface found"
  exit 1
fi

The build fails. The agent cannot reassure its way past a broken hook.
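The same pattern scales from a style check to a full quality gate. A generic sketch of a verify-build wrapper, with the build command passed in as a parameter (the deck's actual hook runs the Gradle build directly):

```shell
# verify_build: run the project's build command and fail loudly.
# Usage: verify_build ./gradlew clean build test
verify_build() {
  if ! "$@"; then
    echo "verify-build: quality gate FAILED, task cannot be closed" >&2
    return 1
  fi
  echo "verify-build: OK"
}
```

Wired into the task workflow, /execute and /close-task refuse to proceed while this returns non-zero.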

Harness Engineering

The steering loop

When the agent makes a mistake, the primary response is to update the harness — not just fix the chat.

Agent runs (session)
     ↓
Mistake caught (review)
     ↓
Write rule → hook + skill
     ↓
Persist → knowledge.md
     ↓
Prevented → every future session

From "Prompt Engineering" to Systems Engineering: engineer out the possibility of error rather than describe your way around it.

Beyond the codebase

Knowledge Management

Per-repo context handles one codebase. For knowledge that spans all repos and domains, a different tool is needed.

Knowledge management

Cross-repo knowledge has no home

A debugging session in library-service produces a finding about ConcurrentHashMap race conditions. That finding is relevant in every concurrent service. A design decision made in one microservice informs four others.

The Karpathy LLM Wiki pattern (2024): drop a raw source, the LLM synthesizes it into a typed wiki page. Knowledge is compiled once and kept current.

This wiki links back to the repos it covers. Drop raw material from any repo; any AI tool can ingest and query it. Knowledge is centralized, not scattered across codebases.

Knowledge management · Structure

personal-wiki

structure
personal-wiki/
├── schema.md     ← the spec: page types, frontmatter, naming rules
│                    the LLM reads this to know what to write and where
├── CLAUDE.md     ← loads schema + wiki structure (Claude Code)
├── JUNIE.md      ← same for Junie — any tool, same wiki
└── wiki/
    ├── index.md  ← content catalog — updated on every ingest
    ├── log.md    ← append-only activity log
    ├── repo/     ← per-repo architecture and patterns
    ├── debug/    ← root causes, symptoms, fixes
    ├── gotcha/   ← traps documented once, searchable forever
    ├── decision/ ← architectural choices and their rationale
    └── reference/← runbooks, command references, API summaries

Both CLAUDE.md and JUNIE.md load the schema at session start. Drop raw material and any tool can ingest it correctly.
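For illustration, a gotcha page under such a schema might look like this (frontmatter fields follow the ingest rules in the addendum; all values are invented):

```markdown
---
type: gotcha
repo: library-service
tags: [concurrency, quarkus]
created: 2026-04-01
updated: 2026-04-01
---

# get-then-put race on borrow availability

Symptom: double-borrow under concurrent requests; no error, just a lost update.
Fix: ConcurrentHashMap.replace() instead of get() followed by put().
See [[repo/library-service]].
```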

Knowledge management · Ingest

Drop a raw source. The LLM does the bookkeeping.

Raw source → destination

| Raw source     | Destination     |
|----------------|-----------------|
| Meeting notes  | wiki/meeting/   |
| Debug log      | wiki/debug/     |
| Jira export    | wiki/jira/      |
| Confluence doc | wiki/reference/ |
| Code review    | wiki/gotcha/    |

What the LLM does

  1. Read the source
  2. Ask clarifying questions only if ambiguous
  3. Write a typed page with correct frontmatter
  4. Update existing pages touched by new info
  5. Flag contradictions
  6. Update index.md
  7. Append to log.md

Confluence has raw text. The wiki page has the synthesized answer — extracted once, not re-derived on every query.

Research backing

Four independent sources, same structure

| Source | Finding | Maps to |
|--------|---------|---------|
| ETH Zurich (Feb 2026) | Human-curated context files +4% task success. Non-inferable only. | knowledge.md lean by design |
| Git Context Controller (arXiv 2508.00031) | Cross-session continuity requires no re-teaching when context is in files, not chat history. | Task folders over conversation history |
| Context Engineering (LangChain / JetBrains, 2025) | Four strategies: Write · Select · Compress · Isolate. | findings.md · skills · summary.md · task folders |
| Block Engineering (Square, 2025) | AI Champions model: AI-authored code +69%, time savings +37%, automated PRs +21×. | One context owner per repo |

The before / after

What changes in practice

| Before                                 | After                                              |
|----------------------------------------|----------------------------------------------------|
| Re-explaining the stack every session  | knowledge.md loaded automatically                  |
| Same gotcha hits a second time         | Flush from TASK-001 is in knowledge.md for TASK-002 |
| Same review comment on every PR        | Codified as NEVER eq() in testing-standard skill   |
| Writing PR descriptions from scratch   | /close-task drafts it from checklist and findings  |
| Three tools, three different contexts  | AGENTS.md + .ai/ — one source, all tools           |
| New engineer needs a week to onboard   | Point them at knowledge.md and a task folder       |

Demo

library-service — Java / Quarkus REST API

  1. Show stage-1-single-file.md — "Generated in 2 minutes."
  2. Show stage-2-growing-file.md — "~150 lines."
  3. Open CLAUDE.md — "20 lines, all @-imports."
  4. Open .ai/knowledge.md — the ConcurrentHashMap gotcha
  5. Show .ai/skills/ listing — domain + workflow + roles
  6. Open pre-flight/SKILL.md — ! injection, $ARGUMENTS
  7. Open architect/SKILL.md — role shift, same facts
  8. Show setup.sh — one command wires all slash commands
  9. Open AGENTS.md — the vendor-agnostic layer
  10. Open .ai/tasks/TASK-001/findings.md + tracker.md
  11. /build-plan TASK-001 — live, pauses for confirmation
  12. /new-task TASK-003 — finds constraint from knowledge.md without being told

Takeaways

Eight things to carry forward

  1. Start with one file. Ask the AI to generate it. Better than nothing, today.
  2. Inferable stays brief. Non-inferable gets the full detail.
  3. Split when it grows. Skills, context, tasks in dedicated files, loaded on demand.
  4. AGENTS.md for multi-tool teams. One source; tool files are thin adapters.
  5. Session = stateless; task folder = stateful. The core mental model.
  6. The flush is not optional. Knowledge accumulates only if discoveries get promoted.
  7. Verification beats advice. Write a hook rather than a rule where possible.
  8. It's a system. Any AI, any IDE, same context, same constraints.

Thank you

Questions?

Karpathy LLM Wiki · gist.github.com/karpathy

AGENTS.md standard · agents.md

ETH Zurich study · arxiv.org/html/2602.20478v1

ibnufirdaus.dev · ibnu.cs2016@gmail.com


Addendum

Prompts you can use today

Copy any of these into your AI coding tool. Each one moves you one level forward.

Addendum · Step 1

Generate a guidelines file from scratch

prompt
Read this codebase and write a CLAUDE.md (or AGENTS.md) that captures:
- How to build and run the project
- The architecture — only what is not obvious from reading the source
- Coding standards and non-obvious constraints
- Anything a developer needs to know that cannot be inferred

Rules:
- Keep it under 100 lines
- Inferable facts (stack, test framework) can be one-liners
- Non-obvious constraints need the full detail — include the "why" and
  the failure mode, not just the rule
- If something can be read from the code, leave it out

Addendum · Step 2

Restructure a growing file into a folder

prompt
The current guidelines file has grown past ~80 lines.
Restructure it into a .ai/ folder:

1. .ai/knowledge.md — architectural facts, gotchas, non-obvious constraints
2. .ai/context/     — narrow reference sheets for specific concerns
                      (stack, deployment, key patterns). Each file ≤50 lines.
3. .ai/skills/      — repeated workflows and behavioral rules as skill files
                      (testing approach, task lifecycle, code review rules).
                      One topic per file.
4. Rewrite CLAUDE.md as a manifest of @-imports only — under 30 lines,
   no duplicated content, just pointers to .ai/ files.

Do not write any files yet. Propose the folder structure and
show me what goes where. I will confirm before you create anything.

Addendum · Step 3

Add a vendor-agnostic layer with AGENTS.md

prompt
We use more than one AI coding tool in this repo.
Create a vendor-agnostic context layer:

1. Create AGENTS.md at the repo root (≤150 lines):
   - Non-inferable information only — build commands, architecture,
     constraints, key gotchas
   - No tool-specific syntax

2. Make existing tool-specific files thin adapters:
   - CLAUDE.md      → @-imports from .ai/ + Claude-specific settings only
   - .junie/        → @-imports from .ai/ + Junie-specific settings only
   - Nothing should be duplicated across tool files — .ai/ is the source

Show me the proposed AGENTS.md and updated adapter files before writing.

Addendum · Step 4

Add hooks, task folders, and a steering loop

prompt
Level up the AI setup with automated verification and task-based planning:

1. Audit .ai/knowledge.md for prose rules that could be enforced by a script.
   For each one, propose a hook in .ai/hooks/ that exits non-zero on violation.

2. Create .ai/hooks/verify-build.sh — runs the build and tests, prints a
   clear error on failure. Wire it into AGENTS.md: agent must run it before
   marking any task complete.

3. Create a task folder template at .ai/tasks/TEMPLATE/:
   - context.md   — background, links, session protocol
   - checklist.md — implementation steps
   - findings.md  — gotchas discovered during work
   - tracker.md   — last action / next action / open decisions
   - summary.md   — post-task writeup before flushing to knowledge.md

   Add a rule to AGENTS.md: every non-trivial task gets a folder.
   The task is not closed until findings are promoted to knowledge.md.

Propose before writing. Show me the hook scripts and folder template first.

Addendum · Knowledge wiki

Bootstrap a personal engineering wiki

prompt
Create a personal engineering wiki in this directory.
The wiki follows the Karpathy LLM Wiki pattern — knowledge is compiled
once and kept current, not re-derived on every query.

Structure:
  wiki/
  ├── index.md      — content catalog, one line per page
  ├── log.md        — append-only activity log
  ├── repo/         — per-repository knowledge pages
  ├── decision/     — architectural and technical decisions
  ├── debug/        — debugging sessions and root causes
  ├── gotcha/       — non-obvious traps and constraints
  ├── concept/      — established technical concepts
  └── note/         — freeform captures

Rules:
- Every page has YAML frontmatter: type, repo, tags, created, updated
- index.md is always up to date — update it on every change
- log.md is append-only — never edit past entries
- Cross-links use [[category/slug]] Obsidian format
- You (the LLM) write the wiki. I curate the sources.

Create the folder structure, index.md, and log.md now.
Do not create placeholder pages — only real structure.

Addendum · Knowledge wiki

Ingest a source into the wiki

prompt
Ingest the following source into the wiki.

[paste notes, debug log, meeting MoM, ticket, code snippet, or screenshot]

Steps:
1. Read the source and identify the type:
   repo · decision · debug · gotcha · concept · note
2. Write a wiki page in the correct wiki/ subdirectory with full frontmatter
3. Update any existing pages that are touched by this new information
4. Flag any contradictions with existing pages in both pages
5. Update wiki/index.md with all new or significantly changed pages
6. Append a one-line entry to wiki/log.md

Do not invent facts not present in the source.
Mark gaps and open questions directly in the page under ## Open Questions.

Addendum · Full audit

Scan and prioritise everything at once

prompt
Do a full scan of this repo's AI setup and produce a prioritised
list of improvements.

Scan: AGENTS.md, CLAUDE.md, .claude/, .junie/, .ai/ (knowledge.md,
context/, skills/, hooks/, tasks/), and any other agent context files.

For each area assess:
- Presence     — what exists vs what is missing from the four-level framework
- Lean-ness    — inferable facts wasting context? Non-obvious constraints missing?
- Skills       — repeated workflows codified, or re-explained every session?
- Verification — quality gates enforced as runnable hooks, or just prose advice?
- Task workflow — task folder structure present? Complex enough to need one?
- Vendor layer — does AGENTS.md exist? Do tool files duplicate or delegate?
- Knowledge    — mechanism to promote discoveries into knowledge.md?

Output format:
  ## Current state
  ## P1 — High impact, quick to fix
  ## P2 — High impact, requires more work
  ## P3 — Nice to have
  ## Suggested next step

Do not modify any files. Audit and report only.