Chapter 05 — Self-Refinement

Overview

Self-Refinement is an internal loop where the model critiques and iteratively improves its own output until quality converges or a round limit is reached.

Pros: Improves quality without external labels.
Cons: May reinforce its own errors; each extra round adds latency and cost.

Loop pattern

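1. Generate an initial draft.
2. Critique: list issues by category, plus a prioritized checklist of fixes.
3. Refine: apply the checklist and provide a change summary.
4. Repeat steps 2 and 3; terminate when the critique finds no material issues OR the maximum number of rounds is reached.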
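
The loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: `self_refine`, the three callables standing in for model calls, and the `NO MATERIAL ISSUES` sentinel check are all hypothetical, and the toy functions only simulate a model that fixes one issue per round.

```python
from typing import Callable, List, Tuple

# Assumed sentinel the critique step returns when nothing material is left.
NO_ISSUES = "NO MATERIAL ISSUES"


def self_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str], str]:
    """Generate a draft, then alternate critique and refine until a stop condition.

    Returns (final draft, every draft produced, stop reason).
    """
    draft = generate(task)
    history = [draft]
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback.strip() == NO_ISSUES:
            return draft, history, "no material issues"
        draft = refine(task, draft, feedback)
        history.append(draft)
    return draft, history, "max rounds reached"


# Toy stand-ins for model calls (hypothetical): each round adds one missing word.
TARGET = ["def", "with", "docstring", "and", "types"]


def toy_generate(task: str) -> str:
    return "def"


def toy_critique(task: str, draft: str) -> str:
    missing = [w for w in TARGET if w not in draft.split()]
    return NO_ISSUES if not missing else f"1. Add '{missing[0]}'"


def toy_refine(task: str, draft: str, feedback: str) -> str:
    word = feedback.split("'")[1]  # pull the word out of "1. Add 'word'"
    return f"{draft} {word}"


final, history, reason = self_refine("demo", toy_generate, toy_critique, toy_refine, max_rounds=5)
print(final, "|", reason)  # def with docstring and types | no material issues
```

In this sketch the stop check runs right after each critique, so a run that hits `max_rounds` returns a draft that was refined but never re-critiqued. Logging the returned stop reason is one way to answer "Stop condition logged?" from the checklist.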

Code evolution example (palindrome)

# Round 0
def is_palindrome(s):
    return s == s[::-1]

# Round 0 critique:
# - Case-sensitive: "Racecar" is rejected
# - Punctuation and spaces are not ignored
# - No docstring or examples

# Round 1
def is_palindrome(s):
    """Return True if s is a palindrome, ignoring case and non-alphanumerics."""
    cleaned = ''.join(ch.lower() for ch in s if ch.isalnum())
    return cleaned == cleaned[::-1]

# Round 1 critique:
# - Add type hints
# - Add a runnable example (doctest)

# Round 2
def is_palindrome(s: str) -> bool:
    """Return True if s is a palindrome, ignoring case and non-alphanumerics.

    >>> is_palindrome('Racecar!')
    True
    """
    filtered = ''.join(c.lower() for c in s if c.isalnum())
    return filtered == filtered[::-1]
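
The critiques above can be checked mechanically rather than taken on trust. A minimal sketch, assuming a hypothetical edge-case table `CASES` and helper `failing_cases`, runs the Round 0 and Round 2 versions against the same inputs. This is the kind of external validator that keeps a critique from simply echoing the draft's own blind spots.

```python
def is_palindrome_v0(s):
    # Round 0 draft: exact reversal, so case and punctuation matter.
    return s == s[::-1]


def is_palindrome_v2(s: str) -> bool:
    # Round 2 draft: compare only lowercased alphanumeric characters.
    filtered = "".join(c.lower() for c in s if c.isalnum())
    return filtered == filtered[::-1]


# Hypothetical edge-case table: (input, expected result).
CASES = [
    ("", True),
    ("Racecar!", True),
    ("A man, a plan, a canal: Panama", True),
    ("hello", False),
]


def failing_cases(fn) -> list:
    """Return the inputs on which fn disagrees with the expected result."""
    return [s for s, expected in CASES if fn(s) != expected]


print(failing_cases(is_palindrome_v0))  # ['Racecar!', 'A man, a plan, a canal: Panama']
print(failing_cases(is_palindrome_v2))  # []
```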

Prompt skeleton

[TASK]
Produce an initial draft for: <spec>.

[CRITIQUE]
List issues under: correctness, completeness, style, edge cases.
Provide an improvement checklist of at most 6 items (actionable, ordered by priority).
If there are no material issues, emit only: NO MATERIAL ISSUES.

[REFINE]
Apply every checklist item. Output the revised draft + CHANGELOG.
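
One way to wire the skeleton into code is to fill the critique template and parse the reply before refining. This sketch assumes the model writes checklist items as numbered lines and that the critique step emits the NO MATERIAL ISSUES sentinel; `build_critique_prompt`, `parse_checklist`, and the template wording are hypothetical.

```python
import re
from typing import List, Optional

MAX_ITEMS = 6
SENTINEL = "NO MATERIAL ISSUES"

# Critique prompt paraphrasing the skeleton; the exact wording is an assumption.
CRITIQUE_TEMPLATE = (
    "[CRITIQUE]\n"
    "Draft:\n{draft}\n\n"
    "List issues under: correctness, completeness, style, edge cases.\n"
    "Provide an improvement checklist of at most {max_items} items "
    "(actionable, ordered by priority), one numbered item per line.\n"
    "If there are no material issues, emit only: " + SENTINEL
)

# Matches numbered lines such as "1. Add type hints" or "2) Add a doctest".
ITEM_RE = re.compile(r"^[ \t]*\d+[.)][ \t]+(.*\S)", re.MULTILINE)


def build_critique_prompt(draft: str, max_items: int = MAX_ITEMS) -> str:
    return CRITIQUE_TEMPLATE.format(draft=draft, max_items=max_items)


def parse_checklist(reply: str, max_items: int = MAX_ITEMS) -> Optional[List[str]]:
    """Return checklist items, or None when the critique reports no material issues.

    Raises ValueError if the reply exceeds the item limit, so the caller can re-prompt.
    """
    if reply.strip() == SENTINEL:
        return None
    items = ITEM_RE.findall(reply)
    if len(items) > max_items:
        raise ValueError(f"checklist has {len(items)} items; the limit is {max_items}")
    return items


reply = "style: missing type hints\n1. Add type hints to the signature\n2) Add a doctest"
print(parse_checklist(reply))  # ['Add type hints to the signature', 'Add a doctest']
```

Placing the sentinel in the critique step lets the loop stop before spending a refine call; rejecting over-long checklists keeps the item limit enforceable rather than advisory.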

Failure modes & mitigations

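Echoing flaws (the critique shares the draft's blind spots, so errors are reinforced rather than caught): Introduce external tests or validators.
Over-processing (rounds keep producing cosmetic rewrites): Stop after N consecutive rounds with no material change.
Vague checklist items: Require each item to name a verb, an object, and a constraint, e.g. "Add type hints to the is_palindrome signature."
Imaginary improvements (the changelog claims fixes that were never made): Require a change summary and check it against the actual diff.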
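
Two of these mitigations are cheap to automate. The sketch below assumes "stable" means the draft text did not change between rounds and uses the standard library's `difflib.ndiff` to compare drafts; `is_stable`, `changed_lines`, and `imaginary_change` are illustrative helpers, not part of any library.

```python
import difflib
from typing import List


def changed_lines(before: str, after: str) -> List[str]:
    """Lines removed ("- ") or added ("+ ") between two drafts, via difflib.ndiff."""
    return [
        line
        for line in difflib.ndiff(before.splitlines(), after.splitlines())
        if line.startswith(("- ", "+ "))
    ]


def is_stable(history: List[str], n: int) -> bool:
    """True if the last n refine rounds left the draft unchanged (assumed meaning of 'stable')."""
    if len(history) < n + 1:
        return False
    tail = history[-(n + 1):]
    return all(a == b for a, b in zip(tail, tail[1:]))


def imaginary_change(before: str, after: str, changelog: List[str]) -> bool:
    """Flag a round whose CHANGELOG claims edits while the draft text is identical."""
    return bool(changelog) and not changed_lines(before, after)


print(imaginary_change("x = 1", "x = 1", ["Added type hints"]))  # True
print(is_stable(["v0", "v1", "v1", "v1"], n=2))  # True
```

A textual diff only proves that something changed, not that the claimed fix is correct; pairing it with an external validator covers the other half.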

Checklist

Round cap set?
Rubric categories defined?
External validator present?
Actionable checklist enforced?
Stop condition logged?
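
The checklist can also be encoded as a settings object that is checked before a run starts. A sketch with a hypothetical `RefineConfig` class and `log_stop` helper; the defaults (3 rounds, the four rubric categories from the prompt skeleton, a 6-item limit) are assumptions, not recommendations from the source.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("self_refine")


@dataclass
class RefineConfig:
    """Hypothetical run settings; each field answers one checklist question."""

    # Round cap set?
    max_rounds: int = 3
    # Rubric categories defined?
    rubric: Tuple[str, ...] = ("correctness", "completeness", "style", "edge cases")
    # External validator present? (takes a draft, returns pass/fail)
    validator: Optional[Callable[[str], bool]] = None
    # Bounds the checklist; actionability still has to be enforced in the prompt.
    max_checklist_items: int = 6

    def problems(self) -> List[str]:
        """Return the checklist gaps this configuration leaves open."""
        issues = []
        if self.max_rounds < 1:
            issues.append("round cap must be at least 1")
        if not self.rubric:
            issues.append("no rubric categories")
        if self.validator is None:
            issues.append("no external validator")
        if self.max_checklist_items < 1:
            issues.append("checklist limit must be at least 1")
        return issues


def log_stop(reason: str, rounds: int) -> None:
    """Record why the loop ended (Stop condition logged?)."""
    log.info("self-refine stopped after %d round(s): %s", rounds, reason)


config = RefineConfig(validator=lambda draft: "def " in draft)
print(config.problems())  # []
log_stop("no material issues", 2)
```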

Link

Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023)