A walkthrough of the “stem-changing verbs mislabeled as regular” bug: what the learner experienced, where in the pipeline it broke, and the three-layer fix.
Stem-changing verbs like apostar (o→ue, “apuesto”), jugar (u→ue, “juego”) and querer (e→ie, “quiero”) were being taught under a “regular verbs” grammar label while also carrying a “stem-changing” label. The two labels contradict each other, so a beginner learns to over-generalize to the wrong form (quero, apostas, juga). The root cause was not the lesson wording: the component-extraction service was calling a credit-depleted Google AI-Studio key and failing with HTTP 429 on every call, so the step that cleans up contradictory labels never ran. The fix points that path at the funded Vertex AI endpoint (gemini-3.5-flash) and backs it with a deterministic guard that drops the “regular” label whenever a stem-change label is present.
The fill-in-the-blank teaches one grammar “pattern” per turn. For these verbs, the displayed pattern was simply wrong:
| Hidden word | What the app labeled it | Reality | Wrong form it teaches |
|---|---|---|---|
| apuesto (apostar) | “saying ‘I’ with regular verbs” | o→ue stem-changer | apostas / aposto |
| juego (jugar) | “saying ‘I’ with regular verbs” | u→ue (the only one in Spanish) | jugo / juga |
| quiero (querer) | “regular -er verb” exemplar | e→ie stem-changer | quero |
Two independent graders (Claude on the text, Gemini on the screenshots) both scored pattern_taught_well at 2 / 5 across stories — the single lowest dimension in the whole app, and a genuine “teaches you the wrong thing” bug rather than cosmetic polish.
Each turn flows through a pipeline. The grammar label the learner reads is produced near the end, by the component extractor:
Step 3 has a “dedup / merge” sub-step whose whole job is to consolidate and correct pattern labels — e.g. collapse “-ar verbs” + “-er verbs” into the family “regular verbs”, and keep stem-changers in their own bucket. That sub-step (and the per-word secondary calls) ran on Google’s AI-Studio Gemini key.
The AI-Studio key’s prepaid credits are depleted. Every Gemini call from the extractor returned 429 RESOURCE_EXHAUSTED — hundreds of times in a single session:
app.adapters.ai.openrouter_component_extractor - ERROR -
Component extraction failed for word 'bebo': 429 RESOURCE_EXHAUSTED.
{'error': {'code': 429, 'message': 'Your prepayment credits are depleted...',
'status': 'RESOURCE_EXHAUSTED'}}
app.orchestration.session - WARNING - [BUFFER] Hidden word optimization failed: 429 ...
app.orchestration.session - ERROR - [BUFFER] Mastery enrichmenter failed: 429 ...
app.api.routes.internal - ERROR - [BUFFER-BG] Buffer generation failed: 429 ...
Consequence: the main extraction (which runs on Anthropic and does work) still produced a first-pass set of labels — including a contradictory “regular verbs” tag on a stem-changer — but the cleanup/merge step that would have removed the contradiction never executed, because its Gemini call 429-failed. Same story for the background buffer that pre-builds upcoming turns: it degraded to a partial result with empty teaching_point fields. The lesson text the learner saw was the uncorrected draft.
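The failure mode described above can be illustrated with a small sketch. It assumes a simplified pipeline in which labels are plain strings; `extract_labels`, `QuotaExhausted`, and both cleanup functions are hypothetical names for illustration, not the project's actual code.

```python
from typing import Callable

Labels = list[str]


class QuotaExhausted(Exception):
    """Hypothetical stand-in for the provider's 429 RESOURCE_EXHAUSTED error."""


def extract_labels(draft: Labels, cleanup: Callable[[Labels], Labels]) -> tuple[Labels, bool]:
    """Return (labels, cleaned). If cleanup fails, fall back to the raw draft."""
    try:
        return cleanup(draft), True
    except QuotaExhausted:
        # The first-pass draft is passed through unchanged, contradictions included.
        return draft, False


def working_cleanup(labels: Labels) -> Labels:
    """Drop a 'regular verbs' label when a stem-changing label is also present."""
    has_stem_change = any("stem-changing" in label for label in labels)
    return [l for l in labels if not (has_stem_change and "regular verbs" in l)]


def depleted_cleanup(labels: Labels) -> Labels:
    """Simulate the cleanup call failing on a credit-depleted key."""
    raise QuotaExhausted("429 RESOURCE_EXHAUSTED")


draft = ["saying 'I' with regular verbs", "o→ue stem-changing verbs"]
print(extract_labels(draft, working_cleanup))
print(extract_labels(draft, depleted_cleanup))
```

Returning a `cleaned` flag alongside the labels is one way to make the degraded path visible to callers instead of silently serving the draft.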
The project already had a funded Google Cloud (Vertex AI) path — same google-genai SDK, billed on the live GCP project instead of the dead AI-Studio key. Transcription & grading were already migrated; the component-extraction path was not. We switched it, using gemini-3.5-flash (served on Vertex’s global location):
- self._gemini_client = genai.Client(api_key=self.api_key)  # AI-Studio key → 429
- GEMINI_MODEL = "gemini-3-flash-preview"
+ from app.adapters.ai.vertex_genai import vertex_genai_client
+ self._gemini_client = vertex_genai_client()  # funded Vertex AI
+ GEMINI_MODEL = "gemini-3.5-flash"  # Vertex-served on the global location
Result: the 429 storm stops, the dedup/merge + buffer extraction actually run, and teaching points populate.
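The write-up does not show the body of `vertex_genai_client`. A minimal sketch of what such a helper could look like follows, assuming the project and location come from the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` environment variables; `vertex_client_kwargs` and the error message are hypothetical. The `genai.Client(vertexai=True, project=..., location=...)` call is the google-genai SDK's Vertex AI mode.

```python
import os
from typing import Mapping

GEMINI_MODEL = "gemini-3.5-flash"  # model name as given in the write-up


def vertex_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for a Vertex-backed client (hypothetical helper)."""
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        # Fail loudly rather than silently falling back to an API key.
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set")
    return {
        "vertexai": True,
        "project": project,
        # Assumption: default to the global location mentioned in the write-up.
        "location": env.get("GOOGLE_CLOUD_LOCATION", "global"),
    }


def vertex_genai_client():
    """Return a google-genai client that bills the GCP project via Vertex AI."""
    # Imported lazily so the configuration logic can be tested without the SDK.
    from google import genai

    return genai.Client(**vertex_client_kwargs())
```

Keeping the configuration in a pure function makes it easy to assert that no code path can construct an API-key client by accident.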
LLM cleanup can still slip, so a pure-Python guard runs after extraction. If a verb carries both a “regular verbs”/“-Xr verbs” label and a stem-change/irregular label, it drops the “regular” one — a verb that stem-changes is, by definition, not regular. No dictionary; the model’s own stem-change detection is the source of truth.
def _demote_regular_label_on_stem_change(requirements, hidden_word):
    # Abridged excerpt: pattern_reqs, pattern_labels and stem_marker are set up in elided lines.
    has_stem_change = any(stem_marker in label for label in pattern_labels)
    if not has_stem_change:
        return
    for req in pattern_reqs:
        if is_regular_conjugation(req) and not is_stem_change(req):
            requirements.remove(req)  # drop the contradictory "regular" label
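Since the excerpt above is abridged, here is a self-contained sketch of the same idea. It assumes labels are plain strings and that stem-change and regular labels can be recognized by the substrings and regular expression shown; those markers and the name `demote_regular_labels` are assumptions for illustration, and unlike the original this version returns a new list rather than mutating in place.

```python
import re

# Assumed markers; the real guard relies on the model's own stem-change labels.
STEM_CHANGE_MARKERS = ("stem-chang", "stem chang", "irregular")
# Matches "regular" as a whole word (not "irregular") or "-ar/-er/-ir verb(s)".
REGULAR_RE = re.compile(r"\bregular\b|-[aei]r verbs?\b")


def is_stem_change(label: str) -> bool:
    low = label.lower()
    return any(marker in low for marker in STEM_CHANGE_MARKERS)


def is_regular_conjugation(label: str) -> bool:
    return bool(REGULAR_RE.search(label.lower()))


def demote_regular_labels(labels: list[str]) -> list[str]:
    """Drop 'regular' labels when any stem-change/irregular label is present."""
    if not any(is_stem_change(label) for label in labels):
        return list(labels)  # nothing contradictory, keep everything
    return [
        label
        for label in labels
        if is_stem_change(label) or not is_regular_conjugation(label)
    ]


print(demote_regular_labels(["saying 'I' with regular verbs", "o→ue stem-changing verbs"]))
print(demote_regular_labels(["regular -ar verbs"]))
```

The guard is deliberately one-directional: it never adds a stem-change label, it only removes a "regular" label that contradicts one already present.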
Upstream, the phrase generator was also hardened so the problem is less likely to arise in the first place:
Every change is gated by an automatic regression check (a real voice turn must return 200 and keep the answer revealed — no “glitch back to the question”) plus a blind multi-grader jury over a 14-point pedagogy rubric.
| Signal | Before | After |
|---|---|---|
| Component-extraction calls | 429 every call (dead key) | Vertex / gemini-3.5-flash |
| pattern_taught_well (Claude · Gemini) | 2 / 2 | 2 / 1 |
| Contradictory “regular + stem-change” label | present across stories | 6-turn clean run captured |
| Regression guard (voice 200 + answer stays) | PASS | PASS |
Post-fix jury (2026-06-24 11:35:13 UTC): pattern_taught_well = 2 (Claude) / 1 (Gemini), compared with the 2 / 2 baseline (Claude unchanged, Gemini one point lower). Captured 6 turns on a quiet box. Run: jury-20260624-113034.
None of this was hand-spotted. LingoClaw runs a two-tier self-improvement loop:
The jury kept flagging pattern_taught_well = 2 across stories; that recurrence is what pointed at a single structural cause rather than 15 separate typos.