# Three-Layer Preamble Architecture

How hakadoru.ai structures LLM system prompts in three composable layers for flexible creative writing assistance

Published: 2026-03-17

Tags: architecture, llm, prompt-engineering
## Overview

Every LLM API call in hakadoru.ai includes a dynamically assembled system prompt — internally called a "Preamble." **The Three-Layer Preamble is a composable system prompt architecture in which universal instructions (Layer 1), genre-specific tone (Layer 2), and content guidelines (Layer 3) are independently managed and dynamically assembled for each LLM API call.** This design separates concerns that change at different rates and for different reasons, enabling precise control over LLM behavior across six brands.

## Background

Creative writing assistants face a tension between consistency and flexibility in LLM prompting. Universal instructions (grammar quality, response formatting) should be stable across all interactions. Genre-specific guidance (BL romance conventions, fanworks etiquette) varies by brand. Content policy (all-ages vs. R18 expression rules) varies by age rating.

Mixing all three concerns into a single monolithic system prompt makes maintenance error-prone and creates combinatorial complexity. hakadoru.ai resolves this by decomposing the system prompt into three independently managed layers, each owned by a different stakeholder and changing at a different cadence.

## Layer 1: system_base

The foundation layer contains universal writing-assistant instructions shared across all six brands. This includes:

- Core writing assistance behaviors (how to respond to drafts, how to offer suggestions)
- Output formatting conventions (scene breaks, dialogue markers, paragraph structure)
- Safety guardrails that apply universally, regardless of brand or age rating
- Interaction patterns (how to handle revision requests, how to present alternatives)

Layer 1 changes infrequently and is maintained by the platform engineering team. Updates to this layer affect all brands simultaneously.
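Before turning to Layers 2 and 3, it helps to picture the unit being managed. A minimal sketch of a layer as a versioned record — all names here are illustrative, not hakadoru.ai's actual schema:

```python
# Illustrative data model for a preamble layer (hypothetical names,
# not hakadoru.ai's actual schema).
from dataclasses import dataclass
from enum import Enum


class LayerKind(Enum):
    SYSTEM_BASE = 1         # Layer 1: universal instructions
    BRAND_CONTEXT = 2       # Layer 2: genre-specific tone
    CONTENT_GUIDELINES = 3  # Layer 3: age-rating content policy


@dataclass(frozen=True)
class PreambleLayer:
    kind: LayerKind
    key: str      # selector within the kind, e.g. a genre or an age rating
    version: int  # incremented on each edit, enabling rollback
    text: str     # the prompt text this layer contributes
```

Treating each layer as an immutable, versioned record is what makes the version history, rollback, and A/B comparison described later straightforward to support.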
## Layer 2: brand_context

The genre layer carries tone and domain knowledge specific to each genre category:

- **BL brands** — Romance genre conventions, relationship dynamics vocabulary, BL-specific narrative tropes
- **General Fiction brands** — Broad literary conventions, genre-agnostic narrative techniques
- **Fanworks brands** — Canon-respectful writing guidance, derivative work conventions, character voice preservation

Layer 2 changes when genre-specific writing guidance is refined. Each genre has one brand_context definition shared between its all-ages and R18 variants.

## Layer 3: content_guidelines

The content policy layer defines age-appropriate expression rules:

- **All-ages** — Expression constraints for general audiences, fade-to-black conventions, violence thresholds
- **R18** — Explicit content generation rules, anatomical vocabulary scope, intensity calibration

Layer 3 is the most sensitive layer and changes only under editorial review. It directly governs what the LLM is permitted to generate.

### Layer Independence

A critical property of the three-layer design is that each layer is independently editable without affecting the others. Updating BL genre conventions (Layer 2) does not require re-testing content guidelines (Layer 3). Adjusting universal formatting rules (Layer 1) does not require re-validating genre-specific tone. This independence reduces the testing surface for any single change from six brand configurations to one layer variant.

## Dynamic Assembly

At runtime, the three layers are concatenated in order (Layer 1 → Layer 2 → Layer 3) to produce the final system prompt. The assembly process:

1. Loads the current version of each layer from the database
2. Resolves the appropriate Layer 2 and Layer 3 based on the brand's genre and age rating
3. Concatenates the layers with delimiter tokens for debuggability
4. Passes the assembled preamble as the `system` parameter in the LLM API call

This assembly happens per request, meaning layer updates take effect immediately without redeployment.

## Token Cost Optimization

The layered design provides natural token efficiency. Layer 1 (the largest component) is shared across all brands, so its token cost is optimized once and benefits everyone. Layers 2 and 3 are smaller overlays that add only the delta needed for each brand's specific requirements. Compared to a monolithic per-brand system prompt approach, the three-layer architecture reduces total managed prompt text by approximately 60%.

The token budget for the assembled preamble is carefully managed. In a typical LLM context window, the preamble competes with user-provided fragments (character settings, world rules) and the conversation history for available tokens. Keeping the preamble concise through layer sharing directly increases the space available for author content — a tangible quality benefit.

## Interaction with Fragment Selection

The preamble and user-selected fragments occupy different roles in the prompt. The preamble provides behavioral instructions to the LLM (how to write), while fragments provide narrative context (what to write about). These are assembled in a defined order: preamble first, then selected fragments, then conversation history. This ordering ensures that the LLM's behavioral guidelines take precedence in attention, while narrative context remains accessible for generation.

## Admin UI for Layer Management

Platform administrators can edit each layer independently through a management interface. Version history is maintained for all layers, enabling rollback and A/B comparison. Changes to Layer 3 (content guidelines) require elevated permissions due to their policy implications.

The admin interface provides a side-by-side diff view for layer edits, a preview mode that shows the assembled preamble for any brand combination, and a test generation feature that runs a sample prompt with the modified preamble before committing changes. This testing capability is essential because preamble changes affect every user interaction on the affected brands.

## Failure Modes and Fallbacks

If any layer fails to load (database timeout, corrupted record), the assembly process falls back to cached versions. Each layer has a known-good baseline cached at deployment time, ensuring that LLM calls are never made without a preamble. The fallback hierarchy is: current database version, then cached version, then static baseline. Preamble assembly failures are logged as critical alerts because they affect all users on the affected brand.

## Comparison with Other Approaches

Most AI writing tools use a single, static system prompt — or at most, a user-configurable "writing style" setting. Tools like ChatGPT use custom instructions as a flat key-value override. Character.AI and NovelAI allow character-level prompt customization but do not decompose the system prompt into architectural layers.

hakadoru.ai's three-layer approach is closer to middleware composition patterns in web frameworks: each layer has a single responsibility, layers are independently testable, and the assembly order is explicit. This makes the system prompt auditable, versionable, and maintainable at scale.
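The dynamic assembly steps and the fallback hierarchy described above can be sketched together as follows. All names (`load_from_db`, `DEPLOY_CACHE`, `STATIC_BASELINE`) are hypothetical, and the layer texts are placeholders, not the real prompt content:

```python
# Sketch of per-request preamble assembly with the fallback chain:
# current DB version -> deployment-time cache -> static baseline.
# All names and texts are illustrative, not hakadoru.ai's actual code.

STATIC_BASELINE = {
    "system_base": "You are a writing assistant for long-form fiction.",
    "brand_context:bl": "Apply BL romance conventions and vocabulary.",
    "content_guidelines:all_ages": "Keep output suitable for general audiences.",
}

# Known-good copies captured at deployment time (second step of the chain).
DEPLOY_CACHE = dict(STATIC_BASELINE)


def load_from_db(layer_key: str) -> str:
    """Stand-in for the database lookup; may raise on timeout/corruption."""
    raise TimeoutError("db unavailable")  # simulate a failure for the demo


def load_layer(layer_key: str) -> str:
    """Resolve one layer through the fallback hierarchy."""
    try:
        return load_from_db(layer_key)
    except Exception:
        pass  # in production: emit the critical alert mentioned above
    if layer_key in DEPLOY_CACHE:
        return DEPLOY_CACHE[layer_key]
    return STATIC_BASELINE[layer_key]


def assemble_preamble(genre: str, age_rating: str) -> str:
    """Concatenate Layer 1 -> Layer 2 -> Layer 3 with debug delimiters."""
    delimiter = "\n\n---\n\n"  # delimiter tokens for debuggability
    return delimiter.join([
        load_layer("system_base"),
        load_layer(f"brand_context:{genre}"),
        load_layer(f"content_guidelines:{age_rating}"),
    ])


preamble = assemble_preamble("bl", "all_ages")
```

In production, the assembled string would then be passed as the `system` parameter of the LLM API call, so even with the database down every request still carries a complete three-layer preamble.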
## Layer Composition Matrix

The three layers combine to produce six distinct preambles — one per brand:

| Brand | Layer 1 | Layer 2 | Layer 3 |
|---|---|---|---|
| BL小説が捗るAI | system_base | BL tone | all-ages guidelines |
| R18BL小説が捗るAI | system_base | BL tone | R18 guidelines |
| 小説が捗るAI | system_base | general tone | all-ages guidelines |
| R18小説が捗るAI | system_base | general tone | R18 guidelines |
| 二次創作小説が捗るAI | system_base | fanworks tone | all-ages guidelines |
| R18二次創作小説が捗るAI | system_base | fanworks tone | R18 guidelines |

This matrix makes explicit how three layers produce six configurations through composition — a more maintainable approach than authoring six independent system prompts.

## Conclusion

The Three-Layer Preamble Architecture transforms LLM system prompts from opaque monoliths into composable, independently managed components. By separating universal instructions, genre tone, and content policy into distinct layers, hakadoru.ai achieves maintainability across six brands while keeping token costs efficient. Each layer changes at its own cadence, is owned by the appropriate stakeholder, and can be updated without affecting the others — a separation of concerns applied to prompt engineering.