1. Overview
The plain-text AICML syntax is human-friendly, but production systems should consume a consistent JSON graph. The structures below are the canonical representation: emit them from every parser, no matter the stack.
2. Canonical JSON AST
Interfaces are shown in TypeScript notation; mirror them as dataclasses, records, or POCOs in your language.
2.1 Top-level document
interface AicmlDocument {
  page: Page;
  meta: Meta[];
  sections: Section[];
  entities: Entity[];
  dicts: Dict[];
  sets: DataSet[];
  queries: Query[];
}
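As a rough illustration of the shape above, the smallest document a parser could emit might look like the sketch below. The helper name empty_document is hypothetical, and the sketch assumes that every top-level collection is emitted even when it is empty, since none of them is marked optional in the interface.

```python
import json


def empty_document(page_id: str) -> dict:
    """Build a document with the seven top-level keys of AicmlDocument.

    Hypothetical helper: only page.id is filled in, every collection is empty.
    """
    return {
        "page": {"id": page_id},
        "meta": [],
        "sections": [],
        "entities": [],
        "dicts": [],
        "sets": [],
        "queries": [],
    }


doc = empty_document("home")
# Serialize the way a parser would hand the AST to a downstream consumer.
print(json.dumps(doc, indent=2))
```

Emitting the empty arrays keeps consumers free of null checks, which is one plausible reading of "consistent JSON graph".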
2.2 Page & meta
interface Page {
  id: string;
  lang?: string;
  title?: string;
  type?: string; // article | docs | landing | faq | ...
}

interface Meta {
  name: string;
  value: string;
}
2.3 Sections & blocks
interface Section {
  id: string;
  type?: string;
  importance?: "high" | "normal" | "low";
  level?: number;
  order?: number;
  parentId?: string | null;
  sets?: string[];
  entities?: string[];
  blocks: SectionBlock[];
}

type SectionBlock =
  | TitleBlock
  | ParagraphBlock
  | ListBlock
  | NoteBlock
  | ExampleBlock
  | QaBlock
  | CodeBlock
  | MediaBlock;

interface TitleBlock { kind: "title"; text: string; }

interface ParagraphBlock { kind: "paragraph"; text: string; }

interface ListBlock {
  kind: "list";
  style: "ordered" | "unordered";
  items: ListItem[];
}

interface ListItem {
  order?: number;
  text: string;
}

interface NoteBlock {
  kind: "note";
  noteType: "info" | "warning" | "tip";
  importance?: "high" | "normal" | "low";
  blocks: ParagraphBlock[]; // extend with lists/code if needed
}

interface ExampleBlock {
  kind: "example";
  blocks: SectionBlock[]; // assumption: an example may hold any block kind
}

interface QaBlock {
  kind: "qa";
  question: string;
  answer: string | string[];
}

interface CodeBlock {
  kind: "code";
  lang?: string;
  code: string;
}

interface MediaBlock {
  kind: "media";
  mediaType: "image" | "video" | "audio" | "diagram";
  src: string;
  alt?: string;
  title?: string;
}
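Because every block carries a kind discriminator, a consumer can dispatch on it without inspecting any other field. The following is a minimal sketch of that idea, assuming blocks arrive as plain dicts decoded from the JSON AST; the function name block_to_text and the plain-text rendering rules are illustrative choices, not part of the format.

```python
def block_to_text(block: dict) -> str:
    """Render one SectionBlock (as a decoded JSON dict) to plain text.

    Hypothetical helper: the rendering rules below are illustrative only.
    """
    kind = block["kind"]
    if kind in ("title", "paragraph"):
        return block["text"]
    if kind == "list":
        ordered = block["style"] == "ordered"
        lines = []
        for position, item in enumerate(block["items"], start=1):
            # Prefer an explicit item order, fall back to the position.
            marker = f"{item.get('order', position)}." if ordered else "-"
            lines.append(f"{marker} {item['text']}")
        return "\n".join(lines)
    if kind == "code":
        return block["code"]
    if kind == "qa":
        answer = block["answer"]
        if isinstance(answer, list):  # answer may be string | string[]
            answer = " ".join(answer)
        return f"Q: {block['question']}\nA: {answer}"
    if kind in ("note", "example"):
        # Container blocks: render the nested blocks recursively.
        return "\n".join(block_to_text(child) for child in block["blocks"])
    if kind == "media":
        return block.get("alt") or block.get("title") or block["src"]
    raise ValueError(f"unknown block kind: {kind!r}")
```

Raising on an unknown kind, rather than skipping it, makes schema drift between parser and consumer visible early.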
2.4 Entities
interface Entity {
  id: string;
  kind: string; // product | book | event | ...
  name?: string;
  order?: number;
  parentId?: string | null;
  role?: string;
  properties: Record<string, unknown>; // dotted semantic keys
}
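The dotted keys in properties (for example price.amount) are stored flat, but consumers often want a nested view. One possible way to expand them is sketched below; expand_dotted is a hypothetical helper, and rejecting a key that is both a leaf and a prefix is an assumption about how conflicts should be handled.

```python
def expand_dotted(properties: dict) -> dict:
    """Turn {"price.amount": 10} into {"price": {"amount": 10}}.

    Hypothetical helper; raises ValueError when a key is used both as a
    leaf value and as a prefix of another key.
    """
    nested: dict = {}
    for key, value in properties.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r} conflicts with an existing leaf value")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} conflicts with an existing nested key")
        node[leaf] = value
    return nested
```

The flat form stays canonical in the AST; the nested form is only a convenience for templates or API responses.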
2.5 Dict / Set / Row
interface DictField {
  alias: string; // pa
  path: string; // price.amount
}

interface Dict {
  id: string;
  kind?: string;
  fields: DictField[];
}

interface DataRow {
  values: Record<string, unknown>; // alias -> value
}

interface DataSet {
  id: string;
  dictId: string;
  kind?: string;
  description?: string;
  rows: DataRow[];
}
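Rows store values under short aliases, and the dict referenced by dictId maps each alias back to its semantic path. A minimal sketch of that lookup follows, assuming dicts and rows are plain decoded JSON objects; resolve_row is a hypothetical name, and treating an undeclared alias as an error is an assumption.

```python
def resolve_row(dict_def: dict, row: dict) -> dict:
    """Map a DataRow's alias keys to the semantic paths declared in a Dict.

    Hypothetical helper: raises KeyError for aliases the dict does not declare.
    """
    alias_to_path = {f["alias"]: f["path"] for f in dict_def["fields"]}
    resolved = {}
    for alias, value in row["values"].items():
        if alias not in alias_to_path:
            raise KeyError(f"alias {alias!r} is not declared in dict {dict_def['id']!r}")
        resolved[alias_to_path[alias]] = value
    return resolved


price_dict = {
    "id": "prices",
    "fields": [
        {"alias": "pa", "path": "price.amount"},
        {"alias": "pc", "path": "price.currency"},
    ],
}
print(resolve_row(price_dict, {"values": {"pa": 10, "pc": "USD"}}))
```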
2.6 Queries
type SortDirection = "asc" | "desc";

interface SortSpec {
  field: string;
  direction: SortDirection;
}

interface Query {
  id: string;
  fromSetId?: string;
  fromEntityId?: string;
  usingDictId?: string;
  where?: string;
  select?: string[];
  sort?: SortSpec[];
  limit?: number;
}
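To show how the sort, limit, and select fields might combine, here is a small sketch that applies them to a list of flat row dicts (the values of each DataRow). The function name run_query and the order of operations (sort, then limit, then select) are assumptions; where is left out because its expression syntax is not defined in this section.

```python
def run_query(query: dict, rows: list) -> list:
    """Apply a Query's sort, limit and select to flat row dicts.

    Hypothetical helper: ignores `where`, and assumes every row contains
    the fields named in `sort` and `select`.
    """
    result = list(rows)
    # Sort by the last key first: Python's sort is stable, so the first
    # SortSpec ends up as the primary key.
    for spec in reversed(query.get("sort") or []):
        result.sort(
            key=lambda row, name=spec["field"]: row[name],
            reverse=spec["direction"] == "desc",
        )
    if query.get("limit") is not None:
        result = result[: query["limit"]]
    if query.get("select"):
        result = [{name: row[name] for name in query["select"]} for row in result]
    return result
```

For example, a query with sort on pa ascending, limit 2, and select of name would return the names of the two cheapest rows.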
3. Parser state machine
3.1 Token model
- Headers: ^[A-Z]+: (PAGE, SECTION, ROW, QUERY, ...).
- Key/value lines: key: value inside a block.
- Paragraph and code blocks buffer plain text until the next header.
- Whitespace-only lines inside text blocks should be preserved for CODE, collapsed for P.
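The three line categories above can be told apart with the two regular expressions used in the Section 4 skeletons. The sketch below is one way to wrap them; the function name classify and the tuple return shape are illustrative choices.

```python
import re

# Same patterns as the language skeletons in Section 4.
HEADER_RE = re.compile(r"^([A-Z]+):\s*$")
KV_RE = re.compile(r"^\s*([a-zA-Z0-9_.]+):\s*(.*)$")


def classify(line: str) -> tuple:
    """Return ("header", name), ("kv", key, value) or ("text", line)."""
    header = HEADER_RE.match(line)
    if header:
        return ("header", header.group(1))
    kv = KV_RE.match(line)
    if kv:
        return ("kv", kv.group(1), kv.group(2))
    return ("text", line)
```

The header test has to run first: a line such as SECTION: also satisfies the key/value pattern with an empty value. Whether a key/value match is honored still depends on the current block, since P and CODE treat most lines as plain text.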
3.2 State loop
state = {
  currentBlock: null,
  currentSection: null,
  currentList: null,
  currentDict: null,
  currentSet: null,
  currentRow: null,
  currentQuery: null,
  paragraphLines: [],
  codeLang: null,
  codeLines: []
}

for each line in file:
  if matchesHeader(line):
    flushCurrentBlock()
    startBlock(headerName)
  else:
    appendToBlock(line)

flushCurrentBlock() // handle final block
3.3 Flush semantics & helpers
- P: join paragraphLines, trim, push ParagraphBlock.
- CODE: join codeLines with \n, attach codeLang.
- LIST/ITEM: instantiate ListBlock when LIST starts; each ITEM appends immediately.
- NOTE/EXAMPLE/QA: maintain temporary objects just like SECTION; push when a new header begins.
- DICT/SET/ROW: DICT goes to doc.dicts; SET goes to doc.sets; ROWs append to currentSet.rows.
- QUERY: parse select into arrays (split commas) and sort into SortSpec tuples.
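The QUERY rule can be sketched as two small helpers. The comma split for select comes from the rule above; the textual form of a sort entry is not given in this section, so the sketch assumes "field direction" pairs separated by commas (for example pa desc, name) with asc as the default. Both function names are hypothetical.

```python
def parse_select(value: str) -> list:
    """Split a comma-separated select value into trimmed field names."""
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_sort(value: str) -> list:
    """Parse "pa desc, name" into SortSpec-shaped dicts.

    Assumed syntax: "<field> [asc|desc]" entries separated by commas,
    defaulting to "asc" when the direction is omitted.
    """
    specs = []
    for part in parse_select(value):
        tokens = part.split()
        direction = tokens[1].lower() if len(tokens) > 1 else "asc"
        if len(tokens) > 2 or direction not in ("asc", "desc"):
            raise ValueError(f"invalid sort entry: {part!r}")
        specs.append({"field": tokens[0], "direction": direction})
    return specs
```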
4. Language skeletons
Each sample references the shared logic above; expand the model classes per Section 2.
4.1 JavaScript / React
const HEADER_RE = /^([A-Z]+):\s*$/;
const KV_RE = /^\s*([a-zA-Z0-9_.]+):\s*(.*)$/;

export function parseAicml(text) {
  const lines = text.split(/\r?\n/);
  const doc = { page: { id: "" }, meta: [], sections: [], entities: [], dicts: [], sets: [], queries: [] };
  let currentBlock = null;
  let currentSection = null;
  let paragraph = [];
  let codeLang;
  let codeLines = [];
  // ...holders for list/note/example/dict/set/row/query/entity/meta

  const flush = () => {
    if (currentBlock === "P" && currentSection) {
      const text = paragraph.join(" ").trim();
      if (text) currentSection.blocks.push({ kind: "paragraph", text });
      paragraph = [];
    }
    if (currentBlock === "CODE" && currentSection) {
      currentSection.blocks.push({ kind: "code", lang: codeLang, code: codeLines.join("\n") });
      codeLang = undefined;
      codeLines = [];
    }
    // SECTION needs no flush: it is pushed when its header is read and stays
    // open, so the P/CODE blocks that follow can still attach to it.
    // flush DICT, SET, ROW, QUERY, ENTITY, NOTE, EXAMPLE, QA as needed
    currentBlock = null;
  };

  for (const rawLine of lines) {
    const line = rawLine.trimEnd();
    const header = line.match(HEADER_RE);
    if (header) {
      flush();
      currentBlock = header[1];
      if (currentBlock === "SECTION") {
        currentSection = { id: "", blocks: [] };
        doc.sections.push(currentSection);
      }
      if (currentBlock === "P") paragraph = [];
      if (currentBlock === "CODE") { codeLang = undefined; codeLines = []; }
      // initialize LIST, NOTE, EXAMPLE, QA, DICT, SET, ROW, QUERY, ENTITY, META etc.
      continue;
    }

    const kv = line.match(KV_RE);
    if (currentBlock === "PAGE" && kv) { doc.page[kv[1]] = kv[2]; continue; }
    if (currentBlock === "SECTION" && kv && currentSection) {
      // Coerce numeric values (level, order); an empty value stays a string.
      currentSection[kv[1]] = kv[2] !== "" && isFinite(+kv[2]) ? Number(kv[2]) : kv[2];
      continue;
    }
    if (currentBlock === "P") { paragraph.push(line.trim()); continue; }
    if (currentBlock === "CODE") {
      if (kv && kv[1] === "lang") codeLang = kv[2];
      else codeLines.push(rawLine); // keep the raw line so indentation survives
      continue;
    }
    // handle META, LIST, ITEM, NOTE, ENTITY, DICT, SET, ROW, QUERY with similar guards
  }

  flush();
  return doc;
}

// React usage: const ast = useMemo(() => parseAicml(raw), [raw]);
4.2 Python
from __future__ import annotations  # lets annotations name models defined later

import re
from dataclasses import dataclass, field

HEADER_RE = re.compile(r'^([A-Z]+):\s*$')
KV_RE = re.compile(r'^\s*([a-zA-Z0-9_.]+):\s*(.*)$')


# Minimal stand-ins so the skeleton runs; expand them per Section 2.
@dataclass
class ParagraphBlock:
    kind: str
    text: str


@dataclass
class CodeBlock:
    kind: str
    lang: str | None
    code: str


@dataclass
class Section:
    id: str = ""
    blocks: list[SectionBlock] = field(default_factory=list)


@dataclass
class Page:
    id: str = ""
    lang: str | None = None
    title: str | None = None
    type: str | None = None


@dataclass
class AicmlDocument:
    page: Page
    meta: list[Meta] = field(default_factory=list)
    sections: list[Section] = field(default_factory=list)
    entities: list[Entity] = field(default_factory=list)
    dicts: list[Dict] = field(default_factory=list)
    sets: list[DataSet] = field(default_factory=list)
    queries: list[Query] = field(default_factory=list)


def coerce(value: str):
    """Return an int for integer-looking values (level, order), else the text."""
    value = value.strip()
    return int(value) if re.fullmatch(r'-?\d+', value) else value


def parse_aicml(text: str) -> AicmlDocument:
    doc = AicmlDocument(page=Page())
    current_block = None
    current_section = None
    paragraph: list[str] = []
    code_lines: list[str] = []
    code_lang: str | None = None
    # initialize holders for dict/set/row/query/list/note/example/qa/meta/entity

    def flush():
        nonlocal current_block, paragraph, code_lines, code_lang
        if current_block == "P" and current_section:
            text = " ".join(paragraph).strip()
            if text:
                current_section.blocks.append(ParagraphBlock(kind="paragraph", text=text))
            paragraph = []
        if current_block == "CODE" and current_section:
            current_section.blocks.append(CodeBlock(kind="code", lang=code_lang, code="\n".join(code_lines)))
            code_lines, code_lang = [], None
        # SECTION needs no flush: it is appended when its header is read and
        # stays open, so the P/CODE blocks that follow can still attach to it.
        # flush DICT/SET/ROW/QUERY/META/ENTITY/etc.
        current_block = None

    for raw in text.splitlines():
        m_header = HEADER_RE.match(raw)
        if m_header:
            flush()
            current_block = m_header.group(1)
            if current_block == "SECTION":
                current_section = Section(id="", blocks=[])
                doc.sections.append(current_section)
            elif current_block == "P":
                paragraph = []
            elif current_block == "CODE":
                code_lines, code_lang = [], None
            # init other block objects
            continue

        m_kv = KV_RE.match(raw)
        if current_block == "PAGE" and m_kv:
            setattr(doc.page, m_kv.group(1), m_kv.group(2))
            continue
        if current_block == "SECTION" and current_section and m_kv:
            setattr(current_section, m_kv.group(1).replace('.', '_'), coerce(m_kv.group(2)))
            continue
        if current_block == "P":
            paragraph.append(raw.strip())
            continue
        if current_block == "CODE":
            if m_kv and m_kv.group(1) == "lang":
                code_lang = m_kv.group(2).strip()
            else:
                code_lines.append(raw)  # keep the raw line so indentation survives
            continue
        # handle remaining blocks here

    flush()
    return doc
4.3 C#
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class AicmlParser {
    private static readonly Regex Header = new("^([A-Z]+):\\s*$", RegexOptions.Compiled);
    private static readonly Regex Kv = new("^\\s*([a-zA-Z0-9_.]+):\\s*(.*)$", RegexOptions.Compiled);

    public AicmlDocument Parse(string text) {
        // Assumes the AicmlDocument constructor initializes Page and every list.
        var doc = new AicmlDocument();
        var lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        string currentBlock = null;
        Section currentSection = null;
        var paragraph = new List<string>();
        var codeLines = new List<string>();
        string codeLang = null;
        // holders for dict/set/row/query/list/note/example/qa/meta/entity

        void Flush() {
            if (currentBlock == "P" && currentSection != null) {
                var textValue = string.Join(" ", paragraph).Trim();
                if (textValue.Length > 0)
                    currentSection.Blocks.Add(new ParagraphBlock { Kind = "paragraph", Text = textValue });
                paragraph.Clear();
            }
            if (currentBlock == "CODE" && currentSection != null) {
                currentSection.Blocks.Add(new CodeBlock { Kind = "code", Lang = codeLang, Code = string.Join("\n", codeLines) });
                codeLines.Clear();
                codeLang = null;
            }
            // SECTION needs no flush: it is added when its header is read and
            // stays open, so the P/CODE blocks that follow can still attach to it.
            // flush DICT/SET/ROW/QUERY/META/ENTITY/etc.
            currentBlock = null;
        }

        foreach (var raw in lines) {
            var line = raw.TrimEnd();
            var mHeader = Header.Match(line);
            if (mHeader.Success) {
                Flush();
                currentBlock = mHeader.Groups[1].Value;
                if (currentBlock == "SECTION") {
                    currentSection = new Section { Blocks = new List<SectionBlock>() };
                    doc.Sections.Add(currentSection);
                }
                else if (currentBlock == "P") paragraph.Clear();
                else if (currentBlock == "CODE") { codeLines.Clear(); codeLang = null; }
                // init other block objects
                continue;
            }

            var kv = Kv.Match(line);
            if (currentBlock == "PAGE" && kv.Success) {
                // Assign is a helper you write on the model: it maps a key to a property.
                doc.Page.Assign(kv.Groups[1].Value, kv.Groups[2].Value);
                continue;
            }
            if (currentBlock == "SECTION" && currentSection != null && kv.Success) {
                currentSection.Assign(kv.Groups[1].Value, kv.Groups[2].Value);
                continue;
            }
            if (currentBlock == "P") { paragraph.Add(line.Trim()); continue; }
            if (currentBlock == "CODE") {
                if (kv.Success && kv.Groups[1].Value == "lang") codeLang = kv.Groups[2].Value;
                else codeLines.Add(raw); // keep the raw line so indentation survives
                continue;
            }
            // mirror logic for other block types
        }

        Flush();
        return doc;
    }
}
4.4 Java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class AicmlParser {
    private static final Pattern HEADER = Pattern.compile("^([A-Z]+):\\s*$");
    private static final Pattern KV = Pattern.compile("^\\s*([a-zA-Z0-9_.]+):\\s*(.*)$");

    // State lives in fields: a lambda cannot reassign local variables, so flush
    // is a method instead. One parser instance must not be shared across threads.
    private AicmlDocument doc;
    private String currentBlock;
    private Section currentSection;
    private String codeLang;
    private final List<String> paragraph = new ArrayList<>();
    private final List<String> codeLines = new ArrayList<>();
    // instantiate dict/set/row/query/list/note/example helpers

    private void flush() {
        if ("P".equals(currentBlock) && currentSection != null) {
            String textValue = String.join(" ", paragraph).trim();
            if (!textValue.isEmpty()) currentSection.blocks.add(new ParagraphBlock("paragraph", textValue));
            paragraph.clear();
        }
        if ("CODE".equals(currentBlock) && currentSection != null) {
            currentSection.blocks.add(new CodeBlock("code", codeLang, String.join("\n", codeLines)));
            codeLines.clear();
            codeLang = null;
        }
        // SECTION needs no flush: it is added when its header is read and
        // stays open, so the P/CODE blocks that follow can still attach to it.
        // flush DICT/SET/ROW/QUERY/etc.
        currentBlock = null;
    }

    public AicmlDocument parse(String text) {
        // Assumes the AicmlDocument constructor initializes page and every list.
        doc = new AicmlDocument();
        currentBlock = null;
        currentSection = null;
        codeLang = null;
        paragraph.clear();
        codeLines.clear();

        for (String raw : text.split("\\r?\\n")) {
            String line = raw.stripTrailing(); // Java 11+; String has no trimEnd()
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                flush();
                currentBlock = header.group(1);
                if ("SECTION".equals(currentBlock)) {
                    currentSection = new Section();
                    doc.sections.add(currentSection);
                }
                else if ("P".equals(currentBlock)) paragraph.clear();
                else if ("CODE".equals(currentBlock)) { codeLines.clear(); codeLang = null; }
                // init helpers for other block types
                continue;
            }

            Matcher kv = KV.matcher(line);
            boolean isKv = kv.matches();
            if ("PAGE".equals(currentBlock) && isKv) {
                // assign is a helper you write on the model: it maps a key to a field.
                doc.page.assign(kv.group(1), kv.group(2));
                continue;
            }
            if ("SECTION".equals(currentBlock) && currentSection != null && isKv) {
                currentSection.assign(kv.group(1), kv.group(2));
                continue;
            }
            if ("P".equals(currentBlock)) { paragraph.add(line.trim()); continue; }
            if ("CODE".equals(currentBlock)) {
                if (isKv && "lang".equals(kv.group(1))) codeLang = kv.group(2);
                else codeLines.add(raw); // keep the raw line so indentation survives
                continue;
            }
            // handle META, LIST, ITEM, NOTE, ENTITY, DICT, SET, ROW, QUERY
        }

        flush();
        return doc;
    }
}
4.5 Other runtimes
- Go: structs + bufio.Scanner; reuse header/KV regex, keep streaming.
- Rust: enums/structs + lines(); prefer manual prefix checks to avoid regex overhead.
- Kotlin/Swift: sealed interfaces/data classes map cleanly to SectionBlock unions.
- Whichever language you use, mirror the flush rules exactly so ROW/QUERY boundaries stay deterministic.
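The manual check suggested for Rust carries over to any runtime. As a rough sketch, here is the same idea in Python: a header test built from string operations that is intended to accept the same lines as the ^([A-Z]+):\s*$ pattern. The function name header_name is hypothetical.

```python
def header_name(line: str):
    """Return the header name (e.g. "SECTION") or None, without using regex.

    Intended to accept what ^([A-Z]+):\\s*$ accepts: one or more ASCII
    uppercase letters, a colon, then optional trailing whitespace.
    """
    stripped = line.rstrip()
    if not stripped.endswith(":"):
        return None
    name = stripped[:-1]
    if name and name.isascii() and name.isalpha() and name.isupper():
        return name
    return None
```

Whether this is faster than a compiled regex depends on the runtime, so it is worth measuring before committing to either form.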
5. Next steps
- Publish shared AST typings/packages (npm, PyPI, NuGet, Maven).
- Add schema validation (Zod, Pydantic, FluentValidation, Jakarta Bean Validation) right after parsing.
- Implement an AICQL evaluator module that operates on the JSON AST for analytics and previews.
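As one example of a post-parse check, the sketch below verifies that the ID references between sets, dicts, entities, and queries resolve. It uses plain Python rather than any of the validation libraries named above; the function name validate_references and the error message wording are assumptions.

```python
def validate_references(doc: dict) -> list:
    """Collect messages for dangling ID references in a parsed document.

    Hypothetical helper operating on the JSON AST as plain dicts; an empty
    list means every reference resolved.
    """
    errors = []
    dict_ids = {d["id"] for d in doc["dicts"]}
    set_ids = {s["id"] for s in doc["sets"]}
    entity_ids = {e["id"] for e in doc["entities"]}

    for data_set in doc["sets"]:
        if data_set["dictId"] not in dict_ids:
            errors.append(f"set {data_set['id']}: unknown dictId {data_set['dictId']}")

    # Each optional Query reference is checked against the matching ID pool.
    checks = (
        ("fromSetId", set_ids),
        ("fromEntityId", entity_ids),
        ("usingDictId", dict_ids),
    )
    for query in doc["queries"]:
        for key, known in checks:
            target = query.get(key)
            if target is not None and target not in known:
                errors.append(f"query {query['id']}: unknown {key} {target}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every broken reference in one pass.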