Strata DDM

The Strata DDM Manual

Joe Hendrix

The Strata Dialect Definition Mechanism (DDM) is a set of tools for defining intermediate representations (IRs) for a wide variety of languages. The DDM combines features of a parser grammar and a datatype definition language to let developers quickly define intermediate representations that are easy to work with both programmatically and textually. Additionally, the DDM can automatically generate serialization and deserialization code for its dialects to enable distributed analysis across language and system boundaries.

The Strata DDM is implemented in Lean. Strata dialects and programs may be embedded in Lean or in standalone dialect files. In a future release, we will support working with Strata from the command line as well as other languages such as .Net-based language, JVM-based languages, Rust, Python and JavaScript/TypeScript.

Table of Contents

  1. 1. Strata Dialect Definitions
  2. 2. Dialect Language Reference
  3. 3. Lean Integration
  4. 4. Command Line Use

1. Strata Dialect Definitions🔗

Dialects are the core mechanism in Strata used to declare and extend program intermediate representations. A Strata dialect defines new syntactic categories, operations, types, functions and metadata. Strata dialects are composable and can be used to define extensions that can be used in multiple languages. There is a builtin dialect, called Init that provides some basic declarations that automatically are included in all other dialects.

1.1. Strata Dialect Concepts🔗

Each Strata dialect introduces declarations that ultimately define the IR. There are five basic types of declarations:

  • A Syntactic Category declaration represents a concept in a Strata dialect definition such as an expression, type, statement, or binding. They are defined by operators.

  • A Operator declaration defines information that may be stored in a syntactic category, along with how it is textually presented. Operators have arguments that may reference categories or types, and use metadata to define how their arguments affect Strata typechecking.

  • A Type declaration declares a new builtin type for the dialect that all programs in a dialect may refer to. Types declarations do not introduce new syntactic categories in a dialect, but rather extend a dialect Init.Type with new types that may be referred to. This allows Strata to support type variables and polymorphism.

  • A Function declaration introduces a new function into Strata's builtin expression syntactic category Init.Expr. Functions have a user- definable syntax like operators, but can be polymorphic and are type checked after parsing.

  • A Metadata declaration introduces a new attribute that may be attached to other Strata declarations. There are builtin metadata attributes used to control parsing and typechecking in dialects. Dialects may introduce new metadata to provide a mechanism for associating properties with declarations.

1.2. Strata Examples🔗

As a simple example, we define a sequence of progressively more complex Strata dialects with Boolean and arithmetic expressions as well as a simple assertion language. Each example provides just enough detail to help readers understand the example code. The Dialect Language Reference contains additional detail on the commands.

dialect BoolDialect;
// Introduce Boolean type
type BoolType;

// Introduce literals as constants.
fn true_lit : BoolType => "true";
fn false_lit : BoolType => "false";

// Introduce basic Boolean operations.
fn not_expr (a : BoolType) : BoolType => "-" a;
fn and (a : BoolType, b : BoolType) : BoolType => @[prec(10), leftassoc] a " && " b;
fn or (a : BoolType, b : BoolType) : BoolType => @[prec(8), leftassoc] a " || " b;
fn imp (a : BoolType, b : BoolType) : BoolType => @[prec(8), leftassoc] a " ==> " b;

// Introduce equality operations that work for arbitrary types.
// The type is inferred.
fn equal (tp : Type, a : tp, b : tp) : BoolType => @[prec(15)] a " == " b;
fn not_equal (tp : Type, a : tp, b : tp) : BoolType => @[prec(15)] a " != " b;

We can then extend these operations with an Integer type and operations.

dialect Arith;
import BoolDialect;

type Int;
fn neg_expr (a : Int) : Int => "- " a;
fn add_expr (a : Int, b : Int) : Int => @[prec(25), leftassoc] a " + " b;
fn sub_expr (a : Int, b : Int) : Int => @[prec(25), leftassoc] a " - " b;
fn mul_expr (a : Int, b : Int) : Int => @[prec(30), leftassoc] a " * " b;
fn exp_expr (a : Int, b : Int) : Int => @[prec(32), rightassoc] a " ^ " b;

fn le (a : Int, b : Int) : BoolType => @[prec(15)] a " <= " b;
fn lt (a : Int, b : Int) : BoolType => @[prec(15)] a " < " b;
fn ge (a : Int, b : Int) : BoolType => @[prec(15)] a " >= " b;
fn gt (a : Int, b : Int) : BoolType => @[prec(15)] a " > " b;

By itself, these dialects do not define a new language. To define a language, one needs to extends a builtin Command category with new commands. This is done with the following dialect that introduces commands for assertions and defining functions:

dialect AssertLang;
import Arith; // Automatically imports BoolDialect dependency of Arith

// This introduces a new operator into the Command category.
op assert (b : BoolType) : Command => "assert " b ";";

// Introduce a category for arguments.
category ArgDecl;
@[declare(var, tp)]
op decl (var : Ident, tp : Type) : ArgDecl => var " : " tp;

category ArgDecls;
op decls (l : CommaSepBy ArgDecl) : ArgDecls => "(" l ")";

@[declareFn(name, args, tp)]
op defFun (name : Ident, args : ArgDecls, tp : Type, @[scope(args)] value : tp)
  : Command => "def " name args "=" value ";";

// Now that we have binders, introduce quantifiers
fn forall_expr (args : ArgDecls, @[scope(args)] b : BoolType) : BoolType =>
  "forall " args ", " b;
fn exists_expr (args : ArgDecls, @[scope(args)] b : BoolType) : BoolType =>
  "exists " args ", " b;

With these command extensions, we are now ready to use the dialects in programs:

program AssertLang;

assert forall (a : Int), exists (b : Int), b > a;
// Assert Fermat's last theorem
assert forall (a : Int, b : Int, c : Int, n : Int),
         a > 0 && b > 0 && c > 0 && n > 2 ==> a^n + b^n != c^n;

// Introduce function
def addMul(a : Int, b : Int, c : Int) = a * b + c;

assert forall (a : Int, b : Int, c : Int),
        b >= c ==> addMul(a, b, b) >= addMul(a, c, c);

2. Dialect Language Reference🔗

The Strata Dialect Definition Language is defined in the rest of this section. The convention is to use monospace text for concrete syntax in the dialect definition language and italic for identifiers and other expressions introduced as part of a specific dialect.

Each dialect definition must begin with the dialect followed by the name of the dialect.

dialect name;.

Additional commands are defined in the following subsections.

2.1. Importing dialects🔗

Imports bring declarations of another dialect into the current dialect being declared. This includes transitive imports of the dialect being imported.

import ident;

Imports the dialect ident.

2.2. Syntactic Categories🔗

Syntactic categories are introduced by the category declaration:

category name;

Declares a new syntactic category name.

2.3. Operators🔗

Operators extend syntactic categories with additional constructors for data. Each Strata operator has a name that is distinct from other names in the dialect, a list of arguments to the operator, the syntactic category being extended, and a syntax definition describing how to pretty-print the operator.

md op name(id1 : k1, id2 : k2, ...) : cat => syntax;

Declares a new operator name where

  • md is optional metadata (see Metadata).

  • (id1 : k1, id2 : k2, ...) are optional bindings and may be omitted if the operator takes no arguments. Each identifier idN is the name of an argument while the kN annotation may refer to a syntax category or type.

  • cat is the syntax category that is extended by this operator.

  • syntax is the syntax definition.

2.3.1. Examples🔗

A simple operator is an assert operator that takes a Boolean expression and extends a hypothetical Statement category.

op assert (b : BoolType) : Statement => "assert" b ";";

For expressions, Strata supports polymorphic types. An example operator that uses this is a polymorphic assignment operator in the Strata Core statement dialect.

op assign (tp : Type, v : Ident, e : tp) : Statement => v " := " e ";";

As an example, the following introduces an operator const that takes a single binding value as argument and produce a Command. The scope and prec metadata attributes are described in the Metadata.

@[scope(b)]
op const (b : Binding) : Command => @[prec(10)] "const " b ";";

2.4. Types and Expressions🔗

As mentioned, the builtin Init.Type and Init.Expr categories are not extended using the op declaration. Instead one introduces new types with the type declaration and extends the expression language with new functions using the fn declaration. The builtin Init.Type and Init.Expr categories have some special support to simplify the creation of typed IRs in Strata.

  • Strata supports type polymorphic functions and operations and has a support for type inference that allows it to automatically infer type arguments as long as an expression argument uses that type.

  • Unlike categories, user types may contain parameters and parameters may themselves be variables. This allows more general functions than can be supported by operators such as a polymorphic length function over lists.

  • Types use a standardized syntax that do not require syntax definitions for each type.

  • After parsing, expressions are type checked to ensure that the syntax is well formed.

2.4.1. Types🔗

type name(id1 : Type, id2 : Type, ...);

Declares a new type with optional parameters with names given by the identifiers id1, id2, ...

The code below declares a type BoolType and a polymorphic Map type.

type BoolType;
type Map (dom : Type, range : Type);  -- Ex. Map Nat BoolType

2.4.2. Expressions🔗

metadata fn name(id1 : k1, id2 : k2, ...) : type => syntax;

This introduces a new function

As an example, the code below introduces a polymorphic if expression.

fn if (tp : Type, c : BoolType, t : tp, f : tp) : tp =>
   "if " c " then " t "else " f;

2.5. Metadata🔗

The Strata Init dialect provides a builtin Init.Metadata category that allows metadata to be declared and attached to other declarations in dialects. Predefined metadata attributes are used in dialect definitions for defining precedence and type checking, but additional metadata attributes can be declared in dialects to build new capabilities on top of Strata. The goal of this is to provide a user-controllable extension mechanism to Strata dialects so that dialects themselves may be repurposed to new use cases.

metadata name (id1 : tp1, id2 : tp2, ... );

Declares a new metadata operation with the given name.

The type annotations in tp1 are currently restricted to Ident (for referencing arguments), natural numbers (Num) and optional versions of each (Option Ident and Option Num). More general support for metadata will be added in a future release.

2.5.1. Syntax Definitions🔗

How operations and functions are represented in Strata's textual format depends on syntax definitions. Each syntax definition is a sequence of tokens that describe the syntax of the operator or function.

  • A string literal such as "assert " outputs the literal text as a token. Spacing should be used to ensure whitespace appears after a token.

  • An argument name arg indicates the associated argument to the function or operator appears in output. The precedence can be set with the syntax arg:prec.

  • For multiline declarations, the function indent(n, tokens) may be used. The tokens will appear indented when they appear on new lines.

There are three metadata annotations in the Strata dialect that may affect precedence in syntax definitions.

metadata prec(p : Nat); -- Controls the precedence of the outermost operator
metadata leftassoc; -- Indicates operator is left-associative
metadata rightassoc; -- Indicates operator is right associative.

As an example, a right associative implies operator can be defined with the syntax:

fn implies (p : Pred, q : Pred) : Pred => @[prec(20), rightassoc] p " ==> " q;

Without the rightassoc operator, this is equivalent to the definition:

fn implies (p : Pred, q : Pred) : Pred => @[prec(20)] p:20 " ==> " q:19;

2.6. Parsing, Type Checking and Pretty Printings🔗

When parsing an operation or expression from source, there is an implicit variable context that contains bindings for variables in scope. Operators may transform this context while expressions are evaluated within a context. The changes to the context are controlled via metadata, and the Strata dialect makes three declarations for affecting this.

-- Specify result scope of argument used to evaluate
-- argument or following code.
metadata scope(name : Ident);
-- Declares a new variable in the resulting scope.
metadata declare(name : Ident, type : TypeOrCat);
-- Declares a function in the resulting scope with name,
-- arguments, and return type determined by the given arguments.
metadata declareFn(name : Ident, args : Ident, type : Ident);
-- Marks a list argument as requiring at least one element.
metadata nonempty;

As an example, Strata Core's variable declaration syntax can be defined in Strata using the following code:

category Decl;
-- Decl represents a single variable declaration that extends the context.
@[declare(v, tp)]
op var (v : Ident, tp : Type) : Decl => v ":" tp;

category DeclList;
@[scope(d)]
op declAtom (d : Decl) : DeclList => d;
@[scope(d)]
op declPush (dl : DeclList, @[scope(dl)] d : Decl) : DeclList => dl "," d;

category Statement;
@[scope(dl)]
op varStatement (dl : DeclList) : Statement => "var " dl ";";

2.6.1. Polymorphic Type Variables🔗

The @[declareTVar] annotation allows polymorphic function declarations where type parameters (like <a, b>) need to be in scope when parsing parameter types and return types. For example, function declarations in Strata.Core are defined as the following:

category TypeVar;
@[declareTVar(name)]
op type_var (name : Ident) : TypeVar => name;

category TypeArgs;
@[scope(args)]
op type_args (args : CommaSepBy TypeVar) : TypeArgs => "<" args ">";

@[declareFn(name, b, r)]
op command_fndecl (name : Ident,
                   typeArgs : Option TypeArgs,
                   @[scope(typeArgs)] b : Bindings,
                   @[scope(typeArgs)] r : Type) : Command =>
  "function " name typeArgs b ":" r ";";

This allows parsing declarations like function identity<a>(x: a): a.

2.7. The Init dialect🔗

The Init dialect introduces a few syntactic categories that cannot currently be defined in user definable dialects.

  • Init.Type and Init.Expr represent types and expressions in Strata. Unlike other categories, these are extended through the specialized type and fn commands rather than op declarations so that expressions can be type checked after parsing. See Types and Expressions for more details.

  • Syntactic categories that in dialects other than Init currently cannot take additional parameters. This support will be added in a future update, but to help users, the Init declares a few builtin parametric categories:

    • Init.Option c denotes an optional value of c. Syntactically, either c may be read in or the empty string.

    • Init.Seq c denotes a sequence of values with category c. Each value of c is separated by whitespace. When formatted, values are concatenated with no separator.

    • Init.CommaSepBy c denotes a comma-separated list of values of type c. When formatted, values are separated by , (comma followed by space).

    • Init.SpaceSepBy c denotes a space-separated list of values of type c. Parsing is identical to Init.Seq, but when formatted, values are separated by a single space.

    • Init.SpacePrefixSepBy c denotes a space-prefix-separated list of values of type c. Parsing is identical to Init.Seq, but when formatted, each value is prefixed with a space.

    All list categories (CommaSepBy, SpaceSepBy, SpacePrefixSepBy, Seq) can be marked with the @[nonempty] metadata attribute to require at least one element during parsing. For example: @[nonempty] items : SpaceSepBy Item will reject empty lists with a parse error.

  • Parsing for primitive literals and identifiers cannot be directly in syntax definitions. To accommodate this, the Init dialect introduces the syntactic categories for this:

    • Init.Ident represents identifiers. These are alphanumeric sequences that start with a letter that are not otherwise used as keywords in a dialect.

    • Init.Str represents string literals. These are delimited by double quotes and use escaping for special characters.

    • Init.Num represents numeric natural number literals.

2.8. Datatypes🔗

The DDM has special support for defining dialects with algebraic datatypes (ADTs) similar to those found in functional programming languages. Datatypes allow one to define custom types with multiple constructors, each of which can have zero or more fields (constructor arguments).

Datatypes differ from other language constructs (e.g. types, operators), since they define several operations simultaneously. For example, an SMT datatype declaration defines (in addition to the type itself and the constructors) tester functions to identify to which constructor an instance belongs and accessor functions to extract the fields of a constructor. Dafny datatypes additionally produce an ordering relation, while Lean inductive types produce eliminator instances defining induction principles. The DDM enables automatic creation of (a subset of) these auxiliary functions via a configurable function template system.

2.8.1. Example🔗

In the Strata Core dialect, the auxiliary functions are testers and accessors. That is, one can define the following datatype in Strata Core:

datatype IntList {
  Nil(),
  Cons(head: int, tail: IntList)
};

This declares a list type with two constructors (Nil and Cons) and two fields (head and tail). The Core dialect automatically generates:

  • Constructors: Nil : IntList and Cons : int -> IntList -> IntList

  • Testers: IntList..isNil : IntList -> bool and IntList..isCons : IntList -> bool

  • Accessors: head : IntList -> int and tail : IntList -> IntList

2.8.2. Defining Datatype Syntax in a Dialect🔗

To support datatypes in a dialect, you must define syntactic categories and operators with appropriate annotations. The Core dialect provides a complete example.

2.8.2.1. Constructor Syntax🔗

Constructors define the variants of a datatype. Constructor fields are specified via Bindings like other function declarations.

category Constructor;
category ConstructorList;

@[constructor(name, fields)]
op constructor_mk (name : Ident, fields : Option (CommaSepBy Binding)) : Constructor =>
  name "(" fields ")";

@[constructorListAtom(c)]
op constructorListAtom (c : Constructor) : ConstructorList => c;

@[constructorListPush(cl, c)]
op constructorListPush (cl : ConstructorList, c : Constructor) : ConstructorList =>
  cl "," c;

The annotations:

  • @[constructor(name, fields)] marks this operation as a constructor definition, where fields is a Bindings argument containing the constructor arguments

  • @[constructorListAtom(c)] marks a single-element constructor list

  • @[constructorListPush(cl, c)] marks an operation that extends a constructor list

2.8.2.2. Datatype Command🔗

The main datatype declaration uses @[declareDatatype] to tie everything together. It is best illustrated with an example (from the Core dialect):

@[declareDatatype(name, typeParams, constructors,
    perConstructor([.datatype, .literal "..is", .constructor], [.datatype], .builtin "bool"),
    perField([.field], [.datatype], .fieldType))]
op command_datatype (name : Ident,
                     typeParams : Option Bindings,
                     @[scopeDatatype(name, typeParams)] constructors : ConstructorList)
  : Command =>
  "datatype " name typeParams " {" constructors "}" ";\n";

@[declareDatatype] declares a datatype command operator given the datatype name, the optional type parameters, the constructor list, and zero or more function templates to expand. Note that the @[scopeDatatype] annotation brings the datatype name and type parameters into scope when parsing the constructors, enabling recursive type references.

2.8.2.3. Function Templates🔗

Function templates specify patterns for generating auxiliary functions from datatype declarations. Currently there are two supported templates: perConstructor generates one function per constructor, while perField generates one function per field (note that the DDM enforces that fields are unique across all constructors in a datatype).

perConstructor(namePattern, paramTypes, returnType)

perField(namePattern, paramTypes, returnType)

Each template has three components:

  • namePattern is an array of name parts: .datatype, .constructor, or .literal "str"

  • paramTypes is an array of type references for parameters

  • returnType is a type reference for the return type

Name patterns consist of the following components:

  • .datatype - Expands to the datatype name

  • .constructor - Expands to the constructor name in perConstructor

  • .field - Expands to the field name in perField

  • .literal "str" - A literal string

Type references consist of the following components:

  • .datatype - The datatype type (e.g., IntList)

  • .fieldType - The type of the current field in perField

  • .builtin "bool" - A built-in type from the dialect

2.8.2.3.1. Example Templates🔗

The Strata Core dialect uses two templates:

The tester template generates IntList..isNil : IntList -> bool:

perConstructor([.datatype, .literal "..is", .constructor], [.datatype], .builtin "bool")

The accessor template generates head : IntList -> int:

perField([.field], [.datatype], .fieldType)

An alternative indexer template could generate IntList$idx$Nil : IntList -> int:

perConstructor([.datatype, .literal "$idx$", .constructor], [.datatype], .builtin "int")

3. Lean Integration🔗

Strata provides two complementary ways to work with dialect ASTs in Lean. The generic AST is a set of dialect-agnostic types that can represent programs in any dialect. It is used by the DDM's parser, pretty-printer, serializer, and other infrastructure that must operate uniformly across dialects. The #strata_gen command takes the opposite approach: given a specific dialect, it generates Lean inductive types with one constructor per operator, enabling direct pattern matching and type-safe construction. Conversion functions (toAst/ofAst) bridge the two representations.

3.1. The Generic AST🔗

The generic AST is defined in Strata.DDM.AST. All types are parameterized by an annotation type α (typically SourceRange for source locations).

The core types are listed below in the same order as the dialect concepts they represent:

  • SyntaxCatF α — a reference to a syntactic category, identified by its qualified name.

  • OperationF α — an application of a dialect operator. Contains the operator's qualified name and an array of arguments.

  • ArgF α — a single argument to an operation. This is a sum type whose variants cover all the kinds of data an operator can receive: nested operations, expressions, types, identifiers, numeric literals, string literals, decimal literals, byte arrays, category references, optional values, and sequences.

  • TypeExprF α — a type expression. Variants cover dialect-defined types (ident), bound and polymorphic type variables (bvar, tvar), global type references (fvar), and function types (arrow).

  • ExprF α — a typed expression. Expressions are represented in curried form: a head (fn for a named dialect function, bvar for a bound variable, or fvar for a free variable) applied to arguments via app. Type arguments found bound and free variables are implicit and omitted.

For the common case where annotations are source locations, Strata defines abbreviations:

abbrev SyntaxCat := SyntaxCatF SourceRange
abbrev Operation := OperationF SourceRange
abbrev Arg       := ArgF SourceRange
abbrev TypeExpr  := TypeExprF SourceRange
abbrev Expr      := ExprF SourceRange

3.1.1. How DDM Concepts Map to the Generic AST🔗

The following table summarizes how the DDM concepts described in earlier sections are represented in the generic AST.

DDM Concept

Generic AST Type

Notes

Syntactic category

SyntaxCatF

Identified by qualified name (e.g. Init.Expr)

Operator application

OperationF

Qualified name + array of ArgF arguments

Type expression

TypeExprF

Concrete types, type variables, function types

Expression

ExprF

Curried: head (fn/bvar/fvar) + app chain

Identifier literal

ArgF.ident

e.g. variable names from Ident arguments

Numeric literal

ArgF.num

From Num arguments

String literal

ArgF.strlit

From Str arguments

CommaSepBy/Seq

ArgF.seq

Array of ArgF with a separator format

Option

ArgF.option

Option (ArgF α)

The generic AST is dialect-agnostic: operators are identified by their QualifiedIdent (a dialect name paired with an operator name) rather than by distinct Lean constructors. This uniformity makes it the natural representation for DDM infrastructure such as the parser, formatter, and serializer, but it means that working with a specific dialect requires string-based matching on operator names.

3.2. The #strata_gen Command🔗

The #strata_gen command bridges Strata dialect definitions and Lean by automatically generating Lean inductive types and conversion functions from a dialect definition. This lets developers work with dialect ASTs as ordinary Lean values — constructing, pattern matching, and transforming them — while retaining the ability to convert back to the generic AST representation for serialization, pretty-printing, and cross-dialect interoperability.

3.2.1. Basic Syntax🔗

import Strata.DDM.Integration.Lean

namespace MyDialect
#strata_gen MyDialect
end MyDialect

#strata_gen requires import Strata.DDM.Integration.Lean and takes a single argument: the name of a dialect that has already been defined (via #dialect ... #end or #load_dialect). The command should typically be placed inside a namespace block so that the generated types and functions are scoped under that namespace.

3.2.1.1. Tracing🔗

To see what #strata_gen generates, enable the Strata.generator trace option immediately before the command:

set_option trace.Strata.generator true in
#strata_gen MyDialect

This prints each generated definition (inductives, toAst, ofAst) and the dependency groups used to order them. For example, running #strata_gen on the AssertLang dialect defined earlier in this manual produces:

[Strata.generator] Generating AssertLangType
[Strata.generator] Generating AssertLangType.toAst
[Strata.generator] Generating AssertLangType.ofAst
[Strata.generator] Generating ArgDecl
[Strata.generator] Generating ArgDecl.toAst
[Strata.generator] Generating ArgDecl.ofAst
[Strata.generator] Generating ArgDecls
[Strata.generator] Generating ArgDecls.toAst
[Strata.generator] Generating ArgDecls.ofAst
[Strata.generator] Generating Expr
[Strata.generator] Generating Expr.toAst
[Strata.generator] Generating Expr.ofAst
[Strata.generator] Generating Command
[Strata.generator] Generating Command.toAst
[Strata.generator] Generating Command.ofAst
[Strata.generator] Declarations group: [Init.Type]
[Strata.generator] Declarations group: [AssertLang.ArgDecl]
[Strata.generator] Declarations group: [AssertLang.ArgDecls]
[Strata.generator] Declarations group: [Init.Expr]
[Strata.generator] Declarations group: [Init.Command]

Each category produces an inductive type, a toAst function, and an ofAst function. The "Declarations group" lines show the topologically sorted groups in the order they are emitted to Lean.

3.2.2. What Gets Generated🔗

For each syntactic category used by a dialect, #strata_gen generates four things:

  1. An inductive type parameterized by an annotation type α.

  2. An Inhabited instance (when possible).

  3. A toAst function that converts from the generated type to Strata's generic AST representation (ExprF, TypeExprF, or OperationF).

  4. An ofAst function that converts from the generic AST back to the generated type (returning OfAstM, an error monad).

Every generated constructor includes an ann : α field that carries source-location or other annotation data through transformations.

3.2.2.1. Determining Which Categories Are Used🔗

Not every category in a dialect's transitive imports needs code generated for it. #strata_gen computes the set of used categories by starting from the categories and types directly declared or referenced in the dialect and following argument dependencies transitively. For example, if a dialect declares only op assert (b : BoolType) : Command => ..., the generator will produce types for Command, Expr (since BoolType is a type in Init.Expr), and Type — but not for categories like Binding that are never referenced.

Primitive categories (Init.Ident, Init.Num, Init.Str, etc.) map directly to Lean types (String, Nat, String, etc.) and do not produce their own inductive definitions.

3.2.2.2. Handling Mutually Recursive Categories🔗

Categories can be mutually recursive. For example:

category MutA;
category MutB;
op opA (b : MutB) : MutA => "opA " b;
op opB (a : MutA) : MutB => "opB " a;

#strata_gen uses Tarjan's algorithm to identify strongly connected components — groups of categories that reference each other. Each group is emitted as a Lean mutual ... end block so that the inductive types, toAst functions, and ofAst functions can refer to one another. Groups are processed in topological order so that a group's dependencies are always defined before the group itself.

3.2.3. Example Walkthrough🔗

Consider a small SQL dialect:

dialect SQL;

op create (name : Ident, c : CommaSepBy Ident) : Command =>
  "CREATE " name "(" c ")" ";\n";

op drop (name : Ident) : Command =>
  "DROP " name ";\n";

category Columns;
op colStar : Columns => "*";
op colList (c : CommaSepBy Ident) : Columns => c;

op select (col : Columns, table : Ident) : Command =>
  "SELECT " col " FROM " table ";\n";

After #strata_gen SQL, the generator produces two inductive types.

Command gets one constructor per operation that targets it:

inductive Command (α : Type) : Type where
  | create (ann : α) (name : Ann String α) (c : Ann (Array String) α) : Command α
  | drop   (ann : α) (name : Ann String α) : Command α
  | select (ann : α) (col : Columns α) (table : Ann String α) : Command α
  deriving Repr

Columns gets its own constructors:

inductive Columns (α : Type) : Type where
  | colStar (ann : α) : Columns α
  | colList (ann : α) (c : Ann (Array String) α) : Columns α
  deriving Repr

Notice that:

  • Ident arguments become Ann String α (a String carrying an annotation).

  • CommaSepBy Ident arguments become Ann (Array String) α.

  • The custom category Columns appears directly as a type in Command.select.

The generated toAst function converts from the dialect-specific type to the generic AST, and ofAst converts back:

-- Convert generated type → generic AST
Command.toAst : {α : Type} → [Inhabited α] → Command α → OperationF α

-- Convert generic AST → generated type
Command.ofAst : {α : Type} → [Inhabited α] → [Repr α]
  → OperationF α → OfAstM (Command α)
3.2.3.1. Expression and Type Categories🔗

Dialects that declare types (via type) or functions (via fn) also get generated Expr and Type inductives. These include builtin constructors in addition to dialect-defined ones. For instance, the Arith dialect's generated Expr type includes:

inductive Expr (α : Type) : Type where
  -- Builtin constructors (always present for Init.Expr)
  | fvar  (ann : α) (idx : Nat) : Expr α
  | bvar  (ann : α) (idx : Nat) : Expr α
  | app   (ann : α) (fn : Expr α) (arg : Expr α) : Expr α
  -- Dialect-defined constructors
  | btrue   (ann : α) : Expr α
  | bfalse  (ann : α) : Expr α
  | intLit  (ann : α) (n : Ann Nat α) : Expr α
  | add     (ann : α) (x : Expr α) (y : Expr α) : Expr α
  | mul     (ann : α) (x : Expr α) (y : Expr α) : Expr α
  -- ... (additional constructors omitted)
  deriving Repr

The fvar, bvar, and app constructors correspond to free variables, bound variables (de Bruijn indices), and function application in Strata's core expression model.

3.2.4. Post-Generation Patterns🔗

After running #strata_gen, there are several common patterns for working with the generated types.

3.2.4.1. Deriving Typeclass Instances🔗

Generated types automatically derive Repr. Other instances can be derived after generation:

deriving instance BEq for Command, Columns

For larger dialects with many categories, this can be done in bulk:

deriving instance BEq for
  SpecConstant, QualifiedIdent, Symbol,
  SMTSort, Term, Command

Support for automatically deriving additional instances during generation will be added in a future release.

3.2.4.2. Verifying Generated Types🔗

Use #print and #check to inspect what was generated:

#print Command    -- Shows inductive definition and constructors
#print Columns
#check Columns.colStar  -- Columns.colStar : {α : Type} → α → Columns α
3.2.4.3. Round-Trip Testing🔗

A simple round-trip test can verify that toAst and ofAst are inverses:

def testRoundTrip (e : Expr Unit) : Bool :=
  match e |> Expr.toAst |> Expr.ofAst with
  | .error _ => false
  | .ok g => e == g

#guard testRoundTrip <| Expr.btrue ()
#guard testRoundTrip <| Expr.app () (.fvar () 0) (.btrue ())
3.2.4.4. Dependent Dialects🔗

When a dialect imports another, #strata_gen automatically resolves the full transitive closure of dependencies. Each invocation generates its own set of types — if ArithChecker imports Arith and uses Init.Expr, it will produce a new ArithChecker.Expr that includes all of Arith's constructors. The two Expr types are isomorphic but distinct:

namespace Arith
#strata_gen Arith      -- Generates Arith.Expr, Arith.ArithType, ...
end Arith

namespace ArithChecker
#strata_gen ArithChecker  -- Generates ArithChecker.Expr, ArithChecker.Command, ...
end ArithChecker
-- ArithChecker.Expr is isomorphic to Arith.Expr (same constructors)
-- but they are separate types.

4. Command Line Use🔗

This will be described once available.