The Lark TypeScript Generator takes Lark grammar files as input and generates standalone LALR(1) parsers in TypeScript.

Features

  • Parses .lark grammar files (the supported syntax is listed under Feature Support)
  • Import Resolution: Automatically resolves %import statements for common Lark modules
  • Generates efficient LALR(1) parsers in TypeScript
  • Alternative Ordering: Respects the order of alternatives in grammar rules (first match wins)
  • Automatic AST Flattening: Rules prefixed with ? are automatically inlined when they have one child
  • Token Filtering: Automatically removes literal tokens and operators from the AST
  • Zero runtime dependencies

Supported Grammar Features

  • Rules and terminals
  • EBNF operators (?, *, +)
  • Rule modifiers (?, _, !)
  • Aliases for alternatives
  • String and regex literals
  • Priorities
  • Ignored patterns
  • Import statements (%import common.CNAME, etc.)

Installation

cd tools/lark-ts-generator
npm install
npm run build

Usage

Command Line

# Generate a parser from a grammar file
node dist/cli.js <grammar-file> -o <output-file>

# Or use the shell script
./lark-ts-gen <grammar-file> -o <output-file>

# Example: Generate parser from the formulas grammar
node dist/cli.js ../../server/apps/calculations/private/grammar/formulas.lark -o formula-parser.ts

Global Installation

# From the lark-ts-generator directory
npm link

# Now you can use it from anywhere
lark-ts-gen my-grammar.lark -o my-parser.ts

Import Resolution

The parser generator supports automatic import resolution for Lark’s standard common module:
%import common.CNAME
%import common.DECIMAL
%import common.ESCAPED_STRING
%import common.SIGNED_INT

start: expr
// ... rest of grammar
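
Before an import can be resolved, the %import lines have to be found in the grammar text. The sketch below shows one way that step could look. The function name findCommonImports and its regex are assumptions made for illustration; they are not taken from import-resolver.ts, and only the simple `%import common.NAME` form shown above is handled.

```typescript
// Hypothetical helper: collect the terminal names imported from "common".
// Each match must occupy a whole line of the grammar text.
function findCommonImports(grammar: string): string[] {
  const pattern = /^[ \t]*%import[ \t]+common\.([A-Z_][A-Z0-9_]*)[ \t]*$/gm;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(grammar)) !== null) {
    names.push(match[1]); // capture group 1 is the terminal name
  }
  return names;
}

const sampleGrammar = [
  "%import common.CNAME",
  "%import common.SIGNED_INT",
  "",
  "start: expr",
].join("\n");

// Logs the two imported names, CNAME and SIGNED_INT.
console.log(findCommonImports(sampleGrammar));
```

A resolver could then look up each returned name in grammars/common.lark and splice the matching terminal definition into the grammar.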

Supported Common Module Terminals

Terminal        Pattern                           Description
CNAME           /[a-zA-Z_][a-zA-Z0-9_]*/          C-style identifiers
DECIMAL         /[+-]?([0-9]+\.[0-9]*|\.[0-9]+)/  Decimal numbers
ESCAPED_STRING  /"(?:[^"\\]|\\.)*"/               Quoted strings with escapes
SIGNED_INT      /[+-]?[0-9]+/                     Signed integers
NUMBER          /[0-9]+/                          Unsigned integers
WORD            /[a-zA-Z]+/                       Word tokens
WS              /[ \t\f\r\n]+/                    Whitespace
WS_INLINE       /[ \t\f]+/                        Inline whitespace

To add new terminals, edit grammars/common.lark.
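
The patterns can be tried against sample inputs to see which terminals overlap. The sketch below does this for four of the terminals. The names commonTerminals and matchingTerminals are hypothetical, and the ^...$ anchors are an addition so that a pattern must match the whole input.

```typescript
// Patterns follow the table above, with ^...$ added so the whole input
// must match. The list and function names are hypothetical.
const commonTerminals: Array<[string, RegExp]> = [
  ["CNAME", /^[a-zA-Z_][a-zA-Z0-9_]*$/],
  ["DECIMAL", /^[+-]?([0-9]+\.[0-9]*|\.[0-9]+)$/],
  ["SIGNED_INT", /^[+-]?[0-9]+$/],
  ["NUMBER", /^[0-9]+$/],
];

// Return the name of every listed terminal that matches the full input.
function matchingTerminals(input: string): string[] {
  return commonTerminals
    .filter(([, pattern]) => pattern.test(input))
    .map(([name]) => name);
}

console.log(matchingTerminals("foo_1")); // only CNAME matches
console.log(matchingTerminals("-1.5"));  // only DECIMAL matches
console.log(matchingTerminals("42"));    // both SIGNED_INT and NUMBER match
```

An input such as 42 matching more than one terminal is the kind of overlap that alternative ordering and terminal priorities are meant to resolve.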

Project Structure

tools/lark-ts-generator/
├── src/
│   ├── types.ts               # Type definitions
│   ├── import-resolver.ts     # Dynamic import resolution
│   ├── grammar/
│   │   ├── lexer.ts          # Lexer for .lark files
│   │   └── parser.ts         # Parser for .lark files
│   ├── parser/
│   │   └── lalr.ts           # LALR table generator
│   ├── generator/
│   │   └── codegen.ts        # TypeScript code generator
│   ├── cli.ts                # Command-line interface
│   └── index.ts              # Main entry point
├── grammars/
│   └── common.lark           # Standard common module
├── tests/
└── dist/                      # Compiled output

Key Features

Alternative Ordering

The parser correctly implements alternative precedence following Lark’s behavior. When multiple alternatives could match, the first alternative listed wins.
?array: "[]" -> empty_array 
    | "[" ESCAPED_STRING ("," ESCAPED_STRING)* "]" -> string_array
    | "[" SIGNED_INT ("," SIGNED_INT)* "]" -> integer_array  
    | "[" DECIMAL ("," DECIMAL)* "]" -> decimal_array
Results:
  • ["1", "2"] → string_array (not integer_array)
  • [1, 2] → integer_array (not decimal_array)
  • [1.5, 2.7] → decimal_array
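
The "first alternative wins" rule can be illustrated outside the parser with a small ordered-choice sketch. This is not how the generated LALR parser is implemented; it only mimics the selection rule. All names are hypothetical, and the decimal test is deliberately looser than DECIMAL (it also accepts plain integers) so that two alternatives overlap.

```typescript
// Each alternative pairs an alias with a test for one array element.
// NOTE: the decimal test is intentionally looser than DECIMAL and also
// accepts integers, so integer_array and decimal_array overlap.
type Alternative = { alias: string; matches: (element: string) => boolean };

const alternatives: Alternative[] = [
  { alias: "string_array", matches: (e) => /^"(?:[^"\\]|\\.)*"$/.test(e) },
  { alias: "integer_array", matches: (e) => /^[+-]?[0-9]+$/.test(e) },
  { alias: "decimal_array", matches: (e) => /^[+-]?[0-9]*\.?[0-9]+$/.test(e) },
];

// Try the alternatives in listed order and return the first one whose test
// accepts every element. Returns null when no alternative applies.
function classifyArray(elements: string[]): string | null {
  if (elements.length === 0) return "empty_array";
  for (const alt of alternatives) {
    if (elements.every(alt.matches)) return alt.alias;
  }
  return null;
}

console.log(classifyArray(["1", "2"]));     // integer_array, although the decimal test also accepts it
console.log(classifyArray(["1.5", "2.7"])); // decimal_array
console.log(classifyArray(['"1"', '"2"'])); // string_array
```

Swapping the integer_array and decimal_array entries would make ["1", "2"] come back as decimal_array, which is why the order of alternatives matters.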

Automatic AST Processing

The generated parser automatically handles:
  • Inline Rule Flattening: Rules with ? prefix are flattened when they have one child
  • Token Filtering: Literal tokens are removed from the final AST
  • Alias Handling: Rules with aliases create appropriately named nodes
  • Empty Production Handling: Zero-length productions are handled correctly
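
Two of these steps, token filtering and inline rule flattening, can be sketched as a bottom-up pass over a tree. The Leaf, Branch and Tree shapes and the simplify function are hypothetical; the types emitted by the generator may look different.

```typescript
// Hypothetical tree shapes, for illustration only.
type Leaf = { kind: "token"; type: string; value: string; literal: boolean };
type Branch = { kind: "rule"; name: string; inline: boolean; children: Tree[] };
type Tree = Leaf | Branch;

// Post-process a tree bottom-up:
//  - drop tokens that came from string literals in the grammar
//  - replace an inline ("?") rule by its only child when exactly one remains
function simplify(tree: Tree): Tree | null {
  if (tree.kind === "token") {
    return tree.literal ? null : tree;
  }
  const children: Tree[] = [];
  for (const child of tree.children) {
    const kept = simplify(child);
    if (kept !== null) children.push(kept);
  }
  if (tree.inline && children.length === 1) {
    return children[0];
  }
  return { ...tree, children };
}

// "(" NUMBER ")" as it might be matched by an inline rule such as ?atom.
const parsed: Tree = {
  kind: "rule",
  name: "atom",
  inline: true,
  children: [
    { kind: "token", type: "LPAR", value: "(", literal: true },
    { kind: "token", type: "NUMBER", value: "42", literal: false },
    { kind: "token", type: "RPAR", value: ")", literal: true },
  ],
};

// Both parentheses are filtered out, so the inline rule collapses to the
// NUMBER token.
console.log(simplify(parsed));
```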

Lark Grammar Syntax

Rules

rule_name: expansion1 | expansion2 | expansion3
expr: term | expr "+" term | expr "-" term

Terminals

TERMINAL_NAME: pattern
NUMBER: /[0-9]+/
STRING: /"[^"]*"/

EBNF Operators

Operator  Meaning
?         Optional (zero or one)
*         Zero or more
+         One or more

Rule Modifiers

Prefix  Effect
?       Inline rule (expand in parent)
_       Anonymous rule (hide from tree)
!       Keep all tokens

Aliases

expr: expr "+" term -> add
    | expr "-" term -> sub
    | term
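
Aliases matter because the alias becomes the node name, which lets downstream code dispatch on it. The sketch below evaluates a small tree this way. The ExprNode shape, including the "number" leaf, is a hypothetical simplification and not the generated parser's actual tree type.

```typescript
// Hypothetical node shape: the aliases "add" and "sub" appear as node names.
type ExprNode =
  | { name: "add" | "sub"; left: ExprNode; right: ExprNode }
  | { name: "number"; value: number };

// Dispatch on the node name that the alias produced.
function evaluate(node: ExprNode): number {
  switch (node.name) {
    case "add":
      return evaluate(node.left) + evaluate(node.right);
    case "sub":
      return evaluate(node.left) - evaluate(node.right);
    case "number":
      return node.value;
  }
}

// Tree for "7 + 5 - 2"; the left-recursive rule nests the addition on the left.
const exprTree: ExprNode = {
  name: "sub",
  left: {
    name: "add",
    left: { name: "number", value: 7 },
    right: { name: "number", value: 5 },
  },
  right: { name: "number", value: 2 },
};

console.log(evaluate(exprTree)); // 10
```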

Priorities

NUMBER.2: /[0-9]+/     # Higher priority
WORD: /\w+/            # Default priority is 1
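
With these two definitions the input 123 matches both NUMBER and WORD, and the priority decides between them. The sketch below shows that selection in isolation. The TerminalDef type and pickTerminal function are hypothetical, and breaking ties by definition order is an assumption rather than documented behavior of the generator.

```typescript
// Hypothetical terminal table mirroring the two definitions above.
type TerminalDef = { name: string; priority: number; pattern: RegExp };

const terminals: TerminalDef[] = [
  { name: "NUMBER", priority: 2, pattern: /^[0-9]+/ },
  { name: "WORD", priority: 1, pattern: /^\w+/ },
];

// Among the terminals that match at the start of the input, pick the one
// with the highest priority. Ties keep the earlier definition (assumption).
function pickTerminal(input: string): string | null {
  let best: TerminalDef | null = null;
  for (const candidate of terminals) {
    if (!candidate.pattern.test(input)) continue;
    if (best === null || candidate.priority > best.priority) best = candidate;
  }
  return best === null ? null : best.name;
}

console.log(pickTerminal("123")); // NUMBER: WORD also matches but has lower priority
console.log(pickTerminal("abc")); // WORD
```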

Ignored Patterns

%ignore /\s+/           # Ignore whitespace
%ignore /\/\/[^\n]*/    # Ignore comments
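
The effect of %ignore can be sketched with a tiny tokenizer that consumes ignored patterns without emitting tokens. This is an illustration under simplifying assumptions, not the generated lexer: the names are hypothetical and the only real token type is a word.

```typescript
// Hypothetical ignore list matching the two %ignore directives above,
// anchored with ^ so each pattern is tried at the current position.
const ignoredPatterns: RegExp[] = [/^\s+/, /^\/\/[^\n]*/];
const wordPattern = /^[A-Za-z_]\w*/;

function tokenizeWords(source: string): string[] {
  const tokens: string[] = [];
  let rest = source;
  while (rest.length > 0) {
    // Try the ignored patterns first; a match is consumed without a token.
    let skipped = 0;
    for (const pattern of ignoredPatterns) {
      const match = pattern.exec(rest);
      if (match !== null) {
        skipped = match[0].length;
        break;
      }
    }
    if (skipped > 0) {
      rest = rest.slice(skipped);
      continue;
    }
    const word = wordPattern.exec(rest);
    if (word === null) {
      throw new Error("Unexpected input: " + rest.slice(0, 10));
    }
    tokens.push(word[0]);
    rest = rest.slice(word[0].length);
  }
  return tokens;
}

// Yields alpha, beta and gamma; whitespace and the comment produce no tokens.
console.log(tokenizeWords("alpha  beta // trailing comment\ngamma"));
```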

Using Generated Parsers

import { parse, Parser, Lexer, Transformer, Visitor } from './generated-parser';

// Parse input
const tree = parse('["test", "array"]');

// Create a transformer to process the AST
class FormulaTransformer extends Transformer<any> {
  string_array(strings: string[]): any {
    return { type: 'string_array', values: strings };
  }
  
  integer_array(numbers: number[]): any {
    return { type: 'integer_array', values: numbers };
  }
}

const transformer = new FormulaTransformer();
const result = transformer.transform(tree);

Feature Support

✅ Fully Supported

  • Rules and terminals
  • EBNF operators (?, *, +)
  • Groups and alternatives
  • Rule modifiers (?, _, !)
  • String literals and regex
  • Terminal priorities
  • Rule aliases
  • %ignore directive
  • Basic imports

⚠️ Partially Supported

  • Simple character ranges ("a".."z")
  • Module imports (limited to predefined modules)

❌ Not Supported

  • Template rules
  • Advanced repetition (item ~ 3, item ~ 2..5)
  • %declare, %override, %extend directives
  • Multiple parsing algorithms (only LALR)
  • Ambiguity handling

Testing

# Generate a parser
node dist/cli.js ../../server/apps/calculations/private/grammar/formulas.lark -o test-parser.ts
npx tsc test-parser.ts

# Run tests
node test-formulas-parser.js

# Test individual formulas  
node tests/parse-formula.js '["test", "array"]'

Development

# Watch mode
npm run dev

# Type checking
npx tsc --noEmit

# Generate and test formula parser
npm run build && node dist/cli.js ../../server/apps/calculations/private/grammar/formulas.lark -o test-parser.ts

Limitations

  • Import resolution currently supports only the common module
  • No support for template rules (parameterized rules)
  • No support for custom lexer callbacks
  • Other advanced Lark features not listed under Feature Support may be missing or only partly implemented