Generic DSL Grammar and Parser


About a year ago I blogged about an idea of an extensible language; http://rogeralsing.com/2009/03/18/an-intentional-extensible-language/ 
Since then, I have been experimenting with this concept quite a bit.

I have now created a more complete grammar which contains hard-coded support for some constructs, e.g. assignments, lambdas, binary and unary operations.
What I ended up with is not a language, but rather a parser for a structured document.
You can think about it like XML, XML gives you structured documents which you can traverse and transform via code.

This is pretty much exactly the same, but the syntax looks like a C derivate rather than a mark up language.

So what’s the point of it?

The idea is to make it possible for developers to create their own textual DSL’s without having to care about any of the gory details of parsing and grammar construction.

A lot of people are currently using XML to define different types of rules or entire scripts, e.g. build scripts or custom business rules.
This works great since it is easy to deal with XML documents via code.

The downside of XML based DSLs is that they are extremely hard to read and edit!
(I know that a lot of people disagree with me here)

Let’s say for the sake of the argument that we want to define some custom business pricing rules.
Using XML that could look something like:

<rules>
   <rule product="50050" >
      <condition>
          <greaterthan>
              <leftoperand>
                    <property value="quantity" />
              </leftoperand>
              <rightoperand>
  <integerliteral value="200" />
              </rightoperand>
          </greaterthan>
      </condition>
      <price>
            <add>
  ... etc etc ...
            </add>
      </price>
   </rule>
</rules>

Such XML rule definition can easily bloat from something fairly simple to something completely horrible when more requirements are thrown at it.

Using the generic DSL grammar that I have created, the above rules would look something like this:

product 50050
{
    when (quantity > 200) then (20 - quantity / bonus);

    when ... //more pricing rules for same product
};

Or you could use it to define Google protobuffer messages:

message OrderPlaced
{
  1 Guid MessageId;
  2 Guid CustomerId;
  3 int OrderId;
  4 List(OrderDetail) Details;
};

Or what about defining some CQRS entity definitions?

public entity Customer
{
    private string Name;

   command Rename(string newName)
    {
       require (newName != null);
       Renamed (newName);
    };

    event Renamed(string newName)
    {
        this.Name = newName;
    };
};

How it works, in short:
The grammar is based on a few simple constructs;
Every line is an “statement” and statements are an expression terminated with “;”.

An expression can be things like an assignment,a lambda, a binary operation or a “Chain”.
A “Chain” consists of zero or more constructs that occurs in a whitespace separated sequence.
Members of a chain can be primitive constructs such as primitives, identifiers, bodies {a;b;c;} and tuples (a,b,c).
Since there is only a handful of constructs, it is very easy to traverse and analyze the parsed documents.

If we take the above “customer entity” definition, we would get a structure like:
A statement (chain) containing: public, entity, customer, body.
Where the first 3 elements are identifiers and the last item is a body.
A body is a structure containing zero or more statements.
So the body of the customer entity would consist of 3 statements, the member variable “name”, the command “Rename” and the event “Renamed”.
Each of those statements are chains containing their own sub items.

I am currently cleaning up the parser and I will also add some convenience methods to make it easier to match structures in the resulting document.
So code and more examples will hopefully be up in a few days.

//Roger

M Grammar Vs. Gold Parser


Even though I bashed M Grammar in my last post, I’m sort of starting to get what the fuzz is all about now.

I still claim that writing grammars is hard, and that the M Grammar language itself doesn’t do much to change this.

But the beauty is not in the parser nor the syntax, it’s in the tools.

The sweet spot of M Grammar is the Intellipad editor shipped with the PDC Bits.

Intellipad, unlike the editors for most other parsers, will give you real time feedback on your progress.
You can alter your grammar and see how this affects the parse tree for the given input.

You can also annotate your grammar with syntax highlighting hints and thus let you see exactly how the parser handles your input text.
Intellipad will aslo show you where your grammar suffers from ambiguity by underlining the bad parts with red squigglies.

In Gold Parser which is the parser framework that I have used the most, you will have to compile your grammar and hold your thumbs that there is no ambiguity in the syntax.

The grammar compilation process in GP is quite slow and will only give you some semi obscure feedback on what ambiguous definitions you have.

So I have to admit that Intellipad beats GP big time with its quick and intuitive feedback system.

I haven’t yet played enough with the M Grammar .NET parser to be able to give a fair comparison between Mg and GP when working with parse trees in your code, I will skip this for a later post.

If you have worked with GP before, you won’t have any problems adapting to Mg, the “grammar grammars” are almost identical, with the exception that Mg is slightly more verbose and GP relies more on symbols.

I was able to port a GP grammar to Mg in just a few minutes.
The ported grammar is the “GP. Simple” grammar.
You can find the original GP grammar definition here.
And the converted Mg grammar definition
here.

At a first glance the Mg grammar might look much more bloated, there are two reasons for this:
1) There are currently no predefined char sets in Mg (AFAIK)
2) The Mg version also contains syntax highlight annotations.

A screen shot of the Intellipad input pane with the “GP. Simple” grammar highlighted is seen here:

dslhighlight

By just looking at the input pane when editing my grammar I can assert that my grammar is somewhat correct, I do not have to analyze the parse tree every time I make a change to the grammar.

So in short; Writing grammars are still hard , M Grammar is a pretty standard EBNF engine, but Mg’s tool support beats GP’s toolsupport..

//Roger

CIL – Compiler construction


I’ve created a little sample on how to make your own .NET compiler.
The compiler uses Gold parser for parsing and Reflection.Emit to generate the compiled .exe file.

Initially I intended to make a sample on how to use Gold parser to parse and then compile Linq expressions, thus the name GoldLinq, however, Linq have now been replaced with Reflection.Emit.

Links:
My compiler source: http://www.puzzleframework.com/Roger/GoldLinq.zip
(C# VS.NET 2008 solution)

Gold parser: http://www.devincook.com/goldparser/
Grammar: http://www.devincook.com/goldparser/grammars/index.htm

How it works:

  • Gold parser – Calitha engine is used to parse the source code into a parse tree
  • The parse tree is transformed into a typed AST
  • The AST is verified using visitor pattern, the verifier handles type inferrence and auto casts.
  • The AST is optimized using visitor pattern, the optimizer replaces constant expressions and statements.
  • The AST is compiled into CIL/MSIL using visitor pattern.
  • If successful, the compiler will generate a file called “output.exe” in the same folder as the compiler

Samples:

Hello world 

display 'Hello World!'

 Have a nice day:

Display 'What is your name?' Read Name 
Display 'Hello, ' & Name & '. Have a nice day!'

Blastoff: 

assign n = 10 
while n >= 1 do 
    display n 
    assign n = n - 1 
end 
display 'Blast off!'

Miles and kilometers: 

Display 
'Do you want to convert 1) Miles to Kilometers or 2) Kilometers to Miles?' 
Read Choice        

if Choice == 1 then 
    Display 'Please enter the number of miles' Read Miles 
    Display Miles & ' Miles = ' & (Miles * 1.609)  & ' Kilometers' 
else 
    Display 'Please enter the number of kilometers' Read Kilometers 
    Display Kilometers & ' Kilometers = ' & (Kilometers / 1.609)  & ' Miles' 
end

Secret number: 

Assign SecretNumber = 64 

Display 'Please guess the secret number' 
Read Guess          

While Guess <> SecretNumber Do 
    If Guess < SecretNumber Then 
        Display 'Your guess was too LOW. Try again.' Read Guess 
    End     
If Guess > SecretNumber Then 
        Display 'Your guess was too HIGH. Try again.' Read Guess 
    End 
End     

Display 'Correct!'

How to compile the samples:

First you need a source file to compile, just create a normal textfile and paste one of the samples above into it.
Once you have the code file you can compile it using the compiler:

c:whatever\bin\debug> GoldSample.exe mysource.txt

When the compiler is done, you can start the compiled application with:

c:\whatever\bin\debug> output.exe

The code is somewhat too big to cover in a blog post, so if you have any questions feel free to ask in the comment section.

//Roger