Linqing M Grammar to CIL


Today I will present my first attempt at an M Grammar language: Mg Basic :-)

Why Mg Basic?

This is of course not meant as a serious project where I try to introduce a new language to the community.
I created this project simply because I wanted to learn more about the new technologies that were presented to us at the PDC.
And hopefully a few others can benefit and learn from this too.

For example, this project makes heavy use of Linq Expressions, and examples of those are generally somewhat hard to find.
(Actually, I haven’t found any examples using the new .NET 4 expressions at all.)

It also demonstrates how to parse and transform the M Graph parse tree into a real AST with very little effort.
And it contains a few nifty patterns for using M Grammar from .NET.

So what is Mg Basic?

In short,
it is a statically typed language that compiles down to CIL code.
It supports neither custom types nor functions.
It is simply a sequential, Basic-like language intended for demo purposes.

All in all, not very sexy at all, but hopefully somewhat interesting to dissect and play with :-)

Technologies used:

M Grammar – Defining the language grammar.

My grammar is largely based on the “Simple” grammar from Gold Parser.
However, it has been altered quite a bit and adapted to fit M Grammar.

System.DataFlow.DynamicParser – Parsing the input code.

DynamicParser is simply a generic parser for M Grammar files; it parses the input code based on your grammar and returns an ‘M Graph’.

MGraphXamlReader – Deserializing the parse tree into my AST.

MGraphXamlReader is a deserializer that turns an M Graph into a custom object graph using XAML.

.NET 4 Linq.Expressions – Compiling the AST to CIL.

Linq.Expressions is not really new in .NET 4, but it has been greatly extended there to support statements and entire code blocks.
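To illustrate what “statements and entire code blocks” means, here is a rough sketch of an Mg Basic style while loop built from the .NET 4 statement expressions. This is not code from Mg Basic. Expression.Block declares a local variable, and Expression.Loop together with Expression.Break takes the place of while/loop. WhileLoopDemo is a hypothetical name. The sketch also assumes that print appends to a List<int> instead of writing to the console, which keeps the result easy to check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class WhileLoopDemo
{
    // Builds the equivalent of:
    //   int i = 0
    //   while i < 10 do
    //       print i
    //       i = i + 1
    //   loop
    // where "print" appends the value to the supplied list.
    public static Action Build(List<int> output)
    {
        ParameterExpression i = Expression.Variable(typeof(int), "i");
        LabelTarget endLoop = Expression.Label("endLoop");

        // output.Add(i)
        Expression print = Expression.Call(
            Expression.Constant(output),
            typeof(List<int>).GetMethod("Add"),
            i);

        BlockExpression body = Expression.Block(
            new[] { i },                                    // declare the local "i"
            Expression.Assign(i, Expression.Constant(0)),   // int i = 0
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.LessThan(i, Expression.Constant(10)),
                    Expression.Block(
                        print,                              // print i
                        Expression.Assign(i, Expression.Add(i, Expression.Constant(1)))), // i = i + 1
                    Expression.Break(endLoop)),             // leave the loop when i < 10 is false
                endLoop));

        return Expression.Lambda<Action>(body).Compile();
    }
}
```

Calling Build(list) and invoking the returned Action fills the list with the numbers 0 through 9.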

Mg Basic features:

Variables:

string myString = "hello world"
int myInteger = 123
decimal myDecimal = 123.456
bool myBool = true

Expressions:

int myInteger = 1+2*3-x*(y+3)
string myString = "hello " + name + "!"
bool myBool = x < y
string conversion = '1 + 2 = ' + (string) (1 + 2)

User interaction:

string name = input
print name

Loops and conditions:

for int i = 1 to 10
    print i
next

int i = 0
while i < 10 do
    print i
    i = i + 1
loop

if i > 100 then
    print 'i is greater than 100'
else if i > 50 then
    print 'i is greater than 50'
else
    print 'i is 50 or less'
end

How it works:

Step 1 – Parsing:

The parser loads the “compiled grammar” for Mg Basic.
A compiled grammar is essentially a lookup table for a state machine; once the grammar is loaded, the parser knows how to parse your input code.

The input code is then passed to the parser, which returns an M Graph parse tree (unless there are syntax errors in the input code).

Step 2 – AST resolution:

With other parser frameworks, this step generally involves quite a bit of hand coding or code generation.
But when working with M Grammar it is very easy:
we simply pass the M Graph from the parser into the “MGraphXamlReader”.

The MGraphXamlReader is an open source project from Microsoft hosted on CodePlex, and I hope that it will ship with the final release of Oslo / M Grammar.

The MGraphXamlReader transforms the M Graph into XAML and then deserializes the AST object graph from that XAML.

I did have a few problems in this step before I figured out how to solve them.
The M Graph parse tree will contain tokens exactly as the parser captured them.
Let’s say that our grammar supports hex values and we want to map those values to integers in our AST.

This is supported neither by M Grammar itself nor by the MGraphXamlReader, since you can only map verbatim values: “true” can be mapped to the .NET Boolean true, but “0xCAFE” cannot be mapped to an integer.

So how can you solve it?

I solved it by adding transformer properties to my AST.
My IntegerLiteral AST nodes have two properties:

“string RawValue” and “int Value”.

The grammar maps the hex value (or any other token that needs to be converted) to the “string RawValue” property in the AST.
The setter of that property then passes the raw string to a method that parses it into the desired representation, in this case an Int32, and stores the parsed value in the “int Value” property.

This approach can be used for whatever mapping needs you have, e.g. mapping to enums, or deserializing entire objects from strings using type converters.
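Here is a minimal sketch of such a transformer property. It assumes that hex literals carry a 0x prefix and that everything else is a plain decimal integer. These parsing rules are my own illustration, not taken from the Mg Basic source. The deserializer only ever sets RawValue, and the setter does the conversion.

```csharp
using System;
using System.Globalization;

public class IntegerLiteral
{
    private string rawValue;

    // Set by the deserializer with the token text exactly as the parser captured it.
    public string RawValue
    {
        get { return rawValue; }
        set
        {
            rawValue = value;
            Value = ParseInteger(value);   // convert on assignment
        }
    }

    // The typed value the compiler works with.
    public int Value { get; private set; }

    private static int ParseInteger(string text)
    {
        // Assumed rule: a "0x" prefix means hexadecimal, anything else is decimal.
        if (text.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            return int.Parse(text.Substring(2), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

        return int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);
    }
}
```

With this in place, setting RawValue to "0xCAFE" leaves Value at 51966.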

Once I had figured that out, it was easy to implement.

Another friction point was that you need to supply an “identifier -> type” mapping for the XAML engine.
This mapping tells the engine which type to instantiate for each identifier in the M Graph.

This was also simple to automate: I used a bit of reflection to pull all the non-abstract types from my AST namespace and mapped each of them to an identifier with the same name as the type.
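Something along these lines would do it. This is a sketch that assumes the AST classes live in a single namespace, such as a hypothetical MgBasic.Ast, inside one assembly. Nested types are skipped so that compiler-generated helper classes do not end up in the map. Handing the dictionary to the MGraphXamlReader is left out, because that part depends on the reader’s own API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class AstTypeMap
{
    // Maps every concrete, non-nested class in the given namespace to an
    // identifier with the same name as the type, e.g. "IntegerLiteral" -> typeof(IntegerLiteral).
    public static Dictionary<string, Type> Build(Assembly assembly, string astNamespace)
    {
        return assembly.GetTypes()
            .Where(t => t.Namespace == astNamespace
                        && t.IsClass
                        && !t.IsAbstract
                        && !t.IsNested)
            .ToDictionary(t => t.Name, t => t);
    }
}
```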

Step 3 – Compiling the AST into CIL code:

This was probably the easiest step to complete.
Normally you would do this kind of thing using CodeDOM or Reflection.Emit.

I decided to go with Linq.Expressions, mainly because I wanted to learn more about the new features in .NET 4, but also a bit just for the hell of it :-)

I use the good old visitor pattern to accomplish this: a visitor visits all the AST nodes and transforms each node into a Linq expression, which is returned to the parent node.

Once every node has been transformed, the root node’s expression is placed in a lambda body.
The lambda expression is then compiled to CIL using the LambdaExpression.Compile method.

In my case I compile it into a standard, parameterless “Action” delegate, but you can easily change this to whatever delegate type you want.
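To make the visitor step concrete, here is a heavily reduced sketch that only handles integer literals, + and *, and print. The node classes, the IAstVisitor<T> interface and the ExpressionBuilder name are hypothetical stand-ins, not the real Mg Basic AST. Here print is mapped to Console.WriteLine(int).

```csharp
using System;
using System.Linq.Expressions;

public interface IAstVisitor<T>
{
    T Visit(IntLiteral node);
    T Visit(BinaryOp node);
    T Visit(PrintStatement node);
}

public abstract class AstNode
{
    public abstract T Accept<T>(IAstVisitor<T> visitor);
}

public class IntLiteral : AstNode
{
    public int Value { get; set; }
    public override T Accept<T>(IAstVisitor<T> visitor) { return visitor.Visit(this); }
}

public class BinaryOp : AstNode
{
    public char Operator { get; set; }   // '+' or '*'
    public AstNode Left { get; set; }
    public AstNode Right { get; set; }
    public override T Accept<T>(IAstVisitor<T> visitor) { return visitor.Visit(this); }
}

public class PrintStatement : AstNode
{
    public AstNode Value { get; set; }
    public override T Accept<T>(IAstVisitor<T> visitor) { return visitor.Visit(this); }
}

// Each Visit returns the Linq expression for its node; parent nodes
// combine the expressions returned by their children.
public class ExpressionBuilder : IAstVisitor<Expression>
{
    public Expression Visit(IntLiteral node)
    {
        return Expression.Constant(node.Value);
    }

    public Expression Visit(BinaryOp node)
    {
        Expression left = node.Left.Accept(this);
        Expression right = node.Right.Accept(this);
        switch (node.Operator)
        {
            case '+': return Expression.Add(left, right);
            case '*': return Expression.Multiply(left, right);
            default: throw new NotSupportedException("Unknown operator: " + node.Operator);
        }
    }

    public Expression Visit(PrintStatement node)
    {
        var writeLine = typeof(Console).GetMethod("WriteLine", new[] { typeof(int) });
        return Expression.Call(writeLine, node.Value.Accept(this));
    }

    // Places the root expression in a lambda body and compiles it to CIL.
    public static Action Compile(AstNode root)
    {
        Expression body = root.Accept(new ExpressionBuilder());
        return Expression.Lambda<Action>(body).Compile();
    }
}
```

If you build the tree for print 1+2*3 this way and invoke the compiled Action, it writes 7 to the console.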

By changing the delegate type, you could implement input arguments for the compiled code.
This could be very useful in a true DSL where you might want to pass business objects/data to the script.
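For instance, switching from Action to Action<string> and giving the lambda a matching ParameterExpression lets the host pass a value into the script. This sketch assumes a single string argument called name. The body is equivalent to printing "hello " + name + "!" in Mg Basic, and ScriptWithArguments is a hypothetical name.

```csharp
using System;
using System.Linq.Expressions;

public static class ScriptWithArguments
{
    public static Action<string> CompileGreeting()
    {
        // The script's input argument, visible to the body as "name".
        ParameterExpression name = Expression.Parameter(typeof(string), "name");

        var concat = typeof(string).GetMethod("Concat", new[] { typeof(string), typeof(string), typeof(string) });
        var writeLine = typeof(Console).GetMethod("WriteLine", new[] { typeof(string) });

        // Console.WriteLine(string.Concat("hello ", name, "!"))
        Expression body = Expression.Call(
            writeLine,
            Expression.Call(concat, Expression.Constant("hello "), name, Expression.Constant("!")));

        // The lambda's parameter list must match the delegate signature.
        return Expression.Lambda<Action<string>>(body, name).Compile();
    }
}
```

Invoking CompileGreeting()("Roger") then prints hello Roger!.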

Well that’s pretty much it.

So by combining M Grammar with Linq Expressions you can get a DSL with full .NET integration up and running quite fast.
In my case the whole thing took about 10 hours to implement.
(This does not, however, include the time I had to spend learning the M Grammar syntax and how to avoid ambiguity in the grammar; there is still a lot of friction there.)

Resources:

MGraphXamlReader:
http://code.msdn.microsoft.com/oslo/Wiki/View.aspx?title=MGrammarXAMLSample

Mg Basic Source code:
Gold Parser Sample

Mg Basic Grammar:
MG Basic Grammar

Running the demo:

Open up the MgBasic solution in VS.NET 2010 and run it.
The input code is hard-coded in program.cs, so there is no fancy user interaction going on here.
It is just a demonstration of how to parse and compile the code, not how to build a good user experience ;-)

Also note that this requires the Oslo preview in order to run.

Note 2: There are currently no warnings or error checks at all in the compiler; if you feed it invalid code, it will blow up.

Enjoy!
//Roger

M Grammar Vs. Gold Parser


Even though I bashed M Grammar in my last post, I’m sort of starting to get what the fuss is all about now.

I still claim that writing grammars is hard, and that the M Grammar language itself doesn’t do much to change this.

But the beauty is not in the parser nor the syntax, it’s in the tools.

The sweet spot of M Grammar is the Intellipad editor shipped with the PDC Bits.

Intellipad, unlike the editors for most other parsers, will give you real time feedback on your progress.
You can alter your grammar and see how this affects the parse tree for the given input.

You can also annotate your grammar with syntax highlighting hints, which lets you see exactly how the parser handles your input text.
Intellipad will also show you where your grammar suffers from ambiguity by underlining the bad parts with red squigglies.

In Gold Parser, which is the parser framework that I have used the most, you have to compile your grammar and keep your fingers crossed that there is no ambiguity in the syntax.

The grammar compilation process in GP is quite slow and only gives you some semi-obscure feedback about which definitions are ambiguous.

So I have to admit that Intellipad beats GP big time with its quick and intuitive feedback system.

I haven’t yet played enough with the M Grammar .NET parser to give a fair comparison of Mg and GP when it comes to working with parse trees in your code, so I will save that for a later post.

If you have worked with GP before, you won’t have any problems adapting to Mg; the “grammar grammars” are almost identical, except that Mg is slightly more verbose and GP relies more on symbols.

I was able to port a GP grammar to Mg in just a few minutes.
The ported grammar is the “GP. Simple” grammar.
You can find the original GP grammar definition here,
and the converted Mg grammar definition here.

At first glance the Mg grammar might look much more bloated; there are two reasons for this:
1) There are currently no predefined char sets in Mg (AFAIK)
2) The Mg version also contains syntax highlight annotations.

A screen shot of the Intellipad input pane with the “GP. Simple” grammar highlighted is seen here:


By just looking at the input pane while editing my grammar, I can tell whether my grammar is roughly correct; I do not have to analyze the parse tree every time I make a change to the grammar.

So, in short: writing grammars is still hard and M Grammar is a pretty standard EBNF engine, but Mg’s tool support beats GP’s tool support.

//Roger

I call B.S. on the Oslo “M” Language


There was a lot of hype around the new Oslo “M” language at the PDC.
It was pretty much presented as a new way for people to create their own domain-specific languages.

Since I have a bit of a fetish for parsing and DSLs, I attended the “M Grammar” presentation.

They began by explaining that “M” is so easy that everyone and his mother will now be able to create their own DSL.
And of course they had to show some trivial example that actually wasn’t a DSL at all, but merely a data transformer that turned a textual list of “contacts” into a structured list.

Maybe I’m just stupid, but when I hear “new” and “easy” I don’t really associate that with old-school LEX and YACC BNF grammars.
But MS apparently does.

Just check this out; this is a small snippet of the M Grammar definition of the M language itself:

  syntax CompilationUnit
       = decls:ModuleDeclaration*
         =>
           id("Microsoft.M.CompilationUnit")
           {
               Modules { decls }
           };
  
  
   syntax ExportDirective
       = "export" members:ParsedIdentifiers ";"
         =>
         id("Microsoft.M.ExportDirective")
         {
             Names { members }
         };  
  
   syntax ImportAlias
       = "as" alias:ParsedIdentifier
         => alias;
   
   syntax ImportDirective
       = "import" imports:ImportModules ";"
         =>
         id("Microsoft.M.ModuleImportDirective")
         {
             Modules { imports }
         }
                 | "import" targetModule:MemberAccessExpression "{" members:ImportMembers "}" ";"
           =>
           id("Microsoft.M.MemberImportDirective")
           {
               ModuleName { targetModule },
               Members { members }
           };
   
   syntax ImportMember
       = member:ParsedIdentifier alias:ImportAlias?
           => id("Microsoft.M.ImportedName")
           {
               Name { member },
               Alias { alias }
           };

The grammars in “M” were essentially a hybrid of old BNF definitions mixed with functional programming elements.

I’m not saying that their approach was bad, just that it wasn’t really as easy as they wanted it to be.
There are a few gems in it, but it in no way lowers the complexity of defining a grammar to the point where everyone will be able to create real DSLs.

Maybe some people will create a few data transformers using this approach, but I don’t expect to see more “real” DSLs popping up now than we have seen before.

//Roger

Linq Expressions: Assign private fields (.NET 4)


Hold your horses: you might not see this until 2010.

In one of the PDC sessions I heard that the Linq.Expressions namespace has been extended so that it now contains expressions for pretty much everything that the new DLR can do.

Since my old post “Linq Expressions: Access private fields” is by far my most-read blog entry, I figured that I had to throw you some more goodies.

So here it is, the code for assigning private fields using pure Linq Expressions:

public static Action<T, I> GetFieldAssigner<T, I>(string fieldName)
{
    // Requires: using System; using System.Linq.Expressions; using System.Reflection;

    // Private fields declared in base classes are not found by GetField;
    // walk typeof(T).BaseType recursively if you need those too.
    var field = typeof(T).GetField(fieldName, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
    if (field == null)
        throw new ArgumentException("Field not found: " + fieldName, "fieldName");

    ParameterExpression targetObject =
        Expression.Parameter(typeof(T), "target");

    ParameterExpression fieldValue =
        Expression.Parameter(typeof(I), "value");

    // target.field = value (Expression.Assign over a field access)
    var assignment = Expression.Assign(Expression.Field(targetObject, field), fieldValue);

    Expression<Action<T, I>> lambda =
        Expression.Lambda<Action<T, I>>(assignment, targetObject, fieldValue);

    return lambda.Compile();
}

The code works pretty much the same way as my old field reader, except that this one returns an Action delegate that takes two arguments, the target object and the field value, and returns void.

Some sample usage code:

Person roger = .....
var assigner = GetFieldAssigner<Person,string> ("firstName");
assigner(roger, "Roggan");

By using this approach you get a nice performance boost of about 300 times compared to using reflection with FieldInfo.SetValue
(assuming that you cache the assigner, that is).
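The speed-up depends on reusing the compiled delegate, so one way to cache it is a static generic class that keeps one dictionary per <T, I> pair. This is a sketch with a few assumptions. FieldAssignerCache is a hypothetical name, and T is assumed to be a reference type, because for a struct the assignment would only change the copy passed in. The assigner is built inline with Expression.Assign over Expression.Field so that the block stands on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

public static class FieldAssignerCache<T, I>
{
    // One cache per closed generic type, keyed by field name.
    private static readonly ConcurrentDictionary<string, Action<T, I>> cache =
        new ConcurrentDictionary<string, Action<T, I>>();

    public static Action<T, I> Get(string fieldName)
    {
        // The expression tree is compiled the first time a field is requested;
        // later calls return the stored delegate.
        return cache.GetOrAdd(fieldName, Create);
    }

    private static Action<T, I> Create(string fieldName)
    {
        FieldInfo field = typeof(T).GetField(fieldName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        if (field == null)
            throw new ArgumentException("Field not found: " + fieldName, "fieldName");

        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        ParameterExpression value = Expression.Parameter(typeof(I), "value");

        // target.field = value
        Expression assignment = Expression.Assign(Expression.Field(target, field), value);

        return Expression.Lambda<Action<T, I>>(assignment, target, value).Compile();
    }
}
```

FieldAssignerCache<Person, string>.Get("firstName") would then compile the expression tree once and hand back the same delegate on every later call.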

Pretty sweet.

BTW, does anyone know if this is available in the currently released DLR?
In that case you won’t have to wait for .NET 4.

//Roger

The importance of domain modelling


I’m going to dedicate this post to my somewhat unpleasant experience with the US security controls when going to the PDC :-)

As some of you might know, I changed my last name from “Johansson” to “Alsing” when I married my wife (yeah yeah, I’m whipped).

So this summer I had to get a new passport with my new name; I ordered one and reported my old one missing (I couldn’t find it fast enough for our booking).
The police officer who handled all of this told me that I didn’t need to bother finding the old one, since it had my old name on it.

The PDC booking was completed, the months flew by, and suddenly it was finally time to go to the PDC, my first trip to the US.

So off we went, me and 6 colleagues.
16 hours after liftoff from Sweden we finally arrived at LA airport.
We lined up for the passport controls.
My colleagues just walked through the control without any problems.
And then it was my turn….

Suddenly I heard the lady behind the counter say:
“Sir, have you lost a passport?”

Somewhat surprised, I replied: “Ummm, no, I haven’t? Or, well, I have reported my old one missing.”

The response to that was: “Step aside sir”

So, I thought I would just have to explain the situation and everything would be OK.

A police officer approached me: “Follow me, sir.”

Officer: “So you have lost a passport?”
Me: “Well yeah, I lost my old one”
Continued by me trying to explain about my changed last name and the reason for reporting the old one missing.

Officer: “No no, THIS passport is reported missing”
Me: “Umm no? I have reported my old one missing, this is the new one”
Officer: “NO! this is a stolen passport, this is not your passport”
Me: “uuh, yes it’s my passport”
Officer: “So when did you find it?”
Me: “What? I didn’t find it, it’s my new passport, it’s the old one that is reported missing”
Officer: “No, this one is stolen!, wait over here”

And I was escorted to a waiting room.
So it was me and a few Mexicans waiting there..

Some 30 minutes later he got back to me and escorted me to the police office at the airport.
He also explained that I had to walk on his left side all the time, so I wouldn’t grab his gun(!!!)

After this I had to spend about an hour in the police station being interrogated by two other officers; luckily, those two were friendly and tried to understand what I was saying.

Eventually they said that they didn’t know whether passport notifications from Sweden were attached to the person owning the passport or to the actual passport in their system, and they let me go.

If their system had been correctly modelled, they would have been able to see that it was an old passport that was reported missing.
And they would have saved themselves a lot of work, and I wouldn’t have had to spend an hour and a half trying to prove that I hadn’t done anything wrong.

If this had been a commercial company, it would have lost money trying to figure out what was wrong, and it would also have one seriously displeased customer.

//Roger