O/R Mapping and domain query optimizations

One of the cons of O/R mapping is that the abstraction is a bit too high.
You write object-oriented code and easily forget about potential performance problems.

Take this (somewhat naive) example:

class Customer
{
    ...
    public double GetOrderTotal()
    {
        var total = (from order in this.Orders
                     from detail in order.Details
                     select detail.Quantity * detail.ItemPrice)
                    .Sum();

        return total;
    }
}

For a given customer, we iterate over all the orders and all the details in those orders and calculate the sum of quantity multiplied by item price.
So far so good.

This will work fine as long as you have all the data in memory and the dataset is not too large, so chances are that you will not notice any problems with this code in your unit tests.

But what happens if the data resides in the database and we have 1000 orders with 1000 details each?
Now we are in deep s##t: for this code to work, we need to materialize at least 1 (customer) + 1000 (orders) + 1000 * 1000 (details) entities.
The DB needs to find those 1 001 001 rows, the network needs to push them from the DB server to the app server, and the app server needs to materialize all of them.
Even worse, what if you have lazy load enabled and aren’t loading this data using eager load?
Then you will also hit the DB at least once per order to fetch its details, so over a thousand roundtrips on top of everything else… Good luck with that! :-)

So clearly, we cannot do this in memory, neither with lazy load nor eager load.
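
To make the lazy load cost concrete, here is a toy simulation. It is only a sketch: FakeDb is a made-up stand-in that counts roundtrips, and it assumes a mapper that loads one whole collection per roundtrip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up stand-in for a database session that counts simulated roundtrips.
public static class FakeDb
{
    public static int Roundtrips;

    public static List<T> Load<T>(Func<List<T>> fetch)
    {
        Roundtrips++; // every lazily loaded collection costs one roundtrip
        return fetch();
    }
}

public class Detail
{
    public int Quantity;
    public double ItemPrice;
}

public class Order
{
    private readonly Lazy<List<Detail>> details;

    public Order(int detailCount)
    {
        // The details are only "fetched" the first time someone touches them.
        details = new Lazy<List<Detail>>(() => FakeDb.Load(() =>
            Enumerable.Range(0, detailCount)
                      .Select(_ => new Detail { Quantity = 2, ItemPrice = 1.5 })
                      .ToList()));
    }

    public List<Detail> Details { get { return details.Value; } }
}

public class Customer
{
    private readonly Lazy<List<Order>> orders;

    public Customer(int orderCount, int detailsPerOrder)
    {
        orders = new Lazy<List<Order>>(() => FakeDb.Load(() =>
            Enumerable.Range(0, orderCount)
                      .Select(_ => new Order(detailsPerOrder))
                      .ToList()));
    }

    public List<Order> Orders { get { return orders.Value; } }

    // Same query as above; it looks innocent but triggers every lazy load.
    public double GetOrderTotal()
    {
        return (from order in Orders
                from detail in order.Details
                select detail.Quantity * detail.ItemPrice).Sum();
    }
}
```

Calling GetOrderTotal() on new Customer(3, 4) costs one simulated roundtrip for the orders plus one per order for its details, and the count grows linearly with the number of orders.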

But what are the alternatives?
Make an ad hoc sql query?
In that case, what happens to your unit tests?

Maybe we want to keep this code, but we want to execute it in the database instead.

This is possible if we stop being anal about “pure POCO” or “no infrastructure in your entities”.

Using a unit of work container such as https://github.com/rogeralsing/Precio.Infrastructure, we can rewrite the above code slightly:

class Customer
{
    ...
    public double GetOrderTotal()
    {
        var total = (from customer in UoW.Query<Customer>() //query the current UoW
                     where customer.Id == this.Id //find the persistent record of "this"
                     from order in customer.Orders
                     from detail in order.Details
                     select detail.Quantity * detail.ItemPrice)
                    .Sum();

        return total;
    }
}

This code will run the query inside the DB if the current UoW is a persistent UoW.
If we use the same code in our unit tests with an in-memory UoW instance, it will still work, provided that our customer is present in the in-memory UoW.
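
The in-memory case can be sketched roughly like this. Note that IUnitOfWork, InMemoryUnitOfWork and the static UoW.Current property are hypothetical names made up for this illustration; the actual Precio.Infrastructure API may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction, not the actual Precio.Infrastructure interface.
public interface IUnitOfWork
{
    IQueryable<T> Query<T>() where T : class;
}

// In-memory implementation for unit tests: Linq runs against plain objects.
public class InMemoryUnitOfWork : IUnitOfWork
{
    private readonly List<object> entities = new List<object>();

    public void Add(object entity) { entities.Add(entity); }

    public IQueryable<T> Query<T>() where T : class
    {
        return entities.OfType<T>().AsQueryable();
    }
}

// Ambient "current" unit of work (assumed), matching the UoW.Query<T>() call above.
public static class UoW
{
    public static IUnitOfWork Current { get; set; }

    public static IQueryable<T> Query<T>() where T : class
    {
        return Current.Query<T>();
    }
}

public class Detail { public int Quantity; public double ItemPrice; }
public class Order { public List<Detail> Details = new List<Detail>(); }

public class Customer
{
    public int Id;
    public List<Order> Orders = new List<Order>();

    public double GetOrderTotal()
    {
        return (from customer in UoW.Query<Customer>()
                where customer.Id == this.Id
                from order in customer.Orders
                from detail in order.Details
                select detail.Quantity * detail.ItemPrice).Sum();
    }
}
```

In a unit test you would add the customer to an InMemoryUnitOfWork, assign it to UoW.Current and call GetOrderTotal() as usual. A persistent implementation would return an IQueryable from the O/R mapper instead, so the same query gets translated to SQL.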

So the above modification reduces the number of materialized entities from roughly a million to zero (all we materialize is a single double).

I don’t know about you, but I’d rather clutter my domain logic slightly and get a million times better performance than stay true to POCO and suffer from a broken app.

UoW / NWorkspace with Linq support

I have blogged about this for quite a while now.
Now I’ve finally cleaned up the code and published it on GitHub: https://github.com/rogeralsing/Precio.Infrastructure

This is a small framework for UoW/Workspace support in .NET with Linq support.

The framework contains a Unit of Work implementation and providers for Entity Framework 4, NHibernate and MongoDB (using NoRM).
There is also a small incomplete Blog sample project included.

Building a Document DB on top of SQL Server

I’ve started to build a Document DB emulator on top of SQL Server XML columns.
SQL Server XML columns can store schema-free XML documents, pretty much like RavenDB or MongoDB store schema-free JSON/BSON documents.

XML columns can be indexed and queried using XPath queries.

So I decided to build an abstraction layer on top of this in order to achieve similar ease of use.
I’ve built a serializer/deserializer that deals with my own XML structure for documents (state + metadata) and also an early Linq provider for querying.

Executing the following code:

var ctx = new DocumentContext("main");
var customers = ctx.GetCollection<Customer>().AsQueryable();

var query = from customer in customers
            where customer.Address.City == "abc" && customer.Name == "Acme Inc5"
            orderby customer.Name
            select customer;

var result = query.ToList();
foreach (var item in result)
{
    Console.WriteLine(item.Name);
    Console.WriteLine(item.Address.City);
}

Will yield the following SQL + XPath query:

select *
from documents
where CollectionName = 'Customer' and
   ((documentdata.exist('/object/state/Address/object/state/City/text()[. = "abc"]') = 1) and
    (documentdata.exist('/object/state/Name/text()[. = "Acme Inc5"]') = 1))
order by documentdata.value('((/object/state/Name)[1])','nvarchar(MAX)')

The result of the query will be returned to the client and then deserialized into the correct .NET type.
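
The translation from a property path to an XPath predicate can be sketched as follows. This is not the actual provider code: XPathSketch and its methods are made up for illustration, and only the /object/state path layout is taken from the generated query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class XPathSketch
{
    // Turns c => c.Address.City into "/object/state/Address/object/state/City".
    public static string PathFor<T>(Expression<Func<T, object>> property)
    {
        var segments = new Stack<string>();
        var current = property.Body;

        // Value-typed members get wrapped in a Convert node when boxed to object.
        if (current.NodeType == ExpressionType.Convert)
            current = ((UnaryExpression)current).Operand;

        // Walk from the innermost member (City) outwards to the parameter (c).
        while (current is MemberExpression)
        {
            var member = (MemberExpression)current;
            segments.Push(member.Member.Name);
            current = member.Expression;
        }

        // The stack enumerates outermost first; each level repeats the wrapper.
        var path = "";
        foreach (var segment in segments)
            path += "/object/state/" + segment;

        return path;
    }

    // Builds one exist() predicate comparing the node text with a string literal.
    // Sketch only: the value is not escaped, which a real provider must handle.
    public static string EqualsPredicate<T>(Expression<Func<T, object>> property, string value)
    {
        return "(documentdata.exist('" + PathFor(property) +
               "/text()[. = \"" + value + "\"]') = 1)";
    }
}

public class Address
{
    public string City { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public Address Address { get; set; }
}
```

For example, XPathSketch.EqualsPredicate<Customer>(c => c.Address.City, "abc") produces the first exist() clause of the query shown above.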

Entity Framework 4 Enum support in Linq

As many of you might know, Entity Framework 4 still lacks support for mapping enum properties.
There are countless more or less worthless workarounds, everything from exposing constant integers in a static class to make it look like an enum, to totally insane generics tricks with operator overloading.

None of those are good enough IMO. I want to be able to expose real enum properties and make Linq queries against those properties, so I’ve decided to fix the problem myself.

My approach is Linq expression tree rewriting, using the ExpressionVisitor that now ships with .NET 4.
By using the ExpressionVisitor I can clone an entire expression tree and replace any node in that tree that represents a comparison between a property and an enum value.

In order to make this work, the entities still need to have an O/R mapped integer property, so I will rewrite the query from using the enum property and enum constant to using the mapped integer property and a constant integer value.

For me this solution is good enough: I can make the integer property private and make it invisible from the outside.
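
To see what the rewriter has to work with, it helps to look at the tree the C# compiler builds for an enum comparison. A small sketch, where the OrderStatus members and the EnumComparisonShape class are made up for the illustration:

```csharp
using System;
using System.Linq.Expressions;

// Illustrative enum; the member names are assumptions.
public enum OrderStatus { New, Shipped, Cancelled }

public class Order
{
    public OrderStatus Status { get; set; }
}

public static class EnumComparisonShape
{
    // Describes the left-hand side of "order.Status == OrderStatus.Cancelled".
    public static string Describe()
    {
        Expression<Func<Order, bool>> predicate =
            order => order.Status == OrderStatus.Cancelled;

        var comparison = (BinaryExpression)predicate.Body;

        // The enum property is wrapped in a cast to its underlying integer type.
        var left = (UnaryExpression)comparison.Left;

        return left.NodeType + " of " + left.Operand.Type.Name + " to " + left.Type.Name;
    }
}
```

The comparison is already done on integers, with the enum property wrapped in a Convert node. So the rewrite boils down to two steps: swap the enum property for the integer property, and drop the cast that is no longer needed.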

Example

public class Order
{
    //this is the backing integer property that is mapped to the database
    //(named "e" + the enum property name, which is what the rewriter below looks for)
    private int eStatus { get; set; }

    //this is our unmapped enum property
    public OrderStatus Status
    {
        get { return (OrderStatus)eStatus; }
        set { eStatus = (int)value; }
    }

    .....other code
}

This code is sort of iffy and it does violate some POCO principles, but it is still plain code, nothing magic about it.

So how do we get our Linq queries to translate from the enum property to the integer property?

The solution is far simpler than I first thought. Using the new ExpressionVisitor base class, I can use the following code to make it all work:

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

namespace Alsing.Data.EntityFrameworkExtensions
{
    public static class ObjectSetEnumExtensions
    {
        private static readonly EnumRewriterVisitor visitor = new EnumRewriterVisitor();

        //rewrites enum comparisons in the predicate to use the backing integer properties
        private static Expression<Func<T, bool>> ReWrite<T>(this Expression<Func<T, bool>> predicate)
        {
            return (Expression<Func<T, bool>>)visitor.Modify(predicate);
        }

        public static IQueryable<T> Where<T>(this IQueryable<T> self,
            Expression<Func<T, bool>> predicate) where T : class
        {
            return Queryable.Where(self, predicate.ReWrite());
        }

        public static T First<T>(this IQueryable<T> self,
            Expression<Func<T, bool>> predicate) where T : class
        {
            return Queryable.First(self, predicate.ReWrite());
        }
    }

    public class EnumRewriterVisitor : ExpressionVisitor
    {
        private const BindingFlags InstanceMembers =
            BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public;

        public Expression Modify(Expression expression)
        {
            return Visit(expression);
        }

        protected override Expression VisitUnary(UnaryExpression node)
        {
            if (node.NodeType == ExpressionType.Convert && node.Operand.Type.IsEnum)
            {
                var operand = Visit(node.Operand);

                //drop the cast only if the operand was rewritten to a member of the target type
                return operand.Type == node.Type ? operand : node.Update(operand);
            }

            return base.VisitUnary(node);
        }

        protected override Expression VisitMember(MemberExpression node)
        {
            if (node.Type.IsEnum && node.Expression != null)
            {
                //by convention, the backing property is named "e" + the enum property name
                var newName = "e" + node.Member.Name;
                var backingIntegerProperty = node.Expression.Type
                    .GetMember(newName, InstanceMembers)
                    .FirstOrDefault();

                //leave the node untouched if there is no backing property (e.g. a captured enum variable)
                if (backingIntegerProperty != null)
                    return Expression.MakeMemberAccess(Visit(node.Expression), backingIntegerProperty);
            }

            return base.VisitMember(node);
        }
    }
}

The first class is an extension method class that replaces the default “Where” and “First” extensions of IQueryable of T.
The second class is the actual Linq expression rewriter.

By including this and adding the appropriate using clause to your code, you can now make queries like this:

var cancelledOrders = myContainer.Orders.Where(order => order.Status == OrderStatus.Cancelled).ToList();

You can of course make more complex where clauses than that since all other functionality remains the same.

This is all for now. I will make a follow-up on how to wrap this up in a Linq query provider so you can also use the standard Linq query syntax.

Hope this helps.

//Roger

Two flavors of DDD

I have been trying to practice domain-driven design for the last few years.
During this time, I have learnt that there are almost as many ways to implement DDD as there are practitioners.

After studying a lot of different implementations I have seen two distinct patterns.

I call the first pattern “Aggregate Graph”:

When applying aggregate graphs, you allow members of one aggregate to have direct associations to another aggregate.
For example, an “Order” entity which is part of an “Order aggregate” might have a “Customer” property which leads directly to a “Customer” entity that is part of a “Customer aggregate”.

(Figure: aggregate graph)

According to Evans’ book this is completely legal: any member of an aggregate may point to the root of any other aggregate.
Evans is very clear on the matter that aggregate root identities are global, while the identities of non-root entities are local to the aggregate itself.

The opposite pattern would be what I call “Aggregate Documents”:

Here the aggregates never relate _directly_ to other aggregate roots.
Instead, the associations may be designed as “snapshots” where you store lightweight value object clones of the related aggregate roots.
An “Order” entity would have a “Customer” property which leads to a “CustomerSnapshot” value object instead of a Customer entity.
This way each aggregate instance becomes more of a free-floating document.
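
A minimal sketch of what such a snapshot could look like. The class shapes and the ToSnapshot method are made up for illustration and are not taken from any particular code base:

```csharp
using System;

// Aggregate root of the Customer aggregate.
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; set; }

    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Copies the data that other aggregates care about into a value object.
    public CustomerSnapshot ToSnapshot()
    {
        return new CustomerSnapshot(Id, Name);
    }
}

// Value object: no identity of its own and never changed after creation.
public sealed class CustomerSnapshot
{
    public int CustomerId { get; private set; }
    public string Name { get; private set; }

    public CustomerSnapshot(int customerId, string name)
    {
        CustomerId = customerId;
        Name = name;
    }
}

// The Order aggregate holds the snapshot, not a reference to the Customer entity.
public class Order
{
    public CustomerSnapshot Customer { get; private set; }

    public Order(Customer customer)
    {
        Customer = customer.ToSnapshot();
    }
}
```

If the customer is renamed later, the order still carries the name as it was when the order was created, and the CustomerId in the snapshot can be used to look up the current Customer through its repository.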

(Figure: aggregate document)

Since I have been applying both of these patterns, I will try to highlight the pros and cons of them in the rest of this post.

Aggregate Graph

The Aggregate Graph pattern is the approach I used when I first started doing DDD, and I think that it is the most common way to implement DDD.
Since I was an O/RM developer (NPersist), this felt very natural to me: I could design my object graph in our design tool and then draw a few boxes on top of it and claim that those were my aggregates.
I most often used eager load inside the aggregates and lazy load between aggregates, in order to avoid fetching the entire database when one aggregate instance was loaded.

This had a very nice “OOP” feel to it. I was working with objects and associations and I could ignore that there even was a database involved.

My “Repositories” were mere windows into my object graph. I could ask a repository to give me one or more aggregate roots, and from those objects I could pretty much navigate to any other object in the graph due to the spider web nature of the aggregate graph.

(Figure: repository window)

The upside of this approach is that it is easy to understand: you design your domain model just like any other class model.
It also works very well with O/R mappers; features like Lazy Load and Dirty Tracking make it all work for you.

However, there are a few problems with this approach too.
Firstly, Lazy Load in O/R mappers is an implicit feature: there is no way for developers to know at what point they will trigger a roundtrip to the database just by reading the code.
It always looks like you are traversing a fully loaded object graph while you are in fact not.
This often leads to severe performance problems if your development team doesn’t fully understand this.

I have seen reports on this kind of domain model where the implicit nature of Lazy Load has caused some 700 roundtrips to the database in a single web page.

This is what you get when you try to solve an explicit problem in an implicit way.

If you are going to use Lazy Load, make sure your team understands how it works and where you use it.

Another problem with this approach arises when you need to fill your entities with data from multiple sources.
Many of the applications I build nowadays rely on data from multiple sources; it could be a combination of services and internal databases.

When using Lazy Load to get related aggregates, there is no natural point where you can trigger calls to the other data sources and fill additional properties.
You will most likely have to hook into your O/R mapper in order to intercept a lazy load and call the services from there.

Nowadays, I mostly use the second approach, Aggregate Documents.

Aggregate Document

The Aggregate Document approach is much more explicit in its design.
For example, if you want to find the orders for a specific customer, instead of navigating the “Orders” collection of “Customer”, you will have to call a “FindOrdersByCustomer” query on the “OrderRepository”.
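
Roughly like this, as a sketch. The interface shape and the in-memory implementation are assumptions for illustration; a real repository would run a database query inside FindOrdersByCustomer:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
}

public interface IOrderRepository
{
    // The roundtrip shows up in the calling code as an explicit method call.
    IList<Order> FindOrdersByCustomer(int customerId);
}

// In-memory stand-in; a real implementation would query the database here.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly List<Order> orders;

    public InMemoryOrderRepository(IEnumerable<Order> orders)
    {
        this.orders = orders.ToList();
    }

    public IList<Order> FindOrdersByCustomer(int customerId)
    {
        return orders.Where(order => order.CustomerId == customerId).ToList();
    }
}
```

Compared to customer.Orders, a call like orderRepository.FindOrdersByCustomer(customer.Id) makes it obvious to the reader that data is being fetched at that point.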

While I do agree that this looks less object-oriented than the first approach, it allows developers to reason about the code in a different way.
They can see important design decisions and hopefully avoid pitfalls like ripple loading.

Another benefit is that since you only work with islands of data, you can now aggregate data from multiple sources much more easily.
You can simply let your repositories aggregate the data into your entities.
(Whether you do it inside the actual repository or let the repository use some data access class that does it is up to you.)

(Figure: repo prism)

You don’t have to hook into any O/RM infrastructure since you no longer rely on lazy load between aggregates.
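
As a sketch of such an aggregating repository: the two data sources are passed in as plain delegates here, and the CreditLimit property and the external credit service are invented for the example.

```csharp
using System;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }          // filled from the internal database
    public decimal CreditLimit { get; set; }  // filled from an external service (assumed)
}

public class CustomerRepository
{
    private readonly Func<int, Customer> loadFromDatabase;
    private readonly Func<int, decimal> fetchCreditLimit;

    public CustomerRepository(Func<int, Customer> loadFromDatabase,
                              Func<int, decimal> fetchCreditLimit)
    {
        this.loadFromDatabase = loadFromDatabase;
        this.fetchCreditLimit = fetchCreditLimit;
    }

    // One explicit place where the data from both sources is combined.
    public Customer GetById(int id)
    {
        var customer = loadFromDatabase(id);
        if (customer == null)
            return null;

        customer.CreditLimit = fetchCreditLimit(id);
        return customer;
    }
}
```

The caller gets a fully populated Customer back and never needs to know that it took two sources to build it.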

Personally I use eager load inside my aggregates, that is, I fetch “Order” and “Order Detail” together as a whole.
A side effect of this is that since I don’t use Lazy Load between aggregates and don’t use Lazy Load inside my aggregates, my need for O/R mapping frameworks drops.
I can apply this design without using a full-fledged O/R mapper framework.
I’m not saying that you should avoid O/R mapping, just that it is much easier to apply this pattern if you can’t use an O/R mapper for some reason.

This also makes it easier to expose your domain model in an SOA environment.
You can easily expose your entities or DTO versions of them in a service.

Lazy Load and services don’t play that well together.

Maybe it looks like I dislike the first approach. This is not the case; I may very well consider it in a smaller project where there is just one data source and where the development team is experienced with O/R mapping.
You can also create hybrids of the two approaches.
For example, in Jimmy Nilsson’s book “Applying Domain-Driven Design and Patterns” there are examples where an “Order” aggregate has a direct relation to the “Product” aggregate while the same “Order” aggregate uses snapshots instead of direct references to the “Customer” aggregate.

Snapshots also come with the benefit of allowing you to store historical data.
The snapshot can for example store both the CustomerId and the name of the customer at the time the order was placed.

That’s all for now.

//Roger
