C# LINQ Performance vs. Iteration

It’s been a while since I have written something related to programming. Time to remedy that.

Just recently my interest for the C# language rose again and to get back up to speed with the fundamentals I swallowed all videos of an absolute beginners guide on Microsoft Virtual Academy. Something that has been touched briefly was LINQ and my initial thought was: how’s the performance of that compared to how I would usually write it in C++ – where my expertise is?

Mind you, I’m not comparing C# vs C++, but merely LINQ vs. old-school iteration. Let’s go and find out.

First off, what would I write in C++? With C++11 there are a few nice features that allow you to write it more nicely than ever before, but behind the scenes it all boils down to iterating the data and perform comparisons. That’ll happen every time, no matter if you iterate manually or use one of the C++ library’s functions in the header like std::copy_if and pass a lambda that does the comparison. The most straight forward implementation would be something like this.

std::list data = get_data_from_somewhere();
for (T value : data) {
    if (value.prop == some_value && value.other.prop == other_value) {
    // Jump in the air having fun
    }
}

Simple iteration, nothing special. Of course, C# offers that as well and that’s what I’ll use. To make it more interesting I’m not using a flat data type like an int or a string, but create my own class with a little hierarchy.

class Artist
{
    public String Name { get; set; }
    public String Country { get; set; }
}

These automatic properties are genius!

enum Genre
{
    BlackMetal,
    DeathMetal,
    MelodicBlackMetal,
    MelodicDeathMetal,
    ThrashMetal,
    GothicMetal,
    PaganMetal,
    FolkMetal,
    DoomMetal,
    ProgressiveMetal,
    PowerMetal
}

A little enum to mix into the otherwise used string comparisons. Here comes the main class that is used.

class Album
{
    public int Year { get; set; }
    public String Title { get; set; }
    public Genre Genre { get; set; }
    public Artist Artist { get; set; }
}

At the start of the main function I’ll create a list of 50 Album objects which will be used to test the performance of LINQ. The data is generated as follows.

static List GenerateData()
{
    return new List()
    {
        new Album() { 
            Title = "Night Visit", 
            Year  = 2004, 
            Genre = Genre.MelodicBlackMetal,
            Artist = new Artist() { Name = "Ancient", Country = "Norway"}
        },
        new Album() {
            Title = "Goi, Rode, Goi", 
            Year  = 2009, 
            Genre = Genre.PaganMetal,
            Artist = new Artist() { Name = "Arkona", Country = "Russia"}
        },
        new Album() {
            Title = "Resurrection through carnage", 
            Year  = 2002, 
            Genre = Genre.DeathMetal,
            Artist = new Artist() { Name = "Bloodbath", Country = "Sweden"}
        },
        // And 47 more
    }
}

All that data has been hand-crafted and it actually didn’t really make much sense. Only a very small portion was used as you’ll see in a moment. But hey, it’s for the greater good, right?

List database = GenerateData();

This gets us the list of 50 lovingly created Album instances. Time is measured using the StopWatch class from the System.Diagnostic namespace and the result is displayed in seconds. The base skeleton of the tests looks like this.

watch.Start();
for (int c = 0; c < 1000000; c++)
{
    // Find some specific albums
}
watch.Stop();

Finding a few objects in a list with 50 objects doesn‘t really take any time and that is why I run the individual tests inside a loop, 10 million times. You’ll start seeing differences at one million and at 10 million you’re in a range that is easy to grasp. Let’s begin with the iteration based approach.

List result = new List();
foreach (Album album in database)
{
    if (album.Genre == Genre.BlackMetal && album.Artist.Country == "Norway")
    {
        result.Add(album);
    }
}

List found = new List();
foreach (var album in result)
{
    found.Add(album);
}
int count = found.Count;

Nothing fancy. I create a list to stuff the findings into and then start iterating over the database collection, throwing every album into that result bucket that is of the Norwegian Black Metal kind – which is all one needs. This is how it would look like with LINQ.

var result = from album in database
             where album.Genre == Genre.BlackMetal
                && album.Artist.Country == "Norway"
             select album;

List found = result.ToList();
int count = found.Count;

This certainly saves some lines over the first test. To complete the project let’s see how it would be coded using the method based syntax.

var result = database.Where(a => a.Genre == Genre.BlackMetal)
                     .Where(a => a.Artist.Country == "Norway");

List found = result.ToList();
int count = found.Count;

That’s actually less lines than the supposed-to-be-cool way from before. But this is just a simple example (with more time spent generating the data than writing the tests), I can’t really tell how it’ll compare doing complex statements.

Now you’re probably curious to see how long each of the three ways of solving the problem takes.

Iteration 1,151s
Cool LINQ 1,778s
Uncool LINQ 1,963s

As you can see, doing 10 million times 50 iterations is 600ms faster than using LINQ. What I find curious is that the method based approach is slower than the cool LINQ. As much as I remember from years ago when this was new the compiler just creates these method calls under the hood. Maybe there’s more to it and using the cool syntax can be much more optimized.

Now, while I’m writing this it comes to me. I’m calling two Where methods and each one takes a lambda. That’s creating an additional object and doing a method call 10 million times more than necessary. I’m not sure why I did it this way but maybe that comes from having worked with Hibernate a few times now. If I rewrite the test to do both checks in one method call, the execution time changes.

First the new code.

var result = database.Where(a => a.Genre == Genre.BlackMetal)
                                 && a.Artist.Country == "Norway");

List found = result.ToList();
int count = found.Count;

And now the new result. This time the difference is in the realm of measuring inaccuracy.

Iteration 1,151s
Cool LINQ 1,778s
Uncool LINQ 1,798s

What have we learned?

Unless you’re doing crucial high-performance number crunching applications where every milliseconds counts (then you’d not be using C#), LINQ looks like a good solution to eliminate some of the boring and not very expressive iteration based loops. Maybe I’ll come around and can convince myself to do a few more complex queries with sorting and other stuff the LINQ syntax provides. I have never gone any deeper than this, so there’s much for me to discover.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s