LINQ GroupBy in Depth

Using LINQ GroupBy & C# to aggregate and arrange data

This article first appeared at KillAllDefects on January 21st, 2020.

Let’s talk about how LINQ’s GroupBy method makes grouping data in C# easy. We’ll start basic and then go over every available overload of the GroupBy method to explore advanced use cases.

GroupBy lets you quickly group related items in a collection by a key you compute from each item, such as a property value. The result is a collection of groups, each of which holds the items that share that key.

Note: LINQ provides variants of each method in this article for both IEnumerable and IQueryable. The overloads mirror each other (the IQueryable versions take expression trees that a query provider can translate), so for the purposes of this article we will ignore the difference in data source type.

Simple Grouping

Let’s take a look at a sample involving a small data set of books:

[
  {
    "Title": "Sphere",
    "Author": "Michael Crichton",
    "Genre": "ScienceFiction",
    "IsFiction": true
  },
  {
    "Title": "Jurassic Park",
    "Author": "Michael Crichton",
    "Genre": "ScienceFiction",
    "IsFiction": true
  },
  {
    "Title": "Working Effectively with Legacy Code",
    "Author": "Michael Feathers",
    "Genre": "Technical",
    "IsFiction": false
  },
  {
    "Title": "Your Code as a Crime Scene",
    "Author": "Adam Tornhill",
    "Genre": "Technical",
    "IsFiction": false
  },
  {
    "Title": "Software Design X-Rays",
    "Author": "Adam Tornhill",
    "Genre": "Technical",
    "IsFiction": false
  }
]

Assuming I have this data loaded into an IEnumerable<Book> called books, I can then use GroupBy to group items by various properties by specifying a key selector.

For example, I’ll group by author by using books.GroupBy(b => b.Author).

This results in an IEnumerable<IGrouping<string, Book>>. Don’t let that return type scare you. It just means a collection of groups where each group is keyed by a string value (whatever value your key selector returned) and contains Book objects.

If that is a mouthful, let’s look at the following:

[
    [
        {
        "Title": "Sphere",
        "Author": "Michael Crichton",
        "Genre": "ScienceFiction",
        "IsFiction": true
        },
        {
        "Title": "Jurassic Park",
        "Author": "Michael Crichton",
        "Genre": "ScienceFiction",
        "IsFiction": true
        }
    ],
    [
        {
        "Title": "Working Effectively with Legacy Code",
        "Author": "Michael Feathers",
        "Genre": "Technical",
        "IsFiction": false
        }
    ],
    [
        {
        "Title": "Your Code as a Crime Scene",
        "Author": "Adam Tornhill",
        "Genre": "Technical",
        "IsFiction": false
        },
        {
        "Title": "Software Design X-Rays",
        "Author": "Adam Tornhill",
        "Genre": "Technical",
        "IsFiction": false
        }
    ]
]

Essentially, we split our collection into multiple sub-collections. Each one has a Key property of whatever type we grouped on, and is itself an IEnumerable<T> that lets us enumerate the items in the group.
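To make the Key property and the nested enumeration concrete, here is a minimal sketch. It assumes anonymous objects as a stand-in for the Book class (which the article does not show) and only includes the Title and Author properties.

```csharp
using System;
using System.Linq;

// Stand-in for the article's Book data; anonymous types are an assumption
// made here so the sketch needs no extra class definition.
var books = new[]
{
    new { Title = "Sphere", Author = "Michael Crichton" },
    new { Title = "Jurassic Park", Author = "Michael Crichton" },
    new { Title = "Working Effectively with Legacy Code", Author = "Michael Feathers" },
    new { Title = "Your Code as a Crime Scene", Author = "Adam Tornhill" },
    new { Title = "Software Design X-Rays", Author = "Adam Tornhill" },
};

// Materialize the groups so they can be enumerated more than once.
var byAuthor = books.GroupBy(b => b.Author).ToList();

foreach (var group in byAuthor)
{
    // group.Key is the author; the group itself enumerates that author's books.
    Console.WriteLine($"{group.Key} ({group.Count()} book(s))");
    foreach (var book in group)
    {
        Console.WriteLine($"  {book.Title}");
    }
}
```

For in-memory collections, groups come back in the order in which each key first appears in the source, so the authors print in the same order as the sample data.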

GroupBy Element Selectors

If you don’t want the entire grouped item in the sub-collection, you can use the overload that also takes an element selector. The element selector is just a function that selects the part of the object that each grouped element will contain.

For example, if in our earlier example I did books.GroupBy(b => b.Author, b => b.Title), I would get a collection of groups by author that then contained only string values for titles by that author:

[
    [
        "Sphere",
        "Jurassic Park"
    ],
    [
        "Working Effectively with Legacy Code"
    ],
    [
        "Your Code as a Crime Scene",
        "Software Design X-Rays"
    ]
]

Admittedly, the results here look a little odd when serialized to JSON. This is because the group key is not included in the serialized results, so we don’t see the author listed. Rest assured that you can still get at the author by looking at the Key property of each group.
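As a rough illustration of reading the key alongside the selected elements, the sketch below again assumes anonymous objects in place of the Book class and prints each author with their titles.

```csharp
using System;
using System.Linq;

// Stand-in for the article's Book data (an assumption for this sketch).
var books = new[]
{
    new { Title = "Sphere", Author = "Michael Crichton" },
    new { Title = "Jurassic Park", Author = "Michael Crichton" },
    new { Title = "Working Effectively with Legacy Code", Author = "Michael Feathers" },
    new { Title = "Your Code as a Crime Scene", Author = "Adam Tornhill" },
    new { Title = "Software Design X-Rays", Author = "Adam Tornhill" },
};

// Key selector picks the author; element selector keeps only the title,
// so each group is an IGrouping<string, string>.
var titlesByAuthor = books.GroupBy(b => b.Author, b => b.Title).ToList();

foreach (var group in titlesByAuthor)
{
    // The author is not part of the elements, but it is still available as Key.
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
}
```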

Result Selectors

Now that we’ve seen how to work with key and element selectors, let’s introduce a third type of selector: result selectors.

Result selectors let you customize the generated collection. Instead of working with an IGrouping<TKey, TElement>, you can project each group into whatever shape you’d like it to be in.

Let’s use this to address the serialization quirk we saw with the last example:

var grouped = books.GroupBy(b => b.Author, b => b.Title, (key, value) => new
{
   Author = key,
   Books = value
});

Here we select the author and title for key and value, like we did before, but now we project each group into a new anonymous type, setting an Author property to the key of each group and setting a Books property to the collection of title values.

The end JSON is much more useful for representing our group:

[
  {
    "Author": "Michael Crichton",
    "Books": ["Sphere", "Jurassic Park"]
  },
  {
    "Author": "Michael Feathers",
    "Books": ["Working Effectively with Legacy Code"]
  },
  {
    "Author": "Adam Tornhill",
    "Books": ["Your Code as a Crime Scene", "Software Design X-Rays"]
  }
]

One important point here: with this overload, we are no longer returning an IEnumerable<IGrouping<TKey, TElement>> but rather an IEnumerable<TResult>, where TResult is whatever type our result selector returns.
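Because the result selector receives the whole group, it is also a natural place to aggregate. The following sketch assumes anonymous objects in place of the Book class, and the TitleCount and Titles property names are made up for illustration.

```csharp
using System;
using System.Linq;

// Stand-in for the article's Book data (an assumption for this sketch).
var books = new[]
{
    new { Title = "Sphere", Author = "Michael Crichton" },
    new { Title = "Jurassic Park", Author = "Michael Crichton" },
    new { Title = "Working Effectively with Legacy Code", Author = "Michael Feathers" },
    new { Title = "Your Code as a Crime Scene", Author = "Adam Tornhill" },
    new { Title = "Software Design X-Rays", Author = "Adam Tornhill" },
};

var summary = books
    .GroupBy(
        b => b.Author,              // key selector
        b => b.Title,               // element selector
        (author, titles) => new     // result selector: one object per group
        {
            Author = author,
            TitleCount = titles.Count(),
            Titles = titles.ToList()
        })
    .ToList();

foreach (var entry in summary)
{
    Console.WriteLine($"{entry.Author}: {entry.TitleCount} title(s)");
}
```

Note that summary is a plain list of projected objects, so there is no Key property to read; the author lives in whatever property the result selector assigned it to.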

Equality Comparer

The last possible parameter to GroupBy is an equality comparer. Comparers are used when determining which group an item belongs in and can be helpful to use if you have data that is not being grouped properly.

For example, let’s say that your data has a few rows with different casing for the same author:

  • Michael Crichton
  • michael crichton
  • Michael CRichton

We can pass in an IEqualityComparer<TKey> that will be used to compare various key values. Since our key, author, is a string value, we need an IEqualityComparer<string>.

Thankfully, .NET ships with several of these built in to the StringComparer class. In our case, we’ll use StringComparer.CurrentCultureIgnoreCase to compare our authors:

var grouped = books.GroupBy(
   b => b.Author,
   b => b.Title,
   (key, value) => new
   {
      Author = key,
      Books = value
   },
   StringComparer.CurrentCultureIgnoreCase);

Using the comparer in this way will ignore any casing differences between author entries.


You may be wondering which key value is used when several different values compare as equal and land in the same group. The answer is that LINQ uses the first value encountered in that group as the group key. This means that even if you use a StringComparer to ignore casing differences, you could still wind up with a key value that does not match the ideal formatting.

This is why I would advocate that if you consider using an IEqualityComparer you should also think about cleaning up and normalizing your data source instead.

That said, if you need to group elements on some criteria other than the default equality for the key type, implementing a custom IEqualityComparer can be the way to go. I would expect, however, that these cases would be few and far between.
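The first-key behavior described above is easy to check for yourself. This sketch uses three made-up rows with inconsistent author casing (the "Congo" row is hypothetical sample data, not part of the earlier data set) and anonymous objects in place of the Book class.

```csharp
using System;
using System.Linq;

// Hypothetical rows with inconsistent casing for the same author.
var books = new[]
{
    new { Title = "Sphere", Author = "Michael Crichton" },
    new { Title = "Jurassic Park", Author = "michael crichton" },
    new { Title = "Congo", Author = "Michael CRichton" },
};

var grouped = books
    .GroupBy(
        b => b.Author,
        b => b.Title,
        StringComparer.CurrentCultureIgnoreCase)
    .ToList();

foreach (var group in grouped)
{
    // The key is whichever spelling was encountered first for this group.
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
}
```

All three rows land in a single group whose Key is the first spelling seen. Reordering the input rows changes which spelling becomes the key, which is one reason normalizing the data up front tends to be the more predictable option.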

Closing Thoughts

Before writing this article, I found the IGrouping result too inconvenient to work with and iterate over and largely avoided LINQ GroupBy syntax.

Upon fully exploring this method and its overloads, I think there are a number of compelling reasons to use GroupBy, particularly the variant that allows you to project groups into custom objects and formats.

If you still have questions or would like to learn more about the material in question, take a look at MSDN’s documentation on the GroupBy method.

Whether or not GroupBy becomes one of your frequently used tools, it is a powerful and capable component of LINQ and worth keeping in mind.

Author

  • Matt Eland

    After several decades as a software engineer and engineering manager, Matt now serves as a software engineering instructor at Tech Elevator where he gets to raise up future developers and unleash them upon the world to build awesome things. Matt is an Azure Data Scientist and AI Engineer Associate, runs a data science blog and YouTube channel, is currently pursuing a master's degree in data analytics, and helps organize the Central Ohio .NET Developer Group. In his copious amounts of spare time, Matt continues to build nerdy things and looks for ways to share them with the community.
