LINQ GroupBy in Depth

This article first appeared at KillAllDefects on January 21st, 2020.

Let’s talk about how LINQ’s GroupBy method makes grouping data in C# easy. We’ll start basic and then go over every available overload of the GroupBy method to explore advanced use cases.

GroupBy allows you to quickly group collections of related data by specific properties on your data. The result is a collection of groups, where each group holds the items that share the same key.

Note: LINQ provides variants of each method in this article that work with either IEnumerable or IQueryable. These methods are otherwise identical, so for the purposes of this article we will ignore the difference in data source type.

Simple Grouping

Let’s take a look at a sample involving a small data set of books:

[
  {
    "Title": "Sphere",
    "Author": "Michael Crichton",
    "Genre": "ScienceFiction",
    "IsFiction": true
  },
  {
    "Title": "Jurassic Park",
    "Author": "Michael Crichton",
    "Genre": "ScienceFiction",
    "IsFiction": true
  },
  {
    "Title": "Working Effectively with Legacy Code",
    "Author": "Michael Feathers",
    "Genre": "Technical",
    "IsFiction": false
  },
  {
    "Title": "Your Code as a Crime Scene",
    "Author": "Adam Tornhill",
    "Genre": "Technical",
    "IsFiction": false
  },
  {
    "Title": "Software Design X-Rays",
    "Author": "Adam Tornhill",
    "Genre": "Technical",
    "IsFiction": false
  }
]

Assuming I have this data loaded into a collection of IEnumerable<Book> called books, I can then use GroupBy to group items by various properties by specifying the key selector.

For example, I’ll group by author by using books.GroupBy(b => b.Author)

This results in an IEnumerable<IGrouping<string, Book>>. Don’t let that return type scare you; it just means a collection of groups where each group is based on a string value (whatever value your key selector returned) and contains Book objects.

If that is a mouthful, let’s look at the following:

[
    [
        {
        "Title": "Sphere",
        "Author": "Michael Crichton",
        "Genre": "ScienceFiction",
        "IsFiction": true
        },
        {
        "Title": "Jurassic Park",
        "Author": "Michael Crichton",
        "Genre": "ScienceFiction",
        "IsFiction": true
        }
    ],
    [
        {
        "Title": "Working Effectively with Legacy Code",
        "Author": "Michael Feathers",
        "Genre": "Technical",
        "IsFiction": false
        }
   ],
   [
        {
        "Title": "Your Code as a Crime Scene",
        "Author": "Adam Tornhill",
        "Genre": "Technical",
        "IsFiction": false
        },
        {
        "Title": "Software Design X-Rays",
        "Author": "Adam Tornhill",
        "Genre": "Technical",
        "IsFiction": false
        }
    ]
]

Essentially, we split our collection into multiple sub-collections. Each one has a Key property of whatever type we grouped on and is itself an IEnumerable<T>, so we can enumerate over the items in the group.
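As a minimal sketch of working with those groups, the snippet below prints each group's Key followed by the titles it contains. The anonymous objects are an assumption standing in for the Book class so the example is self-contained, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the article's Book class (an assumption
// made to keep this sketch self-contained).
var books = new[]
{
    new { Title = "Sphere", Author = "Michael Crichton" },
    new { Title = "Jurassic Park", Author = "Michael Crichton" },
    new { Title = "Working Effectively with Legacy Code", Author = "Michael Feathers" },
    new { Title = "Your Code as a Crime Scene", Author = "Adam Tornhill" },
    new { Title = "Software Design X-Rays", Author = "Adam Tornhill" },
};

// Group by author. Each group exposes a Key and can be enumerated.
var byAuthor = books.GroupBy(b => b.Author);

foreach (var group in byAuthor)
{
    Console.WriteLine($"{group.Key} (count: {group.Count()})");
    foreach (var book in group)
    {
        Console.WriteLine($"  {book.Title}");
    }
}
```

With LINQ to Objects, the groups come back in the order their keys first appear in the source, so Michael Crichton is listed first here.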

GroupBy Element Selectors

If you don’t want the entire body of the grouped item in the sub-collection, you can use the overload that also takes in an element selector. The element selector is just a function that selects the part of the object that the grouped element will contain.

For example, if in our earlier example I did books.GroupBy(b => b.Author, b => b.Title), I would get a collection of groups by author that then contained only string values for titles by that author:

[
    [
        "Sphere",
        "Jurassic Park"
    ],
    [
        "Working Effectively with Legacy Code"
    ],
    [
        "Your Code as a Crime Scene",
        "Software Design X-Rays"
    ]
]

Admittedly, the results here look a little odd when serialized to JSON. This is because the group key is not included in the serialized results and so we don’t see the author listed. Rest assured that you can still get at the author by looking at the Key property of each group.
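To show that the key is still there, here is a small sketch that reads Key alongside the selected titles. As an assumption for self-containment, anonymous objects stand in for the Book class.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the article's Book class (an assumption).
var books = new[]
{
    new { Title = "Sphere", Author = "Michael Crichton" },
    new { Title = "Jurassic Park", Author = "Michael Crichton" },
    new { Title = "Working Effectively with Legacy Code", Author = "Michael Feathers" },
    new { Title = "Your Code as a Crime Scene", Author = "Adam Tornhill" },
    new { Title = "Software Design X-Rays", Author = "Adam Tornhill" },
};

// The key selector picks the author; the element selector keeps only the title.
var titlesByAuthor = books.GroupBy(b => b.Author, b => b.Title);

foreach (var group in titlesByAuthor)
{
    // Each group is an IGrouping<string, string>: Key is the author,
    // and the elements are that author's titles.
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
}

// Prints:
// Michael Crichton: Sphere, Jurassic Park
// Michael Feathers: Working Effectively with Legacy Code
// Adam Tornhill: Your Code as a Crime Scene, Software Design X-Rays
```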

Result Selectors

Now that we’ve seen how to work with key and element selectors, let’s introduce a third type of selector: result selectors.

Result selectors let you customize the generated collection. Instead of working with an IGrouping<TKey, TValue> you can effectively project the collection into whatever shape you’d like it to be in.

Let’s use this to address the serialization quirk we saw with the last example:

var grouped = books.GroupBy(b => b.Author, b => b.Title, (key, value) => new
{
   Author = key,
   Books = value
});

Here we select the author and title for key and value, like we did before, but now we project each group into a new anonymous type, setting an Author property to the key of each group and a Books property to the collection of title values.

The end JSON is much more useful for representing our group:

[
  {
    "Author": "Michael Crichton",
    "Books": ["Sphere", "Jurassic Park"]
  },
  {
    "Author": "Michael Feathers",
    "Books": ["Working Effectively with Legacy Code"]
  },
  {
    "Author": "Adam Tornhill",
    "Books": ["Your Code as a Crime Scene", "Software Design X-Rays"]
  }
]

One important point here: with this overload, we are no longer returning an IEnumerable<IGrouping<TKey, TValue>> but rather an IEnumerable<TProjected>, where TProjected is whatever type our result selector returns.
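As a further sketch of that projected return type, the snippet below uses the overload that takes only a key selector and a result selector to summarize the sample data by genre. The anonymous objects and the Genre and Count property names on the projection are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the article's Book class (an assumption).
var books = new[]
{
    new { Title = "Sphere", Genre = "ScienceFiction" },
    new { Title = "Jurassic Park", Genre = "ScienceFiction" },
    new { Title = "Working Effectively with Legacy Code", Genre = "Technical" },
    new { Title = "Your Code as a Crime Scene", Genre = "Technical" },
    new { Title = "Software Design X-Rays", Genre = "Technical" },
};

// The result selector receives the key and the items in the group, and
// returns whatever shape we like. No IGrouping appears in the result.
var summaries = books.GroupBy(
    b => b.Genre,
    (genre, items) => new { Genre = genre, Count = items.Count() });

foreach (var summary in summaries)
{
    Console.WriteLine($"{summary.Genre}: {summary.Count}");
}

// Prints:
// ScienceFiction: 2
// Technical: 3
```

Because the result is a plain sequence of the projected type, it serializes and iterates like any other collection.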

Equality Comparer

The last possible parameter to GroupBy is an equality comparer. Comparers are used when determining which group an item belongs in and can be helpful to use if you have data that is not being grouped properly.

For example, let’s say that your data has a few rows with different casing for the same author:

Michael Crichton
michael crichton
Michael CRichton

We can pass in an IEqualityComparer<TKey> that will be used to compare various key values. Since our key, author, is a string value, we need an IEqualityComparer<string>.

Thankfully, .NET ships with several of these built in to the StringComparer class. In our case, we’ll use StringComparer.CurrentCultureIgnoreCase to compare our authors:

var grouped = books.GroupBy(
   b => b.Author,
   b => b.Title,
   (key, value) => new
   {
      Author = key,
      Books = value
   },
   StringComparer.CurrentCultureIgnoreCase);

Using the comparer in this way will ignore any casing differences between author entries.

You may be wondering which key value is used if multiple values can compare to the same group. The answer is that LINQ uses the first value encountered in that group as the official group key. This means that even if you use a StringComparer to ignore casing differences, you could still wind up with a key value that might not match the ideal formatting.
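A short sketch of that first-key-wins behavior, assuming LINQ to Objects and using a plain array of author strings as a stand-in for the author column of the data:

```csharp
using System;
using System.Linq;

// Author values with inconsistent casing; the lowercase one comes first.
var authors = new[]
{
    "michael crichton",
    "Michael Crichton",
    "Michael CRichton",
};

// The comparer treats all three values as the same key.
var groups = authors
    .GroupBy(a => a, StringComparer.CurrentCultureIgnoreCase)
    .ToList();

foreach (var group in groups)
{
    Console.WriteLine($"{group.Key}: {group.Count()} entries");
}

// Prints:
// michael crichton: 3 entries
```

The group key is the lowercase spelling simply because it appeared first, not because it is the best-formatted value.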

This is why I would advocate that if you consider using an IEqualityComparer you should also think about cleaning up and normalizing your data source instead.

That said, if you need to group elements on some criteria other than reference equality, implementing a custom IEqualityComparer can be the way to go. I would expect, however, that these cases would be few and far between.
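For one of those rare cases, here is a minimal sketch of a custom comparer. The Author class and AuthorNameComparer are hypothetical types invented for this illustration and are not part of the article's data model. The sketch assumes C# 9 or later with nullable reference types enabled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Three distinct Author instances, two of which represent the same person.
var books = new[]
{
    (Title: "Sphere", Author: new Author { Name = "Michael Crichton" }),
    (Title: "Jurassic Park", Author: new Author { Name = "Michael Crichton" }),
    (Title: "Software Design X-Rays", Author: new Author { Name = "Adam Tornhill" }),
};

// Default equality for this class is reference equality, so every
// instance ends up in its own group.
var byReference = books.GroupBy(b => b.Author).Count();

// The custom comparer groups by name instead.
var byName = books.GroupBy(b => b.Author, new AuthorNameComparer()).Count();

Console.WriteLine($"{byReference} groups by reference, {byName} groups by name");
// Prints: 3 groups by reference, 2 groups by name

// Hypothetical key type that does not override Equals or GetHashCode.
class Author
{
    public string Name { get; init; } = "";
}

// Hypothetical comparer that treats authors with the same name as equal.
class AuthorNameComparer : IEqualityComparer<Author>
{
    public bool Equals(Author? x, Author? y) =>
        string.Equals(x?.Name, y?.Name, StringComparison.Ordinal);

    // Equal objects must produce equal hash codes, so hash the same
    // field that Equals compares.
    public int GetHashCode(Author obj) =>
        StringComparer.Ordinal.GetHashCode(obj.Name);
}
```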

Closing Thoughts

Before writing this article, I found the IGrouping result too inconvenient to work with and iterate over and largely avoided LINQ GroupBy syntax.

Upon fully exploring this method and its overloads, I think there are a number of compelling reasons to use GroupBy, particularly the variant that allows you to project groups into custom objects and formats.

If you still have questions or would like to learn more about the material in question, take a look at MSDN’s documentation on the GroupBy method.

Whether or not GroupBy joins the set of tools you use frequently, it is a powerful and capable component of LINQ and a tool to keep in mind.

Author

Matt Eland

Microsoft MVP in AI, Author of "Refactoring with C#"

Matt Eland is a software engineering leader and data scientist who has served as a senior engineer, software engineering manager, professional programming instructor, and has helped build enterprise-level software at a variety of organizations before distinguishing himself as a Microsoft MVP in Artificial Intelligence by using technology to accomplish ridiculous things in the name of science and teaching others. Matt makes it his job to learn new things and share them with others through articles, videos, and talks at user groups and conferences covering a wide range of topics from software architecture to programming topics to artificial intelligence and data science. Matt is a current data analytics master's student, an AI Specialist at Leading EDJE, is the author of "Refactoring with C#" and is creating a LinkedIn course and book on Computer Vision on Azure. Matt occasionally sleeps as well.