18
May 2007
.NET Using Dictionaries for a Performance Boost
I often see code that loops through List<T>s or uses List<T>.ForEach() to repeatedly find specific items in a list. Take the below example. Rather than hitting the database every time we need to check permissions, they're all loaded upon login and cached into the session. As the user moves around the application, the permissions are checked with code similar to the below.
/// <summary>
/// A cached list of posts with their access levels
/// </summary>
List<Post> posts;
/// <summary>
/// Preload the posts for testing permissions
/// </summary>
private void preloadPosts()
{
posts = DataAccess.Post.GetAll();
}
/// <summary>
/// Returns the access level for a given post
/// </summary>
/// <param name="postID"></param>
/// <returns></returns>
public AccessLevel GetAccessLevel(int postID)
{
// Loop through each post looking for the access level
foreach (Post p in posts)
if (p.ID == postID)
return p.AccessLevel;
// If we didn't find a post, we don't have access to it
return AccessLevel.NoAccess;
}
While this isn't a big deal if the list is small, or the method isn't called much, what started off as simple code could quickly grow.
I was asked to look at the performance of one our applications, and requested the company invest in RedGate Ants Profiler (which I'd highly recommend for anyone doing .NET development - shame there isn't a cheaper Home Edition!). When I ran the profiler it became very obvious very quickly where the problem was. The problem was that our application was taking several minutes to return a single list/table of data. The database had two tables: Person and Answer. The Answer table had a PersonID, and each Person could have hundreds of Answers. There were hundreds of People.
The offending code looped through all People, and then did a .FindAll() on a List<Answer> containing every answer for every positions. This mean with 10,000 positions, each having 1,000 answers, the List<Answer> had 10,000,000 items. This meant we were calling .FindAll() 10,000 times on 10,000,000 items. That's 100,000,000,000 iterations.
Since I wasn't able to modify any of the data access code at the time, I had to work with the two lists (List<Person> = 10,000 items, List<Answer> = 10,000,000 items).
Rather than loop through the 10,000,000 items for every position, I decided to loop through them once. By adding them to a dictionary based on their PersonID, I could quickly retrieve them while looping over the people.
The new code looked like this:
public void OutputData(List<Person> people, List<Answer> answers)
{
// Create a dictionary to looup answers in
Dictionary<int, List<Answer>> answerLookup = new Dictionary<int, List<Answer>>();
// Loop through every answer
foreach (Answer a in answers)
{
// If we haven't hit this Person before, create a lookup for them
List<Answer> myAnswers;
if (!answerLookup.ContainsKey(a.PersonID))
{
myAnswers = new List<Answer>();
answerLookup.Add(a.PersonID, myAnswers);
}
else // Else get the existing list
myAnswers = answerLookup[a.PersonID];
// Add this answer to our lookup
myAnswers.Add(a);
}
// Now output our data
foreach (Person p in people)
{
// Get a list of ONLY the answers for this position
List<Answer> myAnswers = answers[p.PersonID];
Console.WriteLine(p.Name);
// Loop through our answers
foreach (Answer a in myAnswers)
Console.WriteLine(" {0}", a.AnswerValue);
}
}
The resulting code took seconds to run instead of minutes, and the time spent on those loops was now only a small fraction of the total time (the database calls took most if it).
So, next time you find yourself writing a .Find() call or a .ForEach() to just find a particular element, consider the performance implications, and maybe add a Dictionary to make things faster.