Duplicate Detection and Consolidation in CodeRush for Visual Studio

Mark Miller on CodeRush & IDE Productivity

29 November 2011

CodeRush 11.2 introduces a new feature we’re very excited about. It’s called Duplicate Detection and Consolidation (DDC) and it’s revolutionary.

Background

Duplicate code, sometimes referred to as clones, is a cluster of code blocks that are functionally equivalent (or nearly equivalent) spanning across two or more locations within a solution. Duplicate code is expensive to maintain because:

Multiplied bugs. A bug in one clone means there’s a bug in all the copies. This can lead to a continued, repeated release of previously-fixed bugs as each copy of the bug is individually discovered by a customer and then fixed by the team.
Flexibility barriers. If the copied code needs to be more flexible, changes need to be made across all the copies or ideally, the copies need to be consolidated first. Unfortunately consolidation is a high-risk, error-prone, time-consuming activity.
Increased ramp-up time. Copy the code and new developers trying to get up to speed will have twice as much code to read and understand. If discovered, duplicated code tends to be harder to understand than normal code because the reader must not only understand functionality, but also understand the reason behind the duplication.

Reaching into the Future

Because duplicate code is such an expensive problem, for years tools have been developed to detect duplicate code, each achieving various degrees of success. So you might wonder, after so many years of development, what makes CodeRush’s DDC so revolutionary?

Speed. CodeRush has the fastest duplicate code detection available for .NET. It’s 17 times faster than the duplicate code detection currently available in the Visual Studio 11 preview.
Integration. DD runs in a background thread while you work in Visual Studio. Duplicate code is highlighted on screen so you know exactly what you’re working with before you change the code.
Functional Equivalence. The CodeRush duplication detection engine is built to understand functionally equivalent code. In 11.2, we take baby steps in this direction. However, by 12.1 we expect to really be blowing minds in this area, matching structurally distinct yet functionally equivalent blocks of code.
Consolidation. CodeRush is the first and only tool on this planet to offer the ability to immediately consolidate duplication into a single block of code, directly from the IDE. Not only is CodeRush first, but Team CodeRush has delivered an impressive solution.

The Complexity of Consolidation

To give you an idea of just how impressive duplicate consolidation is, let’s take a look at what needs to happen to consolidate:

You need to understand not only the similarities, but also the differences among the code blocks. Differences can be parameterized.
If the duplicate blocks of code are functionally equivalent but structurally distinct, then a decision needs to be made - upon which of the duplicate blocks should we base the consolidation?
You need to offer a variety of consolidation solutions so the developer can pick the one that fits best. For example, you could consolidate to a new method in a class, create a new ancestor class, or create a new helper class.
You need to understand and respect project boundaries. In many cases duplication between projects survives beyond initial discovery because consolidating is time consuming and may require the creation of a new project. If a new project is created, the appropriate references need to be added. Regardless of whether we’re creating a new project or moving code from one existing project to another, we need to verify that we can do this without creating any circular references.
Types need to be correctly resolved. You might have two classes with the same name declared in different namespaces in your solution. The consolidated code must have the correct namespace imports to ensure that all the types are correctly resolved.

As a result, the steps to consolidate by hand are many, complex, error-prone, and high-risk. Only the best developers are brave enough to venture forth in this domain, and if they do, they are best advised to bring a paired programmer along for the ride.

Enter CodeRush 11.2

This changes everything. From this day forward, the quality of code is going up. Let’s take a simple example. I have a class named Mortal, which looks like this:

public class Mortal
{
protected List<Mortal> friends = new List<Mortal>();
public string Name { get; set; }
}
And a descendant named Human, which has a FindHuman method:

public class Human : Mortal
{
public Mortal FindHuman(string name)
{
    Mortal mortal = null;
    foreach (Mortal friend in friends)
    {
      if (String.Compare(friend.Name, name, false) == 0)
      {
        Console.WriteLine("Found a Human: " + friend.Name);
        mortal = friend;
        break;
      }
    }
    return mortal;
}
}
So far so good. But now, let’s do something incredibly dangerous. We’re going to make a copy of this Human class and perform a search and replace, “Human” for “Martian”…

public class Martian : Mortal
{
public Mortal FindMartian(string name)
{
    Mortal mortal = null;
    foreach (Mortal friend in friends)
    {
      if (String.Compare(friend.Name, name, false) == 0)
      {
        Console.WriteLine("Found a Martian: " + friend.Name);
        mortal = friend;
        break;
      }
    }
    return mortal;
}
}

But let’s continue to change the code. Let’s introduce a few local variables, rename “mortal” to “myFavoriteMartian”, remove the unnecessary curly braces around the if-block, and change case-sensitivity of the String.Compare call so it ignores case (as one might expect with Martian names)…

public class Martian : Mortal
{
public Mortal FindMartian(string name)
{
    Mortal myFavoriteMartian = null;
    bool ignoreCase = true;
    string msg = "Found a Martian: ";
    foreach (Mortal friend in friends)
      if (String.Compare(friend.Name, name, ignoreCase) == 0)
      {
        Console.WriteLine(msg + friend.Name);
        myFavoriteMartian = friend;
        break;
      }
    return myFavoriteMartian;
}
}

And now we have some code that is functionally equivalent, but structurally distinct. Can CodeRush find this? It can, providing we drop the Analysis Level down to 2, so the detection engine can see this smaller code block (the default level is set to 3, which is a slightly larger block):

Here’s what the code looks like inside Visual Studio:

On the left of the gutter in each view, you see a thin purple vertical bar. This means duplicate code exists elsewhere. Also, in the bottom right, the “!!” icon signals CodeRush has detected duplicates inside the solution. If you hover over the icon in the bottom right, a hint appears to explain the status.

You can click the icon to see a summary of all duplicate clusters found inside the new Duplicate Code tool window. In this case, our simple example, only one cluster is found:

On the left is a list of all clusters found, sorted by redundancy (the total amount of redundant code that could be removed if consolidated). Note that because we are sorting by redundancy, it’s possible for a very small block of code (duplicated many times) to appear above a very large duplicate block of code (that only appears twice).

On the right is a preview of the duplicate code. You can double-click any duplicate code preview to be taken directly to that location. When you do this, CodeRush immediately presents the consolidation hint, which looks like this:

Notice the mouse is positioned right over the “Next duplicate block” button, in case you want to navigate and compare the blocks. This might be useful when it’s time to consolidate. The block you consolidate from determines the structure of the consolidated block. So let’s consolidate from the FindMartian method. Click the “Next duplicate block” button, then move the mouse over the “to the base class” consolidation option. You’ll see a preview hint that looks like this:

Notice the parameters to FindMartianExtracted – CodeRush takes the code that differs between the blocks and turns them into parameters. Press Enter to apply the consolidation. The target picker will appear letting you select the location of the consolidated method.

The target picker shows the block of code that will be inserted. It’s based upon the structure of the method where we initiated the consolidation (FindMartian). Let’s press Escape to cancel and back out of this, and then see what happens when we initiate consolidation from the other duplicate…

Now consolidate to the base class Mortal. We’ll see the target picker. Notice how the FindHumanExtracted consolidated code in the preview below differs from the FindMartianExtracted consolidated code in the preview shown two screen shots above.

Just use the up/down arrow buttons to select a location and press Enter to accept it. Now we’re left with a small task of coming up with good names for the consolidated method and its parameters.

And the calling code is simplified:

And that’s it (for a simple consolidation example)!

An Even Bigger Example

But what happens when the project is much bigger? The good news is, that the detection scales well for large solutions, taking only seconds (at most a few minutes) to scan thousands of files for duplicates at the default settings (Analysis Level set to 3). Let’s take a look at DDC with the source to Umbraco.

Here’s cluster #4 selected. The Lowest and Highest methods inside the ExsltMath class:

Nearly identical blocks of code. However note the real difference is in the operator used to compare t against min (and max). On the left you have “t < min”, and on the right you have “t > max”. Can CodeRush consolidate this duplicate? It can. Here’s the preview hint:

And after consolidation (and renaming the method, the “min” local, and the “func” parameter), I get this:

And now the quality of the code is much higher than it was before.

Remember, CodeRush has the fastest duplicate code detection available for .NET. It works in the background and shows when you’re working inside a method that is cloned. Consolidation is based on the structure of the method you start from. While CodeRush can consolidate many kinds of duplicate blocks, not all duplicate blocks can be consolidated automatically (trust me, this is an incredibly challenging problem to solve). But we’re working on making the consolidation engine even smarter and even more extensive as we approach the 12.1 release (and we expect to blow your mind again with that one).

Do me a favor, kids: Tell every developer you know about CodeRush’s Duplicate Detection and Consolidation. DDC is truly revolutionary; help us spread the word.

Free DevExpress Products - Get Your Copy Today

The following free DevExpress product offers remain available. Should you have any questions about the free offers below, please submit a ticket via the DevExpress Support Center at your convenience. We'll be happy to follow-up.

Mark Miller (DevExpress)

No Comments

Please login or register to post comments.

Need help or require more information?

Announcing DevExpress Universal v23.2

DevExpress Wins 19 Visual Studio Reader's Choice Awards