Reading Time: minutes remaining

How to find, manage and remove duplicates in Excel

Jason Morrell

by Jason Morrell 
March 12, 2015

Duplicates can cause lots of headaches in Excel.

They can throw out your tallies or counts, and can cause the wrong information to be reported.

If you have the same person listed in your database two or more times, perhaps with the same customer ID or same name, but with two different residential addresses because they moved house, then which one is the current record?

An example of a duplicate entry. Here we have 2 addresses for the same person. Which record should we keep?

Figure 01: An example of a duplicate entry. Here we have 2 addresses for the same person. Which record should we keep?

The likelihood of this occurring increases the larger your database becomes and the more people use it.

It happens.

Excel provides 2 tools for handing this situation and cleaning up your database. First I’ll explain how to identify duplicate entries in your data then I’ll show you how to remove duplicates quickly.

It’s good to know how to deal with duplicates effectively.

1.  How to identify duplicates

The first tool is one of the options on the Conditional Formatting feature.

As its name suggests, there are two components - a condition and a format.

Conditional formatting allows you to check the value of a cell and if a certain condition is true then apply a particular cell format.

For example, 

If the value of a cell is greater than 50, format the cell with white text and a red fill.

When checking for duplicates the test is:

Does this cell have an identical twin?

So, here’s the process:

1.  Highlight the cells that you wish to check. (In a table, this is normally one particular column).

2.  On the Home tab, click the Conditional Formatting icon.

3.  Choose Highlight Cells Rules.

4.  Choose Duplicate Values.

Conditional Formatting icon

Figure 02: Conditional Formatting icon

5.  Select how you would like to format any duplicate cells found.

Choose how you would like to format any duplicate cells found

Figure 03: Choose how you would like to format any duplicate cells found

Once this is done, any cell that has another cell with an identical value is highlighted.

Duplicate cells are highlighted

Figure 04: Duplicate cells are highlighted

2. How to remove duplicates

To remove duplicates from a single column:

1.  Select the cell range (or the entire column if that’s easier)

2.  Click the Data tab.

3.  Select the Remove Duplicates icon.

Remove Duplicates

Figure 05: Remove Duplicates

4.  Tick the box labelled My data has headers if your data has a header row, otherwise leave unticked.

Confirm that you wish to remove the duplicates

Figure 06: Confirm that you wish to remove the duplicates

5.  Click OK.

3. How to group duplicates together

Highlighting and removing duplicates is easy for a single column of data, but it’s very rare that you have just one column. Normally, that data is part of a larger table and you need to remove the entire duplicate row.

On a larger table the first duplicate cell might be on row 5 and the next one on row 105 and the next one on row 505.

So is there an easy way to collect them all together in one place?

Good question. I’m glad you asked.

Yes, there is.

Excel allows you to sort by colour.

Here’s how:

1.  Highlight your duplicate cells (using conditional formatting) as described above. Here’s an example, where a duplicate Customer ID has been identified.

Conditional Formatting applied to the Customer ID column

Figure 07: Conditional Formatting applied to the Customer ID column

2.  Next, click any cell in your table.

3.  Select the Data tab.

4.  Click the large Sort button (not the little AZ or ZA icons)

5.  In the Sort By drop-down list, select the column that contains the highlighted duplicates.

6.  Change Sort On to Cell Color.

7.  Set the Order to the colour you used to highlight the duplicate cells.

Sort the list by the Cell Color for the column that contains duplicates

Figure 08: Sort the list by the Cell Color for the column that contains duplicates

8.  Click OK.

The duplicate rows are now all grouped together at the top of the table. From here you can choose which rows to remove and delete them manually.

Duplicate values are grouped together at the top of the table

Figure 09: Duplicate values are grouped together at the top of the table

Note: When you have several duplicate items, they may not be paired together even though all they are all listed at the top of the table. To rectify this you can add a second sort level that sorts the data by value, smallest to largest.

4. How to check 2 or more columns for duplicates

Consider this example. Two columns have been selected and checked for duplicate values.

How can you check 2 columns in combination for duplicates?

Figure 10: How can you check 2 columns in combination for duplicates?

As you can see Bloggs has been highlighted 3 times - twice for Fred and once for Del.

Also, Smith has been flagged as duplicate, even though one is Joe and one is Fred.

And finally, Fred has been highlighted 3 times - twice for Bloggs and once for Smith.

The only 2 entries that should be flagged here are Fred Bloggs.

To get around this, we can join first name and last name into a single full name using CONCATENATE. In cell D3, write the formula:

=CONCATENATE(B3, " ", C3)

The CONCATENATE function joins together all the items listed inside the brackets. As you can see, the first name and last name have been separated by a space.

Now, simply check the Full Name column for duplicate values.

First name and last name are joined together using CONCATENATE

Figure 10: First name and last name are joined together using CONCATENATE

5. What next?

I hope this has opened up your eyes and given you a few good ideas.

What do you think?

I hope you found plenty of value in this post. I'd love to hear your biggest takeaway in the comments below together with any questions you may have.

Have a fantastic day.

Jason Morrell blog signature
About Jason Morrell

About the author

Jason Morrell


Jason Morrell is a professional trainer, consultant and course creator who lives on the glorious Gold Coast in Queensland, Australia.


He helps people of all levels unleash and leverage the power contained within Microsoft Office by delivering training, troubleshooting services and taking on client projects. He loves to simplify tricky concepts and provide helpful, proven, actionable advice that can be implemented for quick results.


Purely for amusement he sometimes talks about himself in the third person.

SHARE

more from

Office Mastery logo
Read More
Multilevel numbering in Word that works perfectly, first time
Read More
How to Consolidate Data in Excel: 2 Easy Ways to use Excel’s Consolidation Tool

JOIN THE DISCUSSION

    • King_Excel, thank you for providing an automation tool which makes the processes described in the post above a little easier to execute.

      Jason

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
    >