Duplicates can cause lots of headaches in Excel.
They can throw out your tallies or counts, and can cause the wrong information to be reported.
If you have the same person listed in your database two or more times, perhaps with the same customer ID or same name, but with two different residential addresses because they moved house, then which one is the current record?
The likelihood of this occurring increases the larger your database becomes and the more people use it.
Excel provides 2 tools for handing this situation and cleaning up your database. First I’ll explain how to identify duplicate entries in your data then I’ll show you how to remove duplicates quickly.
It’s good to know how to deal with duplicates effectively.
1. How to identify duplicates
The first tool is one of the options on the Conditional Formatting feature.
As its name suggests, there are two components - a condition and a format.
Conditional formatting allows you to check the value of a cell and if a certain condition is true then apply a particular cell format.
If the value of a cell is greater than 50, format the cell with white text and a red fill.
When checking for duplicates the test is:
Does this cell have an identical twin?
So, here’s the process:
1. Highlight the cells that you wish to check. (In a table, this is normally one particular column).
2. On the Home tab, click the Conditional Formatting icon.
3. Choose Highlight Cells Rules.
4. Choose Duplicate Values.
5. Select how you would like to format any duplicate cells found.
Once this is done, any cell that has another cell with an identical value is highlighted.
2. How to remove duplicates
To remove duplicates from a single column:
1. Select the cell range (or the entire column if that’s easier)
2. Click the Data tab.
3. Select the Remove Duplicates icon.
4. Tick the box labelled My data has headers if your data has a header row, otherwise leave unticked.
5. Click OK.
3. How to group duplicates together
Highlighting and removing duplicates is easy for a single column of data, but it’s very rare that you have just one column. Normally, that data is part of a larger table and you need to remove the entire duplicate row.
On a larger table the first duplicate cell might be on row 5 and the next one on row 105 and the next one on row 505.
So is there an easy way to collect them all together in one place?
Good question. I’m glad you asked.
Yes, there is.
Excel allows you to sort by colour.
1. Highlight your duplicate cells (using conditional formatting) as described above. Here’s an example, where a duplicate Customer ID has been identified.
2. Next, click any cell in your table.
3. Select the Data tab.
4. Click the large Sort button (not the little AZ or ZA icons)
5. In the Sort By drop-down list, select the column that contains the highlighted duplicates.
6. Change Sort On to Cell Color.
7. Set the Order to the colour you used to highlight the duplicate cells.
8. Click OK.
The duplicate rows are now all grouped together at the top of the table. From here you can choose which rows to remove and delete them manually.
Note: When you have several duplicate items, they may not be paired together even though all they are all listed at the top of the table. To rectify this you can add a second sort level that sorts the data by value, smallest to largest.
4. How to check 2 or more columns for duplicates
Consider this example. Two columns have been selected and checked for duplicate values.
As you can see Bloggs has been highlighted 3 times - twice for Fred and once for Del.
Also, Smith has been flagged as duplicate, even though one is Joe and one is Fred.
And finally, Fred has been highlighted 3 times - twice for Bloggs and once for Smith.
The only 2 entries that should be flagged here are Fred Bloggs.
To get around this, we can join first name and last name into a single full name using CONCATENATE. In cell D3, write the formula:
=CONCATENATE(B3, " ", C3)
The CONCATENATE function joins together all the items listed inside the brackets. As you can see, the first name and last name have been separated by a space.
Now, simply check the Full Name column for duplicate values.
5. What next?
I hope this has opened up your eyes and given you a few good ideas.
What do you think?
I hope you found plenty of value in this post. I'd love to hear your biggest takeaway in the comments below together with any questions you may have.
Have a fantastic day.