How to Delete Duplicate Rows in SQL: Explained with Syntax and Examples

Published June 26, 2023

8 Min Read

Database Programming using SQL

Structured Query Language (SQL) is a widely used language for querying and manipulating data in relational databases. One common problem database administrators face is handling duplicate rows in their tables.

Duplicate rows can cause problems such as data inconsistency and reduced efficiency. In this article, we will examine how to delete duplicate rows in SQL and give syntax examples for several scenarios.

Identifying Duplicate Rows

Before deleting duplicate rows, we need to find them. In SQL, the GROUP BY clause groups rows that share the same values in one or more columns.

The HAVING clause can then filter those groups and return only the ones that contain more than one row.

Here is an example:

```
SELECT column1, column2, COUNT(*)
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1;
```

In this example, we select column1 and column2 from the table `table_name`, group the rows by the values in both columns, and use the `COUNT` function to count the rows in each group. This query only reports duplicates; it does not delete anything.

The `HAVING` clause filters the groups so that only those with more than one row are returned. The result is a list of every combination of column1 and column2 values that occurs more than once, together with the number of times it occurs. You can then proceed to delete the duplicate rows.
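
To try the GROUP BY and HAVING query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in `sqlite3` module. The `customers` table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample table; names and data are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Oslo"), ("Ann", "Oslo"), ("Bob", "Rome"), ("Ann", "Paris")],
)

# Group by both columns and keep only the groups that occur more than once.
duplicates = conn.execute(
    """
    SELECT first_name, city, COUNT(*) AS copies
    FROM customers
    GROUP BY first_name, city
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)  # [('Ann', 'Oslo', 2)]
```

Only the ("Ann", "Oslo") pair is reported, because ("Ann", "Paris") differs in the second column and therefore forms its own group of one.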

Deleting Duplicate Rows

Once we have identified the duplicate rows, we can remove them using the `DELETE` statement. There are several ways to delete duplicate rows in SQL, depending on the scenario.

In the following sections, we will look at some common scenarios and provide syntax examples for each.

Deleting Duplicate Rows Based on a Single Column

If rows count as duplicates when they share the same value in a single column, we can use the `ROW_NUMBER` function to number the rows within each group of matching values, and then delete every row except the first in each group.

To illustrate:

```
WITH CTE AS (
    SELECT column1, column2,
           ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS RowNumber
    FROM table_name
)
DELETE FROM CTE
WHERE RowNumber > 1;
```

In this example, a common table expression (CTE) selects column1, column2, and a row number from the table `table_name`. The `ROW_NUMBER` function numbers the rows within each partition of equal column1 values, ordered by column2, so numbering restarts at 1 for each distinct column1 value.

The `DELETE` statement then removes every row whose row number is greater than 1, which keeps only the first row in each partition. Deleting through a CTE like this is SQL Server syntax, and it removes the rows from the underlying table. The result is a table with no duplicate rows based on column1.
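
The same ROW_NUMBER idea can be tried locally with a small sketch in Python's `sqlite3` module. SQLite does not support deleting through a CTE, so this version ranks the rows in a subquery and deletes them by `rowid` instead; that substitution, the `users` table, and its data are assumptions for illustration. It also assumes SQLite 3.25 or later, which is when window functions were added.

```python
import sqlite3

# Hypothetical table: rows with the same email are treated as duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, signup_day INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("a@x.com", 3), ("a@x.com", 1), ("b@x.com", 2), ("a@x.com", 2)],
)

# Number the rows per email, earliest signup first, then delete every row
# numbered above 1. SQLite's implicit rowid identifies the rows to remove.
conn.execute(
    """
    DELETE FROM users
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY signup_day) AS rn
            FROM users
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT email, signup_day FROM users ORDER BY email"
).fetchall()
print(remaining)  # [('a@x.com', 1), ('b@x.com', 2)]
```

The ORDER BY inside the window decides which copy survives: here the earliest `signup_day` gets row number 1 and is kept.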

You can check out this video to have a better understanding of the subject: https://www.youtube.com/embed/KBQQFjduFag

Deleting Duplicate Rows Based on Multiple Columns

If rows count as duplicates when they share the same values in several columns, we can use the same strategy with one change: list every column that defines a duplicate in the `PARTITION BY` clause. Here's an illustration:

```
WITH CTE AS (
    SELECT column1, column2, column3,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY column3) AS RowNumber
    FROM table_name
)
DELETE FROM CTE
WHERE RowNumber > 1;
```

In this case, the CTE selects column1, column2, column3, and a row number from the table `table_name`. Rows are partitioned by the combination of column1 and column2 and ordered by column3, so the row with the lowest column3 value in each partition gets row number 1 and is kept.

The result of this query is a table with no duplicate rows based on column1 and column2.
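
Since the only thing that changes between the single-column and multi-column cases is the PARTITION BY list, the pattern can be wrapped in a small helper. The sketch below uses Python's `sqlite3` module with the `rowid`-based delete that SQLite requires; the helper name `delete_duplicates`, the `orders` table, and its data are all hypothetical, and SQLite 3.25 or later is assumed.

```python
import sqlite3


def delete_duplicates(conn, table, key_columns, order_column):
    """Keep the first row (by order_column) per combination of key_columns.

    Hypothetical helper for illustration. Identifiers are interpolated into
    the SQL text, so they must come from trusted code, never from user input.
    """
    keys = ", ".join(key_columns)
    sql = f"""
        DELETE FROM {table}
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY {keys} ORDER BY {order_column}
                       ) AS rn
                FROM {table}
            )
            WHERE rn > 1
        )
    """
    return conn.execute(sql).rowcount  # number of rows deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, placed_at INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "pen", 2),
        ("ann", "pen", 1),
        ("ann", "ink", 5),
        ("bob", "pen", 4),
        ("bob", "pen", 9),
    ],
)

removed = delete_duplicates(conn, "orders", ["customer", "product"], "placed_at")
rows = conn.execute("SELECT * FROM orders ORDER BY customer, product").fetchall()
print(removed, rows)
# 2 [('ann', 'ink', 5), ('ann', 'pen', 1), ('bob', 'pen', 4)]
```

Passing a single-element list such as `["customer"]` reproduces the single-column case.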

Deleting Duplicate Rows Based on All Columns

If rows count as duplicates only when every column matches, we can use the `DISTINCT` keyword to select only the distinct rows and the `INTO` keyword to create a new table from them. Here is an example:

```
SELECT DISTINCT *
INTO new_table_name
FROM table_name;

DROP TABLE table_name;

EXEC sp_rename 'new_table_name', 'table_name';
```

In this example, the `DISTINCT` keyword selects only the distinct rows from the table `table_name`, and the `INTO` keyword creates a new table named `new_table_name` containing those rows.

The original table is then dropped using the `DROP TABLE` statement, and the new table is given the original table's name using the stored procedure `sp_rename`.

The result of this sequence of statements is a table with no duplicate rows.
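
The copy, drop, and rename sequence can also be sketched in Python's `sqlite3` module. SQLite has neither `SELECT ... INTO` nor `sp_rename`, so this version substitutes `CREATE TABLE ... AS SELECT` and `ALTER TABLE ... RENAME TO`; the `readings` table and its data are hypothetical.

```python
import sqlite3

# Hypothetical table in which whole rows are repeated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 10), ("s1", 10), ("s2", 7), ("s1", 11), ("s2", 7)],
)

# Copy the distinct rows to a new table, drop the original, and rename the
# copy. The explicit transaction makes the swap all-or-nothing in SQLite.
conn.executescript(
    """
    BEGIN;
    CREATE TABLE readings_dedup AS SELECT DISTINCT * FROM readings;
    DROP TABLE readings;
    ALTER TABLE readings_dedup RENAME TO readings;
    COMMIT;
    """
)

rows = conn.execute(
    "SELECT sensor, value FROM readings ORDER BY sensor, value"
).fetchall()
print(rows)  # [('s1', 10), ('s1', 11), ('s2', 7)]
```

In SQLite, a table created with `CREATE TABLE ... AS SELECT` does not inherit the original table's constraints or indexes, so with this approach they would need to be recreated after the rename.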

SQL developers are in high demand; statistics suggest a 64% likelihood of them working in the public sector rather than the private sector.

To conclude

Deleting duplicate rows in SQL Server is a common task for database administrators. We can identify duplicate rows using the GROUP BY and HAVING clauses and delete them using the DELETE statement with the ROW_NUMBER function and a CTE.

The syntax for deleting duplicate rows varies depending on the scenario. Following these recommendations helps keep database tables free of duplicate rows, which reduces data inconsistency and improves performance.
