One of the principles of data integrity is to reduce data redundancy as much as possible and wipe out any duplicate database table entries. Another reason for this is that you need to add unique indexing to one or a group of table columns and you need the one column or combination of columns to be unique through out all the rows of the table.
There are basically 2 ways to achieve that:
- Identifying duplicate entries, leaving just one of them and deleting the rest.
- Identifying only unique table rows and collect them into a new table.
I’ll just go ahead with the 2nd approach.
There are in turn 2 ways to accomplish this task, namely SELECT DISTINCT … FROM … and SELECT … FROM GROUP BY …
SELECT DISTINCT … FROM …
Rather intuitively, this SQL query selects all distinct combination of columns from a table and leaving out duplicate ones:
CREATE TABLE new_orders SELECT DISTINCT receipt, quantity, price FROM orders
One thing to note is that you have to leave out the original primary key such as ID so that all rows are only compared for uniqueness on necessary data fields. It is after this creation query is successfully completed when you add a primary key indexing column to the new_orders table.
SELECT … FROM GROUP BY …
GROUP BY clause is usually used for statistical functions of MySQL such as COUNT() or AVG() to only take a distinguished group of columns into calculation. In this situation, however, it comes handy that it inherently excludes multiple table records with the same value in the specified column and includes just one of them in the returned results.
For example, if you need to include only those records from the old table orders that’s with distinct values in column receipt:
CREATE TABLE new_orders SELECT * FROM orders GROUP BY receipt
Now all the records in the new table new_orders are with distinctly unique receipt values.
Depending on whether the original records are referenced by other tables via ID, you can regenerate the ID or leave them in the new table new_orders.