MySQL: Find Duplicate Entry / Rows / Records

by Yang Yang on May 19, 2009

A quick tip. I have a feeling I’ve covered this before, but the simple SQL query below will do:

CREATE TABLE temp SELECT isbn, subject, COUNT(id) AS copies FROM `book` GROUP BY isbn, subject HAVING COUNT(id) > 1

Here each combination of the isbn and subject columns in table book is supposed to be unique, and you want to find the combinations that occur more than once. You can run the SELECT on its own without creating the temp table to store the duplicates, but storing them comes in handy when the book table has something like 1 million records, so the expensive grouping only runs once.
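To try the idea without a MySQL server, here is a minimal sketch that runs the same GROUP BY ... HAVING query through Python's built-in sqlite3 module, which stands in for MySQL here. The sample rows and the find_duplicates helper are made up for illustration.

```python
import sqlite3


def find_duplicates(conn):
    """Return (isbn, subject, copies) for each combination stored more than once."""
    return conn.execute(
        "SELECT isbn, subject, COUNT(id) AS copies "
        "FROM book GROUP BY isbn, subject HAVING COUNT(id) > 1 "
        "ORDER BY isbn, subject"
    ).fetchall()


# Hypothetical sample data: only ('111', 'Science') is stored twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, isbn TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO book (isbn, subject) VALUES (?, ?)",
    [("111", "Science"), ("111", "Science"), ("111", "Education"), ("222", "Science")],
)
print(find_duplicates(conn))  # [('111', 'Science', 2)]
```

Selecting the grouped columns themselves (isbn, subject) rather than id tells you which combination is duplicated, whichever row the database happens to pick from each group.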

To go further and extract the unique rows rather than just find the duplicates, see the post on collecting unique rows into a new table.

{ Comments on this entry are closed }

Today I’m optimizing some MySQL tables with a large number of records, about 1 million each. It’s simply impossible to work with that much data without proper indexing, so there I was, adding a variety of indexes to a few of the columns.

I expected it to take a few hours on tables this size. What I didn’t expect is that after 2 days the index creation still seems to be running, and none of the tables respond to any query. I can’t even select the first record. It takes *forever*, from both phpMyAdmin and the mysql shell.

What’s the catch? I’ve no idea. But let’s at least free those poor tables from choking first. After I dropped the index I was trying to add, from the mysql command line, all the tables returned to normal.


DROP INDEX culprit ON bigtable

But the real culprit still eludes me. Any ideas? I still need to add those indexes.

PHP can execute shell commands, which means it can compress selected files into a tar archive:

exec('tar zcf files.tar.gz file1 file2 file3');

Or make a zip file:

exec('zip files.zip file1 file2 file3');

You can then use PHP to serve the archive to your visitor as a download.

There are times when you need to store a file (such as one that you sell for profit) outside of the document root of your domain and let the buyers download it via a PHP script so as to hide the real path, web address or URL to that file. Use of this approach enables you to:

  1. Check for permissions first before rendering the file download thus protecting it from being downloaded by unprivileged visitors.
  2. Store the file outside of the web document directory of that domain – a good practice in web security in protecting sensitive and important data.
  3. Count the number of downloads and collect other useful download statistics.

Now the actual tip. Assuming the file to be downloaded sits at /home/someuser/products/data.tar.gz, write a PHP file with the following content and put it in the web document directory where your site visitors can access it:

$path = '/home/someuser/products/data.tar.gz'; // the file made available for download via this PHP file
$mm_type = 'application/octet-stream'; // change to match the file type of $path; in most cases there is no need

header('Pragma: public');
header('Expires: 0');
// One Cache-Control header: a second header() call with the same name would replace the first.
header('Cache-Control: public, must-revalidate, post-check=0, pre-check=0');
header('Content-Description: File Transfer');
header('Content-Type: ' . $mm_type);
header('Content-Length: ' . filesize($path));
header('Content-Disposition: attachment; filename="' . basename($path) . '"');
header('Content-Transfer-Encoding: binary');

readfile($path); // outputs the content of the file
exit; // stop here so nothing else is appended to the download


Now your site visitors can download the protected file, and only through the PHP script.

MySQL’s InnoDB engine does support real FOREIGN KEY constraints, but here I use the term loosely for a column that references a row in another table, whether or not a constraint is declared.

Let’s say 2 tables are related: book and subject. Every book in the book table falls into one of the subjects, so it has a subject column referencing the ID of a row in the subject table.

You need to know the number of books in each subject, for example 243 books in Education, 77 books in Science, and so on.

Now how do we do this? Simple. Just perform the following SQL query:

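SELECT subject.name, COUNT(book.id) FROM subject, book WHERE book.subject = subject.id GROUP BY subject.id, subject.name

Here subject.name and book.id are assumed column names; substitute the name and primary key columns of your own tables.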
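A runnable sketch of the per-subject count, using Python's built-in sqlite3 module as a stand-in for MySQL. The column names id, name and title and the sample rows are assumptions for illustration. It uses a LEFT JOIN so that subjects without any books show up with a count of 0.

```python
import sqlite3


def books_per_subject(conn):
    """Return (subject name, number of books) pairs, including empty subjects."""
    return conn.execute(
        "SELECT subject.name, COUNT(book.id) "
        "FROM subject LEFT JOIN book ON book.subject = subject.id "
        "GROUP BY subject.id, subject.name "
        "ORDER BY subject.name"
    ).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, subject INTEGER)")
conn.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [(1, "Education"), (2, "Science"), (3, "History")],
)
conn.executemany(
    "INSERT INTO book (title, subject) VALUES (?, ?)",
    [("A", 1), ("B", 1), ("C", 2)],
)
print(books_per_subject(conn))  # [('Education', 2), ('History', 0), ('Science', 1)]
```

The join condition belongs in ON (or WHERE), before the grouping. Filtering with HAVING after grouping a plain cross join would count every book once per subject row.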

MySQL’s multi-row INSERT has no direct counterpart for UPDATE: a plain UPDATE statement cannot take a list of rows, each with its own new value. When you need to update tens of thousands or even millions of records, running one UPDATE query per row is too much.

Reducing the number of database queries is one of the most effective ways to optimize an SQL application.

So, is there a way to get the effect of millions of single-row UPDATE queries with just a few MySQL queries?

Yes, but you need a temporary table.

Step 1: Create a temporary table

This table should have 2 columns: 1) an ID column that references the primary key of the original record in the original table, 2) a column containing the new value.

To fill this table you can use INSERT queries, since a single INSERT can conveniently insert many rows at once, as many as you need, such as 1000 at a time.

Step 2: Transfer the new values from the temporary table to the intended table

Use the query below:

UPDATE original_table, temp_table SET original_table.update_column = temp_table.update_column WHERE original_table.id = temp_table.original_id

This transfers all the new values from the temporary table to the original table. You have now updated the original table with far fewer than a million queries, probably just a hundred or so.
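The two steps can be sketched end to end with Python's built-in sqlite3 module standing in for MySQL. SQLite does not accept MySQL's multi-table UPDATE syntax, so this sketch swaps in a correlated subquery for step 2, and executemany stands in for the multi-row INSERT. The bulk_update helper and the sample rows are hypothetical.

```python
import sqlite3


def bulk_update(conn, new_values):
    """Apply (original_id, new_value) pairs to original_table via a temporary table."""
    # Step 1: load all new values into a temporary table.
    conn.execute(
        "CREATE TEMP TABLE temp_table (original_id INTEGER PRIMARY KEY, update_column TEXT)"
    )
    conn.executemany("INSERT INTO temp_table VALUES (?, ?)", new_values)
    # Step 2: one UPDATE copies every new value across; unmatched rows stay untouched.
    conn.execute(
        "UPDATE original_table SET update_column = ("
        " SELECT temp_table.update_column FROM temp_table"
        " WHERE temp_table.original_id = original_table.id"
        ") WHERE id IN (SELECT original_id FROM temp_table)"
    )
    conn.execute("DROP TABLE temp_table")


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INTEGER PRIMARY KEY, update_column TEXT)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [(1, "old"), (2, "old"), (3, "old")],
)
bulk_update(conn, [(1, "new-1"), (3, "new-3")])
print(conn.execute("SELECT id, update_column FROM original_table ORDER BY id").fetchall())
# [(1, 'new-1'), (2, 'old'), (3, 'new-3')]
```

The WHERE id IN (...) guard matters in the subquery form: without it, rows that have no entry in the temporary table would be set to NULL.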

UPDATE =============================================

Here’s an approach that is perhaps better in most cases: use MySQL’s CASE ... WHEN ... ELSE ... END syntax. For example, to update multiple rows in the options table of your web application:

UPDATE `options` SET `value` = CASE `name`
WHEN 'site_name' THEN 'My Blog'
WHEN 'site_url' THEN ''
WHEN 'site_email' THEN ''
ELSE `value`
END

Make sure you include the ELSE `value`, because without it every row not mentioned in a WHEN clause is set to NULL.
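A small experiment shows what the ELSE branch protects against. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the theme row and the run_update helper are made up for illustration.

```python
import sqlite3


def run_update(update_sql):
    """Run update_sql against a fresh two-row options table and return {name: value}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE options (name TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO options VALUES (?, ?)",
        [("site_name", "Old Blog"), ("theme", "default")],
    )
    conn.execute(update_sql)
    return dict(conn.execute("SELECT name, value FROM options ORDER BY name"))


with_else = run_update(
    "UPDATE options SET value = CASE name WHEN 'site_name' THEN 'My Blog' ELSE value END"
)
without_else = run_update(
    "UPDATE options SET value = CASE name WHEN 'site_name' THEN 'My Blog' END"
)
print(with_else)     # {'site_name': 'My Blog', 'theme': 'default'}
print(without_else)  # {'site_name': 'My Blog', 'theme': None}
```

The theme row is not mentioned in any WHEN clause, so it keeps its value only when the ELSE branch is present.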

Update multiple records in a table by `ID` =============================================

Below is an example of updating more than one row in a table by the id field. The field_name field is set to value_1, value_2, or value_3 when id equals 1, 2, or 3 respectively:

UPDATE `table_name` SET `field_name` = CASE `id`
WHEN '1' THEN 'value_1'
WHEN '2' THEN 'value_2'
WHEN '3' THEN 'value_3'
ELSE `field_name`
END
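When the id/value pairs come from application data, the CASE statement can be generated rather than typed by hand. Below is a rough sketch in Python, using the built-in sqlite3 module as a stand-in for MySQL. The build_case_update helper is hypothetical, and the ? placeholders are SQLite's (most MySQL drivers for Python use %s instead).

```python
import sqlite3


def build_case_update(table, field, values_by_id):
    """Build one UPDATE that sets `field` to a different value for each id.

    `table` and `field` are inserted into the SQL text directly, so they must
    be trusted identifiers; the ids and values are passed as parameters.
    """
    if not values_by_id:
        raise ValueError("values_by_id must not be empty")
    whens = " ".join("WHEN ? THEN ?" for _ in values_by_id)
    id_list = ", ".join("?" for _ in values_by_id)
    sql = (
        f"UPDATE {table} SET {field} = CASE id {whens} ELSE {field} END "
        f"WHERE id IN ({id_list})"
    )
    # Parameters follow placeholder order: (id, value) pairs first, then the ids again.
    params = [p for pair in values_by_id.items() for p in pair] + list(values_by_id)
    return sql, params


# Hypothetical sample data: four rows, three of which get new values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, field_name TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, 'old')", [(1,), (2,), (3,), (4,)])
sql, params = build_case_update(
    "table_name", "field_name", {1: "value_1", 2: "value_2", 3: "value_3"}
)
conn.execute(sql, params)
print(conn.execute("SELECT id, field_name FROM table_name ORDER BY id").fetchall())
```

The extra WHERE id IN (...) clause is optional given the ELSE branch, but it keeps the database from rewriting rows that are not being changed.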

Yet another way =============================================

There’s also another way of doing this:


To find part of a string and replace it with another string in a MySQL query, use REPLACE(), the string replace function.

For example, you may want to replace every space in one of the varchar or text fields with a dash:

UPDATE `table` SET `old_field` = REPLACE(`old_field`, ' ', '-')

Or you can create a new column / field and fill it from the old one:

UPDATE `table` SET `new_field` = REPLACE(`old_field`, ' ', '-')
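To see what REPLACE() does to actual values, here is a minimal sketch using Python's built-in sqlite3 module, whose REPLACE() function behaves the same way for this case. The article table and its rows are invented for illustration.

```python
import sqlite3


def fill_new_field(conn):
    """Copy old_field into new_field with every space replaced by a dash."""
    conn.execute("UPDATE article SET new_field = REPLACE(old_field, ' ', '-')")
    return conn.execute(
        "SELECT old_field, new_field FROM article ORDER BY id"
    ).fetchall()


# Hypothetical sample data, including a value with two consecutive spaces.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, old_field TEXT, new_field TEXT)"
)
conn.executemany(
    "INSERT INTO article (old_field) VALUES (?)",
    [("hello big world",), ("two  spaces",), ("nospace",)],
)
for old, new in fill_new_field(conn):
    print(repr(old), "->", repr(new))
```

Every occurrence is replaced one for one, so a run of two spaces becomes two dashes rather than a single one.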

One of the principles of data integrity is to reduce data redundancy as much as possible and wipe out duplicate table entries. Another reason to remove duplicates is that you want to add a unique index on one column or a group of columns, which requires that column or combination of columns to be unique across all rows of the table.

There are basically 2 ways to achieve that:

  1. Identify duplicate entries, keep just one of each and delete the rest.
  2. Identify only the unique table rows and collect them into a new table.

I’ll just go ahead with the 2nd approach.

There are in turn 2 ways to accomplish this task, namely SELECT DISTINCT … FROM … and SELECT … FROM … GROUP BY …


Rather intuitively, this SQL query selects every distinct combination of the listed columns from a table and leaves out the duplicates:

CREATE TABLE new_orders SELECT DISTINCT receipt, quantity, price FROM orders

One thing to note is that you have to leave out the original primary key, such as ID, so that rows are compared for uniqueness only on the necessary data fields. Once this creation query has completed successfully, add a primary key column to the new_orders table.


The GROUP BY clause is usually used with aggregate functions of MySQL such as COUNT() or AVG() to compute a result per group of rows. In this situation, however, it comes in handy that it collapses multiple table records with the same value in the specified column into a single row in the returned results.

For example, if you need to include only one record per distinct value of the receipt column from the old table orders:

CREATE TABLE new_orders SELECT * FROM orders GROUP BY receipt

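Now every record in the new table new_orders has a distinct receipt value. Note that this relies on MySQL’s permissive GROUP BY handling: which row of each group is kept is not defined, and with the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5) the query is rejected.

Depending on whether the original records are referenced by other tables via ID, you can either keep the IDs in the new table new_orders or regenerate them.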

I joined the Dynadot affiliate program at the launch of .tel domains and tried my luck getting some domainers to register a domain with them.

Maybe they just don’t care about expanding via affiliates, or maybe the system is a little buggy or not so affiliate friendly: once the referred friend has already signed up for an account with them, the newly registered domain and its commission will not be credited to you at all, even if they clicked through your affiliate link to register the domain at Dynadot.

Therefore, make sure you send your friends or site visitors the affiliate link before they register an account with Dynadot, so that they create the account through your affiliate link. The other way round won’t work for sure, and I’m still not sure this way works either.

The Dynadot domain registration affiliate program feels a little fishy and is not worth the while.

The Amazon affiliate system does a great job of presenting a variety of ways to make associate links, so that you can take credit for products that you have referred to others. Today I messed around a little with their generators and finally came to a simple way to generate affiliate links dynamically with server side programming.

The simplest way to link to an Amazon product and take the referral credit is to use this link:

Here 0029724600 is the ASIN of the product (for books, the same as the 10-digit ISBN) and maawe-20 is my Amazon affiliate tracking ID.

Or this:

These come in quite handy when you build a book site with thousands of books, possibly on the same subject, and you have a database that contains the exact ISBN for each book record. Now that you know Amazon affiliate links can be built this way, it’s a breeze to insert the 10-digit ISBN as the ASIN into the URL template together with your own affiliate ID.
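Filling the template for each book record might look roughly like the sketch below, written in Python. The template string is a placeholder on example.com and not Amazon's actual link format, which you should take from their link generator. The function names and the yourtag-20 tracking ID are also hypothetical. The ISBN-10 check uses the standard check digit rule: the weighted sum of the digits must be divisible by 11.

```python
from urllib.parse import quote

DIGITS = "0123456789"


def is_valid_isbn10(isbn):
    """Check the length, the characters and the mod-11 check digit of an ISBN-10."""
    if len(isbn) != 10:
        return False
    body, last = isbn[:9], isbn[9].upper()
    if any(c not in DIGITS for c in body) or last not in DIGITS + "X":
        return False
    values = [int(c) for c in body] + [10 if last == "X" else int(last)]
    # Weights run from 10 down to 1; a valid ISBN-10 sums to a multiple of 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def build_affiliate_link(template, isbn10, tracking_id):
    """Fill {asin} and {tag} in template, refusing anything that is not an ISBN-10."""
    if not is_valid_isbn10(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    return template.format(asin=quote(isbn10, safe=""), tag=quote(tracking_id, safe=""))


# Placeholder template, NOT the real Amazon URL format.
TEMPLATE = "https://example.com/dp/{asin}?tag={tag}"
print(build_affiliate_link(TEMPLATE, "0029724600", "yourtag-20"))
# https://example.com/dp/0029724600?tag=yourtag-20
```

Validating the ISBN before building the link catches typos in the book database early, instead of sending visitors to a dead product page.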
