Naturally, using regular expressions for SELECT queries to check if certain text strings or text patterns are residing somewhere in large chunks of data is the most resource-intensive option and thus your last option. However it’s somehow unavoidable to practice regular expressions in the SQL queries for complicated patterns. For example, word boundaries are a common reason why you want regular expressions in SQL:
SELECT * FROM articles WHERE content REGEXP '[[:<:]]MySQL efficiency[[:>:]]'
This query searches for any entry in table articles that has a ‘MySQL efficiency’ phrase in the content field. In a table as large as 100,000 rows, it’d definitely take more than 0.5 seconds which is a rather outrageous length and would very probably take down the server if large volumes of searches flood in.
So, let’s consider the LIKE clause:
SELECT * FROM articles WHERE content LIKE '%MySQL efficiency%'
Much better, the query time is instantly reduced to 0.01 seconds or so. The problem is % is simply not the right symbol for a word boundary because it matches:
- theMySQL efficiency-
In this case of ‘MySQL efficiency’, there won’t be much of a problem because the chance of ‘theMySQL efficiency’ to appear is rather slim. However, consider the case of ‘War’, you can’t use LIKE clause for it because it’d also match ‘Edward’ and so forth. You get the idea, using LIKE alone in this manner is incorrect.
Eventually, we’ve got something good to talk about. This solution can address both of the problems, that is, to combine LIKE and REGEXP / RLIKE together in the query.
SELECT * FROM articles WHERE content LIKE '%MySQL efficiency%' AND content REGEXP '[[:<:]]MySQL efficiency[[:>:]]'
This way, not only the query time is reduced to approximately 0.025 seconds or less because of the LIKE clause, but the phrase can also be actually matched to the real thing thanks to the REGEXP clause.
Probably in that MySQL will first process the LIKE clause and then pass the filtered results to be further processed by the REGEXP clause. Though the REGEXP clause is much more resource-intensive, as there’s a lot less to be processed after being helped by the LIKE clause, the query time is considerably reduced.