
PHP preg_match() First Letter of a WordPress Post for Drop Cap Styling

While CSS3 can target the first letter of text inside an element (::first-letter), it’s still not universally supported across major browsers, and it doesn’t work well for elements that contain child elements. The bulletproof way to target the first letter of a WordPress post is to capture the post content in your WordPress theme and match it with PHP’s regular expression functions, such as preg_replace().

Here’s the code I came up with for this job:

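// Capture the post content instead of printing it straight away
ob_start();
the_content();
$content = ob_get_clean();

// Wrap the first visible character after the first <p> in a span
$content = preg_replace(
	'@<p>\s*((?:<[^<>]+>\s*)*)([^<>\s])@',
	'<p>$1<span class="drop_cap">$2</span>',
	$content,
	1 // limit: replace only the first match
);

echo $content;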

This code belongs in single.php of your WordPress theme, where the post content is output: replace the the_content() call in the post area with this snippet.

The key is the regular expression:

@<p>\s*((?:<[^<>]+>\s*)*)([^<>\s])@

And the replacement string:

<p>$1<span class="drop_cap">$2</span>

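The pattern matches the first <p> tag, skips any whitespace after it, captures any tags that immediately follow (such as <strong> or <em>) as $1, and captures the first character that is neither whitespace nor part of a tag (a letter, a numeral, a quotation mark, etc.) as $2. The replacement puts the tags back and wraps that character in a <span> with class “drop_cap”. Note that the pattern looks for a bare <p>, so a paragraph tag with attributes, such as <p class="intro">, will not match.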
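To try the pattern outside WordPress, here is a sketch that wraps the same preg_replace() call in a hypothetical add_drop_cap() function and runs it on a small HTML string:

```php
<?php
// Hypothetical helper: applies the drop cap pattern to any HTML string.
function add_drop_cap($html) {
	return preg_replace(
		'@<p>\s*((?:<[^<>]+>\s*)*)([^<>\s])@',
		'<p>$1<span class="drop_cap">$2</span>',
		$html,
		1 // only the first paragraph gets the drop cap
	);
}

echo add_drop_cap("<p>  <strong>Hello</strong> world</p>\n<p>Second paragraph</p>"), "\n";
// <p><strong><span class="drop_cap">H</span>ello</strong> world</p>
// <p>Second paragraph</p>
```

Adding the u modifier to the pattern would make [^<>\s] match a whole UTF-8 character instead of a single byte, which matters when a post starts with a non-ASCII letter. In a theme, such a function could presumably also be hooked in with add_filter('the_content', 'add_drop_cap') instead of output buffering, though that applies it wherever the the_content filter runs, not only in single.php.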

Finally, add drop cap styles for the .drop_cap class to your theme’s style.css, and the first letter of each post will get a nice drop cap. See the blog Mosso Reviews for an example.


A small mistake in a regular expression caused the connection to reset – (.+)+

I was doing something with a regular expression when, oddly, the connection kept being reset every time I refreshed the web page.

I narrowed down the problematic line by removing the code in functional chunks. It finally came down to a preg_match() call in which a small, accidentally mistyped bit of the regular expression caught my attention:

(.+)+

I got rid of the second plus sign:

(.+)

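And everything worked again. The nested quantifier (.+)+ lets the regex engine split the same text into groups in exponentially many ways, so when the overall match fails, it tries them all (catastrophic backtracking). Depending on the PHP and PCRE versions, this either stops at PCRE’s backtracking limit or exhausts the stack and crashes the PHP process, which would plausibly explain the reset connection.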
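To see why the extra plus sign matters, here is a small sketch (not the original code, which isn’t shown) comparing (.+) and (.+)+ on a subject that neither can match. The lowered pcre.backtrack_limit and the disabled JIT are assumptions made only so the runaway search is cut short right away:

```php
<?php
// Assumption: a tiny limit and no JIT, so the failure shows up instantly.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// "." does not match "\n", so "$" can never be reached after the letters.
$subject = str_repeat('a', 30) . "\nb";

$flat = preg_match('/^(.+)$/', $subject);    // about 30 attempts, then "no match"
$nested = preg_match('/^(.+)+$/', $subject); // 2^29 ways to split 30 letters into groups
$nestedError = preg_last_error();

var_dump($flat);                                       // int(0)
var_dump($nested);                                     // bool(false)
var_dump($nestedError === PREG_BACKTRACK_LIMIT_ERROR); // bool(true)
```

Both patterns describe the same strings, but only the flat one fails quickly; the nested one is aborted and preg_match() returns false, which is easy to mistake for a plain “no match” if the result is only checked for truthiness.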


Regular Expression for Date and Time Strings

Forms often need users to enter a valid date or time string. But how do you validate such strings with regular expressions? In PHP, you can use the following functions and regular expressions.

RegExp and function to validate against date string:

// Default: YYYY-MM-DD
function isDate($subject, $separator = '-') {
	$sep = preg_quote($separator, '@'); // escape separators such as "." that are special in regexes
	return preg_match('@^\d{4}' . $sep . '(0[1-9]|1[0-2])' . $sep . '(0[1-9]|1[0-9]|2[0-9]|3[0-1])$@', $subject) === 1;
}

RegExp and function to validate against time string:

// Default: HH:MM:SS (24-hour clock, 00:00:00 to 23:59:59)
function isTime($subject, $separator = ':') {
	$sep = preg_quote($separator, '@');
	return preg_match('@^([01][0-9]|2[0-3])' . $sep . '([0-5][0-9])' . $sep . '([0-5][0-9])$@', $subject) === 1;
}

If you need to validate a format with a different separator, such as 2016/04/30, just pass that separator as the second argument, e.g. isDate($subject, '/').

Now that you have the functions to validate date and time, you can combine them to verify date time strings such as 2016-04-30 18:19:05:

// Accepts "YYYY-MM-DD HH:MM:SS" and the all-zero placeholder (MySQL's "zero" date)
function isDateTime($subject) {
	if ($subject === '0000-00-00 00:00:00') {
		return true;
	}
	$subject_array = explode(' ', $subject);
	return count($subject_array) === 2 && isDate($subject_array[0]) && isTime($subject_array[1]);
}
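The patterns above only check the shape of a date, so a string such as 2016-02-30 passes isDate(). As a sketch (the function name isCalendarDate() is hypothetical), PHP’s built-in checkdate() can reject impossible calendar dates once the regular expression has captured the year, month and day:

```php
<?php
// Hypothetical helper: format check by regex, calendar check by checkdate().
function isCalendarDate($subject, $separator = '-') {
	$sep = preg_quote($separator, '@');
	$pattern = '@^(\d{4})' . $sep . '(0[1-9]|1[0-2])' . $sep . '(0[1-9]|[12][0-9]|3[01])$@';
	if (preg_match($pattern, $subject, $m) !== 1) {
		return false; // wrong shape, e.g. "2016-4-30"
	}
	// $m[1] = year, $m[2] = month, $m[3] = day; checkdate() takes month, day, year
	return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(isCalendarDate('2016-04-30'));      // bool(true)
var_dump(isCalendarDate('2016-02-30'));      // bool(false): February has no 30th
var_dump(isCalendarDate('2016/02/29', '/')); // bool(true): 2016 is a leap year
```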

These are the functions I use at Form Kid for fields that need date and time validation.


Regular Expressions for Natural Numbers or Positive Integers (1, 2, 3, …), Negative Integers and Non-negative Integers

While developing an online form creator that lets users create form fields accepting only certain types of numbers, I needed to verify whether a given string is a valid natural number such as 1, 2, 3, 4, …. I write the functions in PHP, but the regular expressions work in other programming languages as well. I use the following function to check whether a string is a natural number (positive integer):

function isNaturalNumber($subject) {
	return preg_match('|^[1-9][0-9]*$|', $subject);
}

To allow an optional leading plus sign as well, escape the plus; unescaped, + is a quantifier rather than a literal sign:

^\+?[1-9][0-9]*$

Regular Expression for Negative Integers?

Negative integers are -1, -2, -3, …. Just add a minus sign before the regular expression for positive integers:

^-[1-9][0-9]*$

Regular Expression for Non-negative Integers?

That is, 0, 1, 2, 3, 4, …. With a little help from the isNaturalNumber() function, you can check whether a string is a valid non-negative integer:

function isNonNegativeInteger($subject) {
	// Equivalent regex: @^(0|[1-9][0-9]*)$@
	// Strict comparison, so "00" or "0.0" are not mistaken for "0"
	return (string) $subject === '0' || isNaturalNumber($subject);
}

Or if you insist on using a regular expression:

function isNonNegativeInteger($subject) {
	return preg_match('@^(0|[1-9][0-9]*)$@', $subject);
}

PHP functions to check if a string is a valid integer?

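Just combine the functions above. Note that PHP’s native is_integer() (an alias of is_int()) is not a substitute here: it checks the variable’s type, not its content, so it returns false for a form value such as the string "42". The built-in way to validate integer strings is filter_var() with FILTER_VALIDATE_INT.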

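function isNegativeInteger($subject) {
	return preg_match('|^-[1-9][0-9]*$|', $subject) === 1;
}
function isInteger($subject) {
	return isNegativeInteger($subject) || isNonNegativeInteger($subject);
}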
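Here is a minimal sketch collecting the integer validators in one self-contained script. It makes two assumptions beyond the post: the D modifier, which makes $ match only at the very end so "12\n" is rejected, and a string cast so integer arguments behave like their string form. It also shows the is_int() pitfall next to filter_var() with FILTER_VALIDATE_INT:

```php
<?php
// "D" (an addition to the post's patterns): "$" matches only at the very end.
function isNaturalNumber($s) {
	return preg_match('/^[1-9][0-9]*$/D', (string) $s) === 1;
}
function isNegativeInteger($s) {
	return preg_match('/^-[1-9][0-9]*$/D', (string) $s) === 1;
}
function isNonNegativeInteger($s) {
	return preg_match('/^(0|[1-9][0-9]*)$/D', (string) $s) === 1;
}
function isInteger($s) {
	return isNegativeInteger($s) || isNonNegativeInteger($s);
}

foreach (['42', '+42', '0', '-7', '007', '4.2', "12\n"] as $input) {
	printf("%-8s natural: %s, integer: %s\n", json_encode($input),
		isNaturalNumber($input) ? 'yes' : 'no', isInteger($input) ? 'yes' : 'no');
}

// For comparison: is_int() looks at the type, not the content of a string.
var_dump(is_int('42'));                          // bool(false)
var_dump(filter_var('42', FILTER_VALIDATE_INT)); // int(42)
```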

PHP: Check or Validate URL and Email Addresses – an Easier Way than Regular Expressions, the filter_var() Function

To check if a URL or an email address is valid, the common solution is regular expressions. For instance, to validate an email address in PHP, I would use:

// TLDs can be longer than four letters (e.g. .museum), hence {2,}
if (preg_match('|^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$|i', $email)) {
	// $email is valid
}

A simpler and more forgiving one would be:

|^\S+@\S+\.\S+$|

This is usually enough for signup forms to catch silly typos. You validate the address with a confirmation link sent to it anyway, and that link is the final word on whether the address is valid. For those who are obsessively curious, this may serve you well.

For URL, you can use this one:

|^\S+://\S+\.\S+.+$|

Or you can use one that is insanely detailed in defining what a valid URL should look like.

The filter_var() function of PHP5

What this post is really about is PHP’s filter_var() function, available since PHP 5.2, which greatly simplifies URL and email validation. To validate an email:

if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
	// $email contains a valid email
}

To validate a URL:

if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
	// $url contains a valid URL
}
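To see how much more forgiving the simple regular expression is, here is a small comparison sketch. The addresses are made-up test inputs, and the loose pattern is |^\S+@\S+\.\S+$|:

```php
<?php
// The loose signup-form regex versus filter_var() on a few test inputs.
$looseRegex = '|^\S+@\S+\.\S+$|';

foreach (['user@example.com', 'user@@example.com', 'not an email'] as $email) {
	$byRegex = preg_match($looseRegex, $email) === 1;
	$byFilter = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
	printf("%-20s regex: %s, filter_var: %s\n", $email,
		$byRegex ? 'valid' : 'invalid', $byFilter ? 'valid' : 'invalid');
}
```

The doubled @ slips through the loose pattern because \S+ happily swallows an @, while FILTER_VALIDATE_EMAIL rejects it.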

filter_var() returns the filtered data on success and false on failure. With a validation filter such as FILTER_VALIDATE_EMAIL or FILTER_VALIDATE_URL, the value you get back is the validated input itself; the filter does not dig a valid address out of surrounding text. A tidy pattern is to filter first, then use the result if it is good or abandon it when it is false:

$filtered_email = filter_var($email, FILTER_VALIDATE_EMAIL);
if ($filtered_email !== false) {
	// $filtered_email holds the validated address
} else {
	// $email is not a valid email address
}

The same applies to FILTER_VALIDATE_URL. Here’s a full list of filter types you can use with filter_var().
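FILTER_VALIDATE_URL does not restrict the scheme, so an ftp:// URL, for example, passes as well. If a form field should only take web addresses, one option, sketched here with a hypothetical isWebUrl() helper, is to combine filter_var() with parse_url() and check the scheme:

```php
<?php
// Hypothetical helper: a URL that filter_var() accepts AND uses http(s).
function isWebUrl($url) {
	if (filter_var($url, FILTER_VALIDATE_URL) === false) {
		return false;
	}
	$scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
	return in_array($scheme, ['http', 'https'], true);
}

var_dump(isWebUrl('https://example.com/page?id=1')); // bool(true)
var_dump(isWebUrl('ftp://example.com/file.txt'));    // bool(false): valid URL, wrong scheme
var_dump(isWebUrl('example.com'));                   // bool(false): no scheme at all
```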


PHP: Subject String Length Limit of Regular Expression Matching Functions

Here’s a quick tip for anyone who has run into this problem: PHP’s regular expression functions, such as preg_match() and preg_replace(), stop working when the input (the subject string to be searched or matched) is too long. If you believe your regular expression should work but it doesn’t, and the subject string is perhaps over 100 kB long, you have probably hit PCRE’s backtracking limit, set by the configuration directive pcre.backtrack_limit. In that case preg_match() returns false and preg_replace() returns null instead of a result, and preg_last_error() returns PREG_BACKTRACK_LIMIT_ERROR.

To solve this and raise the limit, perhaps to 10 times the original, override the default value of pcre.backtrack_limit in one of the following ways:

  1. If you are using cPanel, create a text file named php.ini in the directory where you need the higher limit, and add this line to it:
    pcre.backtrack_limit = 1000000
  2. If you run your own dedicated or VPS server, add this line at the end of php.ini:
    pcre.backtrack_limit = 1000000
    Refer to this article to find out where your php.ini is.
  3. Use the runtime configuration function ini_set() to set it at runtime:
    ini_set('pcre.backtrack_limit', '1000000');

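This mainly seems to affect older versions such as PHP 5.2: the default pcre.backtrack_limit was 100000 before PHP 5.3.7 and has been 1000000 since, so on newer versions you would need a value above 1000000 to lift the limit further.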
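Whichever way you raise the limit, it helps to detect when PCRE gives up instead of silently treating the result as empty. Below is a minimal sketch of a hypothetical safeReplace() wrapper (the name is illustrative, not part of PHP) that checks preg_replace()’s null return and reports preg_last_error() together with the current pcre.backtrack_limit:

```php
<?php
// Hypothetical wrapper (not a PHP built-in): preg_replace() returns null
// when PCRE aborts, e.g. at pcre.backtrack_limit, instead of a string.
function safeReplace($pattern, $replacement, $subject) {
	$result = preg_replace($pattern, $replacement, $subject);
	if ($result === null) {
		$error = preg_last_error(); // e.g. PREG_BACKTRACK_LIMIT_ERROR
		throw new RuntimeException(
			'preg_replace() failed with error code ' . $error
				. ' (pcre.backtrack_limit = ' . ini_get('pcre.backtrack_limit') . ')',
			$error
		);
	}
	return $result;
}

// Normal use: collapse runs of whitespace into single spaces.
echo safeReplace('/\s+/', ' ', "long   subject\n string"), "\n"; // long subject string
```

When the exception fires, you can raise pcre.backtrack_limit with ini_set() as described above and retry, or log the offending input for a closer look.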


PHP: Generating Summary Abstract from a Text or HTML String, Limiting by Words or Sentences

On index or transitional pages, such as the WordPress homepage or category pages, you don’t want to show the full text of your articles yet, just a snippet of the first few sentences or words as a summary, with a “read more” link to the actual article.

This is generally good for SEO, as it reduces duplicate content on your site, and it increases page views. In WordPress you can achieve this simply with a plugin named Evermore. With a home-made CMS, however, you will have to write a little code of your own to select and display content abstracts.

While you may be better off doing this in plain SQL, which I’m not an expert in, here is a little PHP trick that accomplishes the same task.

Full HTML Text
$text = <<<TEXT
I wrote a <a href="#">blog post</a> yesterday about Chinese web design fonts. What did you think? It appeared that many are very interested. I guess it's the language barriers and cultural differences that make the westerners eager to know more about us. All right then, let me write more about that and maybe start a <strong>brand new domain</strong> for it. Stay tuned!
TEXT;
The Problem – select first sentences

Select and display the first 3 sentences (max) of the full HTML text above.

The Solution
<?php
// Up to 3 "sentences": runs of non-terminators, each closed by . ! or ?
preg_match('/^([^.!?]*[\.!?]+){0,3}/', strip_tags($text), $abstract);
echo $abstract[0];
?>

Output:

I wrote a blog post yesterday about Chinese web design fonts. What did you think? It appeared that many are very interested.

Stripping the HTML tags before building the summary prevents invalid HTML snippets: otherwise the cut could slice an element in half, leaving part of a tag or an opening tag without its closing tag. You can preserve tags in the abstract, but that takes a more sophisticated algorithm.
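The one-off snippet can be wrapped into a reusable function. This is a sketch with a hypothetical name, firstSentences(); it also trims the leading space that each sentence after the first carries, and makes the sentence count a parameter:

```php
<?php
// Hypothetical helper built on the sentence regex from this post.
function firstSentences($html, $max = 3) {
	$plain = trim(strip_tags($html));
	// {0,N}: up to N runs of non-terminators, each closed by . ! or ?
	preg_match('/^([^.!?]*[.!?]+){0,' . (int) $max . '}/', $plain, $m);
	return trim($m[0]);
}

$html = 'I wrote a <a href="#">blog post</a> yesterday. What did you think? '
	. 'It appeared that many are very interested. Stay tuned!';

echo firstSentences($html, 2), "\n";
// I wrote a blog post yesterday. What did you think?
```

Because {0,N} can also match zero sentences, preg_match() always succeeds here and $m[0] is always set; text with abbreviations such as “e.g.” will still be cut early.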

Another Problem – select first words

You want an abstract of the first 30 words instead of sentences ending in punctuation such as ‘.’, ‘!’ and ‘?’.

The Solution

Simply modify the regular expression to:

/^([^.!?\s]*[\.!?\s]+){0,30}/

Output:

I wrote a blog post yesterday about Chinese web design fonts. What did you think? It appeared that many are very interested. I guess it's the language barriers and cultural

The last sentence is cut off, so you may want to append an ellipsis (‘…’) to show that the text continues.

In regular expressions, \s matches any whitespace character, including the space, tab and newline.
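A similar sketch for words, firstWords() (again a hypothetical name), appends the ‘…’ only when the text was actually cut. It makes one change to the pattern above, marked in the code: a word may also end at the end of the string, because with the original pattern a final word with no space or punctuation after it is never counted.

```php
<?php
// Hypothetical helper: first $max words of an HTML string, plus "…" if cut.
function firstWords($html, $max = 30) {
	$plain = trim(strip_tags($html));
	// Change from the post's pattern: "(?:[.!?\s]+|$)" lets the last word
	// end at the end of the string instead of requiring trailing whitespace.
	preg_match('/^([^.!?\s]*(?:[.!?\s]+|$)){0,' . (int) $max . '}/', $plain, $m);
	$abstract = rtrim($m[0]);
	// Add an ellipsis only when something was cut off.
	return strlen($abstract) < strlen($plain) ? $abstract . '…' : $abstract;
}

echo firstWords('Hello <b>big</b> world', 2), "\n"; // Hello big…
echo firstWords('Hello world', 5), "\n";            // Hello world
```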