PHP has a stat() function that returns an array of a file's metadata, such as its owner, size, and times of last access, last modification, and last change. It is essentially the counterpart of the stat command on Linux, which prints the file system metadata of any file or directory:

stat myfile.txt

Which returns:

  File: `myfile.txt'
  Size: 1707            Blocks: 8          IO Block: 4096   regular file
Device: 811h/2065d      Inode: 96909802    Links: 1
Access: (0644/-rw-r--r--)  Uid: (1354144/    voir)   Gid: (255747/pg940032)
Access: 2010-02-16 08:00:00.000000000 -0800
Modify: 2010-02-18 04:16:51.000000000 -0800
Change: 2010-02-18 04:16:51.000000000 -0800

To get the meta information of the current working directory:

stat .

Which returns:

  File: `.'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 811h/2065d      Inode: 96904945    Links: 4
Access: (0755/drwxr-xr-x)  Uid: (1354144/    voir)   Gid: (255747/pg940032)
Access: 2009-08-31 17:07:16.000000000 -0700
Modify: 2009-12-20 05:18:57.000000000 -0800
Change: 2009-12-20 05:18:57.000000000 -0800
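The same information is available from PHP. Here is a minimal sketch; 'myfile.txt' is a placeholder from the example above, so point it at any file that actually exists:

```php
<?php
// Minimal sketch: the PHP counterpart of the stat command above.
// 'myfile.txt' is a placeholder; substitute any existing file.
$file = 'myfile.txt';
if (file_exists($file)) {
    $info = stat($file);
    echo 'Size: '   . $info['size'] . " bytes\n";          // file size in bytes
    echo 'Mode: '   . decoct($info['mode'] & 0777) . "\n"; // permission bits, e.g. 644
    echo 'Uid: '    . $info['uid'] . '  Gid: ' . $info['gid'] . "\n";
    echo 'Access: ' . date('Y-m-d H:i:s', $info['atime']) . "\n";
    echo 'Modify: ' . date('Y-m-d H:i:s', $info['mtime']) . "\n";
    echo 'Change: ' . date('Y-m-d H:i:s', $info['ctime']) . "\n";
}
```

Note that stat() returns the same entries under both numeric and string keys; the string keys ('size', 'mtime', and so on) are the readable ones.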


Googlebot is Google's indexing crawler, which visits your site to fetch content and determine your search engine rankings. On a popular website with tens of thousands of pages, Googlebot may visit more often than you want, eating into your precious bandwidth and even crashing your server. Every crawler visit is no different from a user visit: your site has to perform all the actions and logic needed to render the page for the bot, including searching a database with potentially millions of records, which can take a while. Imagine Googlebot paying 300,000 visits a month to your site. That would be a substantial expenditure of bandwidth and server computing resources.

For example, one of my sites received 338,768 hits from a single Googlebot IP last month. I have no idea why Googlebot is so fanatical about this site, because it rarely has any backlinks and Google is not sending it any significant traffic either. But one thing is for sure: this site is causing some serious trouble on my hosting bills because of Googlebot. While we cannot block Googlebot from visiting our websites entirely, we can do something to slow it down a little.

There are two things you can do:

  1. Visit Google Webmaster Central at http://www.google.com/webmasters, sign in to Webmaster Tools, select the site in question (add it first if you haven't), go to Site configuration, then Settings, select Set custom crawl rate under Crawl rate, and drag the slider to slow the crawl rate down.
  2. Create a robots.txt file, place it at the root of your website, and put these two lines inside:
    User-agent: *
    Crawl-delay: 20

    Here the value of Crawl-delay is the time in seconds the search engine bot should wait between requests. 20 is a very slow setting; most search engines, including Google, wait less than 1 second between fetches on a moderately popular website.

Some argue that Google never respects the Crawl-delay option in robots.txt, and that the only way to decrease Googlebot's visiting frequency is to adjust the slider in Google Webmaster Central.


Textarea and text input are common HTML form controls that accept text input. They can be a security challenge because they allow the user to enter anything at all. If you use whatever data the user has entered as-is, your application is anything but secure. Some sort of filtering or white-listing must be in place to protect the integrity of the application, permitting only a few harmless HTML tags in the textarea.

The simplest way to reject any attempt to sneak HTML tags into the text box is the PHP function strip_tags():

$all_tags_filtered = strip_tags($_POST['message']);

Here $_POST['message'] is the text just submitted by a user, potentially containing all sorts of HTML tags. Thanks to strip_tags(), all the tags are gone in $all_tags_filtered, and the data is safe to use as plain text.

However, there are times when you want to keep a few simple tags for the user's convenience, such as <p>, <strong> and <em>. To do this, pass a second parameter to strip_tags():

$some_tags_filtered = strip_tags($_POST['message'], '<p><strong><em>');

So <p>, <strong> and <em> elements are kept intact, while all other tags are removed in $some_tags_filtered.

One important thing to note is that strip_tags() does not check the attributes of the allowed HTML tags. Attributes such as style="" and onmouseover="" survive in the filtered result, which can lead to other security problems. You have to use regular expressions, or a dedicated HTML filtering library, to strip them out and block such attacks.
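A follow-up pass over the allowed tags can remove their attributes. The strip_attributes() helper below is a name made up for illustration, and this simple regular expression is only a sketch; a real application should prefer a full HTML filtering library, since attribute values containing a ">" character will confuse it:

```php
<?php
// Sketch: after strip_tags(), drop all attributes from the remaining
// allowed tags. strip_attributes() is an illustrative helper, not a
// library function; it keeps only the tag name of each opening tag.
function strip_attributes($html) {
    return preg_replace('/<([a-z][a-z0-9]*)\b[^>]*>/i', '<$1>', $html);
}

$input = '<p style="color:red" onmouseover="alert(1)">Hi <strong>there</strong></p>';
$kept  = strip_tags($input, '<p><strong><em>');
echo strip_attributes($kept); // <p>Hi <strong>there</strong></p>
```

Closing tags such as </strong> are untouched because the pattern requires the character after "<" to be a letter.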


$7.49 GoDaddy .com renewal coupon code

by Yang Yang on February 12, 2010

Update: Here's the latest GoDaddy coupon code – $1.49 / year .com

A quick deal for my readers. I found this GoDaddy coupon code that lets you renew .com domains at just $7.49:

Zine10

I just used it to renew dozens of my .com names at $7.49 (plus the $0.18 ICANN fee) each. If you have ever tried to find one, you'd know it's not easy to get a $7.49 .com renewal at GoDaddy now. Previously, the best code for .com renewals was this one at $7.39 each.

Otherwise, you can always find one or two working GoDaddy coupons with which you can register new .com domains at $6.99.


I'd like to share some tips about hardening the database part of your application. Here are a few things you can do to protect your databases from being compromised:

  1. Create separate users with ONLY the necessary privileges (as few as possible) to connect to the database for common daily tasks. Never use the database owner / creator or the MySQL root user in your PHP scripts for routine tasks.
  2. Protect against SQL injection attacks by escaping ALL incoming input, after validating data types with PHP's variable type and character type checking functions.
  3. The sprintf() function is useful for constructing SQL queries more safely, because format specifiers such as %d force values to the expected type. Better yet, use PDO prepared statements.
  4. Turn off the error messages MySQL or PHP outputs when things go wrong, so attackers learn nothing about the technical details of your build, such as the database schema. As a matter of fact, a good rule of thumb in web application security is: the less people know about your application's internal structure, the better.
  5. For advanced SQL developers, an extra abstraction layer in the form of stored procedures can benefit security: it adds another layer of defense and hides the schema of the database from the outside world.
  6. For mission critical applications, it goes without saying that custom logging of database accesses can help a lot in identifying security risks.
  7. If the database contains very sensitive data such as credit card information, you are strongly recommended to encrypt those tables or fields. Use a PHP cryptography extension such as Mcrypt to encrypt data before it is stored and decrypt it when it is retrieved.
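Points 2 and 3 can be sketched with a PDO prepared statement. The table, column and sample data below are made up for illustration, and an in-memory SQLite database stands in for MySQL so the example is self-contained; in production you would use a MySQL DSN such as 'mysql:host=localhost;dbname=appdb' with a low-privilege user (point 1):

```php
<?php
// Sketch: a parameterized query with PDO. SQLite in memory stands in
// for MySQL here; the table and data are invented for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER, title TEXT, author TEXT)');
$pdo->exec("INSERT INTO posts VALUES (1, 'Hello', 'alice')");

// The user-supplied value is bound as a parameter, so it never becomes
// part of the SQL string and cannot inject anything.
$input = "alice' OR '1'='1"; // a typical injection attempt
$stmt  = $pdo->prepare('SELECT title FROM posts WHERE author = ?');
$stmt->execute(array($input));
var_dump($stmt->fetchColumn()); // bool(false): the attempt matches no rows
```

The same string concatenated directly into the query would have returned every row; bound as a parameter, it is just a literal author name that matches nothing.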


When you need to include or require a PHP file that is in the same directory as the currently running one, most people write this simple line in the current script:

include('include.php');

While this approach doesn't present obvious problems, it is slightly less efficient than the following:

include(dirname(__FILE__).'/include.php');

You type a little more, but the extra code frees PHP from iterating through the include_path trying to locate 'include.php', because dirname(__FILE__) explicitly returns the absolute path. The constant __FILE__ in PHP always holds the absolute path of the file in which the line of code is currently executing, and dirname() returns the directory part of a given path.
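As a side note, PHP 5.3 introduced the magic constant __DIR__, which is exactly equivalent to dirname(__FILE__) and saves the function call:

```php
<?php
// PHP 5.3+: __DIR__ is equivalent to dirname(__FILE__).
var_dump(__DIR__ === dirname(__FILE__)); // bool(true)

// So the explicit-path include can be shortened to:
// include __DIR__ . '/include.php';
```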

A better approach would be:

include('./include.php');

This explicitly tells PHP to look for 'include.php' in the current working directory, without the overhead of a dirname() call. Note, though, that the current working directory is that of the initially executed script, not necessarily the directory of the file containing the include. For large applications, you may prefer storing the path of the application's primary working directory in a centralized configuration file:

define('APP_DIR', '/home/appuser/appdomain.com/app');

And when you need to include a file in the sub directory ‘class’:

include(APP_DIR.'/class/tobeincluded.php');

Thanks to Gumbo, alexef and Justin at Stack Overflow.


That is, host all static content, such as images, scripts and style sheets, on a different domain from the primary one serving the page itself. For example, if you intend to add static images to the web page at http://www.example.com/page.html, you should not place the images on www.example.com; instead, put them somewhere else, such as example-static.com or static.example.com.

The first reason is that browsers open only a limited number of simultaneous connections to a single host, so assets on one domain largely download one after another: the browser will not request the next asset until it has finished with a previous one from the same domain. Spreading assets across two domains lets the browser download from both at once, which can roughly double throughput.

Another reason is that if cookies or sessions are enabled on your website, the browser sends the session cookie with every request to that domain, which is useless for static content: the server doesn't need the cookie to serve an image. That wastes both bandwidth and round-trip time. To avoid it, serve all static content from a domain on which no cookies are set. For instance, if you set your cookie on example.com, you can host static content at static.example.com; however, if your cookie is set for *.example.com rather than just example.com, you will need a completely separate domain for the static content to steer clear of the overhead.

This won't matter much for a small site, but it can be a major improvement in user experience for established, popular websites.


Assuming you've logged in as root on Debian 5.0, to install the Go programming language by Google:

  1. Add these environment variables for Go in .bash_profile:
    export GOROOT=$HOME/go
    export GOARCH=386  # for 32 bit architectures. Use GOARCH=amd64 for 64 bit architectures
    export GOOS=linux
    export GOBIN=$HOME/bin
    PATH=$PATH:$GOBIN
    
  2. Install the Mercurial ‘hg’ command:
    aptitude install mercurial
  3. Fetch the sources of Go and put them at $GOROOT:
    hg clone -r release https://go.googlecode.com/hg/ $GOROOT
  4. Fetch compilers and related utilities:
    aptitude install bison gcc libc6-dev ed make
  5. Create the directory $HOME/bin by:
    mkdir $HOME/bin

    and compile Go:

    cd $GOROOT/src
    make all

Done. You can now go about writing your first hello world program. If you haven’t got a server yet, I recommend Linode VPS and Rackspace Cloud.


Rackspace Referral Discount

by Yang Yang on December 23, 2009

By Rackspace I mean Rackspace.com the managed dedicated hosting service, not RackspaceCloud.com the cloud hosting. I’ve been talking a lot lately about Rackspace Cloud so I thought I’d make it clear. 😉

Check them out; they are pretty much the most expensive host you can find on the web. Their most basic server costs over $400 a month with 2 TB of monthly bandwidth. It is by no means a generous offer, but it is probably the most reliable and responsive platform you will ever need. They have among the best uptime in the entire industry. A profitable online business deserves 100% uptime, because every minute of downtime costs you money; no matter how solid and perfect your website is, it's worth nothing while it's down.

Rackspace also provides excellent support to its clients. They are not just a hosting provider; they are an all-around IT company you can trust your entire IT infrastructure with. They take over everything, from provisioning to optimizing, and from monitoring to troubleshooting. When something goes wrong, chances are they have found and fixed it before you even notice. Many top players use Rackspace for their hosting needs; WHT, the Internet's No.1 web hosting forum, has been with them for years. In a word, Rackspace is the kind of hosting you need when every second of your business counts.

5% Referral Discount for You

Anyway, as a Rackspace partner, I can earn or pass along a 5% discount on any of their managed dedicated server packages. Shoot me a message about your intention to host with them, and I'll get you the 5% discount and ask them to contact you. I'm not relying on this to make a living, so it's better to pass along the favor than to let my partnership with them sit gathering dust.

As per FTC requirement, I’m not making any money off this at all. 😉

Cloud Hosting

If you find their dedicated service a little more privileged than you can afford, their cloud hosting may be a better deal. Use this promo code for Rackspace Cloud to get a discount off their cloud hosting plans.


Here's a quick tip for those who have run into this problem: PHP's regular expression functions, such as preg_match() and preg_replace(), stop working when the input string (the subject being searched or matched) is too long. If you believe your regular expression should work but it doesn't, and the subject string is perhaps over 100 kB long, you have hit PCRE's backtracking limit, set by the configuration variable pcre.backtrack_limit.

To solve this issue and lift the limit, to perhaps 10 times the original, reset the default value of pcre.backtrack_limit in one of the following ways:

  1. If you are using cPanel, create a text file named php.ini in the directory where you need to break the limit, and add this line to it:
    pcre.backtrack_limit = 1000000
  2. If you operate your own dedicated / VPS server, modify php.ini and add this line at the end of the file:
    pcre.backtrack_limit = 1000000
    Refer to this article to find out where your php.ini is.
  3. Use the configuration function ini_set() to change the value at runtime:
    ini_set('pcre.backtrack_limit', 1000000);
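The runtime approach can be combined with preg_last_error() to confirm that a failure really is the backtracking limit: preg_match() returns false, rather than 0, when PCRE gives up. A sketch, using a deliberately pathological pattern contrived to exhaust the limit:

```php
<?php
// Sketch: raise the limit at runtime, then check explicitly whether
// a failed match was caused by it.
ini_set('pcre.jit', '0'); // surface the failure as a backtrack error
ini_set('pcre.backtrack_limit', 1000000);

// (a+)+$ against many a's ending in 'b' backtracks catastrophically:
$subject = str_repeat('a', 5000) . 'b';
$result  = preg_match('/(a+)+$/', $subject);

if ($result === false && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
    echo "pcre.backtrack_limit exceeded\n";
}
```

In real code the subject would be your oversized input rather than a contrived string, but the false-versus-0 check and the preg_last_error() call are the same.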

This seems to affect only PHP 5.2.
