Linux: How to ‘find’ and search ONLY text files?

by Yang Yang on January 22, 2011

The ‘find’ command in Linux systems searches through a directory and returns the files that satisfy certain criteria. Combined with ‘grep’, it can list the files that contain a given string. For instance, to find the files in the ‘mydocs’ directory that contain the string ‘needle text’:

find mydocs -type f -exec grep -l "needle text" {} \;

The problem with this approach is that it searches through ALL files in the directory, including binary ones such as images, executables and zip packages. Usually we only want to search text files for a specific string. If the directory holds many binary files, reading through them wastes a significant amount of CPU time for nothing.

To achieve this, use this version of the above command:

find mydocs -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text | cut -d ':' -f1

I asked the question at stackoverflow.com and peoro came up with this solution. It works great.

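Basically, the added part (everything after the first ‘\;’) runs the ‘file’ command on each file that grep matched, keeps only the output lines that contain ‘text’, and cuts each line down to the file name. Note that grep still reads every file; the type check only filters the results afterwards. Because the filter looks at the whole output line, a path that itself contains ‘text’ also passes. According to the Linux ‘file’ command manual, we can be fairly sure that files with ‘text’ in their type description are text files AND that all text files have ‘text’ in their type description.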
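To check the file type before grep reads the file, rather than after, the type test can be moved in front of the search. The following is only a minimal sketch, assuming the common ‘file’ implementation that supports the -b and --mime-type options; the function name search_text_files is made up for illustration and is not part of the original solution.

```shell
# Hypothetical helper: print the text files under a directory that
# contain a given pattern. Assumes file(1) supports -b and --mime-type.
search_text_files() {
    dir=$1
    needle=$2
    find "$dir" -type f -exec sh -c '
        needle=$1
        shift
        for f in "$@"; do
            # Ask file(1) for the MIME type; only text/* files are searched.
            case $(file -b --mime-type "$f") in
                text/*)
                    if grep -q -e "$needle" "$f"; then
                        printf "%s\n" "$f"
                    fi
                    ;;
            esac
        done
    ' sh "$needle" {} +
}
```

It would be called as: search_text_files mydocs "needle text". This starts one ‘file’ process per file, so it is not necessarily faster than letting grep skip binary files itself.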

Thus far, the best way to do this is:

find -type f -exec grep -Il . {} \;

Or for a particular needle text:

find -type f -exec grep -Il "needle text" {} \;

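The -I option tells grep to treat binary files as if they contained no match. The ‘.’ is not an option but the search pattern: a regular expression that matches any single character. Together with -l, it makes grep stop at the first match in each text file, so it runs very fast.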
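The effect of -I can be seen on a throwaway directory. This is a small sketch, assuming GNU grep (or another grep that supports -I); the directory and file names are invented for the demonstration.

```shell
# Build a temporary directory with one text file and one binary file,
# both containing the needle.
demo=$(mktemp -d)
printf 'some needle text\n' > "$demo/notes.txt"
printf 'needle text\000\001\n' > "$demo/blob.bin"

# Without -I, grep -l reports both files.
all_matches=$(find "$demo" -type f -exec grep -l "needle text" {} \; | sort)

# With -I, the binary file is treated as non-matching.
text_matches=$(find "$demo" -type f -exec grep -Il "needle text" {} \; | sort)

printf 'without -I:\n%s\n' "$all_matches"
printf 'with -I:\n%s\n' "$text_matches"
```

Ending the -exec with + instead of \; lets find pass many file names to a single grep process, which is usually faster on large directories.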

hakan August 3, 2012 at 6:49 am

thanks a lot

Martin March 29, 2013 at 8:44 pm

grep has an option -I which is what you are looking for
-I Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.
