Here's a quick way to strip non-printable characters in PHP. This is pretty handy for cleaning data before putting it in a DB.
$val = preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $val);
Because this is a Perl-compatible regexp, you can use it in your language of choice so long as it supports PCRE.
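Here's a minimal sketch of how the one-liner might be wrapped for reuse. The function name replace_nonprintable() and the sample string are made up for illustration; only the pattern comes from the tip above. Keep in mind the pattern works byte by byte, so it suits single-byte encodings such as ISO-8859-1. On UTF-8 input it would likely break multibyte characters, since their continuation bytes can fall in \x80-\x9F.

```php
<?php
// Hypothetical helper (the name is not from the original tip).
// preg_replace() returns null on failure, hence the nullable return type.
function replace_nonprintable(string $val): ?string
{
    // Keep CR, LF, tab, printable ASCII (\x20-\x7E) and the bytes \xA0-\xFF.
    // Every other byte is replaced with a single space.
    return preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $val);
}

// Sample input with a NUL byte and a BEL byte mixed in.
$dirty = "abc\x00def\x07\tghi\n";
$clean = replace_nonprintable($dirty);

// The NUL and BEL become spaces; the tab and newline are left alone.
var_dump($clean === "abc def \tghi\n");
```

Note that the replacement is a space, not an empty string, so the cleaned value keeps its original length.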
UPDATE:
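Steve Laniel reminded me that you can use a POSIX character class to do roughly the same thing:
$val = preg_replace( '/[^[:print:]]/', '', $val );
This is a lot simpler for most cases, although the two patterns are not identical. In the default C locale, [:print:] covers only \x20-\x7E, so this version also removes \r, \n, \t and the bytes \xA0-\xFF that the first pattern keeps. It also deletes the unwanted characters instead of replacing them with a space.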
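To make the difference concrete, here's a small side-by-side sketch. The function names clean_explicit() and clean_posix() and the sample string are hypothetical, and the comments assume the default C locale and no /u modifier.

```php
<?php
// Hypothetical wrapper around the first pattern (explicit byte ranges).
function clean_explicit(string $val): ?string
{
    return preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $val);
}

// Hypothetical wrapper around the POSIX character class version.
function clean_posix(string $val): ?string
{
    return preg_replace('/[^[:print:]]/', '', $val);
}

// "café" encoded as ISO-8859-1 (\xE9), preceded by CR, LF and a tab,
// followed by a NUL byte.
$val = "line one\r\n\tcaf\xE9\x00";

// Explicit ranges: line breaks, tab and \xE9 survive; NUL becomes a space.
var_dump(clean_explicit($val) === "line one\r\n\tcaf\xE9 ");

// POSIX class (assuming the C locale): only \x20-\x7E survive.
var_dump(clean_posix($val) === "line onecaf");
```

If you need to keep line breaks or accented Latin-1 characters, the first pattern is probably the safer choice.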
2 comments:
Doesn't the [:print:] POSIX character class capture the printable characters? So then the nonprintable ones would be [^[:print:]], and you could delete them with
$val = preg_replace( '/[^[:print:]]/', '', $val );
No?
Good point... I guess I'm a sucker for doing things the hard way. I'll update the tip because this is definitely an easier approach. Thanks!