Validate and sanitize data with PHP Filter, part 1
When coding websites or web applications, security matters. To guard against injections and related attacks (XSS, SQL injection, CSRF…), you have to check all data coming from a foreign source. The typical example is a user sending data to your server through an HTML form. But the same goes for your visitors’ HTTP Referer or User-Agent header, a value read from a cookie, or the response of an API call.
The basic rule is to never trust data that doesn’t come from your own code.
To help you keep your applications safe, PHP has provided the Filter extension since version 5.2. It supplies a handful of handy functions to validate and sanitize data.
The two main functions are filter_var() and filter_input(). The first simply filters a given variable with the specified filter, whereas the second directly targets an external variable (such as a $_POST or $_GET key) and returns it, optionally after filtering it.
Both accept the same list of filters, which let you either validate or sanitize data.
In this first part we’re going to review the validate filters. The sanitize ones will be covered in a second post.
Validate basic values
So, how does it work?
PHP defines a list of constants holding the ID (an integer) of each available filter. Just pass the value to validate and the filter ID to filter_var() and you’re done.
It’s important to understand that filters don’t act like the PHP is_*() functions. They don’t check the type of the given value; instead they return the filtered value (converted to the matching type where it makes sense, e.g. an int for FILTER_VALIDATE_INT), or false if the value doesn’t validate.
Plus, they come with some very useful extra features.
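To make the difference with the is_*() functions concrete, here is a small sketch. It also uses filter_id(), which looks up a filter by name (here 'int') and returns the same integer ID as the FILTER_VALIDATE_INT constant.

```php
<?php
// A numeric string is not an int, so is_int() rejects it...
$raw = '42';
var_dump(is_int($raw)); // bool(false)

// ...whereas FILTER_VALIDATE_INT accepts it and returns it converted to int.
var_dump(filter_var($raw, FILTER_VALIDATE_INT)); // int(42)

// Filter constants are plain integer IDs; filter_id() finds one by its name.
var_dump(filter_id('int') === FILTER_VALIDATE_INT); // bool(true)
```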
Validate an integer
For example, integer values may be validated with the constant FILTER_VALIDATE_INT:
$var = 1;
$result = filter_var($var, FILTER_VALIDATE_INT);
var_dump($result); // output : int(1)
$var = '1024';
$result = filter_var($var, FILTER_VALIDATE_INT);
var_dump($result); // output : int(1024)
$var = 'abc';
$result = filter_var($var, FILTER_VALIDATE_INT);
var_dump($result); // output : bool(false)
$var = '42x';
$result = filter_var($var, FILTER_VALIDATE_INT);
var_dump($result); // output : bool(false)
Octal and hexadecimal integer literals work too, since PHP has already converted them to integers before filter_var() sees them:
$var = 0755;
$result = filter_var($var, FILTER_VALIDATE_INT);
var_dump($result); // output : int(493)
$var = 0xFF;
$result = filter_var($var, FILTER_VALIDATE_INT);
var_dump($result); // output : int(255)
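When octal or hexadecimal values arrive as strings (from a form, say), FILTER_VALIDATE_INT rejects them unless you add the FILTER_FLAG_ALLOW_OCTAL or FILTER_FLAG_ALLOW_HEX flag. A short sketch:

```php
<?php
// Without a flag, a leading zero or a "0x" prefix makes the string invalid.
var_dump(filter_var('0755', FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var('0xFF', FILTER_VALIDATE_INT)); // bool(false)

// With the matching flag, the string is parsed in that base.
var_dump(filter_var('0755', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_OCTAL)); // int(493)
var_dump(filter_var('0xFF', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX));   // int(255)

// Both flags can be combined with a bitwise OR.
$flags = FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX;
var_dump(filter_var('0x1E', FILTER_VALIDATE_INT, $flags)); // int(30)
```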
By setting two options, min_range and max_range, you can be more restrictive and specify an allowed range of values:
$var = 5;
$result = filter_var(
$var,
FILTER_VALIDATE_INT,
array(
'options' => array(
'min_range' => 1,
'max_range' => 10
)
)
);
var_dump($result); // output : int(5)
$var = 0x1E;
$result = filter_var(
$var,
FILTER_VALIDATE_INT,
array(
'options' => array(
'min_range' => 1,
'max_range' => 100
)
)
);
var_dump($result); // output : int(30)
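A value outside the range makes the filter return false. If you would rather fall back to a known value, the 'default' option is returned instead whenever validation fails. The validate_page() helper below is a hypothetical example that reads a page number from user input:

```php
<?php
// Hypothetical helper: accept a page number between 1 and 100, else use 1.
function validate_page($raw): int
{
    return filter_var($raw, FILTER_VALIDATE_INT, array(
        'options' => array(
            'default'   => 1, // returned when validation fails
            'min_range' => 1,
            'max_range' => 100,
        ),
    ));
}

var_dump(validate_page('5'));   // int(5)
var_dump(validate_page('500')); // int(1): out of range
var_dump(validate_page('abc')); // int(1): not an integer
```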
Validate a float
Floats also have their own validator, FILTER_VALIDATE_FLOAT:
$var = 3.14;
$result = filter_var($var, FILTER_VALIDATE_FLOAT);
var_dump($result); // output : float(3.14)
$var = '3.14';
$result = filter_var($var, FILTER_VALIDATE_FLOAT);
var_dump($result); // output : float(3.14)
$var = '10';
$result = filter_var($var, FILTER_VALIDATE_FLOAT);
var_dump($result); // output : float(10)
$var = 'Not a float';
$result = filter_var($var, FILTER_VALIDATE_FLOAT);
var_dump($result); // output : bool(false)
The decimal separator may be defined with an option:
$options = array('decimal' => ',');
$var = '3,14';
$result = filter_var(
$var,
FILTER_VALIDATE_FLOAT,
array('options' => $options)
);
var_dump($result); // output : float(3.14)
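Thousands separators are rejected by default; the FILTER_FLAG_ALLOW_THOUSAND flag makes the filter accept them. A minimal sketch using the default separators:

```php
<?php
$var = '1,234.5';

// Without the flag, the grouping comma makes the value invalid.
var_dump(filter_var($var, FILTER_VALIDATE_FLOAT)); // bool(false)

// With the flag, the grouping comma is accepted and dropped from the result.
var_dump(filter_var($var, FILTER_VALIDATE_FLOAT, FILTER_FLAG_ALLOW_THOUSAND)); // float(1234.5)
```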
Validate a boolean
The FILTER_VALIDATE_BOOLEAN filter checks not only booleans but, more generally, truthy and falsy values:
$boolean = filter_var($var, FILTER_VALIDATE_BOOLEAN);
The code above returns true if $var has one of the following values:
- true
- 1
- '1'
- 'yes'
- 'on'
- 'true'
Any other value results in false, unless you pass the FILTER_NULL_ON_FAILURE flag as the third argument:
$boolean = filter_var($var, FILTER_VALIDATE_BOOLEAN, array('flags' => FILTER_NULL_ON_FAILURE));
// The following works too
$boolean = filter_var($var, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
In that case, false is returned only if $var is one of:
- false
- NULL
- 0
- '0'
- 'false'
- 'off'
- 'no'
- '' (an empty string)
For any other value, NULL is returned.
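In practice, FILTER_NULL_ON_FAILURE lets you tell an explicit "no" apart from garbage input. A sketch with a hypothetical parse_flag() helper, for example for a checkbox or a query-string switch:

```php
<?php
// Hypothetical helper: true/false for recognised values, null for anything else.
function parse_flag($raw): ?bool
{
    return filter_var($raw, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
}

var_dump(parse_flag('yes'));   // bool(true)
var_dump(parse_flag('off'));   // bool(false)
var_dump(parse_flag('maybe')); // NULL

// Without the flag, unrecognised input silently becomes false.
var_dump(filter_var('maybe', FILTER_VALIDATE_BOOLEAN)); // bool(false)
```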
Validate more complex values
The following three validators will save you time by sparing you complex regular expressions for validating URLs, email addresses and IP addresses.
Validate a URL
Make no mistake: this validator doesn’t check whether a URL exists, it only validates the syntax of the value:
$var = 'http://domain.tld';
$result = filter_var($var, FILTER_VALIDATE_URL);
var_dump($result); // output : string(17) "http://domain.tld"
$var = 'http://domain';
$result = filter_var($var, FILTER_VALIDATE_URL);
var_dump($result); // output : string(13) "http://domain"
$var = 'http:domain';
$result = filter_var($var, FILTER_VALIDATE_URL);
var_dump($result); // output : bool(false)
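Because only the syntax is checked, schemes other than http and https (ftp://, for instance) pass as well. If you are going to print the URL as a link, you may want to restrict the scheme yourself. A minimal sketch with a hypothetical is_http_url() helper built on parse_url():

```php
<?php
// Hypothetical helper: valid URL syntax AND an http or https scheme.
function is_http_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), array('http', 'https'), true);
}

var_dump(is_http_url('http://domain.tld')); // bool(true)
var_dump(is_http_url('ftp://domain.tld'));  // bool(false)
var_dump(is_http_url('http:domain'));       // bool(false)
```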
If you want to force the URL to have a path or a query string, you may add some flags:
$var = 'http://domain.tld';
$result = filter_var($var, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
var_dump($result); // output : bool(false)
$var = 'http://domain.tld/';
$result = filter_var($var, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
var_dump($result); // output : string(18) "http://domain.tld/"
$var = 'http://domain.tld/path/';
$result = filter_var($var, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
var_dump($result); // output : string(23) "http://domain.tld/path/"
$var = 'http://domain.tld/?id=1';
$result = filter_var($var, FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED);
var_dump($result); // output : string(23) "http://domain.tld/?id=1"
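Flags can be combined with the bitwise | operator, for example to require both a path and a query string:

```php
<?php
$flags = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;

var_dump(filter_var('http://domain.tld/?id=1', FILTER_VALIDATE_URL, $flags)); // string(23) "http://domain.tld/?id=1"
var_dump(filter_var('http://domain.tld/path/', FILTER_VALIDATE_URL, $flags)); // bool(false): no query string
var_dump(filter_var('http://domain.tld?id=1', FILTER_VALIDATE_URL, $flags));  // bool(false): no path
```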
Validate an email
Like the URL validator, FILTER_VALIDATE_EMAIL lets you check an email address without writing a single regexp:
$var = 'email@domain.tld';
$result = filter_var($var, FILTER_VALIDATE_EMAIL);
var_dump($result); // output : string(16) "email@domain.tld"
$var = 'email@domain';
$result = filter_var($var, FILTER_VALIDATE_EMAIL);
var_dump($result); // output : bool(false)
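Keep in mind that a valid address is not necessarily a harmless string. The email syntax allows characters such as the apostrophe in the local part, and FILTER_VALIDATE_EMAIL accepts them, so a validated address still has to be escaped, or better, bound as a query parameter:

```php
<?php
$var = "o'brien@domain.tld";
$result = filter_var($var, FILTER_VALIDATE_EMAIL);

// The address validates and comes back unchanged, quote included.
var_dump($result); // string(18) "o'brien@domain.tld"
```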
Validate an IP address
FILTER_VALIDATE_IP is also very handy: it validates IPv4 and IPv6 addresses and, with the right flags, can restrict validation to one version or reject addresses in private ranges:
$var = '10.10.1.127';
$result = filter_var($var, FILTER_VALIDATE_IP);
var_dump($result); // output : string(11) "10.10.1.127"
$var = '2001:0db8:0000:85a3:0000:0000:ac1f:8001';
$result = filter_var($var, FILTER_VALIDATE_IP);
var_dump($result); // output : string(39) "2001:0db8:0000:85a3:0000:0000:ac1f:8001"
$var = '2001:0db8:0000:85a3:0000:0000:ac1f:8001';
$result = filter_var($var, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);
var_dump($result); // output : bool(false)
$var = '192.168.0.2';
$result = filter_var($var, FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE);
var_dump($result); // output : bool(false)
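These flags combine with |. A sketch with a hypothetical is_public_ipv4() helper that also adds FILTER_FLAG_NO_RES_RANGE, which rejects reserved ranges such as the 127.0.0.0/8 loopback block:

```php
<?php
// Hypothetical helper: an IPv4 address that is neither private nor reserved.
function is_public_ipv4(string $ip): bool
{
    $flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    return filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false;
}

var_dump(is_public_ipv4('8.8.8.8'));     // bool(true)
var_dump(is_public_ipv4('192.168.0.2')); // bool(false): private range
var_dump(is_public_ipv4('127.0.0.1'));   // bool(false): reserved range
var_dump(is_public_ipv4('::1'));         // bool(false): not IPv4
```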
Validate inputs with filter_input()
Ideally, your application should never work directly with the $_GET, $_POST or $_COOKIE arrays. The only lines of code that touch them should be those that do the filtering.
So, let’s see how to use the validate filters to work directly with input data.
Suppose you have built a newsletter system on your website, with an HTML form containing a text field where visitors subscribe by entering their email address. When an address is submitted, your script has to check that it is valid before storing it in the database.
If you are used to doing something like the following:
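// Bad practice: raw superglobal, home-made regexp, value pasted into SQL
if (preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i', $_POST['email']) > 0) {
    $some_sql_query = "INSERT INTO newsletter (email) VALUES ('{$_POST['email']}')";
    $result = mysql_query($some_sql_query); // the mysql_* extension was removed in PHP 7
    // ...
}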
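This is bad practice. You should never rely directly on superglobal variables; going through the Filter functions shields you from configuration differences and changes between PHP versions. Worse, the raw value is inserted straight into the SQL query, and since the pattern isn’t anchored with ^ and $, any string that merely contains an address passes the check. Finally, using a regexp to validate an email address isn’t really safe unless you are absolutely sure of what it does and that it respects the syntax defined in RFC 5322…
A better practice is to use PHP’s built-in filter: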
$email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);
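// NULL: no 'email' field was sent; false: the value is not a valid address
if ($email !== false && $email !== NULL) {
    // A valid address may still contain a quote, so bind it instead of
    // interpolating it ($pdo is assumed to be an existing PDO connection)
    $statement = $pdo->prepare('INSERT INTO newsletter (email) VALUES (?)');
    $statement->execute(array($email));
}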
Much simpler, isn’t it?
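When a form has several fields, filter_input_array() validates them all in one call from a definition array. The sketch below uses its sibling filter_var_array(), which takes the same kind of definition but works on an ordinary array, so it runs outside a web request; the $data values and the field names are made up for the example:

```php
<?php
// Made-up data standing in for a submitted form.
$data = array(
    'email' => 'email@domain.tld',
    'age'   => '30',
);

// One filter (plus optional options/flags) per expected key.
$definition = array(
    'email'    => FILTER_VALIDATE_EMAIL,
    'age'      => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 13, 'max_range' => 120),
    ),
    'referrer' => FILTER_VALIDATE_URL, // absent from $data
);

$clean = filter_var_array($data, $definition);

var_dump($clean['email']);    // string(16) "email@domain.tld"
var_dump($clean['age']);      // int(30)
var_dump($clean['referrer']); // NULL: the key was missing from $data
```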
Go further
For more information, examples and functions, read the official PHP documentation for the Filter extension.
Happy reading
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.