Data submitted through web forms may include untrusted values. Properly validating data in PHP is essential for both security and correct processing. Here, we will explain the basic validation methods step by step.

The Need for Data Validation

Since users can freely modify the data entered in forms, unexpected values may be submitted. For example, required fields might be empty, or a field expected to contain a number might contain a string.
To prevent such issues, you need to perform validation on the data received in PHP, checking that it has the correct format and content.

If validation is not performed, the following issues are more likely to occur:

  • Causing server errors
  • Invalid values being stored in the database
  • Increased security risks (such as SQL injection)

Validating Required Fields

If there are fields in a form that users must fill in, you must check whether input has been provided. This is called required field validation. Leaving a required field empty can cause errors in subsequent processing or prevent data from being correctly stored.

In PHP, you mainly use the empty() or isset() functions to check for required input.

if (empty($_POST['username'])) {
    echo "Username is required.";
}

The code above displays the message “Username is required.” if the username value submitted from the form is empty.

Characteristics and Notes on the empty() Function

  • It treats the empty string (""), NULL, false, the integer 0, the float 0.0, the string "0", an empty array, and undefined variables as “empty”.
  • It therefore covers both a field that was submitted blank and a field that was not submitted at all.
  • Because the string "0" also counts as empty, a field in which the user legitimately entered 0 fails an empty() check. For such fields, compare the value against the empty string instead.
  • ※ empty() does not raise a notice or warning for an undefined variable or array key, so it does not need to be wrapped in isset().

Difference from isset() Function

  • isset() checks whether a variable exists and is not NULL.
  • It returns true for an empty string, so a text field submitted blank still counts as “set”. On its own it is therefore insufficient for required field checks.

Therefore, empty() is generally used for required field validation, while isset() is useful when you need to distinguish a field that was not submitted from one that was submitted blank.

Checking that required fields are properly filled is the first step in form data validation. If this is not done correctly, subsequent processing is prone to errors.
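
As a minimal sketch of a required check that still accepts a legitimate "0", the hypothetical helper below (isFilled() is not a PHP built-in, and the $post array merely stands in for $_POST) combines isset() with a comparison against the empty string:

```php
<?php
// Hypothetical helper: true when a required text field was submitted
// and contains something other than whitespace.
function isFilled(array $input, string $key): bool
{
    // isset() is false for a missing key and for null.
    if (!isset($input[$key]) || !is_string($input[$key])) {
        return false;
    }
    // Compare against '' instead of using empty(), so "0" counts as filled.
    return trim($input[$key]) !== '';
}

$post = ['username' => '  ', 'quantity' => '0']; // stands in for $_POST

var_dump(isFilled($post, 'username')); // bool(false): whitespace only
var_dump(isFilled($post, 'quantity')); // bool(true): "0" is a real value
var_dump(isFilled($post, 'email'));    // bool(false): not submitted
```

Whether whitespace-only input should count as empty is a design choice; drop the trim() call if it should not.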

Validating by String Length

Checking the length of strings entered in a form is also an important aspect of data validation. By setting character limits, you can prevent excessively long input or input that is too short to be meaningful. This helps maintain data quality while reducing system load and avoiding unexpected issues.

For example, to enforce a rule that usernames must be between 3 and 20 characters, you can write the following in PHP:

// Fall back to an empty string if the field was not submitted.
$username = $_POST['username'] ?? '';
if (mb_strlen($username) < 3 || mb_strlen($username) > 20) {
    echo "Please enter a username between 3 and 20 characters.";
}

This code displays an error message if the submitted username is shorter than 3 characters or longer than 20 characters.

Difference Between strlen() and mb_strlen()

  • strlen() returns the length of a string in bytes.
  • For multibyte characters such as Japanese, the result may differ from the intended character count. For example, in UTF-8, strlen("あ") returns 3 even though the string is 1 character long.
  • To accurately count multibyte characters, use mb_strlen(). Ensure the mbstring extension is enabled.

Validating string length is a basic and important step for assessing input validity. This ensures system stability and guides users to provide appropriate input.
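
The length rule can be wrapped in a small function. This is only a sketch: hasLengthBetween() is a hypothetical name, the mbstring extension is assumed to be enabled, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: true when $value has between $min and $max characters.
function hasLengthBetween(string $value, int $min, int $max): bool
{
    // Pass the encoding explicitly rather than relying on the configured default.
    $length = mb_strlen($value, 'UTF-8');
    return $length >= $min && $length <= $max;
}

var_dump(strlen('あいう'));                  // int(9): bytes in UTF-8
var_dump(mb_strlen('あいう', 'UTF-8'));      // int(3): characters
var_dump(hasLengthBetween('あいう', 3, 20)); // bool(true)
var_dump(hasLengthBetween('ab', 3, 20));     // bool(false)
```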

Validating Numeric Values

When expecting numeric input, it is important to check whether the value is truly a number. For example, fields like “age” or “quantity” are meaningless unless they contain numeric values.

In PHP, you can easily check if a value is numeric using the is_numeric() function.

// Fall back to an empty string if the field was not submitted.
$age = $_POST['age'] ?? '';
if (!is_numeric($age)) {
    echo "Please enter a numeric value for age.";
}

This code displays an error message if the submitted age is not numeric.

Features of is_numeric()

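  • is_numeric() returns true for integers, floating-point numbers, and numeric strings.
  • For example, “25”, “3.14”, and “-100” are all considered numeric.
  • It also accepts forms you may not want in a form field, such as exponent notation (“1e3”) and leading whitespace (“ 25”).
  • Conversely, “abc” and “12a” are not numeric, so is_numeric() returns false and the error message is displayed.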
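
Because is_numeric() accepts floating-point and exponent forms, a field that must hold a whole number can be checked more strictly. The sketch below uses filter_var() with FILTER_VALIDATE_INT as one possible alternative; isIntegerString() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true when filter_var() accepts $value as a decimal integer.
function isIntegerString(string $value): bool
{
    // filter_var() returns false on failure. A valid "0" returns int 0,
    // so the comparison has to be strict.
    return filter_var($value, FILTER_VALIDATE_INT) !== false;
}

var_dump(is_numeric('1e3'));       // bool(true)
var_dump(is_numeric('3.14'));      // bool(true)
var_dump(isIntegerString('1e3'));  // bool(false)
var_dump(isIntegerString('3.14')); // bool(false)
var_dump(isIntegerString('25'));   // bool(true)
var_dump(isIntegerString('0'));    // bool(true)
```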

Validating Numeric Ranges

Beyond checking if a value is numeric, verifying that it falls within a valid range ensures more precise input validation. For example, age may be limited to between 18 and 99.

In PHP, you first cast the value to a numeric type and then check the range.

// Fall back to 0 if the field was not submitted.
$age = (int)($_POST['age'] ?? 0);
if ($age < 18 || $age > 99) {
    echo "Please enter an age between 18 and 99.";
}

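This code converts the submitted value to an integer and displays an error message if it is less than 18 or greater than 99. Note that (int) never fails: “abc” becomes 0 and “25abc” becomes 25, so run the numeric check first and cast only values that passed it.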

Why Type Casting is Necessary

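  • Data from forms arrives as a string (or an array of strings), so (int) or (float) is used to convert it to a numeric type.
  • PHP compares a numeric string with a number numerically, so '20' < 5 is false, as expected. The risk lies with non-numeric input: how a string such as 'abc' compares with a number depends on the PHP version (PHP 8 changed these rules), so validate and cast the value before comparing it.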

By checking both whether a value is numeric and whether it falls within the valid range, you can perform safe and accurate data validation, preventing invalid input and processing errors.
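
The numeric check and the range check can also be done in one step. The following is a sketch, not the only way to do it: parseAge() is a hypothetical helper, and it relies on the min_range and max_range options of FILTER_VALIDATE_INT.

```php
<?php
// Hypothetical helper: returns the age as an int, or null when the input
// is not an integer between $min and $max.
function parseAge(string $value, int $min = 18, int $max = 99): ?int
{
    $age = filter_var($value, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);
    return $age === false ? null : $age;
}

var_dump(parseAge('25'));    // int(25)
var_dump(parseAge('17'));    // NULL: below the range
var_dump(parseAge('25abc')); // NULL: not an integer
var_dump((int)'25abc');      // int(25): a plain cast silently accepts it
```

Returning null instead of false keeps a valid value apart from a failure without relying on loose comparison.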

Validating Dates

When accepting date input from users, it is important to ensure that the entered date is in the correct format and actually exists. For example, “2024-02-30” has the right format but does not exist, while “2024/04/01” is a real date written in a format other than the expected one.

In PHP, the DateTime class allows you to safely and easily validate both the format and correctness of dates.

// Fall back to an empty string if the field was not submitted.
$date = $_POST['date'] ?? '';
$d = DateTime::createFromFormat('Y-m-d', $date);
if (!$d || $d->format('Y-m-d') !== $date) {
    echo "Please enter a valid date in YYYY-MM-DD format.";
}

Explanation of This Code

  • DateTime::createFromFormat('Y-m-d', $date) parses the date string according to the specified format (“year-month-day”) and tries to create a DateTime object.
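  • If the string cannot be parsed in that format (for example “2024/04/01”), $d will be false.
  • A date that does not exist is not rejected at this stage: “2024-02-30” is parsed successfully and rolled over to 2024-03-01.
  • Formatting the created date again with format('Y-m-d') and comparing it to the original input catches these rolled-over dates, and also rejects input without zero padding such as “2024-4-1”.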

Why This Validation is Necessary

  • Simply checking string length or pattern (e.g., with regular expressions) may miss non-existent dates.
  • For example, invalid dates like “2024-02-30” or “2023-13-01” appear to match the “YYYY-MM-DD” format.
  • Parsing with the DateTime class and comparing the re-formatted result with the original input detects such invalid dates, making validation safer.
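
To illustrate why the round-trip comparison matters, the sketch below wraps the check in a hypothetical isValidDate() function and shows what createFromFormat() does with a nonexistent day:

```php
<?php
// Hypothetical helper: true when $value is an existing date in YYYY-MM-DD form.
function isValidDate(string $value): bool
{
    $d = DateTime::createFromFormat('Y-m-d', $value);
    // The round trip rejects rolled-over dates and input without zero padding.
    return $d !== false && $d->format('Y-m-d') === $value;
}

// A nonexistent day is rolled over rather than rejected:
$rolled = DateTime::createFromFormat('Y-m-d', '2024-02-30');
echo $rolled->format('Y-m-d'), "\n"; // 2024-03-01

var_dump(isValidDate('2024-02-29')); // bool(true): 2024 is a leap year
var_dump(isValidDate('2024-02-30')); // bool(false)
var_dump(isValidDate('2024/04/01')); // bool(false)
```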

Validating Numbers Using Regular Expressions

If you want to impose more detailed and precise rules on input data, using regular expressions is effective. This is especially useful for fields like phone numbers or postal codes where specific formats are required.

For example, to allow only digits in a phone number and ensure it is 10 or 11 digits, you can write:

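// Fall back to an empty string if the field was not submitted.
$phone = $_POST['phone'] ?? '';
if (!preg_match('/^\d{10,11}\z/', $phone)) {
    echo "Please enter a 10- or 11-digit phone number.";
}

Meaning of This Code

  • preg_match() checks whether a string matches the specified regular expression pattern.
  • The pattern /^\d{10,11}\z/ matches a string that consists of only 10 or 11 digits from start to end. \z is used instead of $ because $ also matches just before a trailing newline, which would let a value such as "0312345678\n" through.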

※ This regular expression is suitable only for Japanese landline and mobile numbers. It does not cover international or some special numbers, so adjust the pattern as needed.

Why Regular Expressions are Useful

  • They allow precise control over input patterns beyond simple checks for length or numeric values.
  • For example, you can allow only phone numbers without hyphens or spaces, or validate postal codes (e.g., 123-4567).
  • Complex patterns can be handled flexibly with regular expressions.

Using regular expressions enables detailed control over number and pattern validation. While it may seem difficult at first, it is a very useful technique for form validation once you get used to it.
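
The postal code format mentioned above (123-4567) can be checked the same way. This is a sketch under the assumption that exactly three digits, a hyphen, and four digits are required; isPostalCode() is a hypothetical helper.

```php
<?php
// Hypothetical helper: true for a postal code written as 3 digits, a hyphen, 4 digits.
function isPostalCode(string $value): bool
{
    // \A and \z anchor the whole string. Unlike $, \z does not match
    // before a trailing newline.
    return preg_match('/\A\d{3}-\d{4}\z/', $value) === 1;
}

var_dump(isPostalCode('123-4567'));   // bool(true)
var_dump(isPostalCode('1234567'));    // bool(false): hyphen missing
var_dump(isPostalCode("123-4567\n")); // bool(false)

// The same pattern anchored with ^ and $ accepts the trailing newline:
var_dump(preg_match('/^\d{3}-\d{4}$/', "123-4567\n")); // int(1)
```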

Escaping HTML Output

Displaying user input directly on a web page can allow malicious code (scripts) to be injected. If left unchecked, this increases the risk of XSS (Cross-Site Scripting) attacks, potentially exposing user information or allowing unauthorized actions.

Therefore, it is important to always perform escaping when displaying user input as HTML. PHP provides the htmlspecialchars() function to handle this easily.

$name = htmlspecialchars($_POST['name'] ?? '', ENT_QUOTES, 'UTF-8');
echo "Hello, " . $name . "!";

Key Points and Notes on htmlspecialchars()

  • This function converts characters that have a special meaning in HTML into HTML entities.
  • Specifically, <, >, &, ', and " are converted to &lt;, &gt;, &amp;, &#039;, and &quot; respectively.
  • As a result, the browser displays these characters as text, not as tags or scripts.
  • ENT_QUOTES converts both single and double quotes.
  • 'UTF-8' specifies the character encoding; using UTF-8 is recommended. Different encodings may result in incorrect conversions.
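  • htmlspecialchars() with ENT_QUOTES is suitable for HTML text and for quoted attribute values. It is not sufficient inside a <script> block, an event handler, or a URL. For values embedded in JavaScript, encode them with json_encode() instead.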

If malicious JavaScript is included in user input, it can execute when other users view the page, leading to data leakage or unauthorized actions. Escaping prevents such attacks and enhances site security.

Escaping is a mandatory security measure in web development and should always be implemented.
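
The sketch below shows escaping in three output contexts. The e() wrapper is a hypothetical helper, $name stands in for $_POST['name'], and the JSON_HEX_* flags are one reasonable choice for inline scripts rather than the only option.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$name = '<script>alert("x")</script>'; // stands in for $_POST['name']

// HTML text and a quoted attribute value:
echo '<p>Hello, ' . e($name) . '!</p>', "\n";
echo '<input type="text" name="name" value="' . e($name) . '">', "\n";

// A value embedded in an inline script: the JSON_HEX_* flags turn
// <, >, &, ' and " inside the string into \uXXXX escapes.
$json = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo '<script>const name = ' . $json . ';</script>', "\n";
```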

Summary

PHP form data validation is essential for security and correct processing. Keep the following points in mind:

  • Required field checks
  • String length limits
  • Numeric checks and range validation
  • Date format validation
  • Detailed pattern validation using regular expressions
  • Escaping output when displaying HTML

Combining these practices allows you to create safe and user-friendly web forms. In actual operation, also consider higher-level security measures such as CSRF protection and logging.
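
As a rough illustration of combining these checks, the sketch below collects one error message per field. validateForm() is a hypothetical function, the field names and limits are taken from the examples above, and the mbstring extension is assumed to be enabled.

```php
<?php
// Hypothetical validator: returns an array of error messages keyed by field name.
function validateForm(array $input): array
{
    $errors = [];

    // Required check and length check for the username.
    $username = is_string($input['username'] ?? null) ? trim($input['username']) : '';
    $length = mb_strlen($username, 'UTF-8');
    if ($username === '') {
        $errors['username'] = 'Username is required.';
    } elseif ($length < 3 || $length > 20) {
        $errors['username'] = 'Please enter a username between 3 and 20 characters.';
    }

    // Integer check and range check for the age.
    $age = filter_var($input['age'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 18, 'max_range' => 99],
    ]);
    if ($age === false) {
        $errors['age'] = 'Please enter an age between 18 and 99.';
    }

    // Format check and existence check for the date.
    $date = is_string($input['date'] ?? null) ? $input['date'] : '';
    $d = DateTime::createFromFormat('Y-m-d', $date);
    if ($d === false || $d->format('Y-m-d') !== $date) {
        $errors['date'] = 'Please enter a valid date in YYYY-MM-DD format.';
    }

    return $errors;
}

$ok  = validateForm(['username' => 'taro', 'age' => '30', 'date' => '2024-04-01']);
$bad = validateForm(['username' => 'ab', 'age' => '17', 'date' => '2024-02-30']);

var_dump(count($ok));      // int(0): no errors
print_r(array_keys($bad)); // username, age, date
```

Messages returned this way still need to be escaped with htmlspecialchars() if they ever include user input when displayed.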