In this article, you’ll learn the basics of “validation” and “sanitization” for safely handling input data in PHP. We’ll explain everything from core concepts to practical code examples in a way that’s easy for beginners to understand.

  • Why data validation is necessary
  • The difference between validation and sanitization
  • Basic implementation methods in PHP
  • Countermeasures against common attacks (XSS / SQL injection)

What is data validation in PHP, and why is it essential for security?

When developing web applications (websites or web services) with PHP, “data validation” is an unavoidable and very important process.

Put simply, validation is “checking whether the input data sent by the user is safe and correct.”

For example, contact forms, user registration, login, and comment posting—any feature where users enter something—are all subject to validation.

Why is data validation indispensable?

If you use user input as-is without any checks, serious problems like the following can occur.

  • System malfunctions and errors: If someone enters “あいうえお” (non-numeric characters) in an age field that expects a number, your calculation process will fail and cause an error.
  • Data inconsistency: If a required field like “email address” is saved as empty in the database, you won’t be able to contact the user later.
  • Security vulnerabilities: Malicious users may inject “program code (scripts)” or “database-manipulating instructions (SQL statements)” into input forms.

If you accept such input without checking, XSS (Cross-Site Scripting) can lead to theft of others’ cookies (such as login information), and SQL injection can result in the theft, tampering, or deletion of information in your database.

In PHP-based web development, data validation is indispensable not only “to make the system behave as expected,” but also “to protect the system and user data from malicious attacks.”

Concrete attack example

For example, it’s dangerous to run code like the following without any checks.

// Outputting user input as-is (dangerous)
// ⚠️ This code outputs user input (here, the URL query parameter `name`)
// to the browser as HTML without proper sanitization.
// Doing this introduces a Cross-Site Scripting (XSS) vulnerability.
echo $_GET['name'];

// Malicious input example:
// If a user visits a URL ending in “?name=<script>alert('XSS');</script>”,
// the `echo` above embeds `<script>alert('XSS');</script>` in the page as HTML,
// causing the attacker’s script to run in the viewer’s browser.

In this case, JavaScript runs in the browser, and there’s a risk that the site visitor’s cookies and other data could be stolen.

Validation vs. Sanitization: Understand each role

Data checking consists of two distinct roles: “Validation” and “Sanitization.” Understanding the difference is the first step in security.

Validation: checking input data

Verify whether the input data matches predefined rules (formats and conditions).

  • Are required fields not left empty?
  • Is the email address in the correct format?
  • Is the password at least the specified number of characters?
  • Is a “number” entered for the age field?
  • Is the postal code in the “XXX-XXXX” format?

Invalid data is “rejected,” and an error is returned to the user.

Sanitization: neutralizing input data

Remove or convert dangerous code contained in the input so it can be handled safely.

Example: Convert comments containing <script> tags into &lt;script&gt; so they won’t execute in the browser.

  • Validation: “reject” invalid data
  • Sanitization: “neutralize” dangerous parts and accept the data
Item            | Validation                                       | Sanitization
Purpose         | Confirm the validity of input data               | Neutralize dangerous data
When to process | Right after receiving the data                   | Right before displaying or saving the data
Examples        | Email format checks, required field checks, etc. | htmlspecialchars(), strip_tags(), filter_var()
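
To make the two roles concrete, here is a minimal sketch. The function names isIntegerInput() and escapeForHtml() are hypothetical helpers introduced only for illustration; they are not part of PHP.

```php
<?php
// Validation: decide whether the value is acceptable, and reject it if not.
// isIntegerInput() is a hypothetical helper name.
function isIntegerInput(string $input): bool
{
    return filter_var($input, FILTER_VALIDATE_INT) !== false;
}

// Sanitization: accept the value, but neutralize the characters that
// have a special meaning in HTML. escapeForHtml() is also hypothetical.
function escapeForHtml(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

var_dump(isIntegerInput('25'));       // bool(true)  -> accept
var_dump(isIntegerInput('abc'));      // bool(false) -> reject with an error
echo escapeForHtml('<script>'), "\n"; // &lt;script&gt;
```

The validator only answers yes or no and never modifies the value, while the escaping function never rejects anything and only changes how the value is written into HTML.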

Basic validation implemented in PHP

1. Check required fields

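empty() checks whether a variable is “empty.” It returns true for an undefined variable, null, false, an empty string, 0, the string "0", and an empty array. Note that this means an input of “0” is also treated as empty.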

// Check POSTed 'username'
if (empty($_POST['username'])) {
  echo "Name is a required field.";
}
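
Because empty() treats the string "0" as empty, a field where “0” is a legitimate value needs a stricter check. The following is a minimal sketch; isBlank() is a hypothetical helper name, and treating whitespace-only input as blank is an assumption about what the form should accept.

```php
<?php
// isBlank() is a hypothetical helper: true when the field is missing
// or contains only whitespace. Unlike empty(), it accepts "0".
function isBlank(?string $value): bool
{
    return $value === null || trim($value) === '';
}

var_dump(empty('0'));    // bool(true)  - "0" counts as empty
var_dump(isBlank('0'));  // bool(false) - "0" is a real value
var_dump(isBlank('  ')); // bool(true)
var_dump(isBlank(null)); // bool(true)
```

In a form handler it could be called as isBlank($_POST['username'] ?? null), assuming the field is submitted as a plain string.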

2. Check character length

The following checks the password length. mb_strlen() counts characters rather than bytes, so the result is accurate for Japanese and other multibyte strings.

// Fall back to an empty string if the field was not submitted
$password = $_POST['password'] ?? '';
$minLength = 8;
if (mb_strlen($password) < $minLength) {
  echo "Please enter a password with at least {$minLength} characters.";
}
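
The difference between counting bytes and counting characters can be seen in a short sketch. It assumes the mbstring extension is enabled and that strings are UTF-8 encoded.

```php
<?php
// "\u{3042}\u{3044}\u{3046}" is the three-character string あいう,
// written with escapes so the example does not depend on file encoding.
$text = "\u{3042}\u{3044}\u{3046}";

var_dump(strlen($text));             // int(9) - bytes in UTF-8
var_dump(mb_strlen($text, 'UTF-8')); // int(3) - characters
```

With strlen(), a three-character Japanese password would wrongly pass an 8-character minimum, which is why the check above uses mb_strlen().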

3. Check numeric values

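The following checks that the age input is an integer. filter_var() validates and filters values, and FILTER_VALIDATE_INT determines whether a value is an integer. Compare the result with === false, because a valid input of “0” is returned as the integer 0.

$age = $_POST['age'] ?? '';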
if (filter_var($age, FILTER_VALIDATE_INT) === false) {
  echo "Please enter age as half-width digits (an integer).";
}
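
FILTER_VALIDATE_INT on its own also accepts values such as -5 or 9999. As a sketch, the min_range and max_range options can narrow the check; isAgeInRange() is a hypothetical helper name, and the 0 to 150 range is an assumed business rule.

```php
<?php
// isAgeInRange() is a hypothetical helper; the 0-150 limits are an
// assumption for illustration, not something PHP prescribes.
function isAgeInRange(string $input): bool
{
    $options = ['options' => ['min_range' => 0, 'max_range' => 150]];
    // Strict comparison matters: "0" validates to int 0, which is falsy.
    return filter_var($input, FILTER_VALIDATE_INT, $options) !== false;
}

var_dump(isAgeInRange('25'));   // bool(true)
var_dump(isAgeInRange('0'));    // bool(true)
var_dump(isAgeInRange('-5'));   // bool(false)
var_dump(isAgeInRange('200'));  // bool(false)
var_dump(isAgeInRange('12.5')); // bool(false)
```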

4. Check email address format

filter_var() is used to validate values, and FILTER_VALIDATE_EMAIL determines whether it’s a valid email format.

// Fall back to an empty string if the field was not submitted
$email = $_POST['email'] ?? '';
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
  echo "The email address format is invalid.";
}

Format validation using regular expressions

Postal code check

preg_match() checks whether a value matches a regular expression pattern. It returns 1 on a match, 0 on no match, and false on error. /^\d{3}-\d{4}$/ represents the format “3 digits – 4 digits” (e.g., 123-4567).

$zipcode = $_POST['zipcode'] ?? '';
$pattern = '/^\d{3}-\d{4}$/';
// Treat anything other than a match (including an error) as invalid
if (preg_match($pattern, $zipcode) !== 1) {
  echo "Please enter the postal code in the format “123-4567”.";
}

Mobile phone number check

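/^(090|080|070)-\d{4}-\d{4}$/ means the value starts with “090”, “080”, or “070”, followed by a hyphen, 4 digits, a hyphen, and 4 more digits.

$tel = $_POST['tel'] ?? '';
$pattern = '/^(090|080|070)-\d{4}-\d{4}$/';
// Treat anything other than a match (including an error) as invalid
if (preg_match($pattern, $tel) !== 1) {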
  echo "The mobile phone number format is invalid. (e.g., 090-1234-5678)";
}
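
One detail of these patterns is worth knowing: in PHP’s regular expressions, $ also matches just before a trailing newline, so a value such as "123-4567\n" passes the check. The sketch below uses the \A and \z anchors, which match only the very start and very end of the string; isPostalCode() is a hypothetical helper name.

```php
<?php
// isPostalCode() is a hypothetical helper name.
// \A and \z anchor the very start and very end of the string.
function isPostalCode(string $value): bool
{
    return preg_match('/\A\d{3}-\d{4}\z/', $value) === 1;
}

// With ^ and $, a trailing newline slips through:
var_dump(preg_match('/^\d{3}-\d{4}$/', "123-4567\n")); // int(1)

// With \A and \z, it is rejected:
var_dump(isPostalCode("123-4567\n")); // bool(false)
var_dump(isPostalCode('123-4567'));   // bool(true)
```

Adding the D modifier to the original pattern (/^\d{3}-\d{4}$/D) is another way to get the same stricter behavior.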

Basics of XSS protection: concrete sanitization methods

Common sanitization functions

  • htmlspecialchars(): Escape HTML special characters (<, >, &, and quotes) before output
  • strip_tags(): Remove HTML tags from a string
  • filter_var(): Filter or convert values according to a specified type or format
  • intval() / floatval(): Convert to numbers to handle safely

How to use htmlspecialchars()

htmlspecialchars() converts characters that have a special meaning in HTML (<, >, &, and quotes) into entities so they can be output safely.
ENT_QUOTES converts both single and double quotes.
'UTF-8' specifies the character encoding (essential in Japanese environments).
This prevents tags like <script> from executing and displays them as plain text.

$comment = "<b>Bold</b><script>alert('Danger');</script>";
echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
// Output: &lt;b&gt;Bold&lt;/b&gt;&lt;script&gt;alert(&#039;Danger&#039;);&lt;/script&gt;

Good and bad examples

// Bad example: escaping at input time stores HTML entities in the database.
// Escaping again at output time then double-escapes the value
// (“&” ends up displayed as “&amp;”).
$company_name = htmlspecialchars($_POST['company'], ENT_QUOTES, 'UTF-8');

// Good example: keep the input data as-is and use htmlspecialchars() at output time
// to display it safely.
$company_name = $_POST['company'];
echo htmlspecialchars($company_name, ENT_QUOTES, 'UTF-8');
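
The double-escaping problem is easy to reproduce. In this sketch the company name is a made-up sample value, and the two variables stand in for what would be stored in and read back from a database.

```php
<?php
$input = 'Tom & Jerry <Co.>';

// Escaped once at input time, then again at output time:
// the visitor sees "&amp;" and "&lt;" as literal text.
$storedEscaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$doubleEscaped = htmlspecialchars($storedEscaped, ENT_QUOTES, 'UTF-8');
echo $doubleEscaped, "\n";
// Tom &amp;amp; Jerry &amp;lt;Co.&amp;gt;

// Stored raw and escaped once at output time:
// the browser shows the original text.
$escapedOnce = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escapedOnce, "\n";
// Tom &amp; Jerry &lt;Co.&gt;
```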

In practice: the basic flow of data validation in a PHP form

<?php
  $errors = [];
  $name = '';
  $email = '';
  if ($_SERVER['REQUEST_METHOD'] === 'POST') {
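    // Use an empty string when a field is missing, and trim surrounding whitespace
    $name = trim($_POST['name'] ?? '');
    $email = trim($_POST['email'] ?? '');
    // Compare with '' instead of using empty(), which would also reject “0”
    if ($name === '') {
      $errors['name'] = 'Name is required.';
    }
    if ($email === '') {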
      $errors['email'] = 'Email address is required.';
    } elseif (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
      $errors['email'] = 'The email address format is invalid.';
    }
    if (empty($errors)) {
      header('Location: thanks.php');
      exit;
    }
  }
?>
<form action="" method="POST">
  <label>Name:</label>
  <input type="text" name="name" value="<?php echo htmlspecialchars($name, ENT_QUOTES, 'UTF-8'); ?>">
  <?php if (isset($errors['name'])) echo htmlspecialchars($errors['name'], ENT_QUOTES, 'UTF-8'); ?>

  <label>Email address:</label>
  <input type="text" name="email" value="<?php echo htmlspecialchars($email, ENT_QUOTES, 'UTF-8'); ?>">
  <?php if (isset($errors['email'])) echo htmlspecialchars($errors['email'], ENT_QUOTES, 'UTF-8'); ?>

  <button type="submit">Send</button>
</form>
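
As forms grow, it can help to move the checks into a function that takes the input array and returns the errors, so the rules can be tested without sending a real request. The following is one possible sketch; validateContactForm() is a hypothetical helper name that mirrors the checks in the form above.

```php
<?php
// validateContactForm() is a hypothetical helper. It returns an array of
// error messages keyed by field name; an empty array means the input is valid.
function validateContactForm(array $input): array
{
    $errors = [];
    // Treat missing or non-string values as empty strings.
    $name  = is_string($input['name'] ?? null) ? trim($input['name']) : '';
    $email = is_string($input['email'] ?? null) ? trim($input['email']) : '';

    if ($name === '') {
        $errors['name'] = 'Name is required.';
    }
    if ($email === '') {
        $errors['email'] = 'Email address is required.';
    } elseif (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'The email address format is invalid.';
    }
    return $errors;
}

print_r(validateContactForm(['name' => 'Taro', 'email' => 'not-an-email']));
```

In the form script, the call would be $errors = validateContactForm($_POST); followed by the same redirect when $errors is empty.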

Key points for secure form handling

  • Always escape input values right before output, using the method that fits the destination (for HTML, htmlspecialchars())
  • When saving to a database, use prepared statements (countermeasure against SQL injection)
  • Make error messages user-friendly and avoid leaking internal system details
  • Introduce token-based verification as a CSRF (Cross-Site Request Forgery) countermeasure
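
The prepared-statement point can be sketched with PDO. This example assumes the pdo_sqlite extension is available and uses an in-memory SQLite database in place of a real one; the users table and its columns are hypothetical.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');

// The SQL text contains only placeholders. User input is bound separately,
// so it is always treated as data and never as part of the SQL statement.
$insert = $pdo->prepare('INSERT INTO users (name, email) VALUES (:name, :email)');
$insert->execute([
    ':name'  => "Robert'); DROP TABLE users;--",
    ':email' => 'robert@example.com',
]);

$select = $pdo->prepare('SELECT name FROM users WHERE email = :email');
$select->execute([':email' => 'robert@example.com']);
$storedName = $select->fetchColumn();

echo $storedName, "\n"; // Robert'); DROP TABLE users;--
```

The malicious-looking name is stored as ordinary text and the table is left intact. When the name is later shown on a page, it still needs htmlspecialchars(), because prepared statements protect the database, not the HTML output.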

Summary: the first step to secure PHP development is data validation

Validation and sanitization are the absolute basics of secure web development in PHP. By handling input data correctly, you can not only prevent bugs and errors but also build a robust system that protects user information.

We recommend using the sample code introduced here as a foundation and adding rules tailored to your actual forms and applications.