What is URL Encoding? A Simple Explanation of Its Basic Meaning and Role

URL encoding is a technique that converts characters that cannot be included in URLs (Uniform Resource Locators) used on the web into a format that can be safely transmitted.
URLs can only use alphanumeric characters and some symbols, so Japanese characters and special characters cannot be used as they are.
By using URL encoding, these characters are replaced with a combination of “%” and hexadecimal numbers for transmission.

For example, a “space” is converted to “%20”.
This conversion allows browsers and servers to correctly receive the information without breaking the meaning of the URL.

Characters Not Allowed in URLs and Why Encoding is Necessary

The characters not allowed in URLs mainly include the following.

  • Non-ASCII characters such as Japanese kana and kanji
  • Spaces and control characters
  • Certain symbols (e.g., "<", ">", "#", "%", "{", "}", "|", "\", "^", "[", "]", "`") when they are meant as literal data

If these characters are included directly in a URL, the URL may be misinterpreted, causing communication errors or unintended behavior.
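For example, if a URL contains “#”, the part after it is treated as a “fragment identifier” that points to a specific position within the page. The browser does not send the fragment to the server, so any data placed after the “#” is lost from the request.
Also, since “%” marks the start of an encoded character, a literal “%” must itself be written as “%25”; otherwise the characters that follow it may be misread as an encoded byte.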

Why is encoding necessary?
URLs are basically composed of ASCII characters, and some characters serve as delimiters or have special meanings, so including them can cause confusion.
URL encoding replaces these characters with the “%xx” format to safely use them in URLs.
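
The effect of an unencoded “#” can be checked with PHP’s parse_url(). The following is a minimal sketch; the host example.com and the parameter names q and page are placeholders chosen for illustration, not part of any real service.

```php
<?php
// Hypothetical URL: the search term "C#" is placed in the query without encoding.
$raw = 'https://example.com/search?q=C#&page=2';
$parts = parse_url($raw);

// The query ends at "#", so "&page=2" is parsed as the fragment.
$rawQuery = $parts['query'];
$rawFragment = $parts['fragment'];

// With the value percent-encoded, "#" travels as data (%23).
$safe = 'https://example.com/search?q=' . rawurlencode('C#') . '&page=2';
$safeQuery = parse_url($safe, PHP_URL_QUERY);
parse_str($safeQuery, $params);

echo $rawQuery, "\n";   // q=C
echo $safeQuery, "\n";  // q=C%23&page=2
```

In the first URL the page parameter never reaches the query string at all, which is the kind of missing information described above.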

How URL Encoding Works: How Characters Are Converted

URL encoding proceeds as follows.

  1. Convert the target characters into a byte sequence using a character encoding (UTF-8 is the usual choice on the web).
  2. Convert each byte to hexadecimal and prefix it with “%”.

Example:

  • “あ” is 3 bytes in UTF-8 (E3 81 82), so it is converted to %E3%81%82.
  • An ASCII (half-width) space has character code 32 (hexadecimal 20), so it is converted to %20.

This conversion allows characters that cannot be used directly in URLs to be safely included.
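
The two steps above can be sketched by hand in PHP. The function name percentEncodeSketch() is hypothetical and exists only for illustration; it assumes the input string is UTF-8 and leaves only letters, digits, and “-”, “.”, “_”, “~” untouched. Real code should call the built-in functions rather than a hand-written loop.

```php
<?php
// Hypothetical helper for illustration; prefer rawurlencode() in real code.
function percentEncodeSketch(string $text): string
{
    $out = '';
    // PHP strings are byte sequences, so indexing visits one byte at a time
    // (step 1: for UTF-8 input, these are the UTF-8 bytes).
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $byte = $text[$i];
        if (preg_match('/\A[A-Za-z0-9._~-]\z/', $byte) === 1) {
            // Characters that are safe everywhere in a URL stay as they are.
            $out .= $byte;
        } else {
            // Step 2: write the byte as "%" plus two uppercase hex digits.
            $out .= sprintf('%%%02X', ord($byte));
        }
    }
    return $out;
}

echo percentEncodeSketch('あ'), "\n";  // %E3%81%82
echo percentEncodeSketch(' '), "\n";   // %20
```

For these inputs the sketch produces the same result as PHP’s rawurlencode().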

What is PHP’s urlencode() Function? Usage and Important Notes (Simple Explanation)

PHP has a function called urlencode() that converts strings into a format usable in URLs.

For example, it can be used as follows.

$text = "こんにちは 世界";
$encoded = urlencode($text);
echo $encoded;  // %E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF+%E4%B8%96%E7%95%8C

The key point here is that spaces (blanks) are converted to “+” (plus signs).
In other words, the space in “こんにちは 世界” becomes “+”.

On the other hand, if you want spaces to be encoded as “%20”, use rawurlencode() instead.

$text = "こんにちは 世界";
$encoded = rawurlencode($text);
echo $encoded;  // %E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF%20%E4%B8%96%E7%95%8C

Here the space becomes “%20”, while the Japanese characters are percent-encoded byte by byte, exactly as with urlencode().

How to Use Them Differently? Important Notes

  • urlencode() is mainly suitable for encoding the “query parameter” part of URLs.
    Query parameters refer to the data passed in key=value format after the “?” in URLs.
    For example, it is used when including search terms or form input data in URLs.
    Spaces are converted to “+”, making it easier to handle in typical web systems.

    This is based on the application/x-www-form-urlencoded specification.
  • rawurlencode() is appropriate for encoding the “path” part of URLs (such as folder and file names after the domain).
    Spaces in the path part are represented as “%20”, and this encoding complies with RFC 3986.
    Use this when “+” must remain a literal plus sign rather than stand for a space.
  • Encoding should only be applied to necessary parts.
    If a string is encoded more than once, the “%” characters themselves get encoded again and the result no longer matches the original.
    For example, applying urlencode() to an already encoded string converts “%” to “%25” (so “%20” becomes “%2520”), and a receiver that decodes only once gets the wrong value.

    Multiple encoding is a common issue in web development, so be careful.

| Use Case | Recommended Function | Space Conversion Example |
|---|---|---|
| Query parameters (e.g., search terms) | urlencode() | Space → + |
| URL path part (folder or file names) | rawurlencode() | Space → %20 |
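
As a rough illustration of encoding each part separately, the sketch below builds one URL from a path segment and a set of query parameters. The host example.com, the file name, and the parameter names are assumptions made up for the example. The query is built with http_build_query(), a built-in function that applies urlencode()-style encoding when called with PHP_QUERY_RFC1738.

```php
<?php
// Hypothetical values; example.com and the parameter names are placeholders.
$fileName = 'annual report.pdf';
$params = ['q' => 'こんにちは 世界', 'lang' => 'ja'];

// Path segment: rawurlencode() turns the space into %20.
$path = '/files/' . rawurlencode($fileName);

// Query string: each value is encoded separately, with spaces as "+".
$query = http_build_query($params, '', '&', PHP_QUERY_RFC1738);

$url = 'https://example.com' . $path . '?' . $query;
echo $url, "\n";
```

Only the individual values are encoded here. The “/”, “?”, “&”, and “=” that give the URL its structure are added afterwards and stay unencoded.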

Difference from URL Decoding and Basic Decoding Methods

URL-encoded strings need to be converted back to their original form on the server or browser side. This process is called URL decoding.

In PHP, decoding is done using urldecode() or rawurldecode().

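urldecode() decodes every “%xx” sequence and also converts “+” back to a space.
It is mainly used to decode query parameters and form data.

rawurldecode() decodes every “%xx” sequence but leaves “+” unchanged.
It is mainly used to decode the path part of URLs, where “+” is a literal plus sign.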

$encoded = "%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF%20%E4%B8%96%E7%95%8C";
$decoded = urldecode($encoded);
echo $decoded; // こんにちは 世界
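
The difference between the two decoders only shows up when the input contains “+”. The short sketch below uses made-up sample strings to compare them, and also shows what a double-encoded value looks like after a single decode.

```php
<?php
// Sample input containing both "+" and "%20".
$encoded = 'a+b%20c';

$asQuery = urldecode($encoded);    // "+" and "%20" both become spaces
$asPath = rawurldecode($encoded);  // only "%20" becomes a space

echo $asQuery, "\n";  // a b c
echo $asPath, "\n";   // a+b c

// Double encoding: "%" is encoded again as "%25".
$once = rawurlencode('100%');         // 100%25
$twice = rawurlencode($once);         // 100%2525
$decodedOnce = rawurldecode($twice);  // 100%25, not the original "100%"
echo $decodedOnce, "\n";
```

Decoding must be applied exactly as many times as encoding was, which is why encoding each value once, at the point where the URL is assembled, tends to be the simplest rule.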

Key Points to Understand URL Encoding for Secure Web Communication

URL encoding is fundamental to web communication, and understanding it correctly provides the following benefits.

  • Allows safe transmission and reception of URLs containing Japanese and special characters
  • Prevents misinterpretation and errors caused by invalid URLs
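  • Helps keep user input from changing the structure of a URL.
    For example, an unencoded “&” or “=” inside a value can inject extra query parameters. URL encoding does not by itself prevent cross-site scripting (XSS) or SQL injection; those require escaping for the output context, such as htmlspecialchars() for HTML and prepared statements for SQL.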
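
To make the string injection point concrete, the following sketch shows how an unencoded value can add a parameter to a query string. The input value and the parameter names q and admin are hypothetical. parse_str() stands in for what a receiving server would do with the query.

```php
<?php
// Hypothetical user input that tries to smuggle in a second parameter.
$userInput = 'books&admin=1';

// Without encoding, "&" and "=" are read as query delimiters.
$unsafeQuery = 'q=' . $userInput;
// With encoding, they become %26 and %3D and stay inside the value.
$safeQuery = 'q=' . urlencode($userInput);

parse_str($unsafeQuery, $unsafeParams);
parse_str($safeQuery, $safeParams);

echo $safeQuery, "\n";      // q=books%26admin%3D1
print_r($unsafeParams);     // two parameters: q and admin
print_r($safeParams);       // one parameter: q
```

This only protects the structure of the URL. A value that is later written into HTML or an SQL statement still needs the escaping appropriate to that context.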

In web development, encoding and decoding strings correctly is especially important for form submissions and API requests.