Semantically-Lossless Normalizations
Some normalizations do not change the semantics of a URL. This set is defined as Normalizer::PRESERVING_NORMALIZATIONS. The normalizer applies this set when no specific normalizations are requested.
- capitalize percent encoding
- decode unreserved characters
- convert empty http path
- remove default file host
- remove default port
- remove path dot segments
- convert host unicode to punycode
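
To make three of the items above concrete, here is a minimal standalone sketch of what "capitalize percent encoding", "decode unreserved characters" and "remove path dot segments" mean in RFC 3986 terms. The helper functions are hypothetical and written only for illustration; they are not part of this library and may differ from how the normalizer implements these steps. The dot-segment helper assumes an absolute path.

```php
<?php
// Illustrative helpers only (hypothetical names, not the library's API).

/** Upper-case the hex digits of every percent-encoded triplet. */
function capitalizePercentEncoding(string $value): string
{
    return preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        static fn (array $m): string => strtoupper($m[0]),
        $value
    ) ?? $value;
}

/** Decode triplets that encode unreserved characters (A-Z a-z 0-9 - . _ ~). */
function decodeUnreservedCharacters(string $value): string
{
    return preg_replace_callback(
        '/%([0-9a-fA-F]{2})/',
        static function (array $m): string {
            $char = chr(hexdec($m[1]));
            // Leave reserved characters such as "/" or "?" encoded.
            return preg_match('/^[A-Za-z0-9\-._~]$/', $char) === 1 ? $char : $m[0];
        },
        $value
    ) ?? $value;
}

/** Resolve "." and ".." segments of an absolute path. */
function removeDotSegments(string $path): string
{
    if ($path === '' || $path[0] !== '/') {
        return $path; // this sketch only handles absolute paths
    }

    $segments = explode('/', (string) substr($path, 1));
    $lastIndex = count($segments) - 1;
    $output = [];

    foreach ($segments as $i => $segment) {
        if ($segment === '.' || $segment === '..') {
            if ($segment === '..') {
                array_pop($output); // step up one level, never above the root
            }
            if ($i === $lastIndex) {
                $output[] = ''; // a trailing dot segment leaves a trailing slash
            }
        } else {
            $output[] = $segment;
        }
    }

    return '/' . implode('/', $output);
}

echo capitalizePercentEncoding('option=%3f'), PHP_EOL;     // option=%3F
echo decodeUnreservedCharacters('/p%61th/%2fx'), PHP_EOL;  // /path/%2fx
echo removeDotSegments('/a/b/../c/./d'), PHP_EOL;          // /a/c/d
```

Note that `%2f` stays encoded in the second call: it represents `/`, a reserved character, so decoding it would change the meaning of the path.
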

<?php

use webignition\Uri\Normalizer;
use webignition\Uri\Uri;

// No flags passed: the preserving normalizations are applied.
$uri = new Uri('http://♥.example.com:80/p%61th/../?option=%3f');
$normalizedUri = Normalizer::normalize($uri);

(string) $normalizedUri;
// "http://xn--g6h.example.com/?option=%3F"

The Normalizer::PRESERVING_NORMALIZATIONS flag can be combined with additional normalizations.

<?php

use webignition\Uri\Normalizer;
use webignition\Uri\Uri;

$uri = new Uri('http://♥.example.com:80/p%61th/../?option=%3f&b=bear&a=apple');

// Combine the preserving set with query parameter sorting.
$normalizedUri = Normalizer::normalize(
    $uri,
    Normalizer::PRESERVING_NORMALIZATIONS |
    Normalizer::SORT_QUERY_PARAMETERS
);

(string) $normalizedUri;
// "http://xn--g6h.example.com/?a=apple&b=bear&option=%3F"
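
The `|` operator in the call above is PHP's bitwise OR, the usual way to combine flags into a single integer argument. The following standalone sketch shows the pattern. The `DEMO_*` constants, their values, and `normalizeQuery()` are assumptions made up for illustration; the library's real constants may be defined differently, and its query sorting may not be a plain string sort as used here.

```php
<?php
// Hypothetical flags: each one occupies its own bit.
const DEMO_CAPITALIZE_PERCENT_ENCODING = 1; // 0b001
const DEMO_REMOVE_PATH_DOT_SEGMENTS    = 2; // 0b010
const DEMO_SORT_QUERY_PARAMETERS       = 4; // 0b100

// A named set of flags is simply the OR of its members.
const DEMO_PRESERVING = DEMO_CAPITALIZE_PERCENT_ENCODING | DEMO_REMOVE_PATH_DOT_SEGMENTS;

/** Sort the "&"-separated pairs of a query string, but only if the flag is set. */
function normalizeQuery(string $query, int $flags): string
{
    if (($flags & DEMO_SORT_QUERY_PARAMETERS) !== 0 && $query !== '') {
        $pairs = explode('&', $query);
        sort($pairs, SORT_STRING);
        $query = implode('&', $pairs);
    }

    return $query;
}

$query = 'option=%3F&b=bear&a=apple';

// The preserving set alone does not include sorting, so the query is unchanged.
echo normalizeQuery($query, DEMO_PRESERVING), PHP_EOL;
// option=%3F&b=bear&a=apple

// OR-ing in the extra flag enables sorting on top of the preserving set.
echo normalizeQuery($query, DEMO_PRESERVING | DEMO_SORT_QUERY_PARAMETERS), PHP_EOL;
// a=apple&b=bear&option=%3F
```

Because each flag is a distinct bit, OR-ing a flag that is already part of the set has no further effect, so extra normalizations can be added to the preserving set without listing its members individually.
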