URL Rewrite

URL rewriting refers to the process of modifying a URL in order to make it easier and more accessible to Internet users. Generally, this modification occurs to make the URL shorter and more consistent. This way, users will remember it and will have no trouble reading or writing it when they need to.

Nowadays, Internet users know that the Internet carries real risks. When we come across a long URL made up of seemingly random numbers and letters, we tend to be suspicious.

That’s why it’s important to ensure visitors’ trust when they first encounter your URL by making it short and meaningful.

If the concept of URL rewriting is still unclear to you, I invite you to read this article carefully.


Chapter 1: What is URL rewriting?

In this chapter we will try to clarify the essential topics to give you a clear idea of what is meant by URL rewriting.

1.1. Definition of URL rewriting

Almost all web servers, be it Apache, Nginx, Microsoft’s IIS or others, offer the possibility to modify URLs before they are shown to visitors.


This modification usually occurs when a requested document is located elsewhere and the visitor must instead be redirected to the new location.

In addition to these external rewrites where a visitor requests a URL and the server checks for certain redirects that apply to the requested URL, there are also internal rewrites.

Servers can quickly recognize the document or resource that needs to be made available in a URL, regardless of where it is stored in the internal folder structure.

When we take the case of WordPress, for example, each blog post is stored in a database and is assigned an identifier. The different pages can always be requested via this ID.

1.2. How does URL rewriting work?

The URL rewriting function simply places a layer on top of the original address and turns it into something easy to find and meaningful.


From the user’s perspective, after the rewrite, the website URL in the browser stays the same friendly, consistent address.

But behind the scenes, the server maps this friendly URL back to the original, complicated address and serves the corresponding resource.

URL rewrites are also extremely useful when the server structure changes and resources are moved from one folder to another.

In this situation, a system administrator simply updates the path that the friendly URL points to.

Since the resource has been moved, it has a different location, so the rewrite rule must point the friendly URL to the new location of the resource.

This should not be confused with redirects, which occur when a resource has been replaced by a different resource.
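The "layer" described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular server implements it: the rule table, the paths and the parameter names (`p`, `category`, `slug`) are assumptions made up for the example.

```python
import re

# Hypothetical rewrite table: friendly pattern -> internal address.
# The paths and parameter names are illustrative assumptions.
REWRITE_RULES = [
    (re.compile(r"^/blog/(\d+)/?$"), r"/index.php?p=\1"),
    (re.compile(r"^/seo/([a-z0-9-]+)/?$"), r"/index.php?category=seo&slug=\1"),
]


def rewrite(path: str) -> str:
    """Return the internal address for a friendly path, or the path unchanged."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            # Replace the whole friendly path with the internal address.
            return pattern.sub(target, path)
    return path


print(rewrite("/blog/42"))
print(rewrite("/seo/url-rewriting"))
```

If a resource moves, only the right-hand side of a rule has to change; the friendly URL that visitors know stays the same.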

1.3. How important is it to rewrite a URL?

It is extremely important that URLs make sense and give an idea of the page they refer to, while being easy for search engines to understand.

URL rewriting hides the inner workings of a website address from the user. Visitors no longer see the query strings, which bring no benefit to the site when they are exposed.

This is not only useful for readers. It also contributes somewhat to the security of the site, because internal script names and parameters are not advertised to malicious users. It does not replace validating the input itself.

It also makes it possible to create a keyword-rich URL, that is, to include the keyword in the text of the URL, which helps the SEO process.


This ensures that the site under development is already optimized to some extent even before it is launched.

Rewritten URLs also hide the underlying links, which are often very long.

In addition, the same URL can continue to be used even if the underlying link changes.

The URL rewriting process can be performed on any type of site or web content management system, whether it is built with ASP.NET or with PHP.

How the rewriting is done depends on the language and the needs of the website.

With ASP.NET, rewriting can be done efficiently because ASP.NET is a server-side web framework that runs on Internet Information Services (IIS), which provides a URL Rewrite module.

With PHP, URL rewriting is typically handled by the web server, for example through the mod_rewrite module for the Apache server.

Rewritten URLs are also more user-friendly in the browser. For instance, a user can remove a section of the URL in order to navigate up one level.

Let’s say a user lands on the example.com/seo/recritture-url page and wants to go back to the home page. They can simply keep example.com and delete the other parts.

1.4. Why is URL rewriting good for SEO?

Using URL rewriting has several advantages. For one, URL rewrites help with accessibility and improve the user experience.

Simply put, when a user looks at a URL in a search engine result, they don’t have to try to figure out what that page or article is about.

Also, messy URLs don’t encourage people to click on them, which can lead to a lower click-through rate. This is generally bad for SEO and site performance.

On the other hand, friendly URLs help optimize content for SEO.

A URL that includes the title of the article and the main keyword helps with Google indexing and with how bots perceive your article or web page.

As a result, URLs optimized for SEO through URL rewriting lead to better visibility, more credibility with your users and naturally higher traffic volume.

Apart from that, using URL rewrites also helps to maintain consistency in your URL path and page name structure.

Finally, on IIS, the URL Rewrite module also works with user-mode output caching, which helps performance, and with Failed Request Tracing, which helps troubleshooting.

1.5. URL rewrites vs. redirection

Unlike a URL rewrite, which happens entirely on the server, a redirect involves the client: the server answers with a redirect status code and the browser then requests the new URL.


Basically, a rewrite occurs when the internal address of a resource changes, or when we need a simpler, more user-friendly layer on top of it.

This happens behind the scenes and the user is not aware of it. In comparison, a redirect occurs when the resource is no longer available at the requested URL and the visitor must be sent to another one.

Redirects can also be set up when we anticipate how users will try to access a resource and want to make sure they reach it correctly.

For example, WWW resolution is a redirect: whether or not users type www. before the domain name when looking for the Twaino home page, they always land on Twaino.com.

Redirect | Rewrite
On the client side | On the server side
The URL changes in the address bar. | The URL does not change in the address bar; it is only modified internally.
Supports the following status codes: 301 (Permanent), 302 (Found), 303 (See Other), 307 (Temporary). | No redirect status code applies.
Useful for search engine optimization by forcing the search engine to update the URL. | Also useful for search engines, because a friendly URL hides a messy URL.
Example: from http://votredomaine.com to http://www.votredomaine.com in the browser. | Example: https://www.twaino.com/ to twaino.com.
Can redirect to the same site or to an unrelated site. | Usually rewrites to the same site using a relative path, although if you have the ARR module installed, you can rewrite to a different site. In that case, the URL rewrite works as a reverse proxy.
Page request flow: the browser requests a page; the server responds with a redirect status code; the browser makes a second request to the new URL; the server responds to the new URL. | Page request flow: the browser requests a page; the server (IIS) rewrites the URL internally and serves the updated page.
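The two request flows compared above can be simulated with a small Python sketch. Everything in it (the `PAGES`, `REDIRECTS` and `REWRITES` tables, the paths, the `handle` and `browse` helpers) is hypothetical and only meant to show the difference in round trips and in the URL the visitor ends up seeing.

```python
from typing import Dict, Tuple

# Hypothetical site data, for illustration only.
PAGES: Dict[str, str] = {"/products.php?id=7": "<h1>Product 7</h1>"}
REDIRECTS = {"/old-product": "/product/7"}        # external: the client must re-request
REWRITES = {"/product/7": "/products.php?id=7"}   # internal: the client never sees the target


def handle(path: str) -> Tuple[int, Dict[str, str], str]:
    """Server side: return (status, headers, body) for one HTTP request."""
    if path in REDIRECTS:
        return 301, {"Location": REDIRECTS[path]}, ""
    internal = REWRITES.get(path, path)
    body = PAGES.get(internal)
    if body is None:
        return 404, {}, ""
    return 200, {}, body


def browse(path: str) -> Tuple[str, int, int]:
    """Client side: follow redirects; return (URL shown, status, number of requests)."""
    requests = 0
    while True:
        status, headers, _ = handle(path)
        requests += 1
        if status in (301, 302, 303, 307) and requests < 10:
            path = headers["Location"]
            continue
        return path, status, requests


print(browse("/old-product"))  # redirect: two requests, address bar changes
print(browse("/product/7"))    # rewrite: one request, address bar unchanged
```

The redirect costs a second round trip and changes the visible URL, while the rewrite is resolved inside `handle` without the client noticing.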

Chapter 2: Why is it important to do URL rewriting?

This chapter is devoted to the reasons why webmasters need to do a rewrite.

2.1. Applications need to be secure

It is important that webmasters protect their websites against all kinds of attacks. Nobody should be able to harm your site simply by modifying a URL that points to your applications.

To ensure the security of your site, check all GET variables coming from your visitors.

For example, let’s say we have a simple script that displays all the products in a category. Usually, it looks like this:

  • myapp.php?target=showproducts&categoryid=123

But when a ScriptKiddie(tm) types myapp.php?target=showproducts&categoryid=youarebeinghacked into the address bar, many sites display error messages complaining about a wrong SQL query, an invalid MySQL resource ID, and so on.

This simply shows that these sites are not secure at all, or not secure enough.
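As a minimal sketch of what "checking a GET variable" can mean, the following Python function accepts the categoryid parameter only if it is a plain number. The function name and the nine-digit limit are assumptions chosen for the example; a real application would also use parameterized SQL queries.

```python
import re
from urllib.parse import parse_qs


def parse_category_id(query_string: str) -> int:
    """Return categoryid as an int, or raise ValueError if it is not a plain number."""
    values = parse_qs(query_string).get("categoryid", [])
    # Accept exactly one value made of 1 to 9 ASCII digits (limit is an assumption).
    if len(values) != 1 or re.fullmatch(r"[0-9]{1,9}", values[0]) is None:
        raise ValueError("categoryid must be a single positive integer")
    return int(values[0])


print(parse_category_id("target=showproducts&categoryid=123"))

try:
    parse_category_id("target=showproducts&categoryid=youarebeinghacked")
except ValueError as error:
    print("rejected:", error)
```

Rejecting the value before it reaches the database avoids the SQL error messages described above.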

2.2. Applications must be search engine friendly

It is not generally known, but some search engines will not index your site thoroughly if it contains links to dynamic pages like the one mentioned above.

They simply take the "name" part of the URL, that is, everything before the question mark, and try to retrieve the content of that page. The parameters after the question mark, which most scripts need in order to work properly, are dropped.

To make it clear, here are some links from our fictitious page:

  • myapp.php?target=showproducts&categoryid=123 ;
  • myapp.php?target=showproducts&categoryid=124 ;
  • myapp.php?target=showproducts&categoryid=125.

Unfortunately, there is a good chance that some search engines will try to download just the myapp.php page, without its parameters.

In most cases, calling a script like this causes an error, or the script does not display the content the link was pointing to.
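The effect can be reproduced with Python's standard urllib.parse module. This is only a sketch of the behavior described above, assuming a crawler that keeps nothing but the part before the question mark; the `name_part` helper is made up for the example.

```python
from urllib.parse import urlsplit

links = [
    "myapp.php?target=showproducts&categoryid=123",
    "myapp.php?target=showproducts&categoryid=124",
    "myapp.php?target=showproducts&categoryid=125",
]


def name_part(url: str) -> str:
    """Return everything before the question mark, as such a crawler would."""
    return urlsplit(url).path


unique_pages = {name_part(link) for link in links}
print(unique_pages)
```

Three different product categories collapse into a single address, myapp.php, which cannot display any of them without its parameters.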

Just try this search on google.com:

”you have an error in your sql syntax” .php -forum

You will notice both serious errors and security threats in the scripts listed.

2.3. Applications must be user-friendly

If your website uses a URL like http://www.downloadsite.com?category=34769845698752354, most of your visitors will have a hard time returning to their favorite category every time they leave the main page of your site.

A friendly URL is also easier for the user to find in the browser’s drop-down list when typing in the "Location" field, although of course this only works if the user has visited it before.

Chapter 3: What are the rules of URL rewriting?

It is obvious that URL rewriting is an opportunity for sites to make their URLs user-friendly. However, this practice must follow certain rules to be effective and produce the desired effects.

In this chapter we will discuss the rules that must be followed in order to rewrite a site’s URLs correctly.

3.1. How to rewrite a URL with IIS

To rewrite a URL with IIS, you must first install the URL Rewrite module, which can be downloaded from Microsoft’s website.

Once it is installed, you will see a new "URL Rewrite" icon in the IIS management console.

Step 1: The IIS management console with URL rewriting added

You can manage URL rewriting at the server level or for individual sites, as you see fit.

The URL Rewrite module matches URLs against patterns. A pattern uses one of three syntaxes:

  • Exact match;
  • Wildcards;
  • Regular expressions in ECMAScript syntax, which is largely Perl-compatible.

There are two types of rules: inbound and outbound.

Inbound rules examine request URLs and modify them, while outbound rules inspect the response being sent, look for the URLs it contains and rewrite them if necessary.

Outbound rules are especially useful when the content contains an absolute URL that is not the one the user should receive.

One of the advantages of the URL Rewrite module is that it ships with a number of built-in rules that make life easier when you want to do a common rewrite.

The full list of built-in rules is:

  • Rule with rewrite map: Lets you define a set of paths and their replacements as a simple list;
  • Request blocking: Prohibits access to a path;
  • User-friendly URL: Quickly creates rules that map path segments to query strings;
  • Reverse proxy: Lets the current server act as a reverse proxy for another server;
  • Enforce lowercase URLs: Forces the client to always use lowercase URLs via an HTTP 301 "Permanent" redirect;
  • Canonical domain name: Uses an HTTP 301 "Permanent" redirect to ensure that clients always use the specified domain name;
  • Add or remove the trailing slash symbol: Always adds or removes the trailing slash in a URL path using an HTTP 301 "Permanent" redirect.
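To make the last rules in this list more concrete, here is a hedged Python sketch of what "enforce lowercase URLs" and "add or remove the trailing slash" amount to. It is not the IIS implementation; the `canonical_redirect` function and its `add_trailing_slash` parameter are assumptions made for illustration.

```python
from typing import Optional, Tuple


def canonical_redirect(path: str, add_trailing_slash: bool = False) -> Optional[Tuple[int, str]]:
    """Return (301, canonical_path) if the path is not canonical, otherwise None."""
    canonical = path.lower()
    if canonical != "/":
        if add_trailing_slash and not canonical.endswith("/"):
            canonical += "/"
        elif not add_trailing_slash and canonical.endswith("/"):
            # Strip trailing slashes, but never reduce the path to an empty string.
            canonical = canonical.rstrip("/") or "/"
    return (301, canonical) if canonical != path else None


print(canonical_redirect("/SEO/Url-Rewriting/"))
print(canonical_redirect("/seo/url-rewriting"))
print(canonical_redirect("/seo", add_trailing_slash=True))
```

A permanent redirect is returned only when the requested path differs from its canonical form, so already-canonical URLs are served without an extra round trip.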

Step 2: Creating a rule allows you to choose a built-in rule to start from

Built-in rules are convenient because, although some of them come with a dedicated wizard, they generate standard rules that you can then adjust or modify as needed.

The user-friendly URL rule is popular with those whose system does not create friendly URLs automatically.

You start by entering an example of the "ugly" URL that the site actually uses internally.

Step 3: Creating a friendly URL rule

Another built-in rule that many people use is the reverse proxy rule.

Again, the wizard walks you through a sensible default configuration for what could otherwise be a very complex task.

It offers built-in, editable options such as:

  • Whether HTTPS requests should be forwarded to the internal server over standard HTTP;
  • Whether you want to use an outbound rule to hide the internal server name.

People often use this to set up a central server with virtual web hosts on the same IP address, which handles incoming requests that need to be sent to different internal servers.

Step 4: Creating a reverse proxy

The last built-in rule we will discuss is the rewrite map.

Rewrite maps let you create a list of URLs and translate them into alternate URLs.

By itself, a rewrite map does nothing. It must be used as part of a larger rule, instead of or in conjunction with substitution patterns.

Rewrite maps are especially useful when you are designing or reorganizing a site that uses non-predictable URLs.

By combining an HTTP 301 "Permanent" redirect with a rewrite map, you can define individual translations in situations where pattern-based rules don’t work well.
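Conceptually, a rewrite map combined with a permanent redirect is a lookup table. The Python sketch below shows the idea only; the old and new URLs in `REWRITE_MAP` are invented for the example and the real IIS configuration is written in XML, not Python.

```python
from typing import Optional, Tuple

# Hypothetical map of old, non-predictable URLs to their new locations.
REWRITE_MAP = {
    "/article.aspx?id=17": "/blog/url-rewriting",
    "/page_old_3.html": "/about",
}


def apply_map(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_url) when the URL is listed in the map, otherwise None."""
    target = REWRITE_MAP.get(url)
    return (301, target) if target is not None else None


print(apply_map("/page_old_3.html"))
print(apply_map("/contact"))
```

Because each entry is listed explicitly, no regular expression has to capture the irregular old URLs.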

Once you’ve set up the basic rule, you can modify it as needed. The rule editor breaks things down clearly.

You start with a rule name and the URL pattern to match.

From there, you can add various conditions, such as:

  • Searching for a particular string in a URL parameter;
  • Checking a server environment variable;
  • And so on.

You can tell the rule to match all or any of the conditions. If the conditions are not met, the rule does not perform the rewrite.

You can also perform substitutions on server variables, which is ideal for enforcing certain behaviors.

Server variables encompass a very large list of items to work with, and rules can also use the rewrite maps you created.

Next, you define what the rule should actually do: perform an internal rewrite, or perform a redirect, which sends a redirect HTTP status code to the client so that it requests the new URL.

Then you define the pattern for the rewritten URL itself. Finally, you have a few options:

  • Append the query string;
  • Log the rewritten URL;
  • Stop processing subsequent rules once this rule has been applied.

3.2. How to rewrite a URL with Apache

Here are a few steps on how you can define URL rewriting rules with Apache:


Step 1: Install the Apache web server

Before you start, make sure the Apache web server package is installed on your system. If it is not, you can install it with the following command:

apt-get install apache2 -y

Once the package is installed, start the Apache service with the following command:

systemctl start apache2

Then open your web browser and go to http://your-server-ip to check that the Apache web server is running.

Step 2: Enable mod_rewrite

By default, the mod_rewrite module is installed with the Apache package, but it is disabled, so you need to enable it first.

You can enable it with the following command:

a2enmod rewrite

Then restart the Apache service to apply the changes and check that the module is loaded with the following command:

apache2ctl -M | grep rewrite_module

You should get the following output:

rewrite_module (shared)

Step 3: Enable .htaccess files

You can set up rewrite rules directly in the main Apache configuration file. However, it is often more convenient to write the rules in a .htaccess file inside each website.

By default, Apache does not allow the .htaccess file to be used. You need to enable it in your default virtual host configuration file.

To do this, edit the Apache default virtual host configuration file:

nano /etc/apache2/sites-available/000-default.conf

Add the following lines before the closing </VirtualHost> line:

<Directory /var/www/html>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Require all granted
</Directory>

Be sure to save and close the file, then restart the Apache service to apply the changes:

systemctl restart apache2

Step 4: Configure URL rewrites

To understand how URL rewrites work, we will create a home.html page in the Apache document root.

We will then configure a basic URL rewrite that takes requests for http://your-server-ip/home and maps them to the actual page path, http://your-server-ip/home.html.

Let’s start by creating a home.html page:

nano /var/www/html/home.html

Add the following content:

<html>
<head><title>Home</title></head>
<body>
<h1>Home page</h1>
<p>Here is my home page</p>
</body>
</html>

Save and close the file when you are done.

Next, create an .htaccess file in the website’s default document root directory to test mod_rewrite.

nano /var/www/html/.htaccess

First, add the following line to enable the rewrite engine:

RewriteEngine on

Then add the following rewrite rule, which serves home.html to visitors who request http://your-server-ip/home:

RewriteRule ^home$ home.html [NC]

Save and close the file when you are done.

A brief explanation of the syntax of the rewrite rules is shown below:

  • ^: This anchors the match at the start of the path that follows the server IP address;
  • $: This anchors the match at the end of the URL path;
  • home: This matches the literal string home;
  • home.html: This is the actual file served to the visitor;
  • [NC]: This makes the rule case insensitive.
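The pieces of this rule can be tried out with Python's re module, whose ^ and $ anchors behave the same way for this pattern. This is an approximation for experimenting, not Apache itself; the `apply_rule` helper is hypothetical, and it assumes the .htaccess behavior of matching the path without its leading slash.

```python
import re

# Equivalent of "RewriteRule ^home$ home.html [NC]"; IGNORECASE plays the role of [NC].
RULE = re.compile(r"^home$", re.IGNORECASE)


def apply_rule(path: str) -> str:
    """Return the file that would be served for the requested path."""
    # Assumption: in a .htaccess file, the pattern is matched against the path
    # without its leading slash.
    relative = path[1:] if path.startswith("/") else path
    return "/home.html" if RULE.match(relative) else path


for requested in ("/home", "/HOME", "/homepage", "/home.html"):
    print(requested, "->", apply_rule(requested))
```

Only /home and its case variants are rewritten; /homepage is left alone because $ requires the path to end right after "home".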

You can now visit the home page at http://your-server-ip/home in your web browser. Apache serves the content of home.html, while the address bar keeps showing /home.

3.3. How to do a rewrite with nginx

In nginx, the rewrite directive can be specified in one of three contexts: server, location, if.

3.3.1. Example of nginx rewriting using $1, $2, ...

Here is an example of an Nginx rewrite directive:

rewrite ^(/data/.*)/geek/(\w+)\.?.*$ $1/linux/$2.html last;

For example:

url/data/distro/geek/test.php will be rewritten as url/data/distro/linux/test.html.

In this example, when you call the original URL with test.php from the browser, it is rewritten according to the above rewrite rule and Nginx serves the test.html page from /data/distro/linux/.

In the above rewrite rule:

  • $1 and $2 hold the strings captured from the original URL, which itself does not change;
  • $1 in the replacement string is whatever matched inside the 1st parenthesis ( ) in the reg-ex. In our example, $1 is /data/distro;
  • Similarly, $2 is whatever matched inside the 2nd parenthesis ( ) in the reg-ex, here (\w+), which is the word that comes after /geek/ in the original URL. In our example, $2 is test;
  • last: This flag makes Nginx stop processing rewrite directives in the current server or location block and use the rewritten URL to look for a new matching location;
  • \.?.*$: This matches the extension in the original URL. Note that the extension of the original URL is replaced by .html in the rewritten URL. So even if you call .php in the original URL, Nginx only serves the .html file of the rewritten URL.
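The captures can be checked with Python's re module, which handles this particular pattern the same way. This is a sketch for experimenting with the regular expression, not a reimplementation of Nginx; the `rewrite` function is hypothetical.

```python
import re

# Same regular expression as in the rewrite directive above.
PATTERN = re.compile(r"^(/data/.*)/geek/(\w+)\.?.*$")


def rewrite(uri: str) -> str:
    """Build $1/linux/$2.html from the captures, or return the URI unchanged."""
    match = PATTERN.match(uri)
    if match is None:
        return uri
    return f"{match.group(1)}/linux/{match.group(2)}.html"


match = PATTERN.match("/data/distro/geek/test.php")
print(match.groups())
print(rewrite("/data/distro/geek/test.php"))
```

The first group captures /data/distro and the second captures test, which gives /data/distro/linux/test.html.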

Although Nginx rewrite rules are similar to Apache rewrite rules, there are still many differences in the way you write a rewrite rule in Nginx.

3.3.2. Creating a controller file using Nginx Rewrite

Using rewrite, you can route many incoming URLs to a single controller script that will serve those requests.

The following rewrite example illustrates this:

  • rewrite ^/linux/(.*)$ /linux.php?distro=$1 last;

In this example, when you call the URL thegeekstuff.com/linux/centos, it is rewritten using the above rule and the page is served from this rewritten URL:

  • thegeekstuff.com/linux.php?distro=centos

As you can see, any URL that starts with /linux/ is served by linux.php, and the last part of the original incoming URL is used as the value of the distro argument in the linux.php controller.

Thus, the above rewrite rule will transform the incoming URL as follows:

  • linux/centos becomes linux.php?distro=centos ;
  • linux/debian becomes linux.php?distro=debian ;
  • linux/redhat becomes linux.php?distro=redhat ;
  • etc.

As in the previous example, we use $1 in the replacement string to capture everything inside the 1st parenthesis ( ) in the reg-ex. In this case, it is the last part of the original incoming URL.

We also use the last flag here to instruct nginx to stop looking for other rewrite directives in the current block and move to the next corresponding location for further searching.

3.3.3. Using the break flag in the location context

In this example, we have placed the rewrite directive inside a location block.

The location is /data/, which also matches the beginning of $1 in the replacement string given below.

location /data/ {
    # break: stop processing rewrite directives and serve the rewritten URL from this location
    rewrite ^(/data/.*)/geek/(\w+)\.?.*$ $1/linux/$2.html break;
    # Reached only when the rewrite above did not match
    return 403;
}

Here is what would happen if you used the "last" flag instead:

  • With "last" as the flag, after the initial URL rewrite, Nginx starts a new location search for the rewritten URL;
  • The rewritten URL still begins with /data/, so Nginx ends up in the same location again. If a rewrite rule there keeps matching, Nginx repeats this cycle up to 10 times and finally returns error code 500.

3.3.4. Adding a question mark to the Nginx Rewrite replacement string

If a replacement string includes new request arguments, the arguments of the original request are appended after them.

If you don’t want this, place a question mark at the end of the replacement string.

In the following example, the replacement string has no question mark at the end, that is, no question mark after $1:

rewrite ^/linux/(.*)$ /linux.php?distro=$1 last;

Here the replacement string introduces a new argument (distro), so the arguments of the original request are appended after it.

To prevent this, add a question mark (?) at the end of the replacement string, right after $1:

rewrite ^/linux/(.*)$ /linux.php?distro=$1? last;

Here the replacement string still introduces the distro argument, but the arguments of the original request are not appended after it.
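The difference between the two directives is easiest to see with a request that already carries an argument, for example /linux/centos?page=2. The Python function below is a rough model of the behavior described in this section, written for illustration; `nginx_rewrite` and the page=2 argument are assumptions, and Nginx itself handles many more cases.

```python
import re


def nginx_rewrite(uri: str, args: str, pattern: str, replacement: str) -> str:
    """Model how the rewritten URI and the original request arguments are combined."""
    # A trailing "?" in the replacement means: do not append the original arguments.
    drop_original_args = replacement.endswith("?")
    if drop_original_args:
        replacement = replacement[:-1]
    # Nginx writes captures as $1, Python's re module as \1.
    python_replacement = re.sub(r"\$(\d)", r"\\\1", replacement)
    new_uri, count = re.subn(pattern, python_replacement, uri)
    if count == 0:
        # The rule did not match: the request is left as it was.
        return uri + ("?" + args if args else "")
    if args and not drop_original_args:
        separator = "&" if "?" in new_uri else "?"
        new_uri += separator + args
    return new_uri


rule = r"^/linux/(.*)$"
print(nginx_rewrite("/linux/centos", "page=2", rule, "/linux.php?distro=$1"))
print(nginx_rewrite("/linux/centos", "page=2", rule, "/linux.php?distro=$1?"))
```

Without the trailing question mark, page=2 is carried over after distro=centos; with it, only distro=centos remains.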

3.3.5. The "if" context and the rewrite directive

The following few examples show that rewrite can be used inside the "if" directive.

You can perform conditional rewriting based on a comparison in the "if" condition, using variables such as $scheme, $http_host, $http_user_agent, etc., as shown below:

# Send plain-HTTP requests to HTTPS with a permanent (301) redirect
if ($scheme = "http") {
    rewrite ^ https://www.thegeekstuff.com$uri permanent;
}

# Send requests for the bare domain to the www host
if ($http_host = thegeekstuff.com) {
    rewrite (.*) https://www.thegeekstuff.com$1;
}

# ~ performs a regular-expression match on the User-Agent header
if ($http_user_agent ~ MSIE) {
    rewrite ^(.*)$ /pdf/$1 break;
}

Also note that there are better ways to achieve the end result of the above examples.

The above examples are only meant to show that a rewrite directive can be placed inside the "if" statement in the nginx configuration file.

Please note that you can also set the following two directives to on or off in your nginx configuration file:

server_name_in_redirect on;
port_in_redirect off;

3.3.6. Capturing Nginx rewrite hits in the error log file

By default, when Nginx performs a successful rewrite, it does not record it in the error.log file.

When you first write complex rewrite rules, you need to make sure that Nginx performs the rewrite the way you intend.

To do this, enable the rewrite log, which writes a log entry every time Nginx performs a successful rewrite using one of the rewrite directives in the configuration file.

To do this:

  • Use the rewrite_log directive and set it to on;
  • Add the following two lines to your nginx default.conf:

error_log /var/log/nginx/error.log notice;
rewrite_log on;

The first line indicates the location of the error_log file where rewrite messages are written.

Please note that a rewrite message is of type notice, so you must add "notice" at the end of this line as shown above.

3.4. What is the real problem with URL rewriting rules?

Web application developers use URL rewriting rules to hide parameters in the URL path structure.

This makes it easier for search engines to index all pages of a website, while web browsers receive the URL in a format they understand and that is easy for users to remember.

It is important to ensure that these requests are accepted by the web application and that all URL parameters are parsed correctly.

The following sections summarize the problems that can occur when automated web vulnerability scanning software scans websites that use URL rewriting rules:

3.4.1. Parameters in URLs are not scanned

A common problem encountered by web vulnerability scanners when scanning web applications that use URL rewriting is that the scanners are unable to identify the parameters in URLs.

The scanners assume that the URL segments are directories rather than parameter names or values, and leave them unscanned.

3.4.2. Prolonged vulnerability scans

This problem can lead to prolonged scans and incorrect scan results.

For example, suppose the web vulnerability scanner scans a tool database that contains 100,000 tools. Since the scanner cannot tell that the URL contains a parameter and a value, it treats every URL as a different page and tries to crawl and scan them all.
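The following Python sketch illustrates why knowing the rewrite rule matters for the example above. It is not taken from any real scanner: the /tools/<id> URL layout, the `RULES` table and the `group_by_template` helper are assumptions made purely for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical rewrite rule known to the scanner: /tools/<id> is one page
# with an "id" parameter hidden in the path.
RULES = [(re.compile(r"^/tools/(\d+)$"), "/tools/{id}")]


def group_by_template(urls: List[str]) -> Dict[str, List[Optional[str]]]:
    """Group URLs by page template so each page is scanned once, not once per value."""
    groups: Dict[str, List[Optional[str]]] = defaultdict(list)
    for url in urls:
        for pattern, template in RULES:
            match = pattern.match(url)
            if match:
                groups[template].append(match.group(1))
                break
        else:
            # No rule applies: treat the URL as a page of its own.
            groups[url].append(None)
    return groups


urls = [f"/tools/{number}" for number in range(1, 100_001)]
groups = group_by_template(urls)
print(len(urls), "URLs ->", len(groups), "page template(s) to scan")
```

Without the rule, the scanner sees 100,000 distinct pages; with it, it sees one page and one parameter to test.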

If your scanner does not handle memory problems and other exceptions properly, it may also crash and leave you with no results.

3.4.3. Setting up URL rewriting rules is a difficult process

Since URL rewriting has become very popular in web applications, many commercial web vulnerability scanners let users configure the scanner so that it can identify parameters in URLs and scan them.

But even though web vulnerability scanners can be configured to scan websites that use URL rewrite rules, users may encounter several other issues, such as:

  • Configuring support for URL rewriting rules is very difficult;
  • The user must know how to write regular expressions;
  • The user must have access to the web server configuration files.

So, if you are not the developer of the web application and have no in-depth knowledge of it, configuring URL rewriting rules on the scanner is practically impossible.

And even if you know how to do it, configuring rewrite rules is a difficult and time-consuming task.

3.4.4. Web applications are not properly scanned for vulnerabilities

Assuming you manage to configure URL rewrite rules in your web vulnerability scanner, other problems remain.

There are a number of limitations to how scanners scan the web application. As a security measure, web applications do not accept HTTP requests that are already "translated".

For example, by default, .NET web applications do not accept such requests.

The problem becomes even more significant when analyzing MVC web applications, because these applications use a different approach to URL rewriting.

Once you have configured URL rewriting rules in your scanner, the scanner sends exactly this type of HTTP request, called translated requests.

Even though the web application security scanner reports that the scan was successful, most HTTP requests are denied and URL parameters are not scanned, giving you a false sense of security.

Conclusion

URL rewriting can be crucial to ensure that your address does not make a bad impression on Internet users.

It is a process that offers many benefits from an SEO and website credibility perspective.

In this article, we have clarified the concept of URL rewriting and walked through the main ways to rewrite your URLs.

We invite you to share with us your opinions and other resources on the concept of URL rewriting.
