
How to Setup NGINX to Serve Several Services through One Port: nginx.conf

22 Apr 2021, Apache License
Jupyter + HDFS + YARN + Spark and only one open port using NGINX
The article has tips for serving several web UIs through one port, describes the pitfalls of hiding a web UI in a sublocation, and gives hints and best practices for creating a maintainable NGINX configuration.

Teaser

The article covers:

  1. tips for serving several web UIs through one port
  2. the pitfalls of hiding a web UI in a sublocation
  3. hints and best practices for creating a maintainable NGINX configuration

Prologue

The task first arose when the BigData Team moved to the recently released Coursera Labs.

From the FAQ of the "Create Lab Activities with Coursera Labs" help article:

Q: Is there a mechanism to expose more than one port within a student lab container?

A: Coursera Labs supports single port, single application containers. However, it may be possible to run a NGINX reverse proxy to achieve this. Example: If there are two apps running: app1 at 9000 and app2 at 9001. You could run NGINX to route /app1 to 9000 and /app2 to 9001. In this case, the NGINX port is exposed from the container.

This sounds simple, but if one of the services is Jupyter Notebook, the NGINX configuration becomes challenging. You can find some tips for setting up Jupyter behind NGINX, but they place the web service at the root. First, it is not a good idea to place one app at the root and the others either in one-level locations ("/app1" is a one-level location) or also at the root. Second, simply adding a one-level location prefix to the suggested location is not enough to make the whole thing work. The reason is simple: most web services assume that they are located at the root and serve their content as if they were accessed through the root location. Therefore, some additional configuration is needed; it is presented below.

As a bonus, the article also discusses how to overcome wrong-port redirection: for instance, you request port 10080, but NGINX redirects you to port 80. This happens when your Docker container forwards the local port X to the container's port Y, where Y is the port NGINX listens on and X is not equal to Y.
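To see where the port gets lost, here is a small JavaScript model of the two NGINX variables involved. It is a hypothetical sketch, not NGINX code: `hostVar`, `httpHostVar` and `redirectUrl` are made-up helper names, and the model assumes a plain `name[:port]` Host header (IPv6 literals are not handled). NGINX documents $host as the host name only, while $http_host is the raw Host header, so only the latter keeps the forwarded port.

```javascript
// Hypothetical model (not NGINX source) of two NGINX variables, assuming a
// plain "name[:port]" Host header; IPv6 literals are not handled.

// $http_host: the Host request header exactly as the browser sent it.
function httpHostVar(hostHeader) {
  return hostHeader;
}

// $host: only the host name, i.e. the port is dropped.
function hostVar(hostHeader) {
  return hostHeader.split(":")[0];
}

// Absolute redirect as built by "return 301 $scheme://<host>$uri".
function redirectUrl(scheme, host, uri) {
  return `${scheme}://${host}${uri}`;
}

// The browser opened http://localhost:10080/ and Docker forwards
// host port 10080 to container port 80 (docker run -p 10080:80 ...).
const header = "localhost:10080";
console.log(redirectUrl("http", hostVar(header), "/jup/"));     // http://localhost/jup/ (port lost)
console.log(redirectUrl("http", httpHostVar(header), "/jup/")); // http://localhost:10080/jup/
```

The first URL sends the browser to port 80 on the host, which is the wrong-port redirection described above; building redirects and the proxied Host header from $http_host avoids it.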

Part 1: Basic Configuration

Notes

  1. Including mime.types lets NGINX send correct Content-Type headers, so that, for example, browsers render pages with CSS.
  2. Four additional variables (see the “New variables” section) are defined with the “map” directive so that a requested URL can be redirected to the correct web app.
  3. Repeated lines can be defined only once, in a named location. Then, using the HTTP error handling mechanism (error_page), requests from different locations can be handed over to this single named location. Judging by various internet resources, this is the most popular practice.
  4. Some guides suggest using the $host variable for the "HOST" header. However, if $host differs from what the user enters in the browser's address bar, proxying breaks. One such case is port forwarding in front of NGINX (for example, docker run -p), because $host does not include the port. Therefore, I recommend using $http_host as the value of the "HOST" request header.
  5. If clients connect over a protocol other than plain HTTP (for example, HTTPS), change the value of the forwarded protocol header accordingly.
  6. The order of location blocks matters, because NGINX matches the URI against locations in a specific order determined by their modifiers: an exact (=) match is used immediately; otherwise the longest matching prefix location is chosen and, if it has the ^~ modifier, used right away; otherwise regular-expression locations are checked in the order they appear in the file, and the longest prefix is used only if none of them matches.
  7. Services can be on, off, or temporarily off (for example, the Spark UI when no Spark context is initialized). A permanent redirect lets the browser cache one of these states (or some substates) and keep showing it after the state has actually changed. Thus, 301 (Moved Permanently) and 308 (Permanent Redirect) are not used in the return directive of the "Location not found" handler. 302 (Found, formerly "Moved Temporarily") is not suitable either, because some browsers change POST requests to GET when following it, whereas 307 (Temporary Redirect) preserves the HTTP method.
  8. An example debug header (commented out) is also provided, because it can be quite helpful when debugging an NGINX configuration.
user              USER GROUP;
worker_processes  1;

error_log         /var/log/nginx/error.log;
pid               /var/run/nginx.pid;

events {
}

http {
    # HTTP handler basic configuration
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    root /usr/share/nginx/html;
    
    # New variables
    map $http_referer $valid_http_referer {
        ~(/jup|/hdfs|/yarn|/spark) $http_referer;
    }
    map $valid_http_referer $app_link {
        # TODO: set default service location
        "" $scheme://$http_host/default_service;
        ~^([^/]+//[^/]+)(/[^/]+)?/?  $1$2;
    }
    map $uri $no_app_uri {
        ~^/[^/]*(/.*)  $1;
        default  $uri;
    }
    map $uri $requested_app {
        ~^/(([^/]+)/)?  $scheme://$http_host/$2;
    }
    
    # Upstreams
    ...
    
    
    # Servers configuration
    server {
        # Server basic configuration
        listen 80;
        
        # Trailing slash auto-completion
        if (-d $request_filename) {
            rewrite     [^/]$  $scheme://$http_host$uri/ permanent;
        }  # Ensure that you have directories with all app names 
           # in the root selected above
        
        # Zero-level location configuration
        location = / {
            if ($http_referer ~ "^.+$") {
                return  307  $app_link?$args;
            }
            return      301  $scheme://$http_host/jup/;
        }        
        
        # Common additional request headers for webapps
        error_page 599 = @common_proxy_headers;
        location @common_proxy_headers {
            proxy_set_header      HOST                 $http_host;
            proxy_set_header      Referer              $http_referer;
            proxy_set_header      X-Forwarded-For      $remote_addr;
            proxy_set_header      X-Forwarded-Proto    http;
        }        
        
        # Services' locations
        ...        
        
        # "Location not found" case handler
        location ~ / { 
            if ($requested_app = $app_link) {
                return            404;
            }
            return                307                 $app_link$request_uri;
            #add_header           X-debug             "$app_link $requested_app"  always;
        }
    }
}
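The following JavaScript sketch mirrors the four `map` variables and the "Location not found" handler, to show how a stray root-level request is sent back to the app that referred it. It reuses the regular expressions from the configuration above, but it is only an illustration: the function names are hypothetical, the default service is assumed to be /jup (as in the full configuration), and the plain URI stands in for $request_uri (query strings are ignored).

```javascript
// Hypothetical model of the "map" variables and of "location ~ /".
// Regexes are copied from nginx.conf; this is not how NGINX runs internally.
const KNOWN_APPS = /(\/jup|\/hdfs|\/yarn|\/spark)/;

// map $http_referer $valid_http_referer (no default, so unmatched -> "")
function validHttpReferer(referer) {
  return KNOWN_APPS.test(referer) ? referer : "";
}

// map $valid_http_referer $app_link (default service assumed to be /jup)
function appLink(validReferer, scheme, httpHost) {
  if (validReferer === "") return `${scheme}://${httpHost}/jup`;
  const m = validReferer.match(/^([^\/]+\/\/[^\/]+)(\/[^\/]+)?\/?/);
  return m ? m[1] + (m[2] || "") : "";
}

// map $uri $no_app_uri: drop the first path segment if a second one follows
function noAppUri(uri) {
  const m = uri.match(/^\/[^\/]*(\/.*)/);
  return m ? m[1] : uri;
}

// map $uri $requested_app: scheme://host/<first segment>
function requestedApp(uri, scheme, httpHost) {
  const m = uri.match(/^\/(([^\/]+)\/)?/);
  return `${scheme}://${httpHost}/${(m && m[2]) || ""}`;
}

// "Location not found" handler: 404 if the request already targets the
// referring app (avoids a redirect loop), otherwise 307 to that app.
function notFoundHandler(uri, referer, scheme, httpHost) {
  const link = appLink(validHttpReferer(referer), scheme, httpHost);
  if (requestedApp(uri, scheme, httpHost) === link) return { status: 404 };
  return { status: 307, location: link + uri };
}

// A Jupyter page under /jup/ requests an asset from the root:
const r = notFoundHandler("/static/base/js/x.js", "http://localhost:10080/jup/tree", "http", "localhost:10080");
console.log(r.status, r.location); // 307 http://localhost:10080/jup/static/base/js/x.js
// A mistyped path under a referring "app" with no location of its own:
console.log(notFoundHandler("/yarn/x", "http://localhost:10080/yarn/", "http", "localhost:10080").status); // 404
```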

Part 2: Location Configuration Template for Proxying to the Service

Notes

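  1. Paths with and without a trailing / are often handled quite differently by NGINX, so pay attention to where / is placed. For example, with proxy_pass http://some_service_address/; a request for /some_service/page reaches the service as /page, because the part of the URI matching the location is replaced by the URI given in proxy_pass.
  2. ^~ prevents NGINX from checking regular-expression locations when this prefix location is the longest match for the request URI. The regular-expression location (location ~ /) is used to handle requests that match no other location.
  3. return 599; raises HTTP error 599, which is handled by the handler defined earlier (see @common_proxy_headers).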
location ^~ /some_service/ {
    proxy_pass            http://some_service_address/;
    return                599;
}
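Because the configuration relies on NGINX's location selection order, a simplified model of that order may help. This JavaScript sketch is hypothetical (`selectLocation` is a made-up name) and ignores named and nested locations; it follows the documented rules: an exact `=` match wins, otherwise the longest matching prefix is remembered, a `^~` prefix skips the regular expressions, and otherwise the first matching regular expression in file order wins.

```javascript
// Simplified model of NGINX location selection (hypothetical helper).
// Each location is { modifier: "=" | "^~" | "" | "~" | "~*", path: string }.
function selectLocation(locations, uri) {
  let longest = null;
  // Pass 1: exact and prefix locations.
  for (const loc of locations) {
    if (loc.modifier === "=" && loc.path === uri) return loc; // exact match wins
    if ((loc.modifier === "" || loc.modifier === "^~") && uri.startsWith(loc.path)) {
      if (longest === null || loc.path.length > longest.path.length) longest = loc;
    }
  }
  // A "^~" longest prefix stops the search before regexes are tried.
  if (longest !== null && longest.modifier === "^~") return longest;
  // Pass 2: regex locations in the order they appear in the file.
  for (const loc of locations) {
    if (loc.modifier === "~" && new RegExp(loc.path).test(uri)) return loc;
    if (loc.modifier === "~*" && new RegExp(loc.path, "i").test(uri)) return loc;
  }
  // No regex matched: fall back to the longest prefix (may be null).
  return longest;
}

// Locations from Part 1 plus the template above.
const locations = [
  { modifier: "=", path: "/" },
  { modifier: "^~", path: "/some_service/" },
  { modifier: "~", path: "/" },
];
console.log(selectLocation(locations, "/").modifier);                  // =
console.log(selectLocation(locations, "/some_service/page").modifier); // ^~
console.log(selectLocation(locations, "/other/page").modifier);        // ~
```

If the template used a plain prefix (no ^~), the catch-all location ~ / would win for /some_service/page, because regular expressions are checked after a plain prefix match.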

If the service is located at a one-level location, modify the location block as follows:

location ^~ /some_service/ {
    rewrite               ^/some_service(.*)$       $1$2 break;
    proxy_pass            http://some_service_address/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  $scheme://$http_host/some_service/$2;
}

rewrite removes the location prefix from the URI. break is necessary here because it prevents the rewritten URI from being matched against the locations again. proxy_redirect puts the removed prefix back into the Location (and Refresh) headers of the service's redirects, so that the redirected addresses still point to this location.
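The round trip of the one-level template can be sketched in JavaScript: the request URI loses its prefix on the way to the service, and redirects coming back get it restored. The helper names, the upstream address 127.0.0.1:9000 and the public host localhost:10080 are hypothetical; the regular expression is the one from the proxy_redirect line above.

```javascript
// rewrite ^/some_service(.*)$ $1 break;  -> URI as seen by the upstream
function stripPrefix(uri, prefix) {
  return uri.startsWith(prefix) ? uri.slice(prefix.length) : uri;
}

// proxy_redirect ~^([^/]*://[^/]*)?/(.*)$ $scheme://$http_host/some_service/$2;
// -> rewrites the Location header of an upstream redirect
function fixLocation(location, scheme, httpHost, prefix) {
  const m = location.match(/^([^\/]*:\/\/[^\/]*)?\/(.*)$/);
  return m ? `${scheme}://${httpHost}${prefix}/${m[2]}` : location;
}

console.log(stripPrefix("/some_service/login", "/some_service")); // /login
console.log(fixLocation("http://127.0.0.1:9000/dashboard", "http", "localhost:10080", "/some_service"));
// http://localhost:10080/some_service/dashboard
console.log(fixLocation("/dashboard", "http", "localhost:10080", "/some_service"));
// http://localhost:10080/some_service/dashboard
```

Both absolute and root-relative redirects from the service end up under /some_service/, so the browser keeps talking to NGINX instead of the service's internal address.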

Part 3: Jupyter Notebook Proxying Configuration

Notes

  1. Two more applications of the "do not repeat yourself" mechanisms (custom error handlers with named locations) are used here.
  2. The "static" subsection is optional and can even cause harm because of changes in newer Jupyter Notebook versions. However, it can boost performance, because with this section static files are served by NGINX rather than by the proxied service. This section is omitted from the final configuration file.
  3. Jupyter kernels and terminals communicate over WebSockets, so connections to them must be "upgraded".

First, put the following block into the "Upstreams" section:

upstream notebook {
    server localhost:8888;
}

Then, put these blocks into the "Services' locations" section:

location ^~ /jup/ {
    rewrite               ^/jup(.*)$ $1$2           break;
    proxy_pass            http://notebook/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  $scheme://$http_host/jup/$2;
}

# "Static" subsection start
error_page 598 = @jup_static_like;
location ^~ /jup/static {
    return                598;
}
location ^~ /jup/custom {
    return                598;
}
location ^~ /jup/nbextensions/widgets/notebook/js/ {
    root        /opt/conda/share/jupyter/nbextensions/jupyter-jupyter-js-widgets;
    return      598;
}
location @jup_static_like {
    try_files             $no_app_uri $no_app_uri/  =404;
    #add_header           X-debug                   "$no_app_uri"  always;
}
# "Static" subsection end

error_page 597 = @jup_upgrade_to_websocket;
location @jup_upgrade_to_websocket {
    proxy_pass            http://notebook;
    proxy_set_header      HOST $http_host;
    # websocket support
    proxy_http_version    1.1;
    proxy_set_header      Upgrade "websocket";
    proxy_set_header      Connection "Upgrade";
    proxy_read_timeout    86400;
}
location ^~ /api/kernels {
    return                597;
}
location ^~ /terminals {
    return                597;
}
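For the optional "static" subsection, it helps to see which files NGINX actually looks for. The sketch below is a hypothetical model of try_files $no_app_uri $no_app_uri/ =404 in @jup_static_like, assuming the root /usr/share/nginx/html from the http block applies to that named location; it only computes candidate paths and does not touch the filesystem.

```javascript
const ROOT = "/usr/share/nginx/html"; // assumed: root from the http block

// map $uri $no_app_uri: drop the first path segment ("/jup").
function noAppUri(uri) {
  const m = uri.match(/^\/[^\/]*(\/.*)/);
  return m ? m[1] : uri;
}

// Candidates checked in order by try_files; if none exists, NGINX returns 404.
function staticCandidates(uri) {
  const path = noAppUri(uri);
  return [ROOT + path, ROOT + path + "/"]; // file first, then directory
}

console.log(staticCandidates("/jup/static/base/js/main.js").join("\n"));
// /usr/share/nginx/html/static/base/js/main.js
// /usr/share/nginx/html/static/base/js/main.js/
```

In other words, Jupyter's static files would presumably have to exist under the NGINX root with the same layout, which is one way this subsection can break when Jupyter Notebook changes.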

Part 4: HDFS+YARN+Spark Proxying Configuration

Notes

  1. Proxying HDFS and Spark is easy: it is just the template applied to each UI. For the Spark UI, an additional conditional redirection to the YARN UI was added. It is necessary because the Spark UI automatically redirects to the YARN UI when the Spark context master is set to 'yarn'. Also, when no Spark context is running, there is no running web application. In that case, NGINX returns 502, which is processed by the added custom error handler: it changes 502 to 200 and returns an explanatory message.
  2. The HDFS DataNode page and the YARN UI pages build their links with JavaScript, so the link replacer has to be injected after the rest of the page's content as yet another JS script (and it waits 2 seconds before running).

The content of hdfs_ui_fixer.js:

JavaScript
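setTimeout(function () {
    // Rewrite "IP:50075" DataNode addresses in the datanode table
    // so that they point to the proxied /hdfs_node location
    $('#table-datanodes>tbody>tr').each(function (index, element) {
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:50075/g,
            '$1/hdfs_node'
        );
    });
    console.log('server addresses in hdfs ui are fixed');
}, 2000);  // give the page's own scripts time to build the table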

The content of yarn_ui_fixer.js:

JavaScript
setTimeout(function () {
    // Rewrite "IP:port" addresses in table cells to the proxied locations:
    // 8088 -> /yarn_ui, 8042 -> /yarn_node, 19888 -> /yarn_jobhistory
    $('tbody>tr>td').each(function (index, element) {
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:8088/g,
            '$1/yarn_ui'
        );
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:8042/g,
            '$1/yarn_node'
        );
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:19888/g,
            '$1/yarn_jobhistory'
        );
    });
    console.log('server addresses in yarn ui are fixed');
}, 2000);  // give the page's own scripts time to build the tables
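The replacement used by both fixer scripts can be exercised without a browser. This hypothetical `fixAddresses` helper applies the same regular expression to an HTML string (no jQuery or DOM involved); the sample IP address and markup are made up.

```javascript
// DOM-free version of the replace() calls in hdfs_ui_fixer.js / yarn_ui_fixer.js.
// "IP:port" addresses found after ">" (link text) or "href=" (link target),
// optionally prefixed with "http:" and/or "//", become the proxied location.
function fixAddresses(html, port, location) {
  const pattern = new RegExp(
    "((>|href=)[^<]*?)(http:)?(\\/\\/)?\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}:" + port,
    "g"
  );
  return html.replace(pattern, "$1" + location);
}

const cell = '<a href="http://10.0.0.5:8042/node">10.0.0.5:8042</a>';
console.log(fixAddresses(cell, 8042, "/yarn_node"));
// <a href="/yarn_node/node">/yarn_node</a>
```

Note that every address with the given port collapses into the same location, which fits a single-node setup where there is only one DataNode or NodeManager to reach.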
  3. The HDFS node page contains links to other web services of the system that use the internal hostname. If you replace REMOTE_HOSTNAME below with the actual internal hostname, the returned UI pages will contain links to the locations defined in nginx.conf instead.
location = /hdfs_ui/hdfs_ui_fixer.js {}
location ^~ /hdfs_ui {
    rewrite               ^/hdfs_ui(.*)$ $1$2       break;
    proxy_pass            http://localhost:50070/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/hdfs_ui/$2;

    sub_filter            '</body>'    '<script type="text/javascript"
                          src="/hdfs_ui/hdfs_ui_fixer.js"></script></body>';
}
location ^~ /hdfs_node {
    rewrite               ^/hdfs_node(.*)$ $1$2     break;
    proxy_pass            http://localhost:50075/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/hdfs_node/$2;

    sub_filter            'http://REMOTE_HOSTNAME:50075'    '/hdfs_node';

}
location = /yarn_ui/yarn_ui_fixer.js {}
location ^~ /yarn_ui {
    rewrite               ^/yarn_ui(.*)$ $1$2       break;
    proxy_pass            http://localhost:8088/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/yarn_ui/$2;

    sub_filter '</html>' '<script type="text/javascript"
    src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
}
location ^~ /yarn_node {
    rewrite               ^/yarn_node(.*)$ $1$2     break;
    proxy_pass            http://localhost:8042/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/yarn_node/$2;

    sub_filter '</html>' '<script type="text/javascript"
    src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
}
location ^~ /yarn_jobhistory {
    rewrite               ^/yarn_jobhistory(.*)$ $1$2  break;
    proxy_pass            http://localhost:19888/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/yarn_jobhistory/$2;

    sub_filter '</html>' '<script type="text/javascript"
    src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
}
location ^~ /spark_ui {
    error_page            404 502 =                       @spark_ui_error_page;
    rewrite               ^/spark_ui(.*)$ $1$2            break;
    proxy_pass            http://localhost:4040/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/proxy/(.*)$
                          $scheme://$http_host/yarn_ui/proxy/$2;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/spark_ui/$2;
}
location @spark_ui_error_page {
    default_type          text/plain;
    if (-d $request_filename) {
        return            200 "Please launch a Spark context first";
    }
    return                200 "Please launch a Spark context first or check the URL";
}
location ^~ /spark_jobhistory {
    rewrite               ^/spark_jobhistory(.*)$ $1$2  break;
    proxy_pass            http://localhost:18080/;
    return                599;
    proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$
                          $scheme://$http_host/spark_jobhistory/$2;
}
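The two proxy_redirect rules of the /spark_ui location are tried in order, and NGINX applies the first one that matches the Location header. This hypothetical JavaScript sketch reproduces that order with the same regular expressions; the helper name, the YARN host "master:8088" and the application ID are made-up examples.

```javascript
// Hypothetical model of the two proxy_redirect rules in "location ^~ /spark_ui".
const SPARK_RULES = [
  // Spark UI redirecting to the YARN web proxy (master = yarn)
  { regex: /^([^\/]*:\/\/[^\/]*)?\/proxy\/(.*)$/, prefix: "/yarn_ui/proxy/" },
  // any other redirect stays inside /spark_ui
  { regex: /^([^\/]*:\/\/[^\/]*)?\/(.*)$/, prefix: "/spark_ui/" },
];

function rewriteSparkLocation(location, scheme, httpHost) {
  for (const rule of SPARK_RULES) {
    const m = location.match(rule.regex);
    if (m) return `${scheme}://${httpHost}${rule.prefix}${m[2]}`; // first match wins
  }
  return location; // no rule matched: header is passed through unchanged
}

console.log(rewriteSparkLocation("http://master:8088/proxy/application_1_0001/", "http", "localhost:10080"));
// http://localhost:10080/yarn_ui/proxy/application_1_0001/
console.log(rewriteSparkLocation("/jobs/", "http", "localhost:10080"));
// http://localhost:10080/spark_ui/jobs/
```

The more specific /proxy/ rule has to come first: the generic rule also matches /proxy/... and would otherwise keep the browser inside /spark_ui.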

Part 5: Full nginx.conf

user              USER GROUP;
worker_processes  1;

error_log         /var/log/nginx/error.log;
pid               /var/run/nginx.pid;

events {
}

http {
    # HTTP handler basic configuration
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    root /usr/share/nginx/html;

    # New variables
    map $http_referer $valid_http_referer {
        ~(/jup|/hdfs|/yarn|/spark) $http_referer;
    }
    map $valid_http_referer $app_link {
        "" $scheme://$http_host/jup;
        ~^([^/]+//[^/]+)(/[^/]+)?/?  $1$2;
    }
    map $uri $no_app_uri {
        ~^/[^/]*(/.*)  $1;
        default  $uri;
    }
    map $uri $requested_app {
        ~^/(([^/]+)/)?  $scheme://$http_host/$2;
    }

    # Upstreams
    upstream notebook {
        server localhost:8888;
    }

    # Servers configuration
    server {
        # Server basic configuration
        listen 80;
        
        # Trailing slash auto-completion
        if (-d $request_filename) {
            rewrite     [^/]$  $scheme://$http_host$uri/ permanent;
        }
        # Zero-level location configuration
        location = / {
            if ($http_referer ~ "^.+$") {
                return  307  $app_link?$args;
            }
            return      301  $scheme://$http_host/jup/;
        }
        
        # Common additional request headers for webapps
        error_page 599 = @common_proxy_headers;
        location @common_proxy_headers {
            proxy_set_header      HOST                      $http_host;
            proxy_set_header      Referer                   $http_referer;
            proxy_set_header      X-Forwarded-For           $remote_addr;
            proxy_set_header      X-Forwarded-Proto         http;
        }

        # Services' locations
        location ^~ /jup/ {
            rewrite               ^/jup(.*)$ $1$2           break;
            proxy_pass            http://notebook/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  $scheme://$http_host/jup/$2;
        }
        
        error_page 597 = @jup_upgrade_to_websocket;
        location @jup_upgrade_to_websocket {
            proxy_pass            http://notebook;
            proxy_set_header      HOST $http_host;
            # websocket support
            proxy_http_version    1.1;
            proxy_set_header      Upgrade "websocket";
            proxy_set_header      Connection "Upgrade";
            proxy_read_timeout    86400;
        }
        location ^~ /api/kernels {
            return                597;
        }
        location ^~ /terminals {
            return                597;
        }

        location = /hdfs_ui/hdfs_ui_fixer.js {}
        location ^~ /hdfs_ui {
            rewrite               ^/hdfs_ui(.*)$ $1$2       break;
            proxy_pass            http://localhost:50070/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  
                                  $scheme://$http_host/hdfs_ui/$2;

            sub_filter            '</body>'    '<script type="text/javascript" 
                                  src="/hdfs_ui/hdfs_ui_fixer.js"></script></body>';
        }
        location ^~ /hdfs_node {
            rewrite               ^/hdfs_node(.*)$ $1$2     break;
            proxy_pass            http://localhost:50075/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  
                                  $scheme://$http_host/hdfs_node/$2;

            sub_filter            'http://REMOTE_HOSTNAME:50075'    '/hdfs_node';
        }
        location = /yarn_ui/yarn_ui_fixer.js {}
        location ^~ /yarn_ui {
            rewrite               ^/yarn_ui(.*)$ $1$2       break;
            proxy_pass            http://localhost:8088/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  
                                  $scheme://$http_host/yarn_ui/$2;

            sub_filter '</html>' '<script type="text/javascript" 
                       src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
        }
        location ^~ /yarn_node {
            rewrite               ^/yarn_node(.*)$ $1$2     break;
            proxy_pass            http://localhost:8042/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  
                                  $scheme://$http_host/yarn_node/$2;
            
            sub_filter '</html>' '<script type="text/javascript" 
            src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
        }
        location ^~ /yarn_jobhistory {
            rewrite               ^/yarn_jobhistory(.*)$ $1$2  break;
            proxy_pass            http://localhost:19888/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$  
                                  $scheme://$http_host/yarn_jobhistory/$2;
            
            sub_filter '</html>' '<script type="text/javascript" 
            src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
        }
        location ^~ /spark_ui {
            error_page            404 502 =                       @spark_ui_error_page;
            rewrite               ^/spark_ui(.*)$ $1$2            break;
            proxy_pass            http://localhost:4040/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/proxy/(.*)$  
                                  $scheme://$http_host/yarn_ui/proxy/$2;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$        
                                  $scheme://$http_host/spark_ui/$2;
        }
        location @spark_ui_error_page {
            default_type          text/plain;
            if (-d $request_filename) {
                return            200 "Please launch a Spark context first";
            }
            return                200 "Please launch a Spark context first or check the URL";
        }
        location ^~ /spark_jobhistory {
            rewrite               ^/spark_jobhistory(.*)$ $1$2  break;
            proxy_pass            http://localhost:18080/;
            return                599;
            proxy_redirect        ~^([^/]*://[^/]*)?/(.*)$   
                                  $scheme://$http_host/spark_jobhistory/$2;
        }        
        
        # "Location not found" case handler
        location ~ / { 
            if ($requested_app = $app_link) {
                return            404;
            }
            return                307           $app_link$request_uri;
            #add_header           X-debug       "$app_link $requested_app"  always;
        }
    }
}

Epilogue

In this article, we walked through an nginx.conf that configures NGINX to proxy Jupyter and the HDFS, YARN, and Spark UIs. Building NGINX from source can be necessary, because some modules are not included by default; consider, for example, the sub_filter module, which is present only in the most complete default builds. Moreover, building from source helps to get rid of unnecessary modules. Some configuration scripts are also presented; they help to adjust the environment to nginx.conf and vice versa. Finally, as a bonus, the YARN Web UI hostname issue is considered.

The configuration file listed here is used in the “bigdatateam/hysh” Docker image.

Please leave your thoughts on improving the configuration file, and suggestions on how to add proxying to SSH, in the comments section.

Thanks Alexey Dral for editing!

Author: Nikolay Veld from BigDataTeam

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0