Teaser
The article covers:
- tips for serving several web UIs through one port
- the pitfalls of hiding a web UI in a sublocation
- hints and best practices for creating a maintainable NGINX configuration
Prologue
The task arose when BigData Team moved to the recently released Coursera Labs.
From the FAQ of the "Create Lab Activities with Coursera Labs" help article:
Q: Is there a mechanism to expose more than one port within a student lab container?
A: Coursera Labs supports single port, single application containers. However, it may be possible to run a NGINX reverse proxy to achieve this. Example: If there are two apps running: app1 at 9000 and app2 at 9001. You could run NGINX to route /app1 to 9000 and /app2 to 9001. In this case, the NGINX port is exposed from the container.
This sounds simple, but if Jupyter Notebook is one of the services, the NGINX configuration becomes challenging. You can find some tips for Jupyter setup, but they assume that the web service is placed at the root. First, it is not a good idea to place one app at the root level and the others either at a one-level location ("/app1" is a one-level location) or at the root as well. Second, merely adding a one-level location prefix to the suggested location is not enough to make the whole thing work. The reason is simple: most web services expect to be located at the root and generate their links and redirects as if they were reached through the root location. Therefore, some additional configuration is needed, and it is presented below.
As a bonus, the article also discusses how to overcome redirection to the wrong port. For instance, you request port 10080, but NGINX redirects to port 80. This happens when a Docker container forwards local port X to container port Y, where X is not equal to Y and Y is the port that NGINX listens on.
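A minimal sketch of that situation, assuming the container was started with something like docker run -p 10080:80 (the /old and /new/ paths are hypothetical and exist only for illustration). Building the redirect target from $http_host, which holds the Host header exactly as the browser sent it, keeps the forwarded port:

```nginx
server {
    listen 80;    # port Y inside the container; the browser talks to port X = 10080

    # Hypothetical location, for illustration only.
    location = /old {
        # $http_host is the request's Host header verbatim, e.g. "localhost:10080",
        # so the Location header points back to the port the browser actually used.
        return 307 $scheme://$http_host/new/;
    }
}
```

By contrast, redirects that NGINX generates on its own include the port it listens on by default (see the port_in_redirect directive), which is how a request to port 10080 can end up on port 80.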
Part 1: Basic Configuration
Notes
- Including the MIME types file (mime.types), for example, allows browsers to render pages with CSS.
- Four additional variables (see the "New variables" section) are set with the "map" directive so that a requested URL can be redirected to the correct web app.
- Repeated lines can be defined only once, in a named location. Then, using the HTTP error handling mechanism, requests from different locations can be processed by this single named location. Judging by what I have seen in various internet resources, this is the most popular practice.
- Some guides suggest using the $host variable for the "HOST" header. $host does not include the port, so if it differs from what the user enters in the browser's address bar, proxying breaks. One such case is port forwarding in front of NGINX (see docker run -p, for example). Therefore, I recommend using $http_host as the value of the "HOST" request header.
- If the proxied request is not plain HTTP, change X_Forwarded_Proto accordingly.
- The order of the location blocks is crucial, because NGINX matches the URI against the locations in a specific order that depends on the locations and their modifiers.
- Services can be on, off, or temporarily off (for example, the Spark UI when no context is initialized). A permanent redirect lets the browser cache one of these states (or some substate) and show it even after the state has actually changed. Thus, 301 (Moved Permanently) and 308 (Permanent Redirect) are not used in the return directive inside the "Location not found" case handler. 302 (Found, previously "Moved Temporarily") is not suitable either, because some browsers change POST requests to GET when following it, while 307 (Temporary Redirect) preserves the HTTP method.
- An example of a debug header is also provided, because it can be quite helpful when debugging an NGINX configuration.
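The note on location order can be illustrated with a small standalone sketch. It is not part of this article's configuration; the response bodies and the \.js$ pattern are made up for the example. It relies only on the documented matching rules: exact matches first, then the longest prefix (which ends the search if it carries ^~), then regex locations in file order.

```nginx
server {
    listen 80;
    default_type text/plain;

    # 1. Exact match: wins for the URI "/" only.
    location = / {
        return 200 "exact\n";
    }

    # 2. Longest matching prefix marked with ^~: regex locations are not checked at all,
    #    so /jup/app.js is answered here, not by the \.js$ regex below.
    location ^~ /jup/ {
        return 200 "prefix, regexes skipped\n";
    }

    # 3. Regex locations are tried in the order they appear in the file.
    #    /x/app.js is answered by the first one, /x/page by the second.
    location ~ \.js$ {
        return 200 "first regex\n";
    }
    location ~ / {
        return 200 "catch-all regex\n";
    }
}
```

Swapping the two regex locations would send every request that is not caught by the exact or ^~ locations to the catch-all, which is why it is placed last in the configuration.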
user USER GROUP;
worker_processes 1;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;
events {
}
http {
    # HTTP handler basic configuration
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    root /usr/share/nginx/html;
    # New variables
    map $http_referer $valid_http_referer {
        ~(/jup|/hdfs|/yarn|/spark) $http_referer;
    }
    map $valid_http_referer $app_link {
        # TODO: set default service location
        "" $scheme://$http_host/default_service;
        ~^([^/]+//[^/]+)(/[^/]+)?/? $1$2;
    }
    map $uri $no_app_uri {
        ~^/[^/]*(/.*) $1;
        default $uri;
    }
    map $uri $requested_app {
        ~^/(([^/]+)/)? $scheme://$http_host/$2;
    }
    # Upstreams
    ...
    # Servers configuration
    server {
        # Server basic configuration
        listen 80;
        # Trailing slash auto-completion
        if (-d $request_filename) {
            rewrite [^/]$ $scheme://$http_host$uri/ permanent;
        }
        # Ensure that you have directories with all app names
        # in the root selected above
        # Zero-level location configuration
        location = / {
            if ($http_referer ~ "^.+$") {
                return 307 $app_link?$args;
            }
            return 301 $scheme://$http_host/jup/;
        }
        # Common additional request headers for webapps
        error_page 599 = @common_proxy_headers;
        location @common_proxy_headers {
            proxy_set_header HOST $http_host;
            proxy_set_header Referer $http_referer;
            proxy_set_header X_Forwarded_For $remote_addr;
            proxy_set_header X_Forwarded_Proto http;
        }
        # Services' locations
        ...
        # "Location not found" case handler
        location ~ / {
            if ($requested_app = $app_link) {
                return 404;
            }
            return 307 $app_link$request_uri;
            #add_header X-debug "$app_link $requested_app" always;
        }
    }
}
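To make the map variables concrete, here is a hand-traced example for one hypothetical request. The host localhost:10080, the path, and the Referer are assumptions chosen for illustration; the values follow from the regular expressions in the "New variables" section.

```nginx
# Hypothetical request, e.g. an absolute link emitted by a proxied app:
#   GET /api/contents
#   Host: localhost:10080
#   Referer: http://localhost:10080/jup/tree
#
# $valid_http_referer = http://localhost:10080/jup/tree   (the Referer contains "/jup")
# $app_link           = http://localhost:10080/jup        (scheme, host and first path segment of the Referer)
# $no_app_uri         = /contents                         (the URI without its first segment)
# $requested_app      = http://localhost:10080/api        (first segment of the requested URI)
#
# No service location matches /api/contents, so the "Location not found" handler runs:
location ~ / {
    if ($requested_app = $app_link) {    # false here: ".../api" differs from ".../jup"
        return 404;
    }
    return 307 $app_link$request_uri;    # Location: http://localhost:10080/jup/api/contents
}
```

Uncommenting the X-debug header in the handler is a convenient way to compare such hand-traced values with what NGINX actually computes.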
Part 2: Location Configuration Template for Proxying to the Service
Notes
- Paths with and without a trailing / are (often quite) different for NGINX, so pay attention to where / is placed.
- ^~ prevents the request address from being matched against regexp locations if the address matches this location. A regexp location is used for handling unmatched locations.
- return 599; raises HTTP error 599, which is processed by the handler defined earlier (see @common_proxy_headers).
location ^~ /some_service/ {
    proxy_pass http://some_service_address/;
    return 599;
}
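The note about trailing slashes can be illustrated with proxy_pass itself. The following is a minimal sketch based on the documented proxy_pass behavior, not a part of this article's configuration; the /app1/ and /app2/ locations and ports 9000 and 9001 are borrowed from the Coursera FAQ example quoted in the prologue.

```nginx
server {
    listen 80;

    location /app1/ {
        # proxy_pass WITH a URI part (the trailing "/"): the part of the request
        # URI that matched the location is replaced by that URI.
        #   GET /app1/page  ->  upstream receives GET /page
        proxy_pass http://127.0.0.1:9000/;
    }

    location /app2/ {
        # proxy_pass WITHOUT a URI part: the request URI is passed as is.
        #   GET /app2/page  ->  upstream receives GET /app2/page
        proxy_pass http://127.0.0.1:9001;
    }
}
```

The NGINX documentation also states that when the URI is changed inside the location with rewrite and the break flag, the URI part of proxy_pass is ignored and the full rewritten URI is passed to the upstream.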
If the service is placed at a one-level location, the location should be modified in the following way:
location ^~ /some_service/ {
    # Strip the location prefix (the pattern has a single capture group)
    rewrite ^/some_service(.*)$ $1 break;
    proxy_pass http://some_service_address/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$ $scheme://$http_host/some_service/$2;
}
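rewrite removes the matched location prefix from the URI. break is necessary here because it prevents the new URI from being matched against the locations again. Note that, according to the NGINX documentation, break also stops processing of the remaining ngx_http_rewrite_module directives in the location. proxy_redirect puts the removed prefix back into the redirects returned by the service, so that the objects at those addresses remain reachable through NGINX.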
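A hand-traced sketch of what the proxy_redirect line does to a redirect coming back from the service. The upstream response and the browser address localhost:10080 are hypothetical, and the rewrite and return directives are left out to keep the example focused:

```nginx
location ^~ /some_service/ {
    proxy_pass http://some_service_address/;

    # Upstream answers:   Location: http://some_service_address/login?next=%2Ftree
    #   capture 1 = "http://some_service_address"   (the optional scheme://host part)
    #   capture 2 = "login?next=%2Ftree"            (everything after the first single "/")
    # Browser receives:   Location: http://localhost:10080/some_service/login?next=%2Ftree
    #   (assuming the browser requested http://localhost:10080/some_service/...)
    #
    # A relative redirect such as "Location: /login" is handled by the same rule:
    # capture 1 is empty and capture 2 is "login".
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$ $scheme://$http_host/some_service/$2;
}
```

Without this rule the browser would follow the redirect to /login at the root, where no service is located.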
Part 3: Jupyter Notebook Proxying Configuration
Notes
- Two more uses of the "do not repeat yourself" mechanisms (custom error handlers, named locations, and so on) appear here.
- The "static" subsection is optional and can even do harm when Jupyter Notebook changes. However, it can improve performance, because with this section the static files are served by NGINX rather than by the proxied service. In the final configuration file, this section is omitted.
- Jupyter kernels and terminals use WebSockets, so connections to them must be "upgraded".
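For comparison, the NGINX documentation on WebSocket proxying suggests the general pattern sketched below. This is not the configuration used in this article, which hard-codes the Upgrade and Connection values; the /ws/ location is hypothetical, while the notebook upstream and the timeout are taken from this article.

```nginx
http {
    # "upgrade" when the client sent an Upgrade header, "close" otherwise.
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    upstream notebook {
        server localhost:8888;
    }

    server {
        listen 80;

        # Hypothetical location, for illustration only.
        location /ws/ {
            proxy_pass http://notebook;
            proxy_http_version 1.1;                           # the upgrade needs HTTP/1.1 to the upstream
            proxy_set_header Upgrade $http_upgrade;           # hop-by-hop headers are not forwarded by default
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 86400;                         # keep idle sockets open for a day
        }
    }
}
```

With the map in place, ordinary HTTP requests that reach the same location are proxied without an upgrade, so one location can serve both kinds of traffic.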
First, put the following block into the "Upstreams" section:
upstream notebook {
    server localhost:8888;
}
Then, put this block into the "Services' locations" section:
location ^~ /jup/ {
    # Strip the location prefix (the pattern has a single capture group)
    rewrite ^/jup(.*)$ $1 break;
    proxy_pass http://notebook/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$ $scheme://$http_host/jup/$2;
}
# "Static" subsection start
error_page 598 = @jup_static_like;
location ^~ /jup/static {
    return 598;
}
location ^~ /jup/custom {
    return 598;
}
location ^~ /jup/nbextensions/widgets/notebook/js/ {
    root /opt/conda/share/jupyter/nbextensions/jupyter-jupyter-js-widgets;
    return 598;
}
location @jup_static_like {
    try_files $no_app_uri $no_app_uri/ =404;
    #add_header X-debug "$no_app_uri" always;
}
# "Static" subsection end
error_page 597 = @jup_upgrade_to_websocket;
location @jup_upgrade_to_websocket {
    proxy_pass http://notebook;
    proxy_set_header HOST $http_host;
    # websocket support
    proxy_http_version 1.1;
    proxy_set_header Upgrade "websocket";
    proxy_set_header Connection "Upgrade";
    proxy_read_timeout 86400;
}
location ^~ /api/kernels {
    return 597;
}
location ^~ /terminals {
    return 597;
}
Part 4: HDFS+YARN+Spark Proxying Configuration
Notes
- HDFS and Spark proxying is easy: it is just the template applied. For the Spark UI, an additional conditional redirection to the YARN UI is added. It is necessary because the Spark UI automatically redirects to the YARN UI when the Spark context master is set to 'yarn'. Also, when no Spark context is running, there is no running web application. In that case NGINX returns 502, which is processed by the added custom error handler. I changed 502 to 200 with an explanatory message.
- The HDFS Datanode page and the YARN UI pages build their links in JavaScript after the page is loaded, so the link replacer has to be placed after the rest of the page content as yet another JS script.
The content of hdfs_ui_fixer.js:
setTimeout(function() {
    $('#table-datanodes>tbody>tr').each(function(index, element) {
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:50075/g,
            '$1/hdfs_node'
        );
    });
    console.log('server addresses in hdfs ui are fixed');
}, 2000);
The content of yarn_ui_fixer.js:
setTimeout(function() {
    $('tbody>tr>td').each(function(index, element) {
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:8088/g,
            '$1/yarn_ui'
        );
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:8042/g,
            '$1/yarn_node'
        );
        element.innerHTML = element.innerHTML.replace(
            /((>|href=)[^<]*?)(http:)?(\/\/)?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:19888/g,
            '$1/yarn_jobhistory'
        );
    });
    console.log('server addresses in yarn ui are fixed');
}, 2000);
- The HDFS node page contains links to another web service of the system that use the internal hostname. If REMOTE_HOSTNAME is replaced with the actual internal hostname, the returned UI pages will link to the locations defined in nginx.conf instead.
location = /hdfs_ui/hdfs_ui_fixer.js {}
location ^~ /hdfs_ui {
    rewrite ^/hdfs_ui(.*)$ $1 break;
    proxy_pass http://localhost:50070/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/hdfs_ui/$2;
    sub_filter '</body>' '<script type="text/javascript" src="/hdfs_ui/hdfs_ui_fixer.js"></script></body>';
}
location ^~ /hdfs_node {
    rewrite ^/hdfs_node(.*)$ $1 break;
    proxy_pass http://localhost:50075/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/hdfs_node/$2;
    sub_filter 'http://REMOTE_HOSTNAME:50075' '/hdfs_node';
}
location = /yarn_ui/yarn_ui_fixer.js {}
location ^~ /yarn_ui {
    rewrite ^/yarn_ui(.*)$ $1 break;
    proxy_pass http://localhost:8088/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/yarn_ui/$2;
    sub_filter '</html>' '<script type="text/javascript" src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
}
location ^~ /yarn_node {
    rewrite ^/yarn_node(.*)$ $1 break;
    proxy_pass http://localhost:8042/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/yarn_node/$2;
    sub_filter '</html>' '<script type="text/javascript" src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
}
location ^~ /yarn_jobhistory {
    rewrite ^/yarn_jobhistory(.*)$ $1 break;
    proxy_pass http://localhost:19888/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/yarn_jobhistory/$2;
    sub_filter '</html>' '<script type="text/javascript" src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
}
location ^~ /spark_ui {
    error_page 404 502 = @spark_ui_error_page;
    rewrite ^/spark_ui(.*)$ $1 break;
    proxy_pass http://localhost:4040/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/proxy/(.*)$
        $scheme://$http_host/yarn_ui/proxy/$2;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/spark_ui/$2;
}
location @spark_ui_error_page {
    default_type text/plain;
    if (-d $request_filename) {
        return 200 "Please, launch a Spark context firstly";
    }
    return 200 "Please, launch a Spark context firstly or check url correctness";
}
location ^~ /spark_jobhistory {
    rewrite ^/spark_jobhistory(.*)$ $1 break;
    proxy_pass http://localhost:18080/;
    return 599;
    proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
        $scheme://$http_host/spark_jobhistory/$2;
}
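The script injection above depends on sub_filter being able to see the response body. A minimal sketch of the two prerequisites that are easy to miss, stated here as general NGINX facts rather than as part of this article's configuration: the ngx_http_sub_module has to be compiled in (the --with-http_sub_module build option), and the upstream must not send a compressed body. The location below is a simplified, hypothetical variant of the HDFS UI block.

```nginx
location ^~ /hdfs_ui/ {
    proxy_pass http://localhost:50070/;

    # An empty value means the header is not passed to the upstream at all,
    # so the upstream answers uncompressed and sub_filter can edit the body.
    proxy_set_header Accept-Encoding "";

    # Inject the fixer script right before the closing body tag.
    # By default only text/html responses are filtered.
    sub_filter '</body>' '<script type="text/javascript" src="/hdfs_ui/hdfs_ui_fixer.js"></script></body>';
    sub_filter_once on;    # the default: replace only the first occurrence
}
```

Whether the installed binary includes the module can be checked in the configure arguments printed by nginx -V.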
Part 5: Full nginx.conf
user USER GROUP;
worker_processes 1;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;
events {
}
http {
    # HTTP handler basic configuration
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    root /usr/share/nginx/html;
    # New variables
    map $http_referer $valid_http_referer {
        ~(/jup|/hdfs|/yarn|/spark) $http_referer;
    }
    map $valid_http_referer $app_link {
        "" $scheme://$http_host/jup;
        ~^([^/]+//[^/]+)(/[^/]+)?/? $1$2;
    }
    map $uri $no_app_uri {
        ~^/[^/]*(/.*) $1;
        default $uri;
    }
    map $uri $requested_app {
        ~^/(([^/]+)/)? $scheme://$http_host/$2;
    }
    # Upstreams
    upstream notebook {
        server localhost:8888;
    }
    # Servers configuration
    server {
        # Server basic configuration
        listen 80;
        # Trailing slash auto-completion
        if (-d $request_filename) {
            rewrite [^/]$ $scheme://$http_host$uri/ permanent;
        }
        # Zero-level location configuration
        location = / {
            if ($http_referer ~ "^.+$") {
                return 307 $app_link?$args;
            }
            return 301 $scheme://$http_host/jup/;
        }
        # Common additional request headers for webapps
        error_page 599 = @common_proxy_headers;
        location @common_proxy_headers {
            proxy_set_header HOST $http_host;
            proxy_set_header Referer $http_referer;
            proxy_set_header X_Forwarded_For $remote_addr;
            proxy_set_header X_Forwarded_Proto http;
        }
        # Services' locations
        location ^~ /jup/ {
            rewrite ^/jup(.*)$ $1 break;
            proxy_pass http://notebook/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$ $scheme://$http_host/jup/$2;
        }
        error_page 597 = @jup_upgrade_to_websocket;
        location @jup_upgrade_to_websocket {
            proxy_pass http://notebook;
            proxy_set_header HOST $http_host;
            # websocket support
            proxy_http_version 1.1;
            proxy_set_header Upgrade "websocket";
            proxy_set_header Connection "Upgrade";
            proxy_read_timeout 86400;
        }
        location ^~ /api/kernels {
            return 597;
        }
        location ^~ /terminals {
            return 597;
        }
        location = /hdfs_ui/hdfs_ui_fixer.js {}
        location ^~ /hdfs_ui {
            rewrite ^/hdfs_ui(.*)$ $1 break;
            proxy_pass http://localhost:50070/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/hdfs_ui/$2;
            sub_filter '</body>' '<script type="text/javascript" src="/hdfs_ui/hdfs_ui_fixer.js"></script></body>';
        }
        location ^~ /hdfs_node {
            rewrite ^/hdfs_node(.*)$ $1 break;
            proxy_pass http://localhost:50075/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/hdfs_node/$2;
            sub_filter 'http://REMOTE_HOSTNAME:50075' '/hdfs_node';
        }
        location = /yarn_ui/yarn_ui_fixer.js {}
        location ^~ /yarn_ui {
            rewrite ^/yarn_ui(.*)$ $1 break;
            proxy_pass http://localhost:8088/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/yarn_ui/$2;
            sub_filter '</html>' '<script type="text/javascript" src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
        }
        location ^~ /yarn_node {
            rewrite ^/yarn_node(.*)$ $1 break;
            proxy_pass http://localhost:8042/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/yarn_node/$2;
            sub_filter '</html>' '<script type="text/javascript" src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
        }
        location ^~ /yarn_jobhistory {
            rewrite ^/yarn_jobhistory(.*)$ $1 break;
            proxy_pass http://localhost:19888/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/yarn_jobhistory/$2;
            sub_filter '</html>' '<script type="text/javascript" src="/yarn_ui/yarn_ui_fixer.js"></script></html>';
        }
        location ^~ /spark_ui {
            error_page 404 502 = @spark_ui_error_page;
            rewrite ^/spark_ui(.*)$ $1 break;
            proxy_pass http://localhost:4040/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/proxy/(.*)$
                $scheme://$http_host/yarn_ui/proxy/$2;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/spark_ui/$2;
        }
        location @spark_ui_error_page {
            default_type text/plain;
            if (-d $request_filename) {
                return 200 "Please, launch a Spark context firstly";
            }
            return 200 "Please, launch a Spark context firstly or check url correctness";
        }
        location ^~ /spark_jobhistory {
            rewrite ^/spark_jobhistory(.*)$ $1 break;
            proxy_pass http://localhost:18080/;
            return 599;
            proxy_redirect ~^([^/]*://[^/]*)?/(.*)$
                $scheme://$http_host/spark_jobhistory/$2;
        }
        # "Location not found" case handler
        location ~ / {
            if ($requested_app = $app_link) {
                return 404;
            }
            return 307 $app_link$request_uri;
            #add_header X-debug "$app_link $requested_app" always;
        }
    }
}
Epilogue
In this article, we walked through an nginx.conf that configures NGINX for proxying Jupyter and the HDFS, YARN, and Spark UIs. Building NGINX from source can be necessary because some modules are not included by default. Consider, for example, the sub_filter module, which is available only in the most complete prebuilt packages. Moreover, building from source helps to get rid of unnecessary modules. Some configuration scripts are also presented. They help to adjust the environment to nginx.conf and vice versa. Finally, as a bonus, the YARN Web UI hostname issue is considered.
The configuration file listed here is used in the "bigdatateam/hysh" Docker image.
In the comment section, leave your thoughts on improving the configuration file and suggestions on how to include proxying to SSH.
Thanks Alexey Dral for editing!
Author: Nikolay Veld from BigDataTeam