HTTP resolution

Introduction

Starting from version 2.2.0, ShowVoc, by means of its backend SemanticTurkey, supports HTTP resolution and content negotiation. The management of these two features is based on sets of rules for activating the resolution and for rewriting resource URIs. The related configuration panel is located in the Administration dashboard page, under the "Http resolution" tab.

HTTP resolution activation rules

The main configuration panel of the HTTP resolution shows a table with a row for each dataset in ShowVoc.

Here it is possible to enable HTTP resolution for a specific dataset and to provide regular expressions that need to be matched in order to let the dataset handle URIs.

Under the Active column, a button indicates the status of the resolution: a black cross means that the resolution is disabled; when clicked, it turns into a green checkmark, meaning that the resolution is enabled. Once enabled, an input field beside the button allows you to provide a regular expression that an input IRI must match in order to be mapped to the specific dataset (a default regular expression based on the dataset baseURI is automatically suggested). Multiple regular expressions can be specified for the same dataset.

At the bottom of the table there is a checker for testing the activation rules. Given an IRI, the checker tells which dataset will handle it. The checker can also detect whether an input URI is matched by multiple regular expressions, which is useful for preventing undesired behaviours.
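
The behaviour of the checker can be illustrated with a minimal Python sketch. The dataset names, the patterns, and the choice of matching the whole IRI are assumptions made for illustration; they are not taken from the ShowVoc implementation.

```python
import re

# Hypothetical activation rules: dataset name -> list of regular expressions.
# Names and patterns are illustrative only.
ACTIVATION_RULES = {
    "LandAndWater": [r"http://www\.fao\.org/landandwater/.*"],
    "OtherDataset": [r"http://example\.org/other/.*", r"http://www\.fao\.org/.*"],
}


def matching_datasets(iri, rules=ACTIVATION_RULES):
    """Return the datasets having at least one rule that matches the whole IRI."""
    return [
        name
        for name, patterns in rules.items()
        if any(re.fullmatch(p, iri) for p in patterns)
    ]


def check(iri, rules=ACTIVATION_RULES):
    """Mimic the checker: report the handling dataset, or flag none/ambiguous."""
    matches = matching_datasets(iri, rules)
    if not matches:
        return "no dataset handles " + iri
    if len(matches) > 1:
        return "ambiguous: " + ", ".join(matches)
    return "handled by " + matches[0]
```

With these sample rules, `check("http://www.fao.org/landandwater/c_121")` reports an ambiguity, because the overly broad second pattern of `OtherDataset` also matches the IRI. This is the kind of overlap the checker helps to spot.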

By clicking on the cog button of each row, it is then possible to configure the content negotiation by providing rewriting rules.

Rewriting rules

The editor that opens allows you to configure content negotiation through the definition of Rewriting rules and Inverse rewriting rules.

The Rewriting rules describe how a request for a resource URI gets transformed into a URI pointing to the HTML page or to an RDF-serialized description of the resource (according to the content of the Accept request header).

E.g.:

http://example#resource_123

Each rewriting rule is defined by a triple:

As with the activation rules, a tester is available for checking a rule: you can insert a URI and see how it is transformed into a target URI.

The Inverse rewriting rules define the inverse conversion, namely from the HTML page URL, or the RDF page URL, to the original resource URI.

E.g.:

http://example/resource_123.html or http://example/resource_123.ttl → http://example#resource_123

Each rule is a triple:

Proxy configuration

The configurations seen so far instruct the system, namely SemanticTurkey, on how to process requested URIs. However, in order to involve SemanticTurkey in the whole HTTP resolution and content negotiation workflow, a proxy server must be properly configured.

Let's briefly recap the steps of the workflow:

  1. A client requests a resource URL (e.g. http://www.fao.org/landandwater/c_121).
  2. According to the Accept request header, the system responds with a redirect (303) that transforms the resource URL into a target URL. This URL can be:
    1. a link for requesting the RDF description (e.g. http://www.fao.org/landandwater/c_121.rdf), when the Accept header contains the MIME type of an RDF format;
    2. a reference to an HTML page (e.g. http://www.fao.org/landandwater/c_121.html), otherwise (e.g. Accept: text/html, application/xhtml+xml).
  3. The client requests the target URL and gets the final result, which is, depending on the previous step:
    1. the RDF serialization of the resource description;
    2. an HTML page.
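
Step 2 of the workflow above can be sketched in Python as follows. The MIME-type table and the function names are hypothetical, and the sketch assumes that the target URL is simply the resource URL followed by a format extension, as in the examples; the formats actually supported by SemanticTurkey may differ.

```python
# Hypothetical mapping from RDF MIME types to file extensions.
RDF_EXTENSIONS = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "application/n-triples": "nt",
    "application/ld+json": "jsonld",
}


def choose_extension(accept_header):
    """Pick the extension of the first RDF MIME type listed in Accept, else 'html'.

    Quality values (q=...) are ignored to keep the sketch short.
    """
    for item in accept_header.split(","):
        mime = item.split(";")[0].strip().lower()
        if mime in RDF_EXTENSIONS:
            return RDF_EXTENSIONS[mime]
    return "html"


def redirect_for(resource_url, accept_header):
    """Return the (status, Location) pair of the redirect issued in step 2."""
    return 303, resource_url + "." + choose_extension(accept_header)
```

For instance, `redirect_for("http://www.fao.org/landandwater/c_121", "application/rdf+xml")` yields a 303 towards `http://www.fao.org/landandwater/c_121.rdf`, while a browser-like Accept header yields the `.html` target.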

To support this workflow, SemanticTurkey exposes two services: HttpResolution/contentNegotiation and HttpResolution/rdfProvider.

A proxy must therefore be configured in order to involve these services in the workflow.

The first step is a proxy pass that diverts the requests for the resource to the HttpResolution/contentNegotiation service:

http://www.fao.org/landandwater/c_121 → http://ST_HOST/semanticturkey/it.uniroma2.art.semanticturkey/st-core-services/HttpResolution/contentNegotiation?resURI=http://www.fao.org/landandwater/c_121

Supposing that the request contains an RDF format MIME type in the Accept header, this service returns a redirect with a Location like http://www.fao.org/landandwater/c_121.rdf. So, the next step is to "capture" this address and proxy pass it to HttpResolution/rdfProvider:

http://www.fao.org/landandwater/c_121.rdf → http://ST_HOST/semanticturkey/it.uniroma2.art.semanticturkey/st-core-services/HttpResolution/rdfProvider?rdfResURI=http://www.fao.org/landandwater/c_121.rdf

Note: for details on SemanticTurkey services address structure, please refer to this documentation.

If, instead, the Accept header doesn't contain an RDF format MIME type, the Location returned by HttpResolution/contentNegotiation is something like http://www.fao.org/landandwater/c_121.html. In this case a further proxy redirection is needed in order to let ShowVoc handle the rendering of the c_121 resource. For this purpose we implemented a dedicated page in ShowVoc that responds at the path /data and receives the resource URI through the URL parameter resURI. So a further proxy rule should do the following:

http://www.fao.org/landandwater/c_121.html → http://SV_HOST/showvoc/#/data?resURI=http://www.fao.org/landandwater/c_121.html

In order to clarify the above steps, the next section presents actual examples of proxy configuration.

HTTP resolution in action

Now, let's see an actual use case with an example configuration of a proxy server.

We will resolve resources of the LandAndWater thesaurus, the same one used as an example so far, and we will use Nginx as the proxy server. Obviously we cannot intercept requests to the domain http://www.fao.org/landandwater/ (the base URI of Land And Water), so we simulate the use case by requesting resources at a localhost address: URIs will be something like http://localhost/c_121.

The following is a working configuration in Nginx:

            
    server {
        listen       80;
        server_name  localhost;
        
        location /showvoc {
            proxy_pass http://localhost:1979/showvoc/;
        }
        
        location /semanticturkey {
            proxy_pass http://localhost:1979/semanticturkey;
        }
        
        # *1* - fired when URL doesn't start with showvoc/ or semanticturkey/ and ends with an RDF extension (.rdf, .ttl, ...)
        location  ~ ^/(?!showvoc/|semanticturkey/)(.+?)\.(rdf|ttl|owl|jsonld|n3|nt)$ {
            set $rdfResURI $scheme://$http_host$request_uri;
            proxy_pass http://localhost:1979/semanticturkey/it.uniroma2.art.semanticturkey/st-core-services/HttpResolution/rdfProvider?rdfResURI=$rdfResURI;
        }
        
        # *2* - fired when URL doesn't start with showvoc/ or semanticturkey/ and ends with .html
        location  ~ ^/(?!showvoc/|semanticturkey/)(.+?)\.html$ {
            set $resURI $scheme://$http_host$request_uri;
            return 303 http://localhost/showvoc/#/data?resURI=$resURI;
        }
        
        # *3* - fired when URL doesn't start with showvoc/ or semanticturkey/ (and the previous two don't match)
        location  ~ ^/(?!showvoc/|semanticturkey/)(.+)$ {
            set $resURI $scheme://$http_host$request_uri;
            proxy_pass http://localhost:1979/semanticturkey/it.uniroma2.art.semanticturkey/st-core-services/HttpResolution/contentNegotiation?resURI=$resURI;
        }
    }
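
To see which of the three regular-expression locations fires for a given request path, the patterns can be tried outside Nginx. The Python sketch below copies the three patterns from the configuration above and applies them in the same order, since Nginx tests regular-expression locations in order of appearance and uses the first match, falling back to the prefix locations otherwise. The labels and the `route` helper are hypothetical names introduced for illustration.

```python
import re

# The three regex locations of the configuration, in the same order.
REGEX_LOCATIONS = [
    ("rdfProvider", r"^/(?!showvoc/|semanticturkey/)(.+?)\.(rdf|ttl|owl|jsonld|n3|nt)$"),
    ("showvoc-data-page", r"^/(?!showvoc/|semanticturkey/)(.+?)\.html$"),
    ("contentNegotiation", r"^/(?!showvoc/|semanticturkey/)(.+)$"),
]


def route(path):
    """Return the label of the first regex location matching path, or 'prefix'."""
    for label, pattern in REGEX_LOCATIONS:
        if re.search(pattern, path):
            return label
    # No regex matched: the prefix locations (/showvoc, /semanticturkey) apply.
    return "prefix"
```

For example, `route("/c_121")` gives `contentNegotiation`, `route("/c_121.nt")` gives `rdfProvider`, and `route("/showvoc/index.html")` gives `prefix`, because the negative lookahead keeps ShowVoc and SemanticTurkey paths out of the three resolution rules.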
            
        

The sole activation rule configured for the project is http://localhost/.*. This means that each request for a URL starting with http://localhost/ will be handled according to the rewriting rules configured for the LandAndWater dataset.

There is only one rewriting rule configured, which transforms URLs starting with http://localhost/ into a URL in the form ${sourceURI}.${format}. So, for example, the URL http://localhost/c_121 will be rewritten, according to the Accept header, as follows:

and so on...
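
The effect of this rewriting rule can be mimicked with a short Python sketch. It assumes that the rule boils down to a source pattern plus the ${sourceURI}.${format} template mentioned above; the actual fields of a rule in ShowVoc may differ, and the `rewrite` function is a hypothetical name.

```python
import re
from string import Template

# Assumed shape of the rule: a source pattern and a target template.
SOURCE_PATTERN = r"http://localhost/.*"
TARGET_TEMPLATE = Template("${sourceURI}.${format}")


def rewrite(source_uri, fmt):
    """Return the target URI for source_uri, or None if the rule does not apply."""
    if re.fullmatch(SOURCE_PATTERN, source_uri) is None:
        return None
    return TARGET_TEMPLATE.substitute(sourceURI=source_uri, format=fmt)
```

Calling `rewrite("http://localhost/c_121", "html")` produces `http://localhost/c_121.html`, and the `nt` format produces `http://localhost/c_121.nt`.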

The inverse rewriting rule for restoring the original LandAndWater URL is the following.

The rule restores the original URI by appending the first capturing group ($1) to the base URI (http://www.fao.org/landandwater/$1). Moreover, it detects the format through the named capturing group format (anything that comes after the last ".").
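
The following Python sketch shows how such an inverse rule could work. The regular expression is a hypothetical reconstruction based only on the description above (a first capturing group for the local name and a named group called format); it is not the rule actually configured, and Python writes named groups as `(?P<name>...)`.

```python
import re

# Hypothetical reconstruction: group 1 is the local name, the named group
# "format" is whatever follows the last ".".
INVERSE_PATTERN = re.compile(r"http://localhost/(.+)\.(?P<format>[^.]+)")
BASE_URI = "http://www.fao.org/landandwater/"


def invert(target_url):
    """Return (original_uri, format) for a rewritten URL, or None if no match."""
    m = INVERSE_PATTERN.fullmatch(target_url)
    if m is None:
        return None
    return BASE_URI + m.group(1), m.group("format")
```

Under these assumptions, `invert("http://localhost/c_121.nt")` returns the original URI `http://www.fao.org/landandwater/c_121` together with the format `nt`.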

Given these configurations, let's see two scenarios: in the first one the user requests the HTML description of the resource, while in the second one the user requests the N-Triples serialization of the same resource.

  1. The user requests http://localhost/c_121 with Accept: text/html, application/xhtml+xml (default in browsers).
  2. The last rule (*3*) in the proxy configuration "proxypasses" the request to .../HttpResolution/contentNegotiation?resURI=http://localhost/c_121
  3. Internally, the above service uses the activation rules and detects that this URI needs to be handled by the LandAndWater dataset (the URI matches the RegExp http://localhost/.*). So, it uses the rewriting rule configured in that dataset and, given the Accept header, redirects (303) to http://localhost/c_121.html.
  4. Rule *2* of the proxy is fired and redirects the user to http://localhost/showvoc/#/data?resURI=http://localhost/c_121.html
  5. At this page, ShowVoc communicates with SemanticTurkey in order to get the dataset to access and the resource URI to focus on.

Let's see the second scenario:

  1. The user requests http://localhost/c_121 with Accept: application/n-triples.
  2. The last rule (*3*) in the proxy configuration "proxypasses" the request to .../HttpResolution/contentNegotiation?resURI=http://localhost/c_121
  3. Internally, the above service uses the activation rules and detects that this URI needs to be handled by the LandAndWater dataset (the URI matches the RegExp http://localhost/.*). So, it uses the rewriting rule configured in that dataset and, given the Accept header, redirects (303) to http://localhost/c_121.nt.
  4. Rule *1* of the proxy is fired and proxypasses http://localhost/c_121.nt to .../HttpResolution/rdfProvider?rdfResURI=http://localhost/c_121.nt
  5. The HttpResolution/rdfProvider service uses the activation rules again for detecting the dataset. Then, it uses the inverse rewriting rules for getting back the original URI (http://www.fao.org/landandwater/c_121) and detecting the format (nt).
  6. Finally, it returns the N-Triples description of the resource.
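
To observe the redirects of the two scenarios from a script rather than from a browser, a client that does not follow redirects is handy. The sketch below uses only the Python standard library; `NoRedirect`, `build_request` and `first_hop` are hypothetical helper names, and running `first_hop` requires the proxy setup described above to be reachable on localhost.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def build_request(url, accept):
    """Build a GET request carrying the given Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


def first_hop(url, accept):
    """Return (status, Location) of the first response, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, accept)) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
```

According to step 3 of the second scenario, `first_hop("http://localhost/c_121", "application/n-triples")` would be expected to report a 303 status with the Location http://localhost/c_121.nt.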