We explain here how to publish/update website contents on the web hosting infrastructure of DIGIT/C1 ISHS.
DIGIT/C offers a high-availability, high-performance hosting infrastructure comprising, among other elements, back-end web server instances and application servers for hosting and serving both static and dynamic sites.
The dynamic sites supported by the standard Apache web servers are mainly sites based on ColdFusion and, in some cases, sites using CGI scripts.
Dynamic sites based on other technologies (e.g. WebLogic) are hosted on individual application servers and are integrated with the related sites through reverse proxy mappings.
Direct HTTP access to the back-end web servers hosting the static sites is denied by the standard web server configuration.
Under the standard configuration, the hosted sites can therefore be accessed only through a reverse proxy URL (e.g. http://europa.eu/epso).
Purely static sites (sites without ColdFusion or CGI elements) are hosted, depending on their audience (internal or external), on one of the two core shared hosting infrastructures: the europa.eu (DOTEU) hosting infrastructure or the Intracomm hosting infrastructure.
Each static web hosting infrastructure comprises two web hosting environments: an acceptance (staging) environment and a production environment.
These two environments constitute the minimum infrastructure necessary for website publishing operations.
The underlying static site hosting infrastructure can be fully failed over between two remote Data Centres in Luxembourg.
Dynamic sites currently associated with the standard Apache web servers can fall under one of the following categories:
- ColdFusion dynamic sites (the majority; the currently supported version is ColdFusion MX 7, but new developments should start with ColdFusion 8)
- Java dynamic sites (WebLogic)
- CGI dynamic sites (a minority, mostly legacy applications)
For CGI sites, in addition to the shared acceptance and production hosting environments, a dedicated development hosting environment is also available, mainly to facilitate the maintenance of the existing CGIs.
Development web server hosting environments are also offered for the implementation of ColdFusion solutions, which are preferred over CGI solutions even in simple cases such as contact web forms where no database layer is involved.
Apart from the acceptance and production hosting environment types, it is also possible to request other hosting environment types (e.g. test, training).
For each ColdFusion server instance, a peer Apache web server instance is configured as the entry point to the ColdFusion applications hosted by that instance.
The web servers thus function as a front end to the ColdFusion servers.
Finally, dynamic sites can also be developed for hosting on application server technologies not related to the standard Apache web servers (e.g. WebLogic or Oracle Application Server).
Use on EUROPA websites
For ease of management and maintenance, we strongly recommend following a set of rules. These rules are not necessarily of any significance to the normal user browsing the web pages. However, easier management and maintenance lead to fewer errors, which in the end benefits the user.
These recommendations should benefit all parties involved (users, webmasters and the hosting infrastructure services staff).
Keep it simple
- Simpler is more stable and less prone to error.
- Simpler is more compatible.
- Simpler is easier to maintain.
- Simpler is easier to use.
Dynamic contents VS static contents
Static HTML pages will always be served faster than the same HTML information generated through a programmed interface. This is especially true for pages that are requested frequently and can be cached.
We strongly recommend using applications only to serve information that is really dynamic. An HTML page that changes once a day cannot be considered dynamic information.
The DIGIT/C1 hosting services run on UNIX based servers. File names are consistently case sensitive on UNIX systems, unlike on some proprietary systems. Creating hyperlinks with UPPERcase or mixed case might create problems when the pages are transferred from a Windows system to the web server. We strongly recommend using only lowercase in filenames and hyperlinks, even when generating data on a UNIX system, because this data might later be maintained from a Windows system.
Application.cfm is an important exception: this file is the base of every ColdFusion application, and its filename is case sensitive.
In addition, special characters other than the underscore ( _ ) and the dot ( . ) should not be used. Filenames should be kept as short as possible and must never be longer than 255 characters. Likewise, URLs should be kept as short as possible and must never exceed 2000 characters.
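The naming rules above can be checked automatically before files are transferred. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that digits are acceptable alongside lowercase letters, the underscore and the dot.

```python
import re

# Assumption: lowercase letters, digits, underscore and dot are the allowed characters.
ALLOWED = re.compile(r"[a-z0-9_.]+")
# The one documented exception to the lowercase rule.
CASE_EXCEPTIONS = {"Application.cfm"}
MAX_FILENAME_LENGTH = 255
MAX_URL_LENGTH = 2000


def filename_problems(name):
    """Return a list of rule violations for one file name (empty if none)."""
    problems = []
    if len(name) > MAX_FILENAME_LENGTH:
        problems.append("longer than 255 characters")
    if name in CASE_EXCEPTIONS:
        return problems
    if name != name.lower():
        problems.append("contains uppercase characters")
    if not ALLOWED.fullmatch(name.lower()):
        problems.append("contains special characters other than '_' and '.'")
    return problems


def url_too_long(url):
    """Return True when a URL exceeds the recommended maximum length."""
    return len(url) > MAX_URL_LENGTH
```

A webmaster could run such a check over a local copy of the site and fix every reported name before publishing.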
Each static site should have a file called "index.html" or "index.htm" as the first entry point into it. This allows a user to return to the entry page of a site simply by truncating the URL, or to access the site using a shorter URL. It also makes the entry point to the document visible to the maintainer of the data. For example, the web server will respond to the URL "http://europa.eu/mysite/" with the data in file "/ec/prod/app/web/euroots/europa.eu/htdocs/mysite/index.html".
If there were no "index.html" file in that subdirectory, the user would receive either a listing of all files in the "mysite" directory, a 'Not Found' (404) page or a 'Forbidden' (403) page, depending on the configuration of the underlying web server.
For multilingual sites, index.html should be a splash page with links to the individual language index pages (index_en.htm, index_fr.htm, index_el.htm, etc.)
The full list of default index file names is:
- for the DOTEU static site servers: index.htm, index.html, default.htm, default.html, home.htm, home.html, index_en.htm, index_en.html
- for the Coldfusion servers: index.html, index.htm, index.cfm
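To illustrate how a directory request is resolved against these lists, here is a minimal sketch. It assumes that the server tries the names in the order listed and serves the first one that exists; this precedence is an assumption, as are the function name and the "doteu" and "coldfusion" keys.

```python
# Default index names as listed above; the precedence (list order) is an assumption.
DEFAULT_INDEX_FILES = {
    "doteu": ["index.htm", "index.html", "default.htm", "default.html",
              "home.htm", "home.html", "index_en.htm", "index_en.html"],
    "coldfusion": ["index.html", "index.htm", "index.cfm"],
}


def resolve_index(directory_files, server_type):
    """Return the first default index file present in the directory, or None."""
    present = set(directory_files)
    for candidate in DEFAULT_INDEX_FILES[server_type]:
        if candidate in present:
            return candidate
    # No index file: the server would answer with a listing, a 403 or a 404.
    return None
```

For instance, a directory containing both "home.html" and "index_en.htm" would, under this assumption, be served from "home.html" on the DOTEU servers.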
"Incomplete" URLs pointing to directories instead of to files should have an "ending slash" (e.g. "/publishing/" instead of "/publishing"). Upon reception of an "incomplete" URL without "ending slash", the web server will respond with a "redirect", telling the browser to request the URL with the "ending slash". In other words, an "incomplete" URL will trigger an extra request to the server causing longer response times for the end user.
Use relative links (intra-domain links)
Relative links should be used instead of absolute links when linking between pages within the same site domain.
Do not include "http://www.cc.cec/", "http://europa.eu", "http://ec.europa.eu/" and the like in 'href' links referring to pages located within the same site.
Consistent use of relative links makes sites easier to maintain, avoids broken links and requires no adaptation when the site domain or URL context changes.
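As an illustration, an absolute intra-domain link can be rewritten as a relative one with a few lines of code. This is a sketch under stated assumptions: the function name and the example URLs are hypothetical, and a link is treated as intra-domain only when its host name equals that of the current page.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(href, current_page_url):
    """Rewrite an absolute same-domain href relative to the current page."""
    target = urlsplit(href)
    current = urlsplit(current_page_url)
    # Inter-domain links (and links that are already relative) stay untouched.
    if not target.hostname or target.hostname != current.hostname:
        return href
    current_dir = posixpath.dirname(current.path) or "/"
    relative = posixpath.relpath(target.path or "/", start=current_dir)
    # relpath drops an ending slash; restore it for directory URLs.
    if target.path.endswith("/") and not relative.endswith("/"):
        relative += "/"
    if target.query:
        relative += "?" + target.query
    if target.fragment:
        relative += "#" + target.fragment
    return relative
```

Called with the href "http://europa.eu/mysite/b/doc.htm" from the page "http://europa.eu/mysite/a/index_en.htm", the function returns "../b/doc.htm", which keeps working if the site is later moved to another domain.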
When to use absolute links (inter-domain links)
When linking to pages hosted outside your own site domain, an absolute link has to be used.
The front-end reverse proxy domain names should always be used, never the back-end web server names.
Also make sure that the target domain in the URL exists and can be resolved by the DNS servers throughout the Commission's network as well as on the Internet (for Europa sites).
For example, an absolute link to a host such as "wlseures.cc.cec.eu.int" might work internally, but it will not work for users on the Internet, since that name cannot be resolved by their DNS servers.
The caching policy currently implemented on the DOTEU (europa.eu, ec.europa.eu) reverse proxies is the following:
- If the file extension is .do, .cfm, .php, .rss, .jsp or .faces ----> NO CACHE.
- If the file extension is .gif or .jpg ----> cache with a maximum expiration time of 8 hours.
- For all other file types ----> cache with a maximum expiration time of 15 minutes.
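The policy above can be expressed as a small lookup, which is handy for estimating how long a published change may take to become visible. This is only a sketch of the rules as listed, not the proxy configuration itself; the function name is hypothetical, and matching extensions case-insensitively is an assumption.

```python
import posixpath

NO_CACHE_EXTENSIONS = {".do", ".cfm", ".php", ".rss", ".jsp", ".faces"}
IMAGE_EXTENSIONS = {".gif", ".jpg"}


def max_cache_seconds(url_path):
    """Return the maximum proxy cache lifetime in seconds for a URL path."""
    # Assumption: extensions are compared case-insensitively.
    extension = posixpath.splitext(url_path)[1].lower()
    if extension in NO_CACHE_EXTENSIONS:
        return 0  # dynamic content is never cached
    if extension in IMAGE_EXTENSIONS:
        return 8 * 60 * 60  # 8 hours
    return 15 * 60  # 15 minutes for all other file types
```

An updated "logo.gif" may therefore be served from the cache for up to 8 hours, whereas an updated "index_en.htm" is refreshed within 15 minutes.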
Webmasters who cannot wait for the 8-hour maximum expiration time to elapse before a new image is refreshed in the proxy cache can force a cache refresh by requesting the image in their web browser with Shift+Refresh (in Firefox).
Reverse proxy servers: BlueCoat ProxySG 8100-C or 8100-20 (managed by the DIGIT/C2 SNET team).
If your site contents are not suitable for indexing by search engines (e.g. frequently changing dynamic contents or outdated contents), a robots.txt file should exist at the document root of your site (e.g. http://ec.europa.eu/robots.txt).
Access to dynamic sites by the agents of search engines (e.g. Google), known as robots, increases the number of requests received and, by extension, the workload of the back-end web server, which can become overloaded and affect site accessibility.
Here are the contents of robots.txt under the document root of ec.europa.eu:
Although some non-standard robots may ignore the robots.txt file, most will respect it, so the number of requests received from web robots should be kept under control.
Further information about robots.txt and usage examples can be found at the following page http://www.robotstxt.org/orig.html.
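Before publishing a robots.txt file, it can be useful to verify that its rules have the intended effect. The sketch below uses Python's standard urllib.robotparser module. The rules shown are hypothetical and are not the actual contents of the ec.europa.eu robots.txt; the path "/xyz/application/" and the agent name "ExampleBot" are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the actual ec.europa.eu robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /xyz/application/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="ExampleBot"):
    """Return True when the example rules allow the agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

With these example rules, well-behaved robots skip the dynamic part under "/xyz/application/" while the static pages of the site remain indexable.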
Whenever an RSS file is moved, renamed or simply withdrawn from production, the original RSS file name should remain available, because RSS clients will continue polling for the missing feed indefinitely and will put unnecessary load on the web server where the RSS file used to exist.
Separating static pages from applications
In the past, it was a common practice for EUROPA site webmasters to mix their dynamic sites (e.g. ColdFusion) with the associated static sites.
This practice results in static contents being distributed over numerous application server environments, which makes efficient centralised management of the static sites impossible.
To make this separation possible, a site should be structured so that it will be easy to map the dynamic part as a subsite of the static site.
For example, if the static site xyz, hosted on the DOTEU infrastructure, is accessible at the URL http://europa.eu/xyz, the associated dynamic part, hosted on a separate application server, should be accessible at http://europa.eu/xyz/application. An individual reverse proxy mapping to the application server then seamlessly integrates the static and dynamic parts of the site.
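The effect of such a mapping can be pictured as a longest-prefix lookup on the URL path. The following is a conceptual sketch only, not the configuration of the actual reverse proxies; the back-end host names and the function name are hypothetical.

```python
# Hypothetical back-end names; real mappings are managed on the reverse proxies.
PROXY_MAPPINGS = {
    "/xyz": "static-backend.example",
    "/xyz/application": "app-backend.example",
}


def backend_for(path):
    """Pick the back end whose mapped prefix is the longest match for the path."""
    best = None
    for prefix in PROXY_MAPPINGS:
        # Match whole path segments only, so "/xyz/applications.htm" stays static.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return PROXY_MAPPINGS[best] if best is not None else None
```

Because the dynamic part lives under its own subpath, a single extra mapping is enough to route it to the application server, while every other request under "/xyz" continues to reach the static site.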