Web servers record data about every visit in log files. These files are huge laundry lists of everything about every visitor, such as IP address, browser type, address of the visited page, referring site, time and date of the visit and much more.
Incidentally, recorded data may vary dramatically based on the type and behaviour of browsers and telecom equipment being used to exchange the web traffic.
E-commerce site statistics require more precise and reliable data: a piece of program code is inserted in every page, to be loaded on the visitor's machine and capture additional visitor information. This technique requires specialized software running on a dedicated server.
Rather than being concerned with the absolute values of these statistics, we should pay attention to the evolution of these indicators over time.
Input data in web logs: proxy servers record every request from visitors' browsers in web logs. These logs contain a lot of information about the web audience, such as visitor network address, browser type, address of the visited page, referring page, referring site, search keywords, date and time of the visit and much more…
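The fields listed above can be illustrated with a minimal parsing sketch. It assumes the logs follow the Apache "combined" format, which this page does not confirm; the pattern name, the function name and the sample line are hypothetical.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log format. The actual EUROPA log format
# is not stated on this page.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None when the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp and status code to typed values.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical sample line using documentation-only addresses.
SAMPLE = ('192.0.2.10 - - [14/May/2012:01:30:00 +0200] '
          '"GET /index_en.htm HTTP/1.1" 200 5120 '
          '"http://www.example.org/search?q=europa" "Mozilla/5.0 (hypothetical)"')

record = parse_line(SAMPLE)
```

Lines that do not match the assumed format yield None rather than raising an error, so a malformed entry does not stop the processing of a whole file.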
Pre-processing: every log file is first processed to translate each network IP address into the corresponding domain name. Ex: 126.96.36.199 becomes businesslinkwessex.co.uk. This conversion succeeds for only about 65% of domains. It makes it possible to indicate the country of visitors.
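The translation step can be sketched as a reverse DNS lookup. This is an illustration only, not the service's actual pre-processing code: the function names are hypothetical, and reading the country from a two-letter top-level domain is an assumed heuristic that says nothing for generic domains such as .com.

```python
import socket

def resolve(ip, lookup=socket.gethostbyaddr):
    """Return the host name for ip, or None when the lookup fails."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        return lookup(ip)[0]
    except (socket.herror, socket.gaierror):
        return None

def country_hint(hostname):
    """Return a two-letter top-level domain as a country hint, else None."""
    if not hostname:
        return None
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld if len(tld) == 2 and tld.isalpha() else None
```

With these helpers, country_hint("businesslinkwessex.co.uk") gives "uk", while an address without a reverse record resolves to None and is left without a country.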
Processing: the application starts every night at 01:30 and processes the log files of the previous day. Its main steps extract, transform and load the data into the tables of the EUROPA Webmart (a specialized database for web audience data). Due to the data volume (~15 Gbytes/day), this process currently (May 2012) takes more than 15 hours.
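The extract, transform and load sequence can be pictured with a small sketch. It uses an in-memory SQLite database as a stand-in for the Webmart, which is an assumption made for illustration; the table name daily_views, the record keys and the function name are hypothetical and do not reflect the real Webmart schema.

```python
import sqlite3
from collections import Counter

def run_etl(records, conn):
    """Aggregate page views per (day, site) and load them into a table."""
    # Transform: count page views for each day and site.
    counts = Counter((r["day"], r["site"]) for r in records)
    # Load: create the table if needed, then insert one row per (day, site).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_views "
        "(day TEXT, site TEXT, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_views VALUES (?, ?, ?)",
        [(day, site, views) for (day, site), views in counts.items()],
    )
    conn.commit()
    return len(counts)

# Extract: in practice the records would come from the parsed log files.
conn = sqlite3.connect(":memory:")
records = [
    {"day": "2012-05-14", "site": "site_a"},
    {"day": "2012-05-14", "site": "site_a"},
    {"day": "2012-05-14", "site": "site_b"},
]
rows_loaded = run_etl(records, conn)
```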
Reporting can start immediately after processing. Reports for individual sites are produced in sequence, in a dynamic order based on how often webmasters consult them. This order ensures that reporting is completed first for the sites whose statistics are consulted most. Reports for these sites are available as soon as processing has finished.
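The dynamic order can be sketched as a sort on consultation counts. The function name and the alphabetical tie-breaking rule are assumptions; the page does not say how ties are resolved.

```python
def report_order(consultations):
    """Return site names, most consulted first; ties broken alphabetically."""
    return sorted(consultations, key=lambda site: (-consultations[site], site))

# Hypothetical consultation counts per site.
order = report_order({"site_a": 3, "site_b": 10, "site_c": 3})
```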
The EUROPA Analytics service (previously called EUROPA Statistics) has been operational since February 2004. Initially based on the SAS WebHound solution, it was migrated to SAS Web Analytics in January 2010. Traffic data of the previous day is processed to generate fresh audience reporting for each of the ~150 main sites hosted on EUROPA. The key figures per site allow audience analysis by metrics such as visits, visitors, pages and files, errors, all URLs, etc.
Access to data
Public access to standard reports is available via the Simplified interface (kiosk) on the IPG pages, together with access to archived reports (since 2003).
All EC staff can request access to the corporate reporting tool (see Reporting tool), which provides reports about EUROPA sites. External access is also possible, subject to authentication through an internal procedure (CUD). Contact COMM EUROPA MANAGEMENT for more information.
Where confidentiality is an issue, access to the reports of a particular site can be limited to owner groups. A reasoned restriction request can be sent to COMM EUROPA MANAGEMENT.
General technical information
EUROPA Analytics data are stored in webmarts dedicated to EUROPA sites, following configuration files provided by DGs and services. At this stage, only EUROPA sites hosted at the Data Centre in Luxembourg are processed.
Another application, almost identical to EUROPA Analytics, has been in place for sites under the MyIntracomm portal since mid-2004. MyIntracomm data are stored in their own webmart.
Detailed data stored on webmarts are kept for a period of 24 months (every single page view can be retrieved during this period). For the aggregated data (calculated key figures for a given period) the following retention periods have been fixed:
- daily key figures: 365 days
- monthly key figures: 18 months
- yearly key figures: 5 years
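The retention periods above can be expressed as a small rule table. This is a sketch only: the names are hypothetical, and treating a record dated exactly on the cutoff as still retained is an assumption.

```python
import calendar
from datetime import date, timedelta

# Retention periods as stated above: (unit, amount).
RETENTION = {
    "detailed": ("months", 24),
    "daily": ("days", 365),
    "monthly": ("months", 18),
    "yearly": ("months", 60),  # 5 years
}

def months_back(d, months):
    """Return the date `months` months before d, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def is_retained(kind, record_date, today):
    """True when a record of the given kind is still inside its retention period."""
    unit, amount = RETENTION[kind]
    if unit == "days":
        cutoff = today - timedelta(days=amount)
    else:
        cutoff = months_back(today, amount)
    # Assumption: a record dated exactly on the cutoff is still kept.
    return record_date >= cutoff
```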
The EUROPA Analytics service is provided by DG Communication, EUROPA site Unit (COMM.A.5).
Contact: COMM EUROPA MANAGEMENT
The service covers the following aspects in the maintenance of the production application:
- ad hoc coaching
- analysis assistance to trained users
- configuration of a site's analytics
- future specifications and developments
Please check the Europa Analytics FAQ page before submitting a question.
All tools, procedures and figures related to previous analytics solutions have been abandoned since 2004.
The major impact of the current (SAS Web Analytics) and previous (SAS WebHound) applications lies in their ability to separate the part of the audience driven by spiders and indexation robots from individual visitors.
This part represents ~60% of total traffic over time.
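The idea of separating robot traffic from individual visitors can be illustrated with a simple heuristic on the browser identification string. This is not the method used by the SAS applications, which this page does not describe; the marker list and function names are assumptions chosen for illustration.

```python
# Assumption: robots are recognised by keywords in the user-agent string.
# Real tools typically use richer rules and maintained robot lists.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def is_robot(user_agent):
    """True when the user-agent string contains a known robot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)

def robot_share(user_agents):
    """Fraction of requests attributed to robots (0.0 for empty input)."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_robot(ua) for ua in agents) / len(agents)
```

Applied to the user-agent field of each log record, robot_share gives the proportion of requests that would be set aside before visits and visitors are counted.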