The FSDA tool will support and facilitate the management of large data sets. © EU, 2012
New data analysis software addresses anomalies
In co-operation with the University of Parma in Italy, the JRC Institute for the Protection and Security of the Citizen (IPSC) has developed a new software tool for the robust and efficient analysis of data sets, whose results are not distorted by anomalies in the data provided. The tool is useful for detecting potential anomalies (outliers) in data, even when they occur in groups.
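Outliers that occur in groups are hard to find because a cluster of anomalous values can inflate the estimated spread of the data enough to hide itself when a single model is fitted to all observations (an effect known as masking). The forward search, the method FSDA is named after, counters this by fitting to a small subset that is unlikely to contain outliers and then adding observations one at a time, so that anomalous values tend to enter last. The Python sketch below is a deliberately simplified, univariate illustration written under our own assumptions: the function `forward_search`, the starting rule (the half of the data closest to the median) and the monitored statistic are hypothetical and are not FSDA's MATLAB API. Unlike the full method, observations here never leave the subset once added.

```python
import statistics


def forward_search(data, m0=None):
    """Simplified univariate forward search (illustrative sketch only).

    Hypothetical helper, not part of FSDA. Starts from the m0 observations
    closest to the median, then repeatedly re-estimates the mean and standard
    deviation from the current subset and adds the outside observation with
    the smallest absolute scaled residual. Returns the entry order of the
    observations and, for each step, the smallest scaled residual among the
    observations still outside the subset.
    """
    n = len(data)
    if m0 is None:
        m0 = n // 2  # assumption: start from half of the data
    if not 2 <= m0 < n:
        raise ValueError("need 2 <= m0 < len(data)")
    med = statistics.median(data)
    # Robust start: indices of the m0 observations nearest the median.
    order = sorted(range(n), key=lambda i: abs(data[i] - med))[:m0]
    in_subset = set(order)
    monitor = []
    while len(order) < n:
        values = [data[i] for i in order]
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)  # assumes the subset is not constant
        outside = [i for i in range(n) if i not in in_subset]
        resid = {i: abs(data[i] - mu) / sigma for i in outside}
        nxt = min(outside, key=resid.get)
        # A sudden jump in this value signals that outliers are about to enter.
        monitor.append(resid[nxt])
        order.append(nxt)
        in_subset.add(nxt)
    return order, monitor


# Ten values near 10 plus a group of three outliers near 20.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 9.9, 20.0, 20.3, 19.8]
order, monitor = forward_search(data)
print(order[-3:])  # indices of the three values near 20, which enter last
print([round(r, 1) for r in monitor])
```

Because each fit uses only observations that have already been absorbed, the group near 20 cannot inflate the spread estimate until it actually enters, so the monitored residual jumps sharply at that step instead of the group staying hidden.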
The software, Forward Search for Data Analysis (FSDA), was designed for wide applicability. It has already been used in anti-fraud applications, and it is expected to be used in chemometrics (a scientific discipline that addresses problems in chemistry, biochemistry, medicine, biology and chemical engineering), in the detection of computer network intrusions and of e-commerce and credit card fraud, in customer and market segmentation, and in the detection of spurious signals in data acquisition systems.
The FSDA developers gave demonstrations using real datasets from the 6th Italian Agricultural Census and from a bio-pharmaceutical problem: determining cut-off points for chemical assays on the basis of serum sample measurements. The datasets were analysed in co-operation with the end users, and FSDA was used in the preliminary exploration of the data to check robustly for deviations from normality, to find appropriate transformations for non-normal and contaminated data and, finally, to detect multivariate outliers. The objective of FSDA is to simplify these statistical tasks.
FSDA was released in early January 2012; its copyright is held by the European Union and the University of Parma. It is distributed under the European Union Public Licence (EUPL), a free software licence that grants recipients the right to modify and redistribute the code. FSDA is written in MATLAB, a language and environment developed by MathWorks. MATLAB is intended primarily for numerical computing and is designed to let users carry out computationally intensive tasks more quickly than with traditional programming languages such as C, C++ and Fortran.