Fellegi-Sunter and Jaro Approach to Record Linkage (Method)


The Fellegi and Sunter method is a probabilistic approach to solving the record linkage problem, based on a decision model. Records in the data sources are assumed to represent observations of entities drawn from a particular population (individuals, companies, enterprises, farms, geographic regions, families, households...). The records are assumed to contain some attributes that identify an individual entity. Examples of identifying attributes are name, address, age and gender when dealing with people; style (or name) of a firm, legal form, address, number of local units, number of employees and turnover value when dealing with businesses. According to the method, given two (or more) sources of data, all pairs in the Cartesian product of the sources have to be classified into three mutually exclusive subsets: the set of matches, the set of non-matches and the set of pairs requiring manual review. To classify the pairs, the results of comparing the common attributes are used to estimate, for each pair, two conditional probabilities: the probability of observing those comparison results given that the pair belongs to the set of matches, and the probability of observing them given that the pair belongs to the set of non-matches. The classification criterion is based on the ratio between these two conditional probabilities. The decision model aims to minimize both the misclassification errors and the probability of assigning a pair to the subset requiring manual review.
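The classification step described above can be illustrated with a minimal Python sketch. It is not taken from the source: the attribute names, the m- and u-probabilities, the two thresholds, and the function names `match_weight` and `classify` are all hypothetical, and the sketch additionally assumes that the attribute comparisons are conditionally independent, so that the ratio factorizes over attributes.

```python
from math import log2

# Hypothetical values, for illustration only:
# m = P(attribute agrees | pair is a match)
# u = P(attribute agrees | pair is a non-match)
M_PROBS = {"name": 0.95, "address": 0.80, "age": 0.90}
U_PROBS = {"name": 0.01, "address": 0.05, "age": 0.10}


def match_weight(agreement):
    """Return the log2 ratio of the two conditional probabilities for one pair.

    `agreement` maps each attribute to True (agrees) or False (disagrees).
    Assumes conditional independence of the attribute comparisons.
    """
    weight = 0.0
    for attr, agrees in agreement.items():
        m, u = M_PROBS[attr], U_PROBS[attr]
        # Agreement contributes log2(m/u); disagreement log2((1-m)/(1-u)).
        weight += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return weight


def classify(weight, lower, upper):
    """Assign a pair to one of the three subsets using two thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "manual review"


# Hypothetical thresholds; in practice they follow from the tolerated error rates.
LOWER, UPPER = -5.0, 5.0

pair = {"name": True, "address": False, "age": False}
print(classify(match_weight(pair), LOWER, UPPER))
```

Widening the gap between the two thresholds lowers the misclassification errors but sends more pairs to manual review, which is the trade-off the decision model is meant to balance.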


