For years, scientists have been developing systems to classify organisms more accurately using genetic "barcodes": DNA sequences, found at specific locations in the genome, that serve as unique identifiers for each species. For plants, the DNA barcodes currently in use are found in the plastids, small organelles within plant cells that include the chloroplasts, where photosynthesis takes place. The plastid genome is very small compared with the plant's nuclear genome, which made it easier to identify DNA barcodes there. The downside is that plastid barcodes are insufficient for reliable species identification, because they do not contain enough sequence differences to discriminate between all plant species and their varieties.
The new approach presented by JRC scientists works with the entire genomes of plants, which are orders of magnitude larger and more complex than plastid genomes. This makes them a rich potential source of new DNA barcodes, but their size and complexity have, until now, limited the success of attempts to find them.
The study shows how to take advantage of ongoing improvements in the availability of plant genome sequences, bioinformatics tools and computing power. A stepwise strategy, implemented as a bioinformatics pipeline on the JRC high-performance computing platform, automatically identifies potential new DNA barcoding regions in plant genomes.
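To make the idea of "enough differences to discriminate" concrete, here is a minimal Python sketch of one criterion a barcode-screening pipeline might apply to a candidate region. It is not the JRC pipeline, whose steps are not described here. It assumes the candidate region has already been extracted for each species and aligned to the same length; the function names `hamming` and `discriminates`, the `min_diff` threshold and the toy sequences are all hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def discriminates(region: dict, min_diff: int = 1) -> bool:
    """Return True if every pair of species differs at >= min_diff positions
    in the candidate region (hypothetical criterion for illustration)."""
    return all(
        hamming(region[s1], region[s2]) >= min_diff
        for s1, s2 in combinations(region, 2)
    )


# Toy, made-up sequences for three hypothetical species.
candidate_a = {"species_1": "ACGTACGT", "species_2": "ACGTACGA", "species_3": "TCGTACGT"}
candidate_b = {"species_1": "ACGTACGT", "species_2": "ACGTACGT", "species_3": "ACGTACGA"}

print(discriminates(candidate_a))  # every pair differs somewhere: usable
print(discriminates(candidate_b))  # species_1 and species_2 are identical: not usable
```

A region like `candidate_b` behaves the way the article describes plastid barcodes: two species share the same sequence, so the region cannot tell them apart. Real barcode screening would also have to account for variation within a species, sequencing errors and alignment gaps, which this toy check ignores.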
To show that the strategy works, the scientists tested one of the identified DNA barcodes in a set of laboratory experiments, including tests on strawberries and on a box of muesli bought at a local supermarket. The expected species, as indicated on the labels, were detected.
Correctly detecting the species present in a sample is important in many contexts: for example, to identify the agent (bacterium or virus) responsible for an infection, or, in food safety and quality control, to label ingredients properly and to fight labelling fraud.