Success story: The tomato genome decoded - FOOD-CT-2006-016214
Tomato fruits occupy a central place in the global diet and in human nutrition. They are appreciated by consumers throughout the world and contain many components that benefit human health. Tomatoes are also a highly valuable crop, supporting a multi-billion-euro industry in both the fresh market and processed food sectors. In addition, tomato is a very useful system for studying fruit development and ripening, as well as plant genetics. Decoding the genetic blueprint of tomato, i.e. determining the order of the nucleotide bases (adenine, guanine, cytosine and thymine) in its DNA (DNA sequencing), will therefore be extremely useful in providing the information needed for future crop improvement.
Tomato is a member of the Solanaceae, the nightshade family. Other members of this medium-sized family of about 4,000 species include eggplants, potatoes, petunias, peppers and tobaccos. The family encompasses species with diverse appearances: some are shrubs, others bushes, and some even trees. They can survive in very challenging growing conditions, including deserts and mountains. Some Solanaceous species have been used in medicine. Despite the astonishing diversity of these plants, they share a remarkably similar genetic make-up, which raises the question: how do similar sets of genes and proteins generate such different plants? This is one of the major questions being addressed by the International Solanaceae Genome Project (SOL), an initiative bringing together partners from across the world to create a coordinated network of knowledge about the Solanaceae.
EU-SOL and sequencing the tomato genome
The worldwide SOL project was initiated in 2005, with research groups from 10 countries aiming to unravel the information encoded in the tomato genome. Each country secured funding through its national funding agencies to sequence one or more of tomato's 12 chromosomes. The five participating European countries (Spain, Italy, France, the United Kingdom and the Netherlands) were also funded through their participation in the EU-SOL project (FOOD-CT-2006-016214). The overall objective of this project is to develop high quality Solanaceous crops for consumers, processors and producers by exploitation of natural biodiversity. In this project, 53 research and industrial organizations from 15 different countries carry out a multi-disciplinary research programme geared to improving the quality of tomato and potato in Europe. Part of the EU-SOL research programme supported the sequencing of the tomato genome, both in producing sequence data and in developing computational methods to assemble, analyze and store the resulting DNA sequences.
Sequencing strategy: The first steps
The tomato selected for sequencing was an old American processing-type cultivar named Heinz 1706. This particular tomato was chosen because novel resources had already been developed for it, which allowed a rapid start to sequencing the tomato genome (the DNA in its chromosomes). Genome sequencing is a great challenge because each chromosome is made up of a very long stretch of DNA (about 100 million base pairs). Initially, therefore, the chromosomes were split into manageable chunks of approximately 100,000 nucleotides, or base pairs, that could be grown in bacteria. These chunks are called Bacterial Artificial Chromosomes (BACs). The BACs were then split further and their building blocks read (sequenced). Overlapping chunks of sequence and overlapping BACs were then stitched together using dedicated computer software. This was carried out for all 12 tomato chromosomes, producing long chains of overlapping BAC sequences (see figure). Due to cost constraints, it was decided to focus the tomato sequencing project on the gene-rich regions. These regions, referred to as the euchromatin, make up about one quarter of the genome and are known to contain approximately 90% of all tomato genes. The gene sequences were considered the most important targets, as genes underlie all the traits that make tomato such a nutritious commodity.
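The stitching step described above can be illustrated with a toy example. The sketch below is only an assumption about the general idea, not the software used by the consortium: it joins short fragments, already given in left-to-right order, wherever the end of one matches the start of the next. The function names and the example fragments are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(fragments: list[str], min_overlap: int = 3) -> str:
    """Merge fragments, given in left-to-right order, into one contiguous sequence."""
    contig = fragments[0]
    for fragment in fragments[1:]:
        size = overlap_length(contig, fragment, min_overlap)
        if size == 0:
            # No shared sequence: this is where a gap would remain in the assembly.
            raise ValueError("fragments do not overlap; a gap remains")
        # Append only the part of the fragment that is not already in the contig.
        contig += fragment[size:]
    return contig


if __name__ == "__main__":
    # Hypothetical fragments, far shorter than real BAC sequences.
    print(stitch(["ATGCGTAC", "GTACCTGA", "CTGAAT"]))  # prints ATGCGTACCTGAAT
```

Real assembly software must also cope with sequencing errors, repeated sequences and fragments of unknown order, none of which this sketch attempts.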
After three years, approximately 50% of the DNA sequencing had been completed. This represented very good progress for a diverse multinational effort. It became clear, however, that further progress in certain regions would be difficult: BAC clones suitable for sequencing could not be generated for specific parts of the tomato genome, so existing gaps in the euchromatin would remain.
Next Generation Sequencing
To overcome this, the consortia led by EU countries adopted novel DNA sequencing technologies that had recently emerged, collectively referred to as Next Generation Sequencing (NGS) technologies. Compared to traditional sequencing technology, NGS technologies allow a more than thousand-fold increase in sequencing speed and capacity. Moreover, the price per sequenced nucleotide has dropped from about $5 per kb (1,000 bases) to about $0.20 per kb with NGS technologies. The Tomato Next Generation Sequencing Initiative was launched in October 2008 and aimed to produce a so-called "whole genome shotgun sequence" of the complete 950 Mb (950 million nucleotides, or base pairs) tomato genome. This new sequencing initiative was carried out by the five participating countries from the EU-SOL project (Spain, Italy, France, the UK and the Netherlands) in cooperation with research groups from India, Japan and the United States. Next Generation Sequencing data were produced using two state-of-the-art technologies (SOLiD and 454), and new computer software was adapted to suit the needs of the tomato sequencing project. The latter was carried out in the framework of the Bioinformatics module of the EU-SOL research programme.
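To give a feel for the scale of the price drop quoted above, the following back-of-the-envelope sketch applies both prices to a single pass over the 950 Mb genome. The single-pass assumption is ours and is purely illustrative: real projects read each base many times, so actual costs are higher, and the text does not give a project budget. The function and constant names are hypothetical.

```python
GENOME_SIZE_BP = 950_000_000      # 950 Mb tomato genome, as stated in the text
TRADITIONAL_USD_PER_KB = 5.00     # approximate price with traditional sequencing
NGS_USD_PER_KB = 0.20             # approximate price with NGS technologies


def single_pass_cost(bases: int, usd_per_kb: float) -> float:
    """Cost in USD of reading `bases` nucleotides once (assumption: no repeated coverage)."""
    return bases / 1_000 * usd_per_kb


if __name__ == "__main__":
    traditional = single_pass_cost(GENOME_SIZE_BP, TRADITIONAL_USD_PER_KB)
    ngs = single_pass_cost(GENOME_SIZE_BP, NGS_USD_PER_KB)
    print(f"Traditional: ${traditional:,.0f}")
    print(f"NGS:         ${ngs:,.0f}")
    print(f"Price ratio: {TRADITIONAL_USD_PER_KB / NGS_USD_PER_KB:.0f}x")
```

Under these assumptions one pass costs about $4,750,000 at the traditional price and about $190,000 at the NGS price, a 25-fold difference.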
The approach taken in the Tomato Next Generation Sequencing Initiative was highly successful. The first assembly of the newly produced DNA sequences, supplemented with only a small fraction of the already available BAC sequences, covered about 800 Mb of the tomato genome. The assembly process produced about 7,000 individual parts, the "scaffolds", which are contiguous stretches of DNA ranging in size from 50 bases to approximately 10 million bases. More than 95% of the assembled 800 Mb of tomato genome appears to be present on only 250 large scaffolds. A preliminary analysis of the tomato genome indicates that approximately 34,000 genes are present.
This first draft was released in January 2010.
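The statement that more than 95% of the assembly sits on 250 large scaffolds is a cumulative-length summary. The sketch below shows how such a figure could be computed from a list of scaffold lengths. The lengths used here are invented toy values, not the tomato assembly, and the function names are hypothetical.

```python
def fraction_in_largest(lengths: list[int], k: int) -> float:
    """Fraction of the total assembled length held by the k largest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def scaffolds_needed(lengths: list[int], fraction: float) -> int:
    """Smallest number of largest-first scaffolds whose combined length reaches `fraction` of the total."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count
    return len(ordered)


if __name__ == "__main__":
    # Invented scaffold lengths (arbitrary units), for illustration only.
    toy_lengths = [50, 40, 30, 20, 10, 5, 5]
    print(fraction_in_largest(toy_lengths, 2))   # 90 of 160 units
    print(scaffolds_needed(toy_lengths, 0.95))   # scaffolds needed to reach 95%
```

Applied to the real scaffold lengths, `fraction_in_largest(lengths, 250)` would be expected to return a value above 0.95, according to the figures reported above.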
Currently, the tomato sequence assembly is being further improved, and updated versions of the sequence are made available to the international scientific community at regular intervals. The latest version of the assembly (V1.03) can be downloaded from the following URLs:
http://mips.helmholtz-muenchen.de/plant/tomato/index.jsp and http://solgenomics.net/
René Klein Lankhorst (1), Gerard Bishop (2), Huib de Vriend (3)
1) Wageningen University, Wageningen, The Netherlands;
2) Imperial College London, London, UK;
3) LIS Consult, Rijswijk, The Netherlands.
High quality Solanaceous crops for consumers, processors and producers by exploration of natural biodiversity.
European Commission scientific officer:
Project coordinator: Dr. René Klein Lankhorst
Address: Wageningen University, Centre for Biosystems Genomics, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
Tel: +31 317 480938