The internet is currently made up of around 50 billion pages, linked to form a vast, virtual landscape. Our interaction with it generates data which, when broken down and analysed, can help us understand a wide range of human activities, from the cultural to the economic.
Funded by the EU’s FP7 under the Future and Emerging Technologies scheme, the New tools and Algorithms for DIrected NEtwork analysis (NADINE) project is contributing to the development of new types of search engines, putting Europe in the lead in this important area.
‘We are trying to map the net to show how pages are linked together and how people use these links in their voyage around the net,’ says NADINE project coordinator Dima Shepelyansky, research director at the Laboratoire de Physique Théorique, CNRS Toulouse.
The project uses, among other tools, some provided by Google to show how pages are linked together. Doing so can, for example, reveal the probability of people visiting certain sites, making choices, buying objects or voting in certain ways.
Refining ways of tracking online interaction
To develop and test their methodologies, researchers looked at Wikipedia biographical entries to see if they could rank the people referred to in order of influence. They analysed the 24 major language editions, considering the number of articles linking to each individual and using Google’s PageRank system, which says a page is important if important pages link to it.
But this threw up an interesting problem for the project to iron out: the scientist Linnaeus appeared to be the most important individual. Since he was responsible for classifying organisms, there are links to his page on every Wikipedia page referring to plants and animals, which skewed the results.
So researchers decided to introduce CheiRank, which describes the importance of a page in proportion to the number of outgoing links. By combining both, researchers were able to establish a robust way of measuring importance. The methods developed can also detect self-organising, hyperlinked web communities.
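To make the two rankings concrete, here is a minimal sketch in Python. It is an illustration only, not the project's code. The five-page link graph and the function names are hypothetical. CheiRank is computed here as PageRank on the same network with every link reversed, which is how it is usually defined in the research literature. The damping factor of 0.85 is the conventional textbook choice, assumed here rather than taken from the project.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def cheirank(links, **kwargs):
    """PageRank of the link-reversed graph: rewards pages with many outgoing links."""
    reversed_links = {}
    for page, targets in links.items():
        reversed_links.setdefault(page, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(page)
    return pagerank(reversed_links, **kwargs)


# Hypothetical toy network: every species page links to "Linnaeus",
# while "Index" is a list-style page that links out to everything.
toy_links = {
    "Oak": ["Linnaeus"],
    "Wolf": ["Linnaeus"],
    "Rose": ["Linnaeus"],
    "Index": ["Oak", "Wolf", "Rose", "Linnaeus"],
    "Linnaeus": ["Index"],
}

page_rank = pagerank(toy_links)
chei_rank = cheirank(toy_links)

for page in sorted(toy_links):
    print(f"{page:10s} PageRank={page_rank[page]:.3f} CheiRank={chei_rank[page]:.3f}")
```

In this toy network the heavily linked-to "Linnaeus" page comes out on top by PageRank, while the link-rich "Index" page comes out on top by CheiRank. Looking at both scores together gives a fuller picture than either gives alone.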
Online information flows similar to commercial exchanges
Considering the way links to and from a page can show how information is exchanged, the project then applied its findings to the analysis of commercial flows. NADINE has been using the United Nations’ world trade database, which gathers data from the last 50 years. ‘We have been developing a new way of analysing the commercial exchange of 61 products across the UN countries, determining the sensitivity of trade balance to price variations,’ Shepelyansky explains.
NADINE brings together a partnership of theoretical physicists, mathematicians and computer scientists from France, the Netherlands, Hungary and Italy. ‘Transnational, EU funding was indispensable when it comes to getting a team of scientists from such a variety of disciplines together,’ Shepelyansky adds.
The project has been running for three years and ends this April (2015). It is supported by nearly EUR 1.223 million in EU funding. Now that the methodology is clearly established, researchers from the NADINE consortium intend to continue the work with various partners, including the World Trade Organisation.