An economic perspective on data and platform market power

This paper starts with some basic economic characteristics of data that distinguish them from ordinary goods and services, including non-excludability and non-rivalry, economies of scope in data re-use and aggregation, the social value of data and their role in generating network effects. It explores how these characteristics contribute to the emergence of large digital platforms that generate a combination of positive and negative welfare effects for society, including data-driven network effects. It distinguishes between lexicographic and probabilistic data-driven matching in networks. Both may lead to market “tipping”. It emphasizes the social value of data and the positive and negative social externalities that may come with this. Platforms are necessary intermediaries to generate the social welfare or network externalities from data. However, the economic role of data-driven platforms is ambivalent. On the one hand, platforms enable society to benefit from positive externalities in data collection via economies of scale and scope in data aggregation of transactions and interactions across users, both firms and consumers. That gives them a privileged market overview that none of the individual users has. Platforms can use this information asymmetry to facilitate interaction and increase welfare for users. These data externalities attract users to the platform. On the other hand, data-driven network effects may result in monopolistic market power of platforms which they can use for their own benefit, at the expense of users. Any policy intervention that seeks to address the market power of online platforms requires careful balancing between these two poles. Finally, the paper briefly discusses ecosystems that leverage data to coordinate interactions between different platforms.


Introduction
Data are the driving force behind the digital economy, including the large online platforms that have emerged as key players in the digital economy. Data are collected, analysed, transformed, accessed and traded between many players in the digital economy. This paper presents some basic economic characteristics of data that distinguish them from ordinary goods and services. It explores how these characteristics contribute to the emergence of large digital platforms that generate a combination of positive and negative welfare effects for society.
Data access and trade may cover a variety of modalities of data exchange between two or more parties, ranging from monetised trade in data, to voluntary free access or the exchange of data in return for a service. Any voluntary data exchange is a market-based data transaction. A key question for data policy makers is whether private and voluntary data access decisions maximize the social welfare of society as a whole. Economists define market failures as situations where the aggregate private welfare of firms and consumers remains below the total welfare that society as a whole could achieve with a given technology. This occurs when the incentives of private firms and/or consumers make them behave in ways that diminish overall social welfare. This may justify regulatory intervention in data markets and the imposition of remedies to address these failures. These may include some form of mandatory data access conditions that overrule private decisions.
In line with the European Commission's "Better Regulation Guidelines" 1 we follow a broader approach to possible regulatory intervention in data and data-driven services markets. It includes monopolistic market failures that are usually handled by competition law but extends to other sources of market failures such as externalities, asymmetric information and missing markets because of high transaction costs. We also point out potential regulatory failures and social concerns, such as welfare distribution and discrimination, that could motivate regulatory intervention. In contrast with the mainstream competition law and economics view that adheres to a narrow consumer welfare policy objective 2, this paper takes a wider public policy economics view and focuses on the overall social welfare of society as a policy objective, combining welfare of firms and consumers. The distinction between consumer and social welfare may become important for example in data-driven online platforms. Policies that focus exclusively on the consumer side may have unintended negative effects on the supply side of the platform, and vice versa.
Furthermore, we go beyond markets and look at the impact that data have on institutions and organisational arrangements in the digital economy. A striking feature of this new organisational landscape is the emergence of online platforms and possibly platform ecosystems that create structural links between several platforms 3. Digital data technology contributed to the emergence of new markets for goods and services that were not feasible in the pre-digital economy because the technology was simply not available to overcome some information cost constraints. These new markets often require new ways of organising economic exchange and new types of firms that are generically labelled as "platforms". At the same time, these platforms may generate new sources of market failures that will also be discussed.
This paper is structured as follows. Section 2 discusses the specific economic characteristics of data that are in several respects different from ordinary goods and services. We explore how these characteristics affect data collection (the market for data exchange between data sources and collectors) and data trade between data collecting and data using firms. Section 3 brings platforms into the picture, a new type of firm that leverages the economic characteristics of networks and data to create more efficient markets. We examine the benefits that this brings as well as potential new sources of data-driven market failures. In Section 4 we move from single platforms to data-driven ecosystems where platforms coordinate their activities through data sharing. Section 5 adds some concluding observations.
1 The European Commission's "Better Regulation Guidelines" (2017) are available at https://ec.europa.eu/info/law/law-making-process/planning-and-proposing-law/better-regulation-why-and-how/better-regulation-guidelines-and-toolbox_en The European Commission's "Data Strategy" (2020, p 14 footnote 39) also advocates a market-failure based approach to regulatory intervention in data markets. Available at https://ec.europa.eu/info/sites/info/files/communication-european-strategy-data-19feb2020_en.pdf

The economic characteristics of data

Data are usually an intermediary input, not a final consumer good
Unless they are aviation aficionados, consumers do not search for flight schedules on Google or Skyscanner because they enjoy looking at these schedules but because they want to buy an air transport service. Data are not created ex nihilo. They are collected from observations on the behaviour of people, machines and nature: the data originators. Firms collect data directly or from consumers or other firms. They can then be used for the production or improvement of a good or a service. Data exchanges thus involve at least two markets, an upstream data collection market and a downstream data use market. There can be a single vertically integrated firm operating on both markets, or there can be different firms in both markets that trade data between them. Data collection can happen prior to their use in services, or it can be a by-product of services. For example, Google Search collects data by scanning webpages while the Search ranking depends on data collected from users of the Search engine. Data exchange, data trade, data sharing and access are labels that may cover different exchange modalities: data can be traded for a monetary compensation or in exchange for a service, sharing can be for free or subject to conditions in other markets, etc. Data can be traded directly, when they are effectively transmitted between parties, or indirectly, when parties do not transmit data but only a data-driven service. For example, online advertising platforms like Google do not transmit consumer data directly to advertisers. They sell a targeted advertising service based on consumer data which they keep in-house.

Data collection has an economic cost
The data collector needs to have a financial incentive to invest in data infrastructure, for example because it offers the prospect of monetizing the data. Data originators, consumers and firms, need incentives to share their data with a collecting firm. A frequently observed business model in data collection markets is to offer originators a free service in return for sharing their personal or industrial data. The willingness of data sources to share data with collectors will not only depend on conditions in the data market but also on subsequent use of the data in services markets. For example, the willingness of consumers to share their data with a website will depend on the quality of services offered by that website as well as subsequent use of the data by the website, for instance for online advertising. Lack of transparency in data re-use may of course blur that picture. Firms that offer free services need to find a way to cover the cost of providing these services. Google and Facebook offer users free services in return for the ability to monetise user data in an online targeted advertising market. Just like in the real economy, there are no free lunches in the data economy though the party that pays for the lunch may be different from the party that enjoys the lunch. Any change in the cost of data collection and in the benefits for data users will affect the volume and possibly the quality of data collected.

The value of data depends on their use
Data have no value on their own; they become valuable only to the extent that consumers and firms can use them to improve their position in data-driven services markets. Data can have many effects on services markets. Economists have tried to get a better understanding of these market effects and the welfare impact on stakeholders. There is no coherent framework yet for the economic analysis of data. Some authors 4 focus on the revenue-shifting potential of data. They assume that a "better" dataset generates more revenue for a firm, for a given level of utility provided to users. If firms extract more revenue than the utility they provide to users, users will shift to other firms, unless the firm's monopolistic market power prevents users from moving. That would result in an anti-competitive use of data. Pro-competitive 5 uses imply that both firm revenue and user utility from the data-driven services increase with additional data. For example, more data collection and more efficient use of the data in a hotel booking platform can simultaneously improve the user experience, revenue for hotels and platform revenue. Competitive use may still cause welfare shifts between firms and their customers, or between sub-groups on each side. This may trigger equity and welfare distribution concerns, for example when firms use data for price discrimination or other forms of discrimination strategies that increase the welfare of the firm but not that of some users.
4 De Cornière and Taylor (2020) "Data and competition: a general framework with applications to mergers, market structure and privacy policy". Mimeo, February 2020.
5 The notions of pro- and anti-competitive behaviour go beyond classic notions of competition policy. They take a broader social welfare perspective that combines firm revenue and consumer utility. Increased market shares and market power can still be pro-competitive if they increase overall welfare.
A problem with this very generic approach to data is that all these statements are subject to empirical evidence. This may be easy to obtain for firms and platforms that collect user data and run behavioural experiments with their online users in order to decide on their profit-maximizing commercial strategies. It is more difficult for policy makers to access relevant data that provide insights that could feed social welfare improving policies 6.

Excludability and monopolistic data trade
Contrary to physical goods, data are not excludable by nature. They can easily be copied and disseminated. The law can assign exclusive rights to data originators and/or collectors. So far, there are no general data ownership rights in the EU or elsewhere 7. In a few cases, the law grants erga omnes exclusive rights. For example, in an attempt to bring data rights in line with the principles of intellectual property rights, the EU Database Directive 8 granted, under restrictive conditions, sui generis ownership rights to data collectors, the producers of databases. The EU General Data Protection Regulation 9 (GDPR) grants some exclusive and inalienable rights to natural persons as data originators, to keep control over their personal data, including the right to consent to access to personal data, and data access, portability, and deletion for the data subject. In the case of personal data, the data subject as data originator is usually unambiguously defined. This is not necessarily the case for non-personal machine-generated data that may be co-generated and collected by several parties. Assigning exclusive rights to any of these parties may affect the entire value chain in an industry 10. All these attempts at assigning exclusive rights over data to private parties are in line with the Coase Theorem that hypothesizes that markets will work efficiently when ownership rights are well-defined and transaction costs are low or zero. However, because of the intrinsic social value of many data and the inability of individuals to internalize the externalities that their use entails, private ownership rights cannot bridge the gap between the private and social value of data (see Section 2.8 below).
In the absence of legal ownership protection, a data holder can apply technical protection measures to ensure his exclusive control and access to the data. This makes him a de facto data monopolist, provided there are no close substitute data sources. Exclusive access is necessary to raise revenue from selling data or data-driven services via negotiated bilateral contracts with data users that determine data access and use conditions, including prices. Contracts between contracting parties benefit from legal protection under commercial law. They can be enforced in courts. However, they cannot be enforced against third parties. In case of data leaks, data holders have no recourse against third parties that benefit from these leaks.
If more parties have access to the same dataset, or to close substitutes, competition will drive prices down to the marginal cost of reproduction, which is usually close to zero for digital data. That eliminates any opportunities to generate revenue from the data and any incentives to invest in the collection of data. Monopolistic data pricing above marginal cost requires rationing or reducing the quantity and/or quality of data that can be accessed. Not all demand will be satisfied, unless perfect price discrimination between buyers were feasible. Monopolistic trade does not maximize social welfare 11: it increases the welfare of the data holder at the expense of data users. Completely open data markets on the other hand drive prices and revenue down to zero and eliminate incentives to collect data in the first place. Data policy requires careful balancing between these two extremes.
6 This explains the origins of a range of Business-to-Government data sharing initiatives in several EU Member States and by the European Commission. See for example the report of the Expert Group on B2G data sharing, available at https://ec.europa.eu/digital-single-market/en/news/meetings-expert-group-business-government-data-sharing
7 Nestor Duch-Brown, Bertin Martens and Frank Mueller-Langer, 2017. "The economics of ownership, access and trade in digital data," JRC Working Papers on Digital Economy 2017-01, Joint Research Centre. Available at https://ideas.repec.org/p/ipt/decwpa/2017-01.html
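The pricing argument in this subsection can be made concrete with a stylised numerical sketch. It is not taken from the paper: it assumes a hypothetical linear inverse demand for access to a dataset, p = a - b*q, and a constant marginal cost of reproduction c, set to zero by default. The function name, the dictionary keys and the parameter values are illustrative assumptions only.

```python
def data_market_outcomes(a, b, c=0.0):
    """Compare open (competitive) and monopolistic trade in a dataset.

    Assumption (not from the paper): linear inverse demand p = a - b*q and
    a constant marginal cost of reproduction c, zero for digital copies.
    """
    def welfare(q, p):
        user_surplus = 0.5 * (a - p) * q   # area under demand, above price
        holder_margin = (p - c) * q        # revenue above reproduction cost
        return user_surplus, holder_margin

    # Open access: competition drives the price down to marginal cost.
    q_open = (a - c) / b
    # Monopoly: quantity where marginal revenue (a - 2*b*q) equals c.
    q_mono = (a - c) / (2 * b)
    p_mono = a - b * q_mono

    open_users, open_holder = welfare(q_open, c)
    mono_users, mono_holder = welfare(q_mono, p_mono)
    return {
        "open": {"price": c, "quantity": q_open,
                 "user_surplus": open_users, "holder_margin": open_holder},
        "monopoly": {"price": p_mono, "quantity": q_mono,
                     "user_surplus": mono_users, "holder_margin": mono_holder},
        # Welfare lost because part of the demand is rationed under monopoly.
        "welfare_loss": (open_users + open_holder) - (mono_users + mono_holder),
    }


if __name__ == "__main__":
    for regime, outcome in data_market_outcomes(a=10.0, b=1.0).items():
        print(regime, outcome)
```

Under these assumptions, open access serves all demand but leaves no margin to fund data collection, while monopoly pricing funds collection and leaves part of the demand unserved. These are the two extremes between which the text argues data policy has to balance.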

Data are not a homogenous product
Data can be traded at various levels of fine graining and information content. They are subject to quality differentiation. For example, detailed consumer profile data are more valuable than coarse-grained data. Quality differentiation may be required in order to avoid falling into the Arrow Paradox: once data are revealed to a potential buyer there is no point in trading them anymore because the buyer already has the information he wanted to buy. There are many strategies that a potential data seller can apply to reduce the quality or information content of a dataset in order to overcome this paradox 12. He can offer a reduced sample of the data, or a coarse-grained or aggregated version that does not reveal details, or an anonymized version, etc. For example, mobile phone operators sell mobility insights but not original consumer data; data will be anonymized, aggregated and processed. The seller can also refrain from sharing data directly with a buyer and deliver an indirect data-based service only, as in the Google advertising example. Data sellers vary the quality of data collected from originators and transmitted to final users to maximize their profits 13. Collecting very detailed data from consumers may make them suspicious and lead them to reject the service offered in return for the data. Handing over too detailed consumer data to data users may have a similar effect on the originators. Data buyers will want detailed data because it enables them to discriminate in the sale of their services. The data intermediary will adjust the quality of the data that he collects and sells (the level of aggregation and segmentation) to maximize his profits.

Non-rivalry and economies of scope in data re-use
Data are non-rivalrous. Many parties can use the same dataset at the same time for a variety of purposes without functional loss to the original data collector. Rival goods can only be used by one party at a time. For example, a car is a rival physical good and can only be used by one driver at a time. If a car were non-rival, all drivers could re-use the same car at the same time to drive to different destinations. The welfare gains would be enormous: it would suffice to invest in the production of a single car to cater to the needs of all drivers. Data collected by one firm can be re-used for other purposes, either by the same firm or by other firms provided they can access the data. Re-use results in cost savings because the primary data collection effort is a sunk cost that can be amortized across many uses, rather than remaining confined to a single user. It can boost innovation and enable the production of new and innovative data services that the original data collector had not envisaged. This promise of substantial welfare gains from exploiting non-rivalry in data re-use constitutes the foundation stone of the data access and sharing debates 14.
Economies of scope in re-use were originally defined in the context of joint production and (re-)use of the same product or asset to produce other outputs 15. For example, a car manufacturer can re-use the same engines in different car models. Re-use of the same non-rival engine design entails zero marginal re-design costs. However, there is a positive marginal cost for physical re-production of additional engines. Non-rival immaterial products, such as knowledge and digital data, have quasi-zero marginal re-production costs because re-production involves only copying an electronic data file. Note that data re-use by other firms may create interoperability problems and important fixed costs for the design of an interface.
Data re-use and access by other parties also has a cost side. All digital data can, in principle, be made interoperable and shared for the benefit of society 16. However, neither firms nor individuals want their private data to be widely available. Privacy and commercial confidentiality are important for the autonomy of private decision-making and for extracting private value from these decisions. While non-rival data can be shared by firms and individuals without functional losses, sharing may entail an opportunity cost and economic losses for the original data holder. Other firms may re-use the data in services applications that compete with those of the original data collecting firm and undermine the latter's market position 17. The data holder may want to produce these alternative services in-house and appropriate the benefits. Firms may re-use personal data for purposes that harm the data subject's privacy and welfare.
12 For an overview, see Dirk Bergemann and Alessandro Bonatti, 2018, Markets for information: an introduction, CEPR discussion paper DP1314, 2018. Dirk Bergemann, Alessandro Bonatti, and Tan Gan (2020), The economics of social data. Yale University, Cowles Foundation discussion paper nr 2203 revised.
Firms and persons will trade off the expected benefits from data sharing against the expected costs and risks that they might incur from doing so. These private cost-benefit perceptions may limit the extent of data exchange, sharing and re-use. The question for policy makers is whether private data decisions by consumers and firms maximize the welfare that society as a whole could derive from the data. If not, there is a market failure that may require policy intervention. Policy intervention should not seek to maximize data sharing. Data sharing is not an objective in its own right but a means to achieve higher social welfare for society. Policy makers should only intervene when the market is not delivering a social welfare-maximizing volume of data sharing, considering both the costs and benefits of data sharing.

Economies of scope in data aggregation
A second, and often neglected, source of economies of scope in data comes from data aggregation. Merging two datasets can generate more insights and economic value compared to keeping them in separate data silos, provided that the datasets are complementary and not entirely separable. This insight can be traced back to the economics of learning and division of labour. Rosen 18 observed that when a person has a choice between learning two skills, specialisation in one skill is always beneficial when the costs of learning both skills are entirely separable. However, when learning costs are not separable because knowledge sets are complementary, there are economies of scope in learning both skills, provided the benefits from combining the two exceed the additional learning costs. This insight can be applied to data. When two datasets are complementary, applying data analytics (the equivalent of learning) to the merged set will yield more insights and be more productive than applying it to each set separately, especially when the marginal cost of applying analytics to a more complex dataset is relatively small.
Economies of scope in data are controversial in economics, also because they are often misunderstood. Authors usually do not distinguish between economies of scale and scope, or between economies of scope in re-use and in aggregation of data. Tucker 19 defines economies of scope somewhat ambiguously as cost savings relative to an "increased level of production of multiple products". "Increased level of production" refers to economies of scale. "Multiple products" could be interpreted as economies of scope in re-use of data but not in data aggregation. A useful way to distinguish economies of scale and scope is to consider a dataset as a two-dimensional spreadsheet, with the number of columns representing the number of variables and the number of rows the number of observations on these variables. Economies of scale refer to increased prediction accuracy due to an increase in the number of rows. Economies of scope in data aggregation refer to increased prediction accuracy due to an increase in the number of columns. Adding more columns (variables) is not helpful when they are highly correlated or when they are not related at all. A number of empirical studies claim that economies of scope in data are weak or non-existent 20. All these studies are more about economies of scale than scope. Bajari et al 21 come closest to a proper study of economies of scope in aggregation when they merge data across several product markets. They find that product sales forecasts do not become more accurate when historical data from several products are combined. However, weak complementarity among product markets results in highly separable datasets and thus in weak economies of scope. The absence of empirical studies on economies of scope in data aggregation is a major gap in data economics.
There is anecdotal evidence in support of economies of scope in data aggregation. McNamee 22 explains how Google gradually improved its targeted advertising by combining personal data from several sources, starting from web searches and adding email and maps (location) data. Navigation apps like Waze and TomTom combine real time GPS location data with maps that are populated with data from a wide range of public and private sources including road and traffic authorities, municipalities, firms and in-map advertisers. These public sector data may have little commercial value on their own but create a valuable service when aggregated with other data.
Economies of scale and scope in data aggregation are a source of positive externalities. In the age of artificial intelligence and machine learning, personal data collected on the behaviour of one set of consumers have predictive value for the behaviour of other consumers 23. Once a firm has accumulated a critical mass of consumer data, the additional insights obtained from adding another consumer's personal data are small 24, compared to what can be learned from data already collected about persons with a similar profile. Acemoglu et al 25 argue that these externalities in personal data collection create a market failure. They diminish the value of individual personal data as well as consumer incentives to protect their privacy. That, in turn, increases the supply and further decreases the market value of personal data. Data collectors can reap the benefits of that externality; consumers cannot prevent this negative externality for their own data. Their best deal is to exchange their personal data in return for a free online service that has a higher marginal use value for them than the depressed market value of their individual data. This externality could explain the privacy paradox 26. Consumers value their privacy but do not invest in protecting it. They have relatively little resistance to sharing location or browsing data when used for advertising purposes. Consumer resistance is high only for the most sensitive data such as bank statements or fingerprints 27. Investment in privacy protection tools may have a signal value in itself that can be exploited against consumer interests 28.
18 Sherwin Rosen (1983) Specialisation and human capital, Journal of Labor Economics, Volume 1, Number 1, Jan. 1983.
19 Tucker (2019, p 5)
20 Chiou and Tucker find no decrease in search engine accuracy when time series of consumers' historical searches are shortened because of EU privacy regulation. Neumann et al show that large data brokers do not necessarily perform better in consumer profiling than data brokers with fewer consumer profile data. Claussen et al find that more individual user data helps algorithms to outperform human news editors but decreasing returns to user engagement set in rapidly. Schaefer et al find that the quality of search results improves with more data on previous searches. McAfee et al find that Google Search outperforms Microsoft Bing in long-tail searches because of a higher number of users.
Note that the two interpretations of economies of scope in data (re-use and aggregation) may lead to very different policy implications. Economies of scope in re-use provide an argument in favour of data dissemination and de-concentration. Economies of scope in aggregation, by contrast, favour data concentration in large pools from a variety of sources. The two are not mutually exclusive. Since data are non-rival they can be stored at the same time in concentrated pools and in distributed settings. Both concentration and de-concentration can result in market failures that undermine social welfare 29.
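The spreadsheet view of a dataset used in this subsection (rows as observations, columns as variables) can be illustrated with a small statistical sketch. It is not part of the paper and rests on textbook assumptions: scale is represented by the standard error of a sample mean, which falls with the square root of the number of rows, and scope by the gain in population R-squared when a second standardized predictor (column) is added to a linear prediction. All function names and correlation values are hypothetical.

```python
import math


def standard_error(sigma, n_rows):
    """Scale: standard error of a sample mean over n_rows observations."""
    return sigma / math.sqrt(n_rows)


def r_squared_two_columns(r_y1, r_y2, r_12):
    """Population R^2 of a linear prediction of y from two standardized columns.

    r_y1, r_y2: correlations of each column with the target y.
    r_12: correlation between the two columns (must satisfy |r_12| < 1).
    """
    if abs(r_12) >= 1.0:
        raise ValueError("perfectly correlated columns carry no extra information")
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def gain_from_second_column(r_y1, r_y2, r_12):
    """Scope: increase in R^2 from merging in a second column."""
    return r_squared_two_columns(r_y1, r_y2, r_12) - r_y1 ** 2


if __name__ == "__main__":
    # Scale: 100 times more rows only cuts the error by a factor of 10.
    print(standard_error(1.0, 100), standard_error(1.0, 10_000))
    # Scope: three hypothetical second columns added to a first one (r_y1 = 0.6).
    print("unrelated column:        ", gain_from_second_column(0.6, 0.0, 0.0))
    print("highly correlated column:", gain_from_second_column(0.6, 0.6, 0.95))
    print("complementary column:    ", gain_from_second_column(0.6, 0.6, 0.0))
```

In this sketch the gain from an extra column is close to zero both when the column is unrelated to the target and when it largely duplicates an existing column, and it is sizeable only when the column is informative and weakly correlated with what is already there. That matches the claim in the text that adding columns does not help when they are highly correlated or not related at all.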

The social value of data
A peculiar characteristic of many 30 data is their social use value. Economies of scope in aggregation add a social dimension to the value of data. Owners of two separate but complementary datasets can reach a higher level of value and insights from their data if they pool the two sets. Another source of social value of data is related to economies of scale. Once a sufficiently large sample of behavioural observations has been compiled to produce robust predictions, that data sample can be used to predict the behaviour of agents outside the sample 31. This implies that collecting more data about other agents with similar characteristics has diminishing marginal value because the existing dataset is sufficiently representative to predict the behaviour of other agents, even if other agents refuse to share their personal data.
These externalities imply an inherent market failure in exclusive private control over complementary data, both for data sources and for data collectors. The party that does (not) provide the data to a collector is not necessarily (may still be) the party that is affected by their use. The de facto exclusive data holder is not necessarily the party that maximizes benefits from the data. Two or more parties can agree to pool their data and generate the full social value of the data. However, coordination costs and risks may undermine this spontaneous pooling. An intermediary agent may be required to realize the social externalities from data pooling and turn them into benefits that (a) pay for the coordination costs, (b) generate benefits that incentivise individuals to participate in the pool, and (c) extract a profit from the intermediation services. With this, we reach the world of data platforms in the next section.
Acemoglu et al (2019)
26 Acquisti et al (2016).
27 Jeffrey Prince and Scott Wallsten (2020) How Much is Privacy Worth Around the World and Across Platforms?, Technology Policy Institute.
28 Dengler and Prüfer, 2018.
29 Economies of scope in aggregation and re-use exist in other domains too, for example in intellectual property rights. For example, the market value of a set of complementary patents may be higher than the sum of their separate values. Hence the practice of patent bundling and thickets, and the bundling of Standard Essential Patents (SEPs) to facilitate re-use of technical standards. Bundling strengthens the monopolistic position of patent holders. Fair, reasonable and non-discriminatory (FRAND) licensing seeks to compensate this by avoiding abusive behaviour.
30 Some typ es of data may have little or no social value because they remain situation, p erson or firm-specific and cannot be used b y othe r agents or situations, or has no complementarity with other datasets. 31 Dirk Be rge mann, Ale ssandro Bonatti, and Tan Gan (2020) The economics of social data, March 2020, Cowles foundation discussio n p ap er nr 2203R. Ace moglu e t al (2019).

Platforms and data-driven network effects
Much of the current debate on data access is still implicitly set in the context of traditional firms and data exchanges between individual data collectors and re-users. However, a substantial volume of data exchanges and trade in data-driven services takes place in a new type of firm that is usually classified under the generic label of "platforms". While monopolistic market failures may occur in linear data exchanges, recent data and competition policy reports 32 pointed out that monopolistic behaviour occurs mainly in very large online platforms that have become gatekeepers to online markets. In this section we explore the crucial role of platforms in the digital economy and the role that data play in these platforms.

Platforms in the digital economy
What are platforms? There are many definitions of platforms (or multi-sided markets in economic jargon) in the economic literature and there is no consensus among economists on these definitions 33 . The first generation of multi-sided market models were extensions of the economics of infrastructure networks, such as telephone and railroad networks. Network effects or network externalities occur when users derive benefits from the presence of other users. When more users connect to a telephone network this creates more opportunities to call other users. It makes the network more attractive to all users. The first generation of platform models in economics 34 focused on markets with at least two types of users, for instance buyers and sellers. Platforms are faced with a "chicken and egg" problem: they need many users on one side of the market in order to attract many users on the other side of the market. They can solve this problem by charging a very low or zero price to one side of the market to attract many users on that side, and charging a high price to the other side to pay for the cost of operating the platform. Users on the side with a high price elasticity of demand pay low or zero entry costs while users with low price elasticity of demand pay a higher price. This explains why advertisers pay for ads while users get free access to search and social media services: advertisers have no choice but to advertise on a particular platform where a user with a specific profile is looking for a good or service that the advertiser can offer. Users, however, can multi-home to varying extents between many platforms to find what they are looking for. These models ran into problems in distinguishing between intermediary platforms and ordinary retailers and in defining the type of interaction between the two sides 35 .
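The pricing logic of the "chicken and egg" problem can be sketched numerically. The example below is an illustrative toy model, not one of the models cited in the text: it assumes linear participation on each side, a price-sensitive consumer side, a price-insensitive advertiser side whose demand scales with the consumer audience, and non-negative prices. All function names and parameter values are hypothetical.

```python
def participation(price, sensitivity):
    """Linear demand: share of potential users who join at a given price."""
    return max(0.0, 1.0 - sensitivity * price)


def platform_profit(p_consumer, p_advertiser, s_consumer=4.0, s_advertiser=0.5):
    """Platform revenue from both sides (operating costs are ignored)."""
    consumers = participation(p_consumer, s_consumer)
    # Advertiser demand scales with the consumer audience (cross-side effect).
    advertisers = consumers * participation(p_advertiser, s_advertiser)
    return p_consumer * consumers + p_advertiser * advertisers


def best_prices(grid_points=41, divisor=20):
    """Grid search over non-negative prices 0.00, 0.05, ..., 2.00."""
    grid = [k / divisor for k in range(grid_points)]
    best, best_profit = (0.0, 0.0), platform_profit(0.0, 0.0)
    for p_c in grid:
        for p_a in grid:
            profit = platform_profit(p_c, p_a)
            if profit > best_profit:
                best, best_profit = (p_c, p_a), profit
    return best, best_profit


if __name__ == "__main__":
    print(best_prices())
```

With these assumed parameters the profit-maximizing platform charges the price-sensitive consumer side nothing and recovers its revenue from the advertiser side, in line with the elasticity argument above.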
Recent economic thinking on platforms has broadened the definition. Platforms can be defined 36 more generically as undertakings that bring together economic agents and actively manage network externalities 37 between them. The key role of platforms is to generate positive network effects or network externalities and in this way maximize the social value that can be extracted from the data collected by the platform. The presence of economies of scale and scope in the data aggregated by a platform ensures that the collective social value of data exceeds the sum of their individual private values 38 . Creating a searchable catalogue of products or a directory of users is a first step in generating that social value. For more efficient probabilistic matching, the platform requires detailed data on buyer characteristics and preferences and on the characteristics of the products and services offered. For example, Netflix can improve its film title search engine when it learns more about user preferences and film characteristics 39 .
A comparison with traditional offline markets illustrates the importance of the online platform's role as data collector and producer of data-driven externalities. In a traditional town market buyers walk around between market stalls, collect information on what is on sale and sales conditions, and make their choices. The town authority as market organiser has hardly any information on sellers' offers, buyer preferences and actual transactions. Each user has to collect this information separately; there is no common information pool. This is privately costly for users and socially costly for society as a whole. Costs increase with market size. In online markets, the platform operator collects an aggregated view of supply and demand and actual transactions. Users can benefit from this aggregated information. It would be impossible for users in large online platforms with millions of product entries to collect all information on their own. Platforms are in a unique position as third-party data aggregators to realize economies of scale and scope in data aggregation across many users. Individual users cannot realize these benefits. This fits well with our definition of economies of scope in data aggregation: the value of the insights from the aggregated dataset is higher than the sum of values of individual user datasets 40 .
The label "platform" refers to a multi-sided market as well as to the firm that manages this market. While markets can grow spontaneously, in many cases they require an organiser to take the initiative and define the operating conditions. Platforms are new types of market-organising firms that emerged in the wake of digital data. The traditional view of the firm goes back to Ronald Coase 41 . Coase wondered why firms exist as an arrangement between workers who divide tasks and exchange intermediate goods and services in the organisational setting of the firm rather than going through the market for each of these exchanges. He argued that contractual arrangements reduce transaction costs compared to going through the market for each exchange between workers. The borderline of the firm, between in-house production and external trade, depends on transaction costs. Digital data and online platforms have dramatically reduced these transaction costs to quasi-zero in many cases. With quasi-zero digital transaction costs, some firms stop in-house production altogether, delegate production to external agents and transform themselves into market places. These firms "invert" and become market organisers rather than production organisers 42 . In contrast to traditional firms that keep the market outside, they organise a platform market where different types of users, for instance buyers and sellers, can trade goods and services. Iansiti and Lakhani 43 show that data-driven platforms are not subject to diminishing returns to scale. Human labour is replaced by data-driven algorithmic procedures with high fixed set-up costs but nearly zero marginal costs. Non-rival data and algorithms make these platforms infinitely scalable. This leads to huge productivity and efficiency gains but also to increased market power and monopolisation.

The role of data in platforms
The above explanations show that data collection and analytics play a key role in the intermediation function of platforms. However, the first generation of economic models of platforms actually had no explicit role for data. These models are suitable for relatively simple networks with unambiguous lexicographic matching between users, such as the telephone networks that inspired these models. The data-free platform model fails to explain what happens in complex platforms where collecting user data is indispensable to generate data-driven network effects and increase matching efficiency in ambiguous and probabilistic matching 44 .
Data play a role in generating network effects. In some cases the role of data is minimal and static. For example, users in a telephone network differ only by their telephone number, a unique lexicographic address. Users can be unambiguously matched by combining two lexicographic addresses. The only dataset required to make the telephone network operate optimally is a telephone directory. Matching between telephone users cannot be improved by observing the behaviour of the users. Similarly, in simple online e-commerce stores, a targeted search for a well-defined product may just require a catalogue of unambiguously defined products. An example is a search for a book title in the Amazon book store. In these cases network effects are driven by the numbers of products and users and their unique identification. The quantity and quality of data on users and products play no role in unambiguous matching processes.
In other platforms the role of data is crucial. For example, matching in search engines and targeted advertising markets requires data on the characteristics of users and products, beyond a lexicographic identifier, in order to select the most likely and optimal matches. Many matches are possible, but which match or ranking of matches is optimal? Probabilistic matching requires more detailed data on relevant user characteristics in order to improve the efficiency of matching. For example, a search engine will not only index the addresses (URLs) of webpages in the world-wide web but also collect the content of the pages, and analyse and classify that content. It will collect data on user clicks on search rankings in order to better understand which pages are most relevant for a specific search term and for a specific user. It will then carry out a probabilistic matching between the two, resulting in a ranking from most likely to less likely matches. More precise data on user preferences will increase the efficiency of probabilistic matching. The quantity, quality and analytics of the data will play an essential and dynamic role in generating data-driven network effects.
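The contrast between lexicographic and probabilistic matching can be made concrete with a minimal sketch. The directory entries, the page contents and the scoring rule (content overlap weighted by the logarithm of past clicks) are all hypothetical; the rule is only a stand-in for whatever relevance model a real search engine uses.

```python
import math

# Lexicographic matching: a unique identifier maps to exactly one counterpart.
# Hypothetical directory entries.
DIRECTORY = {"555-0101": "Alice", "555-0102": "Bob"}


def lexicographic_match(identifier):
    """Exact lookup; observing user behaviour cannot improve the result."""
    return DIRECTORY.get(identifier)


def probabilistic_ranking(query_terms, pages, clicks):
    """Rank pages from most to least likely match for a query.

    pages maps a page name to its list of content terms; clicks maps a
    page name to the number of past user clicks observed for it.
    """
    scores = {}
    for page, content in pages.items():
        overlap = len(set(query_terms) & set(content))
        # Hypothetical rule: observed clicks raise the estimated relevance.
        scores[page] = overlap * math.log(2 + clicks.get(page, 0))
    return sorted(scores, key=scores.get, reverse=True)
```

In the lookup case the directory is the only dataset needed. In the ranking case the output changes as click data accumulate, which is the point at which the quantity and quality of data start to matter.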
Many of today's largest online platforms are probabilistic matching services: Google Search, Facebook social media, online advertising, Amazon, Netflix, Uber, e-scooter platforms, etc. They put data at the core of their business model and specialise in transactions that require substantial datasets to match users efficiently. Platforms help to create new markets that were missing in the pre-digital economy because information-related transaction costs were too high. For example, finding a hotel was costly in the analogue economy and required intermediation from travel agencies that offered a limited choice to consumers. Finding "information" in general was costly. These missing information markets were not a market failure because the technology to overcome them was not available at the time, or remained very imperfect. Digital data technology has dramatically reduced information costs and thereby expanded user choices.
Data-driven network effects 45 are intrinsically linked to economies of scale and scope in data aggregation. They can reinforce the efficiency of probabilistic matching networks and thereby strengthen network effects. For example, McAfee et al 46 show how the larger number of users in Google Search makes it more efficient in rare search terms compared to Microsoft Bing, which has a much smaller number of users. Economies of scale mean more observations on similar search terms while economies of scope in aggregation imply collecting search results from a wider variety of search terms. Algorithms reinforce the value of the data through a feedback loop that builds on better predictions and learning-by-doing that, in turn, strengthens data-driven network effects. That difference in efficiency, in turn, motivates users to shift to Google. The two may reinforce each other. The rise of artificial intelligence and machine learning has further amplified economies of scale and scope in data aggregation. Machine learning is a very data-intensive technology. While human learners can learn a behavioural response from a few observations, machine learning algorithms often require huge numbers of observations to correctly learn an appropriate response.
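The scale advantage in rare search terms can be illustrated with a back-of-the-envelope sketch. It assumes that queries are independent draws and that a given rare term appears in each query with a fixed small probability; the query volumes and the probability used below are hypothetical and are not estimates for any actual search engine.

```python
def expected_observations(n_queries, term_probability):
    """Expected number of times a term appears in n_queries queries."""
    return n_queries * term_probability


def coverage_probability(n_queries, term_probability):
    """Probability that the term has been observed at least once."""
    return 1.0 - (1.0 - term_probability) ** n_queries


if __name__ == "__main__":
    rare_term = 1e-6  # hypothetical: one query in a million
    for label, volume in (("small engine", 100_000), ("large engine", 10_000_000)):
        print(label,
              expected_observations(volume, rare_term),
              coverage_probability(volume, rare_term))
```

Under these assumptions the smaller engine has most likely never seen the rare term, while the larger one almost certainly has, so the quality gap between the two is concentrated in the long tail of queries.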
Individual consumers or firms cannot achieve these data-driven network effects on their own. They require third-party intermediaries to collect, classify and analyse data in order to make efficient use of it. That is the role of platforms. Platforms are in a unique position to aggregate data on transactions and interactions across many users, including firms and consumers. They can realize the economies of scale and scope in data aggregation that drive the social value of data. If individual firms and consumers kept their own transaction data, without sharing them with the platform, the collective social value of data would not be realized. Aggregation gives platforms a privileged comprehensive market overview that none of the individual users has.
Platforms can use this privileged data position in two ways. First, they can use it to produce search and matching services for users: helping consumers to find what they are looking for, helping businesses to find their customers, helping advertisers to better target their ads, etc. That generates positive welfare effects or network externalities that attract more users to the platform. Second, they can monetise these services for their own benefit. They charge platform entry fees to users, based on the insights from the classic platform [...] profits and providing welfare benefits to users 47 . Going too far in one or the other direction undermines the viability of the platform and the social welfare benefits that it can produce. For example, not-for-profit intermediary platforms exist. But their financial capacity to invest in innovative data collection, analytics and service production for users will be curtailed. Conversely, for-profit firms that do not generate social welfare externalities also exist.
[...] (2016) "Big data and competition policy" already speculated that there is a link between economies of scope and network effects or network externalities. Tucker (op. cit.) is not convinced by this conflation of concepts.
46 Op. cit.

Market failures in platforms
Platforms can achieve monopolistic market power in several ways. In the traditional platform economics model, data-free network externalities may confer market power on platforms 48 . "Market tipping" occurs because users prefer to congregate on the largest platform. This strengthens the position of the incumbent platform at the expense of potential new entrants into the market, who have to overcome the hurdle of network effects in order to successfully compete with the incumbent. The dominant platform becomes a gatekeeper to the market. It sets market conditions. The platform's exclusive access to aggregated user data reinforces its monopolistic market position 49 . The downside of the positive network externalities of platforms is exploitation of this monopolistic position to the detriment of users and actual or potential competing platforms. Platforms may steer users to transactions that are more beneficial for the platform and less so for users. They may use the data to compete with their own business users and to foreclose aftermarkets or leverage their market position in adjacent markets. Monopolistic power concentration tendencies are inherent in the platform and data economy. However, an exclusive focus on reducing monopolistic market power may undermine the positive social welfare externalities from data aggregation. Policies should seek to maintain the benefits of data aggregation externalities while addressing the anti-competitive use of platforms' data advantages.
The strength of data-driven network effects plays a key role in tipping 50 and varies by type of platform and the relevant data in these platforms. For example, in ride hailing and e-mobility platforms, network effects are very local. The platform may be organised on a global basis but network effects depend on local supply and users in cities. Expanding the supply in city A has no benefits for users located in city B, unless they happen to travel frequently between the two cities. This makes it easier for smaller local platforms to compete in local markets with global platforms. Hotel booking platforms are global however. Users search for hotels in many cities and platforms have to ensure a wide geographical variety of offers. This makes competition more difficult. Platforms can pursue deliberate strategies to tip the market in their favour, for example by increasing the costs of multi-homing or switching to other platforms. For example, drivers can easily switch between ride hailing platforms at little cost. To discourage drivers from switching, platforms may offer them an uninterrupted sequence of rides, with advance notice of the next ride before the on-going ride is completed. Platforms can try to differentiate their products from competitors' by adding innovative features. If these features can easily be copied by competitors they offer a less sustainable advantage.
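How the strength of network effects determines whether a market tips can be sketched with a simple deterministic simulation. The updating rule below is a hypothetical reduced form chosen for illustration, not a model taken from the cited literature: two platforms compete, and in each period a platform's new market share is proportional to its current share raised to a power that stands for the strength of network effects.

```python
def next_share(share, strength):
    """Market share of platform A in the next period.

    Each platform's attractiveness is its current share raised to
    `strength`; strength > 1 stands for strong network effects and
    strength < 1 for weak (for example, very local) network effects.
    """
    attraction_a = share ** strength
    attraction_b = (1.0 - share) ** strength
    return attraction_a / (attraction_a + attraction_b)


def simulate(initial_share, strength, periods=50):
    """Iterate the share dynamics and return platform A's final share."""
    share = initial_share
    for _ in range(periods):
        share = next_share(share, strength)
    return share


if __name__ == "__main__":
    print("strong effects:", simulate(0.55, strength=2.0))
    print("weak effects:", simulate(0.55, strength=0.5))
```

In this sketch a small initial lead snowballs into near-monopoly when network effects are strong, while the same lead erodes towards an even split when they are weak, which mirrors the contrast drawn above between global and local network effects.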
Hagiu and Wright 51 illustrate how the value that platforms can extract from data is conditional on several factors. Improving the quality of insights and the matching efficiency of data can be subject to economies of scale. In some cases a few observations are sufficient to make an accurate prediction, in other cases diminishing returns to scale remain far away. For example, automated driving algorithms are still far from perfect despite millions of miles of accumulated driving data by leading firms such as Google/Waymo. The value of these insights depends on the market size. This is often true for artificial intelligence based applications in platforms that depend on large numbers of observations. Insights that benefit from externalities and can be extrapolated to a wide number of users have high value. For example, personalised music recommendations in Pandora, based on cumulative learning from individual users, cannot easily be applied to other users. Spotify's shared music recommendations by contrast benefit from strong network externalities because they are useful to many users. The accuracy of Google Maps for traffic predictions is subject to data-driven network effects because it increases with the volume of data collected from Android users.
Several competition policy reports investigate the link between data and platform market power 52 or monopolistic market failure. They suggest some re-thinking of competition policy tools to take into account the specific nature of platforms as multi-sided markets and the complexity of data collection, analytics and use in data-driven platforms. This includes a revision of the relevant market doctrine and theories of harm, new measures of market power and dominance thresholds in multi-sided markets, accelerated competition procedures to stay ahead of market tipping, etc.
Since data-driven network effects are often the cause of competition problems, these reports pay attention to data policy tools as a means to attenuate data-driven monopolistic behaviour, for example by opening access to exclusive datasets, or a variety of data pooling and data sharing modalities. Data access or sharing may prevent an upstream monopolistic data collector from foreclosing downstream services markets. For example, car manufacturers design the car data architecture to retain exclusive access to car data, which they can leverage to increase their share in aftersales services markets. Mandatory data access for other aftersales service providers can prevent this competition problem from occurring 53 . Opening data access may backfire however. It may reduce rather than increase competition when the data are hoovered up by large platforms that can offer users additional advantages, based on economies of scope in re-use and aggregation with other data sources. For example, payment services offered by large platforms such as Apple and Google, or payment services on the WeChat social media app in China and perhaps in future on Facebook, may compete with payment services offered by local banks or smaller payment services start-ups. Google Android and Apple iOS are increasingly present in cars and may offer aftermarket services that compete with manufacturers. Since data are not a homogeneous product (see section 2.6.), data access and sharing can be fine-tuned to a degree of coarseness that preserves some incentives and advantages for the original data collector while still broadening competition in the market for data-driven services. That would require a careful balancing act and constant market and technology monitoring by regulators.
Data sharing obligations may have potential disincentive effects on data collection efforts. Data sharing with potential competitors will erode a platform's data aggregation monopoly, lower the value of the data and undermine the ability to monetise the data. In a multi-sided market, modifying access conditions on one side of the market will have implications for other sides. For example, forcing a search or social media platform to share consumer data with competitors may not only affect consumer privacy. It lowers entry costs into advertising and will force platforms to increase entry costs on the consumer side, or integrate new money-raising sides into the platform, in order to make up for the lost revenue.
Platforms are both a blessing and a curse in the data economy. They are necessary intermediaries to generate benefits from data aggregation, realize data-driven positive network externalities and thereby enable the emergence of new markets that were not feasible prior to the arrival of digital data. At the same time, exclusive control over the data allows gatekeepers to control the ecosystem and generate significant value for their intermediation services. They can impose excessive entry and access conditions, and exclusive dealing rules preventing sellers from promoting their offers outside the gatekeeper's platform. Refusal to share the data with business users in the platform, or with competing platforms, gives them a competitive advantage that gatekeepers can use to foreclose the market and strengthen their monopolistic position, to the detriment of user welfare.
The European Commission's "Better Regulation Guidelines" distinguish between several types of market failures that may require regulatory intervention to ensure optimal production of social welfare for society as a whole. Besides monopolistic market failures, other sources of failure include externalities, information asymmetries and missing markets because of high transaction costs and risks. Besides market failures, regulators may also intervene in case of social concerns such as discrimination and unequal distribution of welfare. In the next sections we discuss three types of data-driven non-monopolistic market failures: negative externalities from data aggregation, asymmetric information problems that distort decision making by data users, and newly missing markets that emerge in the wake of the data economy because of high data transaction costs and new sources of data-related risks.

Negative information externalities
So far we have discussed the role that platforms play in generating data-driven positive network externalities. In this section we turn to negative data-driven externalities. We present examples of negative externalities on the consumer side in personal data markets, and on the producer side in commercial data markets. Positive externalities increase social welfare, provided they can be captured and turned into economic value for users. Negative externalities are to be avoided because they reduce the welfare of users, or they should be internalised by the party that causes the externality.
A first example of a negative data externality is the impact of consumer platforms on the value of personal data. Data collected on the behaviour of one set of users have predictive value for the behaviour of other users 54 . Acemoglu et al 55 show that the marginal value of an individual's personal data is diminished by negative externalities from economies of scale in data aggregation in platforms and economies of scope in re-use of personal data. Once a firm has accumulated a critical mass of consumer data, the marginal return in terms of improved insights and additional value in the secondary re-use market (for example for advertising purposes) from adding another consumer's personal data is close to zero, compared to what can be learned from extrapolation from data already collected about persons with a similar profile. This reduces the marginal value of a single person's dataset. It also reduces incentives for consumers to protect their privacy since their profile can be assembled from data collected from other persons. That, in turn, creates an excessive supply of personal data. Consumers may invest in privacy protection. That in itself may have signal value that can be exploited against consumer interests 56 . Following the entry into force of the EU GDPR, an empirical study on the use of personal data for advertising in the travel industry 57 finds that 12 percent of consumers refuse consent to collect their personal data. However, the study also finds that the reduction in the supply of available data increases the value of the remaining advertising data and, because of externalities, does not negatively affect the predictability of consumer responses to advertising. Another study shows that consumers underestimate the negative impact of sharing their personal data with a platform 58 . Data sharing improves matching efficiency and makes it easier for consumers to find what they are looking for.
At the same time, the increased matching efficiency enables the platform to charge sellers higher entry costs. That, in turn, pushes up consumer prices for products sold on the platform. Consumers are not aware of this second-round effect of data sharing.
Is this negative externality a market failure that requires regulatory intervention to be corrected? Individuals have no better alternative option to realize a higher value for their personal data. Brynjolfsson et al 59 present empirical evidence that "free" services platforms compensate the negative externality and, in fact, generate a large consumer surplus. Consumers thus trade personal data at nearly zero value for valuable online services at a zero "free" price. That suggests that the positive network externalities produced by platforms outstrip the negative externality on personal data. Consumers get more value out of it than they put into it. Charging positive prices to advertisers and negative prices to consumers for these externalities creates a price distortion, at least from a traditional economics perspective. Zero prices are often seen as a market distortion 60 . Trying to correct any of these market distortions may reduce overall social welfare because it would reduce the number of consumers and the volume of data, and make the platform less interesting for advertisers and for other consumers. It creates a lose-lose-lose situation. Public opinion often goes in the other direction, as the quip "if you are not paying you are the product" suggests. Other authors go a step further and suggest that consumers should be paid for the "data work" that they contribute to platforms 61 .
Negative externalities may also occur on the firm or supply side of platforms. Suppliers sell their goods and services through online platforms like Amazon and eBay. Platform operators collect and aggregate data on product characteristics, sales and consumer choices across many users. Once sufficient data are collected they can predict market responses to changes in product characteristics and prices. Platforms can use these data to spot opportunities to enter the market with their own products and services and compete directly with independent suppliers on the platform 62 . They can also use the data to bias search rankings inside the platform in favour of their own products. This is a form of data-driven foreclosure or self-preferencing that distorts competition. Data collected from suppliers and transactions are leveraged in favour of the platform.

Negative effects from asymmetric information
Asymmetric information between individual users and data-collecting platforms is an almost natural state in a data-abundant digital world. Platforms as data aggregators will always have more and better information on the markets that they cover, compared to individual platform users (persons and firms). It is not only a question of the amount of information however. Platforms will manipulate the level of fine-graining of information that they collect from data originators. The willingness to share information with the platform depends on the level of detail and the use of the data 63 . Conversely, platforms may degrade the level of detail and introduce segmentation on the data user side of the platform in order to maximize profits from their exclusive data. As private profit-maximizing firms, platforms will use this information asymmetry to their advantage, trying to extract maximum revenue from this data intermediation role. Users may take suboptimal decisions because of imperfect information signals received from platforms.
Platforms may also use the information to promote their own services and products, competing with service producers on the platform. For example, in July 2019 the European Commission opened an investigation into Amazon 64 . Amazon combines the roles of online retailer on its own account and market place for independent sellers. The platform may have used non-public data that it collects and generates about the activities of independent sellers to compete with these sellers on its own account.
Researchers have observed that Amazon allegedly degraded the quality of information signals in consumer search results to favour Amazon sales and reduce the prominence of sales by independent sellers ("self-preferencing") 65 . The market-distorting effects of asymmetric information in favour of the platform operator are well-documented in empirical studies on all kinds of search engines 66 . Platforms apply business models that may be based on sales margins (for retailers), commissions on sales (for market places) or advertising revenue (pure information matchmakers). The incentives embedded in the business models affect search rankings and drive a wedge between user preferences and the financial interests of the platform. For example, hotel booking platforms can manipulate search rankings towards price offers that increase their fee revenue.
There is considerable debate on what an unbiased search engine in an inherently information-asymmetric world would look like 67 . The "conduit" theory sees search engines as passive intermediaries that make an "objective" selection of relevant search results in response to a user's search query. The ideal consumer-focused search engine would be a "trusted advisor" that presents results that match the user's preferences. That search engine would frustrate the preferences of service suppliers on the platform, and its own profit-maximizing objective. The appeal for "search neutrality" can be situated in that context. The "editor" theory sees search as a subjectively curated ranking of results in response to a query, with the search engine as an active editor. The editor view implies that there is no such thing as search neutrality because any ranking represents the search engine operator's profit-maximizing view. In reality, search results are necessarily a combination of objective conduit and subjective editing. De los Santos et al 68 demonstrate how search operators are squeezed between the wishes of different types of platform users and carve out a profit margin while keeping all parties reasonably but not entirely satisfied. The stronger their market position, the more they may distort the information picture. Locked-in users have no choice to go elsewhere for their services. Competitive pressure may sometimes limit platforms' margin for manoeuvre 69 . These models show how ranking bias is inherent to the platform's use of asymmetric information. Platforms need to drive a wedge between the preferences of users on different sides of the market in order to extract a profit margin to ensure the sustainability of their business model. More recent information theory models expand this insight from rankings to the quality of information collected and shared by platforms 70 .
Note that not-for-profit platforms would not perform better in this respect. They could limit their financial needs to cost recovery and charge a fixed fee to users, possibly as a function of their intensity of use. The side of the market that pays the fee would receive the information that best matches its preferences. Other sides may still suffer from bias in the collection and use of information. A platform cannot use its data to simultaneously maximize the welfare of all users on all sides of the market, unless their preferences are perfectly aligned. Information asymmetry is a fact of life in digital platform economies.
Independent service providers have alleged that the "Extended Vehicle" data governance model preferred by many vehicle manufacturers may lead to self-preferencing in vehicle data markets 71. Under this model, vehicle manufacturers have the only direct access to all data collected by connected vehicles; aftermarket service providers can only get access to these data via a back-end server under the control of the manufacturer. Independent service providers claim that this may distort competition. Competition policy tools can be used to address individual cases where a lack of access to data forecloses such providers. The rise of digital car technology has shifted the focus to the EU Motor Vehicle Type Approval Regulation (2018) as a regulatory tool to define an appropriate level of information fine-graining to restore information symmetry between authorised and independent service providers. Industry self-regulation has failed because of weak incentives for industry players to come to an agreement.
This example brings us to data sharing, which is often touted as a means to overcome information asymmetry and maximize social welfare benefits for society 72 because it generates economies of scope in re-use. Data sharing markets may fail, however, when the data originator or collector perceives a risk of negative repercussions on his private welfare 73. Data-driven platforms may offer compensation for this perceived risk, for instance by offering consumers a free service in return for sharing their data, or offering firms enhanced market access in return for sharing their data. Alternatively, platforms can modulate the degree of fine-graining and segmentation of the data they collect and share. Mandatory data sharing upsets these platform strategies, both on the data collection and the data use side of the platform. It may result in less data collection and undermine the positive externalities from data aggregation. Data policy makers need to carefully balance these positive and negative aspects of data-driven platforms.

Missing markets because of high transaction costs and risks
High transaction costs in the analogue economy prevented the emergence of many types of markets. Digital data massively reduce information costs and thereby facilitate market entry for consumers and small suppliers, from small hotels and bed & breakfasts that can now compete with large hotel chains on accommodation booking platforms, to independent taxi drivers who can offer their services on Uber and Lyft, and people entering the online labour market, or staying in touch with a large number of family, friends and professional contacts on social media. All this is made possible by intermediary online platforms. Markets that were "missing" in the pre-digital era suddenly emerge as a result of declining market entry and transaction costs. However, even in the digital data economy some markets still remain blocked because of high transaction costs. Moreover, new services are required in order to keep digital markets running but they may not appear because transaction costs and risks are still too high. In this section we present a few examples of missing markets because of high transaction costs and risks, and explore how these market failures may be addressed by a mixture of regulatory intervention and private third-party intermediation.

Transaction costs in personal data services markets
Personal data are an example of market failure because of high transaction costs and missing markets for services that could reduce these costs. Under the EU GDPR, data subjects have the right to consent to the use of their personal data before a firm can collect their data. This gives rise to frequent popping up of consent notices when consumers browse the internet. Consumers rarely read these consent notices, however, because the time involved is often not worth the effort. Even if they do, they find it hard to make sense of the notices and understand what will happen to their personal data if they give their consent. Personal data consent notices are difficult to read and uninformative about possible data re-use 74. A consumer survey confirms consumers' ambiguous attitudes towards privacy notices 75. Another recent consumer survey 76 illustrates how risk assessments about sharing personal data on the internet vary widely according to type of data. Financial and biometric information command high subjective opportunity costs. Data use for advertising purposes is not perceived as entailing a significant privacy cost. Location and social network data are somewhere in the middle. Valuations vary across countries and by gender and age.
More than two decades of research into the economics of privacy 77 have not produced an objective measure of the opportunity costs of personal data sharing. The re-use of personal data has ambiguous welfare effects. It can increase personal welfare when the data are re-used, for example, by search engines to reduce search costs and provide better search results that are more in line with consumer preferences. It may reduce welfare when data are re-used for targeted advertising that is more persuasive than informative and drives consumers away from their original preferences. Informative advertising helps consumers to make better choices. Persuasive advertising diverts consumer attention away from their preferences. This is a long-standing and unresolved debate in the marketing literature. If academics cannot solve the debate, consumers have an even harder time assessing the privacy costs of sharing personal data with an online service provider.
High transaction costs make the current system of consent notices dysfunctional, especially in the presence of depressed market values for private data. What seems to be missing is an active market for privacy management services. Many private start-ups have tried to enter the market for Personal Information Management Services (PIMS) 78. They offer an intermediary platform to handle personal data exchanges with commercial platforms. However, none of these has scaled up to become a significant player in personal data markets. The reason is clear: they do not really reduce the high individual transaction costs 79. Management costs are still relatively high, at least in time spent on the platform, compared to the depressed value of individual personal data. That makes their services unattractive to consumers, who still overwhelmingly prefer dis-intermediated direct data exchange with platforms, clicking almost blindly on the consent notice.
More economically feasible personal data management requires technology that substantially lowers transaction costs. This could happen, for example, when consent notices become standardised and machine-readable so that they can be processed by AI-driven machines. Standardisation would include the identity of the data collector, the purpose for which the data are collected, the level of fine-graining in use of the data, and third-party commercial partners that may access the data. A privacy service provider could automatically link to these third-party privacy protocols and estimate possible risks for the data subject as a function of his or her use of the internet. The service provider could then grant or deny consent as a function of pre-set consumer preferences. Machine learning could gradually improve the efficiency by learning from consumer behaviour across individuals and websites and by collecting a detailed map of data sharing practices between firms and websites to suggest alternative service providers with lower privacy costs. Automation of the consent process would complete it in milliseconds, saving data subjects a substantial amount of time. The bottleneck lies in the standardisation process, however. Platforms can produce their own standardised consent notice but without interoperability the system would run into high obstacles. Collective action seems to be required and that requires regulatory intervention 80.
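A minimal sketch of such an automated consent decision is given below. It assumes a standardised notice carrying the four elements listed above; the class names, field names, granularity levels and example domains are all hypothetical and do not correspond to any existing standard or PIMS product.

```python
from dataclasses import dataclass

# Hypothetical ordering from least to most fine-grained use of the data.
GRANULARITY_ORDER = ["aggregate", "segment", "individual"]


@dataclass(frozen=True)
class ConsentNotice:
    collector: str                          # identity of the data collector
    purposes: frozenset                     # purposes of the collection
    granularity: str                        # level of fine-graining in use
    third_parties: frozenset = frozenset()  # partners that may access the data


@dataclass(frozen=True)
class Preferences:
    allowed_purposes: frozenset
    max_granularity: str
    blocked_parties: frozenset = frozenset()


def decide(notice, prefs):
    """Return True to grant consent, False to deny, from pre-set preferences."""
    if not notice.purposes <= prefs.allowed_purposes:
        return False  # at least one purpose the user has not approved
    if (GRANULARITY_ORDER.index(notice.granularity)
            > GRANULARITY_ORDER.index(prefs.max_granularity)):
        return False  # data would be used at a finer grain than accepted
    if notice.third_parties & prefs.blocked_parties:
        return False  # a blocked partner would get access
    return True


prefs = Preferences(
    allowed_purposes=frozenset({"service_delivery", "analytics"}),
    max_granularity="segment",
    blocked_parties=frozenset({"broker.example"}),
)

granted = decide(
    ConsentNotice("news.example", frozenset({"analytics"}), "aggregate",
                  frozenset({"cdn.example"})),
    prefs,
)
denied = decide(
    ConsentNotice("shop.example", frozenset({"analytics", "advertising"}),
                  "segment"),
    prefs,
)
```

The rule-based check stands in for the risk estimation and machine learning steps described in the text; it only shows why a common notice format is the precondition for any automation.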

Transaction costs and lack of transparency in advertising markets
A similar lack of transparency management services occurs in online advertising markets 81. Online advertising can be split between "walled gardens" in search (Google) and social media (Facebook), and open display advertising where Google holds a strong position. Advertising is a two-sided market between publishers and advertisers, with several layers of intermediary platforms that do intermediate matching and price auctions for the supply of ad publishing windows and the stock of ads produced by advertisers. For every euro spent on ads by the advertiser, only 62 cents reaches the publisher; the rest remains in intermediate steps, largely dominated by Google 82. It is challenging for advertisers to verify publishing and views of ads because of lack of transparency in intermediate stages. Price auctions in these markets are problematic 83 because Google itself participates in the bidding while it has privileged information on the offers of its competitors. Self-preferencing is an issue. Data transparency and sharing through open standards and automated ad market tracking tools could be the solution. It could improve transparency and oversight for advertisers, publishers and content providers, increase competition and enable all participants to get a better view on what they pay for and what they achieve. It would create an ad data services market. Increased data sharing and transparency in the advertising ecosystem may run into all kinds of adverse effects though, not least because it tests the limits of consumer privacy and commercial confidentiality.

Risks
There are circumstances where potential data suppliers refrain from participating in services markets because participation may be costly for them. An example is pooling mobility data between transport service providers in a city. This can have positive social welfare effects by improving traffic management and reducing congestion and pollution. Carballa 84 demonstrates how commercial transport service providers (buses, metros, taxis, e-scooter platforms, etc.) may gain or lose market share if they agree to share their data on a common platform, depending on price and substitution effects. Competitors may use the data to improve their offers and increase their market share. Alternatively, being on the common platform may attract more users to a particular provider. The net impact is an empirical question. If the net impact were negative, transport providers would have no incentive to participate in the platform. The platform may be in a position to compensate losers by re-allocating part of the overall social welfare surplus to them. For example, if drivers are willing to pay a positive price for improved congestion management, some of that revenue could be re-allocated to transport service providers that lose from participation. Withdrawing from the platform will reduce social welfare for all, however. Regulators may have to intervene to make data sharing mandatory to overcome coordination problems, in the public social welfare interest 85.
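The compensation logic can be made concrete with a small sketch. The provider names, the profit changes and the platform revenue figures are invented for illustration and are not drawn from Carballa's study; the function simply checks whether platform revenue suffices to make every losing provider whole.

```python
def compensate_losers(net_gains, platform_revenue):
    """Compute side payments that leave no provider worse off.

    net_gains: provider -> change in profit from joining the data pool
               (hypothetical figures, negative means a loss).
    platform_revenue: what the platform collects, e.g. from drivers
               paying for better congestion management.
    Returns provider -> payment for each loser, or None when revenue
    cannot cover the losses and participation is not incentive-compatible.
    """
    payments = {p: -g for p, g in net_gains.items() if g < 0}
    if sum(payments.values()) > platform_revenue:
        return None
    return payments


net_gains = {"bus": 40, "metro": 10, "taxi": -30, "e-scooter": -5}

feasible = compensate_losers(net_gains, platform_revenue=50)
infeasible = compensate_losers(net_gains, platform_revenue=20)
```

With a revenue of 50 the taxi and e-scooter providers can be fully compensated (payments of 30 and 5); with a revenue of 20 they cannot, and voluntary participation would break down unless data sharing is made mandatory.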
80 Posner and Weyl (2019) propose a particular variant on this theme. They suggest that data subjects should unite in unions to negotiate a higher value for their data with data collecting platforms. Automated data consent notices would reduce coordination and market entry costs for such unions. These unions would still face the problem of allocating the social value of the data between private members. See also the conclusions section in Bergemann, Bonatti and Gan (2020).
81 The UK Competition and Markets Authority (CMA) conducted a detailed study on online advertising markets. This paragraph was inspired by that report. See https://assets.publishing.service.gov.uk/media/5efc57ed3a6f4023d242ed56/Final_report_1_July_2020_.pdf
82 Damien Geradin and Dimitrios Katsifis, Google's (Forgotten) Monopoly: Ad Technology Services on the Open Web, TILEC Discussion Paper, 21 May 2019. Spark Ninety, Transparency in programmatic online display advertising markets; the European Commission's Platform Observatory, Jan 2020.
83 Geradin and Katsifis (2019), op. cit. Several EU competition authorities have launched investigations in online advertising, including the UK, FR and DE. A UK CMA study is exploring potential remedies for ads markets that could be part of an ex-ante regulatory regime.
84 Bruno Carballa-Smiechowski (2018) Determinants of coopetition through data sharing in mobility-as-a-service.
85 The European Commission's initiative to promote business-to-government data sharing "in the public interest" should be seen in this context. See European Commission (2020) Towards a European strategy on business-to-government data sharing for the public interest, Report of the B2G expert group.
Another dimension of transaction costs is ex-post risk in the execution of contracts. According to incomplete contract theory, contracts are necessarily of finite length and can never include provisions for all possible events. Contracts inevitably come with residual uncertainties that can give rise to ex-post costs during monitoring and execution of a contract. This is especially the case for trade in non-rival and hard-to-exclude data. Dosis and Sand-Zantman 86 distinguish between contractible and non-contractible data rights. Some contractual provisions may be unenforceable, non-monitorable or lack a commitment device. As a result, contracts are subject to the hold-up problem: parties will try to re-negotiate the contract when an unforeseen or non-committable event occurs. This includes risks from data leaks, unexpected data quality problems or processing errors. In traditional contracts, unexpected costs and benefits are assigned to the owner of the traded good or service. In the absence of legal ownership rights 87, data are ruled by de facto exclusive control. Data holders can use technical protection measures to ensure their exclusive use of the data. They may grant use rights to other parties through bilateral contracts. These contracts only bind the contracting parties, not third parties. In case of data leaks, the data holder has no leverage over third parties that might get hold of the data, except in cases where data benefit from intellectual property protection under the copyright, sui generis database or trade secret regimes, which offer protection against, respectively, reproduction, re-utilisation and unlawful acquisition or use of the protected subject matter. These risks may reduce data collectors' incentives to make data available for re-use and make them more restrictive in granting data access. The risks of contractual hold-up may be too big for holders of valuable or commercially sensitive datasets. The Facebook-Cambridge Analytica case has amply demonstrated the risks of bilateral contracting for non-rival data.
Some authors have suggested assigning data ownership rights to overcome this problem 88. From an economic perspective, ownership rights are residual rights: the costs and benefits of events that are not foreseen in a contract or law are automatically allocated to the owner of the residual rights. Debates on the possible introduction of such rights 89 have faltered and attention has now shifted to introducing data access rights 90. Ownership and access rights are complements. Who should get such rights, if any, is not an easy question. For personal data, there is a "natural" rights holder, the data subject. For non-personal machine-generated data that may involve several parties for the co-generation of the data, it is often hard to unambiguously identify a "natural" rights holder. For example, in agriculture, land owners, land operators, machine manufacturers, machine operators, sensor owners, data analytics providers, etc. may all claim rights over the data 91.
A more pragmatic solution may be for data exchanging parties that perceive high post-contractual risks in the execution of the exchange to appoint a neutral third-party intermediary who is tasked with managing the exchange in accordance with the terms of a contract. For example, a mobility service provider in a city may require pooled data from all mobile phone operators in that city to create detailed insights on citizen mobility patterns. None of the data suppliers trusts the others to handle the data pool, which has strategic commercial value for competitors. Solving this coordination problem requires a trusted third-party intermediary who collects the data, performs the analysis and ensures that only the processed results are shared with agreed users. This is the domain of semi-commons or governance agreements that seek to overcome the pitfalls of commons (which lead to overutilization and underinvestment and facilitate free-riding) and of anti-commons (exclusive private use that leads to underutilization and keeps data locked in silos) 92. Semi-commons are often costly to manage. They are economically feasible when the value of the agreement for the participants exceeds the costs.
Data trusts and industrial data platforms fit the neutral profile 93. In order to guarantee enforcement, the intermediary should be neutral and have no stake in the data or the outcomes of the analysis. That avoids strategic behaviour at the expense of the participants. The intermediary should only receive a fixed remuneration to produce the desired outcome. This permits it to act credibly as a trusted service provider for contractual commitments. It can enforce the commitment because it has full control over the data and access to the server. That reduces post-contractual risks and monitoring costs for the participants. Commercial for-profit data platforms may also provide guarantees against data leaks but they will exploit the data in their own interest, and sometimes against the interests of the data providers. They create new sources of ex-post risks.

Data-driven ecosystems
So far we discussed the use of data within a single platform. In this section we consider the use of data across platforms and markets. Information from one market or platform can be re-used in another market.
In classic economics, firms that re-use resources obtained in one market to enter into another market are known as conglomerates 94. Conglomeration can be fuelled by excess resource capacity and market power considerations. Re-use of financial, human and physical resources within a firm to enter a different market may be interpreted variously as leading to greater efficiency or as a sign of inefficient markets for shareable inputs 95. When capital markets are not functioning well, firms with excessive financial resources may decide to invest directly in other sectors rather than investing their surplus through capital markets. Firms may also cross-subsidize between different units to expand market power. Google advertising revenue subsidizes many of its other activities. Transmission mechanisms may include data. Contrary to physical goods, data are intangible resources that are non-rival by nature and can be re-used for different purposes at the same time, without diminishing their value for any of these purposes. Moreover, a unique feature of data is that economies of scope in data aggregation can generate additional value from combining data across different markets and platforms.
Apart from creating a single legal and financial conglomerate structure, firms can also decide to collaborate through alliances that exploit complementarities between products and services. Such alliances fall outside collusion prohibitions as long as markets are seemingly unrelated. An ecosystem can be defined as a set of firms that coordinate complementarities in different markets in a deliberate, non-generic and strategic way with a view to creating more value 96. Examples of digital ecosystems include operating systems, including app operating systems. While an ecosystem is not hierarchical or subject to vertical integration, there is an element of market power that makes it non-voluntary. It creates mutual dependencies. The distinctive feature of ecosystems is that they provide a structure within which complementarities (of all types) in production and/or consumption can be contained and coordinated without the need for vertical integration. Powerful firms craft rules and shape the process of ecosystem development to tie in complements and make complementors abide by them.
Data can be leveraged as a transmission channel to coordinate complementarities. For example, Google combines data from search, e-mail, apps, location, maps, etc. to drive its advertising business. A key feature of this ecosystem is standardisation of data interfaces between these products so that data can easily be transferred and interpreted across the different components. For example, smartphone hardware manufacturers who wish to use Android apps are required to join the Open Handset Alliance, which obligates members to use only Google-approved Android versions. In this way, even though Android is open source, Google's control prevents fragmentation of the code base by means of some level of standardisation. The downside is that potential operating system innovations are not interoperable with Google data services and Google may be able to charge higher prices for those services. The offsetting benefit is that app developers and hardware manufacturers have to contend with fewer variants of the Android operating system than they otherwise would and are thus able to ensure interoperability.
Very large platform firms, such as the GAFAMs (Google, Apple, Facebook, Amazon and Microsoft), have become sprawling businesses that keep expanding into new sectors and domains, re-using data and algorithmic tools and applying "envelopment" 97 and bundling strategies to enter into new markets. Through envelopment, a provider in one platform market can enter another platform market, combining its own functionality with the target's in a multi-platform bundle that leverages shared user relationships. The envelopment hypothesis builds on the traditional view of bundling and extends this to include the strategic management of a firm's user network. Envelopers capture share by foreclosing an incumbent's access to users. In doing so, they harness the network effects that previously had protected the incumbent. Eisenmann et al (2011) distinguish three envelopment strategies: (a) supply-side economies of scope: components and services that can be re-used in other platforms, (b) demand-side economies of scope: user overlaps that can be leveraged and (c) negative price correlations between different services. Data can be an important vector for the first two strategies.

Concluding remarks
The data economics issues that we discussed here have much in common with the law and economics of intellectual property rights (IPR), such as patents and copyrights. The economic characteristics of data, non-rivalry and no natural excludability, are similar to those of innovation. IPR give exclusive ownership rights to innovators in order to yield a return on investment and an incentive for innovation. IPR policies struggle with the same balancing act as data policies, between the social welfare costs of monopolistic exclusive rights and the social welfare gains from the innovation incentive effects. Monopolistic IPR licence pricing, above the marginal cost of reproduction, reduces access to innovation. This is an unavoidable social harm in order to generate dynamic innovation benefits. Society manipulates that balance by limiting the scope of exclusive rights. Similar considerations apply to data collection, access and use, including in online platforms. It took several centuries for society to develop a coherent system of IPR and it is still evolving, driven by technology that affects the cost of innovation production and dissemination and therefore the balance between protection and access. Digital data are a very new product in society. There are lively discussions between proponents of exclusive ownership rights and defenders of more open access rights 100. A major difficulty with data is the attribution of such rights. Innovations are usually produced by a well-defined innovator or group of innovators with common interests. Data by contrast require at least two parties, a data originator and a collector, and often more, sometimes with diverging interests. While personal data rights may be "naturally" attributed to a data subject, attribution is more difficult for non-personal data where many parties may be involved in origination, collection, aggregation and analysis of the data. Changes in attribution of rights may affect entire data value chains and downstream services markets. They will affect the pace of innovation that data can bring to society.
More importantly, both ownership and access rights overlook the inherent social value of data and the externalities that they entail 101. A single data originator or collector is usually not in a position to internalize these externalities. Market failures will remain. The debates often give the impression that the attribution of exclusive rights and access rights or data sharing rights are policy objectives in themselves. This paper has emphasized that such rights are only policy instruments that should be used to maximize the social welfare that society as a whole can derive from the use of data.
In this paper we focused mainly on social welfare as a benchmark for identifying market failures and policy intervention. Public policy economics defines the social welfare measure 102 as the combined welfare of all stakeholder groups in society, consumers and producers. However, the mainstream benchmark in competition law is a narrower consumer welfare benchmark 103. These two measures can easily lead to contradictory conclusions. For example, regulatory intervention to open market access on one side of a platform may reduce welfare on other sides of the platform market. Classic economics rejects the comparison of welfare gains and losses between groups or individuals because consumer welfare is assumed not to be quantifiable. Alternative approaches accept quantification but open the door to measures of social welfare improvement whereby some parties gain at the expense of others. Economics distinguishes between strictly Pareto-improving welfare measures whereby no agent loses welfare, and a less stringent Kaldor-Hicks 104 welfare measure whereby some agents may lose but could be compensated by the gains that other agents make in order to avoid equity concerns. Western societies have historically put emphasis on individual wellbeing and are reluctant to impose private costs on individuals in order to achieve wider social welfare gains, unless they are compensated by transfers to ensure some degree of equity. Other societies have a more collective view of social welfare and attach less importance to individual welfare. They would find it easier to accept private costs as long as overall welfare increases. This underscores the borderline between the economics of data and cultural, social and political value judgements in society on how to maximize societal welfare from data.
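The two welfare criteria can be stated compactly. The notation is introduced here for illustration only and is not used elsewhere in the paper: N is the set of agents affected by a policy and \Delta w_i is the change in agent i's welfare, assumed to be expressed in comparable monetary terms.

```latex
% Pareto improvement: nobody loses and at least one agent gains.
\Delta w_i \ge 0 \quad \forall i \in N,
\qquad \text{and} \qquad
\exists\, j \in N : \Delta w_j > 0 .

% Kaldor-Hicks improvement: aggregate gains exceed aggregate losses,
% so winners could in principle compensate losers.
\sum_{i \in N} \Delta w_i > 0 .
```

Under this assumption every Pareto improvement also satisfies the Kaldor-Hicks criterion, but not the reverse: the Kaldor-Hicks criterion only requires that compensation be possible, not that it is actually paid.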