Access Granted: Why the European Commission Should Issue Guidance on Access to Publicly Available Data Now

Executive Summary 

The Digital Services Act (DSA) presents a significant opportunity to enhance the transparency and accountability of online platforms, by ensuring that external researchers can systematically access publicly available data. The inconsistent and often arbitrary implementation of Article 40(12), the legal basis for such access, by very large online platforms (VLOPs) and very large online search engines (VLOSEs) has, however, created substantial barriers for researchers, limiting their ability to conduct meaningful investigations. This issue was particularly evident during the 2024 European Parliament elections, where civil society organisations were unable to systematically monitor public discourse across all relevant platforms. 

While the European Commission (EC) is actively enforcing existing regulations by opening proceedings against Facebook, Instagram, and X, research organisations — interested third parties with rights under Article 40 of the DSA — are not formally included in these processes. This approach also risks creating a patchwork of agreements for each platform, leading to inconsistencies and requiring advocacy organisations to push for changes on a case-by-case basis, which is both time-consuming and inefficient. 

To address these shortcomings as soon as possible, the EC should accompany individual enforcement actions with a guidance document to clarify its expectations of how Article 40(12) should be implemented. Such guidance would offer important guardrails for both researchers planning their work and platforms aiming to avoid legal action, making enforcement more predictable and efficient. Early in the DSA’s implementation, this approach garnered support from platform researchers, including DRI, and even from the VLOP TikTok, highlighting its broad potential impact. 

By issuing this guidance, the EC can address the current gap in centralised rules for data access. This action is both necessary and timely, as the implementation of the DSA has, in many cases, actually worsened data access, by prompting platforms to adopt more restrictive rules. 

Unstandardised Implementation of Art. 40(12) Risks Arbitrary Decision-Making by VLOPs and VLOSEs 

Article 40(12) of the Digital Services Act (DSA) mandates that VLOPs and VLOSEs provide researchers, including those affiliated with non-profit organisations, with publicly available data, in real time and without undue delay. While this is a significant step towards greater platform transparency, nearly a year after it came into force, the implementation of this provision has left researchers disappointed. Many platforms have used Art. 40(12) to overhaul their data access provisions, making access more difficult and cumbersome (for example, Meta phased out the easy-to-use CrowdTangle tool, introducing less practical new arrangements). 

Platforms are taking different approaches to fulfilling their obligations to provide researchers access to publicly available data. While some (e.g., Google Researcher Program) allow researchers to scrape data from the platform’s interface, most VLOPs provide access through application programming interfaces (APIs). Regardless of the access mechanism, almost all platforms require researchers to fill in application forms to assess whether they meet eligibility criteria, i.e., that they are independent from commercial interests; that they disclose the funding of their research; that they are able to comply with data security and confidentiality requirements; and that their research contributes to the detection, identification, and understanding of systemic risks in the Union, as outlined in Article 34(1) of the DSA. 

The use of such forms is legitimate and necessary, but experience with the process, documented through initiatives such as the DSA 40 Data Access Tracker or the DSA Data Access Audit, has been negative in many cases. Some of the significant challenges encountered include: 

Challenges related to the application form requirements: 

  • Complicated and lengthy processes: Applications often lack clear timelines for decision making, and provide limited information on how platforms handle them. In a report analysing early DSA compliance, the Weizenbaum Institute found that the average response time to data access requests was 1.5 months, which does not reflect the “without undue delay” requirement outlined in Article 40(12). DRI has experienced such delays. We applied through the X DSA Researcher Application Form on 17 April 2024, and, as of August 2024, four months later, we had received neither access nor a conclusive response. 
  • Stricter interpretations of the researchers’ eligibility criteria: Platforms are imposing requirements not mandated by the DSA. For example, TikTok’s Research API asks researchers to have “demonstrable academic experience and expertise in the research area specified in the application”, while Meta’s Content Library API requires lead researchers to demonstrate their technical skills with Python, R, SQL, or another coding language. 
  • Stricter interpretation of eligible research questions. Some platforms have rejected applications because, in their view, the proposed research questions do not contribute to detecting, identifying, and understanding systemic risks in the EU. This requirement gives platforms considerable control, as interpreting whether research questions meet the DSA criteria is discretionary, and interpretations can vary widely. Differing requirements across platforms reflect this discretion: some platforms require research proposals of 250 characters, while others require up to 5,000 characters, along with detailed literature reviews. 
  • Applications based on specific research questions unnecessarily complicate the process for non-profits. Many non-profits operate with multi-year grants that support ongoing social media monitoring across various events, such as elections or crises. This constant need to file new applications every time they wish to conduct research on digital platforms, despite the continuity of their grants, creates an excessive administrative burden. 
  • Confusion about how organisations other than universities can be compliant with the EU’s General Data Protection Regulation (GDPR). Requirements such as having institutional review boards or ethical reviews pose practical challenges for non-profits, as they typically lack these procedures and/or institutions. Researchers at the Hertie School and the EDMO Working Group on Platform-to-Researcher Data Access have pointed out that projects using public APIs usually deal with data of low to medium sensitivity. Therefore, platforms should require researchers to apply data protection measures that correspond to this level of risk.
  • Potential legal risks for researchers due to Terms of Use. Before submitting their applications, researchers must sign contracts with platforms that include clauses on data refresh, management, and protection, and, in some cases, pre-publication reviews. Although many of the restrictions in these terms of use are motivated by privacy concerns and the need for platforms to manage GDPR-related risks, some, such as pre-publication reviews, are clearly unreasonable, and would greatly limit and slow down autonomous research and publishing.  

Challenges related to data quality and accessibility: 

  • Different definitions of “publicly available data” across platforms. When platforms allow unlimited scraping, researchers can collect any information visible on the platform’s web pages. However, with API access, platforms control what data is accessible, leading to inconsistent interpretations of what constitutes “publicly available data” across different platforms.  
  • Imposition of daily rate limits or quotas on the data that can be accessed. While daily caps on data access are intended to prevent abuse, they impose significant barriers to systematic research, which often requires large datasets. 
  • Lack of time-series data. Most platforms don’t offer time-series data. Notably, as Mozilla and others have reported, the new Meta Content Library has removed this feature, which was previously available in CrowdTangle.  

According to the Weizenbaum Institute, only a small number of researchers have gained access to publicly available data from VLOPs and VLOSEs, with the vast majority of these working at academic institutions. Even if researchers are granted access, significant issues persist. Platforms may, for instance, arbitrarily alter a researcher’s access permissions after granting them. At a workshop organised by the European Digital Media Observatory in May, researchers with API access reported several issues, including poor data quality, the lack of real-time data, inadequate search functionality, no access to deleted posts, unclear handling of edited posts, and mandatory use of VPNs, which may imply monitoring by the platform. 

The inconsistency in applying Art. 40(12) suggests that VLOPs and VLOSEs have created a new de facto regime for data access. This concern is not new. Civil society organisations, including DRI, have repeatedly proposed the establishment of a standardised data access request procedure for all VLOPs and VLOSEs, or the mandatory harmonisation of terminologies and data structures across platforms. 

This issue is compounded by delays in enacting the Delegated Act on Data Access and the discontinuation of tools like CrowdTangle and Twitter’s API, which previously provided easy access to high-quality data for researchers globally. Currently, with no centralised rules for either Art. 40(4) or Art. 40(12), the work of researchers, academic or otherwise, is subject to the discretion of VLOPs and VLOSEs, which is precisely what the DSA was intended to prevent. 

The EC’s Enforcement Approach to Data Access Has Limitations 

Researchers have informed the EC of the many challenges they face in accessing publicly available data. In response, the Commission has activated the enforcement mechanisms under the DSA. On 18 January, it requested information from 17 VLOPs and VLOSEs on their compliance with Article 40(12). On 30 April, the Commission opened formal proceedings against Facebook and Instagram, partly due to Meta’s planned shutdown of CrowdTangle in the context of the European Parliament elections. The Commission described this as a failure by Meta to properly assess and address the systemic risks that Facebook and Instagram pose to civic discourse and the electoral process. Moreover, it recommended that the platforms “(…) take swiftly all the necessary action to ensure effective real-time public scrutiny of its service by providing adequate access to researchers, journalists and election officials to real-time monitoring tools of content hosted on its services”. 

On 12 July, the Commission announced preliminary findings indicating that X was violating several provisions of the DSA, including Article 40(12). In particular, the Commission found that X’s prohibition on eligible researchers independently accessing publicly available data through scraping is incompatible with the DSA. In addition, the platform’s data access application process discourages researchers from pursuing their projects, and imposes disproportionately high fees for basic data access. DRI had anticipated this violation in an op-ed published last year. 

While we acknowledge the Commission’s proactive enforcement efforts, which reflect a strong commitment to making the DSA an effective regulatory framework, relying solely on this approach is insufficient to address the challenges posed by the unstandardised — and, at times, arbitrary — implementation of Art. 40(12) by VLOPs and VLOSEs. 

First, enforcement decisions have an inter partes effect, meaning they apply only to the specific companies or parties involved. These decisions can set important precedents and guide future cases, but interested parties, such as research organisations, are not formally part of the process. Even if civil society organisations were involved in some aspects of the proceedings, it would be very time-consuming to deal with each platform separately.

The EC may also resolve enforcement cases through settlements or commitments, rather than non-compliance decisions. Recently, for example, TikTok agreed to permanently withdraw the TikTok Lite Rewards programme from the EU, and pledged not to launch any similar programme that could bypass this restriction. These commitments are legally binding, and any breach would immediately violate the DSA. 

While such agreements can certainly lead to improvements, they often lack clear, enforceable standards for future cases. Typically, the only public information provided is a press release summarising the outcomes, without offering details that could inform other parties and increase overall understanding of Article 40(12). Improvements to data access mechanisms would also need to be negotiated on a case-by-case basis, addressing each VLOP or VLOSE individually. What is more concerning is that civil society organisations and researchers, the primary interested parties with regard to Art. 40(12), would be excluded from these negotiations, leaving them without a voice in decisions that directly impact their work.

Beyond enforcement measures, some have suggested that the EC could address the problems with the implementation of Article 40(12) through the Delegated Act on Data Access. This idea stems from the call for evidence released over a year ago, where the Commission invited stakeholders to propose processes and mechanisms to facilitate access under this provision. There are, however, doubts about this possibility, since Article 40(13), the legal basis for the Delegated Act, does not explicitly include access to publicly available data within its scope. Furthermore, Commission representatives have repeatedly indicated that the Delegated Act will focus exclusively on access for vetted researchers.

Even if the Delegated Act addresses aspects of Article 40(12), it will likely do so indirectly — such as by defining what does not constitute publicly available data, or by highlighting the importance of Art. 40(12) for exploratory research. This approach is welcome, but insufficient, as it provides guidance only by exclusion, lacking the clear, direct standards needed to effectively regulate access to publicly available data. 

The EC Should Prioritise Issuing a Guidance Document to Clarify and Standardise the Implementation of Article 40(12) 

The Commission frequently issues guidance documents to help member states, businesses, stakeholders, and the public understand and apply specific aspects of EU law. These documents can be issued as written guidelines, online FAQs, or delivered more informally through dedicated meetings. Crucially, according to the EU Better Regulation Toolbox, “the Commission has an autonomous power to issue guidance documents”, as per Art. 292 of the Treaty on the Functioning of the European Union. This means the Commission does not need legislative approval to issue guidance documents but, instead, can independently decide when harmonisation or clarification of EU law is necessary. 

Despite their non-binding nature, guidance documents — particularly those that interpret EU law — can have indirect legal effects, as recognised by the Court of Justice of the European Union. They “(…) provide a helpful reference point for a judicial assessment”, and can thus be used as tools to enhance clarity and legal certainty for all parties involved in complying with EU law. 

In the context of the DSA, the Commission has already issued guidelines on mitigating systemic risks for electoral processes and a guidance document supporting online platforms and search engines to comply with their obligation to report user numbers in the EU. While the first instrument was explicitly foreseen in Art. 35(3) of the DSA, the latter was a response to several questions and requests for clarification from VLOPs and VLOSEs. This shows that the DSA does not provide an exhaustive list of the topics for which guidance can be issued. 

A guidance document from the Commission would be an important step forward in addressing the many challenges hindering researchers’ access to publicly available data under Article 40(12). Early in the DSA’s implementation, scientists and platform researchers, including those from DRI, advocated for this approach. Recently, a report by the Mozilla Foundation and other organisations reiterated this proposal. One VLOP, TikTok, has even supported the idea in the past, demonstrating that such guidance would benefit researchers, platforms, and the Commission alike. 

For researchers, this would provide clarity, by addressing their uncertainties and offering a clear framework on what to expect from Article 40(12). For VLOPs and VLOSEs, it would set minimum compliance standards, helping them avoid investigations or enforcement actions. For the Commission, it would streamline enforcement, making the process more efficient and predictable. 

We commend the Commission’s extensive efforts in building data access structures for researchers. This complex issue requires coordination and insights from many stakeholders, as well as alignment with other legislation, particularly the GDPR. The guidance document we propose is intended to complement the current work of the Commission, particularly the Delegated Act on Data Access, planned to be published in autumn 2024. This initiative should therefore not diminish the shared sense of urgency around publishing the Delegated Act, as any delay to that act would be detrimental to researchers as well. Instead, the Commission could use the insights gathered through the consultation structures already developed for the Delegated Act to inform the guidance document on Art. 40(12). 

We, the undersigned civil society organisations and independent researchers, respectfully make the following recommendations to the European Commission: 

  1. Prioritise issuing separate guidelines for the implementation of Article 40(12) of the DSA within the coming months. This is crucial because publicly available data is essential for exploratory and other research, and is often sufficient for investigations by non-academic researchers, such as those affiliated with civil society organisations. Such guidelines would also address the current gap in data access, where no centralised rules exist for either Article 40(4) or 40(12). 
  2. These guidelines should be developed through a public consultation process, including dialogue with online platforms and other key stakeholders, such as researchers and civil society organisations. To make this process more efficient, the Commission could, for example, incorporate insights gathered from the consultation structures already developed for the Delegated Act.  
  3. The content of the guidelines should address, at a minimum, the following key areas: 
     • Definition of “publicly available data”: Establish a clear and consistent definition across platforms. 
     • Standardised application forms: Develop common application forms across platforms, with only relevant and necessary questions. 
     • Eligibility criteria interpretation: Provide principles for interpreting the eligibility criteria set out in Article 40(12), including examples of research questions aimed at understanding systemic risks within the EU. 
     • Response timelines: Define clear timelines for responding to data access requests, specifying what constitutes “without undue delay” (e.g., one week in normal circumstances, and 24 hours in emergencies). 
     • Data protection expectations: Outline the data protection measures expected from researchers, particularly those affiliated with non-profit organisations. 
     • Cost-free access: Clarify that access under Article 40(12) should be free of charge. 
     • Legal status of public interest scraping: Provide clarity on the legal status of public interest scraping. 
     • Permissible terms of use: Outline permissible and impermissible clauses in terms of use, such as the conditions under which pre-publication reviews are allowed, to mitigate legal risks for researchers. 
     • Dispute resolution mechanisms: Discuss the inclusion of a third-party body or other alternatives to resolve disputes related to the denial of data access by VLOPs and VLOSEs. 

Undersigned: 

Organisational Signatories 
Democracy Reporting International
Science Feedback
DSA 40 Data Access Collaboratory
Das NETTZ
Civil Liberties Union for Europe
Institute for Strategic Dialogue (ISD)
Mozilla Foundation

Individual Signatories 
* Sofia Calabrese (European Partnership for Democracy)
* Dr. Martin Degeling (interface)
* Dr. Julian Jaursch (interface)
Dr. Philipp Darius (Center for Digital Governance, Hertie School)

* Indicates the individual’s affiliation is listed for identification purposes only; does not imply institutional endorsement. 

Date: September 2024 

This brief was written by Daniela Alvarado Rincón, Digital Democracy Policy Officer (DRI), with contributions from Beatriz Saab, Digital Democracy Research Officer (DRI), and Michael Meyer-Resende, Executive Director (DRI). We are especially grateful for the insightful comments and feedback provided by the undersigned researchers and organisations. 

The brief is part of the access://democracy project, funded by the Mercator Foundation. Its contents do not necessarily represent the position of the Mercator Foundation. 

