The world’s first dedicated large language model (LLM) for Carbon Dioxide Removal. Conceived and curated by OpenAir members, powered by Pulze.

Read: Frequently Asked Questions

  • WHAT IS CDR.AI? CDR.AI is a large language model (LLM) developed and maintained by OpenAir members to support and deepen rigorous, science-based understanding of carbon dioxide removal (CDR) among the general public, policymakers, journalists, and civil society organizations. It is free and open to anyone seeking information.

    Our platform is enabled and hosted by Pulze, a California-based technology company.

    The idea for CDR.AI was first proposed by OpenAir member Tina Baumgartner, who has remained a core contributor to the project since its launch. Other founding members of the team include Tank Chen, Peter Hoberg, and Chris Neidl.


    An LLM, or Large Language Model, is an advanced type of artificial intelligence that can understand and generate human-like text based on patterns learned from large amounts of data. It can be used for tasks like answering questions, writing essays, translating languages, and holding conversations. Popular LLMs that you may already be familiar with include ChatGPT (OpenAI), BERT (Google), and RoBERTa (Facebook), but there are thousands of models currently in operation for both public and private use. CDR.AI is built on a technique known as RAG, or Retrieval-Augmented Generation, which combines searching a curated database for relevant information with human-like text generation. While a standard LLM generates text based only on what it learned during training, a RAG system first retrieves current, relevant information and uses it to produce more accurate and better-grounded responses.

    RAG systems tend to provide more accurate and up-to-date responses by incorporating current and specific information retrieved from external sources, but they are also more complex due to the additional retrieval step, which involves searching for relevant information and integrating it into the response.
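    To make the retrieval step concrete, here is a minimal sketch of the RAG idea in Python. Everything in it is illustrative: the toy bag-of-words "embedding", the example source texts, and all function names are assumptions for demonstration, not CDR.AI's actual pipeline.

```python
# Minimal Retrieval-Augmented Generation (RAG) sketch.
# All names and the scoring method are illustrative, not CDR.AI internals.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, sources: list[str], k: int = 2) -> list[str]:
    """Rank source chunks by similarity to the question; keep the top k."""
    q = embed(question)
    return sorted(sources, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

def build_prompt(question: str, sources: list[str]) -> str:
    """A RAG system hands the retrieved text to the LLM with the question."""
    context = "\n".join(retrieve(question, sources))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

sources = [
    "Enhanced rock weathering spreads crushed basalt on farmland to absorb CO2.",
    "Direct air capture uses chemical sorbents to pull CO2 from ambient air.",
    "Afforestation stores carbon in growing biomass.",
]
print(build_prompt("How does direct air capture work?", sources))
```

    The key difference from a plain LLM is that the model answers from the retrieved context rather than only from its training data, which is why RAG responses can cite specific, recent sources.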


    ChatGPT and other widely used LLMs are powerful tools that support learning about any subject, including CDR. However, at OpenAir we believe that within the current context of CDR education and knowledge-building, there is a need for AI tools that draw only from the highest-quality, most credible sources. Further, establishing credibility requires transparency; therefore, all sources queried to generate a response with CDR.AI are clearly listed in that response. Finally, because CDR is a fast-evolving domain, with new knowledge emerging at a rapid rate, it is important to select for recent data. CDR.AI therefore draws only from sources published since 2022.


    Unlike many popular LLMs, which draw from a variety of categories of information – such as media content and public websites, as well as institutional and academic sources – CDR.AI's data sources are restricted to a narrower range of categories to ensure very high quality.

    These include:

    • Peer-reviewed articles published in top-ranked academic journals. CDR.AI includes only journal articles that were subjected to peer review prior to publication. Further, selection is limited to articles published in journals that attained a minimum SCImago Journal Rank (SJR) for the year the article was published. SJR is a robust and widely accepted tool for evaluating academic journals: it provides a comprehensive view of journal quality by considering both the number of citations and the prestige of the citing journals. All ranked journals are distributed into four groups, or quartiles (‘Q’), with Q1 being the highest ranked. Only articles published in Q1 journals can be considered for inclusion in CDR.AI’s data pool.


    • Resources published by accredited academic institutions, university presses, or other independent bodies. Many high-quality knowledge resources are published directly by academic institutions and other research bodies in the form of reports and books. CDR.AI includes such resources but, at present, restricts selection to those published by institutions that attained a minimum score of 75 for academic reputation in the Quacquarelli Symonds (QS) World University Rankings for the year the resource was published. The QS World University Rankings are widely accepted and respected globally as a measure of the relative quality and reputation of universities. In addition to university resources, CDR.AI also includes relevant materials published by national academies of sciences and engineering.
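    The inclusion rules above amount to a simple filter, sketched below. The thresholds (Q1 quartile, QS reputation score of 75) come from the text, but the function name and record fields are hypothetical, and the sketch omits some accepted categories such as national-academy publications.

```python
# Illustrative sketch of the source-inclusion rules described above.
# Field names and the function itself are hypothetical, not CDR.AI internals.
def meets_quality_bar(source: dict) -> bool:
    """Accept peer-reviewed articles from Q1-ranked journals, or
    institutional resources from universities scoring at least 75
    for academic reputation in the QS rankings for that year."""
    if source["kind"] == "journal_article":
        return source["peer_reviewed"] and source["sjr_quartile"] == "Q1"
    if source["kind"] == "institutional_report":
        return source["qs_reputation_score"] >= 75
    return False

article = {"kind": "journal_article", "peer_reviewed": True, "sjr_quartile": "Q1"}
report = {"kind": "institutional_report", "qs_reputation_score": 60}
print(meets_quality_bar(article), meets_quality_bar(report))  # → True False
```

    Note that both checks are year-specific in practice: the ranking consulted is the one for the year the source was published.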


  • HOW ARE CDR.AI'S DATA SOURCES SELECTED, MAINTAINED AND MODIFIED OVER TIME? CDR.AI is entirely volunteer-led and managed by OpenAir members. Our core team of data curators continuously adds new source materials to the data pool, using search terms related to a variety of CDR subjects and topics. These include keyword searches for all known and emerging carbon dioxide removal methods, as well as for key cross-cutting concepts relating to public policy; techno-economic assessment; monitoring, reporting & verification; life-cycle assessment; public perceptions and attitudes; and other factors.

    Importantly, our team does not select sources based on the confirmed or perceived perspectives or biases of source authors, or on the specific conclusions put forth in sources. Rather, we start with search terms on platforms such as Google Scholar, or on individual journal webpages, and filter results by date. Matching results are then reviewed against the quality standards described above before being uploaded to the LLM's data pool.

    At the time of CDR.AI's launch (July 2024), over 600 individual sources had been uploaded to the data pool. This number will continue to increase every week, indefinitely, as our volunteers identify, review, and incorporate new sources.

    You can check out the current list of sources that have been uploaded to CDR.AI here.


    CDR.AI can only offer answers related to subjects that are discussed or cited in the sources included in the data pool. For questions that fall outside the scope of those subjects, a “not enough information” response will be generated.

    The LLM first scans all sources in the data pool to find chunks of content that strongly suggest the source contains information related to the question asked. Once candidate sources have been identified, they are scanned in depth for pertinent information. This all happens in under a second. A response is then generated that integrates and analyzes all of the relevant content.
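    The flow described above, including the “not enough information” fallback, can be sketched as follows. The relevance measure and the cutoff value are assumptions chosen for illustration; CDR.AI's actual scoring and thresholds are not public.

```python
# Sketch of retrieve-then-answer with a "not enough information" fallback.
# The relevance measure and cutoff are illustrative assumptions.
def relevance(question: str, chunk: str) -> float:
    """Fraction of the question's words that also appear in the chunk."""
    q = set(question.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def answer(question: str, chunks: list[str], cutoff: float = 0.3) -> str:
    """Keep chunks scoring above the cutoff; fall back if none qualify."""
    relevant = [ch for ch in chunks if relevance(question, ch) >= cutoff]
    if not relevant:
        return "Not enough information in the data pool."
    # A real system would now pass `relevant` to the LLM for synthesis.
    return "Context for the model:\n" + "\n".join(relevant)

chunks = ["ocean alkalinity enhancement raises seawater pH to store carbon"]
print(answer("what is ocean alkalinity enhancement", chunks))
print(answer("who won the world cup", chunks))
```

    The second query falls back because none of its words overlap the pooled content, which mirrors how out-of-scope questions are handled.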

    No LLM – including CDR.AI – is perfect. All models are subject to error, misinterpretation, and even ‘hallucination’. We therefore strongly encourage users to pose questions in clear, concise language, and to consider rephrasing a question in different ways to get the best and most accurate results.



    Yes. CDR.AI runs on the Claude 3.5 Sonnet model, developed by Anthropic. This is a state-of-the-art LLM that excels in natural language understanding and generation while prioritizing safety and ethical considerations. It builds on the strengths of previous models to provide improved performance and reliability across various applications. Over time, our team will consider other models as options continue to evolve and improve.