Defining ‘Open Science’
UNESCO states open science is:
“… An inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone.”
The definition sets the goals of open science as being for the benefit of science and society, and the opening of processes and knowledge “… beyond the traditional scientific community.”
The 2022 publication by the US National Academies of Sciences, Engineering and Medicine, ‘Open Scholarship Priorities and Next Steps’, discussed how, from a reproducibility standpoint, the sharing of code and software is critical for open science. In certain fields (e.g., computer vision), research papers will no longer be accepted for publication unless the accompanying software is made available as open source for reproducibility.
However, in reality, open source and open science include more than the software, data and articles delivered as outputs. Professor Mary Shaw of Carnegie Mellon University (CMU) points out that open source includes a range of software artifacts, including protocols, scripts, workflows, machine learning models and more, all of which must be considered when thinking about open source licensing and technology transfer.
At the recent OSPO++ ‘Open Source Innovation in Universities’ event, Sayeed Choudhury, Director of the Open Source Program Office (OSPO) at CMU Libraries, spoke of how the current policy landscape is influencing the shift toward open science as default practice. In parallel, as open science and innovation continue to redefine the research landscape, it’s critical for universities to focus on the inclusive and accessible nature of open science, the possibilities it unlocks and the pressing opportunity to shape its future.
Choudhury pointed to university projects that have been instrumental in the advance of open science goals, whether by accident or intentionally. For example, the Covid-19 dashboard developed at Johns Hopkins University at the very start of 2020 began as a research project that wasn’t expected to last long. The dashboard went on to become a critical open data source leveraged by a series of other dashboards and resources that helped scientists, medical professionals and government bodies respond to and manage the pandemic.
The Covid-19 pandemic made the world stand still, and open source data and development played a critical role in getting it turning again. Similar work and intention should be applied to climate change, disease research, sustainable development and much more. With a stated goal of open science being to benefit society as well as science, there is arguably no going back.
Formal government recognition of Open Science
Perhaps the most notable official voice in the United States weighing in on the meaning and impact of open science is the White House. The Biden Administration declared 2023 the ‘Year of Open Science’ and presented a formal definition, identifying open science as:
“… The principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility and equity.”
The specific definitions emerging and the bodies behind them are moving open science and open source from sidebar options to primary objectives. Within the past year, other national institutions have officially updated policies and guidance around research, data and outcomes to make them open by default as much as possible.
Embargos, security and ‘open as the answer’
In August 2022, the US Office of Science and Technology Policy issued a memo entitled “Ensuring free, immediate, and equitable access to federally funded research”. The memo advised all federal agencies to make research data publicly accessible without an embargo period. Choudhury noted this focus on open data as a key development in the move toward open science, presenting a path toward openness that goes from articles and research publications to open access of data and then open software.
In early 2022, the US Department of Defense also updated its guidance around open source and application of its 2018 Cyber Strategy with instruction to make research open by default.
Earlier this year, Jen Easterly, Director of the US Cybersecurity and Infrastructure Security Agency, gave a talk at CMU challenging researchers to prioritize safety and security in open source software development: in essence, to make OSS “secure by design” and “secure by default.”
The recent guidance and communications on open science highlight the national focus in the US on making all federally funded research publicly available in as open a manner as possible. This signals to universities and other institutions that open is the way forward. It also presents an opportunity for university OSPOs to engage with government, funders and other stakeholders on the evolution of future policy.
Open Science in Higher Education
For decades, the research and development undertaken in universities has been moving toward openness. Now with an official focus on open science and stated intent to make research, data and eventually software open and freely accessible for the benefit of science and society, greater consideration of the ‘open’ in open science is coming to the fore.
One example of openness is the approach taken by the Sloan Digital Sky Survey. One of the project principals, Alex Szalay, professor of physics and astronomy at Johns Hopkins University, explained to Choudhury why “astronomy data are worthless.”
“He said that it has no commercial value,” Choudhury recalled. “Nobody cares about whether you sell or you can buy astronomy data. It doesn’t have any patent restrictions … It is one of the most open scientific data sets you might imagine.”
The output from the work of the Sloan Digital Sky Survey offers an exciting view of what is possible when openness is integral to a data flow from the start. The “worthless” data collected by the telescope in New Mexico can only be interpreted by approximately a dozen people in the world. However, once data is processed, it becomes more meaningful and accessible to more people. Once that processed data is released publicly on sdss.org, it is then open to anyone in any field or profession.
Choudhury pointed to a different approach to openness from the CMU Cloud Lab. This project is based on Emerald Cloud Lab, a successful company founded by two alumni, who developed it for start-up companies in the San Francisco Bay Area that deliberately do not wish to share results.
It is currently being adapted for the first academic, remote-controlled laboratory. As a potential model for similar projects at other universities, it is critical that its processes and data flows are open. The work to open up Cloud Lab is underway. However, it takes much more effort, time and resources to open up a mature, closed project than to design it as open from the start. Choudhury notes that starting from open is the key to unlocking efficiency moving forward.
Empowering universities to design for openness
The ‘Rolling Wall of Openness’
“The idea of being open by default does not presume, constrain or dictate that you must be open,” says Choudhury. “But from a design perspective; from a way of working; from a way of integrating research into other types of areas; open by default will give us more options. It’s really as simple as that.”
Choudhury pointed out that openness can be a kind of continuum, or as Josh Greenberg, Program Officer at the Sloan Foundation, calls it, “a rolling wall of openness.”
Research, data or outcomes can be open but do not have to be made immediately public. For some research, an embargo period is critical to allow the researchers who have done the work to be the first to use or publish its outcome. By making open the goal and designing for that from the start, projects have a greater opportunity to fulfill the intention of open science and open source. Even astronomers have embraced this concept of an embargo period for data releases.
Measuring open science
Given the expectation for research projects to be open by default, the next question is, ‘How do we measure and report on innovation when data, processes and outcomes are already open?’
This is a question facing enterprises as much as universities. Earlier this year, IBM announced its decision to de-emphasize patents as the primary measure of innovation and to design new metrics of success in the age of open source.
As corporations seek new ways of measuring innovation within an open-by-default world, so must universities identify methods to support, measure and encourage ongoing development within an open environment. This adds a new dimension to persistent conversations around risk, security, progress and success. However, debate and invention in these areas and more have always led to important contributions from universities.
The discussions to come are sure to be thought-provoking, ongoing and meaningful. They’re also central to the cohesive progression not only of research and development across public and private sectors, but also of societies on and off campuses. With such wide-reaching implications, openness seems the only smart place to start.
View the full OSPO++ event talk with Sayeed Choudhury here.
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0)