Measuring Open Source Project Health

In recent years, we have learned the hard way that open source software requires substantial maintenance work. Just because the source code is open to inspection does not mean anyone is looking. While not the first incident, Heartbleed (1) raised awareness that even widely used open source software needs to be maintained. Since then, supply chain attacks have become commonplace. Software security is getting much attention, with legislators in the US and Europe drafting cybersecurity laws that will affect all software, including open source software. We must remember the lesson we already learned with Heartbleed: we need healthy open source projects if we want high-quality and secure software.

The partnership of OpenInfra Foundation and Bitergia

To make the following discussion tangible and more interesting, I will use the OpenInfra Foundation as an example. I chose the OpenInfra Foundation because Bitergia is their Official Metrics Partner, which gives me relevant experience and insights. We shared these examples at the OpenInfra Summit 2022 in Berlin (2), which is why I was invited to write this article. In what follows, we will explore what it takes to have a healthy open source project and what metrics we can use to understand project health.

For context, allow me to introduce Bitergia and the OpenInfra Foundation. Bitergia (3) is an open source company specializing in Software Development Analytics. Ten years ago, the founders commercialized their research into how to analyze software development and released their tooling as open source. The tooling is called GrimoireLab, a project within the CHAOSS community (4). CHAOSS is short for Community Health Analytics for Open Source Software and is a community within the Linux Foundation. It is a community of practice for open source professionals, researchers, and anyone interested in the topic. To be upfront, I am a co-founder and board member of CHAOSS. I will mention CHAOSS again later because it is the place to find resources, guidance, and tools for analyzing open source project health.

The OpenInfra Foundation (5) emerged from the now 12-year-old OpenStack project. OpenStack is a cloud platform developed by over 450 companies and today runs on 40+ million cores in over 300 data centers. That is more data centers than the largest proprietary cloud provider, AWS, which has about 125 data centers. The OpenInfra Foundation hosts other projects too, including Kata Containers, StarlingX, and Zuul. Two things stand out in its approach to stewarding open source projects. First, the projects are provided with a software forge consisting of only open source tools (see opendev.org, 6) that they can choose to use instead of proprietary services. Second, OpenInfra believes in a philosophy of open source with The Four Opens as guiding principles:

  • Open Source
  • Open Design
  • Open Development
  • Open Community

Bitergia and the OpenInfra Foundation share a passion for open source and therefore partnered up to build healthy projects. After a brief definition of Project Health, we will look at examples.

Defining Open Source Project Health

Open Source Project Health is a project’s potential to continue producing quality software. We may also refer to it as the health of the open source community or the project’s sustainability. I researched this for my Ph.D. and confirmed what other researchers also found: a healthy project takes three things, namely Quality Code, Sufficient Resources, and an Active Community.

Quality Code is the desired outcome of software development. It includes best practices for source code to be well structured, human-readable, sufficiently documented, and free of bugs. Given the complexity of modern software, these may not be fully achieved, but we can have processes and policies in place to help projects get closer to this ideal. For example, OpenInfra strongly recommends practices such as extensive code review, automated testing, and source code linting, just to name a few.

Sufficient Resources will be different for each open source project. Some projects may only need a source code repository and an issue tracker, which are provided for free through services like GitHub, GitLab, or Gitee. A project that produces software for mainframes or the public power grid will need special hardware to test the software. In the example of OpenInfra, projects are provided with a suite of open source tools that enable collaboration (6). The foundation works with its members to obtain additional resources if the projects need them. For example, infrastructure donors (7) are companies running OpenStack clouds that donate cloud resources to the OpenStack project infrastructure. Those resources are mostly used in the automated testing framework to support OpenStack development efforts.

Active Community is the result of people working together on the software. A project without activity is not updated, not adapted to the changing environments, and not helping users to resolve issues. Ideally, an active community has many types of contributions, including code and non-code contributions. It also matters who the contributors are.

Active Community is the topic for the remainder of the article, which explores four aspects: Contributions, People, Organizations, and Inclusion. I will look at each aspect and the metrics available to understand it.

How to measure community activity

Contributions are the building blocks of work in an open source project. Code contributions advance the source code and are logged in a version control system such as git. The change history makes it easy to count and measure code contributions, which has historically led to a bias toward recognizing this type of contribution over others. Other types of contributions include code reviews, which are essential to maintaining the source code’s quality and educating other contributors. Beyond these, bug reporting and triaging are contributions logged in an issue tracker or similar system. Many projects also have communication channels, such as mailing lists or chat platforms like Slack, which have a communication archive. In contrast, some contributions don’t get tracked at all but are essential to the health of a project, for example, organizing events, giving talks about the technology at a conference, socializing with other members, or managing finances. The OpenInfra Foundation is the fiscal host of its projects, manages background activities, organizes events, and ensures the environment is right. All activity levels across logged contribution types are captured and visualized in the Bitergia Analytics Platform. Its dashboards show trends of engagement and help to understand how projects are doing over time.
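
To make the idea of activity levels concrete, here is a minimal Python sketch that counts logged contributions per month and contribution type. The event list, the type labels, and the function name activity_by_month are hypothetical and only for illustration; this is not how the Bitergia Analytics Platform or GrimoireLab is implemented.

```python
from collections import Counter
from datetime import date


def activity_by_month(events):
    """Count logged contributions per (year-month, contribution type).

    `events` is an iterable of (date, contribution_type) pairs, e.g. parsed
    from a git log, a code review system, or an issue tracker.
    """
    counts = Counter()
    for when, kind in events:
        counts[(when.strftime("%Y-%m"), kind)] += 1
    return counts


# Hypothetical sample data, not taken from any real project.
events = [
    (date(2022, 5, 3), "commit"),
    (date(2022, 5, 9), "code_review"),
    (date(2022, 5, 21), "commit"),
    (date(2022, 6, 2), "issue"),
]

for (month, kind), n in sorted(activity_by_month(events).items()):
    print(month, kind, n)
```

Contributions that are never logged, such as organizing events, cannot appear in a count like this, which is one reason to read activity numbers together with the community.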

People make contributions and self-select to be members of the community. A concern for healthy projects is that the knowledge required to maintain and advance the software is spread across multiple people. As a negative example, a project with only one contributor is entirely dependent on that person, and they may stop maintaining the project at any time. In a bad case, that person can break the software on purpose, and there is no check from others (yes, this has happened). The Bus Factor is a key metric that shows how dependent a project is on one, a few, or many people. The OpenInfra Foundation keeps track of this metric through the Bitergia Analytics Platform and, if needed, works with the project to highlight areas that need additional people and helps onboard new contributors. If a project does not have enough interest, it may also be officially discontinued, which is an honest and fair move that signals to users that they need to find an alternative.
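
One common way to put a number on the Bus Factor is to ask for the smallest group of contributors who together account for a given share of all contributions. The Python sketch below assumes that formulation with a 50% threshold; the threshold, the function name, and the sample data are illustrative assumptions, not necessarily the exact calculation used by the OpenInfra Foundation or Bitergia.

```python
from collections import Counter


def bus_factor(authors, threshold=0.5):
    """Smallest number of contributors who together made at least
    `threshold` (a fraction between 0 and 1) of all contributions.

    `authors` holds one entry per contribution, e.g. one name per commit.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk contributors from most to least active until the share is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= threshold * total:
            return rank
    return len(counts)


# Hypothetical commit authors: one person made 6 of 10 commits.
commits = ["alice"] * 6 + ["bob"] * 3 + ["carol"] * 1
print(bus_factor(commits))
```

A low value, as in this sample, signals that the project depends on very few people; a higher value means the work is spread more widely.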

Organizations play an essential part in the health of open source projects as users, employers, and sponsors. Organizations benefit from open source software with reduced time to market, lower costs to build software, and lower license fees. With this vested interest in open source software, organizations allow their developers to contribute back, maintain, and advance the software. This also gives organizations influence over the project’s direction and helps align the internal roadmap with the work in the open source project. The metaphorical elephant in the room is that when an organization de-prioritizes a project, that project will be in jeopardy if that one organization employs all project members. Bitergia created the Elephant Factor, which works the same as the Bus Factor, but instead of people, it looks for organizations. With a focus on getting more organizations involved in projects, OpenInfra has diversified the organizations contributing to the Kata Containers project (8, fig. 1).
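
Because the Elephant Factor counts organizations instead of people, it can be sketched by first mapping each contribution to the contributor's employer. The Python example below is a minimal illustration under stated assumptions: the 50% threshold, the affiliation table, and the choice to treat unaffiliated contributors as independent individuals are all hypothetical and not confirmed as Bitergia's implementation.

```python
from collections import Counter


def elephant_factor(authors, affiliations, threshold=0.5):
    """Smallest number of organizations whose members together made at
    least `threshold` (a fraction between 0 and 1) of all contributions.

    `authors` holds one entry per contribution; `affiliations` maps an
    author to an organization. Authors without an affiliation are counted
    as independent individuals (an assumption made for this sketch).
    """
    counts = Counter(affiliations.get(a, f"independent:{a}") for a in authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk organizations from most to least active until the share is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= threshold * total:
            return rank
    return len(counts)


# Hypothetical data: three active people, but two share one employer.
commits = ["alice"] * 4 + ["bob"] * 3 + ["carol"] * 3
employers = {"alice": "Org A", "bob": "Org B", "carol": "Org B"}
print(elephant_factor(commits, employers))
```

In this sample the work looks well distributed across people, yet a single organization employs the contributors behind more than half of it, which is exactly the risk the metric is meant to surface.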

Inclusion refers to a project’s approach to enabling and empowering contributors irrespective of their background or identity. This is a core concern of the Open Community principle. At a broader scope, inclusion is a major concern for open source because surveys show that a majority of open source contributors are white men. Investigations have shown that a toxic culture in projects frequently excludes minorities or makes for a very unwelcoming environment. Unfortunately, the homogeneous contributor base also self-perpetuates through the unconscious bias in how software is written, workflows operate, and documentation is structured. It takes a concerted effort to identify the issues and to create a welcoming environment. The OpenInfra Foundation has a dedicated Diversity Working Group that works with the Foundation’s Board of Directors to incorporate diversity policies and create programs that reduce barriers and foster an inclusive and welcoming culture. Every few years, the Foundation surveys its community to measure progress and identify areas of improvement.

Advice and Resources

We are at the end of the article. We discussed open source project health and saw examples from the OpenInfra Foundation of how that can be measured and promoted. I want to leave you with some advice and resources.

First, if you are thinking about measuring your open source project’s health but are concerned with the complexity, do not fret; start with the easy things to measure and answer some basic questions. Those answers will spawn new questions, and you can build on your experience to progress your metrics journey.

Second, you are in good company. Check out the CHAOSS Project to find resources, metric definitions, open source software for dashboards, and a community of practice.

Third, remember that the metrics are in service of the community. Be open and honest with the community when introducing metrics to stave off concerns early. For example, OpenInfra and Bitergia met with the community to review metrics. At the OpenInfra Summit 2022, we organized a Metrics Corner to showcase the metrics dashboards and discuss them with the project’s contributors. This was a huge success, and we look forward to hosting the next Metrics Corner at the OpenInfra Summit 2023 in Vancouver. Join us June 13-15! (9)

Fourth, collect metrics early to establish a baseline. The baseline is important to see changes in your community and to see if policy changes have the desired effect. A word of caution: do not try to benchmark against other, even similar, communities because each community has different ways of working, is in a different context, uses different tools, or has other reasons to produce metrics that are hard to compare. Each community is unique, and understanding its project health is a journey that can start with metrics but certainly requires a conversation with community members who can validate what is actually happening in the community.
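
Comparing a project against its own baseline can be as simple as the following Python sketch, which reports the relative change of a current period versus the average of earlier periods. The function name and the monthly numbers are hypothetical and only illustrate the idea.

```python
from statistics import mean


def change_from_baseline(baseline_counts, current_count):
    """Relative change of the current period versus the baseline average.

    Returns, for example, 0.3 for a 30% increase over the baseline.
    """
    baseline = mean(baseline_counts)
    if baseline == 0:
        raise ValueError("baseline average must be non-zero")
    return (current_count - baseline) / baseline


# Hypothetical monthly counts of active contributors before a policy change,
# followed by the count for one month after the change.
before = [40, 50, 60]
after = 65
print(f"{change_from_baseline(before, after):+.0%}")
```

A number like this only says that something changed within the same community; why it changed still needs the conversation with community members.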

Sources:
1. https://en.wikipedia.org/wiki/Heartbleed
2. https://www.openstack.org/videos/summits/berlin-2022/OpenInfra-Community-Dashboards-Overview-of-the-OpenInfra-and-Bitergia-Partnership
3. https://bitergia.com/
4. https://chaoss.community/ and https://chaoss.github.io/grimoirelab/
5. https://openinfra.dev/
6. https://opendev.org/
7. https://openinfra.dev/members/#infrastructure
8. https://katacontainers.io/
9. https://openinfra.dev/summit/vancouver-2023

Georg J.P. Link
georglink@bitergia.com

Georg is an Open Source Strategist. Georg’s mission is to make open source more professional in its use of community metrics and analytics. Georg co-founded the Linux Foundation CHAOSS Project to advance analytics and metrics for open source project health. Georg has an MBA and a Ph.D. in Information Technology. As the Director of Sales at Bitergia, Georg helps organizations and communities with adopting metrics and making open source more sustainable. In his spare time, Georg enjoys playing board games, Anno 1800, reading fiction, and hot-air ballooning.