Episode 236

Eva Maxfield Brown & Boris Veytsman on OSS Dependencies in the Sciences

June 7th, 2024

39 mins 58 secs

About this Episode

Guests

Eva Maxfield Brown | Boris Veytsman

Panelist

Richard Littauer

Show Notes

In this episode of Sustain, host Richard Littauer talks with guests Eva Maxfield Brown and Boris Veytsman about their co-authored paper, "Biomedical Open Source Software: Crucial Packages and Hidden Heroes." The paper identifies crucial but often overlooked software dependencies in biomedical research. The discussion covers how the study used data from two million papers to map these dependencies, revealing both well-supported and undermaintained software components vital to scientific research. They also discuss the methodological challenges and the concept of "Nebraska packages," essential yet potentially undermaintained parts of the software stack used in both industry and science. Other topics include the broader implications for software sustainability and security, and future research directions such as improving how software contributions are tracked and recognized within scientific careers. Press download now to hear more!

[00:01:47] Richard dives into the paper co-authored by Eva and Boris. Boris explains the origins of the paper, starting from a workshop at CZI aimed at accelerating science through sustainable software, leading to the analysis of software used in biomedical research. He highlights the focus on identifying crucial yet often unmentioned software dependencies in research software, which he labels as “unsung heroes.”

[00:05:22] Boris provides findings from their study, noting that while many foundational packages were cited, there are significant packages that, despite their critical role, remain uncited.

[00:06:43] Eva discusses the concept of “Nebraska packages,” essential yet potentially undermaintained components of the software stack used in both industry and science. She also elaborates on the methodological challenges of deciding which packages to include in the analysis, particularly because dependencies vary between users and contexts.

[00:09:42] Richard reflects on the broader implications of their discussion for the open source community, particularly in terms of software sustainability and security. Eva emphasizes the importance of security across all fields and discusses the potential impact of software bugs on scientific research and the need for robust software infrastructure.

[00:12:04] Boris comments on the necessity of well-tested tools in the scientific community, given that many scientists may lack a strong background in software development and training.

[00:13:47] Richard quotes from the paper discussing the absence of cycles in the network of software packages used in science, indicating a more robust design compared to general software. He questions this in light of earlier comments about scientists not being great at coding.

[00:14:08] Eva explains that the paper’s finding that the dependency network is acyclic (a directed acyclic graph, or DAG) might seem surprising given the common perception that scientific software is poorly developed. She notes that while scientists may not be trained in proper software packaging, the Python environment helps prevent cyclic dependencies.
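As a rough illustration of what "no cycles" means for a dependency network, here is a minimal sketch that checks whether a dependency map is acyclic using Kahn's algorithm. The package names and the `has_cycle` helper are hypothetical and are not taken from the paper; they only show the general idea.

```python
from collections import defaultdict, deque


def has_cycle(dependencies):
    """Return True if the dependency map contains a cycle.

    `dependencies` maps each package name to the list of packages
    it depends on. Package names here are purely illustrative.
    """
    unmet = defaultdict(int)        # number of unprocessed dependencies per package
    dependents = defaultdict(list)  # reverse edges: dependency -> packages that use it
    nodes = set(dependencies)
    for package, deps in dependencies.items():
        for dep in deps:
            nodes.add(dep)
            dependents[dep].append(package)
            unmet[package] += 1

    # Start from packages that depend on nothing and peel the graph away.
    queue = deque(node for node in nodes if unmet[node] == 0)
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for dependent in dependents[node]:
            unmet[dependent] -= 1
            if unmet[dependent] == 0:
                queue.append(dependent)

    # Any package never processed sits on (or behind) a cycle.
    return processed != len(nodes)


# Hypothetical example graphs.
acyclic = {"analysis_tool": ["numlib", "plotlib"], "plotlib": ["numlib"], "numlib": []}
cyclic = {"pkg_a": ["pkg_b"], "pkg_b": ["pkg_a"]}

print(has_cycle(acyclic))
print(has_cycle(cyclic))
```

A network for which this check returns False can always be installed in a consistent order, with every dependency ahead of the packages that need it.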

[00:17:31] Richard brings up “Katz centrality,” which is discussed in the paper. Boris clarifies that it is a measure of network centrality introduced by Leo Katz and explains how it helps determine the importance of nodes within a network.
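For readers unfamiliar with the measure, the following is a minimal sketch of Katz centrality on a tiny dependency graph. It is not the paper's implementation: the package names, the `katz_centrality` helper, and the `alpha` and `beta` values are assumptions chosen for illustration. Each node starts with a base score `beta` and receives `alpha` times the score of every node that points to it, so a package gains importance from both direct and indirect dependents.

```python
def katz_centrality(edges, alpha=0.1, beta=1.0, iterations=100):
    """Approximate Katz centrality by repeated score propagation.

    `edges` is a list of (package, dependency) pairs. Scores flow from a
    package to the things it depends on, so heavily relied-upon
    dependencies end up with the highest values.
    """
    nodes = sorted({name for edge in edges for name in edge})
    scores = {name: beta for name in nodes}
    for _ in range(iterations):
        updated = {name: beta for name in nodes}
        for package, dependency in edges:
            updated[dependency] += alpha * scores[package]
        scores = updated
    return scores


# Hypothetical graph: two applications share lib_core, which needs lib_base.
edges = [
    ("app_a", "lib_core"),
    ("app_b", "lib_core"),
    ("app_b", "lib_base"),
    ("lib_core", "lib_base"),
]

scores = katz_centrality(edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph `lib_base` ranks highest even though only two packages name it directly, because it also inherits weight from everything that depends on `lib_core`. That indirect credit is what makes the measure useful for surfacing deep dependencies.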

[00:20:13] Richard questions the practical applications of the research findings, probing for advice on supporting crucial but underrecognized dependencies within software ecosystems. Eva addresses future research directions, including improving ecosystem matching algorithms for better accuracy in linking software mentions to the correct ecosystems.

[00:22:50] Eva suggests expanding the research to cover more domains beyond biomedicine, considering different software needs across various scientific disciplines. Boris discusses the potential for targeted interventions to support underrecognized contributors in the scientific software community, with the aim of enhancing their prestige.

[00:27:22] Richard asks how the research team plans to map dependencies to individual contributors and track their motivations. Boris responds that while they have gathered substantial data from sources like GitHub logs, publishing this information poses ethical challenges due to privacy concerns.

[00:28:45] Eva discusses her work on linking GitHub profiles to academic authors using ORCID identifiers to better track contributions to scientific software.

[00:31:42] Richard brings up the broader impacts of their research, asking whether their study of software package centrality within the scientific community is unique or whether there are similar studies at this scale. Eva acknowledges the need for more comprehensive studies and cites a previous study from 2015 that analyzed developer networks on GitHub. Boris adds that while there is extensive literature on scientific citation networks, the study of dependencies is less explored.

[00:34:38] Find out where you can follow Boris and Eva’s work and social media online.

Spotlight

  • [00:37:06] Richard’s spotlight is Deirdre Madeleine Smith.
  • [00:37:29] Eva’s spotlight is Talley Lambert.
  • [00:38:02] Boris’s spotlight is the CZI Collaborators.
