Tools for reproducible science | astrobites

In some of our daily recap bites, you can see “free access” written in parentheses next to the name of the journal. In other bites, it says “closed access”. That’s because some journals, like Nature or Science, don’t provide free access to their articles: readers have to pay if they want to read the latest science news. Fortunately, we’re well on the way to overcoming closed-access issues thanks to arXiv.org, a website where you can find articles for free, but only if the authors have made their articles available there. Happily, most astronomers do this, maximizing the accessibility of their work.

Science can only improve if it is open and if the results of articles are transparent. In addition to easily accessible written articles, the ideal world of open science would also let you easily reproduce the results of any article. This would be much easier if authors made the code they used for their analysis and plots available on an accessible platform. Some scientists refuse to do this on principle, because it often requires a lot of extra work, which most scientists are tired of. In a 2016 Nature article, 1,500 scientists were surveyed about the reproducibility of research results. A third of respondents had never even thought about establishing procedures to check the reproducibility of their data, and only 40% indicated that they used such procedures regularly. Also, up to 70% of researchers had encountered experiments and results they could not reproduce, obtained not only by other groups of scientists but also by the authors or co-authors of the published papers themselves! Today’s emphasis on quantity over quality encourages the production of articles with attention-grabbing titles and equally inflated results, which could lead to further distortions.

Let’s break down some key reasons why reproducible and open science matters:

  • Leads to collaborations and therefore improvements: science is done in collaboration! That’s why we organize conferences – we want to hear what others are working on, exchange ideas and finally collaborate with relevant people to publish more interesting results. If people can reproduce, develop and maintain your work, collaborations become more effective!
  • Builds trust in the community: in astronomy, we build different models and tools to analyze data and develop theories. Right now it’s hard to keep up with all the tools available, which is exciting! But it can be difficult to know which tool to choose – transparency in the field creates trust between colleagues!
  • Creates interesting debates/discussions: other astronomers might arrive at different results with different models or tools – this is where the interpretation of the physics becomes very important. Controversial results are also exciting! We can certainly have much more confidence in a result if two groups arrive at it using different methods. If the results differ, however, there is more work to be done (for example, one group may have made a mistake, the methods may be biased, or there may be more than one answer).
  • Helps you work more efficiently: I recently came across a blog post on the website of Professor Lorena Barba’s group. They share an anecdote about how it took them much less time to reproduce their own results for a new article because they had made those results reproducible in the first place.

In this bite, I want to talk about some awesome tools that can help you start making your own science reproducible!

GitHub – a very popular website for collaborative coding. There you can create public and private repositories (e.g., people usually make repositories with finished code public). The website’s other powerful feature is version control, which it gets from Git: you can see every change made to your code and collaborate seamlessly without touching the original code. A good example is the GitHub profile of Professor Michael Zingale, who is very active in open science (shared with permission).
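
To make “seeing changes in your code” concrete, here is a minimal sketch using Python’s standard `difflib` module. It is not how Git or GitHub work internally; it only produces a line-by-line comparison in the unified diff style that version control tools display. The two script versions and the file name `analysis.py` are hypothetical.

```python
import difflib

# Two hypothetical versions of the same short analysis script,
# stored as plain strings (nothing here is executed as astronomy code).
old_version = (
    "import numpy as np\n"
    "flux = np.loadtxt('flux.txt')\n"
    "print(flux.mean())\n"
).splitlines(keepends=True)

new_version = (
    "import numpy as np\n"
    "flux = np.loadtxt('flux.txt')\n"
    "print(np.median(flux))\n"
).splitlines(keepends=True)


def show_changes(old, new, name="analysis.py"):
    """Return a unified diff: removed lines start with '-', added lines with '+'."""
    diff = difflib.unified_diff(old, new, fromfile=f"a/{name}", tofile=f"b/{name}")
    return "".join(diff)


diff_text = show_changes(old_version, new_version)
print(diff_text)
```

Only the last line differs between the two versions, so it appears once with a `-` (the old line) and once with a `+` (the new line), while the unchanged lines are shown as context.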

Zenodo – another popular tool for storing your papers, data files, research software and other research-related artifacts. It’s free to upload to and free to access, and this general-purpose repository makes your work citable and shareable.

show your work! – a workflow created by Dr. Rodrigo Luger. It builds on another impressive workflow tool for reproducible data analysis, Snakemake. The philosophy behind the workflow is that “anyone should be able to regenerate the PDF of the article from scratch with the click of a button”. show your work! is integrated with GitHub, Zenodo, and Overleaf, and it can save you a lot of time answering questions about how you got your results, because you can simply share the GitHub repository for your paper that the workflow creates for you!
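
The idea of regenerating an article from scratch can be sketched in a few lines of plain Python. This is only an illustration of the general principle (each output is built from its inputs, in dependency order), not the actual interface of show your work! or of the workflow tool it builds on. All file names, rules and functions below are hypothetical, and real workflow tools do far more, such as checking whether inputs have changed.

```python
from pathlib import Path
import tempfile


# Hypothetical build steps: each one writes a single output file from its inputs.
def make_data(out, inputs):
    out.write_text("1\n2\n3\n")


def make_result(out, inputs):
    values = [float(x) for x in inputs[0].read_text().split()]
    out.write_text(f"mean = {sum(values) / len(values):.1f}\n")


def make_article(out, inputs):
    out.write_text("Our measured " + inputs[0].read_text())


# Each rule maps an output file to (its input files, the step that builds it).
RULES = {
    "data.txt": ([], make_data),
    "result.txt": (["data.txt"], make_result),
    "article.txt": (["result.txt"], make_article),
}


def build(target, workdir, log):
    """Build `target` after recursively building everything it depends on."""
    inputs, action = RULES[target]
    for name in inputs:
        build(name, workdir, log)
    out = workdir / target
    if not out.exists():  # simplification: only build outputs that are missing
        action(out, [workdir / name for name in inputs])
        log.append(target)
    return out


with tempfile.TemporaryDirectory() as tmp:
    log = []
    article = build("article.txt", Path(tmp), log)
    article_text = article.read_text()
    print(log)
    print(article_text)
```

Asking for `article.txt` alone is enough: the rules tell the script that the article needs the result, and the result needs the data, so everything is rebuilt in the right order starting from an empty directory.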

Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics – a workflow created by Professor Lorena Barba’s group. It lets you run your computational studies on a public cloud, Microsoft Azure. The main advantage of the workflow is its speed: “public cloud resources today are able to provide performance similar to that of a university-run cluster, and therefore can be considered a suitable solution for computing of research.” (quoted from the paper)

Professor Lorena Barba’s research group cares a lot about reproducibility and writes about it on their blog. I recommend checking it out!

Making science reproducible can seem like a lot of work – and it is, but only at the start! In the long run, it actually saves a lot of time, as mentioned above. Fortunately, the community of people who care about open source software is happy to help. For example, the Flatiron Institute organized an astronomical software development workshop, where people shared their thoughts on open science and on how to keep building and maintaining the community. There are also more workshops coming up (stay tuned!).

Finally, if you take a look at the figures below from the Nature article mentioned above, you will see that, as the scientific community evolves, most of the factors that contribute to non-reproducible research (e.g. the availability of code and papers) can readily be eliminated!

Author’s note: I was not involved in the development of any of the workflows mentioned above. The people mentioned in this bite are among the gems of the open science community and are mentioned solely for their work on reproducible science.

Astrobite edited by Jana Steuer

Featured image credit: Stanford Medicine

About Sabina Sagynbayeva

I am a graduate student at Stony Brook University and my main area of research is planet formation. I am currently working on planetary migration using hydrodynamic simulations. I’m also interested in protoplanetary disks, but almost any subject related to planets fascinates me! In addition to doing research, I am also a singer-songwriter. I LOVE writing songs, and you can find them on any streaming platform.