Glossary¶

A¶

Acceptance Testing: A level of the software testing process where a system is tested for acceptability. The purpose of this test is to evaluate the system’s compliance with the project requirements and assess whether it is acceptable for its intended purpose.
Add: Git command used to add files to the staging area. It allows the user to specify which files or directories to include in the next commit.
Authors: Authors in this context are the contributors to The Turing Way project who have made a substantial contribution to the project, such as writing a subchapter, facilitating community interactions, maintaining the project’s infrastructure and supporting the participation of others through mentored contributions. All authors are named co-authors on the book as a whole.

B¶

Binder: A web-based service which allows users to upload and share fully-functioning versions of their projects in an environment they define.
BinderHub: A service which generates Binders. The most widely used is mybinder.org, which is maintained by the Binder team. It is possible to create other BinderHubs which can support more specialised configurations. One such configuration could include authentication to enable private repositories to be shared amongst close collaborators.
Binderize: To make a Binder of a project.
Branch: A parallel version of a repository. Although it is contained within the same repository, it allows you to develop changes separately and then merge them back into the ‘live’ repository or into other branches when appropriate.
Bug: This is an error, flaw or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.
Build: A group of jobs. For example, a build might have two jobs, each of which tests a project with a different version of a programming language. A build finishes when all of its jobs are finished.

C¶

Checkout: Git command to switch to a specific branch or commit, or to restore a specific version of a file. It allows you to activate older versions of files or commits, or to switch between active branches.
Citizen Science: The inclusion of members of the public in scientific research.
Clone: Copy of an existing Git repository, normally from some remote location to your local environment. When you clone a repo you copy its entire history as well as all branches.
Code Coverage: A measure which describes how much of the source code is exercised by the test suite.
Code of Conduct: Guidelines that establish the kind of behaviour encouraged in the community, outline the process by which problems or violations of the guidelines will be addressed and who will be in charge of enforcing them.
Code Review: An additional way of testing code quality. Code review gets another programmer to look over the new code and assess it. The goal is to point out strengths and also potential areas of improvement.
Commit: A snapshot of the project history. A commit can record changes to a single file or to a range of files and directories.
Commit Message: A message the user can attach to a commit to explain what it contains.
Communication Channel: The method of communication established for projects that might include mailing lists, community forums, chats and/or social media.
Community Member: People who use the project. They might be active in conversations or express their opinion on the project’s direction.
Computational Environment: Features of a computer which can impact the behaviour of work done on it, such as its operating system, what software it has installed, and what versions of software packages are installed.
Conda: A commonly used package management system.
Container: Lightweight files that can encapsulate an entire computational environment including its operating system, customised settings, software and files.
Continuous Delivery: A practice that automates and runs the steps required to build and test a project.
Continuous Deployment: A practice that automatically deploys the project each time a code change is made.
Continuous Integration: The practice of frequently (usually multiple times per day) integrating changes made to a project by individuals into a main, shared version. Also called CI.
Contributing Guidelines: Guidelines outlining how a person should go about contributing to an open source project.
Contributors: Everyone who has contributed something back to the project.

D¶

Data repository: See repository.
Digital Object Identifier: A digital object identifier (DOI) is a persistent identifier or handle used to identify objects uniquely, standardized by the International Organization for Standardization (ISO). An implementation of the Handle System, DOIs are in wide use mainly to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications. However, they have also been used to identify other types of information resources, such as commercial videos.
DMP: Data management plan.
Docker Container: An active computational environment executed from a Docker image.
Dockerfile: A file used for creating Docker images.
Docker Image: A machine-readable set of instructions to create a specified computational environment.
Docker Registry: A storage and distribution system for named Docker images. The registry allows Docker users to pull images locally, as well as push new images to the registry (given adequate access permissions when applicable). Such systems are often hosted in the cloud for ease of access.

E¶

End-to-End Test: A test that runs the program from beginning to end and verifies that the output is correct.
Equitable, Diverse and Inclusive Practices: Ensuring scholarship is open to anyone without barriers based on factors such as race, background, gender, and sexual orientation.

F¶

FAIR: Findable, Accessible, Interoperable and Reusable.

G¶

Generalisable: Combining replicable and robust findings allows us to form generalisable results. Note that running an analysis on a different software implementation and with a different dataset does not provide generalised results. Many more steps are needed to establish how well the work applies to all the different aspects of the research question. Generalisation is an important step towards understanding that the result is not dependent on a particular dataset nor a particular version of the analysis pipeline.
Git: A widely used, open source, distributed version control system, originally developed by Linus Torvalds, the creator of Linux. GitHub is built around Git.
GitHub: An online code hosting and version control service. It has a great many features to aid collaboration between users, and hosts a large number of open source projects.
GitLab: A web-based DevOps lifecycle tool, developed by GitLab Inc., that provides a Git repository manager with wiki, issue-tracking, and continuous integration and deployment pipeline features. It is released under an open source license.

H¶

Head: The latest commit on the branch which is currently checked out.
Helm: A package manager for Kubernetes applications.
Human Readable: A human readable medium or human readable format is any encoding of data or information that can be naturally read by humans. Some human readable formats, such as PDF, are not machine readable because they are not structured data: the way the data is represented on disk does not reflect the actual relationships present in the data.

I¶

Image: Files used for generating containers.
Integration Testing: A level of software testing where individual units are combined and tested as a group. The purpose of this level of testing is to expose faults in the interaction between integrated units.
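As a minimal sketch of an integration test, assume two hypothetical units, `parse_record` and `mean_speed`, where one calls the other. Instead of testing each in isolation, the test below exercises them together. It is written as a plain function using `assert`, the style that pytest collects (using pytest is an assumption, not a requirement).

```python
def parse_record(line: str) -> float:
    """Hypothetical unit 1: parse a 'car_id,speed' line and return the speed."""
    _, speed = line.split(",")
    return float(speed)


def mean_speed(lines: list[str]) -> float:
    """Hypothetical unit 2: average the speeds, relying on parse_record."""
    speeds = [parse_record(line) for line in lines]
    return sum(speeds) / len(speeds)


def test_parse_and_average_together():
    # Integration test: the two units are combined and tested as a group,
    # so a fault in how mean_speed uses parse_record would make it fail.
    assert mean_speed(["car1,30", "car2,40"]) == 35.0
```

Unit tests for `parse_record` alone could all pass while this test still catches a mismatch in how the units interact.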
Issues: Bug tracking system for GitHub. Collaborators can use issues to report bugs, request features, or set milestones for projects. Issues are tracked, reported, and closed by collaborators during the development process. They’re a great way of communicating with your team and reporting progress.
Issue Tracking: The process of tracking current issues on the project, such as bug fixing, rolling out new features or community engagement plans.

J¶

Job: An automated process that clones your repository into a virtual environment and then carries out a series of phases such as compiling your code and running tests. A job fails if the script returns an error (a non-zero exit code).
JupyterHub: A multi-user server for Jupyter Notebook instances.

K¶

Kubernetes: An open source system that autonomously manages computational clusters, automating the deployment and scaling of containerised applications.

L¶

License: This is a legal document that sets out the permissions for creative and academic work. It explains copyright, ensures proper attribution and sets out how others can copy, distribute and make use of the works.

M¶

Machine Readable: Machine readable refers to documents, data or other digital outputs whose content can be readily processed by computers. Machine readable documents are distinguished from machine readable data by having sufficient structure to provide the necessary context to support the business processes for which they are created. Machine readable data can be defined as data in a format that can be easily processed by a computer without human intervention, while ensuring no semantic meaning is lost.
Main: The repository’s main branch. Depending on the workflow, it is the one people work on or the one where the integration happens. On GitHub, this default branch used to be called ‘master’.
Maintainers: Contributors who are responsible for driving the vision and managing the organizational aspects of the project. They may also be authors and/or owners of the project.
Makefile: A text file that contains the configuration for the build.
Merge: The process of combining branches. Changes made on one or more branches are applied to another.
Merge Conflict: Incompatibilities between branches being merged.
Metadata: Data used to describe other data. For example, (35, 33, 27, 30, 33) is data, but the units (miles per hour) and the fact that these are the speeds of cars on a certain stretch of road are metadata.
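To make the speed example concrete, here is a minimal sketch that stores the metadata alongside the numbers and uses it to interpret them. The dictionary keys and the `to_kmh` helper are illustrative assumptions, not a standard metadata schema.

```python
MILES_TO_KM = 1.609344  # exact length of the international mile in km

speeds = [35, 33, 27, 30, 33]  # the data: on its own, just numbers

metadata = {  # illustrative keys, not a standard schema
    "description": "Speeds of cars on a certain stretch of road",
    "units": "miles per hour",
}


def to_kmh(values, meta):
    """Convert speeds to km/h, using the metadata to know the input units."""
    if meta["units"] == "miles per hour":
        return [round(v * MILES_TO_KM, 1) for v in values]
    if meta["units"] == "kilometres per hour":
        return list(values)
    # Without usable metadata, the numbers cannot be interpreted safely.
    raise ValueError(f"Unknown units: {meta['units']!r}")


print(to_kmh(speeds, metadata))
```

Without the `units` entry, there would be no way to tell whether 35 meant miles or kilometres per hour.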
Mock Test: A test in which a real object is replaced with a pretend one.
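A minimal sketch using Python’s standard `unittest.mock` module: the hypothetical function `fetch_temperature` normally talks to a weather service through a client object, and the test swaps that client for a `Mock`. The client, its `get` method and the `/weather/...` path are all invented for illustration.

```python
from unittest import mock


def fetch_temperature(client, city: str) -> float:
    """Hypothetical code under test: asks a weather service for a temperature."""
    response = client.get(f"/weather/{city}")
    return response["temp_c"]


def test_fetch_temperature_with_mock():
    # The real client would make a network call; a Mock stands in for it
    # and returns a canned response.
    fake_client = mock.Mock()
    fake_client.get.return_value = {"temp_c": 12.5}

    assert fetch_temperature(fake_client, "Leeds") == 12.5
    # Mocks also record how they were called, so the interaction can be checked.
    fake_client.get.assert_called_once_with("/weather/Leeds")
```

The ability to check how the fake was called is what distinguishes a mock from a simple test stub.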

O¶

Open Access: Making all published outputs freely accessible for maximum use and impact.
Open Access Publishing (gratis): The practice of making research publications available to anyone to read without charge.
Open Access Publishing (libre): Libre open access is gratis, meaning the research is available free of charge, but it goes further by granting users the right to copy, reuse, and remix the publication.
Open Data: Documenting and sharing research data openly for re-use.
Open Educational Resources: Making educational resources publicly available to be re-used and modified.
Open License: A license is a document that specifies what can and cannot be done with a work. It grants permissions and states restrictions. Broadly speaking, an open license is one that grants permission to access, re-use and redistribute a work with few or no restrictions.
Open Notebooks: An emerging practice of documenting and sharing the experimental process of trial and error.
Open Project: Same as Open Science or Open Research Projects. A project in which a significant amount of collaboration between the core or leadership team and the wider community takes place in the form of online interactions. Community interactions should maintain transparency and openness of the project to facilitate the growth of your community.
Open Scholarship: A concept that extends open research further. It relates to making other aspects of scientific research open to the public, such as open educational resources, inclusive practices and citizen science.
Open Source Hardware: Documenting designs, materials, and other relevant information related to hardware, and making them freely accessible and available.
Open Source Software: Documenting research code and routines, and making them freely accessible and available.
ORCID: Open Researcher and Contributor ID. A long-lasting, unique identifier for you as a researcher.
Owner: The person or people with administrative ownership over the organization or repository (not always the same as the original author).

P¶

Package Management System: A tool for installing, managing, and uninstalling software packages including specific versions.
Persistent Identifier: A long-lived method for identifying a resource that is unique, and widely understandable by a community.
Pattern: A pattern rule is a rule that contains exactly one % character in the target, which can be used to match a part of a filename.
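The way `%` matches part of a filename can be sketched in Python. This is a simplified model, not Make itself: the helpers `match_pattern` and `apply_stem` are hypothetical, and Make’s special handling of directory names in patterns is ignored. As in Make, `%` must match a non-empty part of the name (the “stem”), which is then substituted into the prerequisite pattern.

```python
from typing import Optional


def match_pattern(pattern: str, filename: str) -> Optional[str]:
    """Return the stem that '%' matches in `filename`, or None if no match.

    Simplified sketch: ignores Make's handling of directories in patterns.
    """
    if pattern.count("%") != 1:
        raise ValueError("a pattern must contain exactly one '%'")
    prefix, suffix = pattern.split("%")
    # '%' must match a non-empty part of the name between prefix and suffix.
    if (len(filename) > len(prefix) + len(suffix)
            and filename.startswith(prefix)
            and filename.endswith(suffix)):
        return filename[len(prefix):len(filename) - len(suffix)]
    return None


def apply_stem(pattern: str, stem: str) -> str:
    """Substitute a matched stem into another pattern, e.g. a prerequisite."""
    return pattern.replace("%", stem, 1)


# For a rule like `%.o: %.c`, building main.o would need main.c.
stem = match_pattern("%.o", "main.o")
print(apply_stem("%.c", stem))
```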
Persona: A persona is a detailed description of an imaginary user or member, based on real-world observations and understandings of existing members or potential future members.
Persona Canvas: A template that can be used to assemble all your responses in one place, share this tangible representation of your mental model (abstract concepts from your thoughts) with your colleagues, and create a common language for communicating about your community members, users, and contributors.
Phony Target: A phony target is one that doesn’t correspond to a file on the filesystem. A target is marked as phony by making it a prerequisite of the .PHONY target.
Power Users: These are people who are already familiar enough with a platform to know the gotchas and tricks that make their experience more efficient.
Prerequisite: The prerequisite(s) of a rule correspond to files or other targets in the Makefile that must be up to date before the rule is run.
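The check Make performs on prerequisites can be sketched as follows: a target is rebuilt if it does not exist, or if any prerequisite has a newer modification time. The function `needs_rebuild` and the file names are hypothetical; real Make also brings the prerequisites themselves up to date first, which this sketch omits.

```python
import os
import tempfile


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Sketch of Make's rule: rebuild if the target is missing or older
    than any prerequisite. A missing prerequisite raises an error here."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        prereq = os.path.join(tmp, "data.csv")
        target = os.path.join(tmp, "summary.txt")
        open(prereq, "w").close()
        missing = needs_rebuild(target, [prereq])      # target does not exist
        open(target, "w").close()
        os.utime(prereq, (1_000, 1_000))               # prerequisite is older
        os.utime(target, (2_000, 2_000))
        up_to_date = needs_rebuild(target, [prereq])
        os.utime(prereq, (3_000, 3_000))               # prerequisite edited later
        after_edit = needs_rebuild(target, [prereq])
        return missing, up_to_date, after_edit


print(demo())
```

Setting modification times explicitly with `os.utime` keeps the demonstration independent of how quickly the files are created.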
Project Design: An early phase of the project where a project’s key features, structure, criteria for success, and major deliverables are all planned out.
Pull Request: Proposed changes to a remote repository. Collaborators without write access can send a pull request to the administrator with the changes they’ve made to the repo. The administrator can then approve and merge or reject the changes to the main repository. For open source projects, pull requests can be sent by anyone who has forked a project.
Push: Sending changes to a remote repo. The remote repository is updated with the changes pushed and now mirrors the local repo.

R¶

RDM: Abbreviation for research data management - see research data management for definition.
README: A file which contains useful information about a project such as what it is, how to use/install it, how to test it, and how to contribute to it.
Recipe: One or more shell commands that are executed by Make. Usually these commands update the target of the rule.
Regression Test: Comparing the result of a test before and after the code has been altered. If the output has changed unexpectedly, a problem has been introduced somewhere in the program, and the test fails with an error.
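A minimal regression test sketch: the output of a hypothetical analysis step, `summarise`, is compared against a stored reference. In practice the reference would be saved (for example with `json.dump`) from an earlier, trusted run; here an inline JSON string stands in for that file, and `check_regression` is an illustrative helper.

```python
import json


def summarise(values):
    """Hypothetical analysis step whose output we want to protect."""
    return {"n": len(values), "mean": round(sum(values) / len(values), 3)}


def check_regression(current, reference):
    """Return the sorted keys whose values differ from the stored reference."""
    return sorted(k for k in reference.keys() | current.keys()
                  if current.get(k) != reference.get(k))


# Stands in for a reference file written by an earlier, trusted run.
REFERENCE = json.loads('{"n": 5, "mean": 31.6}')

differences = check_regression(summarise([35, 33, 27, 30, 33]), REFERENCE)
if differences:
    raise AssertionError(f"Output changed for: {differences}")
```

Exact equality suits this rounded output; for floating-point results that may vary slightly between platforms, a tolerance-based comparison is usually safer.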
Rendered Output: What the text looks like when it is displayed as a web page, for example on GitHub.
Replicable: A result is replicable when the same analysis performed on different datasets produces qualitatively similar answers.
repo2docker: A tool to build Docker images from code repositories.
Repository: Same as Data or Code Repository. A long-lived place on the internet where resources (be they data, software, publications or anything else) can be stored and accessed. This term is often shortened to ‘repo’.
Reproducible: A result is reproducible when the same analysis steps performed on the same dataset consistently produce the same answer.
Research Compendia: A collection of all digital parts of a research project, including data, code and texts (protocols, reports, questionnaires, metadata). The collection is created in such a way that reproducing all results is straightforward.
Research Data Management: Refers to the organisation, storage and preservation of data created during a research project. It covers initial planning, day-to-day processes and long-term archiving and sharing. Shortened to RDM.
Research Ethics: Research ethics are the moral principles that govern how researchers should carry out their work. These principles are used to shape research regulations agreed by groups such as university governing bodies, communities or governments. All researchers should follow any regulations that apply to their work.
Review: Suggesting changes to an already created pull request, or asking for further changes to be committed to it.
Risk Assessment: This is used to help choose the appropriate sustainable software concepts for your project.
Risk Matrix: A way of quantifying the risk associated with the thing you’re interested in. One axis measures exposure in some way, and the other the impact of a mishap. The further a point lies from the origin, the more safeguards are needed to make the risk acceptable.
Roadmapping: This is the creation of a roadmap for your project. It is an outline for the work you need to do. It covers your goals, vision and a timeline for tasks.
Robust: A result is robust when the same dataset is subjected to different analysis workflows to answer the same research question (for example one pipeline written in R and another written in Python) and a qualitatively similar or identical answer is produced. Robust results show that the work is not dependent on the specificities of the programming language chosen to perform the analysis.
Rule: An element of the Makefile that defines something that must be built, usually consists of targets, recipes, and optionally, prerequisites.
Runtime Test: Tests embedded within the program which are run as part of it.
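A minimal sketch of runtime tests: the checks live inside a hypothetical analysis function, `mean_speed`, so they run every time the program does rather than only when a separate test suite is executed.

```python
import math


def mean_speed(speeds: list[float]) -> float:
    """Hypothetical analysis function with runtime tests embedded in it."""
    # Checks on the inputs run on every call, as part of the program itself.
    if not speeds:
        raise ValueError("mean_speed needs at least one value")
    if any(s < 0 for s in speeds):
        raise ValueError("speeds cannot be negative")
    result = sum(speeds) / len(speeds)
    # A check on the output. `assert` statements are skipped when Python
    # runs with the -O flag, so checks that must always run should raise.
    assert math.isfinite(result), "mean speed is not a finite number"
    return result
```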

S¶

Self Archiving: Placing a publication or other research outputs in a suitable repository, institutional or subject-based, following the possible restrictions posed by the publisher, for example an embargo period, or limits on the allowed version to be deposited in such archives.
SHA: A unique string of numbers and letters used to identify every commit or node in the repository.
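To show where these strings come from: Git computes an object’s ID by hashing a short header (`<type> <size>` followed by a null byte) together with the object’s content. The sketch below assumes Git’s default SHA-1 object format (newer Git versions can also use SHA-256); `git_object_sha1` is a hypothetical helper name. Commit IDs are computed in the same way with the type `commit`, over text that includes the tree and parent IDs, which is why changing any part of the history changes the SHA.

```python
import hashlib


def git_object_sha1(obj_type: str, content: bytes) -> str:
    """Compute the ID Git gives an object, assuming the SHA-1 object format.

    The hashed bytes are: '<type> <size in bytes>', a null byte, the content.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# For file contents ("blob" objects) this should agree with `git hash-object`.
print(git_object_sha1("blob", b"hello\n"))
```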
Smoke Testing: Very brief initial checks that ensure the basic requirements for running the project hold. If these fail, there is no point proceeding to additional levels of testing until they are fixed.
Staged: Describes changes that have been added to the staging area and will be included in the next Git commit.
Stochastic Code: Code which, while correct, does not always output the same result. For example a program that outputs ten random numbers will generate a different result each time, despite being correct.
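A minimal sketch of two common ways to test stochastic code, using Python’s standard `random` module: fix the seed so that a run is repeatable, or check properties that every correct output must satisfy. The function `ten_random_numbers` is a hypothetical stand-in for the program described above.

```python
import random


def ten_random_numbers(rng: random.Random) -> list[float]:
    """Stochastic code: the output depends on the random number generator."""
    return [rng.random() for _ in range(10)]


# Unseeded generators give different, equally correct, results on each run.
# Seeding the generator makes a run repeatable, which helps in tests.
run_a = ten_random_numbers(random.Random(42))
run_b = ten_random_numbers(random.Random(42))
assert run_a == run_b

# Properties that hold for every correct output can be tested without a seed:
# random() always returns a float in the half-open range [0.0, 1.0).
values = ten_random_numbers(random.Random())
assert len(values) == 10 and all(0.0 <= v < 1.0 for v in values)
```

Passing the generator in as an argument, rather than using the module-level functions, keeps the randomness under the caller’s control.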
Syntax: The structure of statements in a computer language.
System Testing: A level of the software testing process where a complete, integrated system is tested. The purpose of this test is to evaluate whether the system as a whole gives the correct outputs for given inputs. Also see end to end test.

T¶

Target: The outcome of a rule in a Makefile. It is usually a file. If it is not a file, it’s a phony target.
Test Driven Development: A process of code development where unit tests are written before the units themselves.
Test Stub: Fake implementations of parts of code which are used in testing to remove dependencies.
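A minimal sketch of a hand-written test stub: the hypothetical function `fastest_speed` depends on some database object, and `DatabaseStub` replaces it with a fake that returns canned data. The `get_speeds` method name and the road name are invented for illustration.

```python
class DatabaseStub:
    """Hand-written stub: returns canned data instead of querying a database.

    `get_speeds` is a hypothetical method name used for illustration.
    """

    def get_speeds(self, road: str) -> list[int]:
        return [35, 33, 27, 30, 33]


def fastest_speed(db, road: str) -> int:
    """Code under test: depends on any object that provides get_speeds()."""
    return max(db.get_speeds(road))


def test_fastest_speed():
    # The stub removes the dependency on a real database, so the test is
    # fast and gives the same result every time.
    assert fastest_speed(DatabaseStub(), "A1") == 35
```

Unlike a mock, this stub only supplies answers; it does not record or verify how it was called.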
Test Suite: The tests that have been written for a project.
Testing Framework: Tools that make writing and running tests less labour intensive.
Travis: A commonly used continuous integration platform.

U¶

Unit: A small piece of code that does one simple thing. It usually has one or a few inputs and usually a single output.
Unit Testing: A level of the software testing process where individual units of software are tested. The purpose is to validate that each unit of the software performs as designed.
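A minimal unit testing sketch using Python’s built-in `unittest` framework. The unit, `celsius_to_fahrenheit`, is a hypothetical example of a small piece of code with one input and one output, and each test method checks one known case.

```python
import unittest


def celsius_to_fahrenheit(celsius: float) -> float:
    """A unit: one simple job, one input, one output."""
    return celsius * 9 / 5 + 32


class TestCelsiusToFahrenheit(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)

    def test_minus_forty_is_the_same_in_both_scales(self):
        self.assertEqual(celsius_to_fahrenheit(-40), -40)
```

Saved in a file such as `test_conversion.py` (a hypothetical name), the tests can be run with `python -m unittest test_conversion`.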

V¶

Virtual Machine: A simulated computer that can encapsulate an entire computational environment including its operating system, customised settings, software and files.

W¶

X¶

Y¶

YAML: A human readable and writable data serialisation language which is used by many projects for configuration files.

The Turing Way
