Glossary
A
- Acceptance Testing
A level of the software testing process where a system is tested for acceptability. The purpose of this test is to evaluate the system’s compliance with the project requirements and assess whether it is acceptable for the purpose.
- Add
Git command used to add files to the staging area. Allows the user to specify which files or directories to include in the next commit.
- Authors
Authors in this context are the contributors to The Turing Way project who have made a substantial contribution to the project, such as writing a subchapter, facilitating community interactions, maintaining the project’s infrastructure or supporting the participation of others through mentored contributions. All authors are named co-authors on the book as a whole.
B
- Binder
A web-based service which allows users to upload and share fully-functioning versions of their projects in an environment they define.
- BinderHub
A service which generates Binders. The most widely used is mybinder.org, which is maintained by the Binder team. It is possible to create other BinderHubs which can support more specialised configurations. One such configuration could include authentication to enable private repositories to be shared amongst close collaborators.
- Binderize
To make a Binder of a project.
- Branch
A parallel version of a repository. Although it is contained within the same repository, it allows you to develop it separately and then merge changes back into the ‘live’ repository or with other branches when appropriate.
- Bug
An error, flaw or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.
- Build
A group of jobs. For example, a build might have two jobs, each of which tests a project with a different version of a programming language. A build finishes when all of its jobs are finished.
C
- Checkout
Git command to switch to a specific file, branch, or commit. Allows you to activate older versions of files or commits or switch between active branches.
- Citizen Science
The inclusion of members of the public in scientific research.
- Clone
Copy of an existing Git repository, normally from some remote location to your local environment. When you clone a repo you copy its entire history as well as all branches.
- Code Coverage
A measure which describes how much of the source code is exercised by the test suite.
- Code of Conduct
Guidelines that establish the kind of behaviour encouraged in the community, outline the process by which problems or violations of the guidelines will be addressed and who will be in charge of enforcing them.
- Code Review
An additional way of testing code quality. Code review gets another programmer to look over the new code and assess it. The goal is to point out strengths and also potential areas of improvement.
- Commit
Snapshot of project history. A commit can be made after changes to a single file or to a range of files and directories.
- Commit Message
A message the user can attach to a commit to explain what it contains.
- Communication Channel
The method of communication established for projects that might include mailing lists, community forums, chats and/or social media.
- Community Member
People who use the project. They might be active in conversations or express their opinion on the project’s direction.
- Computational Environment
Features of a computer which can impact the behaviour of work done on it, such as its operating system, what software it has installed, and what versions of software packages are installed.
- Conda
A commonly used package management system.
- Container
Lightweight files that can encapsulate an entire computational environment including its operating system, customised settings, software and files.
- Continuous Delivery
Automates and runs the steps required to build and test a project, so that it is ready to be released at any time.
- Continuous Deployment
Automatically deploys the project each time a code change is made.
- Continuous Integration
The practice of frequently integrating changes made by individuals into a main, shared version of a project (usually multiple times per day). Also called CI.
- Contributing Guidelines
Guidelines outlining how a person should go about contributing to an open source project.
- Contributors
Everyone who has contributed something back to the project.
D
- Data repository
See repository.
- DMP
Data management plan.
- Docker Container
An active computational environment executed from a Docker image.
- Dockerfile
A text file containing the instructions used to build a Docker image.
- Docker Image
A machine-readable set of instructions to create a specified computational environment.
- Docker Registry
A storage and distribution system for named Docker images. The registry allows Docker users to pull images locally, as well as push new images to the registry (given adequate access permissions when applicable). Such systems are often hosted in the cloud for ease of access.
- Digital Object Identifier
A digital object identifier (DOI) is a persistent identifier or handle used to identify objects uniquely, standardized by the International Organization for Standardization (ISO). An implementation of the Handle System, DOIs are in wide use mainly to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications. However, they also have been used to identify other types of information resources, such as commercial videos.
E
- Equitable, Diverse and Inclusive Practices
Ensuring scholarship is open to anyone without barriers based on factors such as race, background, gender, and sexual orientation.
- End to End Test
A test that runs the program from beginning to end and verifies that the output is correct.
F
- FAIR
Findable, Accessible, Interoperable and Reusable.
G
- Generalisable
Combining replicable and robust findings allows us to form generalisable results. Note that running an analysis on a different software implementation and with a different dataset does not provide generalised results. There will be many more steps to know how well the work applies to all the different aspects of the research question. Generalisation is an important step towards understanding that the result is not dependent on a particular dataset nor a particular version of the analysis pipeline.
- Git
Version control system that GitHub is built around. It is a widely used open source distributed version control system developed by the author of Linux.
- GitHub
An online code hosting and version control service. It has a great many features to aid collaboration between users, and hosts a large number of open source projects.
- GitLab
A web-based DevOps lifecycle tool, developed by GitLab Inc., that provides a Git repository manager with wiki, issue-tracking, and continuous integration and deployment pipeline features under an open-source license.
H
- Head
The latest commit on the branch which is currently checked out.
- Helm
A package manager for Kubernetes applications.
- Human Readable
A human readable medium or human readable format is any encoding of data or information that can be naturally read by humans. Some human readable formats, such as PDF, are not machine readable because they are not structured data: the representation of the data on disk does not reflect the actual relationships present in the data.
I
- Image
Files used for generating containers.
- Integration Testing
A level of software testing where individual units are combined and tested as a group. The purpose of this level of testing is to expose faults in the interaction between integrated units.
- Issues
Bug tracking system for GitHub. Collaborators can use issues to report bugs, request features, or set milestones for projects. Issues are tracked, reported, and closed by collaborators during the development process. They’re a great way of communicating with your team and reporting progress.
- Issue Tracking
The process of tracking current issues on the project, such as bug fixing, rolling out new features or community engagement plans.
J
- Job
An automated process that clones your repository into a virtual environment and then carries out a series of phases such as compiling your code and running tests. A job fails if the return code of the script indicates an error.
- JupyterHub
A multi-user server for Jupyter Notebook instances.
K
- Kubernetes
An open source system for automating the deployment, scaling and management of containerised applications across a cluster of computers.
L
- License
This is a legal document that sets out the permissions for creative and academic work. It explains copyright, ensures proper attribution and sets out how others can copy, distribute and make use of the works.
M
- Machine Readable
Machine readable refers to documents, data or other digital outputs whose content can be readily processed by computers. Such documents are distinguished from machine readable data by virtue of having sufficient structure to provide the necessary context to support the business processes for which they are created. Machine readable data can be defined as data in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost.
- Main
The repository’s main branch. Depending on the workflow, it is the one people work on or the one where the integration happens. This used to be called ‘Master’ in GitHub.
- Maintainers
Contributors who are responsible for driving the vision and managing the organizational aspects of the project. They may also be authors and/or owners of the project.
- Makefile
A text file that contains the configuration for the build.
- Merge
The process of combining branches. Changes made on one or more branches are applied to another.
- Merge Conflict
Incompatibilities between branches being merged.
- Metadata
Data used to describe other data. For example, (35, 33, 27, 30, 33) is data, but the units (miles per hour) and the fact that these are the speeds of cars on a certain stretch of road are metadata.
- Mock Test
Replacing a real object with a pretend one for use when running tests.
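The Mock Test entry can be illustrated with a minimal sketch using Python's standard `unittest.mock` module. The function `fetch_speed_limit`, the `client` object with its `get` method, and the `/roads/...` path are hypothetical names invented for this illustration, not part of any real service.

```python
from unittest.mock import Mock


def fetch_speed_limit(client, road_id):
    """Ask a (hypothetical) remote service for a road's speed limit in mph."""
    response = client.get(f"/roads/{road_id}")
    return response["speed_limit_mph"]


# Replace the real network client with a pretend one that returns canned data.
fake_client = Mock()
fake_client.get.return_value = {"speed_limit_mph": 30}

result = fetch_speed_limit(fake_client, road_id=7)

# The mock records how it was called, so the test can also check the interaction.
fake_client.get.assert_called_once_with("/roads/7")
```

Because the pretend client never touches a network, a test written this way runs quickly and gives the same answer on any machine.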
O
- Open Access
Making all published outputs freely accessible for maximum use and impact.
- Open Access Publishing (gratis)
The practice of making research publications available to anyone to read without charge.
- Open Access Publishing (libre)
Libre open access is gratis, meaning the research is available free of charge, but it goes further by granting users the right to copy, reuse, and remix the publication.
- Open data
Documenting and sharing research data openly for re-use.
- Open Educational Resources
Making educational resources publicly available to be re-used and modified.
- Open Source Hardware
Documenting designs, materials, and other relevant information related to hardware, and making them freely accessible and available.
- Open License
A license is a document that specifies what can and cannot be done with a work. It grants permissions and states restrictions. Broadly speaking, an open license is one that grants permission to access, re-use and redistribute a work with few or no restrictions.
- Open Notebooks
An emerging practice, documenting and sharing the experimental process of trial and error.
- Open Scholarship
This is a concept that extends open research further. It relates to making other aspects of scientific research open to the public such as open educational resources, having inclusive practice and citizen science.
- Open Project
Same as Open Science or Open Research Projects. A project in which a significant amount of collaboration between the core or leadership team and the wider community takes place in the form of online interactions. Community interactions should maintain transparency and openness of the project to facilitate the growth of your community.
- Open Source Software
Documenting research code and routines, and making them freely accessible and available.
- ORCID
Open Researcher and Contributor ID. It is a long-lasting unique identifier for you as a researcher.
- Owner
The person or people who have administrative ownership over the organization or repository (not always the same as the original author).
P
- Package Management System
A tool for installing, managing, and uninstalling software packages including specific versions.
- Persistent Identifier
A long-lived method for identifying a resource that is unique, and widely understandable by a community.
- Pattern
In a Makefile, a pattern rule is a rule that contains exactly one % character in the target, which can be used to match a part of a filename.
- Persona
A persona is the detail of an imaginary user or member, based on real-world observations and understandings of existing members or potential future members.
- Persona Canvas
The persona canvas can be used to assemble all your responses in one place, share this tangible information of your mental model (abstract concepts from our thoughts) with your colleagues and create a common language to communicate about your community members, users, and contributors.
- Phony Target
A phony target is one that doesn’t correspond to a file on the filesystem. A target is marked as phony by making it a prerequisite of the .PHONY target.
- Power Users
These are people who are already familiar enough with a platform to know the gotchas and tricks that make their experience more efficient.
- Prerequisite
The prerequisite(s) of a rule correspond to files or other targets in the Makefile that must be up to date before the rule is run.
- Project Design
An early phase of the project where a project’s key features, structure, criteria for success, and major deliverables are all planned out.
- Pull Request
Proposed changes to a remote repository. Collaborators without write access can send a pull request to the administrator with the changes they’ve made to the repo. The administrator can then approve and merge or reject the changes to the main repository. For open source projects pull requests can be sent by anyone that has forked a project.
- Push
Sending changes to a remote repo. The remote repository is updated with the changes pushed and now mirrors the local repo.
R
- RDM
Abbreviation for research data management - see research data management for definition.
- README
A file which contains useful information about a project such as what it is, how to use/install it, how to test it, and how to contribute to it.
- Recipe
One or more shell commands that are executed by Make. Usually these commands update the target of the rule.
- Regression Test
Comparing the result of a test before and after the code has been altered. If the output has changed, a problem has probably been introduced somewhere in the program, and the test reports an error.
- Replicable
A result is replicable when the same analysis performed on different datasets produces qualitatively similar answers.
- repo2docker
A tool to build Docker images from code repositories.
- Repository
Same as Data or Code Repository. A long-lived place on the internet where resources (be they data, software, publications or anything else) can be stored and accessed. This keyword is often shortened to ‘repo’.
- Reproducible
A result is reproducible when the same analysis steps performed on the same dataset consistently produce the same answer.
- Rendered Output
What the text will look like when it is displayed on GitHub or on a web page.
- Research Compendia
This is a collection of all digital parts of a research project including data, code, texts (protocols, reports, questionnaires, metadata). The collection is created in such a way that reproducing all results is straightforward.
- Research Data Management
Acronym: RDM. Refers to the organisation, storage and preservation of data created during a research project. It covers initial planning, day-to-day processes and long-term archiving and sharing.
- Research Ethics
Research ethics are the moral principles that govern how researchers should carry out their work. These principles are used to shape research regulations agreed by groups such as university governing bodies, communities or governments. All researchers should follow any regulations that apply to their work.
- Review
Suggesting changes to, or requesting additional commits on, an existing pull request.
- Risk Assessment
This is used to help choose the appropriate sustainable software concepts for your project.
- Risk Matrix
A risk matrix is a way of quantifying the risk attached to the thing you’re interested in. One axis measures exposure in some way, and the other the impact of a mishap. The further from the origin, the more safeguards are needed to make the risk acceptable.
- Roadmapping
This is the creation of a roadmap for your project. It is an outline for the work you need to do. It covers your goals, vision and a timeline for tasks.
- Robust
A result is robust when the same dataset is subjected to different analysis workflows to answer the same research question (for example one pipeline written in R and another written in Python) and a qualitatively similar or identical answer is produced. Robust results show that the work is not dependent on the specificities of the programming language chosen to perform the analysis.
- Rule
An element of the Makefile that defines something that must be built. It usually consists of targets, recipes and, optionally, prerequisites.
- Runtime Test
Tests embedded within the program which are run as part of it.
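The Regression Test and Runtime Test entries can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the function names `summarise` and `check_regression`, the JSON reference file, and the sample speeds are all hypothetical choices, and real projects often rely on a testing framework for the comparison step.

```python
import json
import os
import tempfile


def summarise(speeds):
    """Analysis under test: return the mean and maximum of a list of speeds."""
    # Runtime test: this check is part of the program and runs on every call.
    if not speeds:
        raise ValueError("speeds must not be empty")
    return {"mean": sum(speeds) / len(speeds), "max": max(speeds)}


def check_regression(reference_path, new_output):
    """Compare new output with a stored reference; store it on the first run."""
    if not os.path.exists(reference_path):
        with open(reference_path, "w") as f:
            json.dump(new_output, f)
        return True
    with open(reference_path) as f:
        reference = json.load(f)
    return reference == new_output


speeds = [35, 33, 27, 30, 33]
with tempfile.TemporaryDirectory() as tmp:
    reference_file = os.path.join(tmp, "reference.json")
    # First run records the reference output.
    first_run = check_regression(reference_file, summarise(speeds))
    # Same code and data: the output matches the reference.
    unchanged = check_regression(reference_file, summarise(speeds))
    # Different output (standing in for altered code): the regression is detected.
    changed = check_regression(reference_file, summarise(speeds + [90]))
```

Here `unchanged` is true and `changed` is false, which is the signal a regression test uses to flag that behaviour has shifted since the reference was recorded.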
S
- Self Archiving
Placing a publication or other research outputs in a suitable repository, institutional or subject-based, following the possible restrictions posed by the publisher, for example an embargo period, or limits on the allowed version to be deposited in such archives.
- SHA
Unique string of numbers and letters used to identify every commit or node in the repository.
- Smoke Testing
Very brief initial checks that ensure the basic requirements for running the project hold. If these fail, there is no point proceeding to additional levels of testing until they are fixed.
- Staged
Changes that have been marked for inclusion in the next Git commit.
- Stochastic Code
Code which, while correct, does not always output the same result. For example a program that outputs ten random numbers will generate a different result each time, despite being correct.
- Syntax
The structure of statements in a computer language.
- System Testing
A level of the software testing process where a complete, integrated system is tested. The purpose of this test is to evaluate whether the system as a whole gives the correct outputs for given inputs. Also see end to end test.
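The Stochastic Code entry mentions a program that outputs ten random numbers. A minimal Python sketch of that example follows, using the standard `random` module. The function name `ten_random_numbers` and the seed value 42 are arbitrary choices made for illustration. Fixing the seed is one common way to make such code repeatable for testing, though it is not the only approach.

```python
import random


def ten_random_numbers(seed=None):
    """Return ten pseudo-random numbers; a fixed seed makes the sequence repeatable."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(10)]


# Without a seed, each call is expected to give a different (but still correct) result.
unseeded_a = ten_random_numbers()
unseeded_b = ten_random_numbers()

# With the same seed, the two calls give identical results.
seeded_a = ten_random_numbers(seed=42)
seeded_b = ten_random_numbers(seed=42)
```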
T
- Target
The outcome of a rule in a Makefile. It is usually a file. If it is not a file, it’s a phony target.
- Test Driven Development
A process of code development where unit tests are written before the units themselves.
- Test Stub
Fake implementations of parts of code which are used in testing to remove dependencies.
- Test Suite
The tests that have been written for a project.
- Testing Framework
Tools that make writing and running tests less labour intensive.
- Travis
A commonly used continuous integration platform.
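As a rough illustration of the Test Stub entry, the Python sketch below swaps a dependency for a fake implementation. All names (`load_speeds_from_database`, `load_speeds_stub`, `fastest_speed`) are hypothetical, and passing the dependency in as an argument is just one of several ways to make it replaceable.

```python
def load_speeds_from_database():
    """Real dependency (hypothetical): would need a running database."""
    raise RuntimeError("no database available in the test environment")


def load_speeds_stub():
    """Test stub: a fake implementation that returns fixed, known data."""
    return [35, 33, 27, 30, 33]


def fastest_speed(load_speeds=load_speeds_from_database):
    """Code under test; the data source is passed in so tests can swap it."""
    return max(load_speeds())


# The test uses the stub, so it does not depend on the database at all.
stubbed_result = fastest_speed(load_speeds=load_speeds_stub)
```

With the stub in place, `fastest_speed` can be tested on its own, and a failure points at the function rather than at the missing database.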
U
- Unit
A small piece of code that does one simple thing. It usually has one or a few inputs and usually a single output.
- Unit Testing
A level of the software testing process where individual units of software are tested. The purpose is to validate that each unit of the software performs as designed.
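The Unit and Unit Testing entries can be sketched with Python's standard `unittest` module, which is one example of a testing framework. The unit `mph_to_kph` and the test class name are hypothetical, chosen only for illustration.

```python
import unittest


def mph_to_kph(speed_mph):
    """A unit: one small function with a single input and a single output."""
    return speed_mph * 1.609344


class TestMphToKph(unittest.TestCase):
    """Unit tests: each checks one behaviour of the unit in isolation."""

    def test_zero(self):
        self.assertEqual(mph_to_kph(0), 0)

    def test_known_value(self):
        self.assertAlmostEqual(mph_to_kph(10), 16.09344)


# Collect the tests into a small test suite and run it.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMphToKph)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The `outcome` object reports how many tests ran and whether they all passed, which is the information a continuous integration job would typically act on.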
V
- Virtual Machine
A simulated computer that can encapsulate an entire computational environment including its operating system, customised settings, software and files.
W
X
Y
- YAML
A human readable/writable data serialisation language which is used by many projects for configuration files.