Git usage methodology
This document proposes a methodology for using the Git version control system, together with a set of generic recommendations on the evolution and semantic versioning of artefacts.
Git versioning system
The Git system [git] and the public Git server GitHub [github] are frequently chosen for version management in European Commission projects, including the current one. In this section, we introduce the technical concepts necessary to understand the following sections.
Git is software for tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development. This system can be used in virtually any context where the collaborative evolution of artefacts is organised in a filesystem. Its goals, among others, include speed, data integrity, and support for distributed, non-linear workflows.
Git's object database stores four types of objects (blob, tree, commit and annotated tag); the first three are described below.
A blob (binary large object) is the content of a file. Blobs have no file name, timestamp, or other metadata. Each blob holds the data of one version of a file.
A tree object is the equivalent of a directory. It contains a list of file names, each with some type bits and a reference to a blob or tree object that is that file, symbolic link, or directory’s contents. These objects are a snapshot of the source tree.
A commit object links tree objects together into history. It contains the name of a tree object (of the top-level source directory), a timestamp, a log message, and the names of zero or more parent commit objects.
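As a minimal sketch of why a blob carries no file name or other metadata, the snippet below reproduces how Git derives the identifier of a blob, assuming the default SHA-1 object format: the identifier is a hash over a short header and the file content only. The function name `blob_id` and the sample contents are introduced here for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID Git assigns to a blob, assuming the SHA-1 object format."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical content map to the same blob, whatever their names.
readme = b"hello world\n"
copy_of_readme = b"hello world\n"
print(blob_id(readme) == blob_id(copy_of_readme))  # True
```

Because file names live in tree objects rather than in blobs, renaming a file without changing its content does not create a new blob.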
Additionally, Git manages named references to indicate locations of various commits.
Branches are named references that are advanced automatically to the new commit when a commit is made on top of them. They are used to create another line of development from a trunk. By default, Git has a master branch, which serves as the trunk. Usually, a branch is created to work on a new feature. Once the feature is completed, the branch is merged back into the master branch and then deleted.
Tags are like branch references but are fixed to a particular commit. A tag assigns a meaningful name to a specific version in the repository and is used to label important points in history.
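The difference between the two kinds of references can be illustrated with a toy model. This is only a sketch: the `Repo` class, its methods and the commit identifiers `c1` and `c2` are hypothetical and do not reflect how Git stores references internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Repo:
    """Toy model: references are names pointing at commit identifiers."""
    branches: Dict[str, Optional[str]] = field(default_factory=lambda: {"master": None})
    tags: Dict[str, str] = field(default_factory=dict)
    parents: Dict[str, Optional[str]] = field(default_factory=dict)

    def commit(self, branch: str, commit_id: str) -> None:
        # The new commit records the previous branch tip as its parent,
        # then the branch reference advances to the new commit.
        self.parents[commit_id] = self.branches[branch]
        self.branches[branch] = commit_id

    def tag(self, name: str, commit_id: str) -> None:
        # A tag is fixed: it keeps pointing at the commit it was created on.
        if name in self.tags:
            raise ValueError(f"tag {name} already exists")
        self.tags[name] = commit_id


repo = Repo()
repo.commit("master", "c1")
repo.tag("v0.1.0", "c1")
repo.commit("master", "c2")
print(repo.branches["master"], repo.tags["v0.1.0"])  # c2 c1
```

After the second commit the branch has moved on to `c2`, while the tag still labels `c1`.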
OP Context
Assumptions:
- Git is used for managing textual and binary sources: textual sources can smoothly evolve and be tracked by Git; binary sources have to be rewritten each time and no evolution tracking is possible (hopefully changes are based on incremental evolution and not arbitrary).
- The folder structure of the project is consistent and coherent in the evolution history and across the branches.
Requirements:
- Share the current API functionalities/capabilities or current model organisation/structure/approach.
- Allow comparing versions and immediately detecting whether versions are compatible.
- Each major, minor and patch version is published as a separate tag.
- Branches shall not be used for versioning but for specific functional purposes, such as individual bug fixes, individual new feature developments, etc.
- Allow a smooth transition from one major version to the next without disrupting current clients. Clients shall be guaranteed no disruptions for a published version for an established time frame.
- Clients that are unable to migrate to new versions might be supported.
Semantic versioning
In the world of software management, there exists a dreaded place called "dependency hell". This applies not only to software development but also to authoring models, architecture, documentation, or, broadly, any artefact. The bigger the body of work grows (software, model, documentation etc.), the more changes are introduced and the more dependencies accumulate, which makes it more likely to end up, one day, in this pit of despair.
Releasing a new work package quickly becomes a nightmare. If the dependency specifications are too tight, you are in danger of version lock (the inability to upgrade a package without having to release new versions of every dependent package). If dependencies are specified too loosely, you will inevitably be bitten by version promiscuity (assuming compatibility with more future versions than is reasonable). Dependency hell is where you are when version lock and/or version promiscuity prevent you from easily and safely moving your project forward. [semver]
A solution to this problem is called "Semantic Versioning" [semver], which is a simple set of rules and requirements that dictate how version numbers are assigned and incremented. It involves describing clearly and precisely what is being published, which features are included and what the limitations are, in other words setting the expectations and scope. Once the artefacts are identified for the public, changes to them are communicated through specific increments of the version number and, where appropriate, through associated change notes.
X.Y.Z (Major.Minor.Patch)
Consider a version format of X.Y.Z (Major.Minor.Patch). Bug fixes not affecting the artefact increment the patch version. Backwards-compatible artefact additions/changes increment the minor version. Backwards-incompatible artefact changes increment the major version. For details, refer to the [semver] specification. Under this scheme, version numbers and the way they change convey meaning about the underlying artefact and what has been modified from one version to the next.
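The increment rules can be sketched as follows. This is a minimal illustration, not a complete implementation of the [semver] specification: the `Version` class and its `bump` method are names introduced here, and only the plain X.Y.Z form is handled.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Only the plain X.Y.Z form is handled; pre-release and build
        # metadata suffixes defined by the specification are out of scope.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, change: str) -> "Version":
        # Incrementing a part resets the less significant parts to zero.
        if change == "major":
            return Version(self.major + 1, 0, 0)
        if change == "minor":
            return Version(self.major, self.minor + 1, 0)
        if change == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


print(Version.parse("1.3.5").bump("minor"))  # 1.4.0
```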
Figure 1 depicts an example of how the changes in version number shall be interpreted under the semantic versioning scheme. For example, evolving from version 1.3.5 to 1.3.6 represents a backwards-compatible patch, from 1.3.5 to 1.4.0 a backwards-compatible new development, while switching from 1.3.5 to 2.0.0 breaks the backwards compatibility.
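Read in the other direction, two version numbers are enough to tell what kind of change separates them. The sketch below assumes plain X.Y.Z versions with a major version of at least 1 (the specification treats 0.y.z as unstable initial development); the function names `classify_change` and `is_backwards_compatible` are hypothetical.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def classify_change(old: str, new: str) -> str:
    """Return "major", "minor" or "patch" for an upgrade from old to new."""
    old_parts, new_parts = parse(old), parse(new)
    if new_parts <= old_parts:
        raise ValueError("the new version must be greater than the old one")
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


def is_backwards_compatible(old: str, new: str) -> bool:
    # Only a change of the major version signals a break in compatibility.
    return classify_change(old, new) != "major"


for target in ("1.3.6", "1.4.0", "2.0.0"):
    print(target, classify_change("1.3.5", target), is_backwards_compatible("1.3.5", target))
```

For the three upgrades from 1.3.5 discussed above, this reports a compatible patch, a compatible minor change and an incompatible major change respectively.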
Adopting semantic versioning has a number of advantages [despat].
- Accuracy and exact identification of artefact versions
- Traceability of artefact evolution in use for governance
- Minimizing impact on the client-side caused by artefact changes
- Guaranteeing that artefact changes do not lead to accidentally breaking compatibility on the semantic level
- Minimal effort to detect version incompatibility
- Clear separation of changes with different levels of impact and compatibility
- Clarity with regard to evolution timeline of the artefact
- Manageability of artefact versions and related governance effort (e.g., approval processes, quality gates, number of parallel versions, number of branches of artefact versions)
Git workflows
Git is a very flexible system that is method agnostic. A set of good practices, however, was developed over time and became widely accepted across communities of practitioners. In this section, we present a set of relevant Git workflows and branching strategies.
Basic branching
When committing a change, it is always best to avoid committing directly to the master branch. For instance, if the task is to add a new feature to the project, then a new branch shall be created just for that feature without affecting the main part of the project. Once the feature implementation is completed, the changes on that branch can be merged into the master branch. This is depicted in Figure 2.
When a new branch is created, Git keeps track of the commit from which it was branched off, so the history of changes to all files remains fully traceable.
The contributor makes changes locally on a support branch (for example a feature branch) and, when the work is complete, pushes all the changes to the remote repository. Then the contributor files a pull request, which means that the changes must be reviewed by peers before the branch is merged into the master branch. This process usually generates a number of improvement suggestions, which result in a series of additional minor commits. Using pull requests is a best practice generally adopted for ensuring a higher quality of work. When the pull request is accepted, the branch author or the repository maintainer merges the branch into the master or development branch, depending on the branching strategy. The pull request is closed and the feature branch is deleted from the repository.
Contribution types
Typical contribution - The typical day-to-day flow includes normal changes that contributors make to the codebase, changes that do not bring any heightened sense of urgency. These changes are ordinary in terms of size and complexity for your codebase and, generally, will make up the bulk of all the changes the contributors make. Since this flow will be used the most frequently, your strategy here must ensure proper coordination among the developers and support all relevant policies such as automated testing, pull requests, and deployments.
Emergency hotfixes - An emergency hotfix is a change, normally a bug fix, that is expedited to deal with an urgent incident or issue. Your flow must account for a contributor who needs to make an urgent change and get it all the way through your process and into production while still aligning with the typical development workflow.
Simple vs complex changes - As the practice has evolved, the emphasis on contributors working on smaller changes and limited batch sizes has increased. This is due to the broader use of flow-based delivery practices. However, there are still situations when large, complex changes must be made and the branching strategy must accommodate those situations.
Standard vs experimental changes - Contributors feel greater certainty about how standard code changes will perform than about experimental code changes. For example, if a new untested approach is evaluated, or an experimental feature is developed, it is almost always uncertain how well the result integrates with the existing codebase. In that case, the changes may or may not actually go to production but still need to be shared with other contributors. The branching strategy must account for these types of experimental changes.
Branch types
Typically the branches are divided into two categories, the main branches: master and development; and supporting branches: feature, hotfix, release.
Any repository holds at least one main branch. If a development branch is adopted, there are two main branches with an infinite lifetime.
The branching strategy needs to support parallel work between contributors, ease the tracking of features, prepare for production releases and assist in quickly fixing live production problems. This is done with supporting branches. Unlike the main branches, these branches always have a limited lifetime, since they are removed after the work is completed.
Master branch - Every Git repository has a trunk (also referred to as main, mainline, or the master branch). When a Git repository is created, the trunk exists automatically as the implicit first branch. The use of a trunk and the timing of changes landing on it vary depending on the exact branching strategy being used. In trunk-based development, the trunk is the central branch to which all developers send their code changes.
Feature branch - A feature branch is used to develop a new feature. It can be short- or long-lived depending on the specific branching flow. The branch is often used by a single contributor for only their changes, but it is possible to share it with other contributors as well. Feature branches are branched off from either the trunk or the development branch, depending on the branching strategy.
Development branch (optional) - The development branch is a long-lived feature branch that holds changes made by contributors before they are ready to go to production. It parallels the trunk and is never removed. Some teams have the development branch correspond with a non-production environment. As such, commits to the development branch trigger test environment deployments if automation is set up. Development and trunk are frequently bidirectionally integrated, and it is typical for a contributor to bear the responsibility of integrating them. Some branching strategies avoid a development branch and branch feature or hotfix branches off directly from the trunk. This is decided by the branching strategy; in our case, we leave the development branch out.
Release branch - A release branch can be either short-lived or long-lived depending on the strategy. In either case, the release branch reflects a set of changes that are intended to go through the production release process. It usually involves, among others, increasing the version number, announcing a release, fixing last-minute bugs, but no new features are included.
Hotfix branch - A hotfix branch is generally used to hold changes related to emergency bug fixes. Hotfix branches are typically short-lived and are split off from a release or main branch. They are common in projects with explicitly versioned artefacts.
Naming conventions
Consider the (branch_type/branch_id) naming convention for branches. The branch_type is one of the branch types used in the adopted branching strategy: master, feature, hotfix, release. The branch_id is a unique identifier. Where the development process is managed with a ticket management system, the branch_id shall be the ticket id. Alternatively, branches can be given mnemonic identifiers, but doing so is discouraged as a regular practice.
branch_type/branch_id
For example, the implementation of a feature registered under id "#32" in the ticket management system shall be developed in the branch named "feature/32", whereas bug "#45" shall be fixed in the branch named "hotfix/45". Preparation of release "0.5.0" shall be done in the branch named "release/0.5.0".
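Such a convention is easy to check automatically, for example in a continuous integration step. The sketch below is one possible check, under two assumptions that go beyond the text: the main branch is simply named "master" without an identifier, and identifiers consist of letters, digits, dots, underscores and hyphens. The names `BRANCH_PATTERN` and `is_valid_branch_name` are hypothetical.

```python
import re

# Supporting branches follow branch_type/branch_id; the identifier alphabet
# used here is an assumption made for this sketch.
BRANCH_PATTERN = re.compile(r"(feature|hotfix|release)/[A-Za-z0-9._-]+")


def is_valid_branch_name(name: str) -> bool:
    """Check a branch name against the branch_type/branch_id convention."""
    return name == "master" or BRANCH_PATTERN.fullmatch(name) is not None


for name in ("feature/32", "hotfix/45", "release/0.5.0", "bugfix/45", "feature/"):
    print(name, is_valid_branch_name(name))
```

The first three names are accepted, while "bugfix/45" (unknown branch type) and "feature/" (missing identifier) are rejected.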
vX.Y.Z
Tags, on the other hand, shall always be named according to the semantic versioning scheme described in the previous section. A good practice is to add the prefix "v" for "version", like this: "vX.Y.Z". For example, the release "0.5.1" shall be tagged with "v0.5.1".
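A small sketch of the tag convention follows. It shows how a tag name can be built from a version and, in the other direction, why tags should be ordered by their numeric parts rather than alphabetically. The helper names `tag_for` and `version_of` and the sample tags are introduced here for illustration.

```python
import re
from typing import Tuple

# "v" followed by three dot-separated numbers without leading zeros.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def tag_for(version: str) -> str:
    """Build the tag name for a released version."""
    return f"v{version}"


def version_of(tag: str) -> Tuple[int, ...]:
    """Extract the numeric version from a tag, rejecting malformed tags."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a version tag: {tag}")
    return tuple(int(group) for group in match.groups())


tags = ["v0.10.0", "v0.5.1", "v0.9.0"]
print(tag_for("0.5.1"))              # v0.5.1
print(sorted(tags, key=version_of))  # ['v0.5.1', 'v0.9.0', 'v0.10.0']
```

A plain alphabetical sort would place "v0.10.0" first, which is why the numeric key is used.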
Simplified Gitflow model
The Gitflow model has been developed over 10 years [gitflow] and is the result of crystallising Git usage best practices. It involves two main branches: master and development. In this section, we describe a simplified version of it, which uses only a single main branch: the master.
Figure 3 depicts a prototypical branch organisation covering the typical use cases presented in this report.
Developing a new feature
New developments are done by branching off a new feature branch from the master, or occasionally from another feature branch. The feature branch is always merged back into the master and deleted as soon as the development is completed and reviewed in a pull request process. During development, it may not be known when the feature will be released, so it can be incorporated in the next or a future version.
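The life cycle of a feature branch can be written down as a sequence of Git commands. The sketch below only builds and prints the commands rather than executing them. It assumes a remote named "origin" and a pull request that is reviewed and merged on the hosting platform, which happens outside the Git command line. The function name `feature_branch_commands` is hypothetical.

```python
from typing import List


def feature_branch_commands(ticket_id: int, remote: str = "origin") -> List[List[str]]:
    """List the Git commands for developing one feature (nothing is executed)."""
    branch = f"feature/{ticket_id}"
    return [
        # Start from an up-to-date master and branch off.
        ["git", "checkout", "master"],
        ["git", "pull", remote, "master"],
        ["git", "checkout", "-b", branch],
        # ... commits are made here, then the branch is published for review.
        ["git", "push", "-u", remote, branch],
        # After the pull request is merged on the hosting platform,
        # update master and delete the branch locally and remotely.
        ["git", "checkout", "master"],
        ["git", "pull", remote, "master"],
        ["git", "branch", "-d", branch],
        ["git", "push", remote, "--delete", branch],
    ]


for command in feature_branch_commands(32):
    print(" ".join(command))
```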
Releasing a new version
To prepare a new release, create a branch named after its version. All the work for the production release of the new version is done on this branch. This includes making last-minute adjustments and bug fixes, preparing the metadata for the release (version numbers, build dates, etc.), writing the release notes, etc. By doing all of this work on a release branch, the master branch is cleared to receive features for the next big release.
The key moment to branch off a new release branch from the master is when it reflects the desired state of the new release, incorporating all the planned features. All features targeted at future releases must wait until after the release branch is branched off.
Adding large new features on the release branch is strictly prohibited. They must be merged into master and therefore wait for the next big release.
When the release branch is ready to become a real release, some actions need to be carried out. First, the release branch is merged into master and the release branch is closed. Next, the resulting commit on master must be tagged with the release version for easy future reference to this point in history.
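The release steps can be sketched as a command sequence in the same spirit. The commands are only built and printed, not executed. The remote name "origin", the use of an annotated tag, the tag message and the `--no-ff` merge option are choices made for this illustration, not requirements stated above, and `release_commands` is a hypothetical name.

```python
from typing import List


def release_commands(version: str, remote: str = "origin") -> List[List[str]]:
    """List the Git commands for publishing one release (nothing is executed)."""
    branch = f"release/{version}"
    tag = f"v{version}"
    return [
        # Branch off when master reflects the desired state of the release.
        ["git", "checkout", "-b", branch, "master"],
        # ... last-minute fixes, version metadata and release notes go here.
        ["git", "checkout", "master"],
        ["git", "merge", "--no-ff", branch],
        # Tag the merge commit on master with the release version.
        ["git", "tag", "-a", tag, "-m", f"Release {version}"],
        # Close the release branch and publish master and the tag.
        ["git", "branch", "-d", branch],
        ["git", "push", remote, "master"],
        ["git", "push", remote, tag],
    ]


for command in release_commands("0.5.0"):
    print(" ".join(command))
```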
Fixing bugs
Hotfix branches are very much like release branches in that they are also meant to prepare for a new production release, albeit unplanned. They arise from the necessity to act immediately upon an undesired state of a released version. When a critical bug is discovered in the active version, and it must be resolved immediately, a hotfix branch may be branched off from the corresponding tag on the master branch that marks the production version.
The essence is that most contributors can continue with feature development on the master branch while a dedicated person prepares a quick production fix. When the bug is resolved, a new version is released, incrementing the patch number only, and the changes are merged into the master.
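The bookkeeping of a hotfix (where to branch from, how to name the branch, which tag to publish) can be sketched as below. The sketch assumes that the version in production is the highest released version and that all tags follow the "vX.Y.Z" convention. The function names and the sample tags are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_tag(tag: str) -> Tuple[int, int, int]:
    # Tags are assumed to follow the "vX.Y.Z" convention.
    major, minor, patch = (int(part) for part in tag[1:].split("."))
    return (major, minor, patch)


def plan_hotfix(release_tags: List[str], ticket_id: int) -> Dict[str, str]:
    """Derive the starting tag, branch name and resulting tag for a hotfix."""
    # The hotfix starts from the tag of the version in production,
    # assumed here to be the highest released version.
    base = max(release_tags, key=parse_tag)
    major, minor, patch = parse_tag(base)
    return {
        "branch_from": base,
        "branch_name": f"hotfix/{ticket_id}",
        # A hotfix increments the patch number only.
        "new_tag": f"v{major}.{minor}.{patch + 1}",
    }


plan = plan_hotfix(["v0.4.9", "v0.5.0", "v0.5.1"], ticket_id=45)
print(plan)
```

For the sample tags, the hotfix for bug "#45" branches off from "v0.5.1" as "hotfix/45" and is released as "v0.5.2".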
Evolution patterns
Two in Production
An artefact evolves and new versions with improved content or functionality are offered regularly. At some point in time, the decision to evolve to a new major version is taken and the changes of the new version are not backwards compatible anymore. This is a breaking evolution for existing clients. However, clients evolve at different speeds. Some of them cannot be forced to upgrade to the latest version in a short time frame. Their development and evolution pace is different and is based on their needs and requirements.
As publishers, we need to ensure the possibility exists to gradually update an artefact without breaking existing clients, but also without having to maintain a large number of active versions. In the case of software development, this means running and maintaining multiple versions in production. In the case of ontologies, schemas and models this means maintaining multiple model versions published and officially declared as working standards.
The solution is to maintain two versions of a model in parallel as working standards as depicted in Figure 4. These two model versions provide roughly the same domain coverage but are incompatible with each other because they implement different modelling decisions and approaches.
Or, in the case of software products, deploy and support two versions of a software endpoint and its operations that provide variations of the same functionality, but do not have to be compatible with each other.
After a grace period, the older version must be decommissioned. The result is continuous update and decommission of the versions in a rolling and overlapping fashion.
To do so, choose the major versions that will be active in parallel (for example 2.x.x and 3.x.x) and inform the clients about the life cycle. When releasing a new major version into production or as a working standard, decommission the oldest active version and inform the clients about the update and migration options. This way, a sliding window of active versions is created in which clients have time to evolve to the new version before the old one is decommissioned. Offer a limited lifetime guarantee, as explained in the next section.
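The sliding window can be sketched with a few lines of code. The sketch assumes that major versions are plain integers and that the window holds two versions. The function name `release_major` is hypothetical.

```python
from typing import List, Tuple


def release_major(active: List[int], new_major: int, window: int = 2) -> Tuple[List[int], List[int]]:
    """Return (active versions, decommissioned versions) after a new major release."""
    versions = sorted(set(active) | {new_major})
    # The newest `window` versions stay active; older ones are decommissioned.
    return versions[-window:], versions[:-window]


active, decommissioned = release_major([2, 3], new_major=4)
print(active, decommissioned)  # [3, 4] [2]
```

Releasing 4.x.x while 2.x.x and 3.x.x are active keeps 3.x.x and 4.x.x in the window and marks 2.x.x for decommissioning.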
Limited Lifetime Guarantee
An artefact has been published and made available to at least one client. The publisher cannot manage or influence the evolution roadmaps of its clients, or the damage caused by forcing clients to change their implementation. Therefore, the publisher does not want to make any breaking changes in the active versions of the artefacts, but still wants to improve them and evolve them in the future.
To do so, the publisher has to let clients know for how long they can rely on the published version of an artefact. In addition, the publisher must guarantee not to break the published artefacts for a given fixed time frame. Typical time frames are multiples of 6 months (6, 12, 18, or 24 months), which seem to provide a good balance between provider and client needs in practice.
This pattern is used in combination with the previously described pattern of two (or more) active parallel versions for a limited time. On the one hand, this keeps the client safe from unwanted negative impacts or outages. On the other hand, this sets a well-known deadline in advance that the client can use to plan for a smooth transition to the new version.
After the timeframe expires the old version can either be decommissioned or discontinued. That is, either the version is declared obsolete and its usage is discouraged; or, the version is simply no longer maintained.
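Computing the end of a guarantee period is simple calendar arithmetic. The sketch below assumes that the guarantee starts on the publication date and is expressed in whole months. The function names and the dates used are purely illustrative.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Move a date forward by whole months, clamping to the end of shorter months."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_guaranteed(published: date, guarantee_months: int, today: date) -> bool:
    """Tell whether a published version is still within its guarantee period."""
    return today < add_months(published, guarantee_months)


published = date(2021, 8, 31)  # hypothetical publication date
print(add_months(published, 6))                       # 2022-02-28
print(is_guaranteed(published, 12, date(2022, 6, 1)))  # True
```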
A consequence of this approach is that the evolution can be planned well, because the fixed time windows are known long in advance. This, however, limits the possibility to respond to urgent unforeseen changes. Also, clients are forced to update, which may conflict with their roadmap, and the clients that cannot migrate to the new version are abandoned unless an additional service agreement is put in place for them.
This strategy is a middle ground between two of its variations: (a) the eternal lifetime guarantee and (b) the aggressive obsolescence. We briefly summarize these variations in the next sections, and an alternative variation, the experimental preview. We, however, discourage adopting any of them.
Aggressive Obsolescence
Once an artefact has been released, it evolves and new versions with added, removed or changed content or functionality are offered. In order to reduce effort, artefact providers do not want to support certain commitments for clients anymore, e.g., because they are no longer used regularly or are superseded by alternative versions.
In order to reduce maintenance efforts to a minimum, the publisher announces a decommissioning date as early as possible for the obsolete artefacts. The publisher declares such artefacts to be immediately deprecated (i.e., still available, but no longer recommended to be used) so that clients have barely enough time to upgrade to a newer or alternative version. Artefact support is removed as soon as the deadline has passed.
Such an approach may be considered disruptive and quite brutal by many clients. To acknowledge this and offer a balance of power between the publisher and the clients, the publisher can involve the clients or give them the possibility to steer the artefact design and evolution. This can, for example, take the form of an evolution workgroup, a standardisation committee or regular public consultations.
The consequences of this approach are that the publisher codebase is kept small and thus easier to maintain. The publisher must announce which features are deprecated and when they will be decommissioned, while the clients must migrate to the new version.
If the priority is serving each client diligently rather than saving costs, the next pattern proposes an approach.
Eternal Lifetime Guarantee
An artefact has been made available to at least one client. A new version is evolved and also published. However, one or more of the clients cannot be asked to upgrade to use the latest version.
To support clients that are unwilling or unable to migrate to newer API versions at all, guarantee to never break or discontinue a published artefact version.
This shall not mean, however, parallel developments. New features and evolutions shall be developed in the new versions of an artefact, while the old version may benefit from bug fixes, which ideally are also ported into the new version. Feature development on the old versions shall stop; otherwise, we reach the split point where the old version evolves differently, based on the requirements of a few clients rather than on the original artefact vision. At that point, the two evolutions shall be maintained in different repositories because they are no longer parallel versions but distinct artefacts.
The consequences of this approach are that the clients do not need to change, while the publisher becomes more attractive because clients can expect the artefacts to remain available for a long time. On the downside, innovation opportunities are lost and technical debt accumulates on the publisher side, which leads to increased maintenance costs.
Experimental Preview
A publisher is developing a new artefact version that differs significantly from the published version(s) and is still under intensive development. As a result, the publisher wants to be able to freely make any modifications necessary without any commitments to the clients. However, the publisher also wants to offer its clients early access so that these clients can start experimenting and integrating against the new artefact version and comment on the functionality and structure. This is in fact an approach in support of an iterative and incremental, or even agile, development process.
The idea is to enable providers to share the experimental artefact version with the clients with minimal risks and also obtain early adopter feedback without having to freeze the development process.
This can be done by providing early access to the artefact version in development on a best-effort basis without making any commitments about representation, structure or functionality offered, stability, and longevity. The lack of maturity and experimental nature must be clearly and explicitly articulated about this artefact version.
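One way to articulate the experimental nature explicitly is in the version number itself. The [semver] specification allows a pre-release version to be denoted by appending a hyphen and dot-separated identifiers to the version (for example "3.0.0-alpha.1"), which signals that the version is unstable. The sketch below detects such versions. Using pre-release identifiers for experimental previews is an assumption of this illustration rather than something prescribed above, and `is_preview` is a hypothetical name.

```python
import re

# X.Y.Z optionally followed by "-" and dot-separated pre-release identifiers.
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?")


def is_preview(version: str) -> bool:
    """Tell whether a version carries a pre-release suffix."""
    match = VERSION_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"not a recognised version: {version}")
    return match.group(4) is not None


for version in ("3.0.0-alpha.1", "3.0.0"):
    print(version, is_preview(version))
```

Clients can use such a check to keep preview versions out of production dependencies while still experimenting with them.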
This approach brings advantages for both clients and publishers. Clients have early access to innovation and the opportunity to influence the artefact design and development, thereby living according to agile values and principles such as welcoming change and responding to it continuously. Publishers have the flexibility to freely and rapidly change the artefact before declaring it stable. The downside is that it may be difficult for providers to attract clients due to the lack of long-term commitment or the perceived immaturity. Clients have to keep changing their implementation until a stable version is released, provided they do not run out of budget first.
The experimental preview, which covers the pre-release guarantees, is complemented by an application of the two versions in production approach for governing the life cycle of active artefact versions. The experimental preview can either be made available to all known or unknown clients or be restricted to a selected closed user group, which limits the support and communication efforts.
References
- [semver] Preston-Werner, T. (n.d.). Semantic Versioning 2.0.0. Semantic Versioning. https://semver.org/
- [despat] Zimmermann, O., & Stocker, M. (2021, March 15). Design Practice Reference. Leanpub. https://leanpub.com/dpr
- [soar] Pautasso, C. (2020, July 2). Software Architecture. https://leanpub.com/software-architecture
- [gitflow] Dwaraki, A., Seetharaman, S., Natarajan, S., & Wolf, T. (2015). GitFlow. Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research. https://doi.org/10.1145/2774993.2775064
- [git] Chacon, S., & Straub, B. (2014). Pro git (p. 456). Springer Nature. https://library.oapen.org/bitstream/handle/20.500.12657/28155/1001839.pdf
- [github] Where the world builds software. GitHub. (n.d.). https://github.com/