
The Importance of Pipeline Quality Gates and How to Implement Them

Key Takeaways

  • A quality gate is an enforced measure built into your pipeline that the software needs to meet before it can proceed to the next step
  • Security scans should be added as quality gates to your pipeline at the beginning of your project
  • While manual approval steps should be avoided, when needed, they should be built into the pipeline for improved accountability
  • While automated checks are preferred, you should build manual overrides for each gate to address urgent issues

  

There is no doubt that CI/CD pipelines have become a vital part of the modern development ecosystem that allows teams to get fast feedback on the quality of the code before it gets deployed. At least that is the idea in principle.

The sad truth is that too many companies fail to take full advantage of what a CI/CD pipeline offers, namely rapid test feedback and strong quality control, because they never implement effective quality gates in their pipelines.

What Is a Quality Gate and Why Do You Need One?

A quality gate is an enforced measure built into your pipeline that the software needs to meet before it can proceed to the next step. This measure enforces certain rules and best practices that the code needs to adhere to, preventing poor quality from creeping into the codebase.

It can also drive the adoption of test automation, as it requires testing to be executed in an automated manner across the pipeline.

This has a knock-on effect of reducing the need for manual regression testing in the development cycle, driving rapid delivery across the project.

These quality gates are typically automated, allowing the pipeline to self-monitor the quality of the code delivered.

Still, it is possible to place a manual verification step into a CI/CD pipeline to prevent accidental errors or ensure certain measures have been properly signed off.

How a Typical Pipeline Looks with Quality Gates in Place

So, I’ve briefly explained the purpose of quality gates, but it perhaps makes more sense to describe how a quality gate will affect the structure of a pipeline and check for quality at the different stages of a project.

While pipelines will all be structured differently based on their purpose and the type of environments a team is working with, the following quality checks are helpful in most pipelines.

Stage 1 - Setup and Checkout:

The developer checks out the code. A setup file should be present to ensure that the environment is built consistently for each developer. Part of this setup should include a set of linting standards that will also check that certain coding principles are being adhered to. This will prevent code from being deployed where it does not meet these linting standards.

Quality Check: Linting standards need to be met before the code build can be successful.
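
As a rough illustration of such a gate, the sketch below runs linting as an early pipeline job; the Node.js project and ESLint are assumptions on my part, and any linter that returns a non-zero exit code on violations can be used in the same way.

- name: Lint check
  task:
    jobs:
      - name: Linting
        commands:
          - checkout
          - npm ci
          # ESLint is an assumed tool choice; --max-warnings 0 makes even
          # warnings fail the job, stopping the pipeline before the build stage
          - npx eslint . --max-warnings 0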

Stage 2 - Build:

Once a developer is ready to submit their code, the pipeline will take their code and build it in a container/mocked environment. This environment is where the unit tests will be run.

Stage 3 – Execute Unit Tests/CI Tests:

These tests include both the unit tests written by the developer for the modules under development and some broader component tests which will represent the execution across all the modules, but at a mocked level. These component tests are especially useful when developers have worked on different components of code in isolation and some additional automated tests are required to ensure correct interoperability between the modules.  

Quality Check: Check that the execution of unit tests meets pre-set criteria, e.g., 100% successful completion of all tests with 90% code coverage achieved at the unit testing level.
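
To give a feel for how such a criterion can be enforced, here is a minimal sketch assuming a JavaScript project using Jest (an assumption on my part); the coverage threshold is passed on the command line so that the job fails whenever line coverage drops below 90% or any test fails.

- name: Unit and component tests
  task:
    jobs:
      - name: Unit tests with coverage gate
        commands:
          - checkout
          - npm ci
          # The 90% line coverage threshold mirrors the quality check above;
          # any failing test or coverage shortfall returns a non-zero exit
          # code and halts the pipeline
          - npx jest --ci --coverage --coverageThreshold='{"global":{"lines":90}}'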

Stage 4 - Static Analysis:

The relevant static analysis scans and security scans are run against the code base to ensure that certain coding and security best practices have been adhered to.

Quality Check: Successful completion of scans with 100% code coverage.
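
As one possible implementation, the sketch below assumes a SonarQube server with a quality gate configured on the project; the sonar.qualitygate.wait parameter makes the scanner wait for the server's verdict and fail the job if that gate is not passed.

- name: Static analysis
  task:
    jobs:
      - name: SonarQube scan
        commands:
          - checkout
          # SONAR_HOST_URL and SONAR_TOKEN are assumed to be configured as
          # pipeline secrets; the project key is a placeholder
          - sonar-scanner -Dsonar.projectKey=my-app -Dsonar.qualitygate.wait=true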

Stage 5 - Environment readiness check

These are contract-level tests that run to ensure that the test environment meets the expectations of the code being deployed. They could range from something as simple as getting a successful response from certain dependent applications and databases, to compliance or version checks of certain software and patches, to measuring existing server utilization to ensure the environment can successfully run the new change before deployment.

Stage 6 - Deployment to Test Env:

It is only at this point that the code is deployed into an integrated test environment, where it will exist alongside other unmocked modules (actual deployed code sitting outside of the current repo's domain) and where the tests developed by the testing team will run to cover a wider range of unmocked integration tests and end-to-end tests.

Stage 7 - Post-Deployment Checks (Smoke Testing):

These should be lightweight tests of the code to ensure that it is working effectively within the test environment. It will run tests against the newly deployed functionality to ensure it is working correctly and do a high-level regression (smoke testing) of other functionality to ensure nothing critical has been broken by the change. Should it fail here, the code is rolled back and the QA environment is restored.

Quality Check: Successful passing of all post-deployment and smoke tests.

Stage 8 – Automated Functional Integration Tests:

This is where the remainder of the automated tests identified by the testing team are executed. This will span a wider coverage of the codebase and should include some unmocked tests as well, with more realistic data that better resembles production.

Quality Check: All tests need to pass.

Stage 9 - Dynamic Code Analysis:

This is another scan that is run against live code (unlike the static analysis scans which are run against pre-deployed code) and provides an additional measure of quality and security checks. This includes checking SQL queries, long input strings (to exploit buffer overflow vulnerabilities), large numbers (to detect integer overflow and underflow vulnerabilities), and unexpected input data (to exploit invalid assumptions by developers). These are all vital checks that are best run against an actual working environment.

Quality Check: Successful completion of scans.

Stage 10 - Deploy to Staging:

Before this step, you may also want to run a readiness assessment similar to the one run in Stage 5 to ensure this environment is ready for deployment too. I have chosen not to repeat those steps here.

The code is then passed on to a staging environment, which is another integrated environment, but one that better reflects the state of production. Also, unlike test environments, which can be scaled up and down, this one should be permanently available and configured as the code would be in production, unless your actual production environment is scaled in the same way. This is important because we want this environment to mimic production settings as closely as possible and be configured in the same way, to provide an accurate environment to test against.

Any final manual or automated validations can also be conducted at this time by the testing team. These won't necessarily form part of the automated tests unless the testing team deems it necessary, though anything that can be automated should ideally be automated.

Post-Deployment Checks:

As was conducted against the QA environment, a set of post-deployment tests are run. This ensures that the staging environment is in the correct state. Smoke tests are then executed to ensure the deployed code is in a usable state.

Quality Check: Successful passing of all post-deployment and smoke tests.

Stage 11 - Non-Functional Test Execution:

It's at this stage that all load, performance, and additional security tests are executed to ensure that the code meets all the required non-functional requirements (NFR) standards before being deployed into production.

Quality Check: Successful completion and passing of all NFR tests.
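
As one way of wiring such an NFR gate in, the sketch below assumes a k6 load test; the performance thresholds (for example, 95th percentile response time) live in the hypothetical test script, and k6 exits with a non-zero code when a threshold is breached, failing this stage.

- name: Non-functional tests
  task:
    jobs:
      - name: Load test gate
        commands:
          - checkout
          # perf-test.js is a placeholder script that defines its own k6
          # thresholds; a breached threshold makes k6 exit non-zero and
          # fails this quality gate
          - k6 run --vus 50 --duration 2m ./tests/perf-test.js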

Once the code has passed all these stringent quality checks, it is deemed fit to be deployed to production.

What Does a Quality Gate Check?

It all depends on how much information you have to make decisions upon. If you make use of the right data and can identify a reliable measurement based on that data, you can use it to build a quality gate into your system. However, some common quality gate criteria that should be considered are the following:

Quality Validation

This is the most obvious and common form of quality gate that teams make use of. Metrics from the test build artifacts such as pass rate or code coverage are measured and the code is deployed only if they are within required thresholds. In most pipelines, you will probably want to have multiple gates to assess quality across different layers of testing: unit, component, integration, and even any automated end-to-end (E2E) testing.

That automation element is important because any reliance on manual testing or manual processes will affect the speed of your pipeline and reduce its effectiveness. You will also want to have as many unit and component tests as possible, to reduce the execution times of the quality gates and provide quicker feedback.  

Security Scans on Artifacts

This is another quality gate that you want to build into your pipeline checks, preferably from the beginning of your project. This quality gate requires security scans for things such as anti-virus checking, code signing, and policy checking to be set up against the code repo and for the scan results to then be analyzed and passed against a certain threshold before the code is deployed any further. A gate should initiate the scan and check for completion and success before moving the code to the next stage.

Along with searching for vulnerabilities in the code, you can also use this gate to check for outdated packages and add this to the scoring system. This will help to drive continued maintenance of the code to the latest versions and reduce future tech debt.
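
As a small sketch of what such a check could look like for a Node.js project (an assumption on my part), npm's own tooling can act as the gate: npm audit fails the job when vulnerabilities above a chosen severity exist, and npm outdated can report stale dependencies in the same stage.

- name: Dependency checks
  task:
    jobs:
      - name: Vulnerable and outdated packages
        commands:
          - checkout
          - npm ci
          # Fails the gate when any high or critical vulnerability is reported
          - npm audit --audit-level=high
          # Reports outdated packages; here the result is informational only,
          # but it could be scored and enforced just like the audit above
          - npm outdated || true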

Infrastructure Health

This ensures that the environment you are intending to deploy into is in the right state to receive your code. Throughout your pipeline, you can monitor and validate the infrastructure against compliance rules after each deployment, or wait for health resource utilization to meet a certain preset requirement before continuing. These health parameters may vary over time, regularly changing their status from healthy to unhealthy and back to healthy. This is one of the reasons why routinely checking this can be useful as it prevents flakiness in deployments.

To account for such variations, all the gates are periodically re-evaluated until all of them are successful at the same time. The release execution and deployment do not proceed unless all gates succeed within the same evaluation interval and before the configured timeout.
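
A very simple way to approximate this periodic re-evaluation inside a job is to poll the environment's health endpoint until it reports healthy or a timeout is reached. The sketch below assumes a hypothetical health URL and re-checks every 30 seconds for up to 10 minutes before failing the gate.

- name: Infrastructure health gate
  task:
    jobs:
      - name: Wait for healthy environment
        commands:
          # The URL is a placeholder; --fail makes curl exit non-zero on any
          # HTTP error, so the loop only exits early once the check succeeds
          - |
            for i in $(seq 1 20); do
              curl --fail --silent https://qa.example.com/health && exit 0
              sleep 30
            done
            echo "Environment did not become healthy before the timeout"
            exit 1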

Incident and Issue Management

Ensure the required status for work items, incidents, and issues that are tracked by your sprint/defect management system are all in the correct state. For example, deployments should only occur if no priority zero bugs exist. You should also validate that there are no active incidents after deployment.
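
As an illustration of how this can be automated, the sketch below queries an issue tracker for open priority-zero bugs and fails the gate if any are found. The Jira URL, project key, and credentials are placeholders; the /rest/api/2/search endpoint and JQL filter are standard Jira REST API features, but the same idea applies to any tracker with an API.

- name: Incident and issue gate
  task:
    jobs:
      - name: Check for open P0 bugs
        commands:
          # Counts open highest-priority issues in the (hypothetical) MYAPP project
          - 'OPEN_P0=$(curl -s -G -u "$JIRA_USER:$JIRA_TOKEN" "https://example.atlassian.net/rest/api/2/search" --data-urlencode "jql=project = MYAPP AND priority = Highest AND statusCategory != Done" | jq ".total")'
          # A non-zero count blocks the deployment
          - 'if [ "$OPEN_P0" -gt 0 ]; then echo "Open P0 issues block this deployment"; exit 1; fi'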

Seeking Approvals Through External Users

Notify external users such as legal approval departments, auditors, or IT managers about a deployment by integrating with approval collaboration systems such as Microsoft Teams or Slack, and wait for the approval to be completed. This might seem like an unnecessary step that would slow down the agility of a pipeline, but for certain work items, it can be essential.

For instance, I’ve worked on certain applications where specific releases needed to meet specific regulatory and legal requirements, and we had a legal person sign off on this functionality in test and give a manual approval before it could be deployed. Many financial companies may have similar audit requirements that need to be met, depending on the functionality being worked on. When such an approval is essential, it is important for accountability that it gets built into the pipeline process.

User Experience Relative to Baseline

Using product telemetry, ensure the user experience hasn't regressed from the baseline state. The experience level before the deployment could be considered as the baseline.

Overriding Any Measures

While a lot of these gates are automated checks that will enforce the rules strictly, there will always be scenarios where you will want to deploy code to the next stage knowing it won’t be able to meet certain criteria, for example to address an urgent issue. You should build a manual deployment override that can bypass any or all steps, subject to verification by multiple people, preferably from different disciplines, so that at least two people across development/business and testing need to agree on the decision.
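
One way to sketch such an override, assuming an Azure DevOps-style pipeline like the coverage example later in this article, is an explicit run-time parameter that skips a gate only when deliberately set; the sign-off itself (two people from different disciplines) would still be enforced outside the YAML, for example through environment approvals or branch policies.

# Run-time override flag; false by default so the gate always runs unless
# someone explicitly requests a bypass when queueing the run
parameters:
  - name: overrideQualityGates
    displayName: 'Override quality gates (urgent fixes only, requires dev + test sign-off)'
    type: boolean
    default: false

steps:
  # The coverage gate step is only included when no override was requested
  - ${{ if eq(parameters.overrideQualityGates, false) }}:
    - task: davesmits.codecoverageprotector.codecoveragecomparerbt.codecoveragecomparerbt@1
      displayName: 'Compare Code Coverage'
      inputs:
        codecoveragetarget: 90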

How to Build Quality Gates into a Pipeline

So, we now know what quality gates do, how they change the structure of a typical pipeline, and what should be checked, but how do you build these quality gates into your pipeline?

A lot of this depends on the tools that are being used by a specific company, so I will present a few examples of quality gates being implemented in YAML that should be able to work with the most common CI/CD applications.

Checking Environments Before Deployment

What you want to do here is run a set of smoke tests against an environment and then stop the deployment if the smoke tests fail:

- name: Pre-deploy test
  task:
    jobs:
      - name: Server & database
        commands:
          - checkout
          - bash ./scripts/check-db-up.sh
          - bash ./scripts/check-server-up.sh

In this code example, some bash scripts have been developed to check the state of a server and DB before the deployment scripts are executed. If those commands return failures, then the deployment script does not run.

Similarly, the smoke tests could be placed after the code is deployed to ensure that the systems are still operational post-deploy.

- name: Post-deploy test
  task:
    jobs:
      - name: Smoke test
        commands:
          - checkout
          - bash ./scripts/check-app-up.sh

In these two examples, I’m simply calling shell scripts that do basic environment checks for us, but you could just as easily call an actual suite of tests here to verify the health of your environment pre- and post-deployment.

We are not interested in trying to measure any form of coverage here; instead, if any of the tests fail, the code should not deploy. So the trick is to build break commands into the tests themselves that stop the run with a failure should a check not pass.
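
As a minimal example of that idea, the step below relies on curl’s --fail flag against a hypothetical health endpoint: any HTTP error makes the command exit with a non-zero code, which fails the job and prevents the pipeline from continuing.

- name: Post-deploy test
  task:
    jobs:
      - name: Smoke test
        commands:
          # The endpoint is a placeholder; --fail turns any HTTP error status
          # into a non-zero exit code that stops the pipeline at this gate
          - curl --fail --silent --show-error https://qa.example.com/health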

Measuring Code Coverage and Pass Rates Before Deployment

Now, depending on the code coverage tool you are using, it might represent results differently, so you should set up your tool in a way that meets your needs and then adjust your pipeline accordingly.

 # ReportGenerator extension to combine code coverage outputs into one      
      - task: reportgenerator@4
        inputs:
          reports: '$(Agent.TempDirectory)/*/coverage.cobertura.xml'
          targetdir: '$(Build.SourcesDirectory)/CoverageResults'
 
      # Publish code coverage report to the pipeline
      - task: PublishCodeCoverageResults@1
        displayName: 'Publish code coverage'
        inputs:
          codeCoverageTool: Cobertura
          summaryFileLocation: '$(Build.SourcesDirectory)/CoverageResults/Cobertura.xml'
          reportDirectory: '$(Build.SourcesDirectory)/CoverageResults'
       
      - task: davesmits.codecoverageprotector.codecoveragecomparerbt.codecoveragecomparerbt@1
        displayName: 'Compare Code Coverage'
        inputs:
          codecoveragetarget: 90
 
      - task: CopyFiles@2
        displayName: 'Copy coverage results'
        inputs:
          SourceFolder: '$(Build.SourcesDirectory)/CoverageResults'
          Contents: '**'
          TargetFolder: '$(Build.ArtifactStagingDirectory)/CoverageResults'

In this sample code, we make use of a specific code coverage format (Cobertura) and tasks that combine the results from different code coverage scans, measure the coverage, and then compare it against a target of 90%. The results are then published so that the development team can analyze and improve them where necessary.

Ensuring Security Scans are Passed Before Deployment

This code looks at the results of a security scan and determines if the pipeline passes or fails based on a scoring ratio.
For this code example, I will first show you a YAML file that pulls the code from the repo, builds it, and then executes the security scan on it (in this case, an Aqua Security scan). However, all this will do is execute the security scan; unless the scan itself fails for some particular reason, the code will still pass.
steps:

  main_clone:
    title: Cloning main repository...
    type: git-clone
    repo: '${{CF_REPO_OWNER}}/${{CF_REPO_NAME}}'
    revision: '${{CF_REVISION}}'
    stage: prepare
  build:
    title: "Building Docker Image"
    type: "build"
    image_name: "${{CF_ACCOUNT}}/${{CF_REPO_NAME}}"
    tag: ${{CF_REVISION}}
    dockerfile: "Dockerfile"
    stage: "build"
  AquaSecurityScan:
    title: 'Aqua Private scan'
    image: codefresh/cfstep-aqua
    stage: test
    environment:
      - 'AQUA_HOST=${{AQUA_HOST}}'
      - 'AQUA_PASSWORD=${{AQUA_PASSWORD}}'
      - 'AQUA_USERNAME=${{AQUA_USERNAME}}'
      - IMAGE=${{CF_ACCOUNT}}/${{CF_REPO_NAME}}
      - TAG=${{CF_REVISION}}
      - REGISTRY=codefresh

To ensure we stop the deployment from going ahead if the scan flags any vulnerabilities, we can write a scan policy, like the below example, that will fail or pass based on the results.

$ kubectl apply -f - -o yaml << EOF
> ---
> kind: ScanPolicy
> metadata:
>   name: scan-policy
> spec:
>   regoFile: |
>     package policies
>
>     default isCompliant = false
>
>     # Accepted Values: "Critical", "High", "Medium", "Low", "Negligible", "UnknownSeverity"
>     violatingSeverities := ["Critical","High","UnknownSeverity"]
>     ignoreCVEs := []
>
>     contains(array, elem) = true {
>       array[_] = elem
>     } else = false { true }
>
>     isSafe(match) {
>       fails := contains(violatingSeverities, match.Ratings.Rating[_].Severity)
>       not fails
>     }
>
>     isSafe(match) {
>       ignore := contains(ignoreCVEs, match.Id)
>       ignore
>     }
>
>     isCompliant = isSafe(input.currentVulnerability)
> EOF

It’s important to remember here that the results are dependent on the scanning tool itself and its configuration.

Proactive Quality

Every team wants to release reliable, quality code and find the right balance between the testing effort and the ability to deliver and deploy code quickly. By utilizing quality gates in your CI/CD pipelines, your team can take control of their QA and testing process and build confidence in knowing that code is suitably tested across multiple testing disciplines before it is deployed into production.
 
