Five Reasons You Shouldn't Reproduce Issues in Remote Environments

Key Takeaways

  • Development teams spend a large share of their time fixing bad code. Having the right tools to reduce the time spent on defect resolution is critical to a software development organization’s ability to move quickly.
  • Challenges such as configuration drift, data security constraints, and slow enterprise processes limit developers’ ability to move quickly when finding and resolving production defects.
  • The tools organizations currently use for application observability and monitoring are helpful, but they don’t always provide the information developers need to get to the root of a problem.
  • Modern production debugging tools can in many cases remove the need for development teams to create specialized defect reproduction environments, which are costly and time-consuming to create and maintain.
  • Production debugging tools work hand in hand with existing observability tools in the APM, logging, and exception management space, allowing teams to see what their applications are doing at the code level without stopping the application or requiring code changes.

Bugs are an unavoidable part of software development and one of the biggest time sinks developers face when building software. According to a study by Stripe, developers spend over 17 hours a week on maintenance issues such as debugging and refactoring, with about a quarter of that time spent fixing bad code. This leads to nearly $300 billion in lost productivity every year.

For this reason alone, it’s critical to ensure that developers' time is being properly utilized and that they have access to the right tools that can help optimize the defect resolution process.

Much of this wasted time is lost in one specific scenario that most development teams will recognize: a bug is discovered in production, and developers find that their log files don’t contain the information needed to diagnose it.

Teams go through an iterative cycle of pushing new builds with new log lines, and eventually opt to deploy the current production build to a remote test or staging environment in the hope of reproducing the issue more easily. Sometimes this works. Other times the environment is too different from production to reproduce the issue, and it goes unresolved. In the rest of this article, we’ll look at a few reasons why you shouldn’t reproduce issues in remote environments.

Configuration Drift

As much as we try to keep development and test environments aligned with the production configuration, configuration drift almost always happens. Most organizations today realize that practices such as infrastructure as code and continuous deployment can reduce configuration drift considerably, but even with strict practices in place, some drift is often unavoidable. Even small things, such as package updates on a server, differences in memory and CPU configurations, or manual changes made by developers or testers, can cause drift.
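To make the kinds of drift described above concrete, here is a minimal Python sketch that compares two flat configuration snapshots and reports every key whose value differs. The snapshot contents (package version, memory, CPU count, feature flag) and the function name `find_drift` are hypothetical and chosen only for illustration; how such snapshots would be collected from real servers is not covered here.

```python
MISSING = "<missing>"  # placeholder for a key that exists in only one snapshot


def find_drift(reference: dict, candidate: dict) -> dict:
    """Return {key: (reference_value, candidate_value)} for every differing key."""
    drift = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref_value = reference.get(key, MISSING)
        cand_value = candidate.get(key, MISSING)
        if ref_value != cand_value:
            drift[key] = (ref_value, cand_value)
    return drift


# Hypothetical snapshots; the keys and values are invented for this example.
production = {"openssl": "1.1.1k", "memory_gb": 16, "cpu_cores": 8, "feature_x": True}
staging = {"openssl": "1.1.1n", "memory_gb": 8, "cpu_cores": 8}

for key, (prod_value, staging_value) in find_drift(production, staging).items():
    print(f"{key}: production={prod_value!r} staging={staging_value!r}")
```

Even in this tiny example, three of the four settings differ between the two environments, and any one of them could be the reason a bug fails to reproduce.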

Instead of spinning up remote test environments and deploying applications to them in order to identify the root cause of a defect, it’s much easier to inspect the state of the application while it’s running in its native environment and gather clues that lead to a fix. Production debugging tools allow teams to resolve defects faster, where they happen, without having to worry about reproducibility in a separate environment.

Inefficient and Time Consuming

Today many organizations have automated build and deployment processes that allow for automated creation and teardown of environments, but many still do not. Regardless of which side of the fence you’re on, spinning up new environments can be time-consuming. When development teams need to stop what they’re doing and wait for new environments to be spun up, the resulting context switching reduces productivity. Deploying new builds into these environments can take anywhere from minutes to hours, or even days if approvals are required and multiple teams need to get involved.

In addition to the time needed to spin them up, these environments can be costly in terms of the cloud or infrastructure spend required to run them for extended periods. Because of cost constraints, most organizations don’t spin up exact duplicates of production, which means that certain hard-to-reproduce, environment-specific bugs may still go unresolved in these remote environments.

Those Hard to Reproduce Bugs

We’ve probably all encountered them at one point or another. You discover a bug in production, gather as many details from the logs as you can, and try to reproduce the issue in a test environment, but at the end of the day you come up empty-handed. The defect gets marked as "not reproducible". We hear it all the time from customers. Some bugs go unresolved because you can’t get the information you need while the application is running in production, and the bug doesn’t reproduce when you try it in a test environment.

It’s not uncommon for software development organizations to have bugs in their production environments that go unresolved for months or even years because they cannot be reproduced. For these scenarios, it’s critical to have tooling that allows you to debug applications on demand without requiring code changes or redeployments.

Data Constraints

When attempting to reproduce an issue across multiple environments, one area where teams must have solid processes is test data management. Test data can be critical to reproducing bugs: if you don’t have the right test data in your environment, the bug may not be reproducible. Due to the sheer size of production data sets, teams must often work with subsets of that data across test environments. The holy grail of test data management is to allow teams to easily and quickly subset production data based on what is needed to reproduce an issue.

In practice, things don’t always work out so easily. It’s hard to know which attributes of your test data may be influencing a specific bug. In addition, data security can be a major challenge when subsets of data containing PII are used across environments. Teams need to ensure that they comply with corporate data privacy standards by masking the data or generating new, relevant data sets.
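As a rough illustration of subsetting and masking together, the following Python sketch selects the records that match a predicate and replaces PII fields with salted hash tokens. The field names, the sample records, and the helper names `mask_value` and `subset_and_mask` are assumptions made for this example. Hashing is deterministic, so equal inputs map to equal tokens and relationships between records survive masking. Whether a salted hash satisfies a given privacy standard is a separate question that this sketch does not answer.

```python
import hashlib

# Hypothetical list of fields treated as PII in this example.
PII_FIELDS = {"name", "email"}


def mask_value(value: str, salt: str) -> str:
    """Replace a value with a deterministic, salted SHA-256 token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:12]


def subset_and_mask(records, predicate, salt):
    """Keep records matching predicate and mask their PII fields."""
    masked = []
    for record in records:
        if not predicate(record):
            continue
        masked.append({
            key: mask_value(str(value), salt) if key in PII_FIELDS else value
            for key, value in record.items()
        })
    return masked


# Invented sample data standing in for a production table.
records = [
    {"id": 1, "name": "Ada Example", "email": "ada@example.com", "country": "DE", "order_total": 120.5},
    {"id": 2, "name": "Bob Example", "email": "bob@example.com", "country": "US", "order_total": 15.0},
    {"id": 3, "name": "Ada Example", "email": "ada@example.com", "country": "DE", "order_total": 0.0},
]

# Subset to the attribute suspected of influencing the bug (here: country).
subset = subset_and_mask(records, lambda r: r["country"] == "DE", salt="per-environment-secret")
for row in subset:
    print(row)
```

The hard part the article points to remains: the predicate encodes a guess about which attribute triggers the bug, and a wrong guess yields a subset in which the bug never appears.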

It often takes a lot of logging and hands-on investigation to uncover how data discrepancies cause those hard-to-find bugs. Teams that cannot easily manage and set up test data on demand will suffer the consequences when trying to reproduce bugs in remote environments.

The Right Tools for the Job

They say that when your only tool is a hammer, every problem starts to look like a nail. Thinking outside the box and having the right tools for the job tends to make life a little easier. When attempting to debug a production issue, the standard tool set that comes to mind for most organizations consists of APM, logging, and exception management tools. While these tools are all useful and have their place, they don’t provide the level of detail developers need when they want to dive into their code running in production and get to the bottom of those hard-to-find bugs.

Production debugging tools give developers a deeper level of detail than APM tools can capture. They also allow on-demand collection of logs or snapshots of data from applications, but unlike traditional logging solutions, all of this can be done without writing a line of code or redeploying the application. Having a production debugging solution in place can often allow developers to find a defect right when and where it’s happening, without the extra effort of spinning up a new environment.
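The idea of a snapshot taken without editing the code can be illustrated with a toy Python sketch built on the standard library's `sys.settrace` hook. It is not how any particular production debugging product is implemented, and tracing this way adds overhead that real tools work hard to avoid. The helper `capture_snapshot` and the example function `apply_discount` are hypothetical. The helper copies a function's local variables when execution reaches a chosen line, counted as an offset from the `def` line, while the function's source stays untouched.

```python
import sys


def capture_snapshot(func, line_offset, *args, **kwargs):
    """Call func and copy its locals each time the target line is reached.

    line_offset is counted from the function's `def` line (offset 0).
    The snapshot is taken just before the target line executes.
    """
    code = func.__code__
    target_line = code.co_firstlineno + line_offset
    snapshots = []

    def local_tracer(frame, event, arg):
        if event == "line" and frame.f_lineno == target_line:
            snapshots.append(dict(frame.f_locals))
        return local_tracer

    def global_tracer(frame, event, arg):
        # Trace only frames that run the target function's code object.
        if event == "call" and frame.f_code is code:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, snapshots


def apply_discount(price, percent):
    discount = price * percent / 100
    total = price - discount
    return total


# Snapshot the locals just before the `return` line (offset 3 from `def`).
result, snapshots = capture_snapshot(apply_discount, 3, 200, 15)
print(result)
print(snapshots)
```

The function under inspection contains no logging statements and is not modified, yet its intermediate values (`discount` and `total`) are available afterwards, which is the kind of detail that would otherwise require a new build with extra log lines.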

Conclusion

Software development organizations today spend far too much time and money reproducing issues in remote environments. The tools currently used for monitoring and debugging running applications provide value, but they don’t always give developers the information they need to identify and resolve bugs. Creating defect reproduction environments is sometimes an unavoidable part of bug fixing, but giving teams the right tools can often limit the costs and constraints associated with these remote environments.

About the Author

Josh Hendrick is a Senior Solutions Engineer at Rookout. He loves discovering new tech and has worked across a wide range of companies specializing in software development and DevOps. Over the years he's been an avid traveler and has done consulting work across the world with technology firms in places like Hong Kong, Australia, and across Europe. In his free time, you'll find him practicing Tai Chi, playing guitar and producing music.

 
