When we talk about best practices for software reliability, the conversation tends to focus on optimizing the applications themselves and the infrastructure that hosts them. The driving idea is that reliability must be baked into system architectures and infrastructure from the beginning.
That’s certainly true. But by focusing too much on the design and implementation of applications, you run the risk of overlooking another crucial element for achieving reliability (not to mention agility and extensibility): a culture of operational excellence.
It’s only by baking excellence into your organization’s operational culture that you can maximize reliability.
Let me explain…
Defining Operational Excellence
Put simply, operational excellence is a mindset, embraced across an organization, of consistently pursuing the best possible results.
Operational excellence can be reinforced by specific tools and processes, but it’s ultimately a philosophy and a cultural principle. It must be integrated into your organizational culture in order to achieve its full effects.
Operational Excellence and Software
Operational excellence is a buzzword that you hear occasionally in the context of business management. It only rarely appears in conversations about software. Nonetheless, the concept has an important role to play in several respects, including making software more reliable and extensible.
Uniting Disparate Teams
One of the most pervasive challenges in modern software deployment and management is the fact that every deployment depends on multiple teams. Developers, IT operations, infrastructure engineers, and business managers all have a role to play in ensuring that an application is a success.
Getting all of these teams to work together can be a challenge. The ideal solution, according to the DevOps mantra, is to “de-silo” your organization by creating a single DevOps team that oversees development, operations, and everything in between. For some companies, that approach will work, but for others, single-team DevOps is not a realistic goal. There will always be some distinction between teams and specializations.
Yet whether you have a single DevOps team or multiple teams working toward a common goal, operational excellence can help you achieve the cultural unification necessary to help everyone advance the common good. Operational excellence serves as a cultural goal that is shared by all teams and team members during the software development and deployment process. By making excellence part and parcel of your culture, you gain a principle that can guide all of your teams, even if they otherwise share little in common culturally (which, generally speaking, is the case; developer culture is not the same as IT culture, and certainly not the same as business culture).
Operationalizing Best Practices
Another key benefit of operational excellence is that it helps to ensure that best practices are actually followed across the software deployment process.
Most stakeholders in software development and deployment know what they should do when performing their jobs...but they don’t always do it. Your developers might churn out an inefficient function because they are under pressure to meet a deadline and need to deliver something that works, even if they know that they are not following coding best practices. Your IT team may fail to configure granular IAM permissions for a virtual server because they lack the time, even though they know that is not a best practice from a security standpoint.
Instilling operational excellence into your teams helps to prevent these types of shortcuts. When a culture of excellence prevails, your team will follow best practices, even if they take longer.
Embracing Continuous Improvement
Human beings are naturally inclined to desire finality. Feeling like we will never reach the end of a task is discomforting (that’s why poor Sisyphus was so unhappy, for example). Yet, this discomfort needs to be managed if your organization is to achieve a culture of continuous improvement. Continuous improvement means that your teams constantly find ways to make things better, even if they are already quite good.
For developers and IT teams, this can be a tough pill to swallow. Developers or engineers who have achieved 99.99999 percent reliability for their application would probably rather feel proud of what they’ve accomplished than be told that they need to add another 9 to that figure.
Still, striving to become even better is what continuous improvement requires. A culture of operational excellence can help ensure that your teams welcome this type of challenge rather than resent it.
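To put "another 9" in perspective, here is a rough sketch of the arithmetic. It assumes the reliability figure is an availability percentage measured over a 365-day year. The function name `allowed_downtime_seconds` is hypothetical and used only for illustration.

```python
# Assumption: "reliability" means availability over a non-leap (365-day) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(nines: int) -> float:
    """Return the yearly downtime budget, in seconds, for a given number of nines.

    Seven nines corresponds to 99.99999 percent availability, so the
    unavailable fraction is 10 ** -7.
    """
    unavailable_fraction = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare the current target (seven nines) with one more 9 added.
    for nines in (7, 8):
        budget = allowed_downtime_seconds(nines)
        print(f"{nines} nines: about {budget:.3f} seconds of downtime per year")
```

Under these assumptions, seven nines leaves a budget of roughly three seconds of downtime per year, and each additional 9 cuts that budget by a factor of ten.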
Encouraging Collective Ownership
When it comes to software development and deployment, personal “ownership” of any part of the code or process is a risk to the organization. If only one programmer knows how a certain microservice works, or only one IT engineer has the power to create new databases, your organization will lose agility, or it could even end up with a “hostage” employee situation.
Instead, software deployment should be as collaborative as possible. Everyone within a given team should understand and (in a pinch, at least) be able to perform everyone else’s job.
A culture of operational excellence helps to ensure that your teams take a collaborative approach to the way in which they build and deploy software. Instead of letting each individual own part of the process and receive personal credit for its success, operational excellence encourages team members to think of their work as a collective process with collective responsibility for the results. That’s because operational excellence focuses on the success of the organization as a whole rather than individual employees or teams.
Conclusion
To be sure, choosing the right architectures, tools, and people will do much to help maximize the reliability and effectiveness of your software...but those are only part of the solution. You also need a culture of operational excellence to unite your various teams behind a commitment to best practices, continuous improvement, and collective pride in the applications that they build and deploy.
This post originally appeared on devops.com.