I’ve done some work lately with teams that deliver their products in very different ways, and it has me thinking about how much our “best practices” depend on a product’s delivery and operations model. This tension has come up in a bunch of recent conversations, so I want to lay it out here.
On the one hand, some of the teams I’ve worked with run software services that are developed and operated by the same team, where the customers (internal or external) consume the operated service directly. These teams try to follow what I think of as “conventional” SaaS best practices:
- Their development workflow prioritizes iteration speed above all else
- They tend to deploy from HEAD, or close to it, in their source repository
- Feature branches are short-lived in almost all cases
- They’ve built good automated test suites and well-tuned CI/CD pipelines
- Releases are very frequent
- They make extensive use of observability tooling, often using third-party SaaS for this
- Fast roll-back is prioritized over perfect testing ahead of time
- While their user documentation is mostly good, their operations documentation tends to be “just good enough” to onboard new team members, and a lot of it lives in Slack
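To make the roll-back point concrete, here’s a minimal sketch of one common pattern for it: versioned release directories plus a `current` symlink, so rolling back is just repointing the link. (The paths and build names here are hypothetical; real setups usually wrap this in a deploy tool.)

```shell
# Two hypothetical builds, laid out as versioned release directories.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p releases/2024-06-01 releases/2024-06-02
echo "good build" > releases/2024-06-01/app
echo "bad build"  > releases/2024-06-02/app

# Deploy: point `current` at the new build.
ln -sfn releases/2024-06-02 current

# Alerts fire; roll back by repointing the symlink at the last good build.
ln -sfn releases/2024-06-01 current
cat current/app   # -> good build
```

The appeal is that a roll-back is a single fast symlink swap, rather than a rebuild or redeploy — which is exactly why these teams can afford to be less paranoid about pre-release testing.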
However, we also have plenty of customers who deploy our software to their own systems, whether in the cloud or on-premises. (Some of them don’t even connect to the Internet on a regular basis!) The development workflow for software aimed at these customers looks rather different:
- Deploys are managed by the customer, and release cycles are longer
- These teams do still have CI/CD and extensive automated tests… but they may also have explicit QA steps before releases
- There tend to be lots of longer-lived version branches, and even “LTS” branches with their own roadmaps
- Logging is prioritized over observability, because they can’t make assumptions about the customer’s tooling
- They put a lot more effort into operational documentation, because most operators will not also be developers
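The long-lived version branches in that last model usually imply a backport workflow: fixes land on the development branch first, then get cherry-picked onto each supported release branch. A minimal sketch with git (the branch, commit, and file names are made up):

```shell
# Hypothetical repo: a fix lands on the development branch, then gets
# backported to a long-lived release branch without the 2.x-only feature.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name dev

echo "v1.0" > app.txt
git add app.txt && git commit -qm "release 1.0"
git branch release/1.x                 # long-lived branch for 1.x customers

echo "v2 feature" >> app.txt
git commit -qam "new feature (2.x only)"
echo "the fix" > fix.txt
git add fix.txt && git commit -qm "bug fix"
fix=$(git rev-parse HEAD)

git checkout -q release/1.x
git cherry-pick -x "$fix"              # backport only the fix, not the feature
```

After the cherry-pick, `release/1.x` has the bug fix but not the 2.x feature — and every supported release branch multiplies this work, which is part of why these release cycles are slower.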
From a developer perspective, of course, this all feels much more painful! The managed service model is much more comfortable to develop for, and most of the community tooling and best practices for web development seem to optimize for it.
But from a sysadmin perspective, used to mostly operating third-party software, the constraints of self-hosted development are all very familiar. And even managed service teams often depend on third-party software built this way, pinning major versions of their dependencies and building on LTS releases of Linux distributions.
The biggest challenge I’ve seen, however, is when a development team tries to target the same software at both use cases. As far as I can tell, it’s very difficult to simultaneously operate a reliable service that is being continuously developed and deployed, and to provide predictable and high-quality releases to self-hosted customers.
So far, I’ve seen this tension resolved in three different ways:
- The internal service becomes “just another customer”, operating something close to the latest external release, resulting in a slower release cycle for the internal service
- Fast development for the internal service gets prioritized, with external releases becoming less frequent and including bigger and bigger changes
- Internal and external diverge completely, with separate development teams taking over (and often a name change for one of them)
I don’t have a conclusion here, except that I don’t really love any of these outcomes. /sigh
If you’re reading this and have run into similar tensions, how have you seen them resolved? Have you seen any success stories in deploying the same code internally and externally? Or alternatively — any interesting stories of failure to share? 😉 Feel free to send me an email; I’d be interested to hear from you.