Simplifying complexity

As organizations evolve and software grows, so does complexity. Learning how to identify and curb complexity is a core skill to develop as an engineering manager.

Unconstrained complexity inevitably results in suboptimal outcomes, such as:

  • A slowdown in a team's ability to accurately assess a problem or deliver a solution
  • A slowdown in the time it takes to onboard new engineers
  • An increase in incidents and rework
  • A drop in morale or an increase in attrition

It's impossible to eliminate complexity: some problem spaces are just naturally complex, and there's no way around it. The trick for an engineering manager is to manage complexity, which means identifying, measuring, and simplifying where possible.

Talking about complexity

Complexity can be found in many forms. It's helpful to be able to identify which aspect of complexity we intend to simplify.

Aspects and measurements

  • Cognitive load: characterized by an overwhelming diversity of detailed tasks, frequent context switching, and few opportunities for deep work or system optimization, resulting in generally slow delivery and low morale. Measure with developer surveys, 1:1s, and onboarding metrics for new hires.
  • Process complexity: characterized by too many cooks in the kitchen, too many meetings, too many required signoffs, slow decision-making, unmet requirements, and frequent change orders.
  • Codebase complexity: characterized by slow builds, slow tests, longer cycle times on code changes and code reviews, and increased rework or deployment rollbacks. Measure with a maintainability index, choosing an index that suits the stage and engineering goals of the organization.
  • System complexity: characterized by no single person knowing how the system works, missing or low-quality success metrics, quality or performance feedback that often comes from end users, and debugging or root cause analysis that requires multiple people and a significant time investment. Measure with build times, the time from code written to code in production, the time functional tests take to run, rework ratios, and other DORA-inspired metrics.
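
To make a couple of these measurements concrete, here is a minimal sketch of how the "code written to code in production" time and a rework ratio might be computed from a list of changes. The Change record and its field names are hypothetical, made up for illustration; in practice the data would come from whatever your version control and deployment tooling expose, and how you decide that a change counts as rework is a judgment call for your team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Change:
    """One code change. Every field name here is a hypothetical example."""
    committed_at: datetime  # when the code was written/committed
    deployed_at: datetime   # when it reached production
    is_rework: bool         # True if it fixes or reverts an earlier change


def median_lead_time(changes: list[Change]) -> timedelta:
    """Median time from code written to code in production."""
    return median(c.deployed_at - c.committed_at for c in changes)


def rework_ratio(changes: list[Change]) -> float:
    """Share of changes that were rework rather than new work."""
    if not changes:
        return 0.0
    return sum(c.is_rework for c in changes) / len(changes)


# Made-up sample data: four changes, one of which is rework.
changes = [
    Change(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15), False),
    Change(datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9), False),
    Change(datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 11), True),
    Change(datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 19), False),
]

print("median lead time:", median_lead_time(changes))
print("rework ratio:", rework_ratio(changes))
```

The median is used rather than the mean so that one unusually slow change does not dominate the number. Tracking the trend over time is usually more useful than any single reading.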

Assessment and Approaches

The Cynefin framework helps leaders assess their operating context so they can take appropriate action quickly. The four quadrants are Simple, Complicated, Complex, and Chaotic. Simple is characterized by apparent cause-and-effect relationships where correct answers are based on facts and easily verified, and things become much less linear from there. I won't go into the details of each quadrant's characteristics (feel free to read the link or check out the wiki page; it's excellent). But mapping complexities with this framework has often led me to the following actionable behaviours:

  • Reduce Chaotic domains to Complex by drawing signal out of the noise through Observability and collaborative Event Storming.
  • Reduce Complex domains to Complicated by introducing abstractions, boundaries, patterns, and workflows.
  • Make Complicated domains Simple by introducing or improving tooling, access to specialists, or load-shedding via a specialized team to handle inherently complex systems that are core to the organization's revenue streams.
  • Eliminate Simple tasks via automation or outsourcing.

Conclusion

  • Divide and Conquer: Complexity in software is typically managed by a "divide and conquer" approach, which can be applied at any level of granularity when considering software systems.
  • Abstractions, Interfaces and Boundaries: Reduce cognitive load through specialization and bounded contexts modelled on the business's current and future revenue streams. Move teams to support the bounded contexts, have those teams own the architecture they depend on, and document it with the C4 model.
  • Refactoring to Design Patterns: Refactor a complicated codebase towards named design patterns to increase understandability, maintainability, and flexibility.
  • Introduction of Frameworks: If you notice the same sorts of use cases coming up frequently, introducing a framework can make a huge impact. It might require an up-front investment, so be prepared with a considered plan when proposing it to your team.
  • Enabling Teams: Introduce a temporary enabling team to aid in strategic refactoring or short-term organizational projects such as GDPR compliance or security improvements. Enabling teams are also great for tackling technical debt or building technical surplus to unlock additional product velocity. They can also be transitioned easily into permanent developer experience teams if desired.
  • Complex Subsystem Teams: Load-shed core complexity from stream-aligned product teams with the introduction of a team designed to handle a particularly complicated area.

Bonus points:

  • Establish baseline performance measurements with SLAs, SLOs, and SLIs for services in production: Identifying baseline performance metrics enables a clear focus, especially when things are overwhelming. Knowing the performance and expectations of production services results in understanding the capacity for taking on new objectives and making wise decisions. For example, reducing complexity might be more critical when SLAs are unmet.
  • Establish high-level goals with Objectives and Quantitative Key Results: Identify big goals and ensure they can be achieved incrementally (i.e., a mix of leading and lagging metrics lets progress be measured throughout development and not just at the very end). Working with Product Stakeholders to identify the right success metrics early results in a much simpler development process and managed expectations.
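
As a rough illustration of the SLI/SLO point above, here is a small sketch that turns raw request counts into an SLI, compares it against an SLO target, and reports how much of the error budget is left. The names SloReport and evaluate_slo, the request-based SLI, and the 99.9% target are all assumptions chosen for the example; your services may need a different indicator (latency, freshness, and so on) and a different target.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    """Result of comparing a measured SLI to an SLO target (hypothetical shape)."""
    sli: float               # measured success ratio over the window
    slo_target: float        # agreed objective, e.g. 0.999
    budget_remaining: float  # fraction of error budget left; negative if overspent

    @property
    def slo_met(self) -> bool:
        return self.sli >= self.slo_target


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compute a request-based SLI and the remaining error budget."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / total_events
    # The error budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_events
    actual_failures = total_events - good_events
    budget_remaining = 1.0 - actual_failures / allowed_failures
    return SloReport(sli, slo_target, budget_remaining)


# Made-up numbers: 1,000,000 requests in the window, 500 of them failed.
report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%} met={report.slo_met} budget left={report.budget_remaining:.0%}")
```

A shrinking or negative budget is one signal that simplification work deserves priority over new objectives, while a healthy budget suggests there is capacity to take on more.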