Agile Estimation – Hierarchy and Maturity Matter


If there were one topic that seems to dominate all others in agile software development, I’d say it’s estimation.

Every team seems to struggle with it and I’m always being asked for “silver bullet” solutions that might make it easier. And of course, there are none.

I do think there is some guidance I can provide that might help you to improve your estimation understanding, confidence, and results. Or at least that’s my intent. So let’s try…

The first thing I’d like to make clear is the nature of most agile estimation & planning approaches. I like to think of them as a two-phase approach, versus the one-phase, plan-everything-down-to-the-single-molecule-in-advance approaches typically used in waterfall projects.

The two phases are:

Phase 1 – Backlog estimation

  • Estimating stories (elements) in the Product Backlog;
  • Using some sort of relative sizing mechanism (story points, Fibonacci, etc.);
  • Velocity (sprint output in points averaged across last 3 sprints) is used for release forecasting;
  • Priorities change so backlogs have a tendency to evolve. So re-estimate whenever you feel you have new & better information;
  • Commitments are not made until the team has done some form of more advanced planning or estimation.
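To make the velocity bullet concrete, here is a minimal Python sketch of a rolling three-sprint average and the release forecast it feeds. The function names and the sample numbers are my own illustrations, not part of any tool or framework, and the result is a forecast, not a commitment.

```python
import math

def rolling_velocity(sprint_points, window=3):
    """Average points completed over the most recent `window` sprints."""
    if not sprint_points:
        raise ValueError("need at least one completed sprint")
    recent = sprint_points[-window:]
    return sum(recent) / len(recent)

def sprints_to_finish(backlog_points, velocity):
    """Whole sprints needed to work through the backlog at this velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)

# Hypothetical history: points completed in each of the last four sprints.
history = [22, 24, 26, 25]
velocity = rolling_velocity(history)        # uses only 24, 26, 25
print(velocity)
print(sprints_to_finish(180, velocity))     # 180-point backlog is made up
```

Re-run the forecast whenever the backlog is re-estimated or a sprint closes, since both inputs move.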

Phase 2 – Sprint estimation

  • Often stories are broken down into tasks during sprint planning;
  • Tasks are quite often estimated in hours;
  • There is often a loose relationship between time (hours) and points that can be learned or compared. For example: if you have two 5-point stories and one took 25 hours to complete and the other 250 hours, then you probably wrongly estimated the points for one of those stories;
  • Tasks ultimately end up with an owner or owners, who normally determine their estimate for the work.
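The hours-versus-points comparison from the list above can be sketched as a simple consistency check. This is only an illustration: the function name, the data layout, and the 3x threshold are my assumptions, and the check tells you that a point size looks inconsistent, not which story was mis-sized.

```python
from collections import defaultdict

def inconsistent_sizes(stories, max_ratio=3.0):
    """Return {points: spread} for point sizes whose slowest story took
    more than `max_ratio` times the hours of the fastest one."""
    hours_by_points = defaultdict(list)
    for points, hours in stories:
        if hours <= 0:
            raise ValueError("actual hours must be positive")
        hours_by_points[points].append(hours)

    flagged = {}
    for points, hours in hours_by_points.items():
        spread = max(hours) / min(hours)
        if spread > max_ratio:
            flagged[points] = spread
    return flagged

# (points, actual hours) pairs; the two 5-point stories mirror the example
# above, the 3-point stories are made up for contrast.
completed = [(5, 25), (5, 250), (3, 12), (3, 16)]
print(inconsistent_sizes(completed))
```

A flagged size is a conversation starter for the next refinement session, nothing more.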

There is an iterative learning loop-back between the two phases, so that the team gets better at estimating at both levels, which narrows the variance or any gross inconsistencies between them.

I was with a client the other day and we were discussing their need for a “standard” estimation method for their teams. They are implementing SAFe, which makes two critical recommendations:

  1. Estimate using story points and;
  2. Normalize the estimates across all teams.

While the latter has garnered the most pushback in the community, both are somewhat prescriptive.

What I normally see, and try to emphasize in my coaching, is more of a hierarchical approach to estimation units. That is, one size doesn’t fit all.

A more common approach to estimation units follows. It aligns with the 3-tier scaling models (e.g., SAFe) that are so popular today:

Tier-1 (Portfolio)

  • Purpose: to get fast, very high-level sizing so that executives can do road-mapping, business case investment analysis, valuation, priority ordering, negotiation, etc.
  • Usually the units try to abstract the estimators from the details so that they feel “comfortable” providing estimates.
  • Example units include T-shirt sizing and (# teams * # sprints or releases).
  • These estimates cannot equate to date/client commitments. Period!
  • Usually the terms EPIC or PROGRAM EPIC are used to capture the work at this level and the epic usually crosses multiple release boundaries (i.e., it needs to be decomposed into FEATURES).

Tier-2 (Program)

  • Purpose: to break down and fit work into a release tempo. To deploy customer usable / valuable features. To gain feedback. To deliver a customer release.
  • Usually the units try to be finer grained than the ones used at Tier-1, so that they are more useful for release-level planning.
  • A very common example of units at this level is T-Shirt sizing that equates to Story Points. For example: Small = 50 points, Medium = 80 points, and Large = 100 points.
  • These estimates, pre-release planning & commitment, cannot equate to date/client commitments. Period!
  • Usually the terms FEATURE or THEME are used to capture the work at this level. The features or themes usually are sized to fit within a singular release.
  • IF you’re using SAFe, then these are features that are the hand-off mechanism to the teams to perform PSI or PI planning.

Tier-3 (Team – Execution)

  • Purpose: for the team to estimate work into sprint tempo. Usually at this level velocity is discovered, measured, and maintained.
  • Usually the units are even more finely grained. For example, the stories need to “fit” into each team’s sprint length.
  • T-Shirt sizing can be used at this level, but the units are quite small. For example: Small = 1-2 points, Medium = 3 points, and Large = 5 points. The ranges usually try to maintain some sort of “relativeness”. Various forms of Fibonacci are extremely common.
  • At this level of granularity, commitment surfaces. Once the team breaks features down into stories in PI or release-level planning, they exit with a solid sense for what they are capable of delivering in a single release.
  • Backlog Refinement is an ongoing activity at this level, so stories are still being “worked” until the team feels they are READY for execution within a future sprint

To be clear, it’s incredibly common for organizations to get a sense for velocity at the bottom two levels.

Example of Program + Execution level planning

At a company I was with from 2009 – 2012, we had ~12 Scrum teams whose sprint velocity averaged ~25 points per team.

Our “Release Train” had a tempo of four 2-week sprints plus one 1-week hardening sprint, so releases were ~10 weeks in duration.

Therefore, for a given release, our Product Owner team had ~1,200 points to work with.

We split that into ~1000 points for features and ~200 points for infrastructure, bugs, refactoring, automation, and other “internal investments”.
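For anyone who wants to see where those numbers come from, here is the arithmetic as a small Python sketch. One assumption is mine and not stated above: that only the four 2-week sprints carry planned points and the hardening sprint carries none. That reading is the one that reproduces the ~1,200 figure.

```python
def release_capacity(teams, avg_velocity, planned_sprints):
    """Points available in a release: teams x points per sprint x sprints."""
    return teams * avg_velocity * planned_sprints

TEAMS = 12            # ~12 Scrum teams
AVG_VELOCITY = 25     # ~25 points per team per sprint
PLANNED_SPRINTS = 4   # assumption: the hardening sprint adds no points

capacity = release_capacity(TEAMS, AVG_VELOCITY, PLANNED_SPRINTS)
feature_budget = 1000
internal_budget = capacity - feature_budget   # infrastructure, bugs, etc.

print(capacity)
print(internal_budget)
print(f"{internal_budget / capacity:.0%} reserved for internal investments")
```

The split itself was a product decision; the sketch only shows how the two budgets relate to total capacity.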

The teams would be refining their backlogs in advance of each release. The individual Product Owners would “present” FEATURES to the teams and they would apply program level estimation in T-shirt sizing. This would give the product team high-level sizing in order to make a decision on (1) what features would make the “cut” for the next release, and (2) how to split features if they were larger than a single release.
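Deciding which features make the “cut” can be sketched as a walk down the prioritized list against a point budget. This is a hedged illustration, not what we actually ran: the feature names and the 300-point budget are made up, the size-to-points mapping reuses the Tier-2 example from earlier in this article, and I assume a feature that doesn’t fit is deferred while smaller, lower-priority ones may still be pulled in.

```python
# Tier-2 example mapping from this article: T-shirt size -> story points.
FEATURE_POINTS = {"S": 50, "M": 80, "L": 100}

def plan_release(features, budget):
    """Take features in priority order while they fit in the budget.

    `features` is a list of (name, tshirt_size) tuples, highest priority
    first. Returns (selected, deferred, remaining_points).
    """
    selected, deferred = [], []
    remaining = budget
    for name, size in features:
        cost = FEATURE_POINTS[size]
        if cost <= remaining:
            selected.append(name)
            remaining -= cost
        else:
            deferred.append(name)   # candidate for splitting or next release
    return selected, deferred, remaining

# Hypothetical prioritized features and a deliberately small budget.
candidates = [("search", "L"), ("export", "M"), ("sso", "L"), ("themes", "S")]
selected, deferred, remaining = plan_release(candidates, budget=300)
print(selected, deferred, remaining)
```

In practice the deferred list is where the splitting conversation starts, and the Product Owners make the final call.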

Once features were sized to fit, there would be a transition from the program-level to the team-level, which included:

  • Release Planning
  • Feature decomposition into individual stories
  • Refinement of the stories
  • Sprint planning & execution

As an adjunct to the estimation hierarchy discussion, I also think the topic of how much measurement guidance each agile organization should be giving is relevant.

If you follow the Scaled Agile Framework, then the guidance is quite prescriptive: 1 Story Point = 1 Ideal Day. There is no hierarchy to the recommendations, as the points simply scale up for larger Epics and down for smaller Stories. Here are the core references to support this:

Other approaches aren’t really recommended or supported. The essential focus of SAFe is to have every team estimate in common units. The why here is so that x-team estimation and forecasting can be effective.

But in doing so, they break one of the core concepts of story points, which is that they are intended to be unique per team. That is – uniquely calibrated, estimated, measured velocity per team. And they also start looking like traditional waterfall planning approaches, which can send the wrong message to the teams.

Certainly we don’t want to go back to the “padding estimates Upward” and “stripping estimates Downward” game that was so prevalent in historical software groups.

I’ve taken a much softer approach to this problem with teams. First, I ask everyone to use the same scales or approach. If the teams want to use Fibonacci, then we all use it. Or T-shirt sizes, then that’s fine as well. The key is to be consistent across the organization.

The next step is aligning everyone without forcing folks to estimate exactly the same way. We don’t need exactness, just estimates that are in the “ballpark” or vary only slightly. I’ve found that using reference stories and asking the teams’ guides (Scrum Masters, Product Owners, and coaches) to help the teams ALIGN their estimates is possible without compromising the uniqueness of the teams’ estimates.

For instance, in my example above the teams varied in velocity from 22 points to 29 points per sprint. From my perspective, we were IN the ballpark for forecasting. We then leveraged ~25 points per sprint as our forecasting multiplier, knowing that we had some variance around it. In fact, we made the teams fully aware of this and our planning approaches.
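One way to make “in the ballpark” checkable is to compare each team’s velocity to the shared forecasting multiplier. A minimal sketch follows. The team names and the 20% tolerance are my assumptions for illustration; only the 22, 25, and 29 point figures come from the example above.

```python
def out_of_ballpark(team_velocities, reference, tolerance=0.20):
    """Return the teams whose velocity differs from `reference` by more
    than `tolerance` (expressed as a fraction of the reference)."""
    return [
        team
        for team, velocity in team_velocities.items()
        if abs(velocity - reference) / reference > tolerance
    ]

# Hypothetical teams spanning the 22-29 range, against the ~25 multiplier.
velocities = {"team-a": 22, "team-b": 25, "team-c": 29}
print(out_of_ballpark(velocities, reference=25))

# A made-up team far outside the range would be flagged for a calibration
# conversation around the reference stories.
velocities["team-d"] = 40
print(out_of_ballpark(velocities, reference=25))
```

The point of a check like this is alignment, not normalization: a flagged team revisits the reference stories, it isn’t told what its numbers should be.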

I talk about this approach in a pair of related articles here:

My intent in writing this article was to explore some of the hierarchical nature of agile estimation. Yes, consistency is important. But also teams need the freedom to estimate without feeling constrained.

One of the most important aspects of agile at scale is getting this balance right. I hope this has helped your thinking on that matter.

Stay agile my friends,


After I wrote this article, I noticed this one by Marty Bradley. I think it nicely complements my thoughts. Here’s a link:
