A common project management criticism is that since story points vary across teams, there is no way to ascertain one team's progress relative to another's. Amongst Agilists there is a general consensus that comparing velocity across teams is an anti-pattern and is best avoided, lest overall productivity suffer.
Sterling Barton suggested that team velocity depends on various factors and can be summarized as a function of them:

velocity = f(sprint length, team makeup, sizing nomenclature, product)
Since all these factors vary from one team to another, there is no meaningful way to compare the teams' velocities.
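To make that concrete, here is a minimal sketch in Python. The backlog, the point values, and the two sizing scales are all hypothetical (not drawn from any of the sources quoted here); the point is only that identical work sized under different nomenclatures produces incomparable velocity numbers:

```python
# Hypothetical illustration: the same three features, sized under two
# teams' local scales. All numbers are made up for demonstration.

same_features_team_a = [2, 3, 5]    # Team A's sizing nomenclature
same_features_team_b = [5, 8, 13]   # Team B sizes identical work more coarsely

velocity_a = sum(same_features_team_a)   # 10 points per sprint
velocity_b = sum(same_features_team_b)   # 26 points per sprint

# Identical output, wildly different "velocity" -- the absolute values
# only make sense inside each team's own context.
print(velocity_a, velocity_b)  # 10 26
```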
Likewise, Danilo Sato mentioned that because velocity depends on so many factors, it only makes sense to compare velocity within a team, and even then only as a trend to gauge the team's progress. He mentioned,
“Why is Team A slower than Team B?” Maybe because they estimate in different scales? Maybe their iteration length is different? Maybe the team composition is different? So many factors can influence velocity that it’s only useful to compare it within the same team, and even then just to identify trends. The absolute value doesn’t mean much.
A further drawback of comparing team velocities, as mentioned by Bob Hartman, is that teams would start changing their story point scales so that their velocity looks better in the comparison.
Suddenly what was a size 1 last iteration is now a size 3 (or worse!). Don’t fall into this trap. If teams are working hard, meeting their iteration objectives and keeping the product owner happy I don’t care if their velocity is 10 or 10,000.
However, most managers would argue that there should be some mechanism to baseline story points across teams so that a valid comparison can be made. Mike Cohn attempted to arrive at such a common baseline by bringing a broad group of individuals from across the teams together, 46 people in all, to estimate a dozen product backlog items. He added,
When that meeting was over, each pair of estimators went back to their teams with twelve estimates. Those estimates could then be used as the basis for estimating future work. As each team estimated new product backlog items they would do so by comparing them to the initial 12 plus any estimates that had been produced since (by them or any other team).
Mike was, however, quick to mention one of the pitfalls of this exercise: once teams are compared to each other, they respond to the peer pressure by gradually inflating the story points they assign to stories.
Consider, for example, a team that is arguing over whether a particular story should be estimated at 5 or 8 points. If the team is under pressure (real or just perceived) to increase velocity they will be more likely to assign the 8. The next story the team considers is slightly larger. They compare it to the newly assigned 8 and decide to give it a 13. Without pressure to improve velocity, this same team may have given the first item a 5 and the second (slightly larger still) item an 8. In this one scenario the team has inflated their points from 5+8=13 to 8+13=21, or more than 50%.
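For what it's worth, the arithmetic in Mike's example checks out; a short snippet reproduces it:

```python
# Reproducing the arithmetic in Mike Cohn's inflation example above.
honest = 5 + 8       # the two stories as estimated without pressure
inflated = 8 + 13    # the same two stories under pressure to look faster

print(f"{honest} -> {inflated}: +{(inflated - honest) / honest:.0%}")
# 13 -> 21: +62%, i.e. "more than 50%"
```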
Mike does advocate the creation of a common baseline; however, he also warns managers to be cautious and stay on constant lookout for scenarios in which story points might be inflated.
Dave Nicolette gave an entertaining example of how story points might vary across teams.
How many Elephant Points are there in the veldt? Let's conduct a poll of the herds. Herd A reports 50,000 kg. Herd B reports 84 legs. Herd C reports 92,000 lb. Herd D reports 24 head. Herd E reports 546 elephant sounds per day. Herd F reports elephant skin rgb values of (192, 192, 192). Herd G reports an average height of 11 ft. So, there are 50,000 + 84 + 92,000 + 24 + 546 + 192 + 11 = 142,857 Elephant Points in the veldt. The average herd has 20,408.142857143 Elephant Points. We know this is a useful number because there is a decimal point in it.
He added that if these numbers were now plotted on a graph, then going purely by the figures it would be easy to fire Herds D and G and reward Herds A and C.
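The tally in the quote is easy to reproduce. A short sketch (the herd labels and unit comments are taken straight from Dave's example) makes the unit mixing explicit:

```python
# Reproducing Dave Nicolette's "Elephant Points" tally: each herd reports
# a number in a different unit, yet nothing stops us from adding them up.
reports = {
    "Herd A": 50_000,  # kg
    "Herd B": 84,      # legs
    "Herd C": 92_000,  # lb
    "Herd D": 24,      # head
    "Herd E": 546,     # elephant sounds per day
    "Herd F": 192,     # one channel of the skin rgb (192, 192, 192)
    "Herd G": 11,      # average height in ft
}

total = sum(reports.values())
print(total)                  # 142857 "Elephant Points"
print(total / len(reports))   # 20408.142857... -- precise, and meaningless
```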
Thus, in most circumstances comparing velocity across teams is a futile exercise. If a consensus on assigning story points is somehow built across teams, one might attempt a comparison, but even then only with a great deal of caution.