Moving away from Agile: What's Next – Martin Harrysson & Natasha Maniar, McKinsey & Company
By AI Engineer
Summary
Topics Covered
- AI Creates Uneven Task Impacts
- Tailor Models to Workflow Types
- Abandon Two-Pizza Teams
- Scale with Holistic Measurement
Full Transcript
Good morning. Hello everyone. It's really great to be here. I'm Martin, and I'm here with my colleague Natasha. We're from a part of McKinsey you may not be as familiar with: a practice called Software X, where we work with mostly enterprise clients on how to build better software products, which in the past couple of years has mostly meant using AI. Our talk today is focused on the people and operating-model aspects of leveraging AI for software development. We believe those have to change quite significantly, and that's what we're excited to talk to you about.
If I take a quick step back in time and think through some of the major technology breakthroughs of the last few decades, they tend to come with a paradigm shift in how we develop software. I still recall, almost 20 years ago now, starting as an entry-level software engineer at a tech company that was just switching to agile: we were using kanban boards, standups, and other ceremonies. It was a massive change for the company. And now, with everything that is happening in AI, we're at the precipice of another such paradigm shift.
If we think about some of the things happening with AI and software development that we've seen at this conference, there's no doubt that a new paradigm is upon us. So we'll talk about two things: first, how you go from the productivity gains we're seeing at the individual level to scaling them across a whole team, and what changes we think that implies; and then how you scale that across a whole organization to really capture value.
I'm talking to an audience here that is using agents all the time, and if I asked you for examples, I'm sure you could rattle off ten different ones: "Look, there was this thing I used to do that took hours or even days, and now it takes only minutes." There's no shortage of those stories, and you can go over to the expo and talk to any of the companies there about all these great use cases. It really shows that these tools work and can be really impactful.
And yet, despite these improvements, we've done some research to gauge where our clients are at the moment. We recently surveyed about 300 companies, mostly enterprises, on the productivity improvements they're seeing. On average, they report only 5, 10, or 15% improvement overall as a company. So there's a disconnect between the big potential of AI and the reality.
We think this gap exists because, as companies have started implementing AI, whether coding assistants or, as you just heard with OpenAI, agents and more complex workflows, a set of bottlenecks has started to emerge that wasn't necessarily there before. For example, as we start moving much faster in certain aspects of the work, we haven't really changed how we collaborate among people and team members, and that's not keeping up. We've started generating far more code, but in many companies it's still being reviewed in a pretty manual way. And then there's the theme recently highlighted in a research report from Carnegie Mellon: all the new code being generated is in some cases also amplifying the generation of tech debt and adding complexity. These bottlenecks are not impossible to overcome, but we believe they are what's limiting many companies from seeing the real value they should be seeing.
Let me give a couple of examples to bring that to life. One of the big rate limiters at the moment is how work is allocated. What we've learned over the last couple of years is that the impact from AI and agents is highly uneven. There are some tasks where it works amazingly well today and you see huge improvements, and others where it's not as effective, so you have that variability. You also have variability among people: some have lots of experience using these tools and know how to pick them up, and others are less experienced. So for team leads and engineering managers, it's highly non-trivial to know how to allocate work and resources well, and this is creating a lot of inefficiencies.
Another example is how work is reviewed. Agents are often given fairly fuzzy stories, written in prose with fuzzy acceptance criteria, which means the code that comes back is not always what was intended. For many companies, the only mechanism to control that is manual review. So you've automated some things, but generated more manual review. These are some of the bottlenecks we see coming up.
As mentioned, the result so far is that most large companies are stuck in a world of relatively marginal gains. They're working in ways that were developed around the constraints of the previous, human-only development paradigm: if you go out to most companies, you see eight-to-ten-person teams working in two-week sprints, all elements that were largely part of an agile operating model, and that puts limits on what they can achieve. Over the past year we've been working with lots of clients to break that model and develop new ways of working, with smaller teams, new roles, and shorter cycles. When you do that, we see really great performance improvements, and that's what gives us a path to where we see things improving.
So we realized that rewiring the PDLC is not a one-size-fits-all exercise. Different types of engineering work across the enterprise, and along the product life cycle, may require different operating models based on how humans and agents best collaborate. Take modernizing legacy codebases: this task requires high context, potentially across the entire codebase, but also has clearly defined outputs. An operating model for it could look like a factory of agents, where humans provide an initial spec and a final review with minimal intervention in between. For new features, whether greenfield or brownfield, the operating model may look more like an iterative loop, because they benefit from non-deterministic outputs and increased variation: agents act as co-creators, providing more options to facilitate faster feedback loops.
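To make the contrast concrete, here is a minimal Python sketch of the two operating models described above. The helpers run_agent, final_review, and pick_and_refine are hypothetical placeholders, not any specific product's API; the sketch only illustrates where the human sits in each loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    spec: str            # human-written specification
    output: str = ""     # agent-produced artifact (e.g., migrated code)


def run_agent(spec: str) -> str:
    """Stand-in for an agent call (code migration, feature generation, etc.)."""
    return f"artifact for: {spec}"


def factory_of_agents(specs: List[str],
                      final_review: Callable[[Task], bool]) -> List[Task]:
    """Legacy-modernization pattern: humans provide the spec up front and a
    final review at the end; agents handle the well-defined work in between."""
    tasks = [Task(spec=s) for s in specs]
    for task in tasks:
        task.output = run_agent(task.spec)
    return [t for t in tasks if final_review(t)]


def iterative_loop(spec: str,
                   pick_and_refine: Callable[[List[str]], str],
                   rounds: int = 3) -> str:
    """New-feature pattern: agents co-create several options per round; a human
    picks and refines, so variation drives a faster feedback loop."""
    for _ in range(rounds):
        options = [run_agent(f"{spec} (variant {i})") for i in range(3)]
        spec = pick_and_refine(options)  # human choice becomes the next spec
    return spec
```

The point of the sketch is the shape of human involvement: an up-front spec plus final review in the factory pattern, versus a human steering every round in the co-creation pattern.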
As we mentioned, we surveyed 300 enterprises globally to understand what sets the top performers apart. We found that they are seven times more likely to have AI-native workflows, meaning they scale AI across four or more use cases in the software development life cycle rather than having point solutions just for code review or just for code development. They were also six times more likely to have AI-native roles, meaning smaller pods with different skill sets and new roles. To enable these shifts, these organizations were investing in continuous, hands-on upskilling, impact measurement, and incentive structures that encourage developers and PMs to adopt AI. This led to a five-to-six-times improvement in time to market and delivery speed, as well as higher-quality and more consistent artifacts.
When we talk about AI-native workflows, we mean that these enterprises are moving from quarterly planning to continuous planning, and that the unit of work is moving from story-driven to spec-driven development, so PMs are iterating on specs with agents rather than iterating on long PRDs. On the talent side, AI-native roles essentially means moving from the two-pizza structure to one-pizza pods of three to five people. Instead of separate QA, frontend, and backend engineers, there are more consolidated roles, where product builders manage and orchestrate agents with full-stack fluency and a better understanding of the full architecture of their codebase. PMs are starting to create prototypes directly in code rather than iterating on long PRDs.
As one example, we've studied some AI-native startups and found that they have actually implemented all of these shifts to accelerate their outcomes; in our article we describe how Cursor operates internally.
But if you're a large enterprise predicated on the agile model, what steps can you take? In a recent client study with a leading international bank, we tested team-level interventions to address the bottlenecks mentioned before, mainly around the sequencing of steps within the agile ceremonies and how to define the roles of agents and humans within the sprint cycle. Let's walk through some examples.
First, team leads would assign sprint stories using agents, based on data about team velocity and delivery history. They would then co-create multiple prototypes and iterate with agents on acceptance criteria around security and observability needs, to produce more consistent artifacts across teams. This prevents the downstream rework mentioned before, so developers don't have to constantly iterate with agents during the coding process. The squads were also reorganized by workflow, with one focused on small bug fixes and another on greenfield development. In the background, agents would look at potential cross-repository impacts to reduce debugging time for developers. And to reduce the collaboration overhead and meetings within the sprint cycle, instead of waiting for data scientists' input, PMs would directly observe real-time customer feedback to reprioritize features, which accelerated the backlog within the same amount of time.
We studied the impact of these interventions and found highly promising results: not just an increase in agent consumption of over 60 times, but also an increase in delivery speed tied directly to the bank's business priorities. There was a 51% increase in code merges, as well as an increase in efficiency.
The other aspect of this is the different roles and the talent model. One of the biggest differentiators we saw, as mentioned, was whether you have actually changed the roles involved in software development. What you are all seeing is that engineers are moving away from execution, from simply writing code, toward being orchestrators who think through how to divide work among agents, for example. We also heard examples of how the role of the product manager is changing. And while this may sound pretty straightforward to many of you here who work with these tools day to day, that you have to change what you do, the reality is that about 70% of the companies we surveyed have not changed their roles at all. So there's a background expectation that people will do things differently, but the role is still defined and understood the same way it was a couple of years ago.
But we are starting to see some companies change this. Here's another example from a recent client. They were set up in a way that is probably pretty common for many companies: a fairly typical two-pizza team model with the types of roles you would be familiar with. We ran a set of experiments with front-runner teams and tested new models with much smaller pods and new roles that consolidated tasks previously done by different roles. By doing that, we could create more pods, or more teams, with the same number of people, while retaining the expectation that each pod performs at about the same level as before. We saw really positive results from that, with the quality of the generated code maintained and in some cases even improved. In particular, there was a big speed-up in output from the different teams, and you can see some of the metrics here.
Let's shift gears and go from the team level to how this scales across a big organization. The reality is that many companies don't have just one or two of these teams; they often have hundreds of teams, and thousands or even tens of thousands of people working this way. One of the biggest differences we saw between those stuck at roughly 10% improvement and those seeing outsized improvements is how they manage that change. Change management is admittedly a catch-all, somewhat elusive term for a lot of different things, but in some ways it's not a bad way to think about it. I usually say that change management is about getting a lot of small things right. The crux of actually scaling this is often getting 20, 30, or even more things right at the same time, involving the way you communicate what this means, the way you incentivize people, and the way you upskill them, and it all has to come together.
When it doesn't, we see what happens. Here's an example from another tech company we worked with, where we initially rolled out new AI tools that hit different parts of the product development life cycle. There was some usage, but it often dropped off: the tools were either not used or used in very suboptimal ways. That's the jagged part you see on the left-hand side here; despite adding more users, the overall impact did not change at all. So we had to do quite a reset and effectively start over. We reset expectations: what does this mean day to day if you're a developer? What does it mean for a PM? We did much more hands-on upskilling, with bring-your-own-code sessions and coaches available, especially during those first few sprints before this becomes a habit and part of the way you develop software day to day. That's a very critical time, and it's when this matters a lot. And we put a measurement system in place, so you know what's changing and you're able to see what's improving.
Another example, just to bring this to life a little more: as mentioned, this is about getting a lot of things right, and each one individually may not seem like a big deal, but put together they make a huge difference. These are some of the top interventions another client had to go through. For them, it really helped to set up code labs, for example, and to institute a new set of certifications that motivated and drove people to change what they do day to day. These things really added up to the change they needed.
>> But building a robust measurement system that prioritizes outcomes, not just adoption, is important not only to monitor progress but also to pinpoint issues and course-correct quickly. One surprising result from the survey was that the bottom-performing enterprises were not even measuring speed, and only 10% were measuring productivity.
Our goal is to make our clients top-performing organizations, so we've worked with them to create a holistic measurement system that captures impact all the way down to inputs. Inputs include the investment in coding tools and other AI tools, but also the time and resources spent on upskilling and change management. These inputs lead to direct outputs, but a lot of organizations focus only on how the increased breadth and depth of AI tool adoption is driving increased velocity and capacity. It's also important to understand how developer NPS changes, whether developers are enjoying their craft more rather than feeling more frustrated, and whether the code is becoming more secure, higher quality, and more resilient. One proxy for resiliency we used with our client was the mean time to resolve priority bugs.
Now, if we look at economic outcomes, which are the priority for C-suite executives, they look at time to revenue targets, the price differential for higher-quality features or the ability to expand the number of customers to meet feature demand, and the cost reduction per pod from reduced human labor in aggregate. Having these larger economic outcomes also helps organizations understand how to increase reinvestment in greenfield and brownfield development. As these tools evolve, the proxies for these metrics will evolve too, but hopefully this provides a MECE framework as an initial starting point.
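As a rough way to organize the framework just described, here is a minimal sketch of the inputs, outputs, and economic outcomes as a data structure; the field names are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass


@dataclass
class Inputs:
    tool_spend_usd: float            # investment in coding and other AI tools
    upskilling_hours: float          # time spent on hands-on upskilling and change management


@dataclass
class Outputs:
    adoption_breadth: float          # share of the development life cycle covered by AI tools
    velocity_change_pct: float       # delivery speed and capacity versus baseline
    developer_nps: float             # are developers enjoying their craft more?
    mttr_priority_bugs_hours: float  # resiliency proxy from the client example


@dataclass
class EconomicOutcomes:
    time_to_revenue_days: float      # how quickly features reach revenue targets
    price_differential_pct: float    # premium earned by higher-quality features
    cost_per_pod_usd: float          # cost reduction per pod from reduced human labor
```

Tracking all three levels together is what separates measuring outcomes from measuring adoption alone.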
So what's next? The future is of course difficult to predict, let alone over the next five years. But we hope that, with this vision of a new software development model, even as agents become more intelligent and humans become more fluent with AI, the model still stands. Hopefully a model of shorter sprints and smaller but more numerous teams will set enterprises up for success in the long term.
>> To leave you with some key takeaways: start now. I would say to our clients, this is a human change, it takes time, it's a big change, and it's going to be a journey that everyone needs to go on. It's also important to figure out which model works for you and to set a really bold ambition. And with that, thank you so much for listening. We have an article here if you're more interested in the research we've conducted. Thank you so much for having us.