
Moving away from Agile: What's Next – Martin Harrysson & Natasha Maniar, McKinsey & Company

By AI Engineer

Summary

Topics Covered

  • AI Creates Uneven Task Impacts
  • Tailor Models to Workflow Types
  • Abandon Two-Pizza Teams
  • Scale with Holistic Measurement

Full Transcript

[music] Good morning. Hello everyone. It's really great to be here. So I'm Martin and I'm here with my colleague Natasha. We're from a part of McKinsey you may not be as familiar with. We have a practice called Software X, and we work with mostly enterprise clients on how to build better software products, which in the past couple of years has mostly meant using AI.

What our talk is about today is focused on the people and operating model aspects of leveraging AI for software development. We believe those have to change quite significantly, and that's what we're excited to talk to you about.

If I take a quick step back in time and think through the major technology breakthroughs we've seen in the last few decades, they tend to always come with a paradigm shift in how we develop software. I still recall, almost 20 years ago now, starting work as a software engineer, an entry-level developer, at a tech company. The company I was working for was just switching to agile: we were using kanban boards, standups, and other ceremonies. It was a massive change for the company. And now, with everything that is happening in AI, we're at the precipice of another such paradigm shift.

If we think about some of the things happening with AI and software development that we've seen at this conference, there's no doubt that a new paradigm is upon us. So we'll talk about two things. We'll first touch on how you go from the gains we're seeing in individual productivity to scaling that to the whole team, and what type of changes we think that implies. Then we'll talk about how you scale that across a whole organization to really get value.

I'm talking to an audience here that is using agents all the time, and I thought, if I asked you for some examples, I'm sure you could rattle off ten different ones where you would say, "Look, there was this thing that I used to do. It used to take maybe even days, or hours, and it's now taking only minutes." There's no shortage of those stories, and you can go over to the expo and talk to any of the companies there about all these great use cases. It really shows that these tools work and that they can be really impactful.

And yet, despite seeing some of these improvements, we've done some research to gauge where our clients are at the moment. We recently surveyed about 300 companies, mostly enterprises, about what they are seeing in terms of productivity improvements. On average, they report seeing only 5, 10, 15% improvements overall as a company. So we're in a place where there's a bit of a disconnect between the big potential around AI and the reality.

We think this gap exists because, as we've started implementing AI, whether that's coding assistants or, as you just heard with how OpenAI is using agents, more complex agentic workflows, what has started to emerge is a set of bottlenecks that were not necessarily there before. For example, as we now start moving much faster in certain aspects of the work, we haven't really changed how we collaborate among people and team members, and that's not quite keeping up. We've started generating way more code, but it's still being reviewed in a pretty manual way in many companies. And then we also have this theme, recently highlighted in a research report from Carnegie Mellon, about how all the new code being generated is also amplifying the generation of tech debt in some cases and actually creating complexity. So there are these bottlenecks. They're not impossible to overcome, but this is what we believe is limiting many companies from seeing the real value they should be seeing.

Let me talk about a couple of examples to make that come to life a little bit more. One of the things we see as a big rate limiter at the moment is how work is allocated. What we've learned over the last couple of years is that the impact from AI and agents is highly uneven. There are some tasks where it works amazingly well today and you see huge improvements, and others where it's not as effective, so you have that variability. You also have variability among people: some have lots of experience using these tools and know how to pick them up, and others are less experienced. So for team leaders, engineering managers, and so on, it's highly non-trivial to know how to allocate work and resources in a good way, and this is creating a lot of inefficiencies.

Another example is how work is being reviewed. Agents are often given pretty fuzzy stories written in prose, with pretty fuzzy acceptance criteria, which means the code that comes back is not always what was intended, and for many companies the only mechanism to control that is manual review. So you've automated some things but generated more manual review. These are some examples of the bottlenecks we see coming up.

As mentioned, what that has resulted in so far is that most large companies today are stuck a little bit in a world of relatively marginal gains. They're working in ways that were developed for the constraints of the past paradigm of human-led development. If you go out to most companies, you see 8-to-10-person teams, you see two-week sprints, all these elements that were largely part of an agile operating model, and that puts limits on what they can achieve. Over the past year, we've been working with lots of clients to break that model a bit and develop new ways of working: in smaller teams, in new roles, with shorter cycles. And when you do that, we see really great performance improvements. That's what gives us this path to where we see things are going to improve.

So we realized that rewiring the PDLC is not a one-size-fits-all solution. For example, different types of engineering functions across the enterprise, along the product life cycle, may require different operating models based on how humans and agents best collaborate. If we take the example of modernizing legacy codebases, this task requires high context, potentially of the entire codebase, but also has clearly well-defined outputs. So an example operating model could look like a factory of agents, where humans provide an initial spec and a final review, with minimal intervention in between. For new features, in greenfield and brownfield projects, the operating model may look more like an iterative loop, because they may benefit from non-deterministic outputs and increased variation, with agents acting as co-creators providing more options to facilitate faster feedback loops.

So, as we mentioned, we did a survey among 300 enterprises globally to understand what sets the top performers apart. We found that they are seven times more likely to have AI-native workflows, which means scaling more than four use cases across the software development life cycle rather than having point solutions for just code review or just code development. They were also six times more likely to have AI-native roles, which means having smaller pods with different skill sets and new roles. To enable these shifts, these organizations were investing in continuous, hands-on upskilling, impact measurement, and incentive structures to incentivize developers and PMs to adopt AI. This led to a five-to-six-times improvement in time to market and delivery speed, as well as higher quality and more consistent artifacts.

When we talk about AI-native workflows, we mean that these enterprises are moving away from quarterly planning to continuous planning, and the unit of work is moving from story-driven to spec-driven development, so that PMs are iterating on specs with agents rather than iterating on long PRDs.
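To make "spec-driven" concrete, here is a minimal, hypothetical sketch of a spec as a structured unit of work that a PM and an agent could iterate on; the field names and example criteria below are illustrative assumptions, not from the talk or any particular tool.

```python
# Hypothetical sketch: a spec as the unit of work instead of a long PRD.
# All field names and example criteria are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Spec:
    feature: str                    # what is being built
    intent: str                     # the outcome the PM wants
    acceptance_criteria: list[str]  # testable statements the agent's output must satisfy
    non_goals: list[str] = field(default_factory=list)  # explicitly out of scope

checkout_spec = Spec(
    feature="guest-checkout",
    intent="Let users complete a purchase without creating an account.",
    acceptance_criteria=[
        "Checkout completes in at most 3 steps for a guest user",
        "Payment tokens never appear in application logs (security)",
        "Each checkout step emits a trace span with a correlation ID (observability)",
    ],
    non_goals=["Loyalty-program integration"],
)
```

Each iteration with the agent tightens the criteria rather than rewriting prose, which is what keeps the downstream artifacts consistent and reviewable.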

On the talent side, AI-native roles essentially means moving away from the two-pizza structure to one-pizza pods of three to five individuals. Instead of having separate QA, frontend, and backend engineers, there are more consolidated roles where product builders manage and orchestrate agents with full-stack fluency and a better understanding of the full architecture of their codebase. PMs are starting to create prototypes directly in code rather than iterating on long PRDs.

As one example, we've studied some AI-native startups and realized that they've actually implemented all of these shifts to accelerate their outcomes. In our article, we've described how Cursor actually operates internally.

But if you're a large enterprise predicated on the agile model, what are some steps you can take? In a recent client study with a leading international bank, we tested some team-level interventions to address the bottlenecks mentioned before, mainly around the sequencing of steps within the agile ceremonies and how to define the roles of agents and humans within the sprint cycle. Let's walk through some examples.

First, team leads would assign sprint stories using agents, based on data about the team's velocity and delivery history. They would then co-create multiple prototypes and iterate with agents on the acceptance criteria around security and observability needs, to produce more consistent artifacts across teams. This prevents the downstream rework mentioned before, so that developers don't have to constantly iterate with the agents during the coding process.
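As a rough illustration of that first intervention, here is a hypothetical sketch of routing sprint stories to pods based on historical velocity and delivery data; the scoring heuristic, data shapes, and pod names are our own assumptions, not the bank's actual tooling, and in practice an agent rather than a fixed heuristic would propose the assignments.

```python
# Hypothetical sketch: propose sprint-story assignments from delivery history.
# The heuristic (prefer pods with spare velocity and a strong track record on
# similar work) is an illustrative assumption, not the client's actual logic.
def assign_stories(stories, pods):
    """stories: list of {"id", "points", "area"}; pods: list of
    {"name", "velocity", "committed", "history": {area: success_rate}}."""
    assignments = {}
    for story in sorted(stories, key=lambda s: -s["points"]):  # largest first
        def fit(pod):
            spare = pod["velocity"] - pod["committed"]
            track_record = pod["history"].get(story["area"], 0.5)
            return (spare >= story["points"], track_record, spare)
        best = max(pods, key=fit)
        best["committed"] += story["points"]
        assignments[story["id"]] = best["name"]
    return assignments

pods = [
    {"name": "bugfix-pod", "velocity": 20, "committed": 0, "history": {"bugs": 0.9}},
    {"name": "greenfield-pod", "velocity": 25, "committed": 0, "history": {"features": 0.8}},
]
stories = [
    {"id": "S-1", "points": 8, "area": "features"},
    {"id": "S-2", "points": 3, "area": "bugs"},
]
print(assign_stories(stories, pods))  # {'S-1': 'greenfield-pod', 'S-2': 'bugfix-pod'}
```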

The squads were also reorganized by workflow, so there would be one focused on small bug fixes and another focused on greenfield development. In the background, agents would be used to look at potential cross-repository impacts, to reduce debugging time for developers.

Another example, aimed at reducing the collaboration overhead and meetings within the sprint cycle: instead of waiting for data scientists' input, PMs would directly observe real-time customer feedback to reprioritize features, and this would accelerate the backlog within the same amount of time.

We studied the impact of these interventions and found highly promising results. For example, not just an increase in agent consumption of over 60 times, but also an increase in delivery speed tied directly to the business priorities of this bank. There was a 51% increase in code merges, along with an increase in efficiency.

The other aspect of this is the different roles and the talent model. One of the biggest differentiators we saw, as mentioned, was whether you have actually changed the roles involved in software development. What you all are seeing is that engineers are moving away from execution, from simply writing code, toward being orchestrators, thinking through how to divide up work among agents, for example. We also heard some examples of how the role of the product manager is changing. And while this may sound pretty straightforward to many of you here who are working with these tools day-to-day, that you have to change what you do, the reality is that about 70% of the companies we surveyed have not changed their roles at all. So you have this background expectation that people are going to do things differently, but the role is still defined in the same way, with the same understanding it had a couple of years ago.

But we are starting to see some companies changing this. Here is another example from a recent client. They were set up in a way that is probably pretty common for many companies: a fairly typical two-pizza team model with the types of roles you would be familiar with. We ran a set of experiments and frontrunner pilots and tested new models with much smaller pods and new roles that consolidated tasks previously done by different roles. By doing that, we could basically create more pods, or more teams, with the same number of people, while retaining the expectation that each pod performs at about the same level as before. And we see really positive results from that, with the quality of the generated code maintained and in some cases even improved. In particular, there was a big speed-up in the output from the different teams, and you can see some of the metrics here.

Let's shift gears a little bit and go beyond just the team level: how does this scale across a big organization? The reality is that many companies don't have just one or two of these teams but often hundreds of teams, and thousands or even tens of thousands of people working this way. This is where one of the biggest differences we saw between those stuck at only 10% or so improvement and those seeing outsized improvements is how you manage that change. Change management is, I guess, a bit of a catch-all and elusive term for a lot of different things, but in some ways it's not a bad way to think about it. I usually say that change management is about getting a lot of small things right. The crux of actually scaling this is often about getting 20, 30, or even more things right at the same time, involving the way you communicate what this means, the way you incentivize people, and the way you upskill them, and it all has to come together.

And when it doesn't come together, we see what happens. This is an example from another tech company we worked with, where we were initially rolling out new AI tools that hit different parts of the product development life cycle. We rolled out the tools; there was some usage, but it often dropped off. They were either not used or used in very suboptimal ways. That's the jagged part you're seeing on the left-hand side here: despite adding more users, the overall impact did not change at all. So we had to do quite a reset and effectively start over. We reset expectations: what does this mean if you're a developer day-to-day? What does it mean for a PM? We had much more hands-on upskilling. There were bring-your-own-code sessions, and there were coaches available, especially during those first few sprints before you make this a habit and work it into the way you develop software day-to-day; it's a very critical time, and that's when this matters a lot. And there was a bit of a measurement system as well, so you know what's changing and you're able to see what's improving.

Another example, just to bring this to life a little: as mentioned, this is about getting a lot of things right, and each one of these individually may not seem like the biggest deal, but put together they really make a huge difference. These are some of the top interventions another client had to go through. For them it really helped, for example, to set up code labs and to institute a new set of certifications that helped motivate and drive people to change what they do day-to-day. These things really added up to the change they needed.

>> But building a robust measurement system that prioritizes outcomes, and not just adoption, is important not only to monitor progress but also to pinpoint issues and course-correct quickly. One surprising result from the survey was that the enterprises that were bottom performers were not even measuring speed, and only 10% were measuring productivity.

But our goal is to make our clients top-performing organizations, so we've worked with them to create a holistic measurement system that captures impact all the way down to inputs. For inputs, this would include the investment in coding tools and other AI tools, but also the time and resources spent on upskilling and change management. These inputs lead to direct outputs, but a lot of organizations focus only on how the increased breadth and depth of adoption of AI tools is leading to increased velocity and capacity. However, it's also important to understand how developers' NPS scores are changing and whether they're enjoying their craft more rather than feeling more frustrated. And it's also important to understand whether the code is becoming more secure, higher quality, and more resilient. One proxy for resiliency that we used for our client was the mean time to resolve priority bugs.
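As a concrete illustration of that proxy, here is a minimal sketch of computing mean time to resolve priority bugs from an issue-tracker export; the record layout and priority labels are assumptions, not any specific tool's schema.

```python
# Minimal sketch: mean time to resolve (MTTR) priority bugs as a resiliency proxy.
# The record layout and priority labels are illustrative assumptions.
from datetime import datetime

def mttr_priority_bugs(bugs, priorities=("P0", "P1")):
    """Average hours from opened to resolved for bugs at the given priorities."""
    durations = [
        (b["resolved_at"] - b["opened_at"]).total_seconds() / 3600
        for b in bugs
        if b["priority"] in priorities and b.get("resolved_at")
    ]
    return sum(durations) / len(durations) if durations else None

bugs = [
    {"priority": "P0", "opened_at": datetime(2025, 3, 1, 9), "resolved_at": datetime(2025, 3, 1, 15)},
    {"priority": "P1", "opened_at": datetime(2025, 3, 2, 10), "resolved_at": datetime(2025, 3, 3, 10)},
    {"priority": "P3", "opened_at": datetime(2025, 3, 2, 11), "resolved_at": None},  # not counted
]
print(mttr_priority_bugs(bugs))  # 15.0 hours: (6 + 24) / 2
```

Tracking how this number moves as the operating model changes is what ties the resiliency claim to data rather than anecdote.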

Now, if we look at economic outcomes, which are the priority for C-suite executives, they look at the time to the revenue target, the increased price differential for higher-quality features or the expanded number of customers to meet feature demand, and the cost reduction per pod from reduced human labor in aggregate. Having these larger economic outcomes also helps organizations understand how much is being reinvested in greenfield and brownfield development. As these tools evolve, the proxies for these metrics will also evolve, but hopefully this provides a MECE framework as an initial starting point.

So what's next? The future, of course, is difficult to predict, let alone over the next five years. But we hope that, with our vision of a new software development model, even as agents increase in intelligence and humans become more fluent in AI, this model still stands. So hopefully this model, with shorter sprints and smaller but more numerous teams, will set enterprises up for success in the long term.

>> So, just to leave you with some key takeaways: start now. I would say to our clients, this is a human change and it takes time; it's a big change and it's going to be a journey, and I think this is something everyone needs to go on. I think it's also important to figure out which model works for you and to set a really bold ambition. And with that, thank you so much for listening to us. We have an article here if you're more interested in the research we've conducted. Thank you so much for having us.

>> [music]
