News Plus 13 May 2025 - 10 min read

'Innovation tripled': Procter & Gamble’s real-world AI trial collapses R&D-commercial silos, gives novices expert capability; marketing org structural rethink incoming

By Andrew Birmingham - Martech | Ecom | CX Editor

Sam Higgins, Simon Belousoff, Ethan Mollick, Lucio Ribeiro, and Scott Brinker

A landmark field study by Procter & Gamble and Wharton Business School, run not in a lab but in the chaotic churn of real workflows, suggests a radical rethink of the way generative AI is used is urgently required – with deep implications for organisational structures. GPT-4 didn’t just keep up with seasoned human teams; it outperformed them in speed, output quality, and collaboration. P&G discovered that junior staff wielding AI could deliver at near-expert levels, flattening both learning curves and the silo walls between R&D and commercial. Morale lifted and innovation tripled. But the study also identified a new risk: staff are happier in their roles, but less trusting of their output. For leaders still viewing AI as a cost-cutting tool, this should be a wake-up call. Australian business leaders from super fund Rest, research analysts Forrester, and creative agency TBWA say their experience tracks with the study’s findings.

What you need to know:

  • In a landmark study, GPT-4-equipped individuals matched the output of full human teams and completed innovation tasks up to 16.4 per cent faster, often with higher-quality results.
  • AI flattened the expertise curve. Junior or inexperienced staff using AI performed at near-expert levels, effectively collapsing traditional learning curves and levelling up organisational capability.
  • Cross-functional silos smashed. AI drove convergence between traditionally divergent R&D and commercial teams, fostering more integrated, hybrid solutions and breaking down internal barriers.
  • Morale up, but self-belief was down: Participants using AI reported higher enthusiasm and lower anxiety, yet paradoxically felt less confident in their outputs, highlighting a need for trust-focused AI training.
  • AI isn’t just a tool, it’s a teammate: The study, embedded in P&G’s live workflow, found that AI improved collaboration, balanced team contributions, and acted more like a cybernetic colleague than a software assistant.
  • Breakthrough ideas tripled. Teams using AI were 3x more likely to deliver top-decile innovation solutions – without extra time, cost or headcount.
  • Real work, real findings: Unlike lab-based experiments, this was a field trial run during P&G’s live upskilling programs, using real teams, products, and Microsoft’s secure GPT-4 environment.
  • High AI input = high human input. Users submitted up to 24 prompts per task, showing AI usage wasn’t passive – it demanded iteration and strategic engagement.
  • Experts argue AI belongs on your headcount – not to replace, but to augment and orchestrate workflows, enabling flexible, swarm-style collaboration over rigid hierarchies.
  • Reaction from Australian business leaders suggests the study underscores a growing divide. Organisations embracing AI are pulling ahead, while those hesitating risk obsolescence. The biggest ROI? Upskilling the middle, not just the elite.

Remarkably, individuals using AI achieved similar levels of solution balance on their own, effectively replicating the knowledge integration typically achieved through team collaboration… AI serves not just as an information provider but as an effective boundary-spanning mechanism.

– Wharton-P&G study

When Procter & Gamble executives, along with researchers from the Wharton Business School including famed AI professor Ethan Mollick, held a mirror up to the future of work, the reflection staring back wasn’t human. At least not entirely.

AI matched the brains of real human teams, delivered faster, and somehow made people feel better about their work while doing it. It bulldozed silos between R&D and marketing, and handed junior staff the strategic firepower of seasoned veterans. And the kicker: when paired with people, it delivered more top-shelf ideas than humans flying solo ever could.

In all, 776 experienced professionals from across P&G’s commercial and R&D functions were randomly assigned to work individually or in pairs, with or without the assistance of GPT-4, accessed securely via Microsoft Teams through Azure OpenAI. The design was deliberate: replicate the firm’s standard innovation process, down to the tasks, business units (Baby Care, Grooming, Feminine Care, Oral Care), and tooling.

The results of a study built around the daily grind of product innovation at one of the world’s most relentlessly structured companies are outlined in a paper called The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise.

The experiment was led by a heavyweight cast of academics and practitioners from Harvard Business School, the Wharton School, and Procter & Gamble itself. Rather than a theoretical modelling exercise or simulated scenario cooked up in a campus lab, this was a tightly controlled field experiment embedded inside P&G’s actual product development workflow, conducted during a live organisational upskilling program.

Key findings:

  • In tests across product development tasks, individuals using AI produced outcomes that were on par with full teams operating without any digital assistance. In short: AI can mimic the collective horsepower of collaboration with a fraction of the headcount. 
  • AI wasn’t just keeping up. It was accelerating output. Individuals using AI completed tasks 16.4 per cent faster. Even team-based setups saved an average of 12.7 per cent in task time.
  • Those with little to no experience in product development suddenly started performing like grizzled veterans. In effect, AI flattened the experience curve, dragging novices up to expert-level performance. If you’re in learning or talent development, that should be both terrifying and exhilarating.
  • AI also appears to have cracked the challenge of cross-functional thinking. Without it, R&D and commercial functions defaulted to predictably different solutions. With AI, they converged, producing more integrated, hybridised ideas.
  • Perhaps the most feel-good kicker of the report? Participants using AI weren’t just better – they felt better. Enthusiasm, energy, and excitement all lifted. Anxiety and frustration plummeted. Morale matters – and apparently, so does the machine.
  • If all this weren’t enough to ignite a few org charts, teams using AI were three times more likely to produce top decile-quality solutions. Think about that. Triple the breakthrough ideas, without triple the cost or effort.

But there is also a sting in the tail of that final finding: those same AI-empowered participants were less confident in their outputs. Despite better results, their self-belief lagged, revealing a potentially critical new skilling gap: knowing how to use AI may not be enough – staff also need to understand it well enough to trust what it tells them.

P&G and Wharton's study also observed a profound shift in the mechanics of collaboration. AI helped level the playing field inside teams, dampening the classic dominance of louder or more senior contributors. Commercial and R&D participants contributed more equally – AI, it seems, is a democratiser.

It's also worth noting that on average, users entered between 18 and 24 prompts per task. That suggests that high AI involvement does not mean passive participation. If anything, the opposite is true and P&G effectively developed a strategic iteration loop between human and machine.
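That iteration loop – a human repeatedly reviewing an AI draft and folding unmet requirements back into the next prompt – can be sketched in a few lines. This is a hypothetical illustration, not code from the study; the model call is stubbed out, where a real deployment would call a secured GPT-4 endpoint (as P&G did via Azure OpenAI).

```python
# Hypothetical sketch of a human-in-the-loop iteration cycle like the one the
# study observed (18-24 prompts per task). model_draft is a stand-in for a
# real LLM call; acceptable stands in for the human's review of the draft.

def model_draft(prompt: str) -> str:
    """Stub for a GPT-4 call: returns a 'draft' that echoes the prompt."""
    return f"Draft for: {prompt}"

def acceptable(draft: str, criteria: list) -> bool:
    """Human (or rubric) check: does the draft cover every criterion?"""
    return all(c.lower() in draft.lower() for c in criteria)

def iterate(task: str, criteria: list, max_prompts: int = 24):
    """Refine the prompt until the draft meets the brief, counting prompts used."""
    prompt = task
    for n in range(1, max_prompts + 1):
        draft = model_draft(prompt)
        if acceptable(draft, criteria):
            return draft, n
        # Fold the first unmet criterion back into the next prompt.
        missing = next(c for c in criteria if c.lower() not in draft.lower())
        prompt = f"{prompt}. Also address: {missing}"
    return draft, max_prompts

draft, prompts_used = iterate("New oral care concept", ["pricing", "shelf appeal"])
print(prompts_used)  # prints 3: one extra round trip per unmet criterion
```

The point of the sketch is the shape of the loop, not the stubbed logic: each round trip is one of those 18-24 prompts, and the prompt count grows with how demanding the brief is.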

The research team included several senior P&G leaders, noted innovation scholar Karim Lakhani, and Wharton’s Ethan Mollick.  

For Mollick and Lakhani, this new work builds on their findings from a now-famous study in 2023 with Harvard Business School, and the consulting firm BCG. As Mi3 reported, that study found that knowledge workers – in that case, consultants working for BCG – who used ChatGPT-4 significantly outperformed their colleagues on every dimension measured, and no matter how their performance was calculated.

The data in the 2023 study also suggested Large Language Models (LLMs) narrowed the gap between under-performers and high achievers, with the bottom 50 per cent of performers achieving the greatest uplift. 

The earlier paper – called Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality – also identifies key risks, not just in terms of output, but also with implications for staff development and, of course, responsible AI.

Marketing model impacts

The new study reflects the more nuanced reality that marketing organisations are built on cross-functional teams. For instance, you bring the brand lead, the data analyst, the product owner, and the creative. You get them in a room. You whiteboard. You Slack. Then you schedule three more meetings. Sometimes something great comes out of it. But most of the time, you pay a tax: in coordination, in energy, in speed.

The new experiment appears to demonstrate that generative AI can replicate many of the benefits of team dynamics without the drag. Professionals working alone with GPT-4 delivered ideas just as strong as those of paired humans, often with fewer blind spots and a better balance between commercial and technical considerations.

The AI wasn’t just faster. It was broader. A commercial lead stopped thinking like a marketer and started pitching viable R&D ideas while an engineer pitched brand strategy as the machine eroded functional myopia.

To successfully use AI, organisations will need to change their analogies. Our findings suggest AI sometimes functions more like a teammate than a tool. While not human, it replicates core benefits of teamwork: improved performance, expertise sharing, and positive emotional experiences.

Ethan Mollick, Associate Professor, The Wharton School

According to Ethan Mollick, Associate Professor at The Wharton School and also regarded as one of the most respected names in generative AI globally, the findings matter because for most organisations, AI has been treated like a glorified spreadsheet – handy for cranking out tasks faster, but not much more.

That framing made sense early on, says Mollick, who described the study and its findings in a blog, One Useful Thing. But as the models mature and the data rolls in, it’s clear users are leaning on AI less for grunt work and more for heavy lifting such as strategic thinking, decision-making, and complex problem solving. That suggests that companies still chasing marginal productivity gains risk missing the plot entirely. Worse, they risk building cultures where employees hoard their AI breakthroughs out of fear they’ll automate themselves out of a job, while leadership misses the far more valuable conversation about how work itself is being redefined.

“To successfully use AI, organisations will need to change their analogies. Our findings suggest AI sometimes functions more like a teammate than a tool. While not human, it replicates core benefits of teamwork – improved performance, expertise sharing, and positive emotional experiences,” he says.

Per Mollick, “This teammate perspective should make organisations think differently about AI. It suggests a need to reconsider team structures, training programs, and even traditional boundaries between specialties. At least with the current set of AI tools, AI augments human capabilities. It democratises expertise as well, enabling more employees to contribute meaningfully to specialised tasks and potentially opening new career pathways.”

According to Mollick, “The most exciting implication may be that AI doesn't just automate existing tasks, it changes how we can think about work itself. The future of work isn't just about individuals adapting to AI, it's about organisations reimagining the fundamental nature of teamwork and management structures themselves. And that's a challenge that will require not just technological solutions, but new organisational thinking.”

The A(I) Team

Get used to the language of 'teammates' as the tech sector sets out on a mission to anthropomorphise cold analytical code. Despite the risks – AI teammates lack empathy ("open the pod bay doors, Hal") and cannot be held accountable – the teammate pitch is gathering pace.

Martech doyen Scott Brinker told Mi3 he's holding fire on the psychological merits for now, although he sees upside in the operational opportunity of robo-team members.

"The psychology of this is, admittedly, outside my field of expertise. I do find it interesting that a survey published by Harvard Business Review last month showed that "therapy/companionship" has become the number one gen AI use case for consumers. I'm a little skeptical of that, but it's not unreasonable to see it in the top 10. Is that good or bad? Honestly, I don't know."

However, he added, "My interpretation of the 'AI as a teammate' narrative is that we tend to get the most out of these tools when we talk to them the way we would a human. After all, they were trained on a massive corpus of human communications. So when we explain what we want and have an interactive dialogue with these tools, like we would a human, we tend to get better results."

"It's a very different kind of user interface than anything we've had before – it definitely takes some getting used to!" he acknowledged.

"Companies are trying to encourage people to engage with these tools, to experiment and learn what's possible. And generally, most people are slow to embrace change, especially technological change. "Treat AI as a teammate" is a way to try to overcome that inertia."

I was just onsite with a global professional services business who built a vertical AI solution to identify specific offerings from across a plethora of service lines – from classic consulting to audit - given a handful of client account documents as input. The result was a 99 per cent reduction in the effort taken to analyse a client’s needs, match their services, and produce a custom proposal.

Sam Higgins VP and Principal Analyst, Forrester

The findings come as no surprise to Sam Higgins, VP and principal analyst at Forrester.

For many companies, he says, large language models like ChatGPT are mostly applied to horizontal use cases. “It’s a productivity play, not an efficiency play.”

But he adds, “Once people have that capacity, what we’re finding is that after experiencing the improvement in quality, in decision-making, and in all the ways Mollick describes, people start to say, 'Oh, what if we had a tool like this for what we’re doing here?'”

That’s when it moves into the vertical space.

“I was just onsite with a global professional services business who built a vertical AI solution to identify specific offerings from across a plethora of service lines – from classic consulting to audit – given a handful of client account documents as input. The result was basically a 99 per cent reduction in the effort taken to analyse a client’s needs, match their services, and produce a custom proposal. Not to mention a huge reduction in elapsed time! I was blown away.”

Simon Belousoff, AI product owner for super fund Rest, says the new P&G/Wharton study helps to reframe the conversation about what capability really looks like in the enterprise – and called out the cognitive dissonance playing out in many boardrooms across the country.

“AI is probably already in the top 20 per cent of human capability. But if you give it context and work with it, you're uplifting all the humans around it – especially those who are less capable,” he says.

For any exec still clinging to the belief that their team is top quartile by default, he offers a stark reality check: “Every organisation seems to have a bias when it comes to people. They think they’re good to great. They think they’re top quartile – but they can’t all be. Logically, that’s impossible.”

The implication is sharp: AI doesn’t just replicate capability, it democratises it. In resource-constrained environments, that changes the game. “Typically you're constrained by funding and resources. What AI allows you to do, across all teams, is provide extended support without the contention or bottlenecks you’d have with people.”

Per Belousoff, this isn’t theoretical, it’s operational. Want to run brand-safe creative at scale? Fire up a brand agent. Need an initial legal review? Use a compliance agent. Trying to win over finance for that campaign uplift? Drop in a CFO agent. “You just can’t do that with a people team. You can’t fund that kind of resource,” Belousoff says.
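The specialist-agent pattern Belousoff describes can be sketched as a pool of role-scoped reviewers a team invokes on demand. The agent names, checklists, and review logic below are hypothetical stand-ins; in practice each agent would wrap a role-prompted LLM call rather than a keyword check.

```python
# Hypothetical sketch of on-demand specialist agents (brand, compliance, CFO)
# reviewing a piece of work. Each agent's review() is a stub for a
# role-prompted model call; the checklist stands in for its review focus.

from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    checklist: list  # concerns this agent always reviews for

    def review(self, draft: str) -> str:
        flagged = [c for c in self.checklist if c.lower() not in draft.lower()]
        if not flagged:
            return f"{self.role}: no issues"
        return f"{self.role}: address {', '.join(flagged)}"

# A registry of specialists any team can call on without queuing for a person.
AGENTS = {
    "brand": Agent("Brand agent", ["tone of voice"]),
    "compliance": Agent("Compliance agent", ["disclaimer"]),
    "cfo": Agent("CFO agent", ["budget"]),
}

def swarm_review(draft: str, roles: list) -> list:
    """Run the chosen agents over a draft and collect their feedback."""
    return [AGENTS[r].review(draft) for r in roles]

feedback = swarm_review("Campaign brief with budget and disclaimer",
                        ["brand", "compliance", "cfo"])
```

The design point matches Belousoff's argument: unlike a human review queue, every agent runs in parallel without contention, so the cost of adding another specialist to the loop is close to zero.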

“You want deep personalisation? You can bring in a language translator agent, an accessibility agent, or even a Gen Z TikTok creative agent… stakeholder feedback on demand is part of that," he says nodding to the rise and rise of synthetic customers.

Belousoff reckons it’s not that humans get replaced, it’s that the game state changes before they even get on the field. “There’s probably still a gap, they don’t replace humans, but what they do is ensure that by the time you engage a human, you’re much further along.”

This is not about cost-cutting. It’s about fluidity. It’s about rethinking the nature of collaboration itself.

Lucio Ribeiro, Chief AI and Innovation Officer, TBWA

For Lucio Ribeiro, newly minted Chief AI and Innovation Officer at Omnicom-owned TBWA, the Wharton/P&G study aligns directly with what he is seeing in Australia and globally: “AI is a co-pilot, not an autopilot at this point.”

The nuance matters. Too many executives still treat large language models like vending machines: insert prompt, collect genius. But that’s not how it works, he suggests, and if you’re walking away while the bot "does the work," you're leaving value – and insight – on the table.

“Co-pilot is better than autopilot – that’s number one,” he insists.

But if that sounds like the usual “humans in the loop” chatter, don’t be fooled. Ribeiro is pointing to something more disruptive – and more democratic. The P&G study shows that AI doesn’t just make experts more efficient. It lifts the middle.

“The real focus should be on upskilling the middle, not just the gurus and the model supports that,” he says. “What I’ve found is that studies show the model levels expertise. So, for anyone reading that study – or looking to take advantage of it – the quickest return on investment is giving AI literacy to every rank and file employee, not just the experts.”

Ribeiro believes that one of the most under-appreciated insights in the generative AI hype cycle is that the biggest delta in ROI won’t come from turbocharging the top performers. Instead, it's a function of removing friction at scale. In the P&G experiment, employees outside their core roles matched the performance of traditional expert teams, when paired with AI. Not because they became savants overnight, but because the machine filled in the gaps.

“The real risk now is the widening gap between those using AI and those ignoring it,” says Ribeiro. “The study demonstrates that clearly.”

But where Ribeiro really fires up is when he talks about structural change – the invisible code behind how organisations run. “I’ve been playing around with this idea of redrawing the org chart,” he says. “It’s not about efficiency through firing, and it’s not about eliminating roles. It’s about rethinking how you structure your organisational workforce.”

In other words: AI doesn’t just belong on your tech stack. It belongs on your headcount. Not as replacement, but as augmentation, and orchestration.

“When a chatbot can double as a teammate – like a co-pilot – and also as a coach, it creates an opportunity for workflows to flex,” he says. “You can think about AI bots that come together to swarm a problem and then dissolve afterward, rather than working within a fixed hierarchy.”

That’s not theoretical. It’s exactly what P&G's cybernetic teammate experiment delivered: marketers and technologists combining with generative agents to produce more balanced, more innovative solutions. And then disbanding. Job done.

“This is not about cost-cutting,” Ribeiro concludes. “It’s about fluidity. It’s about rethinking the nature of collaboration itself.”
