Inside Microsoft’s sprint to integrate OpenAI’s GPT-4 into its ‘365’ app suite

 

By Mark Sullivan

“3-16.” 

It’s a term that’s been uttered thousands of times within the conference rooms and hallways at Microsoft over the past few months. It refers to March 16, the day the company announced that it had brought generative AI models codeveloped with OpenAI into its Microsoft 365 productivity suite. Under the new initiative, every app, from Outlook to Word to Teams, will have a generative AI-powered “Copilot.” The technology is currently being tested by 20 or so Microsoft corporate customers.

Building an AI assistant into that many apps is a big job, but one that Microsoft wanted to do quickly. OpenAI’s release of ChatGPT last November took the world by storm, set off an AI arms race, and accelerated everybody’s timelines for releasing new generative AI products and features. Not even Microsoft, which had already been working with—and investing in—OpenAI, was immune.

Microsoft design chief Jon Friedman, who led the product design of Copilot in Microsoft 365, says the project demanded long hours and many working weekends for hundreds of Microsoft employees—including designers, engineers, product managers, marketing people, data scientists, ethics teams, and others—over a period of “several months.” The initiative also gave people the sense that they had to suppress their egos and work together, Friedman says, to build something so big in such a short amount of time.

“There was this excitement that we could go do something together really bold and big,” Friedman says. “While we had a lot of experience with AI, this particular thing [generative AI] was more capable, so I think that got everybody into this learner’s mindset.”  

A new UX

Friedman says Copilot is a pioneer in “conversational UX,” by which he means a totally new type of user interface that calls on different, and more powerful, resources. The assistant represents “a new frontier of user interface design as paradigm changing as the first touchscreen devices,” he says. Copilot relies on OpenAI’s GPT-4 large language model, which is pretrained on mountains of internet content, but it also can access business data from the Microsoft Graph, so that it can generate things like email content and meeting summaries.

The design challenge was figuring out how and when to expose this new AI assistant within the context of the work people typically do in go-to apps like Word and PowerPoint.

The “copilot” concept—that a familiar AI assistant would be available within and across every productivity app—was “a loose intent” at the beginning of the design process, Friedman says. But it began to solidify as the design team learned more about how the AI assistant would likely be used in real businesses. Discovering these use cases—tasks where the AI could demonstrably save the user time or spur their creativity in some way—was the very first step in the UX design process.  

The people who understand those use cases best are the engineers, product managers, designers, and computer scientists who work within the vertical product groups for each productivity app. Friedman’s design group works with all of them. At the start of the Copilot project, he asked all of these product teams to brainstorm likely use case scenarios for generative AI within their app. His group then established a special, horizontal design team to work with all these groups on the presentation of Copilot within each app.

As the use cases began to gel within the app groups, the horizontal design team began to notice commonalities: use cases for AI that were relevant across multiple apps, Friedman says.

“So it was like, OK, for this [type of] meeting Copilot would be really valuable . . . what kind of specific features would that need?” Friedman says, recounting the thought process. “How can the new generative language model help us do a better job of that same task within email summarization?”

As these cross-app use cases became more apparent, the horizontal design group began to feel that the presentation of the AI features didn’t have to be different for every app.

“Because you have people . . . trying to look across scenarios while they’re happening and doing this sort of broad sense-making like, ‘ah, there’s this thing emerging’.” 

 

They began to conceive of a design framework where a common assistant could work in several different but predictable ways within the apps.

One Copilot, three altitudes

Friedman’s design group generated a deep library of documents meant to help designers across the project create entry points for the AI within a given app. The documents guided designers on how Copilot should appear within the context of different tasks users might engage in. “There was this notion that Copilot should show up at the right altitude for the right job,” Friedman says.

The design framework specified that Copilot would display within the app’s UX in one of three ways.  

The first was an immersive experience where the assistant seems to focus on a business initiative rather than a specific app, and in fact can pull data or insights from multiple apps in service of the task at hand. For example, Copilot might glean project milestones or risks from Teams meetings, presentations, or email threads, then summarize those in a project planning document. 

This “immersive” mode is Copilot’s most ambitious role within the productivity suite, and potentially the most impactful. It may also help address a long-standing knock on Microsoft’s productivity suite: that its constituent apps aren’t integrated tightly enough, points out Enderle Group principal analyst Rob Enderle. Enderle says this may be because Microsoft originally acquired the apps from other companies and they didn’t share a common code base. Copilot may act as an “overlay” atop all of the apps that at least creates the appearance that the user can make the apps work together in the service of some business task.

The second kind of presentation is “assistive,” meaning that Copilot acts something like a “sidecar” that rides along with the user within a specific app, helping them get the most out of the app’s functionality, Friedman says. In PowerPoint it might show the user how the app’s deep graphical features could be applied to tell the story of a certain set of data. In Outlook it might help the user understand the most important points in an email thread. In Word, it might provide feedback on how a document could be better written, or conform better to a certain writing style.

In Copilot’s “embedded” presentation the AI can act in a generative and creative capacity within apps. The AI might, for example, present as a small pop-up within the text of a Word document. “It’s like this in-the-flow experience,” Friedman says. “When you’re in deep work Copilot can help you with writer’s block or help you get a cold start on something (a slide deck in PowerPoint, perhaps).” 

The horizontal Copilot design team began describing the work in terms of the “three altitude” concept, and people started buying in, Friedman says, starting with people working on Copilot within the vertical app teams. 

“We shared this sort of framing with Satya [CEO Nadella] and others in the senior leadership team and it kind of clicked for people—this idea that it’s this one thing, but it adapts to you at these three different altitudes of work.” 

The “Copilot” concept wasn’t exactly invented for Microsoft 365. The name was originally adopted by (Microsoft-owned) GitHub in 2021 for its coding assistant, which is also powered in part by OpenAI large language models. But Microsoft’s effort to create a consistent Copilot assistant that performs some standardized functions across a diverse suite of productivity apps is new. That uniformity is likely to ease the friction that long-time users of the productivity apps will experience when “Microsoft 365 with Copilot” finally goes to general availability.

It also suggests that the Copilot brand and concept could extend to other Microsoft user interfaces such as the Windows OS or even LinkedIn as generative AI works its way further into the company’s consumer and enterprise products.

“Satya is a big fan of this name because it aptly describes what Copilot does,” Friedman says. “It’s there to assist you and keep you in the pilot seat across many tasks . . .”

Fast Company
