How a Competitive Benchmark Transformed Microsoft’s iPad Office Suite Strategy

As a UX researcher at Microsoft, I led a large-scale benchmarking study to evaluate the ease of use of Microsoft Word, Excel, and PowerPoint on iPad compared to competitors. Over four months, I partnered with product, design, and fellow researchers to produce a quantitative baseline of usability across apps. The results directly influenced product roadmaps and development priorities for the next fiscal year.

Note: Insights and deliverables are confidential, but please reach out if you have any questions.

How do the M365 apps on iPad compare to competitors, and how can we improve our apps to beat the competition?

At the time of this study, iPad usability had become a renewed focus due to its growing use in commercial settings. Our team needed a clear comparison of our iPad apps (Word, Excel, and PowerPoint) to competitors to identify where to invest for maximum impact.

I designed a competitive benchmark to measure usability at scale, gather supporting qualitative insights, uncover UX issues and opportunities, and inform the product roadmap.


I used a mixed-methods approach in a large-scale benchmark, enabling both deep insights and broad comparative data across apps.

I began with quick foundational interviews to understand how commercial workers and executives use Word, Excel, and PowerPoint on iPad. These interviews helped identify common use cases and important tasks in commercial settings, many of which aligned with the tasks we planned to test in the benchmark study.

Once we had a deeper understanding of iPad use, we ran a large-scale benchmark study comparing 10 apps (3 Microsoft, 7 competitors) with 200 participants (20 per app). Each participant was assigned one app and asked to complete a set of tasks using that app, then rate the ease of use of each task. We scoped to the most relevant apps for commercial users and designed tasks grounded in real-world, high-value scenarios. Collaboration with product management ensured our benchmark reflected priority workflows and stakeholder needs.

Here is my rationale for key decisions I made when designing this study.

Methods

  • We recruited commercial iPad users with varying levels of app familiarity and accessory use, and balanced demographics to ensure realistic and representative data.

  • Dscout allowed for remote, flexible task-based research with precise participant targeting and screen recording capabilities.

  • Early in scoping, we decided to start with foundational interviews to deeply understand how commercial iPad users work, then followed with a competitive benchmark to measure usability, balancing depth of insight with the need for comparative data.

  • As a team, we aligned on direct competitors that are widely used in commercial settings and recognized for strong usability, providing a high bar we aimed to meet or exceed.

  • To reduce participant fatigue and maintain data quality while covering the most critical and high-priority workflows, we worked with PM to reduce a list of 30+ tasks down to 20 per app.

Analysis

I paired quantitative and qualitative analysis, triangulating ease-of-use metrics with rich human insights to identify pain points and understand how our apps stack up against competitors.

Quantitative analysis answered: How does the ease of use of our apps compare to competitors?

  • Measured the average ease-of-use score (on a 1-5 scale) per task per app, with confidence intervals (see the analysis sketch after this list)

  • Created color-coded scorecards to show performance relative to competitors

  • Split the ease of use data by accessory use to evaluate how keyboard use affected scores
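
To make the scorecard math concrete, here is an illustrative sketch (not the study's actual analysis code), assuming a long-format export with hypothetical columns app, task, ease_score (1-5), and keyboard:

```python
import pandas as pd
from scipy import stats

# Hypothetical long-format export: one row per participant x task rating.
# Assumed columns: app, task, ease_score (1-5), keyboard (True if an external keyboard was used).
ratings = pd.read_csv("benchmark_ratings.csv")

MICROSOFT_APPS = ["Word", "Excel", "PowerPoint"]

def mean_with_ci(scores, confidence=0.95):
    """Mean ease-of-use score with a t-based confidence interval."""
    n = len(scores)
    mean = scores.mean()
    margin = stats.sem(scores) * stats.t.ppf((1 + confidence) / 2, n - 1)
    return pd.Series({"mean": mean, "ci_low": mean - margin, "ci_high": mean + margin, "n": n})

# Average ease-of-use score per task per app, with 95% confidence intervals.
scorecard = ratings.groupby(["app", "task"])["ease_score"].apply(mean_with_ci).unstack()

# Split the same metric by accessory use to see how keyboard use affects scores.
by_keyboard = ratings.groupby(["app", "task", "keyboard"])["ease_score"].mean().unstack("keyboard")

# Color-code each app/task cell relative to the competitor average for that task.
competitors = ratings[~ratings["app"].isin(MICROSOFT_APPS)]
competitor_avg = competitors.groupby("task")["ease_score"].mean().rename("competitor_avg")
scorecard = scorecard.join(competitor_avg, on="task")
scorecard["flag"] = (scorecard["mean"] - scorecard["competitor_avg"]).apply(
    lambda diff: "green" if diff > 0.25 else ("red" if diff < -0.25 else "yellow")  # illustrative thresholds
)
```

Plotting the confidence intervals alongside the means makes it easy to see which app-to-app differences are within the noise.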

Qualitative analysis answered: Why did apps receive the scores they did?

  • Reviewed user videos from low-performing tasks to identify specific UX issues

  • Analyzed a random sample of videos from high-scoring tasks to confirm success or surface hidden issues

  • Coded themes and pain points within apps

I synthesized ease-of-use scores, participant reflections, and insights from the foundational interviews to explain which apps were perceived as easiest to use on iPad and why, and to identify UX issues and areas of opportunity in our apps.


Findings & Impact

Findings shaped the next fiscal year's priorities and empowered the team with clear, actionable UX insights.

Deliverables included a cross-app issue tracker, detailed scorecards, and a comprehensive report. These directly influenced the product roadmap.

  • Issue Tracker: 50+ UX issues organized by app, issue type, severity, task, and theme; included links to participant video clips and engineering tickets

  • Scorecards: Quantitative ease of use breakdown across apps and tasks

  • Research Report: Key insights, deep dives on lowest-scoring tasks, design recommendations

We shared insights ahead of the planning cycle and worked with PM to prioritize the most impactful issues affecting ease of use. From there, we identified 10 top-priority issues, and improvements to the iPad apps are in progress.


Reflection

This study pushed my ability to manage scale, prioritize ruthlessly, and adapt to complexity.

This project was an exercise in balancing research rigor with constraints. Recruitment proved challenging; we ideally wanted a sample of 400, but quickly learned that it's difficult to find commercial iPad users through traditional UX research platforms! Next time, I’d consider tapping market research panels to better reach hard-to-find participants. Additionally, there was a lot of data to sort through; I learned to trust a focused analysis plan and prioritize stakeholder needs to distill the most important stories from a sea of findings.

What went well:

  • Strong alignment with product and design

  • Successful hybrid method execution

  • High stakeholder engagement

What I’d do differently:

  • Explore broader recruitment channels for niche user groups

  • Increase the sample size and apply inferential statistical tests to determine whether observed differences in ease-of-use scores between apps are statistically significant (see the sketch after this list)
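
As a rough illustration of that last point (hypothetical numbers, not data from this study), a per-task comparison between two apps could use a two-sample test, with a non-parametric check since the ratings are ordinal:

```python
from scipy import stats

# Hypothetical per-participant ease-of-use ratings for one task on two apps (1-5 scale).
word_scores = [4, 5, 3, 4, 4, 5, 2, 4, 5, 4]
competitor_scores = [3, 4, 4, 3, 5, 3, 4, 3, 4, 3]

# Welch's t-test: does mean ease of use differ between the two apps on this task?
t_stat, p_value = stats.ttest_ind(word_scores, competitor_scores, equal_var=False)

# Ordinal ratings often warrant a non-parametric check as well.
u_stat, p_mw = stats.mannwhitneyu(word_scores, competitor_scores, alternative="two-sided")

print(f"Welch t-test p={p_value:.3f}, Mann-Whitney U p={p_mw:.3f}")
```

With 20 tasks per app, pairwise comparisons like this would also call for a multiple-comparisons correction.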

