How a Competitive Benchmark Transformed Microsoft’s iPad Office Suite Strategy

As a UX researcher at Microsoft, I led a large-scale benchmarking study to evaluate the ease of use of Microsoft Word, Excel, and PowerPoint on iPad compared to competitors. Over four months, I partnered with product, design, and fellow researchers to produce a quantitative baseline of usability across apps. The results directly influenced product roadmaps and development priorities for the next fiscal year.

Note: Insights and deliverables are confidential, but please reach out if you have any questions.

How do the M365 apps on iPad compare to competitors, and how can we improve our apps to beat the competition?

At the time of this study, iPad usability had become a renewed focus due to its growing use in commercial settings. Our team needed a clear comparison of our iPad apps (Word, Excel, and PowerPoint) to competitors to identify where to invest for maximum impact.

I designed a competitive benchmark to measure usability at scale, gather supporting qualitative insights, uncover UX issues and opportunities, and inform the product roadmap.


I used a mixed-methods approach in a large-scale benchmark, enabling both deep insights and broad comparative data across apps.

I began with quick foundational interviews to understand how commercial workers and executives use Word, Excel, and PowerPoint on iPad. These interviews helped identify common use cases and important tasks in commercial settings, many of which aligned with the tasks we planned to test in the benchmark study.

Once we had a deeper understanding of iPad use, we ran a large-scale benchmark study comparing 10 apps (3 Microsoft, 7 competitors) with 200 participants (20 per app). Each participant was assigned one app and asked to complete a set of tasks using that app, then rate the ease of use of each task. We scoped to the most relevant apps for commercial users and designed tasks grounded in real-world, high-value scenarios. Collaboration with product management ensured our benchmark reflected priority workflows and stakeholder needs.

Here is my rationale for key decisions I made when designing this study.

Methods

  • We recruited commercial iPad users with varying levels of app familiarity and accessory use, and balanced demographics to ensure realistic and representative data.

  • Dscout allowed for remote, flexible task-based research with precise participant targeting and screen recording capabilities.

  • Early in scoping, we decided to start with foundational interviews to deeply understand how commercial iPad users work, then followed with a competitive benchmark to measure usability, balancing depth of insight with the need for comparative data.

  • As a team, we aligned on direct competitors that are widely used in commercial settings and recognized for strong usability, providing a high bar we aimed to meet or exceed.

  • To reduce participant fatigue and maintain data quality while covering the most critical and high-priority workflows, we worked with PM to reduce a list of 30+ tasks down to 20 per app.

Analysis

I paired quantitative and qualitative analysis, triangulating ease-of-use metrics with rich human insights to identify pain points and understand how our apps stack up against competitors.

Quantitative analysis answered: How does the ease of use of our apps compare to competitors?

  • Measured the average ease-of-use score (on a 1-5 scale) per task per app, with confidence intervals (see the analysis sketch after this list)

  • Created color-coded scorecards to show performance relative to competitors

  • Split the ease of use data by accessory use to evaluate how keyboard use affected scores
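
To make the scorecard math concrete, here is an illustrative sketch (not the study's actual analysis code), assuming a long-format export with hypothetical columns app, task, ease_score (1-5), and keyboard:

```python
import pandas as pd
from scipy import stats

# Hypothetical long-format export: one row per participant x task rating.
# Assumed columns: app, task, ease_score (1-5), keyboard (True if an external keyboard was used).
ratings = pd.read_csv("benchmark_ratings.csv")

MICROSOFT_APPS = ["Word", "Excel", "PowerPoint"]

def mean_with_ci(scores, confidence=0.95):
    """Mean ease-of-use score with a t-based confidence interval."""
    n = len(scores)
    mean = scores.mean()
    margin = stats.sem(scores) * stats.t.ppf((1 + confidence) / 2, n - 1)
    return pd.Series({"mean": mean, "ci_low": mean - margin, "ci_high": mean + margin, "n": n})

# Average ease-of-use score per task per app, with 95% confidence intervals.
scorecard = ratings.groupby(["app", "task"])["ease_score"].apply(mean_with_ci).unstack()

# Split the same metric by accessory use to see how keyboard use affects scores.
by_keyboard = ratings.groupby(["app", "task", "keyboard"])["ease_score"].mean().unstack("keyboard")

# Color-code each app/task cell relative to the competitor average for that task.
competitors = ratings[~ratings["app"].isin(MICROSOFT_APPS)]
competitor_avg = competitors.groupby("task")["ease_score"].mean().rename("competitor_avg")
scorecard = scorecard.join(competitor_avg, on="task")
scorecard["flag"] = (scorecard["mean"] - scorecard["competitor_avg"]).apply(
    lambda diff: "green" if diff > 0.25 else ("red" if diff < -0.25 else "yellow")  # illustrative thresholds
)
```

Plotting the confidence intervals alongside the means makes it easy to see which app-to-app differences are within the noise.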

Qualitative analysis answered: Why did apps receive the scores they did?

  • Reviewed user videos from low-performing tasks to identify specific UX issues

  • Analyzed a random sample of videos from high-scoring tasks to confirm success or surface hidden issues

  • Coded themes and pain points within apps

I synthesized ease-of-use scores, participant reflections, and insights from the foundational interviews to explain which apps were perceived as easiest to use on iPad and why, and to identify UX issues and areas of opportunity in our apps.


Findings & Impact

Findings shaped the next fiscal year's priorities and empowered the team with clear, actionable UX insights.

Deliverables included a cross-app issue tracker, detailed scorecards, and a comprehensive report. These directly influenced the product roadmap.

  • Issue Tracker: 50+ UX issues organized by app, issue type, severity, task, and theme; included links to participant video clips and engineering tickets

  • Scorecards: Quantitative ease of use breakdown across apps and tasks

  • Research Report: Key insights, deep dives on lowest-scoring tasks, design recommendations

We shared insights ahead of the planning cycle and worked with PM to prioritize the most impactful issues affecting ease of use. From there, we identified 10 top-priority issues, and improvements to the iPad apps are in progress.


Reflection

This study pushed my ability to manage scale, prioritize ruthlessly, and adapt to complexity.

This project was an exercise in balancing research rigor with constraints. Recruitment proved challenging; we ideally wanted a sample of 400, but quickly learned that it's difficult to find commercial iPad users through traditional UX research platforms! Next time, I’d consider tapping market research panels to better reach hard-to-find participants. Additionally, there was a lot of data to sort through; I learned to trust a focused analysis plan and prioritize stakeholder needs to distill the most important stories from a sea of findings.

What went well:

  • Strong alignment with product and design

  • Successful hybrid method execution

  • High stakeholder engagement

What I’d do differently:

  • Explore broader recruitment channels for niche user groups

  • Increase the sample size and apply inferential statistical tests to determine whether observed differences in ease-of-use scores between apps are statistically significant (see the sketch after this list)
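
As a rough illustration of that last point (hypothetical numbers, not data from this study), a per-task comparison between two apps could use a two-sample test, with a non-parametric check since the ratings are ordinal:

```python
from scipy import stats

# Hypothetical per-participant ease-of-use ratings for one task on two apps (1-5 scale).
word_scores = [4, 5, 3, 4, 4, 5, 2, 4, 5, 4]
competitor_scores = [3, 4, 4, 3, 5, 3, 4, 3, 4, 3]

# Welch's t-test: does mean ease of use differ between the two apps on this task?
t_stat, p_value = stats.ttest_ind(word_scores, competitor_scores, equal_var=False)

# Ordinal ratings often warrant a non-parametric check as well.
u_stat, p_mw = stats.mannwhitneyu(word_scores, competitor_scores, alternative="two-sided")

print(f"Welch t-test p={p_value:.3f}, Mann-Whitney U p={p_mw:.3f}")
```

With 20 tasks per app, pairwise comparisons like this would also call for a multiple-comparisons correction.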

