Nielsen Norman Group: The methodological problems hiding in your research tools

Published in March 2026 on Nielsen Norman Group, this article by senior researcher Maria Rosala argues that a structural problem has followed UX research tools since their earliest versions: the people building these platforms often lack deep research expertise. As tools have grown more capable — progressing from simple test hosting in the mid-2000s, to analysis and repository features in the mid-2010s, to AI-driven study planning and moderation today — that gap has become more consequential.

Context

Rosala traces three generations of research tool development. First came unmoderated testing platforms like UserTesting and Userlytics, which made remote research accessible but introduced habits like running usability tests without a moderator. Then came analysis repositories such as Dovetail and EnjoyHQ, which helped teams organize qualitative data at scale. Now, AI-powered systems can recruit participants, generate study materials, moderate sessions, and produce insight reports with minimal human input. The problem in each generation is the same: technical capability outpaces methodological rigor.

Key problems identified

The article identifies three recurring failure modes across widely used tools.

The first involves missing quantitative features. UserTesting’s interaction test, intended for quantitative benchmarking, lacks task randomization and does not support multiple success URLs — both standard requirements for controlled measurement. Without task randomization, order effects can skew results in ways that are invisible in the final report.
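To make the order-effects point concrete, the sketch below shows one common way per-participant task randomization is handled. It is an illustrative example, not something from the article or from UserTesting's product; the task names and function are hypothetical.

```python
import random

# Hypothetical task list for a usability benchmark; names are illustrative only.
TASKS = ["find_pricing_page", "locate_support_contact", "complete_signup"]


def randomized_task_order(participant_id: int, seed: int = 42) -> list[str]:
    """Return a task order for one participant.

    A fixed task order lets practice and fatigue effects accumulate on the
    same tasks for every participant; shuffling independently per participant
    spreads those order effects evenly instead of biasing specific tasks.
    """
    rng = random.Random(f"{seed}-{participant_id}")  # reproducible per participant
    order = TASKS[:]
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    for pid in range(3):
        print(pid, randomized_task_order(pid))
```

When a platform omits this kind of randomization, every participant sees the same sequence, and any per-task metric silently absorbs whatever learning or fatigue built up on the preceding tasks.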

The second concerns analysis tools that cannot work with video directly. Dovetail allows tagging only of transcripts, not of video footage. In usability research, critical moments frequently happen without accompanying speech — a hesitation, a misclick, a return to a previous screen. These moments are lost when tools force analysts to work only with text.

The third is the persistent conflation of user interviews with usability tests. Many platforms describe or label usability tasks as “interviews,” reinforcing a conceptual confusion that produces studies designed for the wrong purpose. Asking open-ended interview-style questions during a task-based test, or vice versa, undermines the validity of both methods.

AI makes the stakes higher

The article gives specific examples of AI-generated tasks that introduce bias. A task created by TheySaid asked participants to “imagine you are interested in improving skills in Information Architecture” — phrasing that primes users toward the site’s own terminology rather than letting them arrive at it naturally. Userology generated a task that directed users explicitly to a “Consulting” section, removing the navigation challenge that the test was presumably meant to evaluate.

When flawed methodology is baked into a template or an AI-generated default, it scales. A single researcher making a methodological mistake affects one study. A tool making the same mistake shapes how hundreds of teams across thousands of studies understand what good research looks like.

Who this is useful for

The article is relevant to anyone who commissions or runs research using commercial platforms — from junior researchers who learned the field through tool documentation, to ResearchOps professionals evaluating vendor claims about AI features. Rosala’s recommendations include vetting AI-generated study materials against independent methodological sources, piloting tools with trained researchers before rolling them out to broader product teams, and treating tool certifications as only one input in researcher training, not a substitute for methodological education.

For research leaders deciding which platforms to adopt or extend with AI capabilities, this piece offers a grounded framework for asking the right questions before the contract is signed.