Since late 2023, when the buzz around generative AI's promise reached its initial peak, a fascinating debate has been taking place within the insight industry – one that has worried just as many researchers as it has enticed. It is, of course, about the viability and implications of synthetic data.
Synthetic data is, in short, computer-generated (rather than human-generated) information that was initially used to train AI models but is now seeping into real-world applications. Unlike its human-generated counterpart, synthetic data can be produced on demand, at near-unlimited scale and to precise specifications.
The implications for market research almost speak for themselves. And fear is a natural reaction, especially as theoretical applications creep closer to common practice. As just one example, by early 2024 Wharton scholars were able to build a perceptual map of the US automotive industry with astonishing accuracy when compared against a simultaneous survey of 530 consumers.
As many marketers are aware, perceptual maps are both important tools and notoriously difficult to build. Few besides brand managers working with household names have had the luxury of building one. The effects of this application alone have the potential to be transformative. Instantaneous access to accurate market analyses levels the playing field between established brands and challengers – those with big budgets and those without. Well, assuming synthetic data isn’t gated by prohibitively expensive commercial models.
The synthetic data promise of on-demand access to accurate insight is an alluring one – leveling the playing field between established brands and challengers.
At this point, it's tempting to get sucked into a world of tech-driven hyper-speculation. But despite how clear it might seem, the future remains unpredictable. So instead, I want to offer a few creative ways researchers and marketers can engage with synthetic datasets today – before exploring what the frontiers of this AI revolution mean for the profession.
Applications of Synthetic Data in 2024
As a marketer, the possibilities for how synthetic data could change the profession feel almost endless. But before we examine some of the most interesting use cases, let’s pause to remember there is a limit. Synthetic data may be able to replicate aggregates extremely well – but it is no replacement for real experiences. Which means your CSAT, customer experience and voice of customer programmes (among others) should remain firmly backed by real-world data.
With that caveat in mind, let’s take a look at three ways you can enhance existing research programmes with synthetic data today:
- Interactive Personas: A key marketing asset that tends to remain static for years at a time. The reluctance to update personas stems both from the fact that they change slowly and from adoption being a glacial process. But this static nature is a key weakness – offering little opportunity to re-communicate them across the business. Training AI models on existing personas provides the opportunity to synthesise data on how a persona might react to current events, key issues and potential actions in real time (see the first sketch after this list).
- Accessible Trendspotting: Identified as a key driver of insight team success by Debra Walmsley on a recent MRX Lab episode, trendspotting requires huge volumes of data and predictive analysis. By automatically synthesising online content and surfacing it through anonymised summaries, synthetic data can make trendspotting programmes accessible to many brands.
- Exploratory Perceptual Maps: Okay, so maybe this point is cheating given the attention synthetic perceptual maps have received since Mark Ritson's Marketing Week coverage. But to tentatively build on the proven possibilities: by reducing research lead time to near zero, synthetic data may make it possible to treat perceptual mapping as an exploratory exercise. It has previously been crucial to narrow down axis labels and dimensions prior to fielding (because you typically only get one shot at such big-budget, time-sensitive projects). But with the capacity to explore hundreds of dimensions and maps in seconds, it's possible to be far more experimental and explore values that might otherwise have been discarded (see the second sketch below).
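To make the interactive-persona idea a little more concrete, here is a minimal sketch of how a persona brief might be turned into something you can question on demand. It assumes the OpenAI Python client; the persona description, model name and question are all invented placeholders, and the same pattern works with any LLM provider.

```python
# A minimal sketch of an 'interactive persona' (assumes the OpenAI Python
# client is installed and OPENAI_API_KEY is set; the model name, persona
# brief and question below are placeholders for illustration).
from openai import OpenAI

client = OpenAI()

PERSONA_BRIEF = """You are 'Budget-Conscious Beth', a 34-year-old urban renter.
You research every purchase heavily, distrust premium pricing, and only value
sustainability claims when they don't cost you more."""

def ask_persona(question: str) -> str:
    """Return the persona's simulated reaction to an event, issue or proposed action."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whichever model you have access to
        messages=[
            {"role": "system", "content": PERSONA_BRIEF},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_persona("How would you react to a 10% price rise on your usual grocery brand?"))
```

The output is synthetic, of course – useful for rehearsing scenarios and re-communicating the persona across the business, not for replacing real customer feedback.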
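And for the exploratory mapping idea, here is a sketch of the core mechanic, assuming you already have a table of synthetic brand-by-attribute ratings to hand. The brands, attributes and scores below are invented, and PCA is used as one common way of projecting them onto a two-dimensional map.

```python
# A minimal sketch: turning a (synthetic) brand-by-attribute ratings table
# into a two-dimensional perceptual map with PCA. All figures are invented.
import pandas as pd
from sklearn.decomposition import PCA

ratings = pd.DataFrame(
    {
        "performance": [8.1, 6.4, 7.2, 5.0],
        "value":       [4.5, 7.8, 6.1, 8.6],
        "safety":      [7.9, 6.9, 8.3, 5.8],
        "styling":     [8.4, 5.9, 6.5, 6.2],
    },
    index=["Brand A", "Brand B", "Brand C", "Brand D"],
)

# Centre each attribute, then project the brands onto the two main dimensions.
pca = PCA(n_components=2)
coords = pca.fit_transform(ratings - ratings.mean())

perceptual_map = pd.DataFrame(coords, index=ratings.index, columns=["dim_1", "dim_2"])
print(perceptual_map)
print("Variance explained:", pca.explained_variance_ratio_)
```

Because the synthetic ratings are cheap to regenerate, the whole loop – swap attributes, re-map, inspect – runs in seconds, which is what makes the exploratory treatment plausible.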
In all of these examples, we're not just witnessing AI enable faster and cheaper insight; we're seeing significant shifts in the relationship businesses have with the research activities they engage in. Which leads to the final question we'll explore today…
What is a Market Researcher in the Age of Synthetic Data?
Of course, it’s important to first state that, no – synthetic data will not be the end of market research and insight as a discipline. Just as the spreadsheet did not cause the death of accounting, AI will not be the end of research. It is, ultimately, a tool to be used by people to achieve the goals we set. What it should do, however, is prompt a healthy debate about what the role of a researcher is.
It's easy to boil down the profession to a collection of process-oriented skillsets. Data collection. Survey writing. Report writing. These are all things that can be automated. So, let’s take a step back and ask more plainly – what is the purpose of a research function?
There are two important answers to this. The first is simply: to ask questions. The most successful research teams aren't those that are simply conveyor belts for brand trackers and service satisfaction. No, they have outgrown the role of order taker and become strategic advisors to their stakeholders. This means knowing which questions are commercially important to the business, building relationships and practices that cement insight as a decision-making tool, and blending data with strategic, contextual recommendations. These are innately human aspects of the profession that do not directly suffer from automation.
Synthetic data represents a tool which researchers can use to automate the operational process of research, empowering a faster transition from order takers to strategic advisors.
The second is to be a source of competitive advantage. We need to remember that business doesn’t happen in a bubble. As tools like generative AI and synthetic data become available to us – they become equally accessible to our competitors. And in a world of perfect information, there would be no competition. But that’s not the world we live in.
Competitive advantage is won by those able to ask the right questions of the right people, and to use the answers to inform the right decisions better than others do. More advanced tools that speed up, or even trivialise, the operational process of research make little difference to that fundamental truth.
A View from the Experts
In compiling this article, I reached out to two prominent thinkers on AI in market research. Mike Stevens is the Founder & Editor of Insight Platforms, and organiser of The AI Summit for Research, Insights and Experience Management. He believes the synthetic data revolution can be understood across four categories of research – with the majority of present-day research questions falling into categories one and two, though category three is also growing fast. They are:
- Category 1 - Synthetic as a complement to primary data: a useful early stage for hypothesis building or interim validation of qual, etc. The blend of synthetic / primary will change for different briefs and evolve over time as LLMs improve the relevance / recency of training data.
- Category 2 - Synthetic as a substitute for primary data. Some research questions have been asked so many times that the answers are clear and primary data is unnecessary. Even for new, worthwhile questions, synthetic will give as valid an answer as primary.
- Category 3 - Synthetic as a data source where primary research was never even on the table, for reasons of business budget, time or ignorance.
- Category 4 - Synthetic data as an inferior alternative to primary data that should be avoided: for example, getting deep qualitative insights into users' implicit attitudes, motivations and behaviours, or finding golden nuggets of insight to spark innovation, campaign ideas and more.
Phil Sutcliffe, Managing Partner of Nexxt Intelligence, highlights how synthetic data can help with many of the operational challenges that researchers face today – such as ensuring sample quality or augmenting primary data. He posed the following as an example of this in practice: "If you consider a sample with 50 men and 150 women, weighting it to be representative with current approaches means doubling the responses from each of the 50 men. However, an approach where synthetic data simulates responses from an additional 50 ‘men’ could well be more reliable."
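To make the arithmetic of that example concrete, here is a minimal sketch comparing the two approaches, assuming a target of 100 men alongside the 150 women and a simple numeric survey score. The figures, and the stand-in generator for the synthetic responses, are placeholders – not a description of Nexxt Intelligence's approach.

```python
# A minimal sketch of the two approaches in the example above: weighting 50
# male respondents up to an effective 100, versus augmenting the sample with
# 50 synthetic 'male' responses. All figures here are invented placeholders.
import numpy as np

rng = np.random.default_rng(seed=1)

men = rng.normal(loc=6.0, scale=1.5, size=50)     # observed scores from 50 men
women = rng.normal(loc=7.0, scale=1.5, size=150)  # observed scores from 150 women

# Approach 1: classic weighting -- each man counts twice (50 -> 100 effective).
weights = np.concatenate([np.full(50, 2.0), np.full(150, 1.0)])
weighted_mean = np.average(np.concatenate([men, women]), weights=weights)

# Approach 2: synthetic augmentation -- a generative model simulates 50 extra
# 'men'. A simple draw from the observed male distribution stands in for that
# model here, purely for illustration.
synthetic_men = rng.normal(loc=men.mean(), scale=men.std(ddof=1), size=50)
augmented_mean = np.concatenate([men, synthetic_men, women]).mean()

print(f"Weighted estimate:  {weighted_mean:.2f}")
print(f"Augmented estimate: {augmented_mean:.2f}")
```

The point of Phil's comparison is that doubling weights also amplifies the noise in those 50 real answers, whereas a well-built generative model may draw on broader signal – which is why the synthetic route "could well be more reliable".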
Preparing for the Future of Research
So, finally, how should insight teams prepare for a future in which the use of generative AI and synthetic data is not just likely, but inevitable? I believe the core change lies in embracing systems stewardship. Put simply, this means acknowledging that AI and synthetic data will accelerate the democratisation of research, enabling just about anyone in your organisation to do their own research.
As an insight function, that means your team won’t just be responsible for producing insight – but shaping the systems, governance and decision-making cultures that ensure it remains a competitive advantage, not a liability.