Synthetic Data and Digital Twins in Marketing Research: Promise, Pitfalls, and Practicality

November 5, 2024

Marketing researchers are increasingly exploring synthetic data and digital twins, driven by pressures such as shrinking respondent pools, rising costs, and privacy concerns. This paper examines the distinctions between synthetic data, particularly synthetic respondents, and digital twins, as well as their potential applications in marketing research.

While synthetic data, synthetic respondents, and digital twins offer promising applications in marketing research, researchers must first assess whether they need synthetic data or digital twins and then consider whether these technologies have reached sufficient maturity to be effective. Like many emerging tools in the marketing research technology space, these solutions are offered by numerous startups, not all of which fulfill a clear customer need or ensure successful implementation.

What Are Synthetic Data and Synthetic Respondents?

Synthetic data is artificially generated through methods like AI (e.g., Generative AI or Natural Language Processing), statistical modeling, or algorithm-based simulation software. Unlike data collected from human respondents, synthetic data is not derived from real human interactions. Instead, it is designed to reflect human patterns and characteristics, making it resemble data that people might generate in real-world scenarios.

Synthetic respondents, or “synths,” are artificially created avatars that can simulate traditional human interactions within a marketing research study. They can serve as proxies for human respondents, enabling data collection and testing when access to a robust respondent pool is limited.
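To make the idea concrete, here is a minimal sketch of one way synthetic survey data could be produced via statistical modeling: fit the answer distribution observed in a small pool of real respondents, then sample synthetic answers from it. The function names and data are hypothetical illustrations, not any vendor's actual method.

```python
import random
from collections import Counter

def fit_marginals(responses):
    """Estimate the observed probability of each answer choice."""
    counts = Counter(responses)
    total = len(responses)
    return {choice: n / total for choice, n in counts.items()}

def sample_synthetic(marginals, n, seed=42):
    """Draw n synthetic answers that mirror the observed distribution."""
    rng = random.Random(seed)
    choices = list(marginals)
    weights = [marginals[c] for c in choices]
    return [rng.choices(choices, weights=weights)[0] for _ in range(n)]

# A small set of real answers to a single-choice survey question.
real = ["Agree"] * 6 + ["Neutral"] * 3 + ["Disagree"] * 1
synthetic = sample_synthetic(fit_marginals(real), n=1000)
```

A sketch this simple only reproduces marginal frequencies; production systems layer on correlations between questions, persona conditioning, and generative-AI text for open-ended items, which is where most of the difficulty lies.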

What Are Digital Twins?

While digital twins could be considered a subset of synthetic data, they are more accurately viewed as a technology that generates synthetic data within a simulated environment. A digital twin is a virtual representation of a real-world object, system, or process. In marketing and marketing research, digital twins focus on representing individuals. For example, a digital twin might emulate a person’s demographics, psychographics, geographic location, and physical characteristics to test how that “twin” might respond to marketing stimuli.
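As an illustrative sketch of the idea above, a digital twin of a consumer can be thought of as a structured profile (demographics, psychographics, location) paired with behavioral rules for reacting to stimuli. The attributes and the response rule below are hypothetical, chosen only to show the shape of such a representation.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """A minimal virtual representation of an individual consumer."""
    age: int
    location: str
    interests: list = field(default_factory=list)
    price_sensitivity: float = 0.5  # 0 = indifferent to price, 1 = highly sensitive

    def respond_to_offer(self, category, discount):
        """Toy rule: a positive response requires interest in the category
        plus a discount large enough for this twin's price sensitivity."""
        interested = category in self.interests
        persuaded = discount >= self.price_sensitivity * 0.3
        return interested and persuaded

# Test how one twin reacts to a marketing stimulus.
twin = DigitalTwin(age=34, location="San Diego",
                   interests=["fitness", "travel"], price_sensitivity=0.8)
result = twin.respond_to_offer("travel", discount=0.25)
```

Real digital-twin platforms replace the hand-written rule with models learned from behavioral data, but the core design is the same: a profile plus a response function that can be queried instead of the person.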

The Promise of Synthetic Data, Synthetic Respondents, and Digital Twins in Marketing Research

Both synthetic respondents and digital twins can enhance marketing research by enabling data collection when a viable respondent pool is unavailable or insufficient. They also hold potential for applications such as market testing, simulating customer journeys, assessing channel effectiveness, and evaluating new concepts.

Synthetic data and respondents lend themselves well to predictive analytics, personalization, and customer segmentation. Digital twins are proving promising for simulating consumer behavior, refining product development, optimizing marketing channels and campaigns, and conducting scenario analysis.

The Deficiencies in Marketing Research Addressed by Synthetic Data and Digital Twins

Aside from being intriguing new technologies, synthetic data and digital twins offer tangible benefits that address specific challenges in marketing research. Here are key reasons for considering their use instead of solely relying on traditional approaches:

Key reasons to consider synthetic respondents include:

  • Compensating for hard-to-reach or fragmented respondent pools
  • Filling in the gaps in datasets to enhance data completeness
  • Ensuring time efficiency
  • Lowering costs
  • Reducing the need for sensitive or personally identifiable information (PII)
  • Achieving scalability

Key reasons to consider digital twins include:

  • Eliminating the need for sensitive or personally identifiable information (PII)
  • Enhancing time and cost efficiency
  • Providing real-time monitoring
  • Relieving consumers of the need to participate in research

A Study Comparing Synthetic Versus Human Respondents

GroupSolver, an AI-driven platform for marketing research, recently conducted a study comparing responses from 310 human respondents with those from 310 synthetic lookalikes. To create the synthetic lookalikes, they developed personas based on the human respondents’ profiles. Following data collection, the study revealed that synthetic respondents: 

  • Performed reasonably well on multiple-choice questions but struggled with questions requiring logical reasoning
  • Gave logical but overly obvious answers to open-ended questions, lacking the nuanced and unexpected responses that human participants provided
  • Produced responses that lacked depth and breadth

While the synthetic panel GroupSolver used may not represent the most advanced synthetic respondents available, the study highlights potential challenges and limitations in current synthetic respondent technology. However, as more data from human respondents becomes available for creating refined synthetic avatars, it is likely that synthetic respondents will more closely emulate human responses over time.

The Debate About Synthetic Respondents and Digital Twins

Within marketing research circles, debate persists around the utility and practicality of synthetic respondents and digital twins. Practically speaking, unless there is a compelling reason to use them, the industry can still gather data effectively from human participants, despite challenges such as declining survey participation and rising rates of bot-driven online survey fraud. Human respondents inherently offer superior reliability, authenticity, and nuance compared with artificial alternatives, and synthetic respondents carry a risk of introducing bias. Because the use of synthetic respondents and digital twins is still in its infancy, current methods often lack the technological sophistication needed for consistent accuracy.

In addition to their benefits, synthetic respondents and digital twins come with important considerations around data quality, ethical transparency, and human oversight. These tools require rigorous validation and continuous improvement to ensure data accuracy, as they lack the full complexity of human emotions and cultural context. Ethical use demands transparency, avoiding over-reliance to prevent potential biases or misrepresentation of insights. Human researchers remain essential for interpreting and aligning synthetic insights with real-world nuances.

At best, synthetic respondents and digital twins should be seen as complementary to human respondents. Their outputs should be carefully scrutinized, and the limitations of the data should be clearly acknowledged and documented.

In Conclusion

As marketing researchers continue to adapt to changing technology and consumer engagement trends, synthetic data, synthetic respondents, and digital twins offer promising avenues for innovation. These tools enable researchers to overcome traditional limitations, such as limited respondent pools, privacy concerns, and cost barriers, while providing new ways to test, simulate, and predict market behaviors. However, as with any emerging technology, their application requires caution, a critical eye, and ongoing evaluation. Synthetic respondents and digital twins hold significant potential but are not yet replacements for the depth and authenticity provided by human respondents.

Ultimately, synthetic data and digital twins are likely to become valuable complements to traditional research methodologies, expanding researchers’ ability to generate insights without compromising quality or integrity. By balancing human insight with artificial simulations, marketing researchers can harness the strengths of both to create richer, more reliable research outcomes that drive meaningful marketing strategies.

Kirsty Nunez is the President and Chief Research Strategist at Q2 Insights, a research and innovation consulting firm with international reach and offices in San Diego. Q2 Insights specializes in many areas of research and predictive analytics, and actively uses AI products to enhance the speed and quality of insights delivery while still leveraging human researcher expertise and experience.