An industry-led initiative backed by major advertising associations and Springboards is launching what it describes as the world's first benchmark to evaluate large language models (LLMs) for creativity in advertising.
The benchmark, involving organisations such as ACA, 4A's, APG, D&AD, IAA, IPA and The One Club for Creativity, is designed to identify which LLMs provide the most valuable support for creative inspiration, ideation, variation and problem-solving throughout the advertising process.
The project comes amid growing adoption of AI tools by strategists and creatives, and addresses a recognised need for industry-specific metrics of creative performance. The initiative aims to distinguish models that offer genuine creative contribution rather than merely accurate or logical responses, a point several stakeholders in the project emphasised.
"Existing AI benchmarks test logic, accuracy, and comprehension," said Pip Bingemann, CEO and Co-founder at Springboards. "But advertising isn't about right answers, it's about originality, insight, and impact. This will be the first benchmark designed around the real creative instincts we value in agencies and brands so that people in creative industries can understand what models are good for the work they do."
The benchmark was developed by a technical and research team that includes PhD specialists in machine learning, AI engineers and former Google researchers. The team has sought to combine quantitative data analysis with qualitative human judgement to reflect the nuanced and subjective nature of creativity in commercial communications.
Creative instincts at the core
Unlike traditional AI assessments, which focus on accuracy and correctness, the new evaluation centres on criteria directly relevant to creative output. These include human judgement of idea quality; the degree of creativity shown, both in insight and in more unconventional 'wild' ideas; the variance or originality of output; and problem-solving tasks that test thinking beyond conventional confines. LLM outputs are assessed both by human judges and by AI, to gauge how closely model outputs align with genuine human taste.
The benchmarking process will also be interactive. Participants around the world will review and judge AI-generated ideas through an interface described as "Tinder for Ideas", and in return receive personal feedback on their own creative preferences along with recommendations for the LLMs best suited to their style.
Industry leaders on the benchmark
Industry executives from supporting organisations have underlined the importance of the benchmark, particularly as agencies worldwide look to integrate AI in more meaningful and effective ways within their creative workflows.
"This benchmark is an exciting moment for our industry," said Tony Hale, CEO, Advertising Council Australia. "Harnessing the potential of LLMs to complement and elevate creative thinking is critical. We're proud to help lead this industry-first initiative, one that's not just pushing boundaries, but shaping the future of creativity itself."
Tom Roach, Vice President Brand Strategy at Jellyfish/Brandtech, speaking on behalf of the APG, highlighted the current gap in resources for the creative sector:
"There's a ton of stuff about how good different LLMs are at solving maths problems, coding and logic-based tasks and tests. But that's not much use for the creative industries. We need to know which models are best at generating new and original creative ideas, but at the moment we're all flying blind on that. It's exciting to be involved in a project that can help us be more creative with this incredible new technology and help the industry as a whole make better informed decisions on the AI tech we use to drive better results for our clients."
Zoe Scaman, Founder of Bodacious, addressed the benchmarking project's potential to clarify how AI contributes to creative processes:
"We're at a crossroads in the creative industries. AI is already embedded in our workflows, shaping ideas and generating outputs, but we've had no meaningful way to assess its contribution to creativity itself. This benchmark changes that. It's not about pitting machines against humans, but about understanding which tools can genuinely elevate our thinking, unlock new directions, and push creative boundaries. If we want to harness AI not just for efficiency but for originality, we need standards rooted in real creative instincts, not just logic and grammar. That's what this initiative delivers. And that's why it's vital, especially now."
James Hurman, Founding Partner of Previously Unavailable, also commented on the value of AI for stimulating human creativity:
"The future of creativity is human – but AI's ability to stimulate human creativity is exciting. Especially when we can learn which models produce the kind of interesting, divergent, and sometimes bonkers provocations that trigger our own individual creative brains in just the right way."
Participant involvement
The study is currently open to participants, who are invited to judge the outputs from different LLMs and contribute to the development of the benchmark. The results are intended not only to inform best practice within the advertising and marketing sectors, but also to provide practical insights for individuals seeking to enhance creative output using AI tools.