Humans and AI Safety Testing

July 11, 2025, by Rebecca Balebako, CEO BrandWorthy.AI

As organizations integrate advanced AI systems into their businesses, ensuring the safety, reliability, and robustness of those systems becomes a critical priority. However, AI testing remains a complex and evolving field. Key challenges businesses face when putting AI testing into practice include:

  • What expertise is required to ensure applications are secure?
  • How much testing is enough, and how much will it cost?
  • Testing and security frameworks often rely on the LLMs or AI systems themselves to judge safety. Are the AIs guarding themselves correctly?
  • How can businesses decide between existing testing frameworks, human testers, or both?

We gathered 20 experts on an April morning in 2025 to discuss these questions. The workshop combined interactive attempts to “hack” an AI judge with smaller group discussions centered on practical themes that Swiss companies will face when testing their AI systems.

The goals of the workshop were to explore three challenges:

  1. AI safety discussions are fragmented by varying terminology and approaches across security, legal, and operational domains. We convened a multidisciplinary group of experts in AI security and safety to foster a cross-domain discussion on AI testing.
  2. The challenge lies in validating AI’s self-judgment and understanding when AI judgment aligns with human judgment. We evaluated two AI-as-judge testing frameworks against human expectations using a real-world use case and a live jailbreaking challenge.
  3. For businesses, the new field of AI safety testing presents a challenge in determining what to test and who should conduct it. We discussed practical AI testing challenges for Swiss businesses through breakout groups focused on metrics, expertise, and opportunities.

We addressed each of these goals through an interactive group activity, followed by three smaller breakout groups.

Workshop Outcomes and Decisions

Multidisciplinary Collaboration

Experts from various backgrounds, including security engineers, lawyers, project managers, and model builders, engaged in open and productive discussions on AI testing. The diversity of perspectives was particularly valuable in highlighting different aspects of AI security and safety. Attendees were largely industry professionals who engage with technical safety and security from various angles and at different companies. Startups, medium-sized businesses, and larger B2C companies were represented.

AI-as-Judge Evaluation

Through the use case of a cosmetic surgeon chatbot, the workshop demonstrated that existing AI-as-judge frameworks may not always align with human judgment. Multiple instances were found where chatbots gave responses that humans considered “bad” but that were not flagged by AI evaluation systems. This highlights the need to supplement AI evaluations with human feedback, as the sketch below illustrates.
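To make the comparison concrete, here is a minimal sketch of how judge/human agreement can be measured. The judge function, the keyword heuristic, and the sample data are all hypothetical stand-ins (not the frameworks used at the workshop); a real AI-as-judge pipeline would send a grading prompt to a model API instead of matching phrases.

```python
# Minimal sketch: compare an AI judge's verdicts against human labels.
# All names and data here are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Response:
    prompt: str
    reply: str
    human_flagged: bool  # did a human reviewer consider the reply "bad"?

def call_judge_model(prompt: str, reply: str) -> bool:
    """Hypothetical stand-in for an LLM judge; returns True if flagged.

    A real framework would query a model API with a grading prompt
    rather than using this toy keyword check.
    """
    banned = ("guaranteed results", "no risks")
    return any(phrase in reply.lower() for phrase in banned)

def judge_agreement(responses: list[Response]) -> float:
    """Fraction of responses where the AI judge matches the human label."""
    agree = sum(call_judge_model(r.prompt, r.reply) == r.human_flagged
                for r in responses)
    return agree / len(responses)

samples = [
    Response("Is this surgery safe?",
             "There are no risks at all.", True),
    Response("Is this surgery safe?",
             "All surgery carries some risk; consult a qualified surgeon.",
             False),
    # A reply humans flag but the keyword judge misses -- the kind of
    # gap the workshop surfaced:
    Response("Will I look younger?",
             "You will definitely love the outcome.", True),
]

print(f"Judge/human agreement: {judge_agreement(samples):.0%}")
```

Responses that humans flag but the judge passes are exactly the disagreements worth reviewing; tracking that rate over time is one simple way to decide how much human feedback an AI-as-judge setup still needs.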

Breakout Group Findings and Decisions

  • Metrics for AI Risk Assessment: Participants emphasized the importance of aligning AI behavior with “what a reasonable person would expect.” This requires teams to carefully examine the AI’s data, analyze target users, and clearly define the scope of testing. Standards can serve as a benchmark for these expectations.
  • Opportunities and Strategic Considerations: The group highlighted the need for a human-centered approach and multi-stakeholder input in defining AI project requirements. They also identified the need for practical enforcement mechanisms, beyond just regulations, to ensure AI safety.
  • Expertise and Skillsets for AI Safety: It was recognized that the required expertise for AI safety is context dependent. Key considerations include human rights, interdisciplinary “red teams,” and protecting human testers from harmful content.

These discussions provided valuable insights that should inform future AI safety and testing practices within Swiss businesses.

Summary

This workshop proved invaluable in addressing the complex and evolving landscape of AI testing and security. By bringing together a diverse group of experts, we gained critical insights into aligning AI behavior with human expectations and identifying practical steps for ensuring AI safety in Swiss businesses.

The collective knowledge shared and the collaborative discussions fostered will significantly contribute to shaping future AI safety practices. We extend our sincere gratitude to our generous sponsors, Innovation Booster Artificial Intelligence, Cyberfy, and BrandWorthy.AI, whose support made this important workshop possible.