CaliberAI

Building Trust in AI: Designing for Editorial Integrity

CaliberAI | Design Director | 2020-2022

The Problem Nobody Wanted to Admit

Newsrooms move fast. Deadlines are measured in hours, sometimes minutes. And in that pressure cooker, one word, one careless phrase, can cost millions in legal fees or destroy a reputation.

Editors and legal teams knew this. They lived with the risk every day. But they lacked the tools to catch potentially defamatory or harmful language before publication without creating bottlenecks that would grind journalism to a halt.

CaliberAI wanted to solve this with AI. But here's where it got interesting: the very people we wanted to help (journalists, editors, legal teams) were deeply skeptical of AI making decisions about language, truth, and risk.

My role: I led product design and user research to figure out how we could build AI tools that people would actually trust and use. This meant:

•       Understanding professional resistance to AI among journalists and legal teams

•       Designing transparent interfaces that made AI decisions understandable

•       Partnering with linguistic experts to improve model accuracy and reduce bias

•       Working directly with the CEO and engineers on weekly iteration cycles

The Real Challenge: Trust, Not Technology

CaliberAI had strong technology. The AI could detect linguistic patterns that might indicate defamation, hate speech, or harmful content. But technical capability wasn't the problem.

The problem was trust. And trust had three dimensions:

1. Professional Identity

Journalists saw themselves as guardians of truth and language. The idea of AI 'correcting' their writing felt like an attack on their professional judgment and editorial autonomy.

2. The Black Box Problem

Legal teams needed to understand why something was flagged. 'The AI said so' wasn't good enough when defending editorial decisions or advising on publication risk.

3. Workflow Disruption

Any tool that slowed down the publication process was dead on arrival. We had to fit into existing workflows, not create new ones.

Research: Learning Why AI Tools Fail

I spent weeks talking to editors, legal teams, and PR professionals. Not just about what they needed, but about why they'd rejected previous AI tools.

What I learned:

•       They wanted explanation, not just detection. Why was this flagged? What's the legal risk?

•       They needed control. The ability to set their own risk thresholds based on their organisation's tolerance

•       They valued speed. Any friction would kill adoption

•       They needed audit trails. Legal teams wanted documentation for compliance and defense

"I don't mind being told something might be risky, but I need to know why, and I need to make the final call. This is my editorial judgment on the line."

Senior Editor, National Newspaper

How We Built Trust Into the Design

I knew we couldn't just build a better interface. We needed to fundamentally rethink how AI communicates risk to humans who are experts in their domain.

1. AI Transparency as Core Design

Every flagged phrase needed context (sketched below):

•       Risk level: High, medium, or low, with visual hierarchy

•       Legal explanation: What type of harm this could represent (defamation, hate speech, privacy violation)

•       Confidence score: How certain the model was about this assessment

•       Context awareness: Why this phrase in this context matters

This wasn't just a 'nice to have'; it was the difference between a tool people used and one they ignored.
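To make that concrete, here is a minimal sketch of the kind of structure each flag might carry. The names (FlaggedPhrase, RiskLevel, HarmType) are illustrative assumptions for this write-up, not CaliberAI's actual data model.

```typescript
// Illustrative only: a hypothetical shape for the context attached to each flag.
// None of these names come from CaliberAI's real schema.

type RiskLevel = "high" | "medium" | "low";
type HarmType = "defamation" | "hate_speech" | "privacy_violation";

interface FlaggedPhrase {
  phrase: string;        // the exact span highlighted for review
  riskLevel: RiskLevel;  // drives the visual hierarchy in the editor
  harmType: HarmType;    // what kind of harm this could represent
  explanation: string;   // plain-language reason the model raised this flag
  confidence: number;    // model certainty, 0 to 1, surfaced rather than hidden
  context: string;       // surrounding sentence, because context changes risk
}
```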

2. Human in Command, AI in Support

I reframed the entire product philosophy. Instead of 'AI that catches your mistakes,' it became 'AI that gives you more information to make better decisions.'

This meant:

•       Adjustable sensitivity: Users could set their own risk tolerance

•       Accept/reject patterns: The system learned from editorial decisions

•       Editorial override: Always respected, always documented (see the sketch after this list)
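Here is a rough sketch of how adjustable sensitivity and documented overrides could fit together, reusing the hypothetical FlaggedPhrase and RiskLevel types from the sketch above. The function and field names are assumptions for illustration, not the production implementation.

```typescript
// Hypothetical sketch: filter flags by an organisation's own risk threshold,
// and record every editorial decision so legal teams have an audit trail.
// Assumes the FlaggedPhrase and RiskLevel types sketched earlier.

interface OverrideRecord {
  phrase: string;
  riskLevel: RiskLevel;
  decision: "accepted" | "rejected"; // the editor's call, always final
  editor: string;
  timestamp: string;                 // ISO 8601, for compliance records
}

const RISK_ORDER: Record<RiskLevel, number> = { low: 0, medium: 1, high: 2 };

// Only surface flags at or above the threshold the organisation has chosen.
function visibleFlags(flags: FlaggedPhrase[], threshold: RiskLevel): FlaggedPhrase[] {
  return flags.filter(f => RISK_ORDER[f.riskLevel] >= RISK_ORDER[threshold]);
}

// Document the editor's decision; the system never blocks publication itself.
function recordOverride(
  flag: FlaggedPhrase,
  decision: "accepted" | "rejected",
  editor: string
): OverrideRecord {
  return {
    phrase: flag.phrase,
    riskLevel: flag.riskLevel,
    decision,
    editor,
    timestamp: new Date().toISOString(),
  };
}
```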

3. Language Matters

Small wording changes had huge impact on adoption:

Instead of...                We said...
"Error detected"             "Potential risk identified"
"This is defamatory"         "This phrase may carry defamation risk"
"Flagged content"            "Highlighted for review"

These weren't cosmetic changes. They fundamentally shifted how users perceived the tool, from adversarial to collaborative.

4. Partnering with Linguistic Experts

I worked closely with linguists and legal experts to improve the model's accuracy and reduce bias. My role wasn't technical; it was translational:

•       How do we surface uncertainty without overwhelming users?

•       How do we communicate cultural and linguistic nuance in the interface?

•       How do we let users report false positives in a way that improves the system? (A rough sketch follows.)
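As one illustration of the last question, a false-positive report could capture why an editor disagreed, in categories a linguistics team can act on rather than a simple thumbs-down. This is a hypothetical sketch building on the FlaggedPhrase type above, not how CaliberAI's feedback loop actually worked.

```typescript
// Hypothetical sketch of a false-positive report.
// Categories are illustrative, chosen to be actionable for linguists.

type DisagreementReason =
  | "satire_or_opinion"    // clearly framed as opinion or satire
  | "quoted_speech"        // reporting someone else's words, not asserting them
  | "cultural_context"     // meaning shifts in this dialect or community
  | "factually_supported"  // the claim is documented and defensible
  | "other";

interface FalsePositiveReport {
  flag: FlaggedPhrase;        // the flag the editor is pushing back on
  reason: DisagreementReason;
  note?: string;              // optional free text for nuance the categories miss
  reportedBy: string;
  reportedAt: string;         // ISO 8601
}
```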

Early-Stage Iteration: Weekly Cycles

Working at an early-stage startup meant speed was everything. I collaborated directly with the CEO and engineering team on weekly iteration cycles:

•       Monday: Review user feedback and usage data

•       Tuesday-Wednesday: Design updates and prototypes

•       Thursday: Testing with users or internal team

•       Friday: Ship updates, plan next week

This rapid cadence meant we could respond to real-world feedback fast: adjusting language, tweaking confidence displays, and refining risk hierarchies based on how people actually used the tool.

Impact

The work paid off:

•       Built transparent workflows that improved trust and adoption among legal and editorial teams

•       Reduced professional resistance through explainable AI design and respectful messaging

•       Created interfaces that integrated into existing editorial workflows without adding friction

•       Helped validate product-market fit by addressing core trust barriers

What I Learned

AI Transparency Isn't Optional

When you're asking people to trust AI with high-stakes decisions (legal risk, reputation, truth), transparency isn't a nice feature. It's the foundation of the entire product.

Language Shapes Trust

Small wording changes (from 'error' to 'potential risk', from 'flagged' to 'highlighted') fundamentally changed how users perceived the system. Words matter, especially when you're challenging professional expertise.

Professional Identity Drives Resistance

Journalists and legal professionals weren't resisting AI because they didn't understand technology. They were resisting because AI felt like a threat to their professional judgment and autonomy. Design that respects expertise wins.

Iteration Speed Matters in Early-Stage Startups

Working in weekly cycles with the CEO and engineers meant we could respond to real feedback immediately. This velocity was critical for product-market fit validation.

Reflection

CaliberAI taught me that building AI products isn't just about technology; it's about trust, communication, and respect for human expertise.

The hardest design problems weren't technical. They were human: How do you convince skeptical professionals that AI can support their judgment without replacing it? How do you make black-box models transparent enough to earn trust? How do you integrate sophisticated technology into fast-moving workflows without creating friction?

This project reinforced my belief that the best AI design doesn't hide complexity; it translates it. It gives people the information they need to make better decisions, while respecting their expertise and autonomy.