AI in the Real World: Operational Tools That Actually Move the Needle

Part 3 of 3: The AI Proof Series

Mar 03, 2026

In the first two parts of this series, we explored the LLMs themselves and then the marketing and creative tools built on top of them. Now we turn to the sharp end of the stick: the operational tools that promise to make your business run faster, leaner, and smarter. We’re talking about email assistants, meeting notetakers, workflow automation, AI assistants, and coding agents.

As with everything in this series, we’ve tested these tools with real workloads and real money. Some delivered genuine value from day one. Others were expensive lessons in hype. Here’s what we found.

Email Assistants: The Promise That Hasn’t Landed Yet

The pitch is seductive. An AI that lives in your inbox, triages your messages, drafts replies, and keeps you at inbox zero without lifting a finger. We loved the idea. The reality, unfortunately, hasn’t caught up.

We trialled several of the leading email assistants on the market, and the pattern was consistent: what you’re actually paying for is a glorified email classification tool. They tag your messages with labels and categories, which is mildly useful, but they rarely create anything of genuine value. The drafts they produce tend to miss the tone, misread the intent, or reply to the wrong thread entirely. In any kind of client-facing or sensitive context, you’d never let them send unsupervised, and most of the drafts need rewriting anyway, which rather defeats the purpose.

The one we had the most success with was the Perplexity Email Assistant, available exclusively to Perplexity Max subscribers at $200 per month. It connects to Gmail and Outlook, can draft contextual replies, apply smart labels, and even manage calendar scheduling via email. It does draft some interesting responses that could save time on routine correspondence. But reliability has been a persistent issue. Emails sometimes fail to send, or you’re left uncertain whether they actually went out. We found ourselves restarting the integration more often than felt acceptable for a tool at that price point. The assistant also only works with a single email address at present, which limits its usefulness for anyone juggling multiple accounts.

Our verdict on email assistants as a category: unconvinced. The technology isn’t mature enough to trust with anything beyond the most routine correspondence, and the cost-to-value ratio doesn’t stack up. We’ve parked this category for now and will revisit as the tools improve.

Meeting Notetakers: The Gateway Drug to AI Adoption

If there’s one AI use case that has genuinely earned its place in the modern workplace, it’s the meeting notetaker. When surveys report that 80% of companies are now using AI, much of that adoption sits here, and for good reason. The workflow is simple and the value is immediate: an AI joins your meeting, records the audio, transcribes the conversation, and then generates a structured summary with action items and minutes. It saves time, improves accountability, and ensures nothing falls through the cracks.

We’ve tested several vendors in this space, each with their own strengths and pricing models.

Otter.ai is very capable. Its Pro plan runs at $8.33 per user per month on an annual subscription (or $16.99 monthly), offering 1,200 minutes of transcription and the ability to join meetings on your behalf when you’re double-booked. The Business plan at $20 per user per month bumps that to 6,000 minutes with enhanced team features like speaker tagging and collaborative action items.

Cluely takes a different approach. Rather than just recording and summarising after the fact, it positions itself as a real-time AI assistant that sits alongside your call, surfacing suggestions and talking points as the conversation happens. It reads both the live transcript and whatever’s on your screen. The Pro tier costs $20 per month. It’s an interesting concept, though it doesn’t record audio, which means you lose a layer of verifiability. There are also broader ethical questions about the tool, given its origins in interview assistance, and some companies now treat its use as a policy violation in assessment contexts.

Microsoft Copilot in Teams is worth mentioning for organisations already in the Microsoft ecosystem. It handles transcription and summarisation competently, though it’s bundled into the broader Copilot licensing rather than priced as a standalone tool.

The one we use day in, day out is Fireflies.ai. It records the audio, produces a clean transcription, and generates summaries with action items. The Pro plan is $10 per user per month (annual billing), and for most small teams that’s more than sufficient. The Business plan at $19 per user per month adds video recording, conversation intelligence, and unlimited storage. It’s reliable, it’s well-integrated, and it just works.

That said, we’d sound a note of caution about the long-term economics of this category. Apple Notes on Mac now includes a record-and-transcribe feature. If you already have access to an LLM, there’s nothing stopping you from recording a meeting natively and then throwing the transcript at the model for summarisation and action items. What tools like Fireflies give you is convenience: everything in one package without the multi-step process. But as native OS capabilities and local LLMs improve, the paid notetaker market may face significant commoditisation. For now, though, the value is real and the entry price is low enough to justify.

Workflow Automation: Where AI Starts Earning Its Keep

This is where things get genuinely exciting, and where we’d encourage any business that’s moved beyond basic prompting to focus next.

There’s a natural hierarchy to getting value from AI. It starts with the one-shot prompt: type something into an LLM and get a result. The problem is that one-shot prompts are inconsistent. There’s no structure, no repeatable format, and the quality varies wildly. The first upgrade is to adopt structured prompting, using a clear framework of role, context, task, and format (ideally in XML-style tags). Follow that discipline and your outputs become dramatically more consistent and useful.
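To make the role, context, task and format idea concrete, here is a minimal Python sketch that assembles a prompt from those four parts. The tag names and the example text are our own illustrative choices, not a standard that any particular model requires. The point is simply that every prompt ends up with the same predictable shape.

```python
def build_structured_prompt(role: str, context: str, task: str, output_format: str) -> str:
    """Wrap each part of the prompt in its own XML-style tag.

    The tag names are illustrative; any consistent scheme works.
    """
    sections = {
        "role": role,
        "context": context,
        "task": task,
        "format": output_format,
    }
    # Dicts keep insertion order, so every prompt has the same section order.
    return "\n".join(f"<{tag}>\n{text.strip()}\n</{tag}>" for tag, text in sections.items())


prompt = build_structured_prompt(
    role="You are a research analyst for a small consultancy.",
    context="Notes from a weekly client call with a retail customer.",
    task="List every action item with an owner and a due date.",
    output_format="A table with columns: Action, Owner, Due.",
)
print(prompt)
```

Keeping the sections in a fixed order and format makes outputs easier to compare from run to run. It also makes the prompt easy to reuse later as one step in an automated workflow.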

But the real step change comes when you start chaining structured prompts together into automated workflows. Instead of manually running prompts one at a time, you build a pipeline where the output of one step feeds into the next, with logic gates and filters along the way. This is where AI stops being a novelty and starts becoming operational infrastructure.

There are several platforms competing in this space:

Zapier is the most established, with plans starting on a free tier (100 tasks per month) and a Professional plan from $19.99 per month for 750 tasks. It’s powerful and well-integrated with a vast library of app connections. The unified plan structure now bundles workflows, data storage, forms, and AI tool connections into single subscriptions. Our honest assessment: it’s excellent, but we find the interface more complicated than it needs to be, particularly when building multi-step automations with branching logic.

Gumloop is a newer entrant backed by Y Combinator, offering a visual drag-and-drop interface for building AI-powered workflows. The Solo plan starts at $37 per month for 10,000 credits. Be aware, though, that credit consumption can be unpredictable. Each advanced AI call (using models like GPT-4o or Claude) costs 20 credits, and a single multi-step workflow can burn through over 100 credits in one execution. We found the token-burn pricing model made costs difficult to forecast and expensive for routine tasks.
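Because credits are consumed per AI call, it's worth doing the arithmetic before committing. The sketch below uses the two figures quoted above: 10,000 credits on the Solo plan and 20 credits per advanced AI call. The `other_credits` parameter for non-AI steps is a hypothetical placeholder. Check Gumloop's current rate card before relying on any forecast.

```python
PLAN_CREDITS = 10_000      # Solo plan monthly allowance, as quoted above
CREDITS_PER_AI_CALL = 20   # cost of one advanced AI call, as quoted above


def credits_per_run(ai_calls: int, other_credits: int = 0) -> int:
    """Credits one workflow execution burns.

    other_credits is a hypothetical allowance for non-AI steps.
    """
    return ai_calls * CREDITS_PER_AI_CALL + other_credits


def runs_per_month(ai_calls: int, other_credits: int = 0) -> int:
    """Number of full executions the monthly allowance covers."""
    return PLAN_CREDITS // credits_per_run(ai_calls, other_credits)


for calls in (1, 5, 10):
    print(f"{calls:>2} AI calls -> {credits_per_run(calls):>3} credits/run, "
          f"{runs_per_month(calls):>3} runs/month")
```

At five AI calls per run, the allowance covers about 100 executions a month, which is roughly three a day. A workflow you wanted to run hourly would exhaust the plan long before the month was out.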

Clay is worth mentioning, though it’s more focused on sales intelligence and data enrichment than general-purpose automation. Plans range from free to $800 per month, and it’s extremely powerful for go-to-market workflows. But it’s a specialist tool rather than a broad automation platform.

Our preferred platform, and the one we’d recommend for most businesses, is n8n. It’s open-source, it’s flexible, and its pricing model is the most sensible in the category. The cloud-hosted Starter plan begins at EUR 24 per month for 2,500 executions, with Pro at EUR 60 for 10,000. Crucially, a workflow with 15 steps still counts as a single execution, not 15 billable actions, which is a fundamental difference from Zapier’s per-task billing. And if you’re technically inclined, the self-hosted Community Edition is completely free with unlimited executions.
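The difference between the two billing models is easiest to see with the same 15-step workflow. This sketch uses the plan allowances quoted above. It simplifies by treating every Zapier step as a billable task; in practice, triggers and some built-in steps may not count, so check each vendor's current billing documentation.

```python
def runs_covered(allowance: int, steps_per_run: int, per_step_billing: bool) -> int:
    """How many workflow runs a monthly allowance covers.

    per_step_billing=True  -> every step consumes one unit (simplified per-task model)
    per_step_billing=False -> the whole run consumes one unit (per-execution model)
    """
    units_per_run = steps_per_run if per_step_billing else 1
    return allowance // units_per_run


STEPS = 15
print("Per-execution, 2,500 executions (n8n Starter):",
      runs_covered(2_500, STEPS, per_step_billing=False))
print("Per-task, 750 tasks (Zapier Professional):",
      runs_covered(750, STEPS, per_step_billing=True))
```

Under these assumptions, the n8n Starter allowance covers 2,500 runs of the 15-step workflow. The Zapier Professional allowance covers 50 runs, fewer than two a day.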

We run n8n both on a virtual private server and locally. Our primary use case is sector intelligence: every night around 1am, an automated workflow spins up and scans the previous 24 hours of news sources, blogs, Reddit threads, and industry publications. It pulls in roughly a thousand stories, then passes them through three sequential AI filtering stages, each using a different LLM call to assess relevance. By the time it finishes, it has shortlisted somewhere between 10 and 20 genuinely relevant stories, which land in our inbox by morning. That single workflow replaces hours of manual scanning and reading, and it runs every day without intervention.
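Structurally, the nightly workflow is a funnel: each stage sees only what the previous stage kept. The Python sketch below shows that shape. It is not our actual n8n workflow. The sample stories, and the keyword checks standing in for the three LLM relevance calls, are hypothetical placeholders chosen so the code runs without any API keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Story:
    title: str
    source: str


# Hypothetical stand-ins for the three LLM relevance calls. In a real workflow
# each would send the story to a model and parse a yes/no answer.
def mentions_topic(story: Story) -> bool:
    return "llm" in story.title.lower()


def not_press_release(story: Story) -> bool:
    return story.source != "press-release"


def fits_our_sector(story: Story) -> bool:
    return "retail" in story.title.lower()


def run_funnel(
    stories: list[Story],
    stages: list[tuple[str, Callable[[Story], bool]]],
    max_keep: int = 20,
) -> list[Story]:
    """Pass stories through each stage in order; later stages only see survivors."""
    kept = stories
    for name, is_relevant in stages:
        kept = [s for s in kept if is_relevant(s)]
        print(f"{name}: {len(kept)} stories remain")
    # Cap the final digest so the morning email stays short.
    return kept[:max_keep]


stories = [
    Story("New LLM cuts retail forecasting costs", "blog"),
    Story("LLM vendor issues press statement", "press-release"),
    Story("Football transfer rumours", "reddit"),
    Story("LLM agents reshape retail logistics", "news"),
    Story("LLM benchmark results published", "news"),
]

digest = run_funnel(
    stories,
    [
        ("stage 1 (on topic)", mentions_topic),
        ("stage 2 (not a press release)", not_press_release),
        ("stage 3 (fits our sector)", fits_our_sector),
    ],
)
for story in digest:
    print("-", story.title)
```

Ordering the stages from broadest to most specific means the most detailed, and usually most expensive, check runs on the smallest number of stories.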

These kinds of automations are absolute game changers. If you’re comfortable with AI prompting and looking for the next step, workflow automation is where you should be heading. Find the tasks in your business that grind (the repetitive, low-value work that keeps people from higher-value thinking) and automate them. The tools do break occasionally, but the return on investment is substantial.

AI Assistants: The Future, But Not Quite the Present

The concept of a persistent AI assistant, one that knows your preferences, manages your calendar, handles your admin, and operates across your messaging platforms, is arguably the most compelling long-term vision in the entire AI space. And in early 2026, one project brought that vision closer to reality than anything else: OpenClaw.

Originally published in November 2025 by Austrian developer Peter Steinberger under the name Clawdbot, the project was renamed to Moltbot in January 2026 following trademark complaints from Anthropic, and then to OpenClaw three days later. Despite the naming turbulence, the project exploded in popularity, amassing 140,000 stars and 20,000 forks on GitHub by early February, with a community registry (ClawHub) hosting over 5,700 skills.

The concept is powerful. OpenClaw is a self-hosted AI assistant that connects to the messaging platforms you already use: WhatsApp, Telegram, Slack, Discord, Teams, even iMessage. Unlike a ChatGPT or Claude conversation that starts fresh each time, OpenClaw maintains continuous memory across all interactions. It remembers your preferences, past conversations, and contextual details. It can manage emails, browse the web, and automate tasks autonomously.

We built our own OpenClaw agent, which we named Arthur, and it worked impressively well for what it is. But we’ve mothballed Arthur for now, and here’s why.

First, the API costs add up quickly. Every interaction routes through an LLM API, and for an assistant that’s meant to be always-on and responsive, the token consumption becomes significant. Second, when we looked honestly at the use cases people are building with it, most of them can be achieved more simply and cheaply with a structured prompt or a scheduled automation. We get a daily briefing every morning from an LLM covering the weather, key business news, what happened with Arsenal, the latest from Formula One, and overnight AI developments. That doesn’t need to route through an API-powered agent; it’s a straightforward automation.

The technology itself is still firmly in beta. There are bugs, reliability issues, and the kind of rough edges you’d expect from a project that’s been around for barely three months. On February 14th, Steinberger announced he was joining OpenAI and that the project would transfer to an open-source foundation, which should help with long-term governance but also introduces uncertainty about direction.

We think AI assistants are unquestionably the future. The idea of an AI employee that you can task, message, and delegate to is extraordinarily powerful. But right now, adopting one means accepting early-adopter pain for limited practical gain over simpler alternatives. It’s a bit like the MP3 player era before the iPhone arrived: the technology works, but something transformative is coming that will make the current generation look primitive. When a major LLM provider fully backs this paradigm, perhaps OpenAI building on Steinberger’s work, that’s when it will become a must-have. For now, it’s one to watch closely rather than one to bet your operations on.

Coding Agents: The Truly Agentic Frontier

If the tools above represent the current state of AI in operations, coding agents represent where the entire field is heading. This is AI at its most autonomous: you describe what you want, the agent plans the work, writes the code, tests it, and delivers the result. It’s not process automation in the way n8n workflows are. It’s genuinely agentic behaviour, where the AI makes decisions, spins up sub-agents, and course-corrects independently.

The major players have all entered this arena. Google launched Antigravity, an AI-first IDE built on a modified fork of Visual Studio Code. It was reportedly shaped by the Windsurf team that Google brought in through a $2.4 billion licensing and hiring deal. It’s currently in free public preview, powered by Gemini 3 models, with a tiered pricing model expected later in 2026. Its standout feature is the Manager View, where a developer can dispatch multiple agents to work on different tasks simultaneously. OpenAI offers Codex, available to ChatGPT Plus subscribers at $20 per month with 30 to 150 local tasks every five hours; the Pro tier at $200 per month unlocks significantly higher throughput. Grok from xAI and most of the major Chinese models also offer coding capabilities.

But the tool that consistently generates the most discussion and the most enthusiasm from developers is Claude Code from Anthropic. It’s available with a Claude Pro subscription at $20 per month, or the Max plan at $100-$200 per month for heavier usage. Claude Code operates in the terminal and takes a genuinely agentic approach: you provide a product requirement specification, it plans the implementation, works through the task list, tests along the way, and spins up different agents as needed. For anyone who takes the time to write a proper spec upfront, the results are remarkable. Claude models are also available through the API on a pay-per-token basis. Developers who want to build similar agents into their own toolchains can use the Claude Agent SDK.

We’re not a coding house by trade, but we’ve run several projects through these tools and the output has been genuinely impressive. When you invest the time in creating a proper product requirement specification, which AI itself can help you shape, the agents can autonomously work through the build, checking off requirements and running tests as they go.

For those who want a more visual, less terminal-driven approach, platforms like Lovable offer a compelling alternative. Lovable lets you describe what you want in natural language and generates full-stack web applications through a chat interface. The Pro plan starts at $21 per month (billed annually) for 100 credits, making it extraordinarily accessible. We’ve used it to build several internal tools: a mileage tracker, a subscription renewal manager (to stop those annoying auto-renewals catching us out), and other lightweight utilities. For around $250 a year, you can prototype and build bespoke internal tools that would cost thousands to commission from a developer.

A few caveats are worth noting. Context window management matters: the longer a chat session runs on these platforms, the more the model loses track of earlier context, and you end up spending more tokens debugging than building. Every developer we’ve spoken to is clear that vibe-coded applications are prototypes, not production systems. The security, scalability, and code quality considerations that matter in production software are not yet handled reliably by these tools. That will change, but for now, treat the output as internal tooling and proof-of-concept work rather than client-facing product.
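One reason long sessions get expensive is that chat-style interfaces typically resend the conversation history with each new request. That means input tokens grow much faster than the conversation itself. The sketch below assumes that every turn adds the same number of tokens and that the full history is resent each time. Real platforms may truncate, summarise, or cache context, and the 2,000-tokens-per-turn figure is purely illustrative.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens if turn k resends all k turns of history so far.

    Simplified model: fixed tokens per turn, no truncation or caching.
    """
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


for turns in (10, 20, 40):
    print(f"{turns} turns -> {cumulative_input_tokens(turns, 2_000):,} input tokens")
```

Under these assumptions, doubling the length of a session roughly quadruples the input tokens consumed: 110,000 for 10 turns versus 420,000 for 20. That is a practical argument for starting a fresh session with a concise summary rather than pushing one chat indefinitely.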

The broader point, though, is transformative. If you have an internal need, say you want a CRM system tailored exactly to your workflow and nothing on the market quite fits, you can now build one yourself for the cost of a single software subscription. For internal tools and prototypes, the economics are extraordinary.

We expect our own workflow to evolve from platforms like Lovable towards the more powerful agentic coding tools (Claude Code, Codex, Antigravity) as those mature and as our requirements grow more sophisticated.

The Bottom Line

If you take nothing else from this piece, here’s our operational AI hierarchy:

Meeting notetakers are the clear starting point. Regardless of which vendor you choose, they all add value, they’re all competitively priced, and they represent the lowest-risk, highest-reward entry into AI for most businesses. Just pick one and start using it.

Email assistants are a long way off. Right now, they’re expensive email categorisers that can’t be trusted with tone, context, or autonomous sending. We wouldn’t recommend investing here until the technology matures significantly.

Workflow automation is where the real operational value lives. Once you’re comfortable with structured prompting, building automated workflows is the natural next step. Find the repetitive, mind-numbing tasks in your business and automate them. n8n is our tool of choice, but Zapier and others are perfectly capable alternatives. This is where every business should be heading.

AI assistants are the future, but adopting them today means signing up for the early-adopter experience: bugs, high API costs, and capabilities that can largely be replicated with simpler tools. OpenClaw has shown what’s possible, and when a major provider fully productises this paradigm, it will be transformative. For now, keep watching.

Coding agents are game changers, full stop. Whether you use a visual platform like Lovable for quick prototypes or a full agentic tool like Claude Code for more serious development, the ability to turn a written specification into working software is here today. Be clear on your requirements upfront, manage your context windows carefully, and treat the output as internal tooling rather than production code, and you’ll get extraordinary value.

The operational AI landscape is moving fast. Some of these categories will look completely different in twelve months. But the tools that work today (notetakers, automation workflows, and coding agents) work well enough to justify adoption right now. The rest is coming. And when it arrives, the businesses that have already built the muscle memory of working with AI will be the ones best positioned to capitalise.

Next week, we’ll be back with a deeper dive into agents and AI. Stay tuned.
