Ways to Harness AI

The doomers say AI will destroy us. The optimists say it will save us. Both are wrong in the same way: they skip the part where someone has to actually make it work reliably. That is what this post is about.
I am writing this blog post in a terminal. No editor, no CMS dashboard. I type my thoughts into Claude Code, and a custom skill I built fixes my spelling, structures the content, and publishes it directly to NodeHive, my CMS. This is what harnessing AI looks like in practice. But the strategies behind it go far beyond blogging.
The Origin
OpenAI published a post called Harness Engineering, and it went somewhat viral. In it, they explain how they went from an empty git repository to a full-blown application in five months, generating roughly one million lines of code. Along the way, the engineering team's primary job became designing environments, specifying intent, and building feedback loops that allow AI agents to do reliable work.
Martin Fowler picked up the concept and described it as the tooling and practices used to keep AI agents in check while maintaining large applications.
In short, harnessing AI means keeping the beast on track, where the beast is the AI model. There are several strategies for doing that. And while the original article focuses on software development, I believe these strategies apply far beyond code.
I grouped my strategies into four categories: constraining the AI, validating its output, building knowledge over time, and evolving the system.
Part 1: Constraining the AI
Context
The most fundamental strategy is giving the AI more context. An AI model without context is like a new employee on their first day with no onboarding. They might be brilliant, but they have no idea how things work around here.
In software, this means providing documentation, coding conventions, architectural decisions, and background knowledge directly in the repository. OpenAI treated a brief AGENTS.md file as a table of contents, with the actual knowledge base living in a structured docs/ directory. Instead of one giant instruction file, they built a layered system of context that agents could navigate.
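Purely as an illustration, such a layered structure might look like the sketch below. The names are mine, not OpenAI's actual layout:

```
AGENTS.md          # short entry point: what exists and where to look
docs/
  architecture.md  # system boundaries and major components
  conventions.md   # coding style, naming, error handling
  decisions/       # one short file per architectural decision
  domain/          # background knowledge about the problem space
```

The agent reads the small entry point first and pulls in the deeper documents only when a task calls for them.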
But context goes far beyond documentation. In the OpenAI post, they also talk about giving the AI access to Google Docs where discussions about the feature set are written, or access to Slack where engineers, designers, and product people exchange ideas. This is organisational knowledge that usually lives in people's heads or sits scattered across tools. I can see a future where the AI has full access to all the thinking happening around an organisation. Not just the code, not just the docs, but the conversations, the decisions, the debates. The more of that context you can feed into the system, the better the AI understands not just what to build, but why.
Beyond software, this principle holds everywhere. If you ask an AI to write a marketing email, giving it your brand guidelines, tone of voice, and examples of past emails will produce dramatically better results than a cold prompt. Context is the single most important lever you have.
Permission Boundaries
Not everything should be automated. Permission boundaries define what the AI is allowed to do and, more importantly, what it is not allowed to do.
In Claude Code, for example, you can configure which tools the agent may use and which actions require explicit approval. The AI can read files freely but needs your permission before executing shell commands or modifying code. This is a practical implementation of a permission boundary.
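In Claude Code, these rules live in a settings file as allow, ask, and deny lists. The sketch below is illustrative rather than authoritative, so check the current documentation for the exact syntax:

```json
{
  "permissions": {
    "allow": ["Read", "Grep", "Bash(git diff:*)"],
    "ask": ["Edit", "Bash(git push:*)"],
    "deny": ["Read(./.env)", "Bash(rm:*)"]
  }
}
```

The point is not the syntax but the explicitness: the boundary is written down before the agent runs, not negotiated afterwards.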
In a broader sense, permission boundaries are about trust levels. You might let AI draft emails but require human approval before sending. You might let it generate reports but not publish them. The key is to be intentional about where the boundary sits rather than discovering it after something goes wrong.
Skills with Deterministic Software
One of the most powerful patterns is combining AI with deterministic tools and CLIs. Instead of letting the AI figure everything out from scratch, you give it access to reliable, tested software that handles the predictable parts.
The blog post you are reading right now is a living example. I built a skill in Claude Code that knows how to interact with the NodeHive CLI. I write the content, and the AI fixes my spelling and helps structure it based on the skill's instructions. The actual uploading, authentication, and content structuring happens through a deterministic CLI tool. The AI does not guess how to authenticate with my CMS. It follows a defined process.
OpenAI did the same thing at scale. They enforced architectural rules through custom linters and structural tests, with error messages designed to inject remediation instructions back into the agent's context. The deterministic tools act as guardrails, keeping the AI on track.
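Here is a minimal sketch of such a guardrail. The architectural rule itself (UI code must not import the database layer) is made up for the example, and the error message is written for the agent, not just for a human:

```typescript
// check-imports.ts - a tiny structural guardrail (illustrative rule, not OpenAI's linter)
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const violations: string[] = [];

// Node 18.17+: recursive readdir lists every file under src/ui.
for (const entry of readdirSync("src/ui", { recursive: true })) {
  const file = join("src/ui", entry.toString());
  if (!file.endsWith(".ts")) continue;
  if (/from ["'][^"']*\/db\//.test(readFileSync(file, "utf8"))) {
    violations.push(file);
  }
}

if (violations.length > 0) {
  // The error message doubles as remediation instructions for the agent.
  console.error(
    [
      "Architectural rule violated: UI files import the database layer directly.",
      ...violations.map((f) => `  - ${f}`),
      "Fix: go through the functions in src/services/ instead, keep all",
      "database access behind that boundary, then re-run this check.",
    ].join("\n"),
  );
  process.exit(1);
}
```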
Established Patterns
If the AI needs to work with well-established patterns, it is much easier to harness. The pattern itself becomes the harness.
A good example is using Drizzle ORM to build your database structure. An ORM is a very well-defined pattern, and Drizzle offers a clearly defined way to describe and migrate data structures. We use Drizzle on all our projects, and I fully trust the AI to work with it. The pattern is so well-documented and consistent that the AI can operate within it reliably.
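For illustration, a minimal Drizzle schema for a Postgres table might look like this; the table and columns are made up for the example:

```typescript
// schema.ts - a minimal Drizzle schema (illustrative table, Postgres dialect)
import { pgTable, serial, text, timestamp, boolean } from "drizzle-orm/pg-core";

export const posts = pgTable("posts", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  body: text("body").notNull(),
  published: boolean("published").default(false).notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});
```

Because the pattern is this rigid, there is little room to improvise: either the schema compiles and migrates cleanly, or it does not.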
The same applies beyond software. Any well-established workflow, template, or standard operating procedure constrains the AI in a productive way. The more standardized the pattern, the less room there is for the AI to drift off course.
Regulations
This one is interesting and goes beyond individual workflows. It might be necessary to develop global regulations for how much we are allowed to do with AI.
While regulations can slow down innovation, we still need to agree on clear rules. Consider the car industry: many standards evolved over time and are now enforced globally. The annual inspection you need to keep your licence plate, the seat belt, vehicle dimensions, the standardization of tyres so they are interchangeable. All of this happened as a global effort to standardize and regulate cars, and it dramatically increased road safety.
There are still hundreds of car manufacturers. Instead of competing on arbitrary metrics, they now compete within clearly defined safety standards. Innovation did not stop. It was channelled.
The same could happen in AI. Imagine standardized safety benchmarks, mandatory transparency reports, or required human oversight for certain categories of decisions. Not to kill innovation, but to ensure it moves in a direction that benefits everyone.
Part 2: Validating the Output
Quality Gates
Quality gates are automated checkpoints that verify the AI's output meets defined standards before it moves forward. Think of them as automated code review, but for everything.
In software, these are tests, linters, type checkers, and CI pipelines. If the AI writes code that breaks a test, it gets feedback and tries again. OpenAI ran agents periodically to find inconsistencies in documentation or violations of architectural constraints.
Here is a real example: this blog post. The AI helped me draft it, structure it, and publish it to my CMS. But that was not the end. I then spent significant time in the CMS preview, reading through the article, improving the flow, sharpening arguments, making it more engaging, again with the support of AI. The first draft was a starting point, not the finish line. The quality gate here was me, looking at the actual rendered output and asking: "Would I want to read this?"
Outside software, quality gates could be automated checks on generated content: Does it match the brand voice? Is it within the approved word count? Does it contain any prohibited terms? The principle is the same: automated verification before the output reaches the next stage.
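A minimal sketch of such a check could be as simple as the function below. The word limit and prohibited terms are made up for the example:

```typescript
// content-gate.ts - an illustrative quality gate for generated content
// (the limit and the prohibited terms are made up for this example)
const MAX_WORDS = 800;
const PROHIBITED_TERMS = ["guaranteed results", "risk-free"];

export function checkContent(text: string): string[] {
  const problems: string[] = [];

  const wordCount = text.trim().split(/\s+/).length;
  if (wordCount > MAX_WORDS) {
    problems.push(`Too long: ${wordCount} words (limit ${MAX_WORDS}).`);
  }

  for (const term of PROHIBITED_TERMS) {
    if (text.toLowerCase().includes(term)) {
      problems.push(`Contains prohibited term: "${term}".`);
    }
  }

  return problems; // empty array means the gate passes
}
```

Anything the gate flags goes back to the AI, or to a human, before the content moves on.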
Human in the Loop
Some decisions should not be automated, period. Human-in-the-loop gates are moments where the process pauses and waits for a human to review, approve, or redirect.
This is not about distrusting AI. It is about recognizing that certain decisions carry consequences that warrant human judgment. Publishing a blog post, sending a client proposal, deploying to production: these are moments where a human should look at what is about to happen and say "yes, go ahead" or "wait, let me adjust this".
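In its simplest form, such a gate is just a pause before the irreversible step. A minimal sketch, where publishPost is a hypothetical stand-in for whatever your workflow actually does:

```typescript
// approval-gate.ts - a minimal human-in-the-loop gate
// publishPost() is a hypothetical stand-in for your real publish step.
import { createInterface } from "node:readline/promises";

async function confirm(question: string): Promise<boolean> {
  const rl = createInterface({ input: process.stdin, output: process.stdout });
  const answer = await rl.question(`${question} [y/N] `);
  rl.close();
  return answer.trim().toLowerCase() === "y";
}

async function publishPost(draft: string): Promise<void> {
  // Replace with the real step: CMS API call, email send, deployment.
  console.log(`Publishing ${draft.length} characters...`);
}

async function main() {
  const draft = "…the AI-generated article…";
  if (await confirm("Publish this draft?")) {
    await publishPost(draft);
  } else {
    console.log("Held back for another round of review.");
  }
}

main();
```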
The art is placing these gates at the right points. Too many, and you lose all the efficiency gains. Too few, and you lose control. The sweet spot is different for every workflow, and it shifts over time as you build trust in the system.
Testability
Much like ordinary software, AI skills and AI-powered processes can be tested and evaluated. Not in production, but as part of integrating a given process.
Let's say you want to use an AI copilot to automatically create complex graphs that are then published on a dashboard. You can create integration tests where you define 100 test cases, each with a specific skill or prompt and input data. The input data could be messy, real-world content from the wild internet. Then you automatically check whether the results match your expectations.
This is a mindset shift. We are used to testing deterministic software where the same input always produces the same output. With AI, we need to test for ranges of acceptable outcomes. But the principle is the same: define what "good" looks like, run automated checks, and catch problems before they reach users.
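Here is a sketch of what such an evaluation harness could look like for the dashboard example above. The generateChartSpec function is a hypothetical stand-in for the real AI call:

```typescript
// eval-harness.ts - an illustrative evaluation harness for an AI-powered step
// generateChartSpec() is a hypothetical stand-in; swap in your real AI call.
type ChartSpec = { type: string; series: unknown[] };
type TestCase = {
  name: string;
  input: string;
  accept: (spec: ChartSpec) => boolean; // a range of outcomes can be acceptable
};

async function generateChartSpec(input: string): Promise<ChartSpec> {
  // Stand-in implementation so the harness runs; replace with the real call.
  return { type: "bar", series: input.split("\n").map((line) => line.split(",")) };
}

const cases: TestCase[] = [
  {
    name: "messy CSV with a missing value",
    input: "2023,12,4\n2024,,9",
    accept: (spec) => spec.series.length === 2,
  },
  // ...ideally dozens more, drawn from real-world input
];

async function run() {
  let failures = 0;
  for (const c of cases) {
    const spec = await generateChartSpec(c.input);
    const ok = c.accept(spec);
    console.log(`${ok ? "PASS" : "FAIL"} ${c.name}`);
    if (!ok) failures++;
  }
  if (failures > 0) process.exit(1);
}

run();
```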
Observability
You cannot harness what you cannot see. Observability means having clear visibility into what the AI is doing, why it made certain decisions, and where things might be going off track.
In software, OpenAI gave their agents access to observability data and browser navigation, creating a feedback loop where agents could see the consequences of their actions. When something broke, the system could trace back to understand what happened.
When people talk about observability, they typically mean logging, metrics, and dashboards. But I think we need to think beyond that. A tool like Playwright CLI, for example, can take screenshots of web pages and navigate through a site, letting you verify the actual rendered output in the browser. That is observability too: not just checking what the code says, but checking what the user actually sees. The more ways you have to observe the AI's real-world impact, the better you can harness it.
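To make that concrete, here is a short sketch using Playwright's Node API rather than the CLI; the URL and selector are placeholders:

```typescript
// visual-check.ts - observe what the user actually sees, not just what the code says
// (the URL and selector are placeholders)
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  await page.goto("https://example.com/blog/latest-post");
  await page.screenshot({ path: "latest-post.png", fullPage: true });

  // Pull the rendered headline so an agent (or a human) can sanity-check it.
  console.log("Rendered headline:", await page.textContent("h1"));

  await browser.close();
})();
```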
For any AI workflow, observability means logging decisions, tracking outputs, and maintaining audit trails. If an AI generates a report with a wrong number, you need to be able to trace back and understand where the error originated. Without observability, you are flying blind, and that is the opposite of harnessing.
Follow Best Practices
Software engineering has already automated much of its own process with CI/CD pipelines, pull request workflows, code review processes, and deployment strategies. With AI, these processes become even more important because they reinforce many of the points mentioned above: human in the loop, observability, quality gates.
A pull request workflow, for example, is simultaneously a human-in-the-loop gate, a quality gate (through automated checks), and an observability tool (through the diff and review history). These practices were not invented for AI, but they turn out to be perfectly suited for harnessing it. Do not abandon them. Double down on them.
Part 3: Building Knowledge
Self-Learning and Memory
I started creating skills together with Claude Code. In a session, I help the AI build the solution I want. I guide it very much like I would guide a junior developer. Then I say: write the learnings to the skill. I do that repeatedly while I use the skill, and it improves over time. It "learns".
Another approach is that the AI itself writes its learnings and reflections to a memory, very much like OpenClaw is doing. I see a future where this information becomes very valuable and the learnings compound over time.
When an error happens, this could lead to new gates, or even allow the AI to restrict itself so that the same error does not happen again. Imagine an AI that, after making a mistake, updates its own guardrails. That is a powerful feedback loop.
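As a small sketch of the mechanics, a learning can be nothing more than a structured entry appended to a file that gets loaded back into context on the next run. The file name and fields here are my own invention, not any specific tool's format:

```typescript
// remember.ts - append a learning to a memory file the AI loads on its next run
// (the file name and entry shape are illustrative, not a specific tool's format)
import { appendFileSync, mkdirSync } from "node:fs";

type Learning = {
  date: string;
  trigger: string; // what went wrong
  rule: string;    // how to avoid it next time
};

function remember(learning: Learning): void {
  mkdirSync("memory", { recursive: true });
  appendFileSync("memory/learnings.jsonl", JSON.stringify(learning) + "\n");
}

remember({
  date: new Date().toISOString(),
  trigger: "Published a draft that still contained an unresolved TODO",
  rule: "Refuse to publish while the text contains the string 'TODO'",
});
```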
Play to Build Trust
Using AI is still a lot of work. To build trust and get a feeling for the AI's capabilities, and ultimately to harness it, I needed to play a lot with several AI tools.
There is no shortcut for this. You need hands-on experience to understand where the AI excels and where it falls short. You need to see it fail to know where the guardrails should go. You need to see it succeed to know where to let it run.
So go and play. Give your team time to play. Make it part of the work, not something that happens on the side. The teams that experiment the most will be the ones who harness AI the best, because they will have the deepest intuition for what works.
Part 4: Evolving the System
Let the AI Self-Correct
The "Ralph Wiggum loop", named after the Simpsons character who keeps failing and cheerfully trying again, is a pattern where you let the AI attempt a task, evaluate its own output against defined criteria, and retry until it gets it right.
I do not have enough experience yet to comment on this deeply, but I can see a time coming when the AI simply keeps trying on its own until it builds something that matches the desired output. I also think AI can become very good and very fast at doing this, often in ways we would not think to try.
Combined with quality gates and testability, self-correction becomes less scary. If the AI can try ten different approaches in the time it takes a human to try one, and each attempt is automatically validated against defined criteria, the loop becomes a feature, not a risk.
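The loop itself is short to sketch. The generate and evaluate functions below are hypothetical placeholders for your own AI call and quality gates:

```typescript
// self-correct.ts - the retry loop in miniature (generate/evaluate are hypothetical placeholders)
async function generate(task: string, feedback: string[]): Promise<string> {
  // Hypothetical: call your AI with the task plus the feedback from the last attempt.
  return `draft for: ${task} (previous issues: ${feedback.join("; ") || "none"})`;
}

function evaluate(output: string): string[] {
  // Hypothetical: run your quality gates and return a list of problems (empty = pass).
  return output.includes("TODO") ? ["unresolved TODO"] : [];
}

async function selfCorrect(task: string, maxAttempts = 10): Promise<string> {
  let feedback: string[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const output = await generate(task, feedback);
    feedback = evaluate(output);
    if (feedback.length === 0) return output; // passed every gate
  }
  throw new Error(`No acceptable output after ${maxAttempts} attempts.`);
}

selfCorrect("summarize the release notes").then(console.log);
```

Every failed attempt feeds its problems back into the next one, which is exactly what makes the loop a feature rather than a risk.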
Wrapping Up
These are my current strategies for harnessing AI. Current, because this is an evolving field. Six months from now, I might have twenty strategies or I might have condensed them into five. The models will get better, the tools will get sharper, and some of these guardrails might become unnecessary while entirely new ones emerge.
But the core principle will remain: the more structure you provide, the more powerful the AI becomes. Constraints are not the enemy of capability. They are the foundation of it.
Thanks for reading. If any of this resonated, I would love to hear how you are harnessing AI in your own work.