
Lab · 6 min read

Crob: What If You Built AI from First Principles?

Most AI conversations assume you're working with large language models. Fine-tuning, embeddings, vector databases, API costs. Crob asks a different question: what's the minimum viable intelligence you can build from scratch?

The answer turns out to be simpler than you'd expect and more interesting than it sounds.

Three brains, no neural nets

Crob has three persistent storage systems. We call them brains because it's shorter than "knowledge representation files," but they're just text files you can open in any editor.

The knowledge brain stores facts in a custom .crob format. Structure encodes meaning: := means "is," :> means "has," :< means "part of." Confidence is baked into the syntax. A fact at .9 confidence means Crob is 90% sure. A fact at .2 means it's guessing.

The voice brain stores language patterns and personality templates in JSON. When Crob responds, it interpolates learned templates rather than generating text from a model. The responses are mechanical but honest about what they are.
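Template interpolation is easy to picture in code. A minimal sketch in Python (the language of the planned rewrite); the brain structure, the function name, and the confidence hedge are illustrative assumptions, not Crob's actual voice system:

```python
import json
import random

# Hypothetical voice-brain fragment: response templates keyed by intent.
VOICE_BRAIN = json.loads("""
{
  "define": [
    "{subject} is {object}.",
    "As far as I know, {subject} is {object}."
  ]
}
""")

def respond(intent, subject, obj, confidence):
    """Fill a learned template rather than generating text from a model."""
    template = random.choice(VOICE_BRAIN[intent])
    reply = template.format(subject=subject, object=obj)
    # Surface the stored confidence instead of hiding uncertainty.
    if confidence < 0.5:
        reply += " (I'm only {:.0%} sure.)".format(confidence)
    return reply
```

Calling `respond("define", "GSAP", "an animation library", 0.9)` fills one of the two templates; drop the confidence to 0.2 and the reply admits it's guessing.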

The curiosity queue ranks research topics by interest and depth. When Crob encounters something it doesn't know, it generates research questions, scrapes the web for answers, extracts facts and language patterns, and adds interesting tangents to the queue for background learning.

The loop:

1. Encounter an unknown topic
2. Generate research questions
3. Scrape the web for answers (DuckDuckGo, no API key)
4. Extract facts + language patterns
5. Store in the .crob file with confidence scores
6. Add tangents to the queue, then repeat
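One pass of that loop can be sketched in Python; `learning_step`, the question templates, and the toy extraction step are all hypothetical stand-ins for Crob's actual code:

```python
def learning_step(topic, scrape):
    """One pass of the curiosity loop described above.

    `scrape` is any callable mapping a question to (subject, object)
    pairs; the real system pulls these from DuckDuckGo results.
    """
    questions = [f"What is {topic}?", f"How is {topic} used?"]
    facts, tangents = [], []
    for question in questions:
        for subject, obj in scrape(question):
            # Single-source claims get a low confidence score.
            facts.append((subject, ":=", obj, 0.5))
            # Terms mentioned in answers become follow-up topics.
            tangents.append(obj)
    return facts, tangents
```

Feed the tangents back in as new topics and the loop runs unattended.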

The .crob format

This is the part that kept us up at night. Most knowledge systems store data in formats optimized for machines: JSON, SQL, embeddings. The .crob format is optimized for humans to read and machines to parse.

A sample entry:

GSAP :=animation library .9
GSAP :>ScrollTrigger .9
GSAP :>timeline API .8
@G :=GSAP
Operator   Meaning              Example
:=         is / definition      GSAP :=animation library
:>         has / contains       GSAP :>ScrollTrigger
:<         part of              ScrollTrigger :< GSAP
@X         abbreviation         @G :=GSAP
.N         confidence (0–1)     .9 = 90% sure

The @G line in the sample is self-compression. When a term appears more than five times, Crob auto-abbreviates it: @G becomes shorthand for GSAP. This mirrors how humans coin jargon and abbreviations for frequently referenced concepts.
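The abbreviation rule fits in a few lines. A sketch, assuming facts are stored as tuples of terms; a real implementation would also need to handle two frequent terms that share a first letter:

```python
from collections import Counter

def auto_abbreviate(facts, threshold=5):
    """Propose @X shorthands for terms appearing more than `threshold`
    times across stored facts. `facts` is an iterable of term tuples."""
    counts = Counter(term for fact in facts for term in fact)
    return {
        term: "@" + term[0].upper()
        for term, n in counts.items()
        if n > threshold and term[0].isalpha()
    }
```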

You can open the knowledge file and read exactly what Crob "knows." Every fact has a confidence score. There's no black box.
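Because the format is line-oriented, a parser is short. A Python sketch, with the pattern inferred from the sample entries above rather than taken from Crob's source:

```python
import re

# SUBJECT OPERATOR OBJECT .CONFIDENCE, e.g. "GSAP :=animation library .9"
FACT_RE = re.compile(r"^(\S+)\s*(:=|:>|:<)\s*(.+?)\s+(\.\d+)$")

def parse_crob_line(line):
    """Parse one fact line into (subject, operator, object, confidence).

    Lines without a trailing confidence score, such as the @G :=GSAP
    abbreviation entry, fall through and return None."""
    m = FACT_RE.match(line.strip())
    if m is None:
        return None
    subject, operator, obj, conf = m.groups()
    return subject, operator, obj, float(conf)
```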

Autonomous curiosity

The part that makes this more than a fancy note-taking system: Crob generates its own research questions. Ask it about GSAP and it learns the basics, then adds "What is ScrollTrigger?" and "How does GSAP compare to CSS animations?" to its queue.

It scrapes DuckDuckGo for answers (no API key needed), extracts structured facts from the results, and stores them with appropriate confidence levels. Low confidence for single-source claims, higher confidence for facts that appear across multiple results.
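Cross-source scoring can be sketched simply: count how many results repeat a claim and map the count to a confidence. The 0.25 step and 0.9 cap below are illustrative numbers, not Crob's:

```python
def score_confidence(claim, results):
    """Confidence from cross-source agreement: single-source claims stay
    low, corroborated claims climb, capped below certainty."""
    hits = sum(1 for result in results if claim.lower() in result.lower())
    return min(0.9, 0.25 * hits)
```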

Left running, it follows rabbit holes. We left it running overnight once with "web animation" as a seed topic. By morning it had mapped out GSAP, CSS transitions, Lottie, SVG animation, requestAnimationFrame, and was starting down a path toward WebGL shaders. The knowledge file was about 400 lines. Some of it was wrong. The wrong parts had low confidence scores, which meant we could find and fix them by sorting by confidence.

That overnight run taught us something: the curiosity system doesn't just collect facts. It builds a graph. GSAP connects to ScrollTrigger connects to Intersection Observer connects to performance optimization connects to Core Web Vitals. The connections aren't programmed. They emerge from the research process. Crob asks "what is X?" and the answer mentions Y, so Y goes into the queue.
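That enqueue-and-follow behavior is a priority queue plus a tangent extractor. A minimal sketch; the class name, the interest ranking, and the capitalized-term heuristic are assumptions for illustration:

```python
import heapq
import re

class CuriosityQueue:
    """Research topics ranked by interest, highest first.
    Deduplication keeps known topics from being re-queued."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, topic, interest):
        if topic not in self._seen:
            self._seen.add(topic)
            heapq.heappush(self._heap, (-interest, topic))  # max-heap via negation

    def pop(self):
        """Next topic to research."""
        return heapq.heappop(self._heap)[1]

def tangents(answer):
    """Toy tangent extraction: capitalized terms mentioned in an
    answer become candidate research topics."""
    return re.findall(r"\b[A-Z][A-Za-z]+\b", answer)
```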

What doesn't work yet

The voice system is the weakest part. Crob's responses sound like a search engine that learned to form sentences. Template interpolation gets you from "keyword match" to "grammatically correct statement," but it doesn't get you to "sounds like a person." There's a gap between knowing facts and expressing them naturally, and closing that gap without using an LLM is the hard problem we haven't solved.

Contradiction handling is also rough. If Crob learns "GSAP is free" from one source and "GSAP requires a license for some uses" from another, it currently stores both with different confidence scores. It doesn't reconcile them. A smarter system would flag the contradiction and either research further or lower both confidence scores. That's on the roadmap.
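One possible shape for that roadmap item, sketched in Python: treat two distinct := objects for the same subject as a potential conflict. Note how crude this is; two wordings of the same definition would be flagged too, which is part of why reconciliation is hard:

```python
from collections import defaultdict

def find_contradictions(facts):
    """Flag subjects carrying more than one distinct := definition.

    `facts` is an iterable of (subject, operator, object, confidence)
    tuples. Over-flags by design: synonymous definitions are reported
    as conflicts and would need a smarter reconciliation pass."""
    definitions = defaultdict(set)
    for subject, operator, obj, _confidence in facts:
        if operator == ":=":
            definitions[subject].add(obj)
    return {s: objs for s, objs in definitions.items() if len(objs) > 1}
```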

And the knowledge format, for all its readability, doesn't scale well past a few thousand facts. The linear search gets slow. The Python rewrite will add hyperdimensional vectors for semantic retrieval, which sounds fancy but is really just a way to find related facts without reading the entire file.
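The idea behind those vectors can be shown in miniature: give every token a deterministic random ±1 vector, sum them per fact, and compare facts by cosine similarity, so related facts score high without scanning the whole file. A toy illustration of the technique, not the planned implementation; the dimension and bundling scheme are assumptions:

```python
import math
import random

DIM = 2048  # hypervector width; production systems often use 10,000+

def token_vector(token):
    """Deterministic pseudo-random +/-1 hypervector for a token."""
    rng = random.Random(token)  # same token always yields the same vector
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def fact_vector(text):
    """Bundle a fact's token vectors by elementwise addition."""
    vectors = [token_vector(t) for t in text.lower().split()]
    return [sum(column) for column in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Facts sharing a token ("GSAP animation library" and "CSS animation transitions") land measurably closer than unrelated ones, which is all semantic retrieval needs.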

What it's for

Crob is an experiment in transparent AI. Every piece of knowledge is inspectable. Every confidence level is visible. When it doesn't know something, it says so with a number rather than hallucinating an answer.

It's also a teaching tool. You can explain how Crob works in one sitting. Try doing that with GPT-4. The simplicity is the point: strip away the neural nets and you can see the bones of what "learning" actually requires. Input, storage, retrieval, and something that looks a lot like curiosity.

Current state

Crob is PHP, runs from the command line, and stores everything in text files. A Python rewrite with a web interface is in progress. The knowledge format is stable. The learning loop works. The voice system is functional but rough.

The roadmap runs through six phases. Phase 0 (validate the learning loop) is done. Phase 1 is the Python rewrite with better confidence dynamics and contradiction detection. Phases 2-4 add semantic retrieval, a web interface, and a knowledge graph visualization. Phase 5 is Docker packaging so other people can run it.

It's not going to replace ChatGPT. That's not what it's for. It's a research artifact that asks whether intelligence needs to be opaque, and answers with a readable text file.

The code is on our workbench. If you want to see how we apply this kind of thinking to client work, check the services page. Or read about the chatroom system where our AI team collaborates. If any of this sounds interesting enough to talk about, reach out.

About the author

Rob Kingsbury

Rob Kingsbury is the founder of Kingsbury Creative and a Professor at Algonquin College. He has been building websites since the mid-1990s, and has spent the last decade focused on small businesses across Renfrew County and the Ottawa Valley.
