AI series

The End of the Keyword Era (Kinda)

How Keyword Search Got Us This Far, and Why It Can’t Take Us the Rest of the Way

Pouring One Out For the Dewey Decimal System (RIP)

Before the internet, searching for information was slow, but at least it made sense. You went to a library, flipped through catalogs, and relied on structured classification systems like the Dewey Decimal System. Sure, it was tedious, but it was also predictable. You couldn’t possibly process vast amounts of information on your own, and you didn’t have to - human-curated indexes and librarians helped you find what you needed.

Then came the digital era, and suddenly, the entire world’s information was at your fingertips. No more library visits; no more manual searching. But with that came a new challenge: how do you sift through an ocean of information to find exactly what you need? Unlike libraries, the internet didn’t come pre-organized by experts. Authority and accuracy varied wildly, and finding the right information became just as much of a problem as access itself.

The world needed a better way to sort through massive amounts of data at scale. And voila! Keyword search was the best way to do it. But did you know that not all keyword searches are created equal? More on this later…

I recently listened to a fascinating episode of 99% Invisible, “Search and Ye Might Find,” which traces the history of keyword search and how it transformed the way we retrieve information. Inspired by that episode, we wanted to explore how keyword search took over, why it was a great solution (at the time) - and why it’s no longer enough.

When Keyword Search Was… Dumb (But Fast!)

The earliest versions of keyword search were, frankly, kind of dumb. They were a heck of a lot faster than good ol’ Dewey, but they weren’t nearly as smart. These early systems treated every word as equally important, with no sense of hierarchy. Type in a word, and the search engine would return every document with that word, regardless of relevance. Want to find information on apple farming? Great! Here’s a bunch of articles about Apple laptops!

This kind of literal matching obviously had utility - it was far quicker than heading to the library and flipping through a catalogue, but it came with massive tradeoffs. It couldn’t distinguish between relevant and irrelevant uses of the same word (think apple vs. Apple). It didn’t understand synonyms or related concepts. And it offered no help if you didn’t know the exact term you were searching for in the first place. It was like asking a stranger to fetch you a book without telling them the genre or context. 

Still, for a special moment in time, this barebones system seemed like magic. Suddenly you could “search” documents digitally, even across the vastness of the world wide web, retrieving pages in seconds that might have taken hours to locate in a library. The sheer speed of keyword searching was a revelation. But the lack of nuance was a real problem, and it only grew more pronounced as the web expanded and the amount of searchable data exploded.

The technology powering these early search engines was known as an inverted index. It worked kind of like the inverse of the library’s system; instead of cataloguing every topic, it indexed every word and mapped it back to the documents in which it appeared. This allowed for extremely fast lookup speeds. Rather than crawling the entire internet in real time every time someone entered a search term, these systems retrieved pre-crawled results instantly. Speedy? Yes. Smart? Not quite.
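To make the mechanism concrete, here’s a minimal sketch of an inverted index in Python (toy documents invented for illustration - not any real engine’s code):

```python
# Toy inverted index: map every word to the set of documents containing
# it, then answer queries by set intersection - fast, but purely literal.
docs = {
    1: "apple farming in cold climates",
    2: "the new Apple laptop lineup",
    3: "apple orchard farming techniques",
}

index = {}
for doc_id, text in docs.items():
    for word in text.lower().split():  # naive lowercase tokenization
        index.setdefault(word, set()).add(doc_id)

def search(*terms):
    """Return IDs of documents containing ALL of the given terms."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*hits) if hits else set()

search("apple")             # all three docs - including the laptop one
search("apple", "farming")  # narrows to the farming docs: {1, 3}
```

Note the apple problem in miniature: the index has no idea that document 2 is about a laptop, because lookup never touches meaning, only tokens.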

Suffice it to say, there was a lot of room for improvement. The first incremental step in making search smarter came with improved ranking algorithms like Okapi BM25 (where BM stands for Best Matching). BM25 ranked search results by taking into account the length of documents, the rarity of words, and other factors that improved the relevancy of results.
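Under the hood, BM25 is just a weighting formula. Here’s a from-scratch sketch using the standard Okapi BM25 equation (the corpus is invented, and k1=1.5 / b=0.75 are common defaults assumed for illustration):

```python
import math

# Toy corpus (invented). BM25 scores each document against a query by
# combining term rarity (IDF), term frequency, and document length.
corpus = [
    "apple farming in cold climates",
    "the new apple laptop lineup from apple",
    "apple orchard farming techniques and apple varieties",
]
docs = [text.split() for text in corpus]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N  # average document length

def idf(term):
    # Rarer terms get a higher weight (smoothed, log-scaled)
    n = sum(1 for d in docs if term in d)
    return math.log((N - n + 0.5) / (n + 0.5) + 1)

def bm25(query, doc, k1=1.5, b=0.75):
    score = 0.0
    for term in query.split():
        f = doc.count(term)  # raw term frequency in this document
        # b controls how strongly longer documents are penalized
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(term) * f * (k1 + 1) / (f + norm)
    return score

ranked = sorted(range(N), key=lambda i: bm25("apple farming", docs[i]),
                reverse=True)
# The laptop-only document lands last for the query "apple farming"
```

Because “farming” is rarer than “apple” in this corpus, it carries more weight - exactly the kind of relevance signal plain boolean matching lacks.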

Then Came Google

Google’s impact on search cannot be overstated. It didn’t invent keyword search, but it refined it tremendously. And, more importantly, it completely reimagined how results should be ranked. Rather than treating every mention of a word on the internet equally, Google used context and relevance to determine which results were likely to be most useful. How? By evaluating not just the presence of keywords, but also things like link structure, site authority, and user behaviour. Witchcraft? No, just really clever engineering. This is Google’s famous PageRank technology.

We’d be remiss not to mention that PageRank was patented. In fact, it’s commonly included in lists like this one, identifying it as one of the most influential and important patents in history. The “magic” of PageRank, combined with the speed of crawling and indexing created a user experience that felt - dare I say - smart? Google just seems to eerily know what you mean. And once users got a taste for that kind of intuitive relevance and simplicity, there was no going back. Many readers will be too young to remember the prior dominance of Yahoo or “Ask Jeeves” but this millennial does (RIP, RIP).

But even as Google was revolutionizing how we find things on the internet, it also opened the door to a new kind of problem: search optimization. People quickly learned how to game the system, and they did so en masse. They stuffed websites with keywords (sometimes in invisible font buried in headers and footers - tricky buggers!), bought backlinks, and flooded the web with low-quality content designed solely to rise to the top of search results. Search became not just a tool, but a battlefield.

To its credit, Google fought back with regular algorithm updates, each one a move in an ongoing arms race against spammy Search Engine Optimization (SEO) tactics. The system got better - more contextual, more semantic, more robust. But underneath it all, keyword-based indexing still played a central role.

Same Same But Different

While Google kept evolving, keyword search in many specialty systems stayed frozen in time (like, umm, patent search databases). We’re talking strict (I mean STRICT!) keyword matching. If you’re a patent professional you will know exactly what I mean.

If you don’t use the exact words that match the documents you’re trying to find, you might miss them entirely. And, of course, keywords are completely devoid of context (back to the apple vs. Apple problem). Still, some people live and breathe keywords and rely on them for reproducibility of search results. However, as I alluded to earlier, not all keyword systems return the same results, even when pointed at identical datasets.

We tested this by searching for patents with the keywords “semiconductor” AND “RRAM” on Espacenet (with the US country filter) vs. the same query on USPTO’s Patent Public Search. The results? Different outcomes, even with the exact same query (!!!):

[Screenshot: the top three search results on Espacenet (USA jurisdiction only), followed by the top three results for the same query on USPTO’s Patent Public Search - different outcomes from searching the same database of ~46k patent documents.]

Why on earth does this happen? Because every search engine tokenizes, normalizes, indexes, and ranks information differently. For example, TF-IDF weighs rare terms more heavily, while Okapi BM25 refines that weighting further with term-frequency saturation and document-length normalization. Some systems filter results before scoring them; others do the reverse. So even if two systems are searching the same haystack, you may be shown very different needles.

Keyword Search in Patent Land

Nowhere is the tension between keyword simplicity and semantic nuance more obvious than in patent search. The stakes are high. The language is technical. The vocabulary is wildly inconsistent, not least because patent drafters are free to define their own technical terms!

Patent professionals often rely on Boolean keyword searches to pinpoint relevant prior art or surface filings related to a specific technology area. But patents are notoriously opaque - filled with euphemisms, jargon, and vague terminology. A single missed synonym can mean overlooking the most important document. And when outcomes hinge on finding the right references, that’s a serious risk. There’s got to be a better way, right? ;)

*AI has entered the chat*

^ This is NLPatent’s raison d’être.

That said, while we’re known for our AI-based search capabilities, we’ve never dismissed the value of traditional keyword search. In fact, there’s a time and place for both - used alone or in tandem. That’s why NLPatent supports both: a semantic AI search engine designed specifically for the complexities of patent search AND a robust keyword engine powered by OpenSearch (a modern keyword search system known for its speed and flexibility). See what I did there? Our AI engine allows users to search in natural language and retrieve conceptually relevant results, even if the phrasing is completely different. Our keyword feature allows advanced Boolean logic, nesting, and filtering without distorting result rankings.

What Keyword Systems Can’t Do

In 2025, the power and utility of AI-based search systems are no longer in question. The writing is on the wall. As a result, many platforms have tried to retrofit their legacy keyword databases with a dusting of AI. But the truth is, you can’t just sprinkle in some machine learning and expect to achieve semantic understanding.

Why not? Keyword search engines and semantic search engines are built on fundamentally different architectures. Keyword engines rely on inverted indexes - speedy, no doubt, but limited to matching exact words or phrases. Semantic systems, by contrast, are often powered by vector databases that map meaning, not just text.
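As a rough sketch of the difference (hand-made toy vectors standing in for learned embeddings, which in real systems have hundreds of dimensions), semantic retrieval compares directions in a vector space rather than tokens:

```python
import math

# Toy 3-dimensional "embeddings" (invented for illustration). Related
# concepts sit close together even when they share no words at all.
vectors = {
    "apple orchard": [0.9, 0.1, 0.0],
    "fruit farming": [0.8, 0.2, 0.1],  # zero word overlap with "apple orchard"
    "Apple laptop":  [0.1, 0.0, 0.9],
}

def cosine(a, b):
    # Cosine similarity: ~1.0 means same direction ("same meaning"),
    # ~0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = "apple orchard"
ranked = sorted(vectors, key=lambda k: cosine(vectors[query], vectors[k]),
                reverse=True)
# "fruit farming" outranks "Apple laptop" despite sharing no words
# with the query - the opposite of what an inverted index would return
```

This is the crux of the architectural gap: an inverted index can only be asked “which documents contain these tokens?”, while a vector store can be asked “which documents mean something similar?”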

Why does this matter? Surface-level integrations of AI into keyword systems can only go so far. True semantic search requires a system that understands concepts, relationships, and context - not just strings of characters. That kind of true understanding has to be designed from the ground up. 

Platforms like NLPatent are built with this foundation in mind, offering true semantic discovery that can either be used alone or in combination with traditional keyword search. Because the future of search isn’t either-or. It’s about using the right tool for the job.

Where Do We Go From Here?

This might sound shocking coming from me, but keyword search isn’t going away. It’s still an essential tool in many contexts, especially for those who know exactly what they're looking for. When used well, keyword search can deliver speed and precision that’s hard to beat. But in today’s world, it’s no longer enough on its own.

The way we search (and what we expect from it) has fundamentally changed. We need search systems that go beyond matching terms to understanding meaning. That means interpreting language in context, recognizing intent, and surfacing information that’s actually useful, even when the exact words don’t line up.

Crucially, the next generation of search tools won’t be built by layering AI on top of outdated infrastructure. They’ll be built from the ground up to integrate semantic understanding at their core, paired with the structure and speed of keyword search. That’s the only right approach (in my humble opinion!). 

We’re not pouring one out for keywords just yet. But we are making room at the table for something smarter.

More Blog Posts Coming Soon!

Sign up to receive alerts when we publish more blog posts about how NLPatent has brought success to the field of IP and AI.
