Semantic Search — Landbase Toolkit

Examples

How customers use it

Finding a whole role family with AI

Ask for every revenue leader in SaaS and the titles fan out — CRO, VP RevOps, Head of GTM, SVP Go-to-Market. Semantic Search interprets the concept and returns the whole role family in one query, no manual enumeration required.

Targeting by concept, not keywords

Say "let's target fintech" without specifying industry codes or filter logic. Semantic Search treats "fintech" as a meaning cluster and pulls every company that fits — payments, lending, neobanks, B2B fintech infrastructure.

Finding people by career arc

Look for engineers who became founders, with both ends of the arc captured in a single search. Semantic Search recognizes the trajectory and surfaces the right people, even when the current title reads "Founder & CEO" and the prior one was "Staff Software Engineer."

Capturing a whole function in one query

Ask for "data leaders," a function that goes by a hundred names: Head of Data, Analytics Lead, VP Insights, Chief Data Officer, BI Director, Data Science Manager. One semantic query returns all of them.

Under the hood

How it's used

It runs underneath every Landbase workflow. Title expansion before list builds. Concept filters where keyword search misses synonyms. Audience scoping when the customer can't enumerate every label that should count.

Why it matters

Most "filter UI" queries are quietly semantic-search calls underneath. Without semantic search, every workflow narrows to the literal strings the customer remembered to type, and silently drops the variants they didn't.
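To make the contrast between literal strings and meaning concrete, here is a minimal sketch, not Landbase's implementation. The three-dimensional vectors are hand-made toy values standing in for real embeddings, and the function names, the query vector, and the 0.9 threshold are all hypothetical choices for illustration.

```python
import math

# Toy 3-dimensional "embeddings", hand-made for illustration only.
# Real embeddings would come from a model; these values are assumptions.
TITLE_VECTORS = {
    "Chief Data Officer": [0.9, 0.8, 0.1],
    "Head of Data": [0.9, 0.7, 0.0],
    "VP Insights": [0.7, 0.8, 0.1],
    "BI Director": [0.8, 0.6, 0.1],
    "Staff Software Engineer": [0.2, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_match(query, titles):
    """Literal filter: keep titles that contain the query string."""
    return [t for t in titles if query.lower() in t.lower()]


def semantic_match(query_vec, vectors, threshold):
    """Concept filter: keep titles whose vector is close to the query vector."""
    return [t for t, v in vectors.items() if cosine(query_vec, v) >= threshold]


# Hypothetical embedding of the query "data leaders".
query_vec = [0.9, 0.75, 0.05]

literal = keyword_match("data", TITLE_VECTORS)
semantic = semantic_match(query_vec, TITLE_VECTORS, threshold=0.9)

print("keyword:", literal)
print("semantic:", semantic)
```

In this toy setup the keyword filter keeps only the two titles that contain the word "data", while the similarity filter also keeps "VP Insights" and "BI Director" and still excludes the engineering title.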

The Similar Company Graph

Company Data (21.6M companies) → Embedding (1,024-dim vectors) → Indexing (Random Partition Forest) → Retrieval (ANN + neighbor exploration) → Reranking (with external signals) → Similar Company Graph (21.5B edges)

A naive pairwise comparison of all 21.6M companies would take about 233 trillion operations. The optimized pipeline does it in 361 billion, about 0.15% of brute force.
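The figures above can be checked with a few lines of arithmetic. This sketch assumes that "pairwise" means each unordered pair of companies is compared once, which is the reading that reproduces the 233 trillion figure. The edges-per-company line is a derived estimate, not a number stated on this page.

```python
# Figures quoted on this page.
N_COMPANIES = 21_600_000
OPTIMIZED_OPS = 361_000_000_000
GRAPH_EDGES = 21_500_000_000

# Assumption: one comparison per unordered pair, n * (n - 1) / 2.
pairs = N_COMPANIES * (N_COMPANIES - 1) // 2

# Share of the brute-force work that the optimized pipeline performs.
fraction = OPTIMIZED_OPS / pairs

# Derived estimate: average number of graph edges per company.
edges_per_company = GRAPH_EDGES / N_COMPANIES

print(f"brute-force pairs: {pairs:,}")  # 233,279,989,200,000
print(f"optimized share:   {fraction:.4%}")
print(f"edges per company: {edges_per_company:.0f}")
```

The average of just under 1,000 edges per company is in line with the "top 1K neighbors" that the reranking stage works on.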

Stage by stage

Embedding. Every company is turned into a 1,024-dimensional vector that captures its meaning — description, services, signals, the whole picture compressed into a numeric fingerprint.
Random Partition Forest. A spatial index over the vectors. Lets us look up "close" without comparing every pair.
Approximate nearest neighbors + neighbor exploration. Two retrieval stages stacked. ANN gets the rough candidate set; a follow-up exploration step adds 140B candidates to fill in what ANN missed.
Reranking with external data. The top 1K neighbors get re-scored using signals the embedding alone doesn't see. Final recall is 100% at k=1, 99.7% at k=10, and 99.5% at k=100.
Similar Company Graph. The output of the pipeline — a graph with 21.5 billion edges connecting every company to its closest peers. Lookups against it are instant.
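The indexing and retrieval stages above can be sketched on toy data. This page does not describe how the Random Partition Forest is built, so the sketch assumes it behaves like a forest of random-hyperplane trees: each tree repeatedly splits the points at the median of a random projection, and points that share a leaf become candidate neighbors. Neighbor exploration is modeled as adding neighbors-of-neighbors to each candidate set. All function names, sizes, and the random vectors are illustrative, and reranking with external signals is left out.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def build_tree(ids, vecs, leaf_size, rng):
    """Split ids at the median of a random projection; return the leaves."""
    if len(ids) <= leaf_size:
        return [ids]
    direction = [rng.gauss(0, 1) for _ in range(len(vecs[0]))]
    ordered = sorted(ids, key=lambda i: dot(vecs[i], direction))
    mid = len(ordered) // 2
    return (build_tree(ordered[:mid], vecs, leaf_size, rng)
            + build_tree(ordered[mid:], vecs, leaf_size, rng))


def forest_candidates(vecs, n_trees, leaf_size, rng):
    """Candidates for a point are all points sharing a leaf with it in any tree."""
    cands = [set() for _ in vecs]
    for _ in range(n_trees):
        for leaf in build_tree(list(range(len(vecs))), vecs, leaf_size, rng):
            for i in leaf:
                cands[i].update(j for j in leaf if j != i)
    return cands


def top_k(i, cand, vecs, k):
    """Exact top-k among the candidates, ties broken by index."""
    return sorted(cand, key=lambda j: (-dot(vecs[i], vecs[j]), j))[:k]


def explore(neighbors, cands):
    """Grow each candidate set with the neighbors of its current neighbors."""
    grown_sets = []
    for i, nbrs in enumerate(neighbors):
        grown = set(cands[i])
        for j in nbrs:
            grown.update(neighbors[j])
        grown.discard(i)
        grown_sets.append(grown)
    return grown_sets


def recall_at_k(approx, exact):
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    return hits / sum(len(e) for e in exact)


rng = random.Random(0)
N, DIM, K = 300, 16, 10  # toy sizes, far smaller than the real pipeline
vecs = [unit([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(N)]

# Brute-force ground truth, affordable only because N is tiny.
exact = [top_k(i, [j for j in range(N) if j != i], vecs, K) for i in range(N)]

cands = forest_candidates(vecs, n_trees=4, leaf_size=30, rng=rng)
first_pass = [top_k(i, cands[i], vecs, K) for i in range(N)]
second_pass = [top_k(i, c, vecs, K) for i, c in enumerate(explore(first_pass, cands))]

recall_before = recall_at_k(first_pass, exact)
recall_after = recall_at_k(second_pass, exact)
print(f"recall@{K} after forest lookup: {recall_before:.3f}")
print(f"recall@{K} after exploration:   {recall_after:.3f}")
```

Because exploration only ever adds candidates and the final ranking is exact within the candidate set, recall after exploration cannot be lower than recall after the forest lookup alone. How much it improves depends on the data and on the number and depth of trees.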

By the numbers

21.6M companies indexed

1,024-dim embedding vectors

21.5B edges in the graph

99.7% recall @ k=10