Redpine
Senior Knowledge Graph Engineer
Job type: Full-time
Job description
Why this role exists
Agents are only as good as the knowledge they can reach. Most retrieval today is flat: chunk a document, embed it, hope the nearest vector is the right answer. That breaks the moment a question needs more than one hop, needs to know which source to trust, or needs to distinguish two entities that happen to share a name. Redpine is building the layer that fixes this: a knowledge graph over the licensed data we hold across medicine, science, law, and finance, served directly to agents over our API, MCP, and CLI.
You will own that graph end to end, from raw multimodal sources to the structured, provenance-bearing knowledge that agents query in production. This is an early, high-ownership role. The decisions you make about how we model and link knowledge will shape what every agent built on Redpine can actually reason about.
Why knowledge graphs are core to agentic retrieval
Agents don't just look things up; they traverse. They follow a claim to its source, a company to its filings, a drug to its trials to the patients those trials enrolled. That kind of multi-hop reasoning needs explicit structure: typed entities, named relations, and a graph a planner can walk.
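To make the drug-to-trials-to-patients example concrete, here is a minimal sketch of a multi-hop walk over typed relations. The identifiers (`drug:D1`, `tested_in`, `enrolled`) and the in-memory triple list are made up for illustration; they are not Redpine's schema or API.

```python
from collections import defaultdict

# Hypothetical toy graph as (subject, relation, object) triples.
TRIPLES = [
    ("drug:D1", "tested_in", "trial:T1"),
    ("drug:D1", "tested_in", "trial:T2"),
    ("trial:T1", "enrolled", "cohort:C1"),
    ("trial:T2", "enrolled", "cohort:C2"),
    ("company:X", "filed", "filing:F1"),
]


def build_index(triples):
    """Index objects by (subject, relation) so each hop is a dict lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def follow(index, start, path):
    """Walk a fixed sequence of named relations from `start`.

    Returns every node reachable by following `path` in order.
    """
    frontier = [start]
    for relation in path:
        next_frontier = []
        for node in frontier:
            next_frontier.extend(index.get((node, relation), []))
        frontier = next_frontier
    return frontier


index = build_index(TRIPLES)
# Two hops: drug -> trials -> enrolled cohorts.
cohorts = follow(index, "drug:D1", ["tested_in", "enrolled"])
```

A flat nearest-vector lookup has no equivalent of the `path` argument, which is the gap the paragraph above describes.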
What you'll work on
- Ontology and schema design. Define the entity types, relations, and constraints that licensed data from each domain maps into, building on the established ontologies those fields already use rather than reinventing them. You'll decide where a shared ontology helps and where a domain needs its own.
- Entity resolution at scale. Deduplicate, canonicalize, and disambiguate entities across heterogeneous, multimodal sources, so the same thing in the world is one node in the graph.
- Confidence and conflict. Attach confidence to every link, and define what happens when two sources disagree. Decide what the graph asserts, what it flags, and what it surfaces to the agent.
- Provenance as a first-class property. Preserve attribution on every node and edge, back to the source document and the license that covers it. At Redpine this is not optional; it is the product.
- Keeping the graph live. Detect when an upstream source changes and propagate that change, so the graph reflects current knowledge instead of a stale snapshot.
- Graph-powered retrieval. Build multi-hop traversal, hybrid graph-and-vector retrieval, and the kind of structure a reranker can actually exploit.
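As a rough illustration of the confidence, conflict, and provenance items above, the sketch below attaches a confidence score, source document, and license to each assertion, then either asserts the leading value or flags a conflict. The field names, the 0 to 1 confidence scale, the `margin` threshold, and the sample data are all assumptions made for this example, not a description of how Redpine does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    relation: str
    value: str
    confidence: float  # assumed scale: 0.0 to 1.0
    source_doc: str    # provenance: hypothetical document id
    license_id: str    # provenance: hypothetical license id


def resolve(assertions, margin=0.2):
    """Group assertions by (subject, relation) and decide what to assert.

    If the most confident value leads every disagreeing value by at least
    `margin`, it is "asserted"; otherwise the group is "flagged".
    """
    grouped = {}
    for a in assertions:
        grouped.setdefault((a.subject, a.relation), []).append(a)

    resolved = {}
    for key, group in grouped.items():
        ranked = sorted(group, key=lambda a: a.confidence, reverse=True)
        best = ranked[0]
        rivals = [a for a in ranked[1:] if a.value != best.value]
        conflicted = bool(rivals) and best.confidence - rivals[0].confidence < margin
        resolved[key] = {
            "status": "flagged" if conflicted else "asserted",
            "best": best,
            "rivals": rivals,
        }
    return resolved


# Made-up sample data: two sources disagree about each fact.
SAMPLE = [
    Assertion("drug:D1", "approved_in", "2019", 0.9, "doc:A", "lic:1"),
    Assertion("drug:D1", "approved_in", "2020", 0.8, "doc:B", "lic:2"),
    Assertion("company:X", "hq_city", "Oslo", 0.9, "doc:C", "lic:1"),
    Assertion("company:X", "hq_city", "Bergen", 0.4, "doc:D", "lic:3"),
]
resolved = resolve(SAMPLE)
```

Because each `Assertion` carries `source_doc` and `license_id`, whatever the graph asserts or flags can still be traced back to its source and the license that covers it.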
What we're looking for
- Deep, direct experience with knowledge graphs: graph databases, property or RDF graphs, entity resolution, and ontology design. You've shipped one, not just read about one.
- Strong Python and a track record of building data pipelines from scratch.
- Real experience with retrieval and RAG, ideally graph-backed, plus the judgment to know when a graph earns its keep and when it's just overhead.
- Judgment about messy data. You can look at multi-source, contradictory input and design a schema that survives contact with reality.
- A bias toward small, clear systems. You question whether something needs to be built before you build it.
- A genuinely curious mind, the kind that wants to understand a domain well enough to model it honestly.
About Redpine
If models were the first wave of AI, and compute the second, we're building the data layer that comes next.
Only a small fraction of the world's data is on the open internet. The rest (high-quality, domain-specific, often critical) sits behind paywalls, in databases, or with rights holders. Redpine is building the infrastructure to unlock it.
We provide AI builders and autonomous agents with access to licensed, high-quality, multimodal data through a unified platform and API. The goal is simple: make AI systems more accurate, more useful, and grounded in real-world information.
We're backed by Nordic Ninja, Node VC, and Luminar, alongside angels from OpenAI, Spotify, and Perplexity.


