Bring your own data
You can ingest your own typed knowledge graph into OWLGraph and query it via the SDK. v1 supports the “I already have typed data” path — you bring four files (ontology + entities + triples + passages); OWLGraph loads them. If you have only a corpus and want OWLGraph to induce the ontology + triples, that’s Phase 2E.2 (LLM-driven extraction) and not shipping yet.
What you need
| File | Format | What it is |
|---|---|---|
| ontology.ttl | OWL 2 RL Turtle | Your classes + properties. Defines the typed graph schema. |
| entities.jsonl | One JSON per line | Your canonical entities + their type + aliases + the passages that mention them. |
| triples.jsonl | One JSON per line | Typed edges between entities. |
| passages.jsonl | One JSON per line | Your corpus chunks. Embeddings optional but recommended. |
Total size limit today: 50 MB combined (nginx body cap). With 1536-dim embeddings, that’s room for ~3000 passages. Larger corpora need the S3-backed upload path which isn’t built yet.
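
One way to catch the size limit before uploading is to total the four files locally. The sketch below is illustrative only: the helper names are hypothetical, it assumes "50 MB" means 50 × 1024 × 1024 bytes, and it ignores any request framing overhead, so treat a result close to the limit as too close.

```python
import os

# Assumption: "50 MB" is read as 50 MiB; the real cap may be defined differently.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def combined_size(paths):
    """Return the total on-disk size, in bytes, of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def fits_upload_cap(paths, limit=MAX_UPLOAD_BYTES):
    """Return True if the files together are at or under the limit."""
    return combined_size(paths) <= limit
```

Calling fits_upload_cap with the paths of ontology.ttl, entities.jsonl, triples.jsonl, and passages.jsonl gives a quick yes/no before you run the ingest command.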
File details
ontology.ttl
Standard OWL 2 RL Turtle. Declare your classes, properties, and (optionally) owlcore:textEmbedding if you want HNSW vector search.

    @prefix : <http://your-domain.example.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owlcore: <https://owlgraph.ai/owl/2026/core#> .
    @prefix owlv: <https://owlgraph.ai/owl/2026/vector#> .

    :Drug a owl:Class .
    :Disease a owl:Class .
    :Approval a owl:Class .

    :treats a owl:ObjectProperty ; rdfs:domain :Drug ; rdfs:range :Disease .
    :approvedFor a owl:ObjectProperty ; rdfs:domain :Drug ; rdfs:range :Disease .

    owlcore:textEmbedding a owl:DatatypeProperty ;
        rdfs:domain owlcore:Passage ;
        rdfs:range owlv:Float32Vector ;
        owlv:vectorDimension 1536 ;
        owlv:vectorIndex "hnsw(metric: \"cosine\", exponent: \"5\")" ;
        owlv:embedderHint "openai/text-embedding-3-small" .

entities.jsonl
One JSON object per line. Required: name, top_type. Optional but recommended: aliases, passage_ids.

    {"name": "Pembrolizumab", "top_type": "Drug", "all_types": ["Drug"], "aliases": ["Keytruda", "MK-3475"], "passage_ids": ["doc-0001", "doc-0017", "doc-0042"]}
    {"name": "Melanoma", "top_type": "Disease", "all_types": ["Disease"], "aliases": ["malignant melanoma"], "passage_ids": ["doc-0001", "doc-0042"]}

| Field | Type | Notes |
|---|---|---|
| name | str | Canonical surface form. Used as the primary key. |
| name_lower | str (optional) | Lowercased. The loader fills this in if absent. |
| top_type | str | The entity’s primary ontology class. Must match a class in your ontology.ttl. |
| all_types | list[str] (optional) | Full type set. Defaults to [top_type]. |
| aliases | list[str] (optional) | Alternative surface forms for entity_search. |
| passage_ids | list[str] (optional) | Ids of passages mentioning this entity. Becomes mentions/ edges in the graph. |
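
If you generate entities.jsonl from your own records, a small helper can enforce the required fields and pre-fill the two documented defaults. This is a minimal sketch, not the loader's actual code: normalize_entity and to_jsonl are hypothetical names, and the only behavior taken from the table above is that name and top_type are required, name_lower is the lowercased name, and all_types defaults to [top_type].

```python
import json

REQUIRED_ENTITY_FIELDS = ("name", "top_type")


def normalize_entity(row):
    """Check required fields and fill the documented defaults on a copy of row."""
    missing = [f for f in REQUIRED_ENTITY_FIELDS if not row.get(f)]
    if missing:
        raise ValueError(f"entity row missing required field(s): {missing}")
    out = dict(row)
    # Documented defaults: lowercased name, and [top_type] as the full type set.
    out.setdefault("name_lower", out["name"].lower())
    out.setdefault("all_types", [out["top_type"]])
    return out


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line, newline-terminated."""
    return "".join(
        json.dumps(normalize_entity(r), ensure_ascii=False) + "\n" for r in rows
    )
```

Writing the string returned by to_jsonl to entities.jsonl keeps each entity on its own line, which is what the one-object-per-line format requires.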
triples.jsonl
One JSON object per line. Required: subject, subject_top_type, predicate, object, object_top_type.

    {"subject": "Pembrolizumab", "subject_top_type": "Drug", "predicate": "treats", "object": "Melanoma", "object_top_type": "Disease", "evidence": "approved for unresectable or metastatic melanoma", "confidence": 1.0}

| Field | Type | Notes |
|---|---|---|
| subject | str | Must match an entity name from entities.jsonl. |
| subject_top_type | str | The subject’s top_type (used to disambiguate). |
| predicate | str | An OWL ObjectProperty or DatatypeProperty from your ontology. |
| object | str | Either another entity name OR a literal value (date, number, string). |
| object_top_type | str | The object’s top_type. For literals, use "Literal". |
| evidence | str (optional) | A source snippet supporting the triple. Surfaced in evidence chains. |
| confidence | float (optional) | 0.0–1.0. Defaults to 1.0. |
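
Because every subject, and every non-literal object, has to match an entity name, it can help to cross-check the two files before uploading. The sketch below is one possible check, not the platform's own validation: find_dangling_triples is a hypothetical helper, and it assumes names are compared by exact string equality, which may be stricter or looser than what the loader does.

```python
LITERAL_TYPE = "Literal"


def find_dangling_triples(entities, triples):
    """Return (index, field, name) for each triple endpoint missing from entities.

    entities and triples are lists of dicts, already parsed from the JSONL files.
    """
    known = {e["name"] for e in entities}
    problems = []
    for i, t in enumerate(triples):
        if t["subject"] not in known:
            problems.append((i, "subject", t["subject"]))
        # Literal objects (dates, numbers, strings) are values, not entity names.
        if t["object_top_type"] != LITERAL_TYPE and t["object"] not in known:
            problems.append((i, "object", t["object"]))
    return problems
```

An empty result means every triple endpoint resolves to a row in entities.jsonl under the exact-match assumption.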
passages.jsonl
One JSON object per line.

    {"id": "doc-0001", "text": "Pembrolizumab (Keytruda) is approved for...", "source": "FDA-Drug-Approvals/2014/pembrolizumab.pdf", "vec": [0.0123, -0.0456, 0.1234, ...]}

| Field | Type | Notes |
|---|---|---|
| id | str | Unique chunk id. Referenced by entities.jsonl’s passage_ids and triples.jsonl’s evidence. |
| text | str | The passage content. Surfaced verbatim in read_chunk tool calls. |
| source | str (optional) | Document/section identifier. Shown in evidence-chain UI. |
| vec | list[float] (optional) | Pre-computed embedding. Must match your ontology’s owlv:vectorDimension. Omit to let the platform embed at ingest time. |
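
The two constraints in this table that are easy to break are unique ids and a vec length that matches owlv:vectorDimension. A rough local check could look like the following; check_passages is a hypothetical helper, and the default of 1536 simply mirrors the example ontology above, so pass your own dimension if it differs.

```python
def check_passages(passages, expected_dim=1536):
    """Return a list of human-readable problems found in parsed passage rows."""
    problems = []
    seen = set()
    for i, p in enumerate(passages, start=1):
        pid = p.get("id")
        if pid is None:
            problems.append(f"row {i}: missing id")
        elif pid in seen:
            problems.append(f"row {i}: duplicate id {pid!r}")
        else:
            seen.add(pid)
        vec = p.get("vec")
        # vec is optional, so only check its length when it is present.
        if vec is not None and len(vec) != expected_dim:
            problems.append(
                f"row {i}: vec has {len(vec)} values, expected {expected_dim}"
            )
    return problems
```

Row numbers are 1-based so they line up with line numbers in passages.jsonl when the file has no blank lines.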
Upload via the CLI
1. Sign up + mint an API key. Quick start covers the first half.
2. Install the SDK.

       pip install owlgraph-core

3. Create a database.

       owlgraph db create --name my-knowledge-graph --size s

   Record the UUID it prints; you’ll pass it to subsequent commands.
4. Ingest your four files.

       owlgraph ingest custom <db_id> \
         --ontology ontology.ttl \
         --entities entities.jsonl \
         --triples triples.jsonl \
         --passages passages.jsonl \
         --follow

   --follow streams per-stage events to stdout. Without it, the command returns the job id and you tail it later with owlgraph ingest logs <db_id> <job_id> --follow.
5. Query it.

       from owlgraph_core import sdk as owl

       db = owl.connect(database_id="<db_id>", api_key="<key>")
       result = db.retrieve("Which drugs treat melanoma?")
       print(result.answer)
       for p in result.passages:
           print(p.source, p.text[:200])
Common gotchas
“invalid_jsonl in entities.jsonl line 14” means one of your JSONL rows isn’t valid JSON. Check for trailing commas, unescaped quotes, and missing braces.
“entity name not found: <foo>” means a triple references a subject or object that isn’t in entities.jsonl. Make sure every entity in the triples has a row in entities.
Schema conflicts: if the ontology declares a property with a different type than your data writes (e.g. textEmbedding as string instead of float32vector), Dgraph rejects later writes. Drop the database and re-ingest cleanly if you change the ontology.
Body too large: the API rejects total payloads over 50 MB. Either drop the vec field (the platform then embeds at ingest time, which adds ingest cost) or split the corpus into multiple databases.
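
Malformed rows can be found locally before the API reports them. The following sketch scans JSONL text and reports the 1-based line number of every row that fails to parse or is not a JSON object. The function name is hypothetical, and skipping blank lines is an assumption about what the loader tolerates.

```python
import json


def find_invalid_jsonl_lines(text):
    """Return (line_number, message) for each non-blank line that is not a JSON object."""
    errors = []
    # Split on "\n" only, so unusual separators inside strings do not shift line numbers.
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored rather than rejected
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, exc.msg))
            continue
        if not isinstance(value, dict):
            errors.append((lineno, "expected a JSON object"))
    return errors
```

Running it over the contents of entities.jsonl, triples.jsonl, and passages.jsonl in turn points at the same kind of line number that the invalid_jsonl error mentions.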
What’s not in v1
- Re-ingest of just one file. v1 reloads everything on each call, and re-ingest deletes existing data first (drop_all on the alpha), so back up before re-running.
- S3-backed upload for >50 MB corpora. Tracked.
- LLM-driven entity/triple extraction. If you have only a corpus and want OWLGraph to build the graph for you, that’s Phase 2E.2 — talk to us about design-partner access.