Bring your own data

You can ingest your own typed knowledge graph into OWLGraph and query it via the SDK. v1 supports the “I already have typed data” path: you bring four files (ontology + entities + triples + passages) and OWLGraph loads them. If you have only a corpus and want OWLGraph to induce the ontology + triples, that’s Phase 2E.2 (LLM-driven extraction), which isn’t shipping yet.

What you need

| File | Format | What it is |
| --- | --- | --- |
| `ontology.ttl` | OWL 2 RL Turtle | Your classes + properties. Defines the typed graph schema. |
| `entities.jsonl` | One JSON object per line | Your canonical entities + their type + aliases + the passages that mention them. |
| `triples.jsonl` | One JSON object per line | Typed edges between entities. |
| `passages.jsonl` | One JSON object per line | Your corpus chunks. Embeddings optional but recommended. |

Total size limit today: 50 MB for the four files combined (nginx body cap). With 1536-dim embeddings, that leaves room for roughly 3000 passages. Larger corpora need the S3-backed upload path, which isn’t built yet.
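
To see roughly where the passage estimate comes from, the sketch below measures the JSON size of one synthetic `passages.jsonl` row carrying a 1536-dimension vector and divides the cap by it. The 1000-character text, the random vector values rounded to six decimals, and reading “50 MB” as 50 × 1024 × 1024 bytes are all assumptions made for illustration. The other three files and any upload overhead are ignored, so treat the result as a ballpark only.

```python
import json
import random

# Assumption: the 50 MB cap is read as 50 MiB; the real limit may be counted differently.
BODY_CAP_BYTES = 50 * 1024 * 1024


def passage_row_bytes(text_chars: int = 1000, dim: int = 1536, seed: int = 0) -> int:
    """Size in bytes of one synthetic passages.jsonl row, trailing newline included."""
    rng = random.Random(seed)
    row = {
        "id": "doc-0001",
        "text": "x" * text_chars,  # hypothetical passage length
        "vec": [round(rng.uniform(-1.0, 1.0), 6) for _ in range(dim)],
    }
    return len(json.dumps(row).encode("utf-8")) + 1


def passages_that_fit(row_bytes: int, cap: int = BODY_CAP_BYTES) -> int:
    """How many rows of the given size fit under the cap."""
    return cap // row_bytes


row = passage_row_bytes()
print(row, "bytes per row ->", passages_that_fit(row), "rows under the cap")
```

Storing more decimal places per float, or longer passage text, lowers the count proportionally.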

File details

`ontology.ttl`

Standard OWL 2 RL Turtle. Declare your classes, properties, and (optionally) `owlcore:textEmbedding` if you want HNSW vector search.

@prefix : <http://your-domain.example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owlcore: <https://owlgraph.ai/owl/2026/core#> .
@prefix owlv: <https://owlgraph.ai/owl/2026/vector#> .

:Drug a owl:Class .
:Disease a owl:Class .
:Approval a owl:Class .

:treats a owl:ObjectProperty ;
    rdfs:domain :Drug ;
    rdfs:range :Disease .

:approvedFor a owl:ObjectProperty ;
    rdfs:domain :Drug ;
    rdfs:range :Disease .

owlcore:textEmbedding a owl:DatatypeProperty ;
    rdfs:domain owlcore:Passage ;
    rdfs:range owlv:Float32Vector ;
    owlv:vectorDimension 1536 ;
    owlv:vectorIndex "hnsw(metric: \"cosine\", exponent: \"5\")" ;
    owlv:embedderHint "openai/text-embedding-3-small" .

`entities.jsonl`

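One JSON object per line. Required: `name`, `top_type`. Optional but recommended: `aliases`, `passage_ids`. The JSONL examples on this page wrap long objects for readability; in your files, each object must stay on a single line.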

{"name": "Pembrolizumab", "top_type": "Drug", "all_types": ["Drug"],
 "aliases": ["Keytruda", "MK-3475"],
 "passage_ids": ["doc-0001", "doc-0017", "doc-0042"]}
{"name": "Melanoma", "top_type": "Disease", "all_types": ["Disease"],
 "aliases": ["malignant melanoma"],
 "passage_ids": ["doc-0001", "doc-0042"]}

| Field | Type | Notes |
| --- | --- | --- |
| `name` | str | Canonical surface form. Used as the primary key. |
| `name_lower` | str (optional) | Lowercased. The loader fills this in if absent. |
| `top_type` | str | The entity’s primary ontology class. Must match a class in your `ontology.ttl`. |
| `all_types` | list[str] (optional) | Full type set. Defaults to `[top_type]`. |
| `aliases` | list[str] (optional) | Alternative surface forms for entity_search. |
| `passage_ids` | list[str] (optional) | Ids of passages mentioning this entity. Becomes `mentions/` edges in the graph. |
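
As an illustration of the row format, here is a minimal sketch of a helper that builds one `entities.jsonl` line and rejects a `top_type` that is not among your ontology classes. The helper name `entity_row` and the hand-maintained `known_classes` set are hypothetical; nothing here is part of the OWLGraph SDK. `name_lower` is left out on purpose so the loader can fill it in.

```python
import json


def entity_row(name, top_type, known_classes, aliases=(), passage_ids=(), all_types=None):
    """Serialize one entity as a single JSONL line (hypothetical helper, not an SDK call)."""
    if not name:
        raise ValueError("name is required")
    if top_type not in known_classes:
        raise ValueError(f"top_type {top_type!r} is not a class in the ontology")
    row = {
        "name": name,
        "top_type": top_type,
        "all_types": list(all_types) if all_types else [top_type],
        "aliases": list(aliases),
        "passage_ids": list(passage_ids),
    }
    # json.dumps without indent never emits a raw newline, so the row stays on one line.
    return json.dumps(row, ensure_ascii=False)


# Assumption: class names copied by hand from the example ontology.
classes = {"Drug", "Disease", "Approval"}
print(entity_row("Pembrolizumab", "Drug", classes,
                 aliases=["Keytruda", "MK-3475"],
                 passage_ids=["doc-0001", "doc-0017", "doc-0042"]))
```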

`triples.jsonl`

One JSON object per line. Required: `subject`, `subject_top_type`, `predicate`, `object`, `object_top_type`.

{"subject": "Pembrolizumab", "subject_top_type": "Drug",
 "predicate": "treats",
 "object": "Melanoma", "object_top_type": "Disease",
 "evidence": "approved for unresectable or metastatic melanoma",
 "confidence": 1.0}

| Field | Type | Notes |
| --- | --- | --- |
| `subject` | str | Must match an entity name from `entities.jsonl`. |
| `subject_top_type` | str | The subject’s `top_type` (used to disambiguate). |
| `predicate` | str | An OWL ObjectProperty or DatatypeProperty from your ontology. |
| `object` | str | Either another entity name OR a literal value (date, number, string). |
| `object_top_type` | str | The object’s `top_type`. For literals, use `"Literal"`. |
| `evidence` | str (optional) | A source snippet supporting the triple. Surfaced in evidence chains. |
| `confidence` | float (optional) | 0.0–1.0. Defaults to 1.0. |
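
The field rules above can be checked locally before uploading. The following is a minimal sketch, assuming you keep your predicate names in a hand-maintained set; `check_triple` is a hypothetical helper, and the real loader may apply different or stricter checks.

```python
import json

REQUIRED = ("subject", "subject_top_type", "predicate", "object", "object_top_type")


def check_triple(line, predicates):
    """Return a list of problems found in one triples.jsonl line (empty list if none)."""
    row = json.loads(line)
    problems = [f"missing {key}" for key in REQUIRED if key not in row]
    if "predicate" in row and row["predicate"] not in predicates:
        problems.append(f"unknown predicate {row['predicate']!r}")
    # confidence is optional and defaults to 1.0 per the table above.
    conf = row.get("confidence", 1.0)
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number between 0.0 and 1.0")
    return problems


# Assumption: predicate names copied by hand from the example ontology.
predicates = {"treats", "approvedFor"}
good = ('{"subject": "Pembrolizumab", "subject_top_type": "Drug", "predicate": "treats", '
        '"object": "Melanoma", "object_top_type": "Disease", "confidence": 1.0}')
print(check_triple(good, predicates))
```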

`passages.jsonl`

One JSON object per line.

{"id": "doc-0001", "text": "Pembrolizumab (Keytruda) is approved for...",
 "source": "FDA-Drug-Approvals/2014/pembrolizumab.pdf",
 "vec": [0.0123, -0.0456, 0.1234, ...]}

| Field | Type | Notes |
| --- | --- | --- |
| `id` | str | Unique chunk id. Referenced by `entities.jsonl`’s `passage_ids` and `triples.jsonl`’s `evidence`. |
| `text` | str | The passage content. Surfaced verbatim in `read_chunk` tool calls. |
| `source` | str (optional) | Document/section identifier. Shown in evidence-chain UI. |
| `vec` | list[float] (optional) | Pre-computed embedding. Must match your ontology’s `owlv:vectorDimension`. Omit to let the platform embed at ingest time. |
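
Two of these constraints, unique ids and a `vec` length that matches `owlv:vectorDimension`, are easy to verify before upload. The sketch below is one way to do it; `check_passages` is a hypothetical helper, the default dimension of 1536 comes from the example ontology, and skipping blank lines is an assumption about what you want locally rather than a statement about the loader.

```python
import json


def check_passages(lines, dim=1536):
    """Return (line_number, problem) pairs for passages.jsonl rows."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: ignore blank lines in this local check
        row = json.loads(line)
        pid = row.get("id")
        if not pid:
            problems.append((number, "missing id"))
        elif pid in seen:
            problems.append((number, f"duplicate id {pid!r}"))
        else:
            seen.add(pid)
        if not row.get("text"):
            problems.append((number, "missing text"))
        vec = row.get("vec")  # optional; only checked when present
        if vec is not None and len(vec) != dim:
            problems.append((number, f"vec has {len(vec)} values, expected {dim}"))
    return problems


# Usage sketch (file name as used on this page):
# with open("passages.jsonl", encoding="utf-8") as fh:
#     for number, problem in check_passages(fh):
#         print(f"line {number}: {problem}")
```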

Upload via the CLI

Sign up + mint an API key. Quick start covers the first half.
Install the SDK.
```
pip install owlgraph-core
```
Create a database.
```
owlgraph db create --name my-knowledge-graph --size s
```
Record the UUID it prints — you’ll pass it to subsequent commands.

Ingest your four files.

owlgraph ingest custom <db_id> \
    --ontology  ontology.ttl \
    --entities  entities.jsonl \
    --triples   triples.jsonl \
    --passages  passages.jsonl \
    --follow

`--follow` streams per-stage events to stdout. Without it, the command returns the job id and you tail it later with `owlgraph ingest logs <db_id> <job_id> --follow`.

Query it.

from owlgraph_core import sdk as owl

db = owl.connect(database_id="<db_id>", api_key="<key>")
result = db.retrieve("Which drugs treat melanoma?")
print(result.answer)
for p in result.passages:
    print(p.source, p.text[:200])

Common gotchas

`invalid_jsonl in entities.jsonl line 14`: one of your JSONL rows isn’t valid JSON. Check for trailing commas, unescaped quotes, and missing braces.
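
To find such rows before uploading, you can parse each line on its own with the standard `json` module. This is a minimal local sketch; `invalid_jsonl_lines` is a hypothetical helper, and treating blank lines and non-object values as errors is an assumption that may be stricter than the loader.

```python
import json


def invalid_jsonl_lines(lines):
    """Return (line_number, reason) for every line that is not a JSON object."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            bad.append((number, "not a JSON object"))
    return bad


# Usage sketch:
# with open("entities.jsonl", encoding="utf-8") as fh:
#     for number, reason in invalid_jsonl_lines(fh):
#         print(f"entities.jsonl line {number}: {reason}")
```

An object that is wrapped across several lines fails this check on each of its lines, which is a common cause of the error.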

`entity name not found: <foo>`: a triple references a subject or object that isn’t in `entities.jsonl`. Make sure every entity named in `triples.jsonl` has a row in `entities.jsonl`.
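
A cross-file check along these lines can catch dangling references locally. It is a sketch under assumptions: names are compared by exact string match, and objects whose `object_top_type` is `"Literal"` are skipped. The loader’s actual matching rules (for example, use of `subject_top_type` to disambiguate) may differ, and `missing_entities` is a hypothetical helper.

```python
import json


def missing_entities(entity_lines, triple_lines):
    """Return (triple_line_number, role, name) for names absent from entities.jsonl."""
    names = {json.loads(line)["name"] for line in entity_lines if line.strip()}
    missing = []
    for number, line in enumerate(triple_lines, start=1):
        if not line.strip():
            continue
        triple = json.loads(line)
        if triple["subject"] not in names:
            missing.append((number, "subject", triple["subject"]))
        # Literal objects are values, not entities, so they need no entity row.
        if triple["object_top_type"] != "Literal" and triple["object"] not in names:
            missing.append((number, "object", triple["object"]))
    return missing


# Usage sketch:
# with open("entities.jsonl", encoding="utf-8") as ents, \
#      open("triples.jsonl", encoding="utf-8") as trips:
#     print(missing_entities(ents, trips))
```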

Schema conflicts: if the ontology declares a property with a different type than the one your data writes (e.g. `textEmbedding` as string instead of float32vector), Dgraph rejects later writes. If you change the ontology, drop the database and re-ingest cleanly.

Body too large: the API rejects total payloads over 50 MB. Either drop the `vec` field (the platform then embeds at ingest time, which adds to the cost of ingest) or split the corpus into multiple databases.
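
Dropping the `vec` field can be done with a few lines of Python. The sketch below rewrites rows without their vectors and leaves every other field untouched; `strip_vectors` and the output file name in the usage comment are hypothetical.

```python
import json


def strip_vectors(lines):
    """Yield passages.jsonl rows with the optional "vec" field removed."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        row.pop("vec", None)  # no error if the row has no vector
        yield json.dumps(row, ensure_ascii=False)


# Usage sketch (output file name is hypothetical):
# with open("passages.jsonl", encoding="utf-8") as src, \
#      open("passages.novec.jsonl", "w", encoding="utf-8") as dst:
#     for row in strip_vectors(src):
#         dst.write(row + "\n")
```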

What’s not in v1

Re-ingest of just one file. v1 reloads everything per call. Re-ingest deletes existing data first (`drop_all` on the Dgraph Alpha), so back up before re-running.
S3-backed upload for >50 MB corpora. Tracked.
LLM-driven entity/triple extraction. If you have only a corpus and want OWLGraph to build the graph for you, that’s Phase 2E.2 — talk to us about design-partner access.