Bring your own data

You can ingest your own typed knowledge graph into OWLGraph and query it via the SDK. v1 supports the “I already have typed data” path — you bring four files (ontology + entities + triples + passages); OWLGraph loads them. If you have only a corpus and want OWLGraph to induce the ontology + triples, that’s Phase 2E.2 (LLM-driven extraction) and not shipping yet.

| File | Format | What it is |
| --- | --- | --- |
| ontology.ttl | OWL 2 RL Turtle | Your classes + properties. Defines the typed graph schema. |
| entities.jsonl | One JSON per line | Your canonical entities + their type + aliases + the passages that mention them. |
| triples.jsonl | One JSON per line | Typed edges between entities. |
| passages.jsonl | One JSON per line | Your corpus chunks. Embeddings optional but recommended. |

Total size limit today: 50 MB combined (nginx body cap). With 1536-dim embeddings, that’s room for ~3000 passages. Larger corpora need the S3-backed upload path, which isn’t built yet.
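To check a bundle against the cap before uploading, something like the following can help. It is a minimal sketch: the 50 MB figure comes from the limit above, reading "MB" as 10^6 bytes is an assumption, and `bundle_size` and `fits_under_cap` are hypothetical helper names, not part of the SDK. On-disk size is only a proxy for the request body, which may carry some extra overhead.

```python
import os

# Limit quoted above; interpreting "MB" as 10**6 bytes is an assumption.
MAX_BUNDLE_BYTES = 50 * 1000 * 1000


def bundle_size(paths):
    """Return the combined on-disk size, in bytes, of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def fits_under_cap(paths, cap=MAX_BUNDLE_BYTES):
    """True if the files together stay at or below the cap."""
    return bundle_size(paths) <= cap


if __name__ == "__main__":
    files = ["ontology.ttl", "entities.jsonl", "triples.jsonl", "passages.jsonl"]
    present = [f for f in files if os.path.exists(f)]  # ignore files not created yet
    print(f"{bundle_size(present) / 1e6:.1f} MB across {len(present)} files")
```

If the total is close to the cap, leave some headroom rather than relying on an exact match.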

Standard OWL 2 RL Turtle. Declare your classes, properties, and (optionally) owlcore:textEmbedding if you want HNSW vector search.

@prefix : <http://your-domain.example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owlcore: <https://owlgraph.ai/owl/2026/core#> .
@prefix owlv: <https://owlgraph.ai/owl/2026/vector#> .

:Drug a owl:Class .
:Disease a owl:Class .
:Approval a owl:Class .

:treats a owl:ObjectProperty ;
    rdfs:domain :Drug ;
    rdfs:range :Disease .

:approvedFor a owl:ObjectProperty ;
    rdfs:domain :Drug ;
    rdfs:range :Disease .

owlcore:textEmbedding a owl:DatatypeProperty ;
    rdfs:domain owlcore:Passage ;
    rdfs:range owlv:Float32Vector ;
    owlv:vectorDimension 1536 ;
    owlv:vectorIndex "hnsw(metric: \"cosine\", exponent: \"5\")" ;
    owlv:embedderHint "openai/text-embedding-3-small" .

One JSON object per line. Required: name, top_type. Optional but recommended: aliases, passage_ids.

{"name": "Pembrolizumab", "top_type": "Drug", "all_types": ["Drug"], "aliases": ["Keytruda", "MK-3475"], "passage_ids": ["doc-0001", "doc-0017", "doc-0042"]}
{"name": "Melanoma", "top_type": "Disease", "all_types": ["Disease"], "aliases": ["malignant melanoma"], "passage_ids": ["doc-0001", "doc-0042"]}

| Field | Type | Notes |
| --- | --- | --- |
| name | str | Canonical surface form. Used as the primary key. |
| name_lower | str (optional) | Lowercased. The loader fills this in if absent. |
| top_type | str | The entity’s primary ontology class. Must match a class in your ontology.ttl. |
| all_types | list[str] (optional) | Full type set. Defaults to [top_type]. |
| aliases | list[str] (optional) | Alternative surface forms for entity_search. |
| passage_ids | list[str] (optional) | Ids of passages mentioning this entity. Becomes mentions/ edges in the graph. |
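
The defaults in the table can be reproduced locally when preparing rows. The sketch below is an illustration of the documented behaviour, not the loader's actual code; `normalize_entity` is a hypothetical helper, and defaulting aliases and passage_ids to empty lists is an assumption.

```python
import json

REQUIRED_ENTITY_FIELDS = ("name", "top_type")


def normalize_entity(line):
    """Parse one entities.jsonl row and fill in the documented defaults."""
    row = json.loads(line)
    missing = [f for f in REQUIRED_ENTITY_FIELDS if not row.get(f)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    # Documented defaults: lowercased name and a one-element type set.
    row.setdefault("name_lower", row["name"].lower())
    row.setdefault("all_types", [row["top_type"]])
    # Assumption: absent optional lists are treated as empty.
    row.setdefault("aliases", [])
    row.setdefault("passage_ids", [])
    return row


row = normalize_entity('{"name": "Melanoma", "top_type": "Disease"}')
print(row["name_lower"], row["all_types"])
```

Running every row through a check like this before upload catches missing required fields early.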

One JSON object per line. Required: subject, subject_top_type, predicate, object, object_top_type.

{"subject": "Pembrolizumab", "subject_top_type": "Drug", "predicate": "treats", "object": "Melanoma", "object_top_type": "Disease", "evidence": "approved for unresectable or metastatic melanoma", "confidence": 1.0}

| Field | Type | Notes |
| --- | --- | --- |
| subject | str | Must match an entity name from entities.jsonl. |
| subject_top_type | str | The subject’s top_type (used to disambiguate). |
| predicate | str | An OWL ObjectProperty or DatatypeProperty from your ontology. |
| object | str | Either another entity name OR a literal value (date, number, string). |
| object_top_type | str | The object’s top_type. For literals, use "Literal". |
| evidence | str (optional) | A source snippet supporting the triple. Surfaced in evidence chains. |
| confidence | float (optional) | 0.0–1.0. Defaults to 1.0. |
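
A row-level check for triples might look like the following. This is a sketch under the rules in the table above; `check_triple` is a hypothetical helper, and how the real loader reports these problems is not specified here.

```python
import json

REQUIRED_TRIPLE_FIELDS = (
    "subject", "subject_top_type", "predicate", "object", "object_top_type",
)


def check_triple(line):
    """Parse one triples.jsonl row and return (row, problems)."""
    row = json.loads(line)
    problems = [f"missing {f}" for f in REQUIRED_TRIPLE_FIELDS if f not in row]
    confidence = row.setdefault("confidence", 1.0)  # documented default
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range: {confidence!r}")
    return row, problems


line = json.dumps({
    "subject": "Pembrolizumab", "subject_top_type": "Drug",
    "predicate": "treats",
    "object": "Melanoma", "object_top_type": "Disease",
})
row, problems = check_triple(line)
print(row["confidence"], problems)
```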

One JSON object per line.

{"id": "doc-0001", "text": "Pembrolizumab (Keytruda) is approved for...", "source": "FDA-Drug-Approvals/2014/pembrolizumab.pdf", "vec": [0.0123, -0.0456, 0.1234, ...]}

| Field | Type | Notes |
| --- | --- | --- |
| id | str | Unique chunk id. Referenced by entities.jsonl’s passage_ids and triples.jsonl’s evidence. |
| text | str | The passage content. Surfaced verbatim in read_chunk tool calls. |
| source | str (optional) | Document/section identifier. Shown in evidence-chain UI. |
| vec | list[float] (optional) | Pre-computed embedding. Must match your ontology’s owlv:vectorDimension. Omit to let the platform embed at ingest time. |

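
Because a vec of the wrong length conflicts with the ontology, it can be worth checking dimensions before ingest. The sketch below assumes the 1536 dimensions declared in the example ontology and treats id and text as required because the table does not mark them optional; `check_passage` is a hypothetical helper.

```python
import json

EXPECTED_DIM = 1536  # owlv:vectorDimension from the example ontology


def check_passage(line, expected_dim=EXPECTED_DIM):
    """Return a list of problems found in one passages.jsonl row."""
    row = json.loads(line)
    problems = []
    for field in ("id", "text"):
        if not isinstance(row.get(field), str) or not row[field]:
            problems.append(f"{field} must be a non-empty string")
    vec = row.get("vec")  # optional: the platform embeds when it is absent
    if vec is not None and len(vec) != expected_dim:
        problems.append(f"vec has {len(vec)} dims, expected {expected_dim}")
    return problems


good = json.dumps({"id": "doc-0001", "text": "example", "vec": [0.0] * 1536})
bad = json.dumps({"id": "doc-0002", "text": "example", "vec": [0.0] * 3})
print(check_passage(good))
print(check_passage(bad))
```
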
  1. Sign up + mint an API key. Quick start covers the first half.

  2. Install the SDK.

    pip install owlgraph-core
  3. Create a database.

    owlgraph db create --name my-knowledge-graph --size s

    Record the UUID it prints — you’ll pass it to subsequent commands.

  4. Ingest your four files.

    owlgraph ingest custom <db_id> \
      --ontology ontology.ttl \
      --entities entities.jsonl \
      --triples triples.jsonl \
      --passages passages.jsonl \
      --follow

    --follow streams per-stage events to stdout. Without it, the command returns the job id and you tail it later with owlgraph ingest logs <db_id> <job_id> --follow.

  5. Query it.

    from owlgraph_core import sdk as owl
    db = owl.connect(database_id="<db_id>", api_key="<key>")
    result = db.retrieve("Which drugs treat melanoma?")
    print(result.answer)
    for p in result.passages:
        print(p.source, p.text[:200])

invalid_jsonl in entities.jsonl line 14 — one of your JSONL rows isn’t valid JSON. Check trailing commas, unescaped quotes, missing braces.
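
To find the offending row locally before re-uploading, a small linter along these lines can help. It is a sketch using only the standard library; `find_invalid_jsonl` is a hypothetical helper, and skipping blank lines is an assumption about what the loader tolerates.

```python
import json


def find_invalid_jsonl(lines):
    """Yield (line_number, message) for each row that is not a JSON object."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            yield lineno, exc.msg
            continue
        if not isinstance(value, dict):
            yield lineno, "row is not a JSON object"


sample = [
    '{"name": "Melanoma", "top_type": "Disease"}',
    '{"name": "Pembrolizumab", "top_type": "Drug",}',  # trailing comma
]
for lineno, message in find_invalid_jsonl(sample):
    print(f"line {lineno}: {message}")
```

For a real file, pass an open file handle instead of the sample list, for example `find_invalid_jsonl(open("entities.jsonl", encoding="utf-8"))`.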

entity name not found: <foo> — a triple references a subject or object that isn’t in entities.jsonl. Make sure every entity in the triples has a row in entities.
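
The same cross-check can be run locally on parsed rows. The sketch below assumes names are matched exactly (case-sensitive) and that objects with object_top_type "Literal" are values rather than entities; `unknown_entity_refs` is a hypothetical helper, not an SDK function.

```python
def unknown_entity_refs(entities, triples):
    """Return (triple_number, field, name) for names missing from entities."""
    known = {e["name"] for e in entities}
    missing = []
    for i, t in enumerate(triples, start=1):
        if t["subject"] not in known:
            missing.append((i, "subject", t["subject"]))
        # Literal objects are values, not entities, so they are skipped.
        if t.get("object_top_type") != "Literal" and t["object"] not in known:
            missing.append((i, "object", t["object"]))
    return missing


entities = [{"name": "Pembrolizumab", "top_type": "Drug"}]
triples = [{
    "subject": "Pembrolizumab", "subject_top_type": "Drug",
    "predicate": "treats",
    "object": "Melanoma", "object_top_type": "Disease",
}]
print(unknown_entity_refs(entities, triples))
```

Here the Melanoma row is missing from the entity list, so the triple's object is reported.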

Schema conflicts — if the ontology declares a property with a different type than your data writes (e.g. textEmbedding as string instead of float32vector), Dgraph rejects later writes. Drop the database and re-ingest cleanly if you change the ontology.

Body too large — the API rejects total payloads >50 MB. Either drop the vec field (platform will embed at ingest cost) or split the corpus into multiple databases.
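
Dropping the vec field can be done with a short script. This is a minimal sketch: `strip_vectors` is a hypothetical helper, and the output filename in the usage note is only an example.

```python
import json


def strip_vectors(lines):
    """Yield passages.jsonl rows re-serialised without the vec field."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        row.pop("vec", None)  # no-op when the row has no embedding
        yield json.dumps(row, ensure_ascii=False)


sample = ['{"id": "doc-0001", "text": "example", "vec": [0.1, 0.2]}']
for out in strip_vectors(sample):
    print(out)
```

To rewrite a file, read passages.jsonl line by line and write each yielded string plus a newline to a new file such as passages.novec.jsonl, then ingest that file instead.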

  • Re-ingest of just one file. v1 reloads everything per call. Re-ingest deletes existing data first (drop_all on the alpha) — back up before re-running.
  • S3-backed upload for >50 MB corpora. Tracked.
  • LLM-driven entity/triple extraction. If you have only a corpus and want OWLGraph to build the graph for you, that’s Phase 2E.2 — talk to us about design-partner access.