Documentation

Guides

Step-by-step guides for common workflows: RAG pipelines, knowledge graph construction, hybrid queries, and cluster management.

Building a RAG pipeline with Veculo

Retrieval-Augmented Generation (RAG) improves LLM responses by retrieving relevant context from your own data before generating an answer. Veculo is purpose-built for this workflow because it combines vector similarity search with graph traversal, producing richer context than vector-only stores.

Step 1: Ingest your documents

Split your documents into chunks, generate embeddings with your model of choice, and insert them as vertices. Connect chunks that belong to the same document with edges.
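The chunking step happens client-side before any API call. A minimal fixed-size sketch in Python (the size and overlap values are illustrative defaults, not Veculo requirements):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap so that sentences
    straddling a boundary appear in both neighboring chunks."""
    chunks = []
    start = 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Each resulting chunk is then embedded and inserted as its own vertex.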

Insert a document chunk with embedding (bash)
# Insert a chunk of a research paper
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/vertices/embedding" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "chunk:arxiv-2401.001-sec3-p2",
    "label": "chunk",
    "properties": {
      "document_id": "doc:arxiv-2401.001",
      "title": "Attention Is All You Need",
      "section": "3. Model Architecture",
      "paragraph": 2,
      "text": "The encoder maps an input sequence of symbol representations to a sequence of continuous representations. Given z, the decoder then generates an output sequence of symbols one element at a time."
    },
    "embedding": [0.023, -0.114, 0.891, ...],
    "visibility": "public"
  }'

Step 2: Build document structure edges

Connect chunks to their parent document and to each other to capture structural relationships:

Connect chunk to document (bash)
# Link chunk to its parent document
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/edges" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "chunk:arxiv-2401.001-sec3-p2",
    "target": "doc:arxiv-2401.001",
    "edge_type": "part_of",
    "properties": { "section": "3", "order": 2 },
    "visibility": "public"
  }'

# Link sequential chunks
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/edges" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "chunk:arxiv-2401.001-sec3-p2",
    "target": "chunk:arxiv-2401.001-sec3-p3",
    "edge_type": "next_chunk",
    "visibility": "public"
  }'
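Rather than issuing these calls by hand, you would typically generate the edge payloads in a loop over your ordered chunk IDs. A sketch in Python (the helper name is ours; the payload shape mirrors the requests above):

```python
def structure_edge_payloads(chunk_ids: list[str], doc_id: str) -> list[dict]:
    """Build part_of and next_chunk edge payloads for an ordered chunk list."""
    edges = []
    for i, chunk_id in enumerate(chunk_ids):
        # Every chunk links to its parent document.
        edges.append({
            "source": chunk_id,
            "target": doc_id,
            "edge_type": "part_of",
            "properties": {"order": i},
            "visibility": "public",
        })
        # Sequential chunks link forward to their successor.
        if i + 1 < len(chunk_ids):
            edges.append({
                "source": chunk_id,
                "target": chunk_ids[i + 1],
                "edge_type": "next_chunk",
                "visibility": "public",
            })
    return edges
```

Each payload is then POSTed to the /edges endpoint exactly as in the curl examples above.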

Step 3: Query with hybrid vector + graph search

When a user asks a question, embed the query and search for similar chunks. Then traverse the graph to include surrounding chunks and the parent document for additional context:

RAG retrieval query (bash)
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.019, -0.108, 0.875, ...],
    "top_k": 5,
    "edge_type": "next_chunk",
    "depth": 1,
    "label": "chunk"
  }'

This returns the 5 most similar chunks, plus their neighboring chunks via next_chunk edges. You now have a wider context window to pass to your LLM.

Why graph + vector beats vector alone

Pure vector search retrieves semantically similar chunks in isolation. With Veculo, you also get structural context: the paragraphs before and after each match, the parent document metadata, and related documents via citation edges. In practice, this added context can substantially improve RAG answer quality, though the gain depends on your corpus and chunking strategy.

Step 4: Assemble the LLM prompt

Take the retrieved chunks and their graph context, format them into a prompt, and send them to your LLM. A simple template:

Prompt template (text)
You are a research assistant. Answer the user's question using only the
context below. If the context doesn't contain the answer, say so.

Context:
---
[Document: "Attention Is All You Need" by Vaswani et al., 2017]

Section 3, Paragraph 2:
The encoder maps an input sequence of symbol representations to a sequence
of continuous representations. Given z, the decoder then generates an
output sequence of symbols one element at a time.

Section 3, Paragraph 3:
At each step the model is auto-regressive, consuming the previously
generated symbols as additional input when generating the next.
---

Question: How does the transformer decoder generate output?
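Assembling that template from retrieved chunks is plain string formatting. A sketch, assuming each retrieved chunk carries the title, section, paragraph, and text properties used in the ingestion example (these field names come from your own schema, not a fixed Veculo response shape):

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks and a user question into a RAG prompt."""
    blocks = []
    for c in chunks:
        blocks.append(
            f'[Document: "{c["title"]}"]\n'
            f'{c["section"]}, Paragraph {c["paragraph"]}:\n'
            f'{c["text"]}'
        )
    context = "\n\n".join(blocks)
    return (
        "You are a research assistant. Answer the user's question using only the\n"
        "context below. If the context doesn't contain the answer, say so.\n\n"
        f"Context:\n---\n{context}\n---\n\n"
        f"Question: {question}"
    )
```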

Knowledge graph construction

A knowledge graph represents entities and their relationships as a structured graph. Veculo makes it straightforward to build and query knowledge graphs at scale.

Define your entity types

Plan the types of vertices and edges in your graph. For an academic knowledge graph:

Vertex label   Example properties
paper          title, abstract, year, doi, venue
author         name, affiliation, orcid
institution    name, country, type
concept        name, domain, description

Edge type        From     To
authored_by      paper    author
cites            paper    paper
affiliated_with  author   institution
discusses        paper    concept

Ingest entities and relationships

Build a knowledge graph (bash)
# Add a paper
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/vertices/embedding" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "paper:arxiv-2401.001",
    "label": "paper",
    "properties": {
      "title": "Attention Is All You Need",
      "year": 2017,
      "venue": "NeurIPS"
    },
    "embedding": [0.023, -0.114, 0.891, ...],
    "visibility": "public"
  }'

# Add an author
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/vertices" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "author:vaswani",
    "label": "author",
    "properties": {
      "name": "Ashish Vaswani",
      "affiliation": "Google Brain"
    },
    "visibility": "public"
  }'

# Connect them
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/edges" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "paper:arxiv-2401.001",
    "target": "author:vaswani",
    "edge_type": "authored_by",
    "properties": { "position": "first" },
    "visibility": "public"
  }'

Query the knowledge graph

Find all papers by a specific author, or discover the citation network around a topic:

Traverse the graph (bash)
# Find all papers authored by Vaswani (authored_by edges point from
# paper to author, so traverse incoming edges from the author vertex)
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/neighbors" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vertex_id": "author:vaswani",
    "edge_type": "authored_by",
    "depth": 1,
    "direction": "in"
  }'

Hybrid vector + graph queries

Hybrid queries combine the strengths of vector similarity search and graph traversal. Here are common patterns:

Pattern 1: Similarity + citation graph

Find papers semantically similar to a query, then traverse the citation graph to discover related work:

Similarity + citations (bash)
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.045, -0.223, 0.667, ...],
    "top_k": 10,
    "edge_type": "cites",
    "depth": 2,
    "min_score": 0.75,
    "label": "paper"
  }'

Pattern 2: Find similar, then group by author

Search for similar documents, then follow authored_by edges to discover prolific authors in a research area:

Similarity + author graph (bash)
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.045, -0.223, 0.667, ...],
    "top_k": 20,
    "edge_type": "authored_by",
    "depth": 1,
    "label": "paper"
  }'

Pattern 3: Context expansion for RAG

Find the most relevant chunk, then walk next_chunk and part_of edges to gather a wider context window:

Context expansion (bash)
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
  -H "Authorization: Bearer $VECULO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.012, -0.089, 0.934, ...],
    "top_k": 3,
    "edge_type": "next_chunk",
    "depth": 2
  }'

Multiple edge types

Currently, each query supports a single edge type for traversal. To traverse multiple edge types, make separate API calls and merge the results. Multi-edge traversal in a single query is on the roadmap.
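Merging the per-edge-type responses client-side is a simple deduplication by vertex ID. A sketch (the id field and list-of-vertices response shape are assumptions about the API response):

```python
def merge_results(*result_sets: list[dict]) -> list[dict]:
    """Union several query result lists, dropping duplicate vertices by id."""
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_sets:
        for vertex in results:
            if vertex["id"] not in seen:
                seen.add(vertex["id"])
                merged.append(vertex)
    return merged
```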

Scaling your cluster

Veculo clusters scale by adjusting the number of Veculo Units (VUs). Scaling is live — no downtime required.

When to scale up

  • Query latency increases — If your p99 latency is consistently above 100ms, adding VUs distributes the scan load across more tablet servers
  • Throughput ceiling — If you are hitting rate limits or seeing queued requests
  • Large graph traversals — Deep traversals (depth 3+) benefit from more tablet servers

How to scale

In the dashboard, navigate to your cluster and click Scale. Choose the new VU count and confirm. Veculo will:

  1. Add new tablet servers to the cluster
  2. Rebalance tablets across all servers
  3. Update the load balancer to include the new servers

Rebalancing takes a few minutes depending on data size. Your cluster remains fully available during this process — reads and writes continue without interruption.

Scaling down

You can also reduce VUs to save costs. Veculo migrates tablets off the servers being removed before shutting them down, ensuring no data is lost.

Managing API keys

API keys are managed in the dashboard under Settings → API Keys.

Key types

Prefix    Type        Permissions
vk_live_  Production  Full read/write access
vk_test_  Test        Read-only access

Best practices

  • Use separate keys per service — If your ingestion pipeline and query API are separate services, give each its own key with appropriate permissions
  • Rotate keys regularly — Generate new keys and deprecate old ones on a regular cadence
  • Minimize permissions — Give each key only the permissions it needs. A read-only analytics service should not have admin permissions.
  • Use environment variables — Never hardcode API keys in source code. Use environment variables or a secrets manager.
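For example, loading the key from the environment in Python (VECULO_API_KEY is the variable name used throughout this guide):

```python
import os

# Read the API key from the environment; never commit it to source control.
api_key = os.environ.get("VECULO_API_KEY", "")
headers = {"Authorization": f"Bearer {api_key}"}
```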

Key revocation

Revoking a key takes effect immediately. All in-flight requests using the revoked key will be rejected. Make sure the new key is deployed before revoking the old one.