Guides
Step-by-step guides for common workflows: RAG pipelines, knowledge graph construction, hybrid queries, and cluster management.
Building a RAG pipeline with Veculo
Retrieval-Augmented Generation (RAG) improves LLM responses by retrieving relevant context from your own data before generating an answer. Veculo is purpose-built for this workflow because it combines vector similarity search with graph traversal, producing richer context than vector-only stores.
Step 1: Ingest your documents
Split your documents into chunks, generate embeddings with your model of choice, and insert them as vertices. Connect chunks that belong to the same document with edges.
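As a sketch, ingestion can be scripted instead of issued by hand. The endpoint and payload shape below mirror the curl example that follows; the chunk-splitting and embedding steps are left to your own tooling, and the simplified chunk IDs (`chunk:<doc>-<index>`) are an illustration, not a required scheme.

```python
import json
import os
import urllib.request


def chunk_vertex(doc_id: str, index: int, text: str, embedding: list[float]) -> dict:
    """Build the vertex payload for one chunk, mirroring the curl example below."""
    return {
        "id": f"chunk:{doc_id}-{index}",
        "label": "chunk",
        "properties": {"document_id": f"doc:{doc_id}", "order": index, "text": text},
        "embedding": embedding,
        "visibility": "public",
    }


def post_vertex(payload: dict) -> None:
    """POST one chunk vertex to the embedding endpoint shown in this guide."""
    base = f"https://api.veculo.com/v1/{os.environ['VECULO_CLUSTER_ID']}"
    req = urllib.request.Request(
        f"{base}/vertices/embedding",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['VECULO_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    urllib.request.urlopen(req).close()
```

Separating payload construction (`chunk_vertex`) from the HTTP call makes the ingestion loop easy to test and to batch.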
# Insert a chunk of a research paper
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/vertices/embedding" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"id": "chunk:arxiv-2401.001-sec3-p2",
"label": "chunk",
"properties": {
"document_id": "doc:arxiv-2401.001",
"title": "Attention Is All You Need",
"section": "3. Model Architecture",
"paragraph": 2,
"text": "The encoder maps an input sequence of symbol representations to a sequence of continuous representations. Given z, the decoder then generates an output sequence of symbols one element at a time."
},
"embedding": [0.023, -0.114, 0.891, ...],
"visibility": "public"
}'
Step 2: Build document structure edges
Connect chunks to their parent document and to each other to capture structural relationships:
# Link chunk to its parent document
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/edges" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": "chunk:arxiv-2401.001-sec3-p2",
"target": "doc:arxiv-2401.001",
"edge_type": "part_of",
"properties": { "section": "3", "order": 2 },
"visibility": "public"
}'
# Link sequential chunks
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/edges" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": "chunk:arxiv-2401.001-sec3-p2",
"target": "chunk:arxiv-2401.001-sec3-p3",
"edge_type": "next_chunk",
"visibility": "public"
}'
Step 3: Query with hybrid vector + graph search
When a user asks a question, embed the query and search for similar chunks. Then traverse the graph to include surrounding chunks and the parent document for additional context:
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"embedding": [0.019, -0.108, 0.875, ...],
"top_k": 5,
"edge_type": "next_chunk",
"depth": 1,
"label": "chunk"
}'
This returns the 5 most similar chunks, plus their neighboring chunks via next_chunk edges. You now have a wider context window to pass to your LLM.
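In Python, the same hybrid query might look like the following sketch. The request body mirrors the curl example above; the query embedding is assumed to come from the same model used at ingest time.

```python
import json
import os
import urllib.request


def hybrid_query_body(
    embedding: list[float],
    top_k: int = 5,
    edge_type: str = "next_chunk",
    depth: int = 1,
    label: str = "chunk",
) -> dict:
    """Build the JSON body for /query/vector, mirroring the curl example."""
    return {
        "embedding": embedding,
        "top_k": top_k,
        "edge_type": edge_type,
        "depth": depth,
        "label": label,
    }


def retrieve(embedding: list[float]) -> dict:
    """POST a hybrid query and return the parsed JSON response."""
    base = f"https://api.veculo.com/v1/{os.environ['VECULO_CLUSTER_ID']}"
    req = urllib.request.Request(
        f"{base}/query/vector",
        data=json.dumps(hybrid_query_body(embedding)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['VECULO_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```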
Why graph + vector beats vector alone
A vector-only store returns isolated chunks ranked by similarity. Because Veculo also traverses structural edges such as next_chunk and part_of, each hit arrives together with its surrounding context, which produces more coherent prompts than similarity alone.
Step 4: Assemble the LLM prompt
Take the retrieved chunks and their graph context, format them into a prompt, and send them to your LLM. A simple template:
You are a research assistant. Answer the user's question using only the context below. If the context doesn't contain the answer, say so.

Context:
---
[Document: "Attention Is All You Need" by Vaswani et al., 2017]

Section 3, Paragraph 2:
The encoder maps an input sequence of symbol representations to a sequence of continuous representations. Given z, the decoder then generates an output sequence of symbols one element at a time.

Section 3, Paragraph 3:
At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next.
---

Question: How does the transformer decoder generate output?
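Assembling the prompt can be a small pure function. This sketch assumes each retrieved chunk carries the properties shown in Step 1 (section, paragraph, text); adapt the field names to whatever your chunks actually store.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks into a grounded-answer prompt for the LLM."""
    context = "\n\n".join(
        f"Section {c['section']}, Paragraph {c['paragraph']}:\n{c['text']}"
        for c in chunks
    )
    return (
        "You are a research assistant. Answer the user's question using only "
        "the context below. If the context doesn't contain the answer, say so.\n\n"
        f"Context:\n---\n{context}\n---\n\n"
        f"Question: {question}"
    )
```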
Knowledge graph construction
A knowledge graph represents entities and their relationships as a structured graph. Veculo makes it straightforward to build and query knowledge graphs at scale.
Define your entity types
Plan the types of vertices and edges in your graph. For an academic knowledge graph:
| Vertex label | Example properties |
|---|---|
| paper | title, abstract, year, doi, venue |
| author | name, affiliation, orcid |
| institution | name, country, type |
| concept | name, domain, description |

| Edge type | From | To |
|---|---|---|
| authored_by | paper | author |
| cites | paper | paper |
| affiliated_with | author | institution |
| discusses | paper | concept |
Ingest entities and relationships
# Add a paper
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/vertices/embedding" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"id": "paper:arxiv-2401.001",
"label": "paper",
"properties": {
"title": "Attention Is All You Need",
"year": 2017,
"venue": "NeurIPS"
},
"embedding": [0.023, -0.114, 0.891, ...],
"visibility": "public"
}'
# Add an author
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/vertices" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"id": "author:vaswani",
"label": "author",
"properties": {
"name": "Ashish Vaswani",
"affiliation": "Google Brain"
},
"visibility": "public"
}'
# Connect them
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/edges" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": "paper:arxiv-2401.001",
"target": "author:vaswani",
"edge_type": "authored_by",
"properties": { "position": "first" },
"visibility": "public"
}'
Query the knowledge graph
Find all papers by a specific author, or discover the citation network around a topic:
# Find all papers authored by Vaswani (papers with authored_by edges pointing to him)
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/neighbors" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"vertex_id": "author:vaswani",
"edge_type": "authored_by",
"depth": 2,
"direction": "in"
}'
Hybrid vector + graph queries
Hybrid queries combine the strengths of vector similarity search and graph traversal. Here are common patterns:
Pattern 1: Similarity + citation graph
Find papers semantically similar to a query, then traverse the citation graph to discover related work:
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"embedding": [0.045, -0.223, 0.667, ...],
"top_k": 10,
"edge_type": "cites",
"depth": 2,
"min_score": 0.75,
"label": "paper"
}'
Pattern 2: Find similar, then group by author
Search for similar documents, then follow authored_by edges to discover prolific authors in a research area:
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"embedding": [0.045, -0.223, 0.667, ...],
"top_k": 20,
"edge_type": "authored_by",
"depth": 1,
"label": "paper"
}'
Pattern 3: Context expansion for RAG
Find the most relevant chunk, then walk next_chunk and part_of edges to gather a wider context window:
curl -X POST "https://api.veculo.com/v1/$VECULO_CLUSTER_ID/query/vector" \
-H "Authorization: Bearer $VECULO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"embedding": [0.012, -0.089, 0.934, ...],
"top_k": 3,
"edge_type": "next_chunk",
"depth": 2
}'
Multiple edge types
The query examples above each traverse a single edge_type. To expand context along several relationships at once, such as next_chunk and part_of in Pattern 3, you can issue one query per edge type and merge the results client-side.
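One way to combine results from separate per-edge-type queries is a client-side merge. This sketch assumes each query returns a list of vertices with id and score fields; it deduplicates by id, keeping the highest score seen.

```python
def merge_results(*result_sets: list[dict]) -> list[dict]:
    """Merge vertex lists from separate queries, deduplicating by id and
    keeping the highest-scoring copy of each vertex."""
    best: dict[str, dict] = {}
    for results in result_sets:
        for vertex in results:
            vid = vertex["id"]
            if vid not in best or vertex.get("score", 0.0) > best[vid].get("score", 0.0):
                best[vid] = vertex
    # Return a single ranked list, most similar first
    return sorted(best.values(), key=lambda v: v.get("score", 0.0), reverse=True)
```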
Scaling your cluster
Veculo clusters scale by adjusting the number of Veculo Units (VUs). Scaling is live — no downtime required.
When to scale up
- Query latency increases — If your p99 latency is consistently above 100ms, adding VUs distributes the scan load across more tablet servers
- Throughput ceiling — If you are hitting rate limits or seeing queued requests
- Large graph traversals — Deep traversals (depth 3+) benefit from more tablet servers
How to scale
In the dashboard, navigate to your cluster and click Scale. Choose the new VU count and confirm. Veculo will:
- Add new tablet servers to the cluster
- Rebalance tablets across all servers
- Update the load balancer to include the new servers
Rebalancing takes a few minutes depending on data size. Your cluster remains fully available during this process — reads and writes continue without interruption.
Scaling down
You can also reduce VUs to save costs. Veculo migrates tablets off the servers being removed before shutting them down, ensuring no data is lost.
Managing API keys
API keys are managed in the dashboard under Settings → API Keys.
Key types
| Prefix | Type | Permissions |
|---|---|---|
vk_live_ | Production | Full read/write access |
vk_test_ | Test | Read-only access |
Best practices
- Use separate keys per service — If your ingestion pipeline and query API are separate services, give each its own key with appropriate permissions
- Rotate keys regularly — Generate new keys and retire old ones on a regular cadence
- Minimize permissions — Give each key only the permissions it needs. A read-only analytics service should not have admin permissions.
- Use environment variables — Never hardcode API keys in source code. Use environment variables or a secrets manager.
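A small guard like the following keeps keys out of source code and catches misconfiguration early; the prefixes checked come from the key-type table above.

```python
import os


def veculo_api_key() -> str:
    """Load the API key from the environment rather than source code."""
    key = os.environ.get("VECULO_API_KEY", "")
    if not key.startswith(("vk_live_", "vk_test_")):
        raise RuntimeError(
            "VECULO_API_KEY is missing or malformed; "
            "export it or load it from your secrets manager"
        )
    return key
```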
Key revocation