Core Concepts
The fundamental building blocks of Veculo: vertices, edges, vector embeddings, cell-level security, and cluster scaling.
What is a graph database?
A graph database stores data as vertices (nodes) and edges (relationships between nodes). In a relational database, following a relationship typically means a JOIN, and a multi-hop query needs one JOIN per hop; graph databases instead make relationship traversal a first-class operation. This makes them well suited to:
- Knowledge graphs — modeling entities and their relationships (people, organizations, documents, concepts)
- RAG pipelines — combining semantic search with structural context for better retrieval
- Recommendation engines — traversing user-item-tag graphs for personalized suggestions
- Fraud detection — identifying suspicious patterns across interconnected entities
Veculo adds vector embeddings to the graph model, enabling hybrid queries that combine similarity search with graph traversal, which many traditional graph databases do not support natively.
Vertices
A vertex represents an entity in your graph. Every vertex has:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier. You choose the format; we recommend prefixed IDs like `doc:arxiv-2401.001` or `user:u_7f3a`. |
| `label` | string | The type of entity (e.g., "document", "person", "concept"). Used for filtering and schema validation. |
| `properties` | object | Arbitrary key-value pairs. Values can be strings, numbers, booleans, or arrays. |
| `visibility` | string | A visibility expression controlling who can read this vertex. Optional; defaults to the cluster's default visibility. |
| `embedding` | float[] | Optional vector embedding for similarity search. See Vector embeddings. |
Vertex IDs are unique within a cluster. Inserting a vertex with an existing ID will update the existing vertex (upsert behavior).
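To make the field list and the upsert rule concrete, here is a minimal local model in Python. `Vertex` and `InMemoryVertexStore` are hypothetical names for illustration, not Veculo's client API. The sketch also assumes an upsert replaces the stored vertex wholesale; this section doesn't say whether properties are merged instead.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Vertex:
    # Fields mirror the vertex table; types are simplified for illustration.
    id: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)
    visibility: Optional[str] = None  # None stands in for "use the cluster default"
    embedding: Optional[list[float]] = None


class InMemoryVertexStore:
    """Hypothetical local model of upsert-by-ID; not Veculo's client API."""

    def __init__(self) -> None:
        self._vertices: dict[str, Vertex] = {}

    def upsert(self, vertex: Vertex) -> bool:
        """Store the vertex and return True if it replaced an existing one."""
        existed = vertex.id in self._vertices
        # Assumption: the existing vertex is replaced wholesale, not merged.
        self._vertices[vertex.id] = vertex
        return existed

    def get(self, vertex_id: str) -> Optional[Vertex]:
        return self._vertices.get(vertex_id)

    def __len__(self) -> int:
        return len(self._vertices)


store = InMemoryVertexStore()
store.upsert(Vertex("doc:arxiv-2401.001", "document", {"title": "Draft"}))
replaced = store.upsert(
    Vertex("doc:arxiv-2401.001", "document", {"title": "Attention Is All You Need"})
)
print(replaced, len(store))  # True 1
```

Because the ID is the only key, writing the same ID twice leaves one vertex holding the most recent data.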
Edges
An edge represents a directed relationship between two vertices. Edges connect a source vertex to a target vertex and have a type that describes the relationship.
| Field | Type | Description |
|---|---|---|
| `source` | string | The ID of the source vertex. |
| `target` | string | The ID of the target vertex. |
| `edge_type` | string | The kind of relationship (e.g., "cites", "authored_by", "related_to"). |
| `properties` | object | Arbitrary key-value pairs on the edge itself (e.g., weight, timestamp, context). |
| `visibility` | string | Visibility expression for the edge. Can differ from the visibility of the vertices it connects. |
Edges are directed: an edge from A to B is distinct from an edge from B to A. If you need a bidirectional relationship, create two edges.
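A small sketch of what directedness means for traversal. It assumes an edge is identified by its `(source, target, edge_type)` triple; this section doesn't define Veculo's edge-uniqueness rule. `EdgeKey`, `bidirectional`, and `outgoing` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeKey:
    # Assumption: (source, target, edge_type) identifies a directed edge.
    source: str
    target: str
    edge_type: str


def bidirectional(a: str, b: str, edge_type: str) -> list[EdgeKey]:
    """Return the two directed edges needed to model a symmetric relationship."""
    return [EdgeKey(a, b, edge_type), EdgeKey(b, a, edge_type)]


def outgoing(edges: list[EdgeKey], vertex_id: str) -> list[str]:
    """Targets reachable in one hop by following edges out of vertex_id."""
    return sorted(e.target for e in edges if e.source == vertex_id)


forward = EdgeKey("doc:a", "doc:b", "related_to")
backward = EdgeKey("doc:b", "doc:a", "related_to")
print(forward == backward)  # False: direction matters

# With only A -> B, nothing is reachable from B.
print(outgoing([forward], "doc:b"))  # []
# Adding the reverse edge makes the relationship traversable both ways.
print(outgoing(bidirectional("doc:a", "doc:b", "related_to"), "doc:b"))  # ['doc:a']
```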
Edge uniqueness
Properties
Both vertices and edges can carry arbitrary key-value properties. These are stored as a JSON object and can contain:
- Strings: `"title": "Attention Is All You Need"`
- Numbers: `"year": 2017`
- Booleans: `"peer_reviewed": true`
- Arrays: `"tags": ["nlp", "transformers"]`
Properties are stored alongside the graph structure in Accumulo's sorted key-value store, ensuring they are always read together with the vertex or edge — no secondary lookups required.
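A sketch of a client-side check for the value kinds listed above, which can catch unsupported values before a write. It assumes array elements must themselves be scalars and that `null` is not accepted; this page states neither. `validate_properties` is a hypothetical helper.

```python
from typing import Any

# Strings, numbers, and booleans (bool is a subclass of int in Python).
SCALAR_TYPES = (str, int, float, bool)


def is_valid_property_value(value: Any) -> bool:
    """Accept scalars and flat arrays of scalars.

    Assumption: nested arrays, objects, and None are rejected.
    """
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, list):
        return all(isinstance(item, SCALAR_TYPES) for item in value)
    return False


def validate_properties(props: dict[str, Any]) -> list[str]:
    """Return the keys whose values fall outside the supported kinds."""
    return [key for key, value in props.items() if not is_valid_property_value(value)]


props = {
    "title": "Attention Is All You Need",
    "year": 2017,
    "peer_reviewed": True,
    "tags": ["nlp", "transformers"],
    "extra": {"source": "arxiv"},  # nested object: rejected by this sketch
}
print(validate_properties(props))  # ['extra']
```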
Vector embeddings
A vector embedding is a fixed-length array of floating-point numbers that represents the semantic meaning of an entity. Embeddings are typically generated by an embedding model (such as OpenAI's text-embedding-3-small or Cohere's embed-v3).
When you attach an embedding to a vertex, Veculo indexes it for approximate nearest-neighbor (ANN) search. You can then query for vertices whose embeddings are closest to a given query vector.
```json
{
  "id": "doc:arxiv-2401.001",
  "label": "document",
  "properties": {
    "title": "Attention Is All You Need"
  },
  "embedding": [0.023, -0.114, 0.891, 0.445, -0.067, ...],
  "visibility": "public"
}
```

Key characteristics of vector search in Veculo:
- Dimension flexibility — Veculo supports embeddings of any dimension. All embeddings within a cluster must have the same dimension.
- Cosine similarity — Results are ranked by cosine similarity, returned as a score between 0 and 1.
- Graph-aware — Vector search results can be enriched with graph context by specifying an edge type and traversal depth.
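Veculo's ANN index approximates an exact nearest-neighbor ranking. The brute-force version below shows what that exact ranking computes. Raw cosine similarity lies in [-1, 1]; this page doesn't say how Veculo maps it to a 0 to 1 score, so `to_unit_score` assumes `(1 + cos) / 2` purely for illustration. The dimension check mirrors the same-dimension rule above. All function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity in [-1, 1]; rejects mismatched dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def to_unit_score(cos: float) -> float:
    # Assumption: map [-1, 1] linearly onto [0, 1]; Veculo's mapping is not documented here.
    return (1.0 + cos) / 2.0


def top_k(query: list[float], embeddings: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every vertex, keep the best k."""
    scored = [(vid, to_unit_score(cosine(query, vec))) for vid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


embeddings = {
    "doc:a": [1.0, 0.0, 0.0],
    "doc:b": [0.7, 0.7, 0.0],
    "doc:c": [-1.0, 0.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], embeddings, k=2))  # doc:a first, then doc:b
```

Exact search scores every vector, so its cost grows linearly with the number of vertices; an ANN index trades a small amount of recall for avoiding that full scan.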
Hybrid queries
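Veculo's hybrid query API isn't described in this section. As a rough illustration of the idea in the graph-aware bullet (vector search picks seed vertices, then a traversal adds graph context along one edge type up to a fixed depth), here is a self-contained sketch. `hybrid_query` and `expand` are hypothetical names, and a real ANN index would replace the exact sort.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def expand(seed: str, edges: list[tuple[str, str, str]], edge_type: str, depth: int) -> set[str]:
    """Breadth-first walk over outgoing edges of one type, up to `depth` hops."""
    adjacency: dict[str, list[str]] = {}
    for source, target, etype in edges:
        if etype == edge_type:
            adjacency.setdefault(source, []).append(target)
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # do not walk past the requested depth
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(seed)
    return seen


def hybrid_query(query: list[float], embeddings: dict[str, list[float]],
                 edges: list[tuple[str, str, str]], edge_type: str,
                 depth: int, k: int) -> dict[str, list[str]]:
    """Rank vertices by similarity, then attach each top-k hit's graph neighborhood."""
    ranked = sorted(embeddings, key=lambda vid: cosine(query, embeddings[vid]), reverse=True)
    return {vid: sorted(expand(vid, edges, edge_type, depth)) for vid in ranked[:k]}


embeddings = {"doc:a": [1.0, 0.0], "doc:b": [0.0, 1.0], "doc:c": [0.9, 0.1]}
edges = [
    ("doc:a", "doc:c", "cites"),
    ("doc:c", "doc:d", "cites"),
    ("doc:d", "doc:e", "cites"),
    ("doc:a", "user:u1", "authored_by"),
]
print(hybrid_query([1.0, 0.0], embeddings, edges, edge_type="cites", depth=2, k=1))
# {'doc:a': ['doc:c', 'doc:d']}
```

`doc:e` is three hops away and `user:u1` is reached through a different edge type, so neither appears in the result.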
Cell-level security (ABAC)
Every vertex and edge in Veculo carries an optional visibility expression — a boolean expression that determines which users can see that piece of data. This is attribute-based access control (ABAC) enforced at the storage layer.
For details on visibility syntax and how to use it, see the Security & ABAC page.
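The exact syntax is documented on the Security & ABAC page. Because Veculo stores data in Accumulo, the sketch below assumes Accumulo-style column visibility expressions: labels combined with `&` (and) and `|` (or), parentheses for grouping, no mixing of `&` and `|` at one level without parentheses, and an empty expression readable by everyone. Quoted labels and whitespace are not handled. `is_visible` is a hypothetical helper; in Veculo the check is enforced at the storage layer, not in client code.

```python
import re

# Unquoted labels use letters, digits, and _ . : / -; operators are & | ( ).
_TOKEN = re.compile(r"[A-Za-z0-9_.:/-]+|[&|()]")


def _tokenize(expr: str) -> list[str]:
    tokens = _TOKEN.findall(expr)
    # findall silently skips unmatched characters (including spaces), so reject them.
    if "".join(tokens) != expr:
        raise ValueError(f"unsupported characters in {expr!r}")
    return tokens


def is_visible(expr: str, auths: set[str]) -> bool:
    """Evaluate a visibility expression against a user's authorization labels."""
    if expr == "":
        return True  # empty expression: readable by everyone
    tokens = _tokenize(expr)
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_atom()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_atom()  # always parse the right side so syntax errors surface
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_atom() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a label is satisfied if the user holds it

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return value


auths = {"public", "research"}
print(is_visible("public", auths))                   # True
print(is_visible("research&legal", auths))           # False
print(is_visible("research&(legal|public)", auths))  # True
```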
Veculo Units (VUs)
A Veculo Unit (VU) is the unit of compute and storage capacity for your cluster. Each VU provides a fixed amount of:
- CPU and memory for query processing
- Tablet server capacity for read/write throughput
- Storage bandwidth for scan operations
You choose how many VUs your cluster runs. More VUs means more throughput, more concurrent connections, and faster scan performance. Veculo distributes your data across tablet servers proportionally to your VU count.
| Tier | VUs | Best for |
|---|---|---|
| Starter | 2 | Development, prototyping, low-traffic applications |
| Growth | 4 – 8 | Production applications with moderate throughput |
| Scale | 12 – 32 | High-throughput workloads, large knowledge graphs |
| Enterprise | Custom | Dedicated infrastructure, compliance requirements |
Scaling is live — you can add or remove VUs without downtime. Veculo rebalances tablets across the new tablet server count automatically.
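Veculo's balancing strategy isn't documented here, and Accumulo ships its own pluggable tablet balancers. As a generic illustration of what rebalancing after a VU change involves, this sketch spreads tablets evenly and only moves a tablet when its server is over its target count or is being removed. The one-tablet-server-per-VU mapping and the `rebalance` helper are assumptions for illustration.

```python
def rebalance(assignment: dict[str, list[str]], servers: list[str]) -> dict[str, list[str]]:
    """Spread tablets evenly over `servers`.

    `assignment` maps a server name to the tablets it currently hosts. Servers
    missing from `servers` are being removed and give up all their tablets.
    """
    tablets = [t for hosted in assignment.values() for t in hosted]
    base, extra = divmod(len(tablets), len(servers))
    # Give the spare tablets to the servers that already hold the most, so fewer move.
    ranked = sorted(servers, key=lambda s: len(assignment.get(s, [])), reverse=True)
    targets = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}

    result: dict[str, list[str]] = {s: [] for s in servers}
    orphans: list[str] = []
    for server, hosted in assignment.items():
        if server in targets:
            keep = targets[server]
            result[server] = hosted[:keep]    # tablets that stay put
            orphans.extend(hosted[keep:])     # surplus to hand out
        else:
            orphans.extend(hosted)            # server is being removed
    for server in servers:
        while len(result[server]) < targets[server]:
            result[server].append(orphans.pop())
    return result


# Hypothetical: scaling from 2 to 4 VUs, assuming one tablet server per VU.
before = {"ts1": ["t0", "t1", "t2", "t3"], "ts2": ["t4", "t5", "t6", "t7"]}
after = rebalance(before, ["ts1", "ts2", "ts3", "ts4"])
print({s: len(ts) for s, ts in after.items()})  # {'ts1': 2, 'ts2': 2, 'ts3': 2, 'ts4': 2}
```

Scaling down works the same way: removed servers release every tablet they host, and the remaining servers absorb them up to their new targets.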
Clusters
A cluster is a dedicated, isolated Veculo deployment. Each cluster runs its own:
- Accumulo instance (manager, tablet servers, garbage collector)
- ZooKeeper ensemble for coordination
- Dedicated GCS storage prefix for data isolation
Clusters are fully isolated from each other — there is no shared infrastructure between tenants beyond the underlying cloud platform. This ensures strong security boundaries and predictable performance.
Each cluster is identified by a unique ID (e.g., cls_abc123) and runs in the region you select at creation time. Clusters can be paused and resumed to save costs during periods of inactivity.