Concepts¶
The mental model in four pieces.
1. The workflow graph¶
A workflow is the logical agent — a set of steps with data dependencies. Each step is one LLM call. Edges represent "step B's input comes from step A's output."
A typical RAG agent:
Some workflows have parallelism. If two steps don't depend on each other, they can run concurrently:
Sploink doesn't author workflows. It consumes them — either explicit ones from a framework like LangGraph / DSPy, or implicit ones inferred from observed LLM calls.
2. The substrate graph¶
A substrate is a compute pool sploink can route to. Each has a different cost/latency profile and supports different models:
| Substrate type | Examples | Cost | Latency |
|---|---|---|---|
| Edge | Ollama on your laptop | $0 (electricity) | 5–15s/call |
| LPU | Groq | low | ~200ms/call |
| GPU | Together, Modal | medium | ~500ms/call |
| Frontier | Anthropic, OpenAI | high | ~1–3s/call |
The substrate graph is the set of substrates available in a given deployment. A single-laptop developer has 1–2 substrates; an enterprise customer might have all four.
3. The router¶
The router decides, for each workflow node, which substrate it should run on. Today this is a static rule table:
# sploink/router.py
DEFAULT_RULES = {
"classify": ("ollama", "qwen2.5:7b"),
"rerank": ("ollama", "qwen2.5:7b"),
"extract": ("ollama", "qwen2.5:7b"),
"verify": ("ollama", "qwen2.5:7b"),
"reason": ("groq", "llama-3.3-70b-versatile"),
# ...
}
The output of routing is an execution plan — the workflow with every node assigned to a specific substrate. This is the actual runnable artifact.
Future versions of the router will learn from observed telemetry (which routes preserved quality, which didn't) instead of using a static table.
4. The trace¶
Every LLM call produces a CallRecord:
class CallRecord(BaseModel):
call_id: str
workflow_id: str
step_index: int
step_label: str # 'classify', 'rerank', 'extract', ...
model: str
tokens_in: int
tokens_out: int
latency_ms: float
cost_usd: float
substrate: str # 'ollama' / 'groq' / ...
hardware_type: str # 'edge' / 'lpu' / 'gpu' / 'frontier_api'
Records are scoped to the current workflow (via contextvars.ContextVar, so async tasks don't interleave), kept in memory for fast queries, and persisted to JSONL on disk for offline analysis.
How they compose¶
Workflow IR (logical agent) Substrate graph (available compute)
│ │
└───────────┬───────────────────────────┘
▼
Router
│
▼
Execution plan (workflow with per-node substrate assignment)
│
▼
Executor
│
▼
Trace (CallRecords per workflow)
The router is the only place sploink makes a decision. Everything else is plumbing.
Visualizing the assignment¶
The bipartite mapping between workflow and substrate is sploink's central abstraction. The architecture viewer renders it directly:
Workflow on the left, substrate on the right, solid blue edges in the middle showing which step is assigned to which substrate. Switch strategies to see how each one redirects the workload.
Configuration — explicit inputs¶
Everything described above has sensible defaults; you can also configure each layer explicitly at runtime. Sploink is configured in Python, not in a config file or UI (a UI lives in the future hosted product, not the open-source SDK).
Wire sploink in (always)¶
import sploink
sploink.wrap() # idempotent — patches every supported SDK at import time
sploink.enable_routing() # opt into Layer-1 routing; without this it's observe-only
sploink.disable_routing() # turn routing back off (observations still recorded)
sploink.is_routing_enabled()
Scope a workflow¶
with sploink.workflow() as wf:
# ... any number of LLM calls ...
pass
records = wf.records() # list[CallRecord]
graph = wf.graph(method="overlap") # sploink.Graph inferred from trace
summary = wf.summary() # cost / latency / per-step aggregates
Pin a specific workflow_id (e.g. for FastAPI request-scoping):
sploink.trace.set_workflow_id(request.headers["X-Request-Id"])
# ... calls inside this context now belong to that workflow_id
Force a step label when the heuristic misclassifies¶
with sploink.step("classify"):
client.chat.completions.create(...) # forced label, regardless of prompt content
Customize the routing table (Layer 1)¶
The defaults are in sploink.router.DEFAULT_RULES. Mutate them, or pass a custom rules dict per call.
from sploink.router import Route, DEFAULT_RULES
# Mutate the global table — affects every subsequent call
DEFAULT_RULES["classify"] = Route(substrate="ollama", model="qwen2.5:7b")
DEFAULT_RULES["reason"] = Route(substrate="groq", model="llama-3.3-70b-versatile")
# Or use a custom dict for one specific lookup, without touching globals
my_rules = {
"classify": Route("ollama", "llama3.1:8b"),
"reason": Route("groq", "llama-3.1-8b-instant"),
}
route = sploink.router.choose("classify", rules=my_rules)
Any step label not present in your rules dict falls through to sploink.router.FALLBACK.
Customize substrate instances (Layer 2 — bench-side today)¶
The SUBSTRATE_INSTANCES catalog in bench/strategies.py maps each hardware_type to a list of provider instances. Today this is edited in source code; a runtime API is planned.
# bench/strategies.py — edit this dict to add providers
SUBSTRATE_INSTANCES = {
"cpu": [
SubstrateInstance(provider="ollama", model="llama3.1:8b"),
# add a second CPU provider here when you want fallback:
# SubstrateInstance(provider="salad", model="llama3.1:8b"),
],
"lpu": [
SubstrateInstance(provider="groq", model="llama-3.1-8b-instant"),
],
}
select_substrate(hardware_type) currently picks the first instance. Future selection logic (availability-aware, cost-aware, region-aware) will live behind that same function.
Trace storage location¶
Traces persist to ~/.sploink/traces/<workflow_id>.jsonl by default. Override via env var:
export SPLOINK_TRACE_DIR=/path/to/my/traces
# or to suppress persistence in CI/tests:
export SPLOINK_TRACE_DIR=/dev/null
Build a Graph by hand¶
Useful when you're authoring an experimental workflow or want to inspect topology before running:
from sploink.graph import Graph, Node
g = Graph(
nodes=(
Node(id="classify", step="classify", max_tokens=8, build_prompt=lambda ex, st: ...),
Node(id="rerank", step="rerank", max_tokens=400, build_prompt=lambda ex, st: ...),
# ...
),
edges=(("classify", "rerank"), ...),
answer_node="rerank",
)
print(g.topological_layers()) # the parallelism structure of your DAG
Graph.from_trace(records, method="overlap") produces the inverse — observed calls → inferred Graph — for analysis (the inferred Graph can be visualized but isn't re-executable; the original build_prompt closures aren't recoverable from the trace).
The one rule of thumb¶
If you can write it as Python, sploink can be configured with it. There's no separate config file, no YAML, no UI. The configuration surface is
sploink.*andsploink.router.*.
A hosted dashboard for routing-rule editing, multi-tenant policy management, and managed substrate catalogs lives in the (private) sploink-cloud repo and isn't built yet — that's the future product surface, not v0.1.