# AGENTS.md — dbx

Universal guide for AI coding agents working with this codebase.

## Overview

`git.codelab.vc/pkg/dbx` is a Go PostgreSQL cluster library built on **pgx/v5**. It provides master/replica routing, automatic retries, load balancing, background health checking, panic-safe transactions, and context-based Querier injection.

## Package map

```
dbx/                   Root — Cluster, Node, Balancer, retry, health, errors, tx, config
├── dbx.go             Interfaces: Querier, DB, Logger, MetricsHook
├── cluster.go         Cluster — routing, write/read operations
├── node.go            Node — pgxpool.Pool wrapper with health state
├── balancer.go        Balancer interface + RoundRobinBalancer
├── retry.go           retrier — exponential backoff with jitter and node fallback
├── health.go          healthChecker — background goroutine pinging nodes
├── tx.go              RunTx, RunTxOptions, InjectQuerier, ExtractQuerier
├── errors.go          Error classification (IsRetryable, IsConnectionError, etc.)
├── config.go          Config, NodeConfig, PoolConfig, RetryConfig, HealthCheckConfig
├── options.go         Functional options (WithLogger, WithMetrics, WithRetry, etc.)
└── dbxtest/
    └── dbxtest.go     Test helpers: NewTestCluster, TestLogger
```

## Routing architecture

```
                    ┌──────────────┐
                    │   Cluster    │
                    └──────┬───────┘
                           │
           ┌───────────────┴───────────────┐
           │                               │
       Write ops                       Read ops
 Exec, Query, QueryRow          ReadQuery, ReadQueryRow
 Begin, BeginTx, RunTx
  CopyFrom, SendBatch
           │                               │
           ▼                               ▼
      ┌──────────┐            ┌────────────────────────┐
      │  Master  │            │  Balancer → Replicas   │
      └──────────┘            │  fallback → Master     │
                              └────────────────────────┘

Retry loop (retrier.do):
  For each attempt (up to MaxAttempts):
    For each node in [target nodes]:
      if healthy → execute → on retryable error → continue
    Backoff (exponential + jitter)
```

## Common tasks

### Add a new node type (e.g., analytics replica)

1. Add a field to `Cluster` struct (e.g., `analytics []*Node`)
2. Add corresponding config to `Config` struct
3. Connect nodes in `NewCluster`, add to `all` slice for health checking
4. Add routing methods (e.g., `AnalyticsQuery`)

### Customize retry logic

1. Provide `RetryConfig.RetryableErrors` — custom `func(error) bool` classifier
2. Or modify `IsRetryable()` in `errors.go` to add new PG error codes
3. Adjust `MaxAttempts`, `BaseDelay`, `MaxDelay` in `RetryConfig`

### Add a metrics hook

1. Add a new callback field to `MetricsHook` struct in `dbx.go`
2. Call it at the appropriate point (nil-check the hook and the field)
3. See existing hooks in `cluster.go` (queryStart/queryEnd) and `health.go` (OnNodeDown/OnNodeUp)

### Add a new balancer strategy

1. Implement the `Balancer` interface: `Next(nodes []*Node) *Node`
2. Must return `nil` if no suitable node is available
3. Must check `node.IsHealthy()` to skip down nodes

## Gotchas

- **Close() is required**: `Cluster.Close()` stops the health checker goroutine and closes all pools. Leaking a Cluster leaks goroutines and connections
- **RunTx panic safety**: `runTx` uses `defer` with `recover()` — it rolls back on panic, then re-panics. Do not catch panics outside `RunTx` expecting the tx to be committed
- **Context-based Querier injection**: `ExtractQuerier` returns the fallback if no Querier is in context. Always pass the cluster/pool as fallback so code works both inside and outside transactions
- **Health checker goroutine**: Starts immediately in `NewCluster`. Uses `time.NewTicker` — the first check happens after one interval, not immediately. Nodes start as healthy (`healthy.Store(true)` in `newNode`)
- **readNodes ordering**: `readNodes()` returns `[replicas..., master]` — the retrier tries replicas first, master is the last fallback
- **errRow for closed cluster**: When cluster is closed, `QueryRow`/`ReadQueryRow` return `errRow{err: ErrClusterClosed}` — the error surfaces on `Scan()`
- **No SQL parsing**: Routing is purely method-based. If you call `Exec` with a SELECT, it still goes to master

## Commands

```bash
go build ./...                  # compile
go test ./...                   # all tests
go test -race ./...             # tests with race detector
go test -v -run TestName ./...  # single test
go vet ./...                    # static analysis
```

## Conventions

- **Struct-based Config** with `defaults()` method for zero-value defaults
- **Functional options** (`Option func(*Config)`) used via `ApplyOptions` (primarily in dbxtest)
- **stdlib only** testing — no testify, no gomock
- **Thread safety** — `atomic.Bool` for `Node.healthy` and `Cluster.closed`
- **dbxtest helpers** — `NewTestCluster` skips on unreachable DB, auto-closes via `t.Cleanup`; `TestLogger` routes to `testing.T`
- **Sentinel errors** — `ErrNoHealthyNode`, `ErrClusterClosed`, `ErrRetryExhausted`
- **retryError** uses multi-unwrap (`Unwrap() []error`) so both `ErrRetryExhausted` and the last error can be matched with `errors.Is`
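
The multi-unwrap convention for `retryError` can be illustrated with the standard library alone (Go 1.20 or later). This is a minimal sketch, not the library's code: the `retryError` fields, the sentinel's message text, and the `errConnReset` failure are assumptions made so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRetryExhausted is redeclared here so the sketch is self-contained;
// the message text is an assumption.
var ErrRetryExhausted = errors.New("dbx: retry attempts exhausted")

// errConnReset is a hypothetical underlying failure.
var errConnReset = errors.New("connection reset")

// retryError is a stand-in for the library's unexported type.
// Its fields are assumed for illustration.
type retryError struct {
	attempts int
	last     error
}

func (e *retryError) Error() string {
	return fmt.Sprintf("retry exhausted after %d attempts: %v", e.attempts, e.last)
}

// Unwrap returns several errors, so errors.Is inspects each branch in turn.
func (e *retryError) Unwrap() []error {
	return []error{ErrRetryExhausted, e.last}
}

// newExhausted builds a sample error for the demonstration.
func newExhausted() error {
	return &retryError{attempts: 3, last: errConnReset}
}

// matchesSentinel reports whether err carries the sentinel.
func matchesSentinel(err error) bool { return errors.Is(err, ErrRetryExhausted) }

// matchesLast reports whether err carries the underlying failure.
func matchesLast(err error) bool { return errors.Is(err, errConnReset) }

func main() {
	err := newExhausted()
	fmt.Println(matchesSentinel(err)) // true
	fmt.Println(matchesLast(err))     // true
}
```

Callers can therefore branch on "retries ran out" and on the concrete cause without type-asserting the wrapper.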
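
The `errRow` gotcha (a closed cluster reports its error only when `Scan()` is called) follows a pattern that can be sketched as below. The `cluster` and `row` types here are simplified stand-ins, the `QueryRow` signature omits the context argument, and the sentinel's message text is assumed.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrClusterClosed is redeclared for the sketch; the message is an assumption.
var ErrClusterClosed = errors.New("dbx: cluster is closed")

// row has the same single-method shape as pgx.Row.
type row interface {
	Scan(dest ...any) error
}

// errRow defers an error until the caller scans the row.
type errRow struct{ err error }

func (r errRow) Scan(...any) error { return r.err }

// cluster is a simplified stand-in that tracks only the closed flag.
type cluster struct{ closed atomic.Bool }

func (c *cluster) Close() { c.closed.Store(true) }

// QueryRow cannot return an error directly, so a closed cluster
// hands back an errRow instead of a real row.
func (c *cluster) QueryRow(sql string, args ...any) row {
	if c.closed.Load() {
		return errRow{err: ErrClusterClosed}
	}
	return errRow{err: errors.New("query execution is not part of this sketch")}
}

// scanAfterClose closes the cluster, then queries and scans.
func scanAfterClose() error {
	c := &cluster{}
	c.Close()
	var id int
	return c.QueryRow("SELECT 1").Scan(&id)
}

// closedDetected reports whether the sentinel surfaced on Scan.
func closedDetected() bool {
	return errors.Is(scanAfterClose(), ErrClusterClosed)
}

func main() {
	fmt.Println(closedDetected()) // true
}
```

In practice this means the result of `Scan` must always be checked; ignoring it hides a closed cluster.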
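
For the rules listed under "Add a new balancer strategy", a minimal sketch of a custom strategy might look like the following. `Node` is a local stand-in for the library type, and `FirstHealthyBalancer` is a hypothetical strategy invented for illustration; only the `Next(nodes []*Node) *Node` signature and the `IsHealthy()` check come from this guide.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Node is a minimal stand-in for the library's Node; only the
// health flag is modelled.
type Node struct {
	name    string
	healthy atomic.Bool
}

func newNode(name string, healthy bool) *Node {
	n := &Node{name: name}
	n.healthy.Store(healthy)
	return n
}

// IsHealthy reports the current health flag.
func (n *Node) IsHealthy() bool { return n.healthy.Load() }

// Balancer uses the signature given in this guide.
type Balancer interface {
	Next(nodes []*Node) *Node
}

// FirstHealthyBalancer is a hypothetical strategy that always prefers
// the earliest healthy node in the slice.
type FirstHealthyBalancer struct{}

// Next skips nil and unhealthy nodes and returns nil when nothing is usable.
func (FirstHealthyBalancer) Next(nodes []*Node) *Node {
	for _, n := range nodes {
		if n != nil && n.IsHealthy() {
			return n
		}
	}
	return nil
}

// pickName returns the chosen node's name, or "" when Next returns nil.
func pickName(b Balancer, nodes []*Node) string {
	if n := b.Next(nodes); n != nil {
		return n.name
	}
	return ""
}

func main() {
	nodes := []*Node{newNode("replica-1", false), newNode("replica-2", true)}
	fmt.Println(pickName(FirstHealthyBalancer{}, nodes)) // replica-2
}
```

Returning `nil` rather than an unhealthy node lets the caller fall through to its own fallback path.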
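
The retry loop outlined under "Routing architecture" can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `retrier.do`: the `retryConfig` and `node` types are local stand-ins that reuse the field names from `RetryConfig`, the 50% jitter formula is assumed, and the handling of non-retryable errors (return immediately) is a guess at reasonable behavior.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Stand-ins for the library's sentinels, plus one hypothetical retryable error.
var (
	errNoHealthyNode  = errors.New("no healthy node")
	errRetryExhausted = errors.New("retry exhausted")
	errTransient      = errors.New("transient failure")
)

// retryConfig reuses the field names this guide lists for RetryConfig.
type retryConfig struct {
	MaxAttempts     int
	BaseDelay       time.Duration
	MaxDelay        time.Duration
	RetryableErrors func(error) bool
}

type node struct {
	name    string
	healthy bool
}

// backoff doubles BaseDelay per attempt, caps that value at MaxDelay,
// then adds up to 50% random jitter (the jitter formula is an assumption).
func backoff(cfg retryConfig, attempt int) time.Duration {
	d := cfg.BaseDelay
	for i := 0; i < attempt && d < cfg.MaxDelay; i++ {
		d *= 2
	}
	if d > cfg.MaxDelay {
		d = cfg.MaxDelay
	}
	return d + time.Duration(rand.Int63n(int64(d)/2+1))
}

// do tries every healthy node once per attempt and sleeps between attempts.
func do(ctx context.Context, cfg retryConfig, nodes []*node, op func(*node) error) error {
	var last error
	for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
		for _, n := range nodes {
			if !n.healthy {
				continue // skip nodes marked down
			}
			err := op(n)
			if err == nil {
				return nil
			}
			if !cfg.RetryableErrors(err) {
				return err // non-retryable: give up at once
			}
			last = err // retryable: fall through to the next node
		}
		if attempt+1 < cfg.MaxAttempts {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(backoff(cfg, attempt)):
			}
		}
	}
	if last == nil {
		return errNoHealthyNode
	}
	return fmt.Errorf("%w: %w", errRetryExhausted, last)
}

// demo runs a read against [replicas..., master] where one replica is down
// and the other fails with a retryable error.
func demo() (visited []string, err error) {
	cfg := retryConfig{
		MaxAttempts:     3,
		BaseDelay:       time.Millisecond,
		MaxDelay:        10 * time.Millisecond,
		RetryableErrors: func(e error) bool { return errors.Is(e, errTransient) },
	}
	nodes := []*node{{"replica-1", false}, {"replica-2", true}, {"master", true}}
	err = do(context.Background(), cfg, nodes, func(n *node) error {
		visited = append(visited, n.name)
		if n.name == "replica-2" {
			return errTransient
		}
		return nil
	})
	return visited, err
}

func main() {
	visited, err := demo()
	fmt.Println(visited, err)
}
```

In the demo the down replica is skipped, the healthy replica fails with a retryable error, and the master serves the request within the same attempt, which matches the `[replicas..., master]` ordering described under "Gotchas".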
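
The "RunTx panic safety" gotcha describes a defer/recover pattern that can be sketched as below. The `tx` interface and `fakeTx` are hypothetical stand-ins for a pgx transaction, and this `runTx` is a simplified illustration rather than the library's function.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// tx is a hypothetical stand-in exposing only the two calls the pattern needs.
type tx interface {
	Commit(ctx context.Context) error
	Rollback(ctx context.Context) error
}

// fakeTx records which terminal call was made.
type fakeTx struct{ committed, rolledBack bool }

func (t *fakeTx) Commit(context.Context) error   { t.committed = true; return nil }
func (t *fakeTx) Rollback(context.Context) error { t.rolledBack = true; return nil }

// runTx rolls back and re-panics if fn panics, rolls back if fn returns
// an error, and commits otherwise.
func runTx(ctx context.Context, t tx, fn func(context.Context) error) error {
	defer func() {
		if r := recover(); r != nil {
			_ = t.Rollback(ctx) // best effort; the panic takes priority
			panic(r)            // re-panic so the caller still sees the failure
		}
	}()
	if err := fn(ctx); err != nil {
		if rbErr := t.Rollback(ctx); rbErr != nil {
			return errors.Join(err, rbErr)
		}
		return err
	}
	return t.Commit(ctx)
}

// panicOutcome runs a panicking callback and reports what an outer
// recover observes along with the transaction's final state.
func panicOutcome() (recovered any, rolledBack, committed bool) {
	t := &fakeTx{}
	func() {
		defer func() { recovered = recover() }()
		_ = runTx(context.Background(), t, func(context.Context) error { panic("boom") })
	}()
	return recovered, t.rolledBack, t.committed
}

// commitOutcome runs a successful callback.
func commitOutcome() (err error, rolledBack, committed bool) {
	t := &fakeTx{}
	err = runTx(context.Background(), t, func(context.Context) error { return nil })
	return err, t.rolledBack, t.committed
}

func main() {
	rec, rb, c := panicOutcome()
	fmt.Println(rec, rb, c) // boom true false
}
```

An outer `recover` still sees the panic value, but by then the transaction has been rolled back, so code that recovers above `RunTx` must not assume any writes were persisted.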
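
The "Context-based Querier injection" gotcha can be illustrated with a small sketch. The lower-case `injectQuerier` and `extractQuerier` helpers only mirror the behavior this guide describes for `InjectQuerier` and `ExtractQuerier`; their signatures, the one-method `Querier` interface, and the `findUser` function are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// Querier is reduced to a single method here; the real interface is larger.
type Querier interface {
	Name() string
}

// named is a trivial Querier used to tell the cluster and a tx apart.
type named string

func (n named) Name() string { return string(n) }

// querierKey is an unexported key type, which avoids context key collisions.
type querierKey struct{}

// injectQuerier stores q in the context (assumed signature).
func injectQuerier(ctx context.Context, q Querier) context.Context {
	return context.WithValue(ctx, querierKey{}, q)
}

// extractQuerier returns the Querier from ctx, or fallback when none is set.
func extractQuerier(ctx context.Context, fallback Querier) Querier {
	if q, ok := ctx.Value(querierKey{}).(Querier); ok {
		return q
	}
	return fallback
}

// findUser is a hypothetical repository function. It always passes the
// cluster as the fallback, so it works with or without a transaction.
func findUser(ctx context.Context, cluster Querier) string {
	return extractQuerier(ctx, cluster).Name()
}

// outsideTx calls findUser with a plain context.
func outsideTx() string {
	return findUser(context.Background(), named("cluster"))
}

// insideTx calls findUser with a context that carries a transaction.
func insideTx() string {
	ctx := injectQuerier(context.Background(), named("tx"))
	return findUser(ctx, named("cluster"))
}

func main() {
	fmt.Println(outsideTx()) // cluster
	fmt.Println(insideTx())  // tx
}
```

Because the repository function never decides which Querier to use, the same code path serves transactional and non-transactional callers.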