AGENTS.md — dbx
Universal guide for AI coding agents working with this codebase.
Overview
git.codelab.vc/pkg/dbx is a Go PostgreSQL cluster library built on pgx/v5. It provides master/replica routing, automatic retries, load balancing, background health checking, panic-safe transactions, and context-based Querier injection.
Package map
dbx/ Root — Cluster, Node, Balancer, retry, health, errors, tx, config
├── dbx.go Interfaces: Querier, DB, Logger, MetricsHook
├── cluster.go Cluster — routing, write/read operations, slow query logging
├── node.go Node — pgxpool.Pool wrapper with health state, tracer passthrough
├── balancer.go Balancer interface + RoundRobinBalancer
├── retry.go retrier — exponential backoff with jitter and node fallback
├── health.go healthChecker — background goroutine pinging nodes
├── tx.go RunTx, RunTxOptions, InjectQuerier, ExtractQuerier
├── errors.go Error classification (IsRetryable, IsConnectionError, etc.)
├── config.go Config, NodeConfig, PoolConfig, RetryConfig, HealthCheckConfig
├── options.go Functional options (WithLogger, WithMetrics, WithRetry, WithTracer, etc.)
├── slog.go SlogLogger — adapts *slog.Logger to dbx.Logger
├── scan.go Collect[T], CollectOne[T] — generic row scan helpers
├── stats.go PoolStats — aggregate pool statistics via Cluster.Stats()
└── dbxtest/
├── dbxtest.go Test helpers: NewTestCluster, TestLogger
└── tx.go RunInTx — test transaction isolation (always rolled back)
Routing architecture
┌──────────────┐
│ Cluster │
└──────┬───────┘
│
┌───────────────┴───────────────┐
│ │
Write ops Read ops
Exec, Query, QueryRow ReadQuery, ReadQueryRow
Begin, BeginTx, RunTx
CopyFrom, SendBatch
│ │
▼ ▼
┌──────────┐ ┌────────────────────────┐
│ Master │ │ Balancer → Replicas │
└──────────┘ │ fallback → Master │
└────────────────────────┘
Retry loop (retrier.do):
  For each attempt (up to MaxAttempts):
    For each node in [target nodes]:
      if healthy → execute; on retryable error → continue to next node
    Backoff (exponential + jitter) before the next attempt
Common tasks
Add a new node type (e.g., analytics replica)
- Add a field to the `Cluster` struct (e.g., `analytics []*Node`)
- Add corresponding config to the `Config` struct
- Connect nodes in `NewCluster`, add to the `all` slice for health checking
- Add routing methods (e.g., `AnalyticsQuery`)
Customize retry logic
- Provide `RetryConfig.RetryableErrors` — a custom `func(error) bool` classifier
- Or modify `IsRetryable()` in `errors.go` to add new PG error codes
- Adjust `MaxAttempts`, `BaseDelay`, `MaxDelay` in `RetryConfig`
Add a metrics hook
- Add a new callback field to the `MetricsHook` struct in `dbx.go`
- Call it at the appropriate point (nil-check the hook and the field)
- See existing hooks in `cluster.go` (queryStart/queryEnd) and `health.go` (OnNodeDown/OnNodeUp)
Add a new balancer strategy
- Implement the `Balancer` interface: `Next(nodes []*Node) *Node`
- Must return `nil` if no suitable node is available
- Must check `node.IsHealthy()` to skip down nodes
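A minimal custom strategy satisfying both rules. `Node` and `Balancer` here are local stand-ins mirroring the signatures above, so the sketch compiles on its own:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Node is a minimal stand-in exposing the IsHealthy check the contract requires.
type Node struct {
	Name    string
	healthy atomic.Bool
}

func (n *Node) IsHealthy() bool { return n.healthy.Load() }

// Balancer matches the interface described above.
type Balancer interface {
	Next(nodes []*Node) *Node
}

// FirstHealthyBalancer is a deliberately simple strategy: return the first
// healthy node, or nil if none is available.
type FirstHealthyBalancer struct{}

func (FirstHealthyBalancer) Next(nodes []*Node) *Node {
	for _, n := range nodes {
		if n.IsHealthy() {
			return n
		}
	}
	return nil // no suitable node
}

func main() {
	a, b := &Node{Name: "replica-1"}, &Node{Name: "replica-2"}
	b.healthy.Store(true)
	var lb Balancer = FirstHealthyBalancer{}
	fmt.Println(lb.Next([]*Node{a, b}).Name) // replica-2
	b.healthy.Store(false)
	fmt.Println(lb.Next([]*Node{a, b})) // <nil>
}
```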
Gotchas
- Close() is required: `Cluster.Close()` stops the health checker goroutine and closes all pools. Leaking a Cluster leaks goroutines and connections
- RunTx panic safety: `runTx` uses `defer` with `recover()` — it rolls back on panic, then re-panics. Do not catch panics outside `RunTx` expecting the tx to be committed
- Context-based Querier injection: `ExtractQuerier` returns the fallback if no Querier is in context. Always pass the cluster/pool as fallback so code works both inside and outside transactions
- Health checker goroutine: starts immediately in `NewCluster`. Uses `time.NewTicker` — the first check happens after one interval, not immediately. Nodes start as healthy (`healthy.Store(true)` in `newNode`)
- readNodes ordering: `readNodes()` returns `[replicas..., master]` — the retrier tries replicas first; master is the last fallback
- errRow for closed cluster: when the cluster is closed, `QueryRow`/`ReadQueryRow` return `errRow{err: ErrClusterClosed}` — the error surfaces on `Scan()`
- No SQL parsing: routing is purely method-based. If you call `Exec` with a SELECT, it still goes to master
Commands
go build ./... # compile
go test ./... # all tests
go test -race ./... # tests with race detector
go test -v -run TestName ./... # single test
go vet ./... # static analysis
Conventions
- Struct-based Config with a `defaults()` method for zero-value defaults
- Functional options (`Option func(*Config)`) applied via `ApplyOptions` (primarily in dbxtest)
- stdlib-only testing — no testify, no gomock
- Thread safety — `atomic.Bool` for `Node.healthy` and `Cluster.closed`
- dbxtest helpers — `NewTestCluster` skips on unreachable DB, auto-closes via `t.Cleanup`; `TestLogger` routes to `testing.T`
- Sentinel errors — `ErrNoHealthyNode`, `ErrClusterClosed`, `ErrRetryExhausted`
- retryError uses multi-unwrap (`Unwrap() []error`) so both `ErrRetryExhausted` and the last error can be matched with `errors.Is`