Prometheus Metrics
Complete reference for all Prometheus metrics exported by Godwit Sync, including counters, histograms, gauges, and Go runtime and process metrics.
Enabling Metrics
Pass --status-addr to expose a /metrics endpoint in Prometheus exposition format.
```shell
godwit sync --source s3://src --dest s3://dst --status-addr :8080

# Scrape metrics
curl localhost:8080/metrics
```

Add a scrape job to your Prometheus configuration:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: godwit
    static_configs:
      - targets: ["localhost:8080"]
```

Counters
Monotonically increasing values tracking cumulative totals.
| Metric | Type | Labels | Description |
|---|---|---|---|
| godwit_objects_total | CounterVec | status | Objects processed by terminal status (completed, failed, storage_type_glacier, unsupported_key, excluded) |
| godwit_bytes_total | CounterVec | dir | Bytes processed by direction (read, write, storage_type_glacier, unsupported_key) |
| godwit_requests_total | CounterVec | op, code | S3 API requests by operation and HTTP status code |
| godwit_upload_type_total | CounterVec | type | Upload type distribution (single, multipart) |
| godwit_multipart_sessions_total | CounterVec | action | Multipart session lifecycle (created, resumed, aborted) |
| godwit_multipart_parts_total | CounterVec | action | Multipart parts by action (uploaded, skipped) |
| godwit_retries_total | CounterVec | dir | Task/part retry attempts by direction (read = source error, write = sink error) |
| godwit_source_list_retries_total | Counter | — | Source listing retry attempts |
| godwit_partial_upload_wasted_bytes_total | Counter | — | Bytes lost due to failed multipart uploads |
| godwit_verify_total | CounterVec | result | Verification outcomes (matched, mismatched, error) |
| godwit_run_transfer_bytes_total | CounterVec | run_id, dir | Cumulative bytes transferred per run; use rate() for throughput |
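The table recommends reading godwit_run_transfer_bytes_total through rate(); the same holds for every counter here, because the raw value only says how much has accumulated since the process started. PromQL's rate() and increase() also treat any drop in value as a counter reset (for example, a restarted godwit process). The Python sketch below models that reset handling on a list of (timestamp, value) samples. It is a simplified model, not Prometheus's implementation: real rate() additionally extrapolates to the edges of the range window. The function names and sample data are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase across samples, treating any decrease as a counter reset.

    After a reset the counter restarts from zero, so the post-reset value
    itself is the increase since the reset.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset (e.g. process restart): count from zero.
            total += cur
    return total


def counter_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return counter_increase(samples) / elapsed


if __name__ == "__main__":
    # Hypothetical scrapes of godwit_bytes_total{dir="write"}, 15 s apart,
    # with a process restart between the 3rd and 4th scrape.
    scrapes = [(0, 0), (15, 3_000_000), (30, 6_000_000), (45, 1_500_000), (60, 4_500_000)]
    print(counter_rate(scrapes))  # average bytes per second over the window
```

Without the reset rule, the drop from 6,000,000 to 1,500,000 would be read as negative throughput; with it, the window above averages 175,000 bytes/s.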
Histograms
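Distributions of observed values, counted into the buckets listed below. Each histogram exposes cumulative _bucket series (one per upper bound, labeled le, plus le="+Inf"), along with _sum and _count series.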
| Metric | Type | Labels | Buckets | Description |
|---|---|---|---|---|
| godwit_part_latency_seconds | HistogramVec | op | Defaults (0.005–10 s) | Latency of multipart operations |
| godwit_task_duration_seconds | Histogram | — | Exp 0.01×2ⁿ (10 ms–163 s) | Per-object transfer duration |
| godwit_object_size_bytes | Histogram | — | Exp 1×4ⁿ (1 B–256 GiB) | Object size distribution at plan time |
| godwit_task_attempts | Histogram | — | Linear 1..10 | Attempt count per task (1 = first try) |
| godwit_multipart_parts_per_object | Histogram | — | Exp 1×2ⁿ (1–8192) | Multipart parts per object |
| godwit_s3_request_seconds | HistogramVec | run_id | 0.01–60 s | Duration of individual S3 API requests |
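The exponential layouts in the table follow start × factorⁿ. The Python sketch below reproduces them, assuming n runs from 0 up to the last value of each stated range, which gives 15 buckets for godwit_task_duration_seconds, 20 for godwit_object_size_bytes, and 14 for godwit_multipart_parts_per_object. Those counts are inferred from the ranges, not taken from Godwit's source. The helpers mirror the formulas of client_golang's ExponentialBuckets(start, factor, count) and LinearBuckets(start, width, count); whether Godwit declares its buckets with those helpers is an assumption. Prometheus also adds an implicit +Inf bucket on top of these bounds.

```python
from typing import List


def exponential_buckets(start: float, factor: float, count: int) -> List[float]:
    """Upper bounds start, start*factor, ..., start*factor**(count-1)."""
    if start <= 0 or factor <= 1 or count < 1:
        raise ValueError("need start > 0, factor > 1, count >= 1")
    bounds = []
    bound = start
    for _ in range(count):
        bounds.append(bound)
        bound *= factor
    return bounds


def linear_buckets(start: float, width: float, count: int) -> List[float]:
    """Upper bounds start, start+width, ..., start+width*(count-1)."""
    if count < 1:
        raise ValueError("need count >= 1")
    return [start + width * i for i in range(count)]


# Layouts from the table; bucket counts are inferred from the stated ranges.
TASK_DURATION = exponential_buckets(0.01, 2, 15)  # 10 ms .. 163.84 s
OBJECT_SIZE = exponential_buckets(1, 4, 20)       # 1 B .. 4**19 B = 256 GiB
PARTS_PER_OBJECT = exponential_buckets(1, 2, 14)  # 1 .. 8192 parts
TASK_ATTEMPTS = linear_buckets(1, 1, 10)          # 1 .. 10 attempts

if __name__ == "__main__":
    print(TASK_DURATION[-1], OBJECT_SIZE[-1] / 2**30, PARTS_PER_OBJECT[-1])
```

The top of the size range is 4¹⁹ = 2³⁸ bytes, which is exactly 256 GiB.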
Gauges
Point-in-time values that can go up or down.
| Metric | Type | Labels | Description |
|---|---|---|---|
| godwit_objects | GaugeVec | state | Object count by state (total, done, failed, skipped, pending, running, excluded) |
| godwit_bytes | GaugeVec | state | Byte count by state (total, done, failed, skipped, pending, running, excluded) |
| godwit_eta_seconds | Gauge | — | Estimated seconds to completion |
| godwit_duration_seconds | GaugeVec | phase | Duration by phase (plan, sync, elapsed) |
| godwit_throughput_bytes_per_second | Gauge | — | Current transfer throughput |
| godwit_warnings | GaugeVec | reason | Warning counts (glacier_skipped, unsupported_key_skipped) |
| godwit_key_issues | GaugeVec | type | Key issues (case_conflict, unsupported_key) |
| godwit_storage_class_objects | GaugeVec | class | Objects per S3 storage class |
| godwit_storage_class_bytes | GaugeVec | class | Bytes per S3 storage class |
| godwit_license_bytes | GaugeVec | type | License byte metrics (limit, used) |
| godwit_license_limit_hit | Gauge | — | 1 if license cap was reached, 0 otherwise |
| godwit_active_workers | Gauge | — | Configured parallel workers (--parallel) |
| godwit_buffer_capacity | Gauge | — | Copy-queue buffer size |
| godwit_config_rps | Gauge | — | Configured requests/s limit (0 = unlimited) |
| godwit_config_read_bps | Gauge | — | Configured read B/s limit (0 = unlimited) |
| godwit_config_max_inflight | Gauge | — | Configured max inflight uploads (0 = unlimited) |
| godwit_config_max_retries | Gauge | — | Configured max retries per object (0 = no retries, set by --retry) |
| godwit_config_retry_base_delay_seconds | Gauge | — | Configured base retry delay in seconds; doubles on each attempt (set by --retry-backoff) |
| godwit_run_info | GaugeVec | run_id, source, destination | Static run metadata; always 1 while the run is active |
| godwit_run_started_timestamp | GaugeVec | run_id | Unix timestamp when the run started |
| godwit_run_completed_timestamp | GaugeVec | run_id | Unix timestamp when the run finished |
| godwit_run_stage | GaugeVec | run_id, stage | Current run stage; value 1 for the active stage, 0 otherwise (planning, transferring, verifying, completed, failed) |
| godwit_run_objects_total | GaugeVec | run_id | Total planned objects for the run |
| godwit_run_objects_completed | GaugeVec | run_id | Objects completed (transferred or skipped) |
| godwit_run_objects_failed | GaugeVec | run_id | Objects that failed during transfer |
| godwit_run_objects_skipped | GaugeVec | run_id | Objects skipped (already exist or excluded) |
| godwit_run_bytes_total | GaugeVec | run_id | Total planned bytes for the run |
| godwit_run_bytes_transferred | GaugeVec | run_id | Bytes transferred so far |
| godwit_run_bytes_verified | GaugeVec | run_id | Bytes verified during post-transfer check |
| godwit_verify_duration_seconds | GaugeVec | run_id | Wall-clock duration of the verification phase in seconds |
| godwit_plan_created_timestamp | GaugeVec | run_id | Unix timestamp when the plan was created |
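godwit_config_max_retries and godwit_config_retry_base_delay_seconds together describe the retry budget: with a base delay d that doubles on each attempt, retry k waits d·2^(k−1). The Python sketch below turns the two gauge values into the implied wait schedule. It assumes the first retry waits exactly the base delay and that there is no jitter and no maximum-delay cap; the table specifies none of these, so treat the result as an approximation of Godwit's timing. The function name and gauge readings are hypothetical.

```python
from typing import List


def retry_delays(max_retries: int, base_delay_seconds: float) -> List[float]:
    """Wait before each retry, assuming delay = base * 2**(k-1) for retry k.

    Hypothetical model: no jitter and no upper cap on the delay.
    """
    if max_retries < 0 or base_delay_seconds < 0:
        raise ValueError("values must be non-negative")
    return [base_delay_seconds * 2 ** (k - 1) for k in range(1, max_retries + 1)]


if __name__ == "__main__":
    # Hypothetical gauge readings:
    #   godwit_config_max_retries = 5
    #   godwit_config_retry_base_delay_seconds = 0.5
    delays = retry_delays(5, 0.5)
    print(delays, "total:", sum(delays))
```

Under this model, an object that exhausts five retries spends 15.5 s waiting between its six attempts, which also bounds the highest value it can add to godwit_task_attempts.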
Runtime & Process Metrics
Exported automatically by the Prometheus client library, which reads them from the Go runtime and the operating system.
| Metric | Type | Description |
|---|---|---|
| go_goroutines | Gauge | Current number of goroutines |
| go_threads | Gauge | OS threads created |
| go_memstats_heap_inuse_bytes | Gauge | Heap bytes in active use |
| go_memstats_alloc_bytes | Gauge | Bytes allocated on heap (live) |
| go_gc_duration_seconds | Summary | GC pause duration distribution |
| process_cpu_seconds_total | Counter | Total CPU seconds (user+system) |
| process_resident_memory_bytes | Gauge | Resident memory size (RSS) |
| process_open_fds | Gauge | Open file descriptors |
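Runtime and process metrics are served from the same /metrics endpoint as the godwit_* series, in the Prometheus text exposition format, so a quick health check does not need a Prometheus server. The minimal Python sketch below uses only the standard library to fetch the endpoint and parse simple sample lines. The default URL assumes the `--status-addr :8080` example above. The parser is deliberately simplified: it skips `# HELP`/`# TYPE` comments, ignores optional timestamps, and does not handle escaped quotes inside label values. For full coverage of the format, the official `prometheus_client` package provides `prometheus_client.parser.text_string_to_metric_families`.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]
Samples = Dict[str, List[Tuple[Labels, float]]]

# name{labels} value [timestamp]; the label block and timestamp are optional.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> Samples:
    """Group samples by metric name (simplified: no escaped quotes in labels)."""
    samples: Samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples.setdefault(name, []).append((labels, float(value)))
    return samples


def scrape(url: str = "http://localhost:8080/metrics") -> Samples:
    """Fetch a /metrics endpoint and parse it."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

With godwit running, `scrape()["go_goroutines"]` returns a one-element list such as `[((), 12.0)]`: an empty label tuple and the current goroutine count.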
Example PromQL Queries
Common queries for monitoring Godwit Sync transfers.
```promql
# Transfer progress percentage (ignoring(state) lets the two series match)
godwit_objects{state="done"} / ignoring(state) godwit_objects{state="total"} * 100
# Failed objects per second (5m average)
rate(godwit_objects_total{status="failed"}[5m])
# p99 per-object transfer duration
histogram_quantile(0.99, rate(godwit_task_duration_seconds_bucket[5m]))
# Median object size across the plan (sizes are recorded once, at plan time)
histogram_quantile(0.5, godwit_object_size_bytes_bucket)
# Multipart resume ratio
sum without (action) (godwit_multipart_sessions_total{action="resumed"})
  / sum without (action) (godwit_multipart_sessions_total{action=~"created|resumed"})
# Storage class distribution
godwit_storage_class_objects
# Process CPU usage (CPU seconds per second, i.e. cores in use)
rate(process_cpu_seconds_total[5m])
# Heap memory in use
go_memstats_heap_inuse_bytes
# --- Migration-run metrics (Layer 2) ---
# Current run stage: returns only the active stage series (planning / transferring / verifying / completed / failed)
godwit_run_stage == 1
# Per-run transfer throughput (bytes/s)
rate(godwit_run_transfer_bytes_total[1m])
# Run progress percentage
godwit_run_objects_completed / godwit_run_objects_total * 100
# Median S3 request latency
histogram_quantile(0.5, rate(godwit_s3_request_seconds_bucket[5m]))
# Pipeline queue depth (pending objects)
godwit_objects{state="pending"}
# Verification failure rate
rate(godwit_verify_total{result=~"mismatched|error"}[5m])
# Run wall-clock duration (finished runs)
godwit_run_completed_timestamp - godwit_run_started_timestamp
```
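histogram_quantile() never sees individual observations, only cumulative bucket counts, so its result is an estimate: it finds the bucket that contains the target rank and interpolates linearly inside it. The Python sketch below reproduces that core step for one series of (upper bound, cumulative count) pairs, such as the godwit_task_duration_seconds_bucket values. It is simplified relative to Prometheus: it raises on q outside [0, 1] instead of returning ±Inf, assumes the lowest bucket starts at 0 (as Prometheus does when the first upper bound is positive), and skips Prometheus's monotonicity corrections. The bucket counts in the usage data are hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with (inf, total_count),
    like the le-labeled _bucket series of a Prometheus histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    counts: List[float] = [count for _, count in buckets]
    b = bisect.bisect_left(counts, rank)  # first bucket whose count >= rank
    if b == len(buckets) - 1:
        # Rank lands in the +Inf bucket: report the largest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return upper
    # Linear interpolation inside the chosen bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


if __name__ == "__main__":
    # Hypothetical cumulative counts for godwit_task_duration_seconds_bucket.
    data = [(0.01, 0), (0.02, 2), (0.04, 10), (0.08, 40), (float("inf"), 50)]
    print(histogram_quantile(0.5, data), histogram_quantile(0.99, data))
```

In this data the median falls halfway through the 0.04–0.08 s bucket (about 0.06 s), while p99 lands in +Inf and is reported as 0.08 s. For the same reason, a p99 query over godwit_task_duration_seconds can never exceed its top finite bucket (about 163 s), however slow the slowest transfers are.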