Prometheus Metrics

Complete reference for all Prometheus metrics exported by Godwit Sync, including counters, histograms, gauges, and Go runtime metrics.

Enabling Metrics

Pass --status-addr to expose a /metrics endpoint in Prometheus exposition format.

godwit sync --source s3://src --dest s3://dst --status-addr :8080

# Scrape metrics
curl localhost:8080/metrics
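
For quick checks without a Prometheus server, the endpoint's output can also be read from a script. The following is a minimal Python sketch, not part of Godwit Sync: the function names `fetch_metrics` and `parse_metrics` are illustrative, the URL is the one from the curl example above, and the parser is deliberately simplified (it skips comment lines, ignores optional timestamps, and does not unescape label values).

```python
import re
import urllib.request

# Matches sample lines such as: godwit_objects{state="done"} 42
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:8080/metrics"):
    """Download the raw exposition text (URL assumed from the curl example)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical sample text, for illustration only
SAMPLE = """\
# TYPE godwit_objects gauge
godwit_objects{state="done"} 42
godwit_objects{state="total"} 100
godwit_eta_seconds 12.5
"""
```

Calling `parse_metrics(fetch_metrics())` against a running sync would return one tuple per exposed series; `parse_metrics(SAMPLE)` shows the same shape without a network call.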

Add a scrape job to your Prometheus configuration:

# prometheus.yml
scrape_configs:
  - job_name: godwit
    static_configs:
      - targets: ["localhost:8080"]

Counters

Monotonically increasing values tracking cumulative totals.

Metric	Type	Labels	Description
godwit_objects_total	CounterVec	status	Objects processed by terminal status (completed, failed, storage_type_glacier, unsupported_key, excluded)
godwit_bytes_total	CounterVec	dir	Bytes processed by direction (read, write, storage_type_glacier, unsupported_key)
godwit_requests_total	CounterVec	op, code	S3 API requests by operation and HTTP status code
godwit_upload_type_total	CounterVec	type	Upload type distribution (single, multipart)
godwit_multipart_sessions_total	CounterVec	action	Multipart session lifecycle (created, resumed, aborted)
godwit_multipart_parts_total	CounterVec	action	Multipart parts by action (uploaded, skipped)
godwit_retries_total	CounterVec	dir	Task/part retry attempts by direction (read = source error, write = sink error)
godwit_source_list_retries_total	Counter	—	Source listing retry attempts
godwit_partial_upload_wasted_bytes_total	Counter	—	Bytes lost due to failed multipart uploads
godwit_verify_total	CounterVec	result	Verification outcomes (matched, mismatched, error)
godwit_run_transfer_bytes_total	CounterVec	run_id, dir	Cumulative bytes transferred per run; use rate() for throughput
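
Because counters only ever increase, a drop between two scrapes means the process restarted and the counter began again from zero. The Python sketch below illustrates how a per-second rate can be derived from raw samples under that rule. It is a simplification of what PromQL's rate() does (no extrapolation to the window edges), and the function names are illustrative.

```python
def counter_increase(samples):
    """Total increase across (timestamp, value) samples.

    Any drop in value is treated as a counter reset, so the
    post-reset value counts as new increase from zero.
    """
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            increase += curr  # counter restarted from 0
    return increase


def counter_rate(samples):
    """Average per-second increase, or None if it cannot be computed."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return counter_increase(samples) / elapsed
```

For example, byte-counter samples of 100, 250, and 400 taken at 0 s, 15 s, and 30 s give an increase of 300 bytes and a rate of 10 bytes/s.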

Histograms

Observe values bucketed into configurable ranges. Each histogram exposes _bucket, _sum, and _count series.

Metric	Type	Labels	Buckets	Description
godwit_part_latency_seconds	HistogramVec	op	Defaults (0.005–10 s)	Latency of multipart operations
godwit_task_duration_seconds	Histogram	—	Exp 0.01×2ⁿ (10 ms–163 s)	Per-object transfer duration
godwit_object_size_bytes	Histogram	—	Exp 1×4ⁿ (1 B–256 GB)	Object size distribution at plan time
godwit_task_attempts	Histogram	—	Linear 1..10	Attempt count per task (1 = first try)
godwit_multipart_parts_per_object	Histogram	—	Exp 1×2ⁿ (1–8192)	Multipart parts per object
godwit_s3_request_seconds	HistogramVec	run_id	0.01–60 s	Duration of individual S3 API requests
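
The bucket notation in the table can be expanded into explicit upper bounds. The Python sketch below mirrors the semantics of the Prometheus Go client helpers `ExponentialBuckets(start, factor, count)` and `LinearBuckets(start, width, count)`. The bucket counts (15, 20, 14, 10) are assumptions inferred from the ranges in the table, not values confirmed by Godwit Sync.

```python
def exponential_buckets(start, factor, count):
    """Upper bounds start, start*factor, start*factor^2, ... (count values)."""
    return [start * factor ** i for i in range(count)]


def linear_buckets(start, width, count):
    """Upper bounds start, start+width, start+2*width, ... (count values)."""
    return [start + width * i for i in range(count)]


# Bucket counts below are assumed so that the last bound matches the table.
task_duration_bounds = exponential_buckets(0.01, 2, 15)   # 0.01 s up to 163.84 s
object_size_bounds = exponential_buckets(1, 4, 20)        # 1 B up to 4^19 B
parts_per_object_bounds = exponential_buckets(1, 2, 14)   # 1 up to 8192
task_attempts_bounds = linear_buckets(1, 1, 10)           # 1 up to 10
```

Under these assumptions the largest object-size bound is 4^19 = 274,877,906,944 bytes, which is 256 GiB. Observations above the last bound still land in the implicit +Inf bucket that every Prometheus histogram carries.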

Gauges

Point-in-time values that can go up or down.

Metric	Type	Labels	Description
godwit_objects	GaugeVec	state	Object count by state (total, done, failed, skipped, pending, running, excluded)
godwit_bytes	GaugeVec	state	Byte count by state (total, done, failed, skipped, pending, running, excluded)
godwit_eta_seconds	Gauge	—	Estimated seconds to completion
godwit_duration_seconds	GaugeVec	phase	Duration by phase (plan, sync, elapsed)
godwit_throughput_bytes_per_second	Gauge	—	Current transfer throughput
godwit_warnings	GaugeVec	reason	Warning counts (glacier_skipped, unsupported_key_skipped)
godwit_key_issues	GaugeVec	type	Key issues (case_conflict, unsupported_key)
godwit_version_history_keys	GaugeVec	outcome	Per-key version history completeness by outcome (complete, partial, fully_skipped)
godwit_object_lock_versions	GaugeVec	type	Object Lock version counts by type (governance, compliance, legal_hold, none)
godwit_storage_class_objects	GaugeVec	class	Objects per S3 storage class
godwit_storage_class_bytes	GaugeVec	class	Bytes per S3 storage class
godwit_license_bytes	GaugeVec	type	License byte metrics (limit, used)
godwit_license_limit_hit	Gauge	—	1 if license cap was reached, 0 otherwise
godwit_active_workers	Gauge	—	Configured parallel workers (--parallel)
godwit_buffer_capacity	Gauge	—	Copy-queue buffer size
godwit_config_rps	Gauge	—	Configured requests/s limit (0 = unlimited)
godwit_config_read_bps	Gauge	—	Configured read B/s limit (0 = unlimited)
godwit_config_max_inflight	Gauge	—	Configured max inflight uploads (0 = unlimited)
godwit_config_max_retries	Gauge	—	Configured max retries per object (0 = no retries, set by --retry)
godwit_config_retry_base_delay_seconds	Gauge	—	Configured base retry delay in seconds; doubles on each attempt (set by --retry-backoff)
godwit_run_info	GaugeVec	run_id, source, destination	Static run metadata; always 1 while the run is active
godwit_run_started_timestamp	GaugeVec	run_id	Unix timestamp when the run started
godwit_run_completed_timestamp	GaugeVec	run_id	Unix timestamp when the run finished
godwit_run_stage	GaugeVec	run_id, stage	Current run stage; value 1 for the active stage, 0 otherwise (planning, transferring, verifying, completed, failed)
godwit_run_objects_total	GaugeVec	run_id	Total planned objects for the run
godwit_run_objects_completed	GaugeVec	run_id	Objects completed (transferred or skipped)
godwit_run_objects_failed	GaugeVec	run_id	Objects that failed during transfer
godwit_run_objects_skipped	GaugeVec	run_id	Objects skipped (already exist or excluded)
godwit_run_bytes_total	GaugeVec	run_id	Total planned bytes for the run
godwit_run_bytes_transferred	GaugeVec	run_id	Bytes transferred so far
godwit_run_bytes_verified	GaugeVec	run_id	Bytes verified during post-transfer check
godwit_run_objects_verified	GaugeVec	run_id	Objects verified so far (matched + mismatched + errors); only set during verify runs
godwit_verify_duration_seconds	GaugeVec	run_id	Wall-clock duration of the verification phase in seconds
godwit_plan_created_timestamp	GaugeVec	run_id	Unix timestamp when the plan was created
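
Two of the per-run gauges are easiest to understand with a small example: godwit_run_stage exposes one series per stage, with value 1 only for the active one, and progress is the ratio of a "completed" gauge to its "total" counterpart. The Python sketch below shows how a consumer might interpret such values. The function names and the sample values are hypothetical.

```python
def active_stage(stage_values):
    """Return the stage whose gauge value is 1, or None if that is ambiguous.

    stage_values maps each stage label to its current gauge value.
    """
    active = [stage for stage, value in stage_values.items() if value == 1]
    return active[0] if len(active) == 1 else None


def progress_percent(completed, total):
    """Completed share of total as a percentage, or None before planning."""
    if total <= 0:
        return None
    return 100.0 * completed / total


# Hypothetical values for a single run_id
stages = {"planning": 0, "transferring": 1, "verifying": 0,
          "completed": 0, "failed": 0}
```

With these values, `active_stage(stages)` yields `"transferring"`, and `progress_percent(250, 1000)` yields `25.0`.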

Runtime & Process Metrics

Automatically collected by the Go runtime and the Prometheus client library.

Metric	Type	Description
go_goroutines	Gauge	Current number of goroutines
go_threads	Gauge	OS threads created
go_memstats_heap_inuse_bytes	Gauge	Heap bytes in active use
go_memstats_alloc_bytes	Gauge	Bytes allocated on heap (live)
go_gc_duration_seconds	Summary	GC pause duration distribution
process_cpu_seconds_total	Counter	Total CPU seconds (user+system)
process_resident_memory_bytes	Gauge	Resident memory size (RSS)
process_open_fds	Gauge	Open file descriptors

Example PromQL Queries

Common queries for monitoring Godwit Sync transfers.

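# Transfer progress percentage
# ignoring(state) is required because the two sides carry different state label values
godwit_objects{state="done"} / ignoring(state) godwit_objects{state="total"} * 100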

# Current failure rate
rate(godwit_objects_total{status="failed"}[5m])

# p99 per-object transfer duration
histogram_quantile(0.99, rate(godwit_task_duration_seconds_bucket[5m]))

# Median object size (sizes are observed at plan time, so query the cumulative buckets directly)
histogram_quantile(0.5, godwit_object_size_bytes_bucket)

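# Multipart resume ratio
# ignoring(action) lets series with different action label values match
godwit_multipart_sessions_total{action="resumed"}
  / ignoring(action) (godwit_multipart_sessions_total{action="created"}
   + ignoring(action) godwit_multipart_sessions_total{action="resumed"})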

# Storage class distribution
godwit_storage_class_objects

# Process CPU usage rate
rate(process_cpu_seconds_total[5m])

# Heap memory in use
go_memstats_heap_inuse_bytes

# --- Migration-run metrics (Layer 2) ---

# Current run stage (planning / transferring / verifying / completed / failed)
# Returns only the series whose stage is active (value 1)
godwit_run_stage == 1

# Per-run transfer throughput (bytes/s)
rate(godwit_run_transfer_bytes_total[1m])

# Run progress percentage
godwit_run_objects_completed / godwit_run_objects_total * 100

# Median S3 request latency
histogram_quantile(0.5, rate(godwit_s3_request_seconds_bucket[5m]))

# Pipeline queue depth (pending objects)
godwit_objects{state="pending"}

# Verification failure rate
rate(godwit_verify_total{result=~"mismatched|error"}[5m])

# Run wall-clock duration
godwit_run_completed_timestamp - godwit_run_started_timestamp
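
The histogram_quantile() queries above estimate a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The Python sketch below illustrates that estimation on raw cumulative counts. It is a simplified model of the PromQL behavior for classic histograms (PromQL applies it to rate()-adjusted buckets), and the sample bucket values are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted
    by upper_bound, with the last bound being +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation within the matching bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for a latency histogram (seconds)
example = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
```

For `example`, the median lands in the 0.1 to 0.5 s bucket and interpolates to 0.42 s, while the p99 falls in the +Inf bucket and is reported as 1.0 s. This is why a quantile can never exceed the largest finite bucket bound listed in the Histograms table.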
