Design Brief

wait_event_timing
Benchmark & Testing Design

Methodology for benchmarking Dmitry Fomin's Oracle-style wait-event instrumentation patch series.

PR: DmitryNFomin/postgres #2 Session: Hacking Postgres, 2026-06-17 Branch: wet-series (base 89eafad297a, HEAD 983f0ec) Status: corrected results available

📊 Corrected benchmark results (v3) → A/A floor, randomized order, paired stats, stub rung; 3 throughput regimes + latency (n=10) + percentiles.

1. Feature under test

wait_event_timing (WET) brings Oracle-style wait-event instrumentation to PostgreSQL. Each time a backend enters or exits a wait event, a small record is written. Aggregates and per-session traces are exposed through new catalog views and functions.

Patch series

7 reviewable patches in PR DmitryNFomin/postgres #2. The series applies cleanly onto upstream master (89eafad297a); series HEAD is 983f0ec on branch wet-series.

Capture levels

Controlled by the GUC wait_event_capture:

Level	What is recorded	Storage
`off`	Nothing — feature compiled in but dormant.	—
`stats`	Per-(backend, event) count, total duration, max duration, plus a 32-bucket log₂ histogram of individual wait durations.	Shared memory (one slot per backend per event type).
`trace`	Everything in `stats`, plus a per-session DSA ring buffer of individual wait entries. Query-attribution markers (`ExecStart`/`ExecEnd`, `QueryStart`/`QueryEnd`) are interleaved to correlate waits with SQL statements.	DSA ring (per session) + shared memory aggregates.

Compile-time gate

The feature is guarded by a compile-time flag:

# meson (preferred)
meson setup build -Dwait_event_timing=true ...

# autoconf equivalent
./configure --enable-wait-event-timing ...

Without the flag the instrumentation points are #ifdef'd out entirely. With the flag but wait_event_capture=off, the GUC is present and the code paths compile in, but no data is written at runtime.

Key new catalog objects

pg_stat_wait_event_timing — view exposing per-(backend, event) aggregates.
DSA-backed ring functions for trace retrieval.

2. Two cost concepts (keep them distinct)

Conflating these is the most common mistake in instrumentation benchmarks. They must be measured and reported separately.

Static / baseline overhead

The constant cost of having the feature compiled in and/or enabled, independent of whether anyone reads or resets the data. Sources include:

Branch prediction misses on the inline #ifdef/if (capture_level) gate in every backend loop.
Additional shared-memory footprint touching more cache lines.
Per-wait-event entry writes into shared-memory aggregates or DSA ring under stats/trace.

This is measured by comparing TPS/latency across the measurement ladder rungs (see Section 3) at a fixed, low observer frequency (or zero reads from outside).

Observer effect

The dynamic measurement error introduced by the act of monitoring itself. It grows with:

Sampling/probing frequency — how often an external agent reads the views.
Reset operations — clearing the stats table (e.g., calling the reset function analogous to pg_stat_reset()) under load acquires a lock on the shared hash table; every backend that tries to re-register a wait event must contend for that lock. Higher reset frequency ↠ worse observer effect.

Analogy — pg_stat_statements under load

Resetting pg_stat_statements under a heavy write workload is not free: clearing the shared hash forces every backend that next executes a query to re-register its entry, all contending on the same spinlock/LWLock. The same dynamic applies to wait_event_timing stats. A benchmark that resets stats frequently will inflate overhead numbers beyond what a typical production observer would see.

The observer-effect microbenchmark (a future pass) will sweep read/reset frequency as an independent variable while holding workload constant.

3. Measurement ladder

Each rung isolates one additional source of cost. Deltas between consecutive rungs attribute cost to specific causes. All rungs run on the same hardware against the same workload within the same benchmark session (interleaved rounds — see Section 7).

1
Vanilla Postgres (upstream master, no patch applied) — the true baseline. Every other rung is compared to this.
Δ = reference
2
Patched, compiled without --enable-wait-event-timing (stub build) — the patch is applied but all instrumentation is #ifdef'd out. Should be statistically indistinguishable from rung 1. Any delta here reveals a source-code-change cost (e.g., changed hot-path data structures, new function calls outside the guard).
Δ(1→2) source-change cost
3
Patched + compiled with flag, capture=off — the feature is compiled in and the GUC is present, but no wait events are recorded. Delta over rung 2 is the cost of the compiled-in inline gate (branch, possible cache-line touch).
Δ(2→3) compiled-in gate cost
4
capture=stats — per-backend shared-memory aggregates are maintained (counts, durations, histogram buckets updated on every wait-event transition).
Δ(3→4) stats capture cost
5
capture=trace — DSA ring writes per event plus query-attribution markers. Heaviest mode.
Δ(4→5) trace capture cost

A further axis — observer read/reset frequency — is orthogonal to the rungs and will be swept separately in the observer-effect microbenchmark.

4. Validity gate

Critical: verify the feature is genuinely active

A build configured without --enable-wait-event-timing still exposes the wait_event_capture GUC, but silently downgrades any non-off value back to off. If this is not caught, a benchmark comparing off, stats, and trace against the un-instrumented build will produce a false ~0% overhead — all three levels are actually identical.

Mandatory pre-benchmark check

Before accepting any numbers from a build, run this gate. Failure means the build is mis-configured and the run must be discarded.

These are the exact commands run by /root/go.sh before every benchmark session:

# Check 1: stats level must stick
WET=/opt/pg-wet/bin
su - pgbench -c "$WET/psql -p 5441 -d postgres -tAc \
  \"SET wait_event_capture=stats; SELECT current_setting('wait_event_capture');\""
# Expected output: stats
# If output is 'off': build is NOT instrumented — ABORT

# Check 2: trace level must stick
su - pgbench -c "$WET/psql -p 5441 -d postgres -tAc \
  \"SET wait_event_capture=trace; SELECT current_setting('wait_event_capture');\""
# Expected output: trace

# Check 3: the view must exist and be queryable
su - pgbench -c "$WET/psql -p 5441 -d postgres -tAc \
  \"SELECT count(*) FROM pg_stat_wait_event_timing;\""
# Expected output: a number (0 or more) — not an error

# go.sh only proceeds if STATS='stats' AND TRACE='trace'; otherwise it exits 1
# and writes: "NOT VERIFIED — will NOT produce misleading numbers."

Gate passed

All three checks succeed ↠ the build is genuinely instrumented. Proceed with the benchmark.

Lesson

Any instrumentation benchmark must assert that the feature is active, not merely that the GUC is settable. A GUC that silently no-ops is a common footgun in compile-gated features.

5. Workloads

All three workloads use pgbench -S (SELECT-only). Read-write would add WAL/lock noise to the overhead signal; SELECT-only is sufficient to exercise the wait-event instrumentation path cleanly. Uniform pgbench flags across all runs: -c 16 -j 16 -T 180 -P 15 -r -n (clients=jobs=cores; 3-minute runs for stable numbers per Dmitry; 15s progress intervals to confirm steady-state; per-statement latency report; no vacuum).

ID	Description	pgbench command	Bottleneck / wait profile	Headline metric
W1	Read-only, fully cached	`pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n` scale 100, `shared_buffers=8GB` (data ~1.6 GB, fits in buffer pool)	CPU-bound; very few waits per transaction.	TPS. Measures the pure inline-gate cost — waits rarely fire, isolating constant overhead of the instrumented code path.
W2	Wait-saturated (key test)	`pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n` scale 400, `shared_buffers=256MB` (data ~6.4 GB ≫ SB)	Constant buffer-mapping / I/O / LWLock waits. Working set far exceeds buffer pool — maximum wait-event transition rate.	TPS + wait-event proof (top events by count, total transitions). Per-event cost of `stats`/`trace` is most visible here.
W3	Latency at 50% capacity	`pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n -R <rate>` Same setup as W2 (scale 400, `shared_buffers=256MB`) `rate` = 50% of W2 baseline median TPS	Regular load-testing (~50% capacity, not stress-testing) in the same wait-heavy environment as W2. Server is not saturated — overhead surfaces as added latency per call rather than TPS drop. Production-realistic load level.	Latency (avg, p95, p99). CPU-headroom check (vmstat idle%) required to confirm the server is genuinely at 50% capacity. Rate derived from W2 results.

Why -c 16 = -j 16

The benchmark host has 16 dedicated vCPUs. Setting clients = jobs = 16 matches pgbench worker threads to cores and avoids pgbench itself oversubscribing the CPU when co-located with the Postgres server.

W2 proof obligation

Before trusting W2 numbers, confirm the workload is genuinely wait-bound (not CPU-bound). Run these while pgbench is active:

-- Top wait events by transition count:
SELECT wait_event_type, wait_event,
       sum(count)                        AS total_count,
       sum(total_wait_time_us) / 1000.0  AS total_wait_ms
FROM pg_stat_wait_event_timing
GROUP BY wait_event_type, wait_event
ORDER BY total_count DESC
LIMIT 10;

-- Mid-run snapshot of what backends are waiting on:
SELECT coalesce(wait_event_type, 'CPU')     AS type,
       coalesce(wait_event, 'running')       AS event,
       count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 8;

-- Confirm CPU is not maxed out (high idle% = IO/wait bound):
vmstat 1 5

If vmstat shows CPU idle near 0% and wait-event counts are low, the workload is CPU-bound and W2 is not exercising the instrumentation path — reduce shared_buffers or increase scale until buffer/IO waits dominate.

6. Load modes — why two perspectives

W1 and W2 run at full concurrency (stress mode); W3 runs rate-limited at ~50% capacity. Both perspectives are necessary.

Stress mode — W1 and W2

The system is driven to the bottleneck resource (CPU for W1, I/O/buffer for W2). Metric: TPS. Overhead from the feature shows as reduced throughput.

Caveat: when the bottleneck is I/O, additional CPU work from the instrumentation may hide behind I/O wait time — overhead can look smaller than it is in a CPU-saturated scenario. W1 (CPU-bound) and W2 (IO/wait-bound) together expose overhead under both bottleneck types.

Rate-limited mode — W3

Run the same wait-heavy environment as W2 but hold TPS at ~50% of the W2 baseline via pgbench -R <rate>. The server is not saturated, so the feature's per-wait-event cost surfaces as added latency per transaction rather than reduced throughput. Metric: avg / p95 / p99 latency.

This is more production-realistic: most production systems run well below saturation. It also eliminates the CPU-masking effect and reduces pgbench co-location contention (pgbench and Postgres share the same 16 cores; at full load pgbench competes with the server).

Deriving the W3 rate

After the W2 stress runs complete, take the baseline (vanilla) median TPS and set -R to 50% of that value. The same rate is used for all four W3 variants (baseline / off / stats / trace) so latency differences are directly comparable.

7. Run discipline

Interleaving (cancels temporal drift)

Within each round, cycle through all variants in sequence before moving to the next round. Do not run all trials for one variant then all for another — thermal and memory-pressure drift will bias inter-variant comparisons.

for round in 1 2 3; do
  for variant in baseline off stats trace; do
    pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n postgres
  done
done

Run parameters

Duration: 180 seconds (3 minutes) per run — chosen by Dmitry for stable numbers.
Rounds: 3 rounds per (workload × variant).
Warmup: discard round 1; report on rounds 2–3.
Progress reporting: -P 15 (every 15 s) to confirm the run has reached steady-state before accepting the result.
Latency report: -r gives per-statement avg latency in the pgbench summary — the primary latency metric for W3.
Summary statistics: median TPS (W1, W2); avg / p95 / p99 latency (W3).
--log (-l) is intentionally NOT used — per-transaction file logging is itself an observer-effect source (one write per transaction to disk) and would inflate overhead numbers unless the log lives on a RAM disk.

Between-run resets

To isolate static overhead from observer effect, do not reset pg_stat_wait_event_timing between variants during the primary benchmark. A separate reset-sweep pass will quantify the observer effect independently.

8. Hardware & build

Hardware

Parameter	Value
Provider	Hetzner Cloud
Instance type	CCX43
CPU	AMD EPYC Milan, 16 dedicated vCPU
RAM	64 GB
OS	Ubuntu 24.04 LTS

Build

Parameter	Value
Compiler	gcc (ccache wrapper), `-O2`
Build system	autoconf (`./configure`)
Upstream base	`89eafad297a` (master)
Patch branch	`wet-series`
Series HEAD	`983f0ec`
Instrumented flag	`-Dwait_event_timing=true` (rungs 3–5); omitted for rung 2

Build commands (actual — autoconf)

The benchmark uses autoconf (./configure), not meson. Two separate installs are maintained side by side.

# Rung 1: vanilla baseline (upstream master, no patch)
#   installed at /opt/pg-baseline  port 5440
./configure --prefix=/opt/pg-baseline CC="ccache gcc" CFLAGS="-O2"
make -j16
make install

# Rung 2: stub build (patch applied, flag ABSENT) — PLANNED, not yet built
#   same configure line as vanilla but on wet-series branch
./configure --prefix=/opt/pg-stub CC="ccache gcc" CFLAGS="-O2"
make -j16
make install

# Rungs 3-5: instrumented build (patch + flag present)
#   installed at /opt/pg-wet  port 5441
#   Source: /root/rebuild_wet.sh
cd /root/pg-src
git checkout -f wet-series
git clean -fdx -q
./configure --prefix=/opt/pg-wet --enable-wait-event-timing \
  CC="ccache gcc" CFLAGS="-O2"
make -j16
make install

# Sanity check after install: flag must appear in config.log
grep -- '--enable-wait-event-timing' config.log

postgresql.conf settings (both clusters)

Appended by /root/setup_clusters.sh and /root/go.sh:

# Baseline cluster (/data/pg-baseline, port 5440)
port = 5440
max_connections = 300
max_wal_size = 8GB
track_activities = on      # required for query attribution in trace mode
compute_query_id = on      # required for query attribution in trace mode
# shared_buffers set per-workload at runtime (sed replacement in run_bench.sh)
# fsync left at default (on) — workload is SELECT-only, no writes

# Wet cluster (/data/pg-wet, port 5441)
port = 5441
max_connections = 300
max_wal_size = 8GB
track_activities = on      # required for query attribution in trace mode
compute_query_id = on      # required for query attribution in trace mode
wait_event_capture = off   # changed to stats/trace per-run by run_bench.sh
# shared_buffers set per-workload at runtime
# fsync left at default (on) — workload is SELECT-only, no writes

9. Results (populated on completion, ~2h)

Work in progress

Numbers below are filled from /root/bench.log when the benchmark run finishes (~2h). Do not treat any cell as final until the run completes and the validity gate (Section 4) has been passed for each build.

W1 — Read-only cached (stress, TPS)

Config: scale=100, shared_buffers=8GB, pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n. Metric: median TPS of rounds 2–3.

Variant	Median TPS	%Δ vs baseline	%Δ vs off
1 — vanilla (baseline)	—	reference	—
3 — off	—	—	reference
4 — stats	—	—	—
5 — trace	—	—	—

W2 — Wait-saturated (stress, TPS)

Config: scale=400, shared_buffers=256MB, pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n. Metric: median TPS of rounds 2–3.

Variant	Median TPS	%Δ vs baseline	%Δ vs off
1 — vanilla (baseline)	—	reference	—
3 — off	—	—	reference
4 — stats	—	—	—
5 — trace	—	—	—

W2 — Wait-event proof

Collected during a dedicated W2 stats run. Confirms workload is wait-bound, not CPU-bound.

wait_event_type	wait_event	total_count	total_wait_ms
— populated from pg_stat_wait_event_timing query —

Total wait transitions in 180s: — (~— /sec). CPU idle% (vmstat): —%.

W3 — Latency at 50% capacity

Config: same as W2 but rate-limited with -R <50% of W2 baseline TPS>. Rate used: — TPS. Metric: avg / p95 / p99 latency.

Variant	Avg lat (ms)	p95 lat (ms)	p99 lat (ms)	%Δ avg vs baseline	%Δ avg vs off
1 — vanilla (baseline)	—	—	—	reference	—
3 — off	—	—	—	—	reference
4 — stats	—	—	—	—	—
5 — trace	—	—	—	—	—

Source: /root/bench.log on the CCX43 benchmark host. Results will be posted to the PR thread once all runs complete. Rung 2 (stub build — patched without --enable-wait-event-timing) is planned but not yet built; its row will be added once available.

10. Exact commands per step

These are verbatim invocations extracted from the scripts on the benchmark host (23.88.42.150). Scripts: /root/rebuild_wet.sh, /root/setup_clusters.sh, /root/go.sh, /root/run_bench.sh.

Step 0 — Data initialization

Run once per workload tier before the interleaved benchmark rounds. W3 reuses the W2 dataset (same scale=400, same shared_buffers=256MB) — no separate init required.

BASELINE_BIN=/opt/pg-baseline/bin
WET_BIN=/opt/pg-wet/bin

# W1: scale=100, run with shared_buffers=8GB
$BASELINE_BIN/pgbench -p 5440 -i -s 100 --no-vacuum postgres
$WET_BIN/pgbench    -p 5441 -i -s 100 --no-vacuum postgres

# W2 + W3: scale=400, run with shared_buffers=256MB
# (W3 rate-limited uses the same dataset; no separate init)
$BASELINE_BIN/pgbench -p 5440 -i -s 400 --no-vacuum postgres
$WET_BIN/pgbench    -p 5441 -i -s 400 --no-vacuum postgres

Step 1 — Per-round startup sequence

For each round and each variant, run_bench.sh stops all clusters, edits postgresql.conf via Python regex to set shared_buffers and wait_event_capture, then starts the appropriate cluster:

# Set shared_buffers and wait_event_capture in postgresql.conf at runtime:
python3 -c "
import re
with open('/data/pg-wet/postgresql.conf', 'r') as f:
    content = f.read()
content = re.sub(r'^#?shared_buffers\s*=.*',
                 \"shared_buffers = '8GB'\", content, flags=re.MULTILINE)
content = re.sub(r'^#?wait_event_capture\s*=.*',
                 \"wait_event_capture = 'stats'\", content, flags=re.MULTILINE)
with open('/data/pg-wet/postgresql.conf', 'w') as f:
    f.write(content)
"

# Start cluster and confirm settings:
su - pgbench -c "/opt/pg-wet/bin/pg_ctl -D /data/pg-wet -l /tmp/wet.log start"
su - pgbench -c "/opt/pg-wet/bin/psql -p 5441 -t \
  -c 'SHOW wait_event_capture;' postgres"

Step 2 — pgbench measured-run command (canonical)

The measured run command is identical for every workload and every variant. It is defined once here; all workload descriptions reference it.

# Canonical measured-run command (W1, W2, W3 — all variants):
pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n postgres

# Run via su - pgbench against the appropriate port:
#   baseline cluster  →  -p 5440  (/opt/pg-baseline/bin/pgbench)
#   wet cluster       →  -p 5441  (/opt/pg-wet/bin/pgbench)
# shared_buffers and wait_event_capture are set in postgresql.conf before each start.

Flag rationale:

-S — SELECT-only script (all three workloads).
-c 16 -j 16 — clients = worker threads = vCPUs; avoids pgbench oversubscribing the 16 dedicated cores when co-located with Postgres.
-T 180 — 3-minute run duration for stable numbers (per Dmitry).
-P 15 — progress report every 15 s; used to confirm steady-state before accepting results.
-r — per-statement latency statistics in the summary; the primary metric for W3.
-n — skip vacuum.
--log / -l intentionally omitted — per-transaction file logging is itself an observer-effect source (one write per transaction) and would inflate overhead numbers unless the log lives on a RAM disk.

The only two justified additions to the canonical command:

W3 only: append -R <rate> where rate = 50% of the W2 baseline median TPS (derived after W2 stress runs complete).
```
pgbench -S -c 16 -j 16 -T 180 -P 15 -r -n -R $RATE postgres
```
W2 diagnostic pass only (not a measured run): a separate short run under capture=stats to collect wait-event evidence. See Step 4 below. This run is not included in the TPS/latency results.

Step 3 — capture level changes between variants

wait_event_capture is changed by rewriting postgresql.conf and doing a full stop/start cycle — not a live ALTER SYSTEM. This guarantees a clean server state and eliminates residual DSA ring state from a prior trace run.

Each variant is a full stop/start cycle of the appropriate cluster with the target wait_event_capture value written to postgresql.conf before start. The measured-run command (see Step 2) is identical for every variant; only the port and cluster binary differ.

# Sequence within one round — W2 example (same pattern for W1 and W3):

# 1. baseline (separate binary, /opt/pg-baseline, port 5440)
#    set shared_buffers=256MB on baseline cluster, start
su - pgbench -c "/opt/pg-baseline/bin/pgbench \
  -p 5440 -S -c 16 -j 16 -T 180 -P 15 -r -n postgres"

# 2. off (wet binary, /opt/pg-wet, port 5441)
#    set shared_buffers=256MB + wait_event_capture=off, start wet cluster
su - pgbench -c "/opt/pg-wet/bin/pgbench \
  -p 5441 -S -c 16 -j 16 -T 180 -P 15 -r -n postgres"

# 3. stats  (set wait_event_capture=stats, restart)
su - pgbench -c "/opt/pg-wet/bin/pgbench \
  -p 5441 -S -c 16 -j 16 -T 180 -P 15 -r -n postgres"

# 4. trace  (set wait_event_capture=trace, restart)
su - pgbench -c "/opt/pg-wet/bin/pgbench \
  -p 5441 -S -c 16 -j 16 -T 180 -P 15 -r -n postgres"

Step 4 — W2 wait-event proof queries (diagnostic pass, not a measured run)

Run once, separately from the interleaved rounds, with capture=stats. Purpose: confirm the W2 workload is genuinely wait-bound. Results go into the "W2 wait-event proof" table in Section 9; they are not included in TPS results.

# Reset stats counters before the diagnostic run:
su - pgbench -c "/opt/pg-wet/bin/psql -p 5441 \
  -c 'SELECT pg_stat_reset_wait_event_timing_all();' postgres"

# Start a 180s diagnostic load run in background, sample midway:
# (same canonical command as measured runs — no --log)
su - pgbench -c "/opt/pg-wet/bin/pgbench \
  -p 5441 -S -c 16 -j 16 -T 180 -P 15 -r -n postgres" &
sleep 90

# CPU idle% (high idle% = IO/wait bound, not CPU-bound):
vmstat 1 5

# Active waits from pg_stat_activity:
/opt/pg-wet/bin/psql -p 5441 -t postgres -c "
  SELECT coalesce(wait_event_type,'CPU')    AS type,
         coalesce(wait_event,'running')      AS event,
         count(*)
  FROM pg_stat_activity
  WHERE state = 'active'
  GROUP BY 1, 2
  ORDER BY 3 DESC
  LIMIT 8;"

wait   # let pgbench finish

# Aggregate from pg_stat_wait_event_timing:
/opt/pg-wet/bin/psql -p 5441 postgres -c "
  SELECT wait_event_type, wait_event,
         sum(count)                        AS total_count,
         sum(total_wait_time_us) / 1000.0  AS total_wait_ms
  FROM pg_stat_wait_event_timing
  GROUP BY wait_event_type, wait_event
  ORDER BY total_count DESC
  LIMIT 10;"

# Total wait transitions in the 180s run:
/opt/pg-wet/bin/psql -p 5441 -t postgres -c \
  "SELECT sum(count) FROM pg_stat_wait_event_timing;"

Open questions / next steps

Stub-build rung (rung 2) — the patched binary compiled without --enable-wait-event-timing is not yet built. Expected delta vs vanilla: ~0%; any non-trivial delta would reveal a hot-path structural change in the patch outside the compile guard.
W3 CPU-headroom check — sample CPU (vmstat idle%) during the rate-limited W3 run to confirm the server is genuinely not saturated. This proves W3 is regular load-testing at 50% capacity, distinct from W2 stress.
Reset/observer-effect microbenchmark — sweep reset frequency (0, 1/s, 10/s) as an independent variable while holding the W2 stress workload constant. Isolates observer-effect cost (locking, re-registration) from the static overhead measured in the primary runs.
Iteration 2 — pg_stat_statements loaded — add shared_preload_libraries = 'pg_stat_statements' and CREATE EXTENSION pg_stat_statements on both clusters to introduce realistic shared-hash / LWLock contention (more wait events in flight). Protocol: reset pg_stat_statements before each run; cross-check its calls count (should equal ≈ TPS × 180) to confirm the workload ran as expected; dump top queries alongside the wait-event summary.
W4 (future) — read-side observer effect — while one pgbench drives the SELECT-only workload, a second client polls pg_stat_wait_event_timing at increasing frequency (0, 1 Hz, 10 Hz). Measures how reading the instrumentation perturbs the workload being measured — the instrumentation creates the problem it is trying to observe.
DSA ring sizing — what happens when the trace ring overflows under very high wait-event rates? Is overflow behavior documented and tested?