Plan: Reduce GitHub Actions Resource Consumption

Current State

Workflow File Trigger Est. Duration Frequency
Build build.yml push/PR on main ~15–20 min (3-platform build + Docker) Every push/PR
Release release.yml push v* tag ~25–30 min (3-platform build + Docker + GHCR) Infrequent
CI ci.yml workflow_dispatch (manual) ~30+ min (unit × 3 OS + integration + E2E + bench + CNPG) Manual
Coverage coverage.yml workflow_dispatch (manual) ~10–15 min Manual
Benchmarks benchmarks.yml workflow_dispatch (manual) ~10–15 min Manual

Already Done

  • CI, Coverage, and Benchmarks workflows are already manual-only (workflow_dispatch).
  • The setup-pgrx composite action already caches Rust artifacts, cargo-pgrx binary, and ~/.pgrx.
  • The Build workflow Docker step already uses GHA cache (cache-from: type=gha).
  • All automatic workflows have cancel-in-progress: true.

Primary Cost Driver

The Build workflow is the only remaining automatic workflow that runs on every push/PR. It runs: 1. Lint job (ubuntu) — ~3 min 2. Build matrix (Linux, macOS-arm64, Windows) — ~8–12 min each 3. Docker image build (ubuntu) — ~5–8 min

Total: ~4 billable jobs per push/PR, with Windows being the most expensive (pgrx compiles PG from source).


Prioritized Steps

Step 1: Add path filters to Build workflow ⭐ HIGH IMPACT / LOW EFFORT

Problem: The Build workflow triggers on every push, including doc-only, plan, or config changes.

Fix: Add paths-ignore to skip builds when only non-code files change.

on:
  push:
    branches: [main]
    paths-ignore:
      - '**/*.md'
      - 'docs/**'
      - 'LICENSE'
      - '.gitignore'
      - 'PLAN*.md'
      - 'REPORT*.md'
      - 'adrs/**'
      - 'coverage/**'
      - 'cnpg/**'
      - 'scripts/**'
  pull_request:
    branches: [main]
    paths-ignore:
      - '**/*.md'
      - 'docs/**'
      - 'LICENSE'
      - '.gitignore'
      - 'PLAN*.md'
      - 'REPORT*.md'
      - 'adrs/**'
      - 'coverage/**'
      - 'cnpg/**'
      - 'scripts/**'

Estimated savings: 30–50% fewer Build runs (this project has frequent doc/plan commits).

Caveat: If Build is a required status check for PR merge, skipped runs show as “pending” forever. Fix by adding a lightweight “pass-through” job that always succeeds, or use paths-filter action to set a condition.


Step 2: Drop or conditionally skip Windows from the Build matrix ⭐ HIGH IMPACT / LOW EFFORT

Problem: The Windows build is continue-on-error: true (experimental) and is the slowest job — pgrx downloads and compiles PostgreSQL from source (~10–15 min). It’s not a gating check, yet burns the most minutes.

Fix — Option A (recommended): Remove Windows from the Build workflow entirely. Keep it only in the manual CI workflow for periodic verification.

matrix:
  include:
    - os: ubuntu-22.04
      artifact_suffix: linux-amd64
      archive_ext: tar.gz
    - os: macos-14
      artifact_suffix: macos-arm64
      archive_ext: tar.gz
    # Windows build available via manual CI workflow

Fix — Option B: Move Windows to a separate job gated on a label or workflow_dispatch:

build-windows:
  if: github.event_name == 'workflow_dispatch' || contains(github.event.pull_request.labels.*.name, 'test-windows')

Estimated savings: ~10–15 min per Build run (the full Windows job duration).


Step 3: Add timeout-minutes to all jobs ⭐ MEDIUM IMPACT / LOW EFFORT

Problem: A hung job (e.g., E2E Docker build, CNPG wait) can run for up to 6 hours (GitHub default), silently draining the budget.

Fix: Add timeout-minutes to every job:

Job Recommended timeout
Lint 10 min
Build (per-platform) 20 min
Docker image build 15 min
Unit tests 15 min
Integration tests 15 min
E2E tests 25 min
Benchmarks 20 min
CNPG smoke test 15 min
Coverage 15 min
Release jobs 30 min

Example: yaml jobs: lint: runs-on: ubuntu-latest timeout-minutes: 10

Estimated savings: Prevents runaway cost; caps worst-case at a known ceiling.


Step 4: Reduce artifact retention days ⭐ MEDIUM IMPACT / LOW EFFORT

Problem: Stored artifacts count against GitHub storage quotas. Current settings: - Build artifacts: 14 days - Docker image artifact: 7 days - Benchmark results: 14 days (manual CI) + 30 days (benchmarks) - Coverage: 14 days

Fix: Reduce retention for non-release artifacts:

Artifact Current Recommended
Build packages (pkg-*) 14 days 5 days
Docker image (docker-image) 7 days 3 days
Benchmark results 14–30 days 7 days
Coverage LCOV 14 days 7 days
Release artifacts 30 days 30 days (keep)
- uses: actions/upload-artifact@v4
  with:
    retention-days: 5  # was 14

Estimated savings: Reduces storage costs; no impact on workflow speed.


Step 5: Split Build into fast-check gate + full build ⭐ MEDIUM IMPACT / MEDIUM EFFORT

Problem: All 3 platform builds start immediately. If there’s a syntax error, all 3 fail after ~8 min each.

Fix: Add a fast check job that runs cargo check + cargo clippy + cargo fmt on Linux only (~2 min). The full build matrix depends on it via needs: check.

jobs:
  check:
    name: Quick check
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-pgrx
      - run: cargo fmt -- --check
      - run: cargo check --all-targets --features pg18
      - run: cargo clippy --all-targets --features pg18 -- -D warnings

  build:
    name: Build (${{ matrix.artifact_suffix }})
    needs: check
    # ...existing matrix...

Estimated savings: On check failure, saves ~20+ min of wasted build time across the matrix. Typical pushes that pass add ~2 min of overhead.

Note: This replaces the current separate lint job — the quick check subsumes it.


Step 6: Skip Docker image build on non-Docker changes ⭐ MEDIUM IMPACT / MEDIUM EFFORT

Problem: The Build workflow always builds the Docker E2E image (~5–8 min) even when only Rust source changed (and the Dockerfile didn’t).

Fix: Use dorny/paths-filter to conditionally run the Docker build:

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      docker: ${{ steps.filter.outputs.docker }}
      rust: ${{ steps.filter.outputs.rust }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            docker:
              - 'tests/Dockerfile.e2e'
              - 'Cargo.toml'
              - 'Cargo.lock'
              - 'src/**'
              - 'sql/**'
              - 'pg_trickle.control'
            rust:
              - 'src/**'
              - 'Cargo.toml'
              - 'Cargo.lock'

  build-docker:
    needs: detect-changes
    if: needs.detect-changes.outputs.docker == 'true'
    # ...existing docker job...

Estimated savings: Skips Docker build on ~20–30% of pushes (e.g., test-only, bench-only changes).


Step 7: Consider making Build also manual, keep only Lint automatic ⭐ HIGH IMPACT / LOW EFFORT

Problem: If the project is in active solo development (not team/PR workflow), even the Build workflow fires too often.

Fix: Create a minimal lint.yml that runs only fmt + clippy (~2–3 min) on push/PR, and make the full Build manual:

# .github/workflows/lint.yml
name: Lint
on:
  push:
    branches: [main]
    paths-ignore: ['**/*.md', 'docs/**', 'LICENSE', 'PLAN*.md', 'adrs/**']
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-pgrx
      - run: cargo fmt -- --check
      - run: cargo clippy --all-targets --features pg18 -- -D warnings

Then change build.yml to workflow_dispatch.

Estimated savings: ~80% reduction — from ~20 min per push down to ~3 min (lint only).

Trade-off: Build artifacts won’t be automatically produced. Run Build manually before releases or when you want cross-platform verification.


Step 8: Optimize the setup-pgrx composite action ⭐ LOW IMPACT / LOW EFFORT

The setup-pgrx action is already well-cached. Minor improvements:

  1. Pin actions/cache to v4 consistently — already done
  2. Add save-always: true to Rust cache so partial builds are also saved on failure: ```yaml
    • uses: Swatinem/rust-cache@v2 with: cache-on-failure: true save-always: true ```
  3. Consider sccache for cross-job compilation cache if multiple jobs compile the same crate (diminishing returns given Swatinem already handles this).

Estimated savings: Marginal (~1–2 min improvement on cache misses).


Implementation Order

Priority Step Effort Savings Risk Status
P1 1. Path filters on Build 5 min 30–50% fewer runs Low — may need pass-through job ✅ Done
P1 2. Drop Windows from Build 5 min ~10–15 min/run Low — Windows stays in manual CI ✅ Done
P1 3. Add timeout-minutes 10 min Prevents runaway cost None ✅ Done
P2 4. Reduce artifact retention 5 min Storage savings None ✅ Done
P2 5. Fast check gate 15 min ~20 min on failures Low ✅ Done
P2 6. Skip Docker conditionally 15 min ~5–8 min/run when skipped Low ✅ Done
P3 7. Lint-only automatic 10 min ~80% reduction Higher — no auto build artifacts ✅ Done
P3 8. Optimize setup-pgrx 5 min Marginal None ✅ Done

Expected Total Savings

Before (current): Every push to main or PR triggers Build with 4 jobs (lint + 3 platforms + Docker) ≈ 40–50 billable minutes.

After P1–P3: Only Lint runs automatically (~3 min). Doc-only commits are skipped entirely. Full build is manual.

Scenario Before After (P1+P2) After (P1–P3)
Code push ~45 min ~20 min ~3 min
Doc-only push ~45 min 0 min (skipped) 0 min
Manual CI run ~35 min ~35 min ~35 min
Release (v* tag) ~30 min ~30 min ~30 min

Monthly estimate (assuming ~100 pushes/month): - Before: ~4,500 min - After P1+P2: ~2,000 min (55% reduction) - After P1–P3: ~300 min + manual runs (93% reduction)