PgClone Async Operations
Async functions run clone operations in PostgreSQL background workers, allowing you to continue using your session while cloning proceeds in the background.
Table of Contents
- Prerequisites
- Async Table Clone
- Async Schema Clone
- Parallel Cloning
- Progress Tracking
- Progress Tracking View
- Job Management
- Conflict Resolution in Async Mode
- How It Works
Prerequisites
Add to postgresql.conf and restart PostgreSQL:
```
shared_preload_libraries = 'pgclone'
max_worker_processes = 32        # recommended for parallel cloning
```

Without `shared_preload_libraries`, the async functions will not be available.
Async Table Clone
```sql
-- Returns a job_id (integer)
SELECT pgclone_table_async(
    'host=source-server dbname=mydb user=postgres',
    'public', 'large_table', true
);
-- Returns: 1
```
All options available for `pgclone_table` also work with `pgclone_table_async`, including target name, JSON options, conflict strategy, selective columns, and WHERE filters.
Async Schema Clone
```sql
SELECT pgclone_schema_async(
    'host=source-server dbname=mydb user=postgres',
    'sales', true
);
```
Parallel Cloning (v2.2.0 — Worker Pool)
Clone tables in parallel using a fixed-size worker pool. Instead of spawning one background worker per table (which could exhaust max_worker_processes), pgclone launches exactly N workers that pull tasks from a shared queue:
```sql
-- Clone schema with 4 pool workers
SELECT pgclone_schema_async(
    'host=source-server dbname=mydb user=postgres',
    'sales', true,
    '{"parallel": 4}'
);

-- Combine parallel with other options
SELECT pgclone_schema_async(
    'host=source-server dbname=mydb user=postgres',
    'sales', true,
    '{"parallel": 8, "conflict": "replace", "triggers": false}'
);
```
How the Worker Pool Works
- The parent process queries the source for the table list and populates a shared-memory task queue.
- Exactly N background workers are launched, where N = the `"parallel"` value, capped at the table count.
- Each worker grabs the next unclaimed table from the queue, clones it, then grabs the next, until the queue is empty.
- Faster workers automatically handle more tables (dynamic load balancing).
- The parent job tracks aggregate progress across all workers.
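The queue-based scheduling above can be modeled in a few lines of Python. This is a simplified sketch, not pgclone's actual implementation: pgclone uses a shared-memory task queue and PostgreSQL background worker processes, whereas this model uses threads and `queue.Queue` to show the same dynamic load-balancing behavior:

```python
import queue
import threading

def run_pool(tables, parallel):
    """Model of the v2.2.0 worker pool: N workers pull tables from a shared queue."""
    tasks = queue.Queue()
    for t in tables:
        tasks.put(t)

    done = []                                  # (worker_id, table), in completion order
    lock = threading.Lock()
    n_workers = min(parallel, len(tables))     # worker count is capped at table count

    def worker(worker_id):
        while True:
            try:
                table = tasks.get_nowait()     # claim the next unclaimed table
            except queue.Empty:
                return                         # queue drained: worker exits
            # ... the real worker clones `table` here ...
            with lock:
                done.append((worker_id, table))

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return done

results = run_pool([f"table_{i}" for i in range(10)], parallel=4)
print(len(results))   # 10 tables, cloned by at most 4 workers
```

Because each worker claims a new table only when it finishes its previous one, fast workers naturally absorb more of the queue, which is the dynamic load balancing described above.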
Benefits over v2.0.0
| Aspect | v2.0.0 (per-table workers) | v2.2.0 (worker pool) |
|---|---|---|
| 100 tables, parallel=4 | 100 bgworkers | 4 bgworkers |
| DB connections | 200 (100×2) | 8 (4×2) |
| Job slots used | 101 (1 parent + 100 child) | 5 (1 parent + 4 pool) |
| max_worker_processes needed | ≥100 | ≥4 |
| Load balancing | Static (pre-assigned) | Dynamic (work-stealing) |
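The counts in the table follow directly from the model: each worker holds one source and one target connection, and the parent occupies one extra job slot. A quick sketch of that arithmetic (the helper names here are my own, for illustration only):

```python
def pool_resources(n_tables, parallel):
    """Resource footprint of the v2.2.0 worker-pool mode."""
    workers = min(parallel, n_tables)          # pool size capped at table count
    return {
        "bgworkers": workers,
        "connections": workers * 2,            # one source + one target per worker
        "job_slots": 1 + workers,              # 1 parent + N pool workers
    }

def per_table_resources(n_tables):
    """Resource footprint of the v2.0.0 one-worker-per-table mode."""
    return {
        "bgworkers": n_tables,
        "connections": n_tables * 2,
        "job_slots": 1 + n_tables,
    }

print(pool_resources(100, 4))     # {'bgworkers': 4, 'connections': 8, 'job_slots': 5}
print(per_table_resources(100))   # {'bgworkers': 100, 'connections': 200, 'job_slots': 101}
```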
Limitations
- Maximum 512 tables per pool operation (`PGCLONE_MAX_POOL_TASKS`).
- Only one pool operation can run at a time per database cluster.
- Pool workers are visible in `pgclone_jobs_view` as individual table-type jobs.
Progress Tracking
Check a specific job
```sql
SELECT pgclone_progress(1);
```
Returns JSON:

```json
{
  "job_id": 1,
  "status": "running",
  "phase": "copying data",
  "tables_completed": 5,
  "tables_total": 12,
  "rows_copied": 450000,
  "current_table": "orders",
  "elapsed_ms": 8500
}
```
List all jobs
```sql
SELECT pgclone_jobs();
-- Returns JSON array of all active/recent jobs
```
Progress Tracking View (v2.1.0+)
Query the live progress of all async clone jobs through a standard PostgreSQL view, complete with a visual progress bar and elapsed time:

```sql
SELECT job_id, status, schema_name, progress_bar FROM pgclone_jobs_view;
```

```
 job_id | status    | schema_name | progress_bar
--------+-----------+-------------+------------------------------------------------------------
      1 | running   | sales       | [████████████░░░░░░░░] 60.0% | 450000 rows | 00:08:30 elapsed
      2 | pending   | public      | [░░░░░░░░░░░░░░░░░░░░] 0.0% | 0 rows | 00:00:00 elapsed
      3 | completed | analytics   | [████████████████████] 100.0% | 1200000 rows | 00:25:18 elapsed
```
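The bar format shown above is easy to reproduce when building custom monitoring on top of `pgclone_progress()`. Here is a small reimplementation of my own (the view's actual rendering code may differ in details):

```python
def render_bar(pct, rows, elapsed_s, width=20):
    """Render '[####....] 60.0% | 450000 rows | 00:08:30 elapsed' style output."""
    filled = int(width * pct / 100)
    bar = "█" * filled + "░" * (width - filled)
    h, rem = divmod(elapsed_s, 3600)           # split elapsed seconds into hh:mm:ss
    m, s = divmod(rem, 60)
    return f"[{bar}] {pct:.1f}% | {rows} rows | {h:02d}:{m:02d}:{s:02d} elapsed"

print(render_bar(60.0, 450000, 510))
# [████████████░░░░░░░░] 60.0% | 450000 rows | 00:08:30 elapsed
```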
Filter by status
```sql
-- Running jobs with elapsed time
SELECT job_id, status, elapsed_time, pct_complete
FROM pgclone_jobs_view
WHERE status = 'running';

-- Failed jobs with error messages
SELECT job_id, schema_name, error_message
FROM pgclone_jobs_view
WHERE status = 'failed';
```
Full detail
```sql
SELECT * FROM pgclone_jobs_view;

-- Or via the underlying function:
SELECT * FROM pgclone_progress_detail();
```
Job Management
Cancel a running job
```sql
SELECT pgclone_cancel(1);
```
Resume a failed job
Resumes from the last checkpoint and returns a new job_id:

```sql
SELECT pgclone_resume(1);
-- Returns: 2
```
Clear completed/failed jobs
```sql
SELECT pgclone_clear_jobs();
-- Returns: number of jobs cleared
```
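A common pattern is to submit an async job and poll `pgclone_progress()` until it reaches a terminal status. Below is a driver-agnostic sketch: `fetch_progress` is a placeholder you would implement with psycopg or another client library (here it is stubbed for illustration), and the exact set of terminal status names is my assumption, based on the statuses shown in this document plus a `cancelled` state for cancelled jobs:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}   # assumed terminal statuses

def wait_for_job(fetch_progress, job_id, poll_interval=0.0):
    """Poll until the job reaches a terminal status; return the last progress dict.

    fetch_progress(job_id) stands in for issuing
    `SELECT pgclone_progress(job_id)` through your client library.
    """
    while True:
        progress = fetch_progress(job_id)
        if progress["status"] in TERMINAL:
            return progress
        time.sleep(poll_interval)

# Stubbed progress source: pretends the job finishes on the third poll.
states = iter(["pending", "running", "completed"])
stub = lambda job_id: {"job_id": job_id, "status": next(states)}
print(wait_for_job(stub, 1)["status"])   # completed
```

In production you would use a non-zero `poll_interval` (a second or more) to avoid hammering the server with progress queries.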
Conflict Resolution in Async Mode
All conflict strategies work with async functions:
```sql
-- Skip if table exists
SELECT pgclone_table_async(conn, 'public', 'orders', true, 'orders',
    '{"conflict": "skip"}');

-- Drop and re-create
SELECT pgclone_table_async(conn, 'public', 'orders', true, 'orders',
    '{"conflict": "replace"}');

-- Rename existing table
SELECT pgclone_table_async(conn, 'public', 'orders', true, 'orders',
    '{"conflict": "rename"}');
```
How It Works
- When you call an async function, pgclone registers a background worker with PostgreSQL's `BackgroundWorker` API.
- The background worker starts in a separate process, connects to both the source and target databases using `libpq`, and performs the clone operation.
- Progress is tracked in shared memory (`pgclone_state`), which is allocated via `shmem_request_hook` (PG 15+) or `RequestAddinShmemSpace` (PG 14).
- The `pgclone_jobs_view` reads shared memory to display real-time progress.
- For parallel cloning, the parent job launches the child workers (one per table in v2.0.0, a fixed-size pool in v2.2.0) and monitors their completion.
Tip: Verbose per-table/per-row NOTICE messages have been moved to DEBUG1 level. To see them:
```sql
SET client_min_messages = debug1;
```