Parallelization Guide
This guide explains how QCANT parallelizes two expensive algorithm stages:
ADAPT-VQE operator-pool commutator/gradient scoring
qscEOM matrix-element construction
What Is Parallelized
ADAPT-VQE
Parallelized: independent commutator evaluations across candidate operators.
Serial: ADAPT iteration loop and parameter optimization.
qscEOM
Parallelized: diagonal and off-diagonal matrix-element evaluations.
Serial: final matrix assembly and eigenvalue solve.
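The split described above (independent evaluations in parallel, selection and assembly in serial) can be sketched with the standard library alone. This is a minimal illustration and not QCANT's implementation: score_chunk, parallel_scores, and the toy gradient values are hypothetical names and data, and a thread pool is used here only to keep the sketch portable.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(chunk):
    """Score one chunk of candidates; each score is independent of the others."""
    # Hypothetical stand-in for a commutator/gradient evaluation: abs(value).
    return [(index, abs(value)) for index, value in chunk]


def parallel_scores(values, chunk_size=2, max_workers=4):
    """Score all candidates in parallel chunks and return scores in input order."""
    indexed = list(enumerate(values))
    chunks = [indexed[i:i + chunk_size] for i in range(0, len(indexed), chunk_size)]
    scores = [0.0] * len(values)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(score_chunk, chunks):
            for index, score in result:
                scores[index] = score
    return scores


# Serial step: choose the best candidate only after every score is collected.
toy_gradients = [0.1, -0.7, 0.3, 0.05, -0.2]
scores = parallel_scores(toy_gradients)
best = max(range(len(scores)), key=scores.__getitem__)
```

Carrying each candidate's index through the chunk keeps the result order well defined even when chunks finish at different times, which is what lets the serial selection step stay unchanged.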
How To Enable It
ADAPT-VQE parallel gradients:
import QCANT

# symbols and geometry are assumed to be defined earlier for the target molecule.
params, excitations, energies = QCANT.adapt_vqe(
    symbols=symbols,
    geometry=geometry,
    adapt_it=3,
    basis="sto-3g",
    charge=0,
    spin=1,
    active_electrons=5,
    active_orbitals=5,
    device_name="default.qubit",
    parallel_gradients=True,
    parallel_backend="process",  # process | thread | auto
    max_workers=8,
    gradient_chunk_size=2,
)
qscEOM parallel matrix construction:
values = QCANT.qscEOM(
    symbols=symbols,
    geometry=geometry,
    active_electrons=6,
    active_orbitals=6,
    charge=0,
    params=params,
    ash_excitation=ash_excitation,
    basis="sto-3g",
    method="pyscf",
    shots=0,
    symmetric=True,
    parallel_matrix=True,
    parallel_backend="process",  # process | thread | auto
    max_workers=8,
    matrix_chunk_size=20,
)
Backend Selection
parallel_backend="process": preferred for CPU-bound, QNode-heavy workloads.
parallel_backend="thread": useful where process creation is restricted.
parallel_backend="auto": uses the process backend on POSIX and the thread backend on Windows.
If process pools cannot be created (for example, in a restricted environment), QCANT automatically falls back to the thread backend.
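The selection and fallback rules above could look roughly like the following. This is a sketch under stated assumptions, not QCANT's code: resolve_backend and make_executor are hypothetical helpers, and the exception list only covers failures raised while constructing the pool (a pool can also break later, when work is first submitted).

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def resolve_backend(parallel_backend):
    """Map "auto" to a concrete backend name; pass explicit choices through."""
    if parallel_backend == "auto":
        # Rule from the text: processes on POSIX, threads elsewhere (Windows).
        return "process" if os.name == "posix" else "thread"
    if parallel_backend not in ("process", "thread"):
        raise ValueError(f"unknown backend: {parallel_backend!r}")
    return parallel_backend


def make_executor(parallel_backend="auto", max_workers=None):
    """Create an executor, falling back to threads if a process pool fails."""
    if resolve_backend(parallel_backend) == "process":
        try:
            return ProcessPoolExecutor(max_workers=max_workers)
        except (OSError, NotImplementedError, ImportError):
            # Assumed failure modes in restricted environments.
            pass
    return ThreadPoolExecutor(max_workers=max_workers)
```

Both executor classes share the same submit/map interface, so the calling code does not need to know which backend was actually chosen.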
Tuning Parameters
max_workers: worker count for the selected backend.
gradient_chunk_size: number of ADAPT candidates per submitted task.
matrix_chunk_size: number of qscEOM matrix entries per submitted task.
Practical defaults:
Start with max_workers in [2, 4, 8].
For ADAPT gradients, start with gradient_chunk_size=2.
For larger qscEOM matrices, use smaller chunks (for example 20) to improve load balance.
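To see how chunk size interacts with worker count, it can help to count the tasks a setting produces. The helper names below (chunk_bounds, waves) and the item counts (30 candidates, 210 matrix entries) are hypothetical and chosen only for illustration; the sketch also assumes every task takes about the same time.

```python
import math


def chunk_bounds(n_items, chunk_size):
    """Return (start, stop) index pairs covering n_items in fixed-size chunks."""
    return [(start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)]


def waves(n_items, chunk_size, max_workers):
    """Scheduling rounds needed if every task took the same time."""
    n_tasks = math.ceil(n_items / chunk_size)
    return math.ceil(n_tasks / max_workers)


adapt_tasks = chunk_bounds(30, 2)     # 30 hypothetical candidates -> 15 tasks
matrix_tasks = chunk_bounds(210, 20)  # 210 hypothetical entries -> 11 tasks
```

With 8 workers, a chunk size of 100 would split 210 entries into only 3 tasks and leave 5 workers idle, while a chunk size of 20 yields 11 tasks that can be spread across all workers. Smaller chunks give the scheduler more room to balance load, at the cost of more per-task overhead.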
Benchmarking
QCANT includes a benchmark script:
python scripts/benchmark_parallel_adapt_qsceom.py --profile small --repeats 1
python scripts/benchmark_parallel_adapt_qsceom.py --profile large --repeats 1
Options:
--profile small|large: workload size.
--workers 1 2 4 8: worker counts to test.
--repeats N and --warmup N: timing controls.
--outdir <path>: output directory for CSV and plot.
Outputs:
benchmark_parallel_adapt_qsceom.csv
benchmark_parallel_adapt_qsceom_speedup.png
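When reading benchmark results, speedup is conventionally the single-worker time divided by the N-worker time, and parallel efficiency is speedup divided by N. The column layout of the CSV is not described here, so the sketch below takes a plain dictionary instead of parsing the file; speedup_table is a hypothetical helper, and the timings are placeholder numbers chosen for easy arithmetic, not measurements.

```python
def speedup_table(timings):
    """Return {workers: (speedup, efficiency)} relative to the 1-worker time."""
    baseline = timings[1]
    table = {}
    for workers, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        table[workers] = (speedup, speedup / workers)
    return table


# Placeholder wall-clock seconds per worker count (NOT real benchmark data).
placeholder_timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}
table = speedup_table(placeholder_timings)
```

An efficiency that drops well below 1.0 as workers are added is the usual sign that overhead, rather than compute, is starting to dominate.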
Notes
Results depend on backend/device availability and CPU topology.
Set BLAS/OpenMP thread limits to avoid oversubscription when benchmarking.
Expect diminishing returns once synchronization and scheduling overhead approaches per-task compute cost.
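One common way to apply the thread-limit advice above is to set the usual BLAS/OpenMP environment variables before any numerical library is imported. The helper name limit_native_threads is hypothetical, and which variables actually take effect depends on the BLAS build in use, so treat this as a sketch rather than a guaranteed recipe.

```python
import os


def limit_native_threads(n_threads=1):
    """Cap native thread pools via environment variables read at library load."""
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n_threads)


# Call this before importing NumPy, PennyLane, or QCANT: most libraries read
# these variables only once, when they are first loaded.
limit_native_threads(1)
```

A rough rule of thumb is to keep max_workers multiplied by the per-worker thread limit at or below the number of physical cores.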