Parallelization Guide

This guide explains how QCANT parallelizes two expensive algorithm stages:

ADAPT-VQE operator-pool commutator/gradient scoring
qscEOM matrix-element construction

What Is Parallelized

ADAPT-VQE

Parallelized: independent commutator evaluations across candidate operators.
Serial: ADAPT iteration loop and parameter optimization.

qscEOM

Parallelized: diagonal and off-diagonal matrix-element evaluations.
Serial: final matrix assembly and eigenvalue solve.
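The split between parallel and serial stages can be illustrated with a minimal sketch. This is not QCANT's implementation: `score_candidate`, `score_chunk`, and `parallel_scores` are hypothetical names, the scoring function is a placeholder for one commutator or matrix-element evaluation, and a thread pool is used here only so the snippet runs anywhere without a `__main__` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidate(index):
    # Hypothetical stand-in for one independent commutator or
    # matrix-element evaluation.
    return index * index


def score_chunk(chunk):
    # One submitted task evaluates a whole chunk of candidates.
    return [score_candidate(i) for i in chunk]


def parallel_scores(candidates, chunk_size, max_workers):
    # Split the candidate list into consecutive chunks.
    chunks = [candidates[i:i + chunk_size]
              for i in range(0, len(candidates), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so the flattened scores
        # line up with the original candidate list.
        results = pool.map(score_chunk, chunks)
        scores = [s for chunk_scores in results for s in chunk_scores]
    # Serial stage: select the largest-magnitude score once all
    # parallel evaluations have finished.
    best = max(range(len(scores)), key=lambda k: abs(scores[k]))
    return scores, best
```

The evaluations inside the pool are independent of one another, which is what makes them safe to distribute; the selection step afterwards needs every result and therefore stays serial.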

How To Enable It

ADAPT-VQE parallel gradients:

import QCANT

# symbols and geometry (the molecule definition) must be defined beforehand.
params, excitations, energies = QCANT.adapt_vqe(
    symbols=symbols,
    geometry=geometry,
    adapt_it=3,
    basis="sto-3g",
    charge=0,
    spin=1,
    active_electrons=5,
    active_orbitals=5,
    device_name="default.qubit",
    parallel_gradients=True,
    parallel_backend="process",   # process | thread | auto
    max_workers=8,
    gradient_chunk_size=2,
)

qscEOM parallel matrix construction:

# params and ash_excitation must be defined beforehand,
# for example from a prior ADAPT-VQE run.
values = QCANT.qscEOM(
    symbols=symbols,
    geometry=geometry,
    active_electrons=6,
    active_orbitals=6,
    charge=0,
    params=params,
    ash_excitation=ash_excitation,
    basis="sto-3g",
    method="pyscf",
    shots=0,
    symmetric=True,
    parallel_matrix=True,
    parallel_backend="process",   # process | thread | auto
    max_workers=8,
    matrix_chunk_size=20,
)

Backend Selection

parallel_backend="process": preferred for CPU-bound QNode-heavy workloads.
parallel_backend="thread": useful where process creation is restricted.
parallel_backend="auto": uses the process backend on POSIX systems and the thread backend on Windows.

If a process pool cannot be created (for example, in a restricted environment), QCANT automatically falls back to the thread backend.
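The selection and fallback rules above could be sketched as follows. This is an illustration under stated assumptions, not QCANT's internal code: `resolve_backend` and `make_executor` are hypothetical helpers, and the set of exceptions treated as "pool cannot be created" is a guess at what a restricted environment might raise.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def resolve_backend(parallel_backend):
    # "auto" maps to processes on POSIX and to threads elsewhere
    # (for example, on Windows).
    if parallel_backend == "auto":
        return "process" if os.name == "posix" else "thread"
    if parallel_backend in ("process", "thread"):
        return parallel_backend
    raise ValueError(f"unknown backend: {parallel_backend!r}")


def make_executor(parallel_backend, max_workers):
    # Returns the executor together with the backend actually in use.
    backend = resolve_backend(parallel_backend)
    if backend == "process":
        try:
            return ProcessPoolExecutor(max_workers=max_workers), "process"
        except (OSError, NotImplementedError, ImportError):
            # Assumed failure modes of a restricted environment:
            # fall through to the thread backend.
            pass
    return ThreadPoolExecutor(max_workers=max_workers), "thread"
```

Returning the effective backend name alongside the executor makes a silent fallback visible to the caller, which is useful when comparing benchmark runs.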

Tuning Parameters

max_workers: worker count for the selected backend.
gradient_chunk_size: number of ADAPT candidates per submitted task.
matrix_chunk_size: number of qscEOM matrix entries per submitted task.

Practical defaults:

Try max_workers values of 2, 4, and 8.
For ADAPT gradients, start with gradient_chunk_size=2.
For larger qscEOM matrices, use smaller chunks (for example, matrix_chunk_size=20) to improve load balance.
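To see why chunk size affects load balance, it helps to count how many tasks a given chunk size produces relative to the worker count. The helpers below are hypothetical and the item counts are illustrative only; they assume one task per chunk, as described above.

```python
import math


def task_count(n_items, chunk_size):
    # Number of submitted tasks when items are grouped into chunks;
    # the last chunk may be smaller than chunk_size.
    return math.ceil(n_items / chunk_size)


def tasks_per_worker(n_items, chunk_size, max_workers):
    # Average number of tasks each worker receives.
    return task_count(n_items, chunk_size) / max_workers


# Illustrative numbers: 400 matrix entries on 8 workers.
print(task_count(400, 20))            # 20 tasks
print(tasks_per_worker(400, 20, 8))   # 2.5 tasks per worker
print(task_count(400, 100))           # 4 tasks, so 4 of 8 workers stay idle
```

When the task count drops below the worker count, some workers have nothing to do; very small chunks avoid that but increase scheduling overhead, so the chunk size is a trade-off.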

Benchmarking

QCANT includes a benchmark script:

python scripts/benchmark_parallel_adapt_qsceom.py --profile small --repeats 1
python scripts/benchmark_parallel_adapt_qsceom.py --profile large --repeats 1

Options:

--profile small|large: workload size.
--workers 1 2 4 8: worker counts to test.
--repeats N and --warmup N: timing controls.
--outdir <path>: output directory for CSV and plot.
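For readers unfamiliar with warmup and repeat controls, the following sketch shows how such timing is commonly done. It is not taken from the benchmark script: `time_call` and `speedup` are hypothetical helpers, and the assumption is that warmup runs are executed but excluded from the measurements.

```python
import time


def time_call(func, repeats=1, warmup=0):
    # Warmup runs are executed but not timed, so one-off costs such as
    # imports and cache population do not distort the measurement.
    for _ in range(warmup):
        func()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # Return the best and the mean wall-clock time in seconds.
    return min(samples), sum(samples) / len(samples)


def speedup(serial_seconds, parallel_seconds):
    # Speedup relative to the single-worker baseline.
    return serial_seconds / parallel_seconds
```

A speedup curve is then obtained by timing the same workload at each worker count and dividing the single-worker time by each result.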

Outputs:

benchmark_parallel_adapt_qsceom.csv
benchmark_parallel_adapt_qsceom_speedup.png

Notes

Results depend on backend/device availability and CPU topology.
Set BLAS/OpenMP thread limits (for example, OMP_NUM_THREADS=1) to avoid oversubscription when benchmarking.
Expect diminishing returns once synchronization and scheduling overhead approaches per-task compute cost.
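One common way to apply the thread limits mentioned above is to set the standard environment variables before any numerical library is imported. The helper below is a hypothetical sketch: which variables actually take effect depends on the BLAS/OpenMP build in use, and one thread per worker is only a reasonable starting assumption.

```python
import os

# Environment variables read by OpenMP, OpenBLAS, and Intel MKL.
# They must be set before NumPy (or any library that loads BLAS) is
# imported, because the thread pools are sized at load time.
THREAD_LIMIT_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
)


def limit_native_threads(threads_per_worker=1):
    # Cap native threads so that max_workers * threads_per_worker does
    # not exceed the number of available cores.
    for name in THREAD_LIMIT_VARS:
        os.environ[name] = str(threads_per_worker)
    return {name: os.environ[name] for name in THREAD_LIMIT_VARS}


limit_native_threads(1)
```

The same variables can also be exported in the shell before launching the benchmark script, which avoids any dependence on import order.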