Parallelization Guide

This guide explains how QCANT parallelizes two expensive algorithm stages:

  • ADAPT-VQE operator-pool commutator/gradient scoring

  • qscEOM matrix-element construction

What Is Parallelized

ADAPT-VQE

  • Parallelized: independent commutator evaluations across candidate operators.

  • Serial: ADAPT iteration loop and parameter optimization.

qscEOM

  • Parallelized: diagonal and off-diagonal matrix-element evaluations.

  • Serial: final matrix assembly and eigenvalue solve.
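The split described above (independent evaluations in parallel, then serial assembly) can be sketched with the standard library. This is a generic illustration of the pattern, not QCANT's implementation: `matrix_element`, `evaluate_chunk`, and `build_symmetric_matrix` are hypothetical names, and a thread pool is used only to keep the example self-contained (a process pool exposes the same `map` interface).

```python
from concurrent.futures import ThreadPoolExecutor


def matrix_element(i, j):
    # Hypothetical stand-in for one expensive, independent evaluation.
    return 1.0 / (1 + i + j)


def evaluate_chunk(pairs):
    # Worker task: evaluate a batch of independent (i, j) entries.
    return [(i, j, matrix_element(i, j)) for i, j in pairs]


def build_symmetric_matrix(n, chunk_size=4, max_workers=2):
    # Upper-triangle index pairs, diagonal included.
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    chunks = [pairs[k:k + chunk_size] for k in range(0, len(pairs), chunk_size)]

    # Parallel stage: each chunk is independent of the others.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate_chunk, chunks))

    # Serial stage: assemble the full matrix from the returned entries.
    matrix = [[0.0] * n for _ in range(n)]
    for chunk in results:
        for i, j, value in chunk:
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix
```

The eigenvalue solve would follow the assembly step and stay serial, as in the list above.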

How To Enable It

ADAPT-VQE parallel gradients:

import QCANT

# symbols and geometry describe the molecule and must be defined beforehand.
params, excitations, energies = QCANT.adapt_vqe(
    symbols=symbols,
    geometry=geometry,
    adapt_it=3,
    basis="sto-3g",
    charge=0,
    spin=1,
    active_electrons=5,
    active_orbitals=5,
    device_name="default.qubit",
    parallel_gradients=True,
    parallel_backend="process",   # process | thread | auto
    max_workers=8,
    gradient_chunk_size=2,
)

qscEOM parallel matrix construction:

values = QCANT.qscEOM(
    symbols=symbols,
    geometry=geometry,
    active_electrons=6,
    active_orbitals=6,
    charge=0,
    params=params,
    ash_excitation=ash_excitation,
    basis="sto-3g",
    method="pyscf",
    shots=0,
    symmetric=True,
    parallel_matrix=True,
    parallel_backend="process",   # process | thread | auto
    max_workers=8,
    matrix_chunk_size=20,
)

Backend Selection

  • parallel_backend="process": preferred for CPU-bound, QNode-heavy workloads, since separate processes are not serialized by Python's global interpreter lock.

  • parallel_backend="thread": useful where process creation is restricted.

  • parallel_backend="auto": uses the process backend on POSIX systems and the thread backend on Windows.

If a process pool cannot be created (for example, in a restricted environment), QCANT automatically falls back to the thread backend.
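The selection and fallback rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not QCANT's actual code: `resolve_backend` and `make_executor` are hypothetical helpers, and the set of exceptions treated as "process pool unavailable" is a guess. Note that `ProcessPoolExecutor` may start its workers lazily, so a failure can also surface at the first `submit` call, which this sketch does not handle.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def resolve_backend(parallel_backend):
    # "auto": process pool on POSIX, thread pool elsewhere (e.g. Windows).
    if parallel_backend == "auto":
        return "process" if os.name == "posix" else "thread"
    if parallel_backend not in ("process", "thread"):
        raise ValueError(f"unknown backend: {parallel_backend!r}")
    return parallel_backend


def make_executor(parallel_backend="auto", max_workers=None):
    backend = resolve_backend(parallel_backend)
    if backend == "process":
        try:
            return ProcessPoolExecutor(max_workers=max_workers)
        except (OSError, NotImplementedError, ImportError):
            # Assumed construction-time failure modes in restricted
            # environments; fall through to the thread pool.
            pass
    return ThreadPoolExecutor(max_workers=max_workers)
```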

Tuning Parameters

  • max_workers: worker count for the selected backend.

  • gradient_chunk_size: number of ADAPT candidates per submitted task.

  • matrix_chunk_size: number of qscEOM matrix entries per submitted task.

Practical defaults:

  • Start by comparing max_workers values of 2, 4, and 8.

  • For ADAPT gradients, start with gradient_chunk_size=2.

  • For larger qscEOM matrices, use smaller chunks (for example, matrix_chunk_size=20) to improve load balance.
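To see how the chunk size interacts with the worker count, it helps to count the submitted tasks. The sketch below assumes fixed-size chunks and, for the symmetric case, that only entries on and above the diagonal are evaluated. Both are assumptions about the scheduling, and the helper names are hypothetical.

```python
import math


def task_count(n_items, chunk_size):
    # Number of submitted tasks when items are grouped into fixed-size chunks.
    return math.ceil(n_items / chunk_size)


def upper_triangle_entries(dim):
    # Entries on and above the diagonal of a dim x dim symmetric matrix.
    return dim * (dim + 1) // 2
```

For an illustrative 100 x 100 symmetric matrix, `upper_triangle_entries(100)` gives 5050 entries, and `task_count(5050, 20)` gives 253 tasks, or roughly 32 tasks per worker with 8 workers. Many tasks per worker lets the pool even out differences in per-task runtime, while very small chunks increase scheduling overhead.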

Benchmarking

QCANT includes a benchmark script:

python scripts/benchmark_parallel_adapt_qsceom.py --profile small --repeats 1
python scripts/benchmark_parallel_adapt_qsceom.py --profile large --repeats 1

Options:

  • --profile small|large: workload size.

  • --workers 1 2 4 8: worker counts to test.

  • --repeats N and --warmup N: timing controls.

  • --outdir <path>: output directory for CSV and plot.

Outputs:

  • benchmark_parallel_adapt_qsceom.csv

  • benchmark_parallel_adapt_qsceom_speedup.png
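When interpreting timings, speedup and parallel efficiency relative to the single-worker run are the usual figures of merit. The helper below is a generic sketch with a hypothetical name. It does not read the benchmark CSV, whose column layout is not described here, and it expects the caller to supply measured wall-clock times.

```python
def speedup_table(timings):
    # timings maps worker count -> wall-clock seconds; must include 1 worker.
    baseline = timings[1]
    rows = []
    for workers in sorted(timings):
        speedup = baseline / timings[workers]   # T(1) / T(n)
        efficiency = speedup / workers          # fraction of ideal scaling
        rows.append((workers, speedup, efficiency))
    return rows
```

An efficiency well below 1.0 at higher worker counts indicates that overhead, rather than compute, is starting to dominate.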

Notes

  • Results depend on backend/device availability and CPU topology.

  • Set BLAS/OpenMP thread limits (for example, OMP_NUM_THREADS=1) so that worker-level parallelism does not oversubscribe the CPU when benchmarking.

  • Expect diminishing returns once synchronization and scheduling overhead approaches the per-task compute cost.
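One common way to cap library-level threading is through environment variables set before the numerical libraries are imported. The sketch below assumes an OpenMP-, OpenBLAS-, or MKL-backed stack. Which of these variables actually applies depends on how NumPy and the simulator backend were built.

```python
import os

# Must run before importing NumPy/SciPy or any library that loads BLAS/OpenMP,
# because the thread pools are typically sized when the library is loaded.
for name in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[name] = "1"
```

With one library thread per worker, the total thread count stays close to max_workers, which keeps benchmark timings comparable across worker counts.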