Parallelization Guide
=====================

This guide explains how QCANT parallelizes two expensive algorithm stages:

- ADAPT-VQE operator-pool commutator/gradient scoring
- qscEOM matrix-element construction

What Is Parallelized
--------------------

ADAPT-VQE

- Parallelized: independent commutator evaluations across candidate operators.
- Serial: the ADAPT iteration loop and parameter optimization.

qscEOM

- Parallelized: diagonal and off-diagonal matrix-element evaluations.
- Serial: final matrix assembly and the eigenvalue solve.

How To Enable It
----------------

ADAPT-VQE parallel gradients:

.. code-block:: python

    import QCANT

    # symbols and geometry describe the molecule and are assumed to be defined already.
    params, excitations, energies = QCANT.adapt_vqe(
        symbols=symbols,
        geometry=geometry,
        adapt_it=3,
        basis="sto-3g",
        charge=0,
        spin=1,
        active_electrons=5,
        active_orbitals=5,
        device_name="default.qubit",
        parallel_gradients=True,
        parallel_backend="process",  # process | thread | auto
        max_workers=8,
        gradient_chunk_size=2,
    )

qscEOM parallel matrix construction:

.. code-block:: python

    import QCANT

    # symbols, geometry, params and ash_excitation are assumed to be defined already.
    values = QCANT.qscEOM(
        symbols=symbols,
        geometry=geometry,
        active_electrons=6,
        active_orbitals=6,
        charge=0,
        params=params,
        ash_excitation=ash_excitation,
        basis="sto-3g",
        method="pyscf",
        shots=0,
        symmetric=True,
        parallel_matrix=True,
        parallel_backend="process",  # process | thread | auto
        max_workers=8,
        matrix_chunk_size=20,
    )

Backend Selection
-----------------

- ``parallel_backend="process"``: preferred for CPU-bound, QNode-heavy workloads.
- ``parallel_backend="thread"``: useful where process creation is restricted.
- ``parallel_backend="auto"``: uses the process backend on POSIX and the thread backend on Windows.

If process pools cannot be created (for example, in a restricted environment), QCANT automatically falls back to the thread backend.

Tuning Parameters
-----------------

- ``max_workers``: number of workers for the selected backend.
- ``gradient_chunk_size``: number of ADAPT candidates per submitted task.
- ``matrix_chunk_size``: number of qscEOM matrix entries per submitted task.

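
To make the interplay of ``parallel_backend``, ``max_workers`` and ``gradient_chunk_size`` more concrete, here is a minimal, hypothetical sketch of chunked candidate scoring built on Python's standard ``concurrent.futures`` module. It is not QCANT's implementation: ``score_chunk`` is a toy stand-in for the per-operator commutator evaluation, and ``chunked``, ``make_executor`` and ``parallel_scores`` are illustrative names only. The sketch mirrors the behavior described above: ``auto`` resolves to processes on POSIX and threads elsewhere, and a process pool that cannot be created falls back to threads.

.. code-block:: python

    import os
    from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


    def chunked(items, chunk_size):
        """Split items into consecutive chunks of at most chunk_size elements."""
        if chunk_size < 1:
            raise ValueError("chunk_size must be >= 1")
        return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


    def score_chunk(chunk):
        """Toy stand-in for evaluating one chunk of candidate operators.

        A real implementation would evaluate a commutator/gradient per candidate;
        here each integer candidate c simply scores (c - 3) ** 2.
        """
        return [(c - 3) ** 2 for c in chunk]


    def make_executor(backend, max_workers) -> Executor:
        """Resolve "auto" and build a pool, falling back to threads if needed."""
        if backend == "auto":
            backend = "process" if os.name == "posix" else "thread"
        if backend not in ("process", "thread"):
            raise ValueError(f"unknown backend: {backend!r}")
        if backend == "process":
            try:
                return ProcessPoolExecutor(max_workers=max_workers)
            except (OSError, NotImplementedError, ImportError):
                pass  # restricted environment: use threads instead
        return ThreadPoolExecutor(max_workers=max_workers)


    def parallel_scores(candidates, backend="auto", max_workers=4, chunk_size=2):
        """Score all candidates in parallel, submitting one task per chunk."""
        chunks = chunked(list(candidates), chunk_size)
        with make_executor(backend, max_workers) as pool:
            # map() yields results in submission order, so the flattened
            # scores line up with the original candidate order.
            return [score for chunk_scores in pool.map(score_chunk, chunks)
                    for score in chunk_scores]


    if __name__ == "__main__":
        scores = parallel_scores(range(6), backend="thread", max_workers=2, chunk_size=4)
        best = max(range(len(scores)), key=scores.__getitem__)
        print(scores, best)  # [9, 4, 1, 0, 1, 4] 0

Because each task carries a whole chunk, larger chunks amortize per-task submission (and, for processes, pickling) overhead, while smaller chunks spread uneven work more evenly across workers. With the process backend, the scoring function must be defined at module top level so that it can be pickled and sent to the workers.
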

Practical defaults:

- Start with ``max_workers`` set to 2, 4, or 8.
- For ADAPT gradients, start with ``gradient_chunk_size=2``.
- For larger qscEOM matrices, use smaller chunks (for example, ``matrix_chunk_size=20``) to improve load balancing.

Benchmarking
------------

QCANT includes a benchmark script:

.. code-block:: bash

    python scripts/benchmark_parallel_adapt_qsceom.py --profile small --repeats 1
    python scripts/benchmark_parallel_adapt_qsceom.py --profile large --repeats 1

Options:

- ``--profile small|large``: workload size.
- ``--workers 1 2 4 8``: worker counts to test.
- ``--repeats N`` and ``--warmup N``: timing controls (timed repetitions and warm-up runs).
- ``--outdir``: output directory for the CSV file and plot.

Outputs:

- ``benchmark_parallel_adapt_qsceom.csv``
- ``benchmark_parallel_adapt_qsceom_speedup.png``

Notes
-----

- Results depend on backend/device availability and CPU topology.
- Set BLAS/OpenMP thread limits to avoid oversubscription when benchmarking.
- Expect diminishing returns once synchronization and scheduling overhead approaches the per-task compute cost.
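
The oversubscription note above can be acted on before any numerical library is imported. The following minimal sketch, which is not part of QCANT, divides the machine's cores among ``max_workers`` pool workers and writes the result into the conventional thread-count environment variables. It assumes NumPy/SciPy are linked against a BLAS or OpenMP runtime that honors one of these variables; ``threads_per_worker`` and ``limit_native_threads`` are hypothetical helper names.

.. code-block:: python

    import os

    # Conventional thread-count variables read by OpenMP, OpenBLAS, Intel MKL,
    # Apple Accelerate (vecLib) and NumExpr when those libraries load.
    THREAD_ENV_VARS = (
        "OMP_NUM_THREADS",
        "OPENBLAS_NUM_THREADS",
        "MKL_NUM_THREADS",
        "VECLIB_MAXIMUM_THREADS",
        "NUMEXPR_NUM_THREADS",
    )


    def threads_per_worker(max_workers, cpu_count=None):
        """Split the available cores evenly across pool workers, at least 1 each."""
        cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        return max(1, cores // max(1, max_workers))


    def limit_native_threads(n_threads, env=None):
        """Write the thread caps into env (os.environ by default).

        This generally only takes effect if it runs before NumPy/SciPy (and
        hence the BLAS library) are first imported in this process.
        """
        target = os.environ if env is None else env
        for name in THREAD_ENV_VARS:
            target[name] = str(n_threads)
        return target


    if __name__ == "__main__":
        max_workers = 8
        limit_native_threads(threads_per_worker(max_workers))
        print({name: os.environ[name] for name in THREAD_ENV_VARS})
        # Import numpy, pennylane and QCANT only after this point.

Worker processes started afterwards inherit these variables. From the shell, the same effect comes from prefixing the benchmark command, for example ``OMP_NUM_THREADS=1 python scripts/benchmark_parallel_adapt_qsceom.py --profile small``.
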