easyscience · rozyczko · May 6, 2026 · May 9, 2026 · May 10, 2026 · May 10, 2026
diff --git a/docs/docs/tutorials/fitting-bayesian.ipynb b/docs/docs/tutorials/fitting-bayesian.ipynb
@@ -262,8 +262,11 @@
     "- `samples` (4000): the total number of posterior draws to generate across all chains;\n",
     "- `burn` (500): the number of initial *burn-in* iterations to discard &mdash; the sampler needs time to find the typical set of the posterior, and early samples are not representative;\n",
     "- `thin` (2): the *thinning* interval &mdash; only every second sample is kept, which reduces autocorrelation between consecutive draws;\n",
+    "- `population`: DREAM population count (number of parallel chains).\n",
     "\n",
-    "First, we switch to the BUMPS minimizer:\n"
+    "For expensive models, you can parallelise the population evaluation across multiple CPU processes by passing `n_workers`. This is covered in the next section.\n",
+    "\n",
+    "First, we switch to the BUMPS minimizer:"
    ]
   },
   {
@@ -298,6 +301,103 @@
     "print('parameters:', result['param_names'])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "01c7cf9d",
+   "metadata": {},
+   "source": [
+    "## Faster sampling with multiprocessing\n",
+    "\n",
+    "Each DREAM generation evaluates the model for an entire *population* of\n",
+    "candidate parameter sets. By default these evaluations run sequentially in a\n",
+    "single process. For **expensive models** (where each evaluation takes tens of\n",
+    "milliseconds or more) you can speed up sampling by distributing the population\n",
+    "across multiple worker processes.\n",
+    "\n",
+    "Set `n_workers` to the number of parallel workers:\n",
+    "\n",
+    "```python\n",
+    "result = fitter.mcmc_sample(\n",
+    "    ...,\n",
+    "    n_workers=4,   # evaluate 4 population members in parallel\n",
+    ")\n",
+    "```\n",
+    "\n",
+    "**How it works:** `mcmc_sample` serialises the BUMPS `FitProblem` (including the\n",
+    "model and all parameters) and spawns a `multiprocessing.Pool` using the\n",
+    "`spawn` start method. Each worker process independently evaluates a single\n",
+    "population member and returns the negative log-likelihood. The pool is\n",
+    "automatically closed after sampling finishes or if an error occurs.\n",
+    "\n",
+    "**When to use it:**\n",
+    "\n",
+    "| `n_workers` | Behaviour |\n",
+    "|---|---|\n",
+    "| `None` (default) | Sequential evaluation &mdash; no pool is created. |\n",
+    "| `1` | Also sequential, but explicitly requested. |\n",
+    "| `2` or more | Parallel evaluation using that many worker processes. |\n",
+    "\n",
+    "**Requirements:**\n",
+    "\n",
+    "- The `cloudpickle` package must be installed (it is a dependency of\n",
+    "  `easyscience`).\n",
+    "- The BUMPS `FitProblem` and your model's fit function must be\n",
+    "  *serialisable*. Models that capture non-picklable objects (e.g. open file\n",
+    "  handles, bare `lambda` closures over module-level state) will raise\n",
+    "  `FitError`. If this happens, fall back to `n_workers=None`.\n",
+    "- On Windows and macOS, the `spawn` start method is used automatically.\n",
+    "  Make sure the sampling call is inside an `if __name__ == '__main__':`\n",
+    "  guard when running from a script.\n",
+    "\n",
+    "**Choosing `n_workers`:** as a rule of thumb, set `n_workers` to the number\n",
+    "of physical CPU cores. If your model is very cheap (< 1 ms per evaluation),\n",
+    "the overhead of serialisation and inter-process communication may outweigh\n",
+    "the parallelism gain &mdash; try it and compare. The\n",
+    "`tools/benchmarks/sampling_mpi.py` script in the repository provides a\n",
+    "ready-made benchmark for your own model.\n",
+    "\n",
+    "```{note}\n",
+    "`n_workers` is capped at the DREAM population size (``population`` × number of\n",
+    "free parameters) because there is no benefit from more workers than there\n",
+    "are population members to evaluate.\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "47c4fb9b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import multiprocessing\n",
+    "import time\n",
+    "\n",
+    "# For this simple model the overhead of multiprocessing is not worth it,\n",
+    "# but the pattern below shows exactly how to enable it for expensive models.\n",
+    "\n",
+    "n_cores = multiprocessing.cpu_count()\n",
+    "print(f'Detected {n_cores} CPU cores.')\n",
+    "\n",
+    "# Example (commented out because the simple Lorentzian model is too cheap\n",
+    "# to benefit from parallelism; uncomment and adjust n_workers for your own\n",
+    "# expensive model):\n",
+    "#\n",
+    "# t0 = time.perf_counter()\n",
+    "# result_parallel = mle_fitter.mcmc_sample(\n",
+    "#     x=omega,\n",
+    "#     y=intensity_obs,\n",
+    "#     weights=1 / intensity_error,\n",
+    "#     samples=10000,\n",
+    "#     burn=500,\n",
+    "#     thin=2,\n",
+    "#     n_workers=min(4, n_cores),  # use up to 4 workers\n",
+    "# )\n",
+    "# elapsed = time.perf_counter() - t0\n",
+    "# print(f'Parallel sampling took {elapsed:.1f} s')\n",
+    "# print(f'Drew {result_parallel[\"draws\"].shape[0]} samples')"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "8766b170",

diff --git a/pixi.lock b/pixi.lock
diff --git a/pyproject.toml b/pyproject.toml
@@ -24,12 +24,13 @@ classifiers = [
 ]
 requires-python = '>=3.11'
 dependencies = [
-  'asteval', # Safely evaluate Python expressions from strings
-  'lmfit',   # Non-linear least squares fitting
-  'bumps',   # Bayesian uncertainty estimation
-  'dfo-ls',  # Derivative-free optimization
-  'numpy',   # Numerical computing
-  'scipp',   # Handling and analysis of scientific data
+  'asteval',     # Safely evaluate Python expressions from strings
+  'lmfit',       # Non-linear least squares fitting
+  'bumps',       # Bayesian uncertainty estimation
+  'cloudpickle', # Serialize fitting closures for multiprocessing
+  'dfo-ls',      # Derivative-free optimization
+  'numpy',       # Numerical computing
+  'scipp',       # Handling and analysis of scientific data
 ]
 
 [project.optional-dependencies]
@@ -211,7 +212,7 @@ select = [
 # Ignore specific rules globally
 ignore = [
   'COM812', # https://docs.astral.sh/ruff/rules/missing-trailing-comma/
-  # The following is replaced by 'D'/[tool.ruff.lint.pydocstyle] and [tool.pydoclint]
+  # The following is replaced by 'D' plus pydocstyle and pydoclint
   'DOC', # https://docs.astral.sh/ruff/rules/#pydoclint-doc
   # Disable, as [tool.format_docstring] split one-line docstrings into the canonical multi-line layout
   'D200', # https://docs.astral.sh/ruff/rules/unnecessary-multiline-docstring/

diff --git a/src/easyscience/fitting/fitter.py b/src/easyscience/fitting/fitter.py
@@ -421,9 +421,12 @@ def mcmc_sample(
         samples: int = 10000,
         burn: int = 2000,
         thin: int = 10,
+        chains: Optional[int] = None,
         population: Optional[int] = None,
+        seed: Optional[int] = None,
         vectorized: bool = False,
         sampler_kwargs: Optional[dict] = None,
+        n_workers: Optional[int] = None,
         progress_callback: Optional[Callable[[dict], Optional[bool]]] = None,
         abort_test: Optional[Callable[[], bool]] = None,
     ) -> dict:
@@ -450,14 +453,26 @@ def mcmc_sample(
         thin : int, default=10
             Thinning interval — only every ``thin``-th sample is kept,
             which reduces autocorrelation between consecutive draws.
+        chains : Optional[int], default=None
+            User-friendly alias for the BUMPS DREAM population count (number
+            of parallel chains).  Mutually exclusive with ``population``.
         population : Optional[int], default=None
-            BUMPS DREAM population count (number of parallel chains).
+            BUMPS DREAM population count for advanced users.
+        seed : Optional[int], default=None
+            Best-effort random seed.  BUMPS DREAM may use additional internal
+            RNG state that is not controlled by this seed, so exact
+            reproducibility is not guaranteed.
         vectorized : bool, default=False
             When ``True``, each x array may be multi-dimensional (e.g. an
             ``(N, M, 2)`` grid for a 2D model) and is left as-is.  When
             ``False`` (default), each x array is expected to be 1-D.
         sampler_kwargs : Optional[dict], default=None
             Additional keyword arguments forwarded to the BUMPS DREAM sampler.
+        n_workers : Optional[int], default=None
+            Number of worker processes used to evaluate the DREAM population.
+            Values of ``None`` and ``1`` use BUMPS' sequential mapper. Values
+            greater than ``1`` require the BUMPS problem and fit function to be
+            pickleable.
         progress_callback : Optional[Callable[[dict], Optional[bool]]], default=None
             Optional callback invoked at each DREAM generation.  The payload
             dict includes ``iteration`` and ``sampling: True``.
@@ -473,7 +488,8 @@ def mcmc_sample(
         Raises
         ------
         ValueError
-            If ``samples``, ``burn``, or ``thin`` are invalid.
+            If ``samples``, ``burn``, or ``thin`` are invalid, or if both
+            ``chains`` and ``population`` are provided with different values.
         RuntimeError
             If the active minimizer is not a BUMPS instance.
         """
@@ -505,8 +521,11 @@ def mcmc_sample(
                 samples=samples,
                 burn=burn,
                 thin=thin,
+                chains=chains,
                 population=population,
+                seed=seed,
                 sampler_kwargs=sampler_kwargs,
+                n_workers=n_workers,
                 progress_callback=progress_callback,
                 abort_test=abort_test,
             )