Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 101 additions & 1 deletion docs/docs/tutorials/fitting-bayesian.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -262,8 +262,11 @@
"- `samples` (4000): the total number of posterior draws to generate across all chains;\n",
"- `burn` (500): the number of initial *burn-in* iterations to discard — the sampler needs time to find the typical set of the posterior, and early samples are not representative;\n",
"- `thin` (2): the *thinning* interval — only every second sample is kept, which reduces autocorrelation between consecutive draws;\n",
"- `population`: DREAM population count (number of parallel chains).\n",
"\n",
"First, we switch to the BUMPS minimizer:\n"
"For expensive models, you can parallelise the population evaluation across multiple CPU processes by passing `n_workers`. This is covered in the next section.\n",
"\n",
"First, we switch to the BUMPS minimizer:"
]
},
{
Expand Down Expand Up @@ -298,6 +301,103 @@
"print('parameters:', result['param_names'])"
]
},
{
"cell_type": "markdown",
"id": "01c7cf9d",
"metadata": {},
"source": [
"## Faster sampling with multiprocessing\n",
"\n",
"Each DREAM generation evaluates the model for an entire *population* of\n",
"candidate parameter sets. By default these evaluations run sequentially in a\n",
"single process. For **expensive models** (where each evaluation takes tens of\n",
"milliseconds or more) you can speed up sampling by distributing the population\n",
"across multiple worker processes.\n",
"\n",
"Set `n_workers` to the number of parallel workers:\n",
"\n",
"```python\n",
"result = fitter.mcmc_sample(\n",
" ...,\n",
" n_workers=4, # evaluate 4 population members in parallel\n",
")\n",
"```\n",
"\n",
"**How it works:** `mcmc_sample` serialises the BUMPS `FitProblem` (including the\n",
"model and all parameters) and spawns a `multiprocessing.Pool` using the\n",
"`spawn` start method. Each worker process independently evaluates a single\n",
"population member and returns the negative log-likelihood. The pool is\n",
"automatically closed after sampling finishes or if an error occurs.\n",
"\n",
"**When to use it:**\n",
"\n",
"| `n_workers` | Behaviour |\n",
"|---|---|\n",
"| `None` (default) | Sequential evaluation — no pool is created. |\n",
"| `1` | Also sequential, but explicitly requested. |\n",
"| `2` or more | Parallel evaluation using that many worker processes. |\n",
"\n",
"**Requirements:**\n",
"\n",
"- The `cloudpickle` package must be installed (it is a dependency of\n",
" `easyscience`).\n",
"- The BUMPS `FitProblem` and your model's fit function must be\n",
" *serialisable*. Models that capture non-picklable objects (e.g. open file\n",
" handles, bare `lambda` closures over module-level state) will raise\n",
" `FitError`. If this happens, fall back to `n_workers=None`.\n",
"- On Windows and macOS, the `spawn` start method is used automatically.\n",
" Make sure the sampling call is inside an `if __name__ == '__main__':`\n",
" guard when running from a script.\n",
"\n",
"**Choosing `n_workers`:** as a rule of thumb, set `n_workers` to the number\n",
"of physical CPU cores. If your model is very cheap (< 1 ms per evaluation),\n",
"the overhead of serialisation and inter-process communication may outweigh\n",
"the parallelism gain &mdash; try it and compare. The\n",
"`tools/benchmarks/sampling_mpi.py` script in the repository provides a\n",
"ready-made benchmark for your own model.\n",
"\n",
"```{note}\n",
"`n_workers` is capped at the DREAM population size (``population`` × number of\n",
"free parameters) because there is no benefit from more workers than there\n",
"are population members to evaluate.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47c4fb9b",
"metadata": {},
"outputs": [],
"source": [
"import multiprocessing\n",
"import time\n",
"\n",
"# For this simple model the overhead of multiprocessing is not worth it,\n",
"# but the pattern below shows exactly how to enable it for expensive models.\n",
"\n",
"n_cores = multiprocessing.cpu_count()\n",
"print(f'Detected {n_cores} CPU cores.')\n",
"\n",
"# Example (commented out because the simple Lorentzian model is too cheap\n",
"# to benefit from parallelism; uncomment and adjust n_workers for your own\n",
"# expensive model):\n",
"#\n",
"# t0 = time.perf_counter()\n",
"# result_parallel = mle_fitter.mcmc_sample(\n",
"# x=omega,\n",
"# y=intensity_obs,\n",
"# weights=1 / intensity_error,\n",
"# samples=10000,\n",
"# burn=500,\n",
"# thin=2,\n",
"# n_workers=min(4, n_cores), # use up to 4 workers\n",
"# )\n",
"# elapsed = time.perf_counter() - t0\n",
"# print(f'Parallel sampling took {elapsed:.1f} s')\n",
"# print(f'Drew {result_parallel[\"draws\"].shape[0]} samples')"
]
},
{
"cell_type": "markdown",
"id": "8766b170",
Expand Down
1 change: 1 addition & 0 deletions pixi.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 8 additions & 7 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,12 +24,13 @@ classifiers = [
]
requires-python = '>=3.11'
dependencies = [
'asteval', # Safely evaluate Python expressions from strings
'lmfit', # Non-linear least squares fitting
'bumps', # Bayesian uncertainty estimation
'dfo-ls', # Derivative-free optimization
'numpy', # Numerical computing
'scipp', # Handling and analysis of scientific data
'asteval', # Safely evaluate Python expressions from strings
'lmfit', # Non-linear least squares fitting
'bumps', # Bayesian uncertainty estimation
'cloudpickle', # Serialize fitting closures for multiprocessing
'dfo-ls', # Derivative-free optimization
'numpy', # Numerical computing
'scipp', # Handling and analysis of scientific data
]

[project.optional-dependencies]
Expand Down Expand Up @@ -211,7 +212,7 @@ select = [
# Ignore specific rules globally
ignore = [
'COM812', # https://docs.astral.sh/ruff/rules/missing-trailing-comma/
# The following is replaced by 'D'/[tool.ruff.lint.pydocstyle] and [tool.pydoclint]
# The following is replaced by 'D' plus pydocstyle and pydoclint
'DOC', # https://docs.astral.sh/ruff/rules/#pydoclint-doc
# Disable, as [tool.format_docstring] split one-line docstrings into the canonical multi-line layout
'D200', # https://docs.astral.sh/ruff/rules/unnecessary-multiline-docstring/
Expand Down
23 changes: 21 additions & 2 deletions src/easyscience/fitting/fitter.py
Original file line number Diff line number Diff line change
Expand Up @@ -421,9 +421,12 @@ def mcmc_sample(
samples: int = 10000,
burn: int = 2000,
thin: int = 10,
chains: Optional[int] = None,
population: Optional[int] = None,
seed: Optional[int] = None,
vectorized: bool = False,
sampler_kwargs: Optional[dict] = None,
n_workers: Optional[int] = None,
progress_callback: Optional[Callable[[dict], Optional[bool]]] = None,
abort_test: Optional[Callable[[], bool]] = None,
) -> dict:
Expand All @@ -450,14 +453,26 @@ def mcmc_sample(
thin : int, default=10
Thinning interval — only every ``thin``-th sample is kept,
which reduces autocorrelation between consecutive draws.
chains : Optional[int], default=None
User-friendly alias for the BUMPS DREAM population count (number
of parallel chains). Mutually exclusive with ``population``.
population : Optional[int], default=None
BUMPS DREAM population count (number of parallel chains).
BUMPS DREAM population count for advanced users.
seed : Optional[int], default=None
Best-effort random seed. BUMPS DREAM may use additional internal
RNG state that is not controlled by this seed, so exact
reproducibility is not guaranteed.
vectorized : bool, default=False
When ``True``, each x array may be multi-dimensional (e.g. an
``(N, M, 2)`` grid for a 2D model) and is left as-is. When
``False`` (default), each x array is expected to be 1-D.
sampler_kwargs : Optional[dict], default=None
Additional keyword arguments forwarded to the BUMPS DREAM sampler.
n_workers : Optional[int], default=None
Number of worker processes used to evaluate the DREAM population.
Values of ``None`` and ``1`` use BUMPS' sequential mapper. Values
greater than ``1`` require the BUMPS problem and fit function to be
pickleable.
progress_callback : Optional[Callable[[dict], Optional[bool]]], default=None
Optional callback invoked at each DREAM generation. The payload
dict includes ``iteration`` and ``sampling: True``.
Expand All @@ -473,7 +488,8 @@ def mcmc_sample(
Raises
------
ValueError
If ``samples``, ``burn``, or ``thin`` are invalid.
If ``samples``, ``burn``, or ``thin`` are invalid, or if both
``chains`` and ``population`` are provided with different values.
RuntimeError
If the active minimizer is not a BUMPS instance.
"""
Expand Down Expand Up @@ -505,8 +521,11 @@ def mcmc_sample(
samples=samples,
burn=burn,
thin=thin,
chains=chains,
population=population,
seed=seed,
sampler_kwargs=sampler_kwargs,
n_workers=n_workers,
progress_callback=progress_callback,
abort_test=abort_test,
)
Expand Down
Loading
Loading