What happens?
Training the same model on the same data (using the DuckDB backend) results in slightly different model parameters (m/u probabilities) in each run. The differences are very small (< 1e-10), so this is most likely a floating-point reproducibility issue.
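As a minimal illustration of why floating-point drift at this magnitude is plausible: floating-point addition is not associative, so any aggregation whose summation order varies between runs (e.g. multi-threaded partial sums, as in a parallel SQL engine) can produce tiny run-to-run differences that compound over EM iterations. The link to DuckDB's internals here is an assumption, not verified.

```python
# Floating-point addition is not associative: regrouping the same three
# terms changes the last bits of the result.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False
print(abs(a - b))       # on the order of 1e-16
```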
Although the differences are small, they make developing linkage models more cumbersome: the parameters always change slightly between runs, and it is not immediately clear where those differences come from.
I'm currently often resorting to rounding the model parameters in `splink.Linker._settings_obj` after training (to 10 decimal places, which seems to work reliably), but this feels like a hack.
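A generic sketch of the rounding workaround described above: recursively round every float in a nested parameter structure to 10 decimal places so that two training runs compare equal. The `params_run_*` dicts are invented for illustration; how this maps onto the internals of `splink.Linker._settings_obj` is an assumption, not verified against the library.

```python
def round_floats(obj, ndigits=10):
    """Recursively round all floats in nested dicts/lists to `ndigits` decimals."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, dict):
        return {k: round_floats(v, ndigits) for k, v in obj.items()}
    if isinstance(obj, list):
        return [round_floats(v, ndigits) for v in obj]
    return obj

# Hypothetical parameters from two runs, differing below 1e-10:
params_run_1 = {"m_probability": 0.93412345678901, "u_probabilities": [0.01000000000012]}
params_run_2 = {"m_probability": 0.93412345678905, "u_probabilities": [0.01000000000009]}
print(round_floats(params_run_1) == round_floats(params_run_2))  # True
```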
To Reproduce
`reproduce_em_nondeterminism.py`
OS:
MacOS
Splink version:
4.0.12
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?