ACCES Tutorial#

Introduction#

ACCES - or autonomous calibration and characterisation via evolutionary software - can calibrate virtually any simulation parameters against a user-defined cost function, quantifying and then minimising the “difference” between the simulated system and experimental reality – in essence, autonomously ‘learning’ the physical properties of the particles within the system, without the need for human input. This cost function is completely general, allowing ACCESS to calibrate simulations against measurements as simple as photographed occupancy plots, or complex system properties captured through e.g. Lagrangian particle tracking.

Example cost function quantifying the disparity between a simulation and reality.

But what does this “difference between simulation and reality” look like? Damn awful:

We ran the same DEM simulation of a vibrofluidised bed on a 100x100 grid of parameters (10,000 simulations indeed, thank you BlueBEAR @Birmingham), then compared central simulation’s occupancy profile with the other simulations.

Fundamentally, this is an optimisation problem: what parameters do I need such that my simulation reproduces an experimental measurement? But it is a horrible one, with the error function typically being noisy, non-smooth and having many false, local minima. In mathematical terms, it is non-convex and non-smooth - so gradient-based optimisers go out the window. Plus, a single simulation (a single point on the graph above) can take hours or days to run.

Enter CMA-ES [CMAES], a gradient-free global optimisation strategy that consistently ranks amongst the best optimisers for tough, global problems [OptBench] which forms the core of ACCES.

ACCES can take arbitrary Python scripts defining a simulation, automatically parallelise them to execute efficiently on multithreaded laptops or even distributed computing clusters and deterministically optimise the user-defined free parameters. It is fault-tolerant and can return to the latest optimisation state even after e.g. a system crash.

[CMAES]

Hansen N, Müller SD, Koumoutsakos P. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation. 2003 Mar;11(1):1-8.

[OptBench]

Rios LM, Sahinidis NV. Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization. 2013 Jul;56(3):1247-93.

Example ACCES Run#

Simulations are oftentimes huge beasts that are hard to set up and run correctly / efficiently; ask a modeller to re-write their simulation as a function (which is what virtually all optimisers expect) and you might not be friends anymore. ACCES takes a different approach: it accepts entire simulation scripts; so you can take even pre-existing simulations and just define the free parameters ACCES can vary at the top, e.g.:

# In file `simulation_script.py`

# ACCESS PARAMETERS START
import coexist

parameters = coexist.create_parameters(
    variables = ["CED", "CoR", "Epsilon", "Mu"],
    minimums = [-5, -5, -5, -5],
    maximums = [+5, +5, +5, +5],
    values = [0, 0, 0, 0],          # Optional, initial guess
)

access_id = 0                       # Optional, unique ID for each simulation
# ACCESS PARAMETERS END


# Extract the free parameters' values - ACCES will modify / optimise them.
x, y, z, t = parameters["value"]


# Define the error value in *any* way - run a simulation, analyse data, etc.
#
# For simplicity, here's an analytical 4D Himmelblau function with 8 global
# and 2 local minima - the initial guess is very close to the local one!
error = (x**2 + y - 11)**2 + (x + y**2 - 7)**2 + z * t

All you need to do is to create a variable called parameters (a simple pandas.DataFrame) storing the free parameters’ names and bounds, and optionally starting values and relative uncertainty. Then, by the end of the simulation script, just define a variable named exactly error, storing a number that will need to be minimised.

Importantly, the above simulation script can be run on its own - so while prototyping your simulation, just run the script and see if a single simulation works fine; if it does, then you can pass it to ACCES to optimise the error:

# In file `access_learn.py`, same folder as `simulation_script.py`

import coexist

# Use ACCESS to learn a simulation's parameters
access = coexist.Access("simulation_script.py")
access.learn(
    num_solutions = 10,         # Number of solutions per epoch
    target_sigma = 0.1,         # Target std-dev (accuracy) of solution
    random_seed = 42,           # Reproducible / deterministic optimisation
)

You will get an output showing the parameter combinations being tried and corresponding error values:

================================================================================
Starting ACCES run at 13:05:47 on 02/03/2022 in directory `access_seed42`.
================================================================================
Epoch    0 | Population 10
--------------------------------------------------------------------------------
Scaled overall standard deviation: 1.0
             CED       CoR   Epsilon        Mu
estimate     0.0  0.000000  0.000000  0.000000
uncertainty  4.0  4.000050  4.000100  4.000150
scaled_std   1.0  1.000013  1.000025  1.000038

        CED       CoR   Epsilon        Mu       error
0  1.218868 -4.159988  3.001880  3.762400  331.093228
1 -3.095859 -4.967675  0.511374 -1.265018  252.732762
2 -0.067205 -3.412218  3.517680  3.111284  239.466423
3  0.264123  4.509021  1.870084 -3.437299  219.638764
4  1.475003 -3.835578  3.513889 -0.199711  243.967225
5 -0.739449 -2.723752  4.825957 -0.618141  170.752129
6 -1.713311 -1.408552  2.129290  1.461831  138.135979
7  1.650930  1.723306  2.333195 -1.625721   44.785098
8 -2.048971 -3.255132  2.463979  4.516059  114.660635
9 -0.455790 -3.360668 -3.298007  2.602469  206.454823
Total function evaluations: 10

================================================================================
Epoch    1 | Population 10
--------------------------------------------------------------------------------
Scaled overall standard deviation: 1.0346092474852793
                  CED       CoR   Epsilon        Mu
estimate    -0.154096 -0.641435  4.978669  0.731821
uncertainty  3.895601  3.679495  4.795107  4.013030
scaled_std   0.973900  0.919874  1.198777  1.003258

[...output truncated...]

================================================================================
Epoch   31 | Population 10
--------------------------------------------------------------------------------
Scaled overall standard deviation: 0.10326429766691796
                  CED       CoR   Epsilon        Mu
estimate     3.621098 -1.748473  4.999445 -4.992881
uncertainty  0.052355  0.078080  0.284105  0.180196
scaled_std   0.013089  0.019520  0.071026  0.045049

          CED       CoR   Epsilon        Mu      error
310  3.574473 -1.650134  4.999374 -4.901913 -23.996813
311  3.582026 -1.756384  4.863496 -4.973417 -24.071692
312  3.628789 -1.616891  4.859678 -4.992656 -23.386000
313  3.662351 -1.782704  4.982040 -4.998059 -24.478008
314  3.594971 -1.725715  4.999877 -4.898257 -24.269162
315  3.577725 -1.744971  4.998411 -4.956502 -24.629198
316  3.613914 -1.747412  4.983253 -4.996630 -24.690880
317  3.579212 -1.774811  4.852262 -4.972910 -24.055219
318  3.634952 -1.784927  4.999971 -4.999863 -24.783959
319  3.647169 -1.872419  4.999971 -4.978640 -24.685207
Total function evaluations: 320


Optimal solution found within `target_sigma`, i.e. 10.0%:
  sigma = 0.08390460663313563 < 0.1

================================================================================
The best result was found in 32 epochs:
              CED       CoR   Epsilon        Mu      error
  value  3.569249 -1.813354  4.995112 -4.994052 -24.920092
  sigma  0.042168  0.060116  0.203900  0.140874

These results were found for the job:
  access_seed42/results/parameters.284.pickle
================================================================================

Take a moment to appreciate what we’ve done here: out of 8 global and 2 local - false - minima, we have found an optimum 4-dimensional parameter combination with only 320 function evaluations. Compare that with a classical approach of regularly sampling on a 10x10x10x10 grid - we used 30 times fewer “simulations”, yet we found an optimum within 0.32% of the analytical solution!

Each ACCES run creates a folder “access_seed<random_seed>” saving the optimisation state. You can access (pun intended) it using coexist.AccessData(), even while the optimisation is still running for intermediate results:

>>> access_data = coexist.AccessData("access_seed42")
>>> access_data

AccessData
--------------------------------------------------------------------------------
paths             ╎ AccessPaths(...)
parameters        ╎             value  min  max     sigma
                  ╎ CED      3.569249 -5.0  5.0  0.052355
                  ╎ CoR     -1.813354 -5.0  5.0  0.078080
                  ╎ Epsilon  4.995112 -5.0  5.0  0.284105
                  ╎ Mu      -4.994052 -5.0  5.0  0.180196
parameters_scaled ╎             value   min   max     sigma
                  ╎ CED      0.892312 -1.25  1.25  0.013089
                  ╎ CoR     -0.453338 -1.25  1.25  0.019520
                  ╎ Epsilon  1.248778 -1.25  1.25  0.071026
                  ╎ Mu      -1.248513 -1.25  1.25  0.045049
scaling           ╎ [4. 4. 4. 4.]
population        ╎ 10
num_epochs        ╎ 32
target            ╎ 0.1
seed              ╎ 42
epochs            ╎ DataFrame(CED_mean, CoR_mean, Epsilon_mean, Mu_mean,
                  ╎           CED_std, CoR_std, Epsilon_std, Mu_std, overall_std)
epochs_scaled     ╎ DataFrame(CED_mean, CoR_mean, Epsilon_mean, Mu_mean,
                  ╎           CED_std, CoR_std, Epsilon_std, Mu_std, overall_std)
results           ╎ DataFrame(CED, CoR, Epsilon, Mu, error)
results_scaled    ╎ DataFrame(CED, CoR, Epsilon, Mu, error)

You can create a “convergence plot” showing the evolution of the ACCES run using coexist.plots.access:

import coexist

# Use path to either the `access_seed<random_seed>` folder itself, or its
# parent directory
access_data = coexist.AccessData("access_seed42")

fig = coexist.plots.access(access_data)
fig.show()

Which will produce the following interactive Plotly figure:

If you zoom into the error value, you’ll see that ACCES effectively found the optimum in less than 15 epochs; while this particular error function is smooth and a gradient-based optimiser may be quicker if the initial guess is close to a global optimum, this can never be assumed with physical simulations and noisy measurements (see the image at the top of the page).

You can visualise 2D slices through the parameter space explored, colour-coded by the error value of the closest parameter combination tried:

import coexist

# Use path to either the `access_seed<random_seed>` folder itself, or its
# parent directory
access_data = coexist.AccessData("access_seed42")

fig = coexist.plots.access2d(access_data)
fig.show()

Which will produce the following interactive Plotly figure:

The dots are the parameter combinations tried; the cells’ colours represent the closest simulation’s error value (darker = smaller). The smaller the cell, the more simulations were run in that region - notice how ACCES spends minimum computational time on unimportant, high-error areas and more around the global optimum.

ACCES on Supercomputing Clusters#

ACCES can also run each simulation as separate, massively-parallel jobs on distributed supercomputing clusters using the coexist.schedulers interface. For example, for executing simulations as sbatch jobs on a SLURM-managed supercomputer (such as the awesome BlueBEAR @Birmingham):

import coexist
from coexist.schedulers import SlurmScheduler


scheduler = SlurmScheduler(
    "10:0:0",           # Time allocated for a single simulation
    commands = '''
        # Commands you'd add in the sbatch script, after `#`
        set -e
        module purge; module load bluebear
        module load BEAR-Python-DataScience/2020a-foss-2020a-Python-3.8.2
    ''',
    qos = "bbdefault",
    account = "windowcr-rt-royalsociety",
    constraint = "cascadelake",     # Any other #SBATCH -- <CMD> = "VAL" pair
)

# Use ACCESS to learn the simulation parameters
access = coexist.Access("simulation_script.py", scheduler)
access.learn(num_solutions = 100, target_sigma = 0.1, random_seed = 12345)

Same script as before, except for the scheduler second argument to coexist.Access. For full details - and extra possible settings - do check out the “Manual” tab at the top of the page.