16.11.2023

LAMMPS, the Large-scale Atomic/Molecular Massively Parallel Simulator, is a classical molecular dynamics code with a focus on materials modelling. It is designed to run efficiently on parallel computers and was developed at Sandia National Laboratories. LAMMPS uses the Message Passing Interface (MPI) for parallel communication and is free and open-source software, distributed under the terms of the GNU General Public License.

Do you spot anything unusual in these run times? If so, can you explain this strange result?

Solution

The simulation takes almost the same amount of time when running on a single core as when running on two cores.

A more detailed look at the input

Domain decomposition

In the previous exercise, you will hopefully have noticed that, while the simulation run time decreases overall as the core count is increased, the run time was the same when run on one processor as it was when run on two processors. This unexpected behaviour (for a truly strong-scaling system, you would expect the simulation to run twice as fast on two cores as it does on a single core) can be explained by looking at our starting simulation state and understanding how LAMMPS handles domain decomposition.

In parallel computing, domain decomposition describes the methods used to split calculations across the cores being used by a simulation. How domain decomposition is handled varies from problem to problem.

In the field of molecular dynamics and, by extension, within LAMMPS, this decomposition is done through spatial decomposition: the simulation box is split up into a number of blocks, with each block being assigned to its own core.

The amount of work that a given core needs to do is directly linked to the number of atoms within its part of the domain. If a system is of uniform density (i.e. the atoms are spread evenly throughout the simulation box), then each core will be assigned roughly the same amount of work. If, however, your system is not evenly distributed, then you run the risk of having a small number of cores doing all of the work while the rest sit idle.


The system we have been simulating looks like this at the start of the simulation: as this is a system of non-uniform density, the default domain decomposition will not produce the desired results.

If you expect the distribution of atoms within your simulation to remain roughly constant throughout the simulation, you can use the balance command to run a one-off rebalancing of the simulation across the cores at the start of your simulation. On the other hand, if you expect the number of atoms per region of your system to fluctuate (e.g. because part of the system is still expanding or diffusing), you can use the fix balance command to rebalance the load dynamically at regular intervals throughout the simulation. The diagram below helps to illustrate how these work.
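The two approaches above can be sketched as LAMMPS input commands. This is a minimal sketch: the thresholds and iteration counts are illustrative values, not tuned recommendations.

```
# One-off rebalancing at the point this command appears in the input:
# shift domain boundaries in x, y and z (up to 10 iterations) until
# the load imbalance factor is at most 1.1
balance 1.1 shift xyz 10 1.1

# Dynamic rebalancing: repeat the same balancing every 1000 timesteps,
# but only if the imbalance exceeds the 1.1 threshold
fix loadbalance all balance 1000 1.1 shift xyz 10 1.1
```

The fix ID (here loadbalance) is arbitrary; see the LAMMPS balance and fix balance documentation for the other balancing styles (e.g. rcb, which requires comm_style tiled).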

Using better domain decomposition

Rerun your input, this time using dynamic load balancing. What do you notice about the runtimes? We are using the dynamic load balancing command: would the static, one-off balance command be effective here?

Solution

The runtimes decrease significantly when running with dynamic load balancing. In this case, static load balancing would not work, as the ethanol is still expanding to fill the simulation box.

Once the ethanol is evenly distributed within the box, you can remove the dynamic load balancing.


Playing around with dynamic load balancing

In the example, the fix balance command is set to recalculate the load balance every 1,000 timesteps. How does the runtime vary as you change this value?

I would recommend trying a range of values, e.g. from 10 up to 10,000.

Solution

The simulation time can vary drastically depending on how often rebalancing is carried out. When using dynamic rebalancing, there is an important trade-off between the time gained from rebalancing and the cost involved in recalculating the load balance among cores.

In general, when running a new simulation on a multi-core system, three of the values in the timing breakdown at the end of the log file are worth particular attention (though all of them will tell you where your simulation is spending most of its time): Pair indicates how much time is spent calculating pairwise particle interactions.

Ideally, when running a sensible system in a sensible fashion, timings will be dominated by this.

Neigh will let you know how much time is being spent building neighbour lists. Kspace will let you know how much time is being spent calculating long-ranged interactions. Comm lets you know how much time is spent in communication between cores. This should never dominate simulation times and, if it does, it is the most obvious sign that too many computational resources are being assigned to the simulation.

In the example above, we notice that the majority of the time is spent in the Neigh section, i.e. in building neighbour lists. Neighbour lists are a common method for speeding up simulations with short-ranged pairwise interactions.

Instead of considering the interactions between every pair of particles in a system, you can generate, for each particle, a list of all particles within the truncation cutoff plus a little extra (the skin distance). With this list, you then only need to work out how frequently it must be rebuilt. Doing this reduces the number of times that all interparticle distances need to be calculated: every few timesteps, the interparticle distances for all particle pairs are calculated to generate the neighbour list for each particle; in the intervening timesteps, only the interparticle distances between particles within a neighbour list need be calculated. As these are a much smaller proportion of the full system, this greatly reduces the total number of calculations.
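In a LAMMPS input, the skin distance and the rebuild schedule are controlled by the neighbor and neigh_modify commands. A minimal sketch, with values close to the common defaults rather than tuned settings:

```
# Neighbour lists include all particles within (pairwise cutoff + 2.0
# distance units); lists are built with a binning algorithm
neighbor 2.0 bin

# Consider a rebuild every timestep, but never sooner than 10 steps
# after the last build, and only actually rebuild if atoms have moved
# far enough to make the current lists stale
neigh_modify every 1 delay 10 check yes
```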

If we dig a bit deeper into the input, we can try changing how often the neighbour lists are rebuilt. How does this affect the simulation runtime? What happens now?


Neighbour lists only give physical results when the update interval is less than the time it would take for a particle outwith the neighbour cutoff to get to within the short-ranged interaction cutoff. If that condition is violated, the results generated by the simulation become questionable at best and, in the worst case, LAMMPS will crash.

If you know how many timesteps your short simulation ran for, you can estimate the frequency at which you need to calculate neighbour lists by working out how many timesteps there are per neighbour-list build on average.
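As an illustrative worked example (the numbers here are invented for the sketch): if the log file of a 10,000-step benchmark reports 500 neighbour-list builds, then on average a build happened every 10,000 / 500 = 20 timesteps, suggesting something like:

```
# Only consider rebuilding every 20 steps; "check yes" still ensures a
# rebuild only happens when atoms have actually moved far enough
neigh_modify every 20 delay 0 check yes
```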

Provided that the update frequency is less than or equal to that, you should see a speed-up. In this section, we only considered changing the frequency of updating neighbour lists; you could also consider the skin distance and the short-ranged cutoff. Reducing either of these will reduce the number of particles within the neighbour cutoff distance, thereby decreasing the number of interactions being calculated each timestep.

Some further tips

Fixing bonds and angles

A lot of interesting systems involve simulating particles bonded into molecules. In a lot of classical atomistic systems, some of these bonds fluctuate significantly and at high frequencies while not causing any interesting physics (e.g. bonds involving light hydrogen atoms). As the timestep is restricted by the fastest-moving component of a simulation, the frequency of fluctuation of these bonds restricts the length of the timestep that can be used in the simulation. LAMMPS offers the fix shake and fix rattle commands to constrain such bonds and angles.
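Such a constraint can be sketched with fix shake; the bond and angle type numbers below are hypothetical and depend entirely on your data file:

```
# Constrain bonds of type 1 and angles of type 1 to their equilibrium
# values, iterating to a tolerance of 1.0e-4 (at most 20 iterations);
# the 0 disables printing of SHAKE statistics
fix constrain all shake 0.0001 20 0 b 1 a 1
```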

Using these fixes will ensure that the desired bonds and angles are reset to their equilibrium lengths every timestep. An additional constraint is applied to the affected atoms to ensure that they can still move while keeping the bonds and angles as specified. This is especially useful for simulating fast-moving bonds at larger timesteps.

Kspace can often come to dominate the timing profile when running with a large number of MPI ranks.

One way to address this is to run LAMMPS with a mixture of MPI ranks and OpenMP threads. Setting --tasks-per-node and --cpus-per-task will ensure that Slurm assigns the correct number of MPI ranks and OpenMP threads to the executable. Running hybrid jobs efficiently can add a layer of complication, and a number of additional considerations must be taken into account to ensure the desired results. Some of these are: the product of the values assigned to --tasks-per-node and --cpus-per-task should be less than or equal to the number of cores on a node (on ARCHER2, that is 128 cores).
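A hybrid launch might look like the following sketch. The resource numbers assume an ARCHER2-style node with 128 cores split into NUMA regions of 16 cores, and in.ethanol is a hypothetical input file name:

```
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=8    # 8 MPI ranks per node
#SBATCH --cpus-per-task=16    # 16 OpenMP threads per rank: 8 x 16 = 128

# Make the OpenMP runtime match the Slurm allocation
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# -sf omp asks LAMMPS to use the threaded (OMP package) styles
srun --hint=nomultithread --distribution=block:block \
    lmp -sf omp -in in.ethanol
```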

In a similar vein to the above, you also want to make sure that your OpenMP threads are kept within a single NUMA region: spanning across multiple NUMA regions will decrease the performance significantly. Another option is the run_style verlet/split command. Using this would let you define the partitions, and the amount of computational resource assigned to the partition on which the long-ranged k-space interactions are calculated.
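A minimal sketch of this partition approach (the 3:1 split of ranks is illustrative only):

```
# In the input script: integrate the short-ranged forces on the first
# partition while the k-space calculation runs on the second
run_style verlet/split
```

The two partitions themselves are defined on the command line, e.g. `srun lmp -partition 12 4 -in in.ethanol`, which gives 12 ranks to the short-ranged work and 4 to k-space; see the LAMMPS -partition switch and run_style documentation for the constraints on this ratio.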

It is important to spend some time profiling your system and considering its performance. Where possible, always run a quick benchmark of your system before setting up a large run.