# lammps-parallel-hpc

> This skill should be used when users ask about parallel and hpc in lammps; it prioritizes documentation references and then source inspection only for unresolved details.

- Author: Tao E. Li
- Repository: TEL-Research-Group/lammps
- Version: 20260207223556
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-08
- Source: https://github.com/TEL-Research-Group/lammps
- Web: https://mule.run/skillshub/@@TEL-Research-Group/lammps~lammps-parallel-hpc:20260207223556

---

---
name: lammps-parallel-hpc
description: This skill should be used when users ask about parallel and hpc in lammps; it prioritizes documentation references and then source inspection only for unresolved details.
---

# lammps: Parallel and HPC

## High-Signal Playbook

### Route the request

- Use `lammps-build-and-install` for enabling accelerator packages at compile time.
- Use `lammps-simulation-workflows` once launch performance is acceptable.
- Use `lammps-troubleshooting` for MPI/GPU crashes, hangs, or nondeterministic failures.

### Triage questions

- Hardware target: CPU-only, NVIDIA/AMD GPU, or mixed cluster?
- Current MPI tasks per node and OpenMP threads per task?
- Which accelerator path is intended (`OPENMP`, `GPU`, `KOKKOS`)?
- Is MPI GPU-aware (for multi-rank CUDA KOKKOS runs)?
- Is PPPM/KSpace dominating wall time?

### Canonical workflow

- Start from a correct MPI-only baseline and capture timing breakdown (`doc/src/Run_output.rst`, `doc/src/Speed_measure.rst`).
- Test OPENMP via `-sf omp` and thread/task sweeps (`doc/src/Speed_omp.rst`).
- Test GPU package via `-sf gpu -pk gpu Ng` and tasks/GPU sweeps (`doc/src/Speed_gpu.rst`).
- Test KOKKOS via `-k on -sf kk`, then tune `-pk kokkos` options (`doc/src/Speed_kokkos.rst`).
- Pin MPI tasks and threads to hardware locality for stable performance.
- Validate physics consistency after switching precision/backend.

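
The thread/task sweep from the canonical workflow can be sketched as a dry-run script. This is a minimal sketch, not a tuned benchmark harness: `CORES`, `LMP_BIN`, `INPUT`, and the helper `sweep_commands` are hypothetical names introduced for illustration, and the script only prints candidate launch lines (using the `-sf omp -pk omp N` form) instead of running them.

```bash
#!/usr/bin/env bash
# Dry-run sketch: list MPI-task/OpenMP-thread splits that exactly fill one node.
# CORES, LMP_BIN, and INPUT are hypothetical placeholders; adjust to your system.
CORES="${CORES:-16}"
LMP_BIN="${LMP_BIN:-lmp_omp}"
INPUT="${INPUT:-in.script}"

# Print one launch line per (tasks, threads) pair with tasks * threads == cores.
sweep_commands() {
  local cores="$1" threads tasks
  for threads in 1 2 4 8 16; do
    # Skip thread counts that exceed or do not evenly divide the core count,
    # so no printed combination oversubscribes the node.
    if [ "$threads" -gt "$cores" ] || [ $((cores % threads)) -ne 0 ]; then
      continue
    fi
    tasks=$((cores / threads))
    echo "mpirun -np ${tasks} ${LMP_BIN} -sf omp -pk omp ${threads} -in ${INPUT}"
  done
}

sweep_commands "$CORES"
```

Each printed line can then be run by hand or from a batch script, recording `Loop time` for every split so the combinations can be compared on equal footing.
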
### Minimal working examples

- OPENMP launch (`doc/src/Speed_omp.rst`):

```bash
mpirun -np 4 lmp_omp -sf omp -pk omp 4 -in in.script  # 4 MPI tasks, 4 OpenMP threads per task
```

- KOKKOS CPU launch (`doc/src/Speed_kokkos.rst`):

```bash
mpirun -np 32 -ppn 4 lmp_kokkos_omp -k on t 4 -sf kk -in in.lj  # 8 nodes, 4 MPI tasks per node, 4 threads per task
```

- GPU package launch (`doc/src/Speed_gpu.rst`):

```bash
mpirun -np 12 lmp -sf gpu -pk gpu 2 -in in.script  # 12 MPI tasks share 2 GPUs
```

### Pitfalls

- Oversubscribing cores: the product `MPI tasks * threads` exceeds the cores available per node (`doc/src/Speed_omp.rst`, `doc/src/Speed_kokkos.rst`).
- Forgetting `-k on` when using KOKKOS-accelerated styles (`doc/src/Speed_kokkos.rst`).
- Running multi-rank CUDA KOKKOS jobs on an MPI library that is not GPU-aware; in that case set `gpu/aware off` via `-pk kokkos` (`doc/src/Speed_kokkos.rst`).
- Assuming one MPI task per GPU is always optimal; often 2-10 tasks/GPU is better (`doc/src/Speed_gpu.rst`).
- Interpreting Pair/KSpace timing without accounting for async GPU overlap (`doc/src/Speed_gpu.rst`).

### Validation checklist

- Track `Loop time`, `katom-step/s`, and category percentages across tuning runs.
- Verify that the reported CPU utilization is close to the expected value for the chosen `MPI x OMP` layout and backend.
- Check load-balance histograms (`Nlocal`, neighbors) for imbalance bottlenecks.
- Reconfirm key thermodynamic averages after backend/precision changes.

## Scope

- Handle questions about MPI/OpenMP/GPU execution, scaling, and batch systems.
- Keep responses abstract and architectural for large codebases; avoid exhaustive per-function documentation unless requested.

## Primary documentation references

- `doc/src/Run_output.rst`
- `doc/src/Speed_measure.rst`
- `doc/src/Speed_packages.rst`
- `doc/src/Speed_omp.rst`
- `doc/src/Speed_gpu.rst`
- `doc/src/Speed_kokkos.rst`
- `doc/src/Speed_intel.rst`
- `doc/src/Speed_tips.rst`
- `doc/src/package.rst`

## Workflow

- Start with the primary references above.
- If details are missing, inspect `references/doc_map.md` for additional topic documents (generated inventory plus curated anchors).
- Use tutorials/examples as executable usage patterns when available.
- Use tests as behavior or regression references when available.
- If ambiguity remains after docs, inspect `references/source_map.md` and start with the ranked source entry points.
- Cite exact documentation file paths in responses.

## Tutorials and examples

- `examples`

## Test references

- None discovered.

## Optional deeper inspection

- `fortran`
- `lib`
- `python`
- `src`

## Source entry points for unresolved issues

- `src/NETCDF/dump_netcdf_mpiio.h`
- `src/NETCDF/dump_netcdf_mpiio.cpp`
- `lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel_Scan.hpp`
- `lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel_Reduce.hpp`
- `lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel_For.hpp`
- `src/OPENMP/pair_zbl_omp.h`
- `src/OPENMP/pair_zbl_omp.cpp`
- `src/OPENMP/pair_yukawa_omp.h`
- Prefer targeted source search (for example: `rg -n "" fortran lib python src`).
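
A targeted source search might look like the following. This is a minimal sketch under stated assumptions: `search_tree` is a hypothetical helper, not part of LAMMPS, and the search string `gpu/aware` is only an illustration taken from the KOKKOS pitfall in the playbook. The helper falls back to `grep` when `rg` is not installed.

```bash
#!/usr/bin/env bash
# search_tree is a hypothetical helper: find a literal string in the given
# directories, preferring ripgrep and falling back to recursive grep.
search_tree() {
  local pattern="$1"
  shift
  if command -v rg >/dev/null 2>&1; then
    # -n prints line numbers, -F treats the pattern as a fixed string.
    rg -n -F -- "$pattern" "$@"
  else
    grep -rn -F -- "$pattern" "$@"
  fi
}

# Illustrative call from the root of a LAMMPS checkout; skipped when src/ is absent.
if [ -d src ]; then
  search_tree "gpu/aware" src || echo "no matches"
fi
```

Restricting the search to one directory and one literal string keeps the hit list short enough to read before opening any of the ranked entry points.
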