Example 1: Run LAMMPS on adroit:
LAMMPS is used to simulate and analyze atom and molecule movements.
This sample job will produce trajectory data that we will turn into a cool visualization at the end of this example!
Our first step is to get onto the cluster. Logging in over SSH gives us a shell on Adroit (one of Princeton's Research Computing clusters), where we will run our program:
$ ssh <YourUserID>@adroit.princeton.edu
Create a directory
$ mkdir lammps.ex
$ cd lammps.ex
Download LAMMPS
$ pwd
/home/<YourUserID>/lammps.ex
$ wget https://raw.githubusercontent.com/PrincetonUniversity/install_lammps/master/01_installing/ins/adroit/lammps_mixed_prec_adroit_gpu_a100.sh
...
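(Optional) You can confirm the download by listing the directory; the installation script from the wget step should appear:
$ ls
lammps_mixed_prec_adroit_gpu_a100.sh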
Execute the LAMMPS installation script and save the installation output to the file “install_lammps.log”. (The tee command also displays the output in the terminal in real time.)
$ bash lammps_mixed_prec_adroit_gpu_a100.sh | tee install_lammps.log
...
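Before moving on, it can help to skim the end of the saved log to confirm the build finished without errors (a quick sanity check, not one of the original steps):
$ tail -n 20 install_lammps.log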
Let’s choose the example input “melt” for our test job. It is helpful to move this input file into our build directory:
$ mv .....
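The exact source path depends on where the installation script unpacked LAMMPS. Purely as an illustration (the lammps-stable_29Oct2020 directory name is taken from the path used later in this example; your version may differ, and cp is used here so the original file stays in place):
$ cp ~/lammps.ex/lammps-stable_29Oct2020/examples/melt/in.melt ~/lammps.ex/lammps-stable_29Oct2020/build/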
Now navigate to the build folder
$ cd build
Run make, which uses the generated Makefile to compile the code:
$ make all
...
Now we are going to create a Slurm script; this file tells the scheduler how to run our program.
$ touch slurm.run
Open the file so that we can edit it
$ nano slurm.run
Below is a Slurm script that will run your code. When using different inputs it is important to adjust the nodes, ntasks, cpus-per-task, time limit, and other directives. For more information on writing Slurm scripts, see the Princeton Research Computing Slurm documentation.
Copy and paste this text into slurm.run, replacing “<YourNetID>” with your NetID:
#!/bin/bash
####### --clusters=adroit # Select which system(s) to use
##SBATCH --account=blah
##SBATCH --partition=all
####### --reservation=blah
####### --partition=main # Partition (job queue)
##SBATCH --requeue # Return job to the queue if preempted
#SBATCH --job-name=badidea # Assign a short name to your job
#SBATCH --nodes=1 # Number of nodes you require
#SBATCH --ntasks=8 # Total # of tasks across all nodes
#SBATCH --cpus-per-task=1 # Cores per task (>1 if multithread tasks)
#SBATCH --mem-per-cpu=4G # Real memory (RAM) required per CPU-core
#SBATCH --time=15:00 # Total run time limit (DD-HH:MM:SS)
#SBATCH --output=slurm.%N.%j.out # STDOUT file for SLURM output
#SBATCH --mail-type=end
#SBATCH --mail-user=<YourNetID>@princeton.edu
## Environment settings needed for this job
module purge
module load intel/19.1.1.217
module load intel-mpi/intel/2019.7
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
## Run the job
srun $HOME/.local/bin/lmp_adroit -in in.melt
Notice that “srun” is what is ultimately executing our program.
Now we will execute LAMMPS. “sbatch” is the command used to submit a job script to Slurm for execution. It is followed by the name of the script you want to submit, in this case “slurm.run”.
$ sbatch slurm.run
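Slurm replies with the ID it assigned to the job, in the standard form:
Submitted batch job <JobID>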
This specific job takes roughly 15 minutes to run (the 15:00 time limit in the script gives it just enough room). To verify that your job is running you can run the command:
$ squeue -u <YourNetID>
...
[am3949@adroit5 build]$ squeue -u am3949
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1825257 gpu lmpmltx <YourUserID> R 0:26 1 adroit-h11g2
We know that our job is running because we can see an “R” under the ST (state) column.
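As an aside, if you ever need to stop a queued or running job, you can cancel it with scancel and the job ID shown by squeue:
$ scancel <JobID>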
While we’re waiting on our output let’s run some commands to take a closer look at our job’s performance.
First connect directly to the node where the job is running and verify it’s running in parallel.
$ ssh adroit-h11g2 #connect to the node
$ top -u <YourUserID> #verify the job is running in parallel
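With --ntasks=8 you should see eight lmp_adroit processes, each near 100% CPU, confirming the job is running in parallel (this is what to expect given the script above; the exact picture can vary). When you are done, press q to quit top and leave the compute node to return to the login node:
$ exit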
Once the job finishes, you will see a new output file in your build folder, named according to the --output directive (slurm.<NodeName>.<JobID>.out):
$ ls
...
View your file by using the more command:
$ more slurm.<NodeName>.<JobID>.out
When running your own jobs, it is really important to tune your Slurm script so that it runs efficiently on the cluster. The jobstats command is a simple way to check this: it reports how your job actually used the time, CPU, and memory it requested, which you can compare across different Slurm configurations.
Look at the name of the output file in your build directory, for example:
slurm.adroit-h11n6.1799247.out
The number in the file name, “1799247” in this example, is the job ID to pass to jobstats:
$ jobstats <YourJobID>
You should get an output that looks something like this:
================================================================================
Slurm Job Statistics
================================================================================
Job ID: 1799290
NetID/Account: am3949/cses
Job Name: badidea
State: COMPLETED
Nodes: 1
CPU Cores: 8
CPU Memory: 32GB (4GB per CPU-core)
QOS/Partition: test/all
Cluster: adroit
Start Time: Mon Jul 10, 2023 at 11:53 AM
Run Time: 00:13:44
Time Limit: 00:15:00
Overall Utilization
================================================================================
CPU utilization [|||||||||||||||||||||||||||||||||||||||||||||||98%]
CPU memory usage [ 1%]
Detailed Utilization
================================================================================
CPU utilization per node (CPU time used/run time)
adroit-h11n2: 01:47:45/01:49:52 (efficiency=98.1%)
CPU memory usage per node - used/allocated
adroit-h11n2: 378.7MB/32.0GB (47.3MB/4.0GB per core of 8)
Notes
================================================================================
* This job only used 1% of the 32GB of total allocated CPU memory. For
future jobs, please allocate less memory by using a Slurm directive such
as --mem-per-cpu=1G or --mem=1G. This will reduce your queue times and
make the resources available to other users. For more info:
https://researchcomputing.princeton.edu/support/knowledge-base/memory
* This job ran in the test QOS. Each user can only run a small number of
jobs simultaneously in this QOS. For more info:
https://researchcomputing.princeton.edu/support/knowledge-base/job-priority#test-queue
* For additional job metrics including metrics plotted against time:
https://myadroit.princeton.edu/pun/sys/jobstats (VPN required off-campus)
Notice the Notes section. Our original Slurm script allocates far more CPU memory than the job actually uses; following the suggestion in the notes, we can fix this by lowering the --mem-per-cpu value in the script.
Change your memory line to read the following:
#SBATCH --mem-per-cpu=1G # Real memory (RAM) required per CPU-core
This is how we make sure that our program is running efficiently on the cluster.
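After editing slurm.run, resubmit the job and, once it finishes, run jobstats on the new job ID to confirm that the memory utilization note has gone away:
$ sbatch slurm.run
$ squeue -u <YourNetID>
$ jobstats <NewJobID>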
Now we are going to work on running our visualization. If you do not plan on visualizing your data you can skip this step.
First we are going to copy the trajectory file from Adroit to our local machine.
Open a new terminal window on your local machine (not on Adroit); it will start in your local home directory.
Then navigate to the folder we want to copy the file into:
$ cd Desktop
$ pwd
/Users/am3949/Desktop
This command copies the melt.lammpstrj file from Adroit into the current folder (here, the Desktop):
$ scp <YourUserId>@adroit.princeton.edu:/home/<YourUserId>/lammps.ex/lammps-stable_29Oct2020/build/melt.lammpstrj .
....
melt.lammpstrj 49% 239MB 28.5MB/s 00:08 ETA
Download VMD from the VMD website, and be sure you are downloading the version that matches your operating system (on a Mac, check your macOS version and chip type so you pick the right build).
You may run into a problem downloading VMD; if so, check out the linked troubleshooting post.
Now that you have VMD open and running, we are going to load our output file.
In VMD, open the file you just copied (File > New Molecule, then browse to melt.lammpstrj).
This should load our molecule into VMD. You may notice it just looks like a big blob of lines; we are going to fix that by editing the graphical representation. Open the Graphics > Representations menu.
In the Drawing Method dropdown, change the drawing method to VDW. Each atom is now drawn as a sphere, giving the final visualization.