srun and openmpi not launching parallel jobs

I can't seem to run MPI jobs using Slurm. Any help or advice?



I'm running a local, home-based mini-cluster to make use of all of my processors. I'm on Ubuntu 18.04 and have installed the stock OpenMPI and Slurm packages. I have a small test program that shows which processor each rank runs on. When I run it with mpirun I get:



$ mpirun -N 3 MPI_Hello
Process 1 on ubuntu18.davidcarter.ca, out of 3
Process 2 on ubuntu18.davidcarter.ca, out of 3
Process 0 on ubuntu18.davidcarter.ca, out of 3


When I run with srun I get:



$ srun -n 3 MPI_Hello
Process 0 on ubuntu18.davidcarter.ca, out of 1
Process 0 on ubuntu18.davidcarter.ca, out of 1
Process 0 on ubuntu18.davidcarter.ca, out of 1


I've tried this many times with different arguments (--mpi=pmi2, --mpi=openmpi, etc.) and can confirm that instead of running one job with n parallel MPI processes, srun runs n independent single-process jobs: each copy initializes MPI as a singleton, so MPI_Comm_size reports 1. That's n times the work with 1/n of the expected resources per job.
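
For reference, these are the checks I'd expect to narrow this down (the first two show which MPI plugin types this Slurm build supports and whether the packaged OpenMPI has any PMI/PMIx support compiled in — my suspicion is the mismatch is there, but I haven't confirmed it):

$ srun --mpi=list          # MPI plugin types this Slurm build supports
$ ompi_info | grep -i pmi  # look for PMI/PMIx support in the OpenMPI build, if any
$ srun -n 3 bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS"'  # confirms srun itself starts 3 tasks

If the last command prints three distinct task IDs, srun is launching tasks fine and the problem is the MPI wire-up rather than the scheduler.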



This is my test program:



#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s, out of %d\n", rank, processor_name, numprocs);

    MPI_Finalize();
    return 0;
}
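
It builds with the stock wrapper compiler, nothing unusual, e.g.:

$ mpicc -o MPI_Hello MPI_Hello.c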


This is my /etc/slurm-llnl/slurm.conf:



# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=ubuntu18.davidcarter.ca
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
MpiParams=ports=12000-12100
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
#NodeName=compute[1-4] Sockets=1 CoresPerSocket=2 RealMemory=1900 State=UNKNOWN
#NodeName=compute[1-2] Sockets=1 CoresPerSocket=4 RealMemory=3800 State=UNKNOWN
NodeName=compute1 Sockets=8 CoresPerSocket=1 RealMemory=7900 State=UNKNOWN
NodeName=ubuntu18 Sockets=1 CoresPerSocket=3 RealMemory=7900 State=UNKNOWN

#
# Partitions
PartitionName=debug Nodes=ubuntu18 Default=YES MaxTime=INFINITE State=UP OverSubscribe=NO
PartitionName=batch Nodes=compute1 Default=NO MaxTime=INFINITE State=UP OverSubscribe=NO
PartitionName=prod Nodes=compute1,ubuntu18 Default=NO MaxTime=INFINITE State=UP OverSubscribe=NO
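
My best guess so far: with MpiDefault=none, srun sets up no PMI environment, so each rank comes up as an MPI singleton. If the packaged OpenMPI were built against Slurm's PMI2 library (which I haven't verified, and which the failed --mpi=pmi2 runs make me doubt), the fix would presumably be something like:

# untested guess: have srun provide PMI2 wire-up by default
MpiDefault=pmi2

Failing that, the remaining options would seem to be rebuilding OpenMPI with PMI support, or falling back to running mpirun inside an sbatch allocation.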








