srun and openmpi not launching parallel jobs
I can't seem to run MPI jobs through Slurm. Any help or advice?
I'm running a local, home-based mini cluster to use all of my processors. I'm on Ubuntu 18.04 and have installed the stock Open MPI and Slurm packages. I have a small test program that shows which cores I'm running on. When I run it with mpirun I get:
$ mpirun -N 3 MPI_Hello
Process 1 on ubuntu18.davidcarter.ca, out of 3
Process 2 on ubuntu18.davidcarter.ca, out of 3
Process 0 on ubuntu18.davidcarter.ca, out of 3
When I run with srun I get:
$ srun -n 3 MPI_Hello
Process 0 on ubuntu18.davidcarter.ca, out of 1
Process 0 on ubuntu18.davidcarter.ca, out of 1
Process 0 on ubuntu18.davidcarter.ca, out of 1
I've tried this many times with different arguments (--mpi=pmi2, --mpi=openmpi, etc.) and can confirm that instead of running one job with n parallel tasks, srun runs n independent single-task jobs: n times the work with 1/n of the expected resources per task.
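For reference, this symptom (every rank reporting rank 0 of 1) is what happens when each task started by srun initializes MPI as an independent singleton, i.e. srun is not providing a PMI interface that this Open MPI build understands. Two diagnostic commands worth comparing (output will vary by build; shown here without assumed output):

```
$ srun --mpi=list
$ ompi_info | grep -i pmi
```

If the plugins srun lists and the PMI support compiled into Open MPI don't overlap, the ranks can't discover each other and behave exactly as in the transcript above.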
This is my test program:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("Process %d on %s, out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
    return 0;
}
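For completeness, the program can be built with the Open MPI compiler wrapper, and one possible workaround is to launch mpirun inside a Slurm allocation instead of relying on srun's PMI support (a sketch; the source filename is assumed to match the binary name used above):

```
$ mpicc MPI_Hello.c -o MPI_Hello
$ salloc -n 3 mpirun MPI_Hello
```

Inside an salloc (or sbatch) allocation, mpirun detects the Slurm environment and launches the ranks itself, so the processes form one 3-rank job even when srun alone does not.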
This is my /etc/slurm-llnl/slurm.conf:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=ubuntu18.davidcarter.ca
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
MpiParams=ports=12000-12100
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
#NodeName=compute[1-4] Sockets=1 CoresPerSocket=2 RealMemory=1900 State=UNKNOWN
#NodeName=compute[1-2] Sockets=1 CoresPerSocket=4 RealMemory=3800 State=UNKNOWN
NodeName=compute1 Sockets=8 CoresPerSocket=1 RealMemory=7900 State=UNKNOWN
NodeName=ubuntu18 Sockets=1 CoresPerSocket=3 RealMemory=7900 State=UNKNOWN
#
# Partitions
PartitionName=debug Nodes=ubuntu18 Default=YES MaxTime=INFINITE State=UP OverSubscribe=NO
PartitionName=batch Nodes=compute1 Default=NO MaxTime=INFINITE State=UP OverSubscribe=NO
PartitionName=prod Nodes=compute1,ubuntu18 Default=NO MaxTime=INFINITE State=UP OverSubscribe=NO
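If ompi_info shows that this Open MPI build does have PMI2 support (on Ubuntu this would mean it links against Slurm's libpmi2), one candidate fix is to make pmi2 the default MPI plugin, along these lines (an untested sketch against the config above; restart slurmctld and slurmd after changing it):

```
# Candidate change to /etc/slurm-llnl/slurm.conf
# (assumes Open MPI was built with PMI2 support)
MpiDefault=pmi2
```

Equivalently, `srun --mpi=pmi2 -n 3 MPI_Hello` would exercise the same path without editing the config, which is a quicker way to test the hypothesis.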
asked 2 hours ago by David Carter