First 5 minutes of hell
The non-blocking operation returns immediately (independently of the completion condition). The data transfer is then in progress and we cannot modify the buffer (after sending) or read the buffer (before receiving) until explicitly testing whether the communication has finished.
Operations:
MPI_Isend, MPI_Ibsend, MPI_Issend, MPI_Irsend, MPI_Irecv (prefix “I” = immediate)
- the completion conditions do not change (see previous exam question); the only difference is that these functions return immediately
- MPI_Irecv initiates the data reception, but the data cannot be read until the explicit check
  - so this function does not fill in an MPI_Status struct, because the data have not been received yet
- all operations have an extra parameter: MPI_Request, which is like a semantic connection between the kickoff of the communication and the completion of the communication
  - this is what is later used to check whether the communication has finished
How to check if the communication is finished?
- MPI_Test - for active polling (returns instantly with true/false depending on whether the communication has finished)
- MPI_Wait - for passive waiting; it is blocking (returns only once the communication has finished)
- both functions fill the MPI_Status field once the communication has completed
- both functions are used to check both sends and receives, even though MPI_Status is not needed on the sending side (it is there for universality) and can be ignored
- there are also: Testany, Waitany, Testall, Waitall (testing/waiting for any one request or for all requests)
What are the benefits of the nonblocking operations?
- processes can do useful work while waiting for the data (communication/computation overlap)
- processes can wait on multiple communications at once (communication/communication overlap)
- mitigation of the risk of communication deadlock (which happens easily with blocking communication: one process waits on another, which in turn waits on the first because it needs certain data to continue)
  - with blocking calls, every send must be matched with a receive in an order that does not cause deadlock
Definition: nonblocking operation
A nonblocking MPI function returns immediately after kicking off the operation, independently of any completion condition. The data transfer is in progress; whether it has finished is unknown until the program explicitly tests for it.
Two consequences follow directly from this definition:
- For nonblocking sends: the input buffer cannot be modified until completion has been verified.
- For nonblocking receives: the output buffer cannot be read until completion has been verified.
The five nonblocking primitives
Each blocking primitive has a nonblocking counterpart, distinguished by the prefix I (immediate):
- MPI_Isend - nonblocking standard send.
- MPI_Ibsend - nonblocking buffered send.
- MPI_Issend - nonblocking synchronous send.
- MPI_Irsend - nonblocking ready send.
- MPI_Irecv - nonblocking receive.
The communication mode (standard, buffered, synchronous, ready) determines under what conditions the eventual completion will happen, exactly as in the blocking case. The nonblocking variant only changes when the call returns, not what completion means.
The MPI_Request state object
All nonblocking functions take an extra output parameter: a pointer to a variable of type MPI_Request. This handle is the unique link between the kickoff and the later completion check.
int MPI_Isend(const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm,
              MPI_Request *request);
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm,
              MPI_Request *request);
With multiple outstanding nonblocking operations, each generates its own MPI_Request, which preserves the semantic pairing of kickoff and completion. Conceptually, the kickoff and completion check act like opening and closing parentheses surrounding arbitrary intervening code.
Completion: MPI_Test and MPI_Wait
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Wait(MPI_Request *request, MPI_Status *status);
- MPI_Test is active polling: returns immediately. flag = true means the operation has completed (and status is filled in for receives); flag = false means it is still in progress.
- MPI_Wait is blocking: returns only when the operation has completed. The completion condition is the same as the equivalent blocking call would have used (e.g. for MPI_Issend, MPI_Wait returns once the receiver has initiated reception).
The two mechanisms can be freely mixed: poll a few times with MPI_Test to overlap useful work with the communication, then fall back to MPI_Wait once the data is genuinely needed.
Where the status object comes from in nonblocking receive
For nonblocking reception, the MPI_Status object is filled in by MPI_Test / MPI_Wait, not by MPI_Irecv itself - because at the moment MPI_Irecv returns, no data has been received yet, so source/tag/count are not yet known.
int c;
MPI_Request request;
MPI_Irecv(&c, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
MPI_COMM_WORLD, &request);
// c must not be read yet: the receive may still be in progress
MPI_Status status;
MPI_Wait(&request, &status);
// now c holds the received value, status is filled in
std::cout << "Source: " << status.MPI_SOURCE << std::endl;
Canonical example: MPI_Isend + MPI_Test + MPI_Wait
int c = 10;
MPI_Request request;
MPI_Isend(&c, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
// c cannot be modified yet
int flag;
MPI_Status status;
MPI_Test(&request, &flag, &status);
// c can be modified only if flag == true
MPI_Wait(&request, &status);
// now c can be safely modified
Collective completion: Testany, Waitany, Testall, Waitall
MPI_Test and MPI_Wait operate on one request. For a set of outstanding requests:
- MPI_Testany / MPI_Waitany - test/wait for any one of the requests to complete; returns the index of the one that did.
- MPI_Testall / MPI_Waitall - test whether, or wait until, all of the requests have completed.
MPI_Request requests[3];
MPI_Irecv(..., &requests[0]);
MPI_Irecv(..., &requests[1]);
MPI_Irecv(..., &requests[2]);
MPI_Status statuses[3];
MPI_Waitall(3, requests, statuses);
This enables expressing “do something if any of my outstanding sends or receives has progressed” in a single call, instead of looping over individual MPI_Tests.
Universality of MPI_Wait / MPI_Test
MPI_Wait is the same function for all four send modes and for receive. Its universality requires an output MPI_Status parameter even though status makes no real sense for sends. The MPI designers chose universality over neatness - one wait/test call covers all five operation kinds; the user simply ignores the status (MPI_STATUS_IGNORE) when waiting on a send. The same applies to MPI_Test, which additionally has the output flag.
Why nonblocking communication matters
Correctness
Any blocking operation creates the possibility of deadlock if data dependencies between communication operations are misdesigned. For complex communication patterns, ensuring that every blocking send is matched with a receive in a deadlock-free order is hard; for dynamic patterns determined at runtime, it is harder still. Nonblocking communication breaks cyclic dependencies because the kickoff returns immediately, allowing receives to be posted on the same process.
The cyclic shift permutation is the canonical example: naive MPI_Send + MPI_Recv on a ring may deadlock. Replacing the send with MPI_Isend, followed by MPI_Recv and then MPI_Wait, works correctly, and it keeps working even if MPI_Isend is replaced with MPI_Issend. Symmetrically, MPI_Irecv followed by blocking MPI_Send works as well.
Performance
Nonblocking communication enables two kinds of overlap:
- Overlap of communication with computation: between the kickoff and the completion check, the process can do useful work that does not touch the buffer.
- Overlap of communication with other communication: multiple nonblocking operations can progress simultaneously on systems with capable networks.
Communication modes of nonblocking operations: summary
- Return from blocking MPI_Send, MPI_Bsend, MPI_Ssend, MPI_Rsend depends on satisfaction of the mode’s defined condition.
- Return from MPI_Isend, MPI_Ibsend, MPI_Issend, MPI_Irsend is immediate, independent of any condition.
- Their eventual completion, verified via MPI_Test / MPI_Wait, satisfies exactly the same condition that the corresponding blocking operation would have satisfied on return.
So the mode (standard / buffered / synchronous / ready) controls what completion means; the I-prefix controls when the call returns. The two dimensions are orthogonal.
State objects in nonblocking communication
Two distinct state objects are involved:
- MPI_Request - input/output to nonblocking calls and to MPI_Test / MPI_Wait. Created by the nonblocking kickoff, consumed (and reset to MPI_REQUEST_NULL) by a successful completion check.
- MPI_Status - output of MPI_Test / MPI_Wait. Carries MPI_SOURCE, MPI_TAG, and the underlying count (queried with MPI_Get_count). For sends, the status field has no meaningful content - use MPI_STATUS_IGNORE.
Potential exam questions
- Define a nonblocking MPI operation. State precisely what is and is not guaranteed when an MPI_Isend or MPI_Irecv returns.
- List the five nonblocking point-to-point primitives in MPI and explain how each relates to its blocking counterpart.
- What is the role of the MPI_Request state object? When is it created, how is it consumed, and what happens after a successful MPI_Wait?
- Explain the difference between MPI_Test and MPI_Wait. Can the two be mixed on the same request? Give an example pattern.
- For nonblocking receive, where does the MPI_Status object get filled in - in MPI_Irecv or in the test/wait? Why?
- Describe MPI_Waitany, MPI_Waitall, MPI_Testany, MPI_Testall. When would you prefer each over a loop of MPI_Waits?
- Why does MPI_Wait have a MPI_Status output parameter even when waiting on a send? How do you handle this?
- Explain the orthogonality of communication mode (standard/buffered/synchronous/ready) and blocking/nonblocking. What does each axis control?
- Why is nonblocking communication important for correctness? Give the cyclic-shift example as illustration.
- Why is nonblocking communication important for performance? Distinguish overlap of communication with computation from overlap of communication with communication.
- Show a complete code fragment in which process 0 sends an integer to process 1 using MPI_Isend + MPI_Test + MPI_Wait, and process 1 receives it using MPI_Irecv + MPI_Wait and prints the source rank.
- State what the eventual completion of MPI_Issend (via MPI_Wait) guarantees, and contrast it with the eventual completion of MPI_Ibsend.