MPI comes up constantly in distributed systems. These are brief notes on its basic usage, taken while working through a tutorial.
Communicator: a communicator defines a group of processes that can send messages to each other. Within this group, each process is assigned a number, called its rank.
The sender first writes the data it wants to send into a buffer. That buffer can be a region of memory pointed to by a pointer of some MPI_Datatype; when calling MPI_Send, the pointer is cast to void *.
MPI_Send( void* data, int count, MPI_Datatype datatype, int destination, int tag, MPI_Comm communicator)
MPI_Recv( void* data, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status* status)
Note MPI_STATUS_IGNORE. If an MPI_Status argument is provided when calling MPI_Recv, say a status named stat, MPI fills it with information, chiefly these three things:
stat.MPI_SOURCE: accessed directly, the rank of the sender;
stat.MPI_TAG: the tag of the message;
the count, retrieved with MPI_Get_count( MPI_Status* status, MPI_Datatype datatype, int* count). The count variable is the total number of datatype elements that were received.
A question arises here: why are these three pieces of information needed?
The count argument to MPI_Recv is the maximum number of elements of type datatype to receive, whereas the count in MPI_Status is how many elements were actually received.
MPI_ANY_TAG accepts a message with any tag; the only way to tell which tag the received message actually carries is the information in the status. Likewise, MPI_ANY_SOURCE accepts a message from any sender, and the only way to tell which sender it came from is again the status.
Calling MPI_Recv requires supplying a buffer to hold the incoming message, but often we do not know how large the message is before it arrives. So we first call MPI_Probe to probe for it, and then call MPI_Recv to actually receive the message.
MPI_Probe( int source, int tag, MPI_Comm comm, MPI_Status* status)
MPI_Barrier(MPI_Comm communicator): this is exactly the barrier from the BSP model.
One final point about synchronization: always remember that every collective communication call you make is synchronizing. In other words, if you cannot get all processes through MPI_Barrier, you cannot complete any collective call either. Calling MPI_Barrier without ensuring that every process calls it leaves the program hanging idle. This is confusing for beginners, so be careful with this class of problem.
MPI_Bcast( void* data, int count, MPI_Datatype datatype, int root, MPI_Comm communicator)
Sender and receivers alike call the same MPI_Bcast. This differs from point-to-point communication.
Question: what is the purpose of the second MPI_Barrier in compare_bcast.c?
The difference between MPI_Bcast and MPI_Scatter: MPI_Bcast sends the same data to every process, while MPI_Scatter sends each process its own chunk of an array.
MPI_Scatter( void* send_data, int send_count, MPI_Datatype send_datatype, void* recv_data, int recv_count, MPI_Datatype recv_datatype, int root, MPI_Comm communicator)
MPI_Reduce( void* send_data, void* recv_data, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm communicator)
MPI_Allreduce( void* send_data, void* recv_data, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm communicator)
MPI_Allreduce is analogous to MPI_Allgather: an ordinary MPI_Gather places the result in a single process, whereas MPI_Allgather returns the result to all processes, where every process can access it. MPI_Allreduce is the same: the result of the reduce is accessible to all processes.
The applications so far either talk to one process or talk to all the processes, using only the single default communicator. As programs grow, you may need to communicate with only a subset of the processes, so groups are introduced, each group corresponding to its own communicator. How do you create multiple communicators?
MPI_Comm_split( MPI_Comm comm, int color, int key, MPI_Comm* newcomm)
MPI_Comm_split
creates new communicators by 「splitting」 a communicator into a group of sub-communicators based on the input values color and key.
The first argument, comm, is the communicator that will be used as the basis for the new communicators. This could be MPI_COMM_WORLD, but it could be any other communicator as well.
The second argument, color, determines to which new communicator each process will belong. All processes which pass in the same value for color are assigned to the same communicator. If the color is MPI_UNDEFINED, that process won’t be included in any of the new communicators.
The third argument, key, determines the ordering (rank) within each new communicator. The process which passes in the smallest value for key will be rank 0, the next smallest will be rank 1, and so on. If there is a tie, the process that had the lower rank in the original communicator will be first.
When you print things out in an MPI program, each process has to send its output back to the place where you launched your MPI job before it can be printed to the screen. The ordering therefore tends to get jumbled, so you can never assume that printing in a specific rank order means the output will actually appear in that order. The output was simply rearranged here to look nice.
MPI has a limited number of objects that it can create at a time and not freeing your objects could result in a runtime error if MPI runs out of allocatable objects.
MPI_Init or MPI_Init_thread?
int MPI_Init_thread(int *argc, char *((*argv)[]), int required, int *provided)
int MPI::Init_thread(int& argc, char**& argv, int required)
int MPI::Init_thread(int required)
required: the desired level of thread support. Possible values: MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE.
Vendors may provide (implementation dependent) means to specify the level(s) of thread support available when the MPI program is started, e.g., with arguments to mpiexec. This will affect the outcome of calls to MPI_INIT and MPI_INIT_THREAD.
Suppose, for example, that an MPI program has been started so that only MPI_THREAD_MULTIPLE is available. Then MPI_INIT_THREAD will return provided = MPI_THREAD_MULTIPLE, irrespective of the value of required; a call to MPI_INIT will also initialize the MPI thread support level to MPI_THREAD_MULTIPLE.
Suppose, on the other hand, that an MPI program has been started so that all four levels of thread support are available. Then, a call to MPI_INIT_THREAD will return provided = required; on the other hand, a call to MPI_INIT will initialize the MPI thread support level to MPI_THREAD_SINGLE. When the required argument is MPI_THREAD_SINGLE, this has the same effect as MPI_Init.
https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node165.htm