# Benchmark of MPI Communication Performance on Different Nodes
## Facility Specification
| Machine          | CPU                                   | Memory | OS         | InfiniBand Device Model & Driver         |
| ---------------- | ------------------------------------- | -----: | ---------- | ---------------------------------------- |
| H3C: h[5,6]      | Intel Xeon 6240R 24C @ 2.4GHz \* 2    | 192GB  | CentOS 7.8 | Mellanox MT27800, official OFED 5.5      |
| H3C: h[9,10]     | Intel Xeon 6240R 24C @ 2.4GHz \* 2    | 192GB  | CentOS 7.9 | Mellanox MT27800, CentOS built-in driver |
| H3C: SciNat      | Intel Xeon 5219R 20C @ 2.1GHz \* 2    | 192GB  | CentOS 7.9 | Mellanox MT27800, CentOS built-in driver |
| Sugon: node[1,2] | Intel Xeon E5-2650v2 8C @ 2.6GHz \* 2 | 64GB   | CentOS 7.8 | Mellanox MT27500, official OFED 4.9      |
| Sugon: node[3,4] | Intel Xeon E5-2650v2 8C @ 2.6GHz \* 2 | 64GB   | CentOS 7.9 | Mellanox MT27500, CentOS built-in driver |
## Alltoall Performance
`Alltoall` is very similar to a matrix transpose. Here is a schematic
illustration of `Alltoall`:
```
@brief Illustrates how to use an all-to-all.
@details This application is meant to be run with 3 MPI processes. Every MPI
process begins with a buffer containing 3 integers, one for each process
including themselves. They also have a buffer in which they receive the
integer sent to them by each process. It can be visualised as follows:
+-----------------------+ +-----------------------+ +-----------------------+
| Process 0 | | Process 1 | | Process 2 |
+-------+-------+-------+ +-------+-------+-------+ +-------+-------+-------+
| Value | Value | Value | | Value | Value | Value | | Value | Value | Value |
| 0 | 100 | 200 | | 300 | 400 | 500 | | 600 | 700 | 800 |
+-------+-------+-------+ +-------+-------+-------+ +-------+-------+-------+
| | |_________|_______|_______|_________|___ | |
| | _____________|_______|_______|_________| | | |
| |___|_____________|_ | _|_____________|___| |
| _____|_____________| | | | |_____________|_____ |
| | | | | | | | |
+-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+
| 0 | 300 | 600 | | 100 | 400 | 700 | | 200 | 500 | 800 |
+-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+
| Process 0 | | Process 1 | | Process 2 |
+-----------------+ +-----------------+ +-----------------+
```
Reference: [rookiehpc](https://rookiehpc.org/mpi/docs/mpi_alltoall/index.html)
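To make the picture concrete, here is a minimal, self-contained sketch (not part of this repository; it simply mirrors the 3-process diagram above). It can be compiled with `mpicc` and run with `mpirun -np 3`:

```C
/* Minimal sketch mirroring the diagram above; assumes exactly 3 MPI ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int irank, nrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &irank);
    MPI_Comm_size(MPI_COMM_WORLD, &nrank);
    if (nrank != 3) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Each rank contributes one integer for every rank, including itself. */
    int send[3] = { irank * 300, irank * 300 + 100, irank * 300 + 200 };
    int recv[3] = { 0, 0, 0 };
    MPI_Alltoall(send, 1, MPI_INT, recv, 1, MPI_INT, MPI_COMM_WORLD);

    /* Rank 0 prints 0 300 600, rank 1 prints 100 400 700, rank 2 prints 200 500 800. */
    printf("Process %d received: %d %d %d\n", irank, recv[0], recv[1], recv[2]);

    MPI_Finalize();
    return 0;
}
```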
### Test Code
```C
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

void test_alltoall(const uint64_t count_send, const int nrank, const int irank) {
    if (0 == irank) {
        uint64_t bytes = count_send * nrank * sizeof(int);
        uint64_t gbs = bytes >> 30;
        uint64_t mbs = ( bytes % (1 << 30) ) >> 20;
        printf(" * Profiling throughput of %4lu GB %4lu MB per rank ... ", gbs, mbs);
    }

    /* Each rank sends count_send integers to every rank, so each buffer
       holds nrank * count_send integers. */
    int* buffer_send = (int*)malloc( nrank * count_send * sizeof(int) );
    int* buffer_recv = (int*)malloc( nrank * count_send * sizeof(int) );
    for (uint64_t i = 0; i != (nrank * count_send); ++i) {
        buffer_send[i] = i * i + irank * 114514;
        buffer_recv[i] = 0;
    }

    clock_t start = clock();
    for (int i = 0; i != 10; ++i) {
        MPI_Alltoall(buffer_send, count_send, MPI_INT, buffer_recv, count_send, MPI_INT, MPI_COMM_WORLD);
    }
    clock_t end = clock();
    int msec = (end - start) * 1000 / CLOCKS_PER_SEC;  /* elapsed time in milliseconds */

    free(buffer_send);
    free(buffer_recv);

    if (0 == irank) {
        printf(" time taken in MPI_Alltoall: %3d s %3d ms\n", msec / 1000, msec % 1000);
    }
}
```
**Note**: This code allocates two buffers of `nrank * count_send * sizeof(int)` bytes on each MPI rank, so the total allocated memory grows with the number of MPI ranks. Thus the more processes you launch with `mpirun`, the more data is transferred. The reported time covers the whole loop of 10 `MPI_Alltoall` calls.
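The commit only shows the `test_alltoall` function itself. A driver along the following lines could call it; this is a sketch under assumptions, not the author's actual `main` (the per-rank buffer size in MB read from `argv[1]` is hypothetical):

```C
/* Hypothetical driver for test_alltoall(); the real main() is not shown in
 * this commit. The per-rank buffer size (in MB) is taken from argv[1]. */
#include <mpi.h>
#include <stdint.h>
#include <stdlib.h>

void test_alltoall(const uint64_t count_send, const int nrank, const int irank);

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int irank, nrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &irank);
    MPI_Comm_size(MPI_COMM_WORLD, &nrank);

    /* count_send integers go to EACH rank, so the per-rank buffer is
     * count_send * nrank * sizeof(int) bytes; invert that to get count_send. */
    uint64_t mb = (argc > 1) ? strtoull(argv[1], NULL, 10) : 1536;
    uint64_t count_send = (mb << 20) / ((uint64_t)nrank * sizeof(int));
    test_alltoall(count_send, nrank, irank);

    MPI_Finalize();
    return 0;
}
```

With such a driver, `mpirun -np 48 ./alltoall 1536` would correspond to the 1.5 GB-per-rank rows below, and `mpirun -np 16 ./alltoall 512` to the 0.5 GB rows.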
### Test Result
| Machine | MPI Ranks | Data Transferred per Rank | Time Used (sec) |
| --------- | :-------: | ------------------------: | --------------: |
| SciNat | 48 | 1.5 GB | 0.895 |
| h5        | 48        | 1.5 GB | 1.094 |
| h9        | 48        | 1.5 GB | 1.045 |
| h5,h6     | 24 \* 2   | 1.5 GB | 1.688 |
| h9,h10    | 24 \* 2   | 1.5 GB | 2.437 |
| node1     | 16        | 0.5 GB | 0.349 |
| node3     | 16        | 0.5 GB | 0.352 |
| node[1,2] | 8 \* 2    | 0.5 GB | 0.430 |
| node[3,4] | 8 \* 2    | 0.5 GB | 1.935 |