# Benchmark of MPI Communication Performance on Different Nodes
## Facility Specification
| Machine          | CPU                                   | Memory | OS         | Infiniband Device Model & Driver         |
| ---------------- | ------------------------------------- | -----: | ---------- | ---------------------------------------- |
| H3C: h[5,6]      | Intel Xeon 6240R 24C @ 2.4GHz \* 2    | 192GB  | CentOS 7.8 | Mellanox MT27800, Official OFED 5.5      |
| H3C: h[9,10]     | Intel Xeon 6240R 24C @ 2.4GHz \* 2    | 192GB  | CentOS 7.9 | Mellanox MT27800, CentOS built-in driver |
| H3C: SciNat      | Intel Xeon 5219R 20C @ 2.1GHz \* 2    | 192GB  | CentOS 7.9 | Mellanox MT27800, CentOS built-in driver |
| Sugon: node[1,2] | Intel Xeon E5-2650v2 8C @ 2.6GHz \* 2 | 64GB   | CentOS 7.8 | Mellanox MT27500, Official OFED 4.9      |
| Sugon: node[3,4] | Intel Xeon E5-2650v2 8C @ 2.6GHz \* 2 | 64GB   | CentOS 7.9 | Mellanox MT27500, CentOS built-in driver |
## Alltoall Performance
`Alltoall` is very similar to a matrix transpose. Here is a schematic illustration of `Alltoall`:
```
@brief Illustrates how to use an all-to-all.
@details This application is meant to be run with 3 MPI processes. Every MPI
process begins with a buffer containing 3 integers, one for each process
including itself. Each process also has a buffer in which to receive the
integer sent to it by every other process. It can be visualised as follows:
+-----------------------+ +-----------------------+ +-----------------------+
| Process 0 | | Process 1 | | Process 2 |
+-------+-------+-------+ +-------+-------+-------+ +-------+-------+-------+
| Value | Value | Value | | Value | Value | Value | | Value | Value | Value |
| 0 | 100 | 200 | | 300 | 400 | 500 | | 600 | 700 | 800 |
+-------+-------+-------+ +-------+-------+-------+ +-------+-------+-------+
| | |_________|_______|_______|_________|___ | |
| | _____________|_______|_______|_________| | | |
| |___|_____________|_ | _|_____________|___| |
| _____|_____________| | | | |_____________|_____ |
| | | | | | | | |
+-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+
| 0 | 300 | 600 | | 100 | 400 | 700 | | 200 | 500 | 800 |
+-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+
| Process 0 | | Process 1 | | Process 2 |
+-----------------+ +-----------------+ +-----------------+
```
Reference: [rookiehpc](https://rookiehpc.org/mpi/docs/mpi_alltoall/index.html)
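
To make the diagram concrete, here is a minimal, self-contained rendering of the same 3-process exchange (an illustrative sketch written for this README, not code from the commit):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int irank, nrank;
  MPI_Comm_rank(MPI_COMM_WORLD, &irank);
  MPI_Comm_size(MPI_COMM_WORLD, &nrank);
  if (nrank != 3) {  /* the diagram assumes exactly 3 processes */
    if (0 == irank) fprintf(stderr, "run with: mpirun -np 3 ...\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  /* Rank 0 starts with {0, 100, 200}, rank 1 with {300, 400, 500},
   * rank 2 with {600, 700, 800}, as in the top row of the diagram. */
  int send[3], recv[3];
  for (int i = 0; i < 3; ++i) send[i] = irank * 300 + i * 100;

  /* Element i of send[] goes to rank i; element j of recv[] came from rank j. */
  MPI_Alltoall(send, 1, MPI_INT, recv, 1, MPI_INT, MPI_COMM_WORLD);

  printf("Rank %d received {%d, %d, %d}\n", irank, recv[0], recv[1], recv[2]);
  MPI_Finalize();
  return 0;
}
```

Rank 0 should print `{0, 300, 600}`, rank 1 `{100, 400, 700}`, and rank 2 `{200, 500, 800}`, matching the bottom row of the diagram.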
### Test Code
```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

/* Times 10 rounds of MPI_Alltoall, sending `count_send` ints from every rank
 * to every other rank, and reports the average time per call on rank 0. */
void test_alltoall(const uint64_t count_send, const int nrank, const int irank) {
  if (0 == irank) {
    uint64_t bytes = count_send * nrank * sizeof(int);
    uint64_t gbs = bytes >> 30;
    uint64_t mbs = (bytes % (1 << 30)) >> 20;
    printf(" * Profiling throughput of %4lu GB %4lu MB per rank ... ", gbs, mbs);
    fflush(stdout);  /* flush so the label appears before the long benchmark */
  }
  int* buffer_send = (int*)malloc(nrank * count_send * sizeof(int));
  int* buffer_recv = (int*)malloc(nrank * count_send * sizeof(int));
  for (uint64_t i = 0; i != nrank * count_send; ++i) {
    buffer_send[i] = i * i + irank * 114514;
    buffer_recv[i] = 0;
  }
  /* clock() measures this process's CPU time; since MPI ranks busy-poll during
   * communication it is close to wall time here (MPI_Wtime() would also work). */
  clock_t start = clock();
  for (int i = 0; i != 10; ++i) {
    MPI_Alltoall(buffer_send, count_send, MPI_INT, buffer_recv, count_send, MPI_INT, MPI_COMM_WORLD);
  }
  clock_t end = clock();
  /* *100 == *1000 (ms per s) / 10 (iterations): average milliseconds per call */
  int msec = (end - start) * 100 / CLOCKS_PER_SEC;
  free(buffer_send);
  free(buffer_recv);
  if (0 == irank) {
    printf(" time taken in MPI_Alltoall: %3d s %3d ms\n", msec / 1000, msec % 1000);
  }
}
```
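
The commit shows only the test function; a driver along the following lines is enough to run it. The `main` below, including the choice of sweep sizes, is my assumption, not the author's code:

```c
/* Hypothetical driver for test_alltoall (not part of the original commit). */
#include <stdint.h>
#include <mpi.h>

void test_alltoall(const uint64_t count_send, const int nrank, const int irank);

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int nrank, irank;
  MPI_Comm_size(MPI_COMM_WORLD, &nrank);
  MPI_Comm_rank(MPI_COMM_WORLD, &irank);

  /* Sweep per-rank buffer sizes; with 48 ranks, count_send = 2^23 ints
   * means 48 * 2^23 * 4 B = 1.5 GB sent per rank, as in the table below. */
  for (uint64_t count_send = UINT64_C(1) << 17; count_send <= UINT64_C(1) << 23; count_send <<= 1) {
    test_alltoall(count_send, nrank, irank);
  }

  MPI_Finalize();
  return 0;
}
```

Built and launched in the usual way, e.g. `mpicc -O2 main.c alltoall.c -o alltoall && mpirun -np 48 ./alltoall`.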
**Note**: `test_alltoall` allocates `nrank * count_send` ints for each of the send and receive buffers on every MPI rank, so the total allocated memory grows with the number of MPI ranks: the more processes you launch with `mpirun`, the more data is transferred.
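
As a concrete check of the sizes used below (my arithmetic, not from the commit): with 48 ranks and `count_send = 8388608` (2^23) ints, each buffer holds 48 × 2^23 × 4 B = 1.5 GB per rank, which is the "1.5 GB" figure in the result table.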
### Test Result
| Machine   | MPI Ranks | Data Transferred per Rank | Time Used (sec) |
| --------- | :-------: | ------------------------: | --------------: |
| SciNat    |    48     |                    1.5 GB |           0.895 |
| h5        |    48     |                    1.5 GB |           1.094 |
| h9        |    48     |                    1.5 GB |           1.045 |
| h5,h6     |  24 \* 2  |                    1.5 GB |           1.688 |
| h9,h10    |  24 \* 2  |                    1.5 GB |           2.437 |
| node1     |    16     |                    0.5 GB |           0.349 |
| node3     |    16     |                    0.5 GB |           0.352 |
| node[1,2] |  8 \* 2   |                    0.5 GB |           0.430 |
| node[3,4] |  8 \* 2   |                    0.5 GB |           1.935 |