diff --git "a/docs/blog/posts/2026/05/OSU\342\200\2217.5-CUDA-Latency.png" "b/docs/blog/posts/2026/05/OSU\342\200\2217.5-CUDA-Latency.png"
new file mode 100644
index 0000000000..25709f006f
Binary files /dev/null and "b/docs/blog/posts/2026/05/OSU\342\200\2217.5-CUDA-Latency.png" differ
diff --git "a/docs/blog/posts/2026/05/OSU\342\200\2217.5-CUDA-bibw.png" "b/docs/blog/posts/2026/05/OSU\342\200\2217.5-CUDA-bibw.png"
new file mode 100644
index 0000000000..b293a43782
Binary files /dev/null and "b/docs/blog/posts/2026/05/OSU\342\200\2217.5-CUDA-bibw.png" differ
diff --git a/docs/blog/posts/2026/05/eessi-cray-slingshot11-part2.md b/docs/blog/posts/2026/05/eessi-cray-slingshot11-part2.md
new file mode 100644
index 0000000000..1e2cddc731
--- /dev/null
+++ b/docs/blog/posts/2026/05/eessi-cray-slingshot11-part2.md
@@ -0,0 +1,202 @@
+---
+author: [Richard]
+date: 2026-05-11
+slug: EESSI-on-Cray-Slingshot-part2
+---
+
+# MPI at Warp Speed: EESSI Meets Slingshot-11 (part 2)
+
+Building on our initial HPE Cray Slingshot‑11 results, we have further refined our MPI tuning and validated the setup using the EESSI/2025.06 software stack. The outcome is a significant performance improvement, bringing EESSI's MPI behavior much closer to that of vendor-tuned Cray MPI environments.
+
+In our previous blog post, [MPI at Warp Speed: EESSI Meets Slingshot‑11](https://www.eessi.io/docs/blog/2025/11/14/EESSI-on-Cray-Slingshot/), we demonstrated that EESSI can successfully leverage the HPE Cray Slingshot‑11 interconnect via the [host_injections](https://www.eessi.io/docs/site_specific_config/host_injections/) mechanism. Even as a proof of concept, the results were promising, especially for GPU-aware MPI communication on NVIDIA Grace Hopper systems.
+
+Since then, we have continued to tune and refine MPI communication on top of the EESSI/2025.06 software stack. Through updates to several core components and improvements to the library configuration, we significantly reduced latency overheads and improved bandwidth utilization across Slingshot‑11.
+
+In this follow-up post, we present results obtained with OSU-Micro-Benchmarks/7.5 and show how close EESSI now gets to native, vendor-optimized MPI performance on Slingshot‑11 systems.
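+
+As in part 1, a key ingredient is making sure that the libfabric used underneath EESSI's OpenMPI actually exposes the Slingshot‑11 `cxi` provider. A minimal sanity check is sketched below; `fi_info` ships with libfabric, and the `OMPI_MCA_*` and `FI_PROVIDER` variables are standard Open MPI and libfabric settings shown here for illustration, not values taken from our runs.
+
+```
+# Check that the Slingshot-11 (CXI) libfabric provider is visible on the
+# compute nodes; fi_info is part of libfabric.
+fi_info -p cxi | head
+
+# If needed, explicitly steer Open MPI towards libfabric's CXI provider:
+export OMPI_MCA_pml=cm       # use the cm PML ...
+export OMPI_MCA_mtl=ofi      # ... with the OFI (libfabric) MTL
+export FI_PROVIDER=cxi       # restrict libfabric to the CXI provider
+```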
+
+### System Architecture
+
+Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia), which is based on the HPE Cray EX platform for its compute and accelerator nodes and on HPE Cray ClusterStor for global storage, all connected via the HPE Slingshot high-speed interconnect.
+It consists of two distinct partitions:
+
+- **Partition 1**: x86_64 AMD CPUs without accelerators
+- **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators
+
+### Testing
+
+The following tests were conducted on Olivia's accelerator partition (Grace nodes with Hopper GPUs), using a two-node, two-GPU configuration with one MPI task per node.
+
+We evaluated two OSU Micro-Benchmark builds:
+
+1. OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI
+2. OSU-Micro-Benchmarks/7.5 compiled with PrgEnv-cray (a possible build recipe is sketched below)
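+
+For reference, a possible recipe for the PrgEnv-cray build is sketched below. It uses the CUDA-related configure options documented in the OSU Micro-Benchmarks README together with the Cray compiler wrappers; the accelerator module name (`craype-accel-nvidia90`) and the `${CUDA_HOME}` path are assumptions that should be adapted to the local system.
+
+```
+# Sketch: building OSU Micro-Benchmarks 7.5 with CUDA support under PrgEnv-cray.
+# Module names and CUDA paths are illustrative; check them on the target system.
+module load PrgEnv-cray cuda craype-accel-nvidia90
+
+./configure CC=cc CXX=CC \
+    --enable-cuda \
+    --with-cuda-include=${CUDA_HOME}/include \
+    --with-cuda-libpath=${CUDA_HOME}/lib64
+make -j 8
+
+# GPU-aware communication with Cray MPICH additionally requires, at run time:
+export MPICH_GPU_SUPPORT_ENABLED=1
+```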
+
+The following commands were used to run the benchmarks (the `D D` arguments place both the send and the receive buffer in GPU device memory):
+
+`srun -N 2 --ntasks-per-node=1 osu_bibw -i 10 D D`
+
+`srun -N 2 --ntasks-per-node=1 osu_latency -i 10 D D`
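+
+To make the runs easier to reproduce, a minimal Slurm batch script for the EESSI build is sketched below. The partition, account and time limit are placeholders, and the EESSI initialization follows the generic `source /cvmfs/software.eessi.io/versions/<version>/init/bash` pattern from the EESSI documentation.
+
+```
+#!/bin/bash
+#SBATCH --nodes=2
+#SBATCH --ntasks-per-node=1
+#SBATCH --gpus-per-node=1
+#SBATCH --partition=accel     # placeholder: use the site's GPU partition
+#SBATCH --account=myproject   # placeholder
+#SBATCH --time=00:10:00
+
+# Initialize EESSI and load the CUDA-enabled OSU benchmarks
+source /cvmfs/software.eessi.io/versions/2025.06/init/bash
+module load OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0
+
+srun osu_bibw -i 10 D D
+srun osu_latency -i 10 D D
+```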
+
+The plots below summarize the bi-directional bandwidth and latency measurements for both builds:
+
+![OSU 7.5 CUDA bi-directional bandwidth](OSU‑7.5-CUDA-bibw.png)
+
+![OSU 7.5 CUDA latency](OSU‑7.5-CUDA-Latency.png)
+
+The full output of both runs is included below for reference.
+
+Test using OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI:
+```
+Environment set up to use EESSI (2025.06), have fun!
+
+hostname:
+gpu-1-111
+gpu-1-102
+
+CPU info:
+Vendor ID: ARM
+
+Currently Loaded Modules:
+ 1) EESSI/2025.06 12) PMIx/5.0.2-GCCcore-13.3.0
+ 2) GCCcore/13.3.0 13) PRRTE/3.0.5-GCCcore-13.3.0
+ 3) GCC/13.3.0 14) UCC/1.3.0-GCCcore-13.3.0
+ 4) numactl/2.0.18-GCCcore-13.3.0 15) OpenMPI/5.0.3-GCC-13.3.0
+ 5) libxml2/2.12.7-GCCcore-13.3.0 16) gompi/2024a
+ 6) libpciaccess/0.18.1-GCCcore-13.3.0 17) GDRCopy/2.4.1-GCCcore-13.3.0
+ 7) hwloc/2.10.0-GCCcore-13.3.0 18) UCX-CUDA/1.16.0-GCCcore-13.3.0-CUDA-12.6.0 (g)
+ 8) OpenSSL/3 19) NCCL/2.22.3-GCCcore-13.3.0-CUDA-12.6.0 (g)
+ 9) libevent/2.1.12-GCCcore-13.3.0 20) UCC-CUDA/1.3.0-GCCcore-13.3.0-CUDA-12.6.0 (g)
+ 10) UCX/1.16.0-GCCcore-13.3.0 21) OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 (g)
+ 11) libfabric/1.21.0-GCCcore-13.3.0
+
+ Where:
+ g: built for GPU
+
+# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5
+# Datatype: MPI_CHAR.
+# Size Bandwidth (MB/s)
+1 2.57
+2 5.11
+4 10.22
+8 20.66
+16 40.44
+32 80.95
+64 165.02
+128 329.14
+256 650.10
+512 1301.93
+1024 2608.66
+2048 5189.90
+4096 10332.67
+8192 19474.04
+16384 28342.00
+32768 33507.82
+65536 37659.55
+131072 41730.65
+262144 44740.60
+524288 45448.67
+1048576 45700.68
+2097152 45895.85
+4194304 46035.77
+
+# OSU MPI-CUDA Latency Test v7.5
+# Datatype: MPI_CHAR.
+# Size Avg Latency(us)
+1 2.38
+2 2.34
+4 2.34
+8 2.32
+16 2.34
+32 2.34
+64 2.34
+128 3.16
+256 3.31
+512 3.35
+1024 3.46
+2048 3.60
+4096 3.80
+8192 4.08
+16384 4.63
+32768 7.55
+65536 10.07
+131072 12.15
+262144 17.37
+524288 28.50
+1048576 50.04
+2097152 93.27
+4194304 179.65
+```
+
+Test using OSU-Micro-Benchmarks/7.5 with PrgEnv-cray:
+```
+
+hostname:
+gpu-1-111
+gpu-1-102
+
+CPU info:
+Vendor ID: ARM
+
+Currently Loaded Modules:
+ 1) craype-arm-grace                 8) cray-dsmml/0.3.0
+ 2) libfabric/2.3.1                  9) cray-mpich/9.1.0
+ 3) craype-network-ofi              10) cray-libsci/26.03.0
+ 4) perftools-base/26.03.0          11) PrgEnv-cray/8.7.0
+ 5) xpmem/2.11.3-1.3_gdbda01a1eb3d  12) cuda/13.0
+ 6) cce/21.0.0                      13) CrayEnv
+ 7) craype/2.7.36
+
+# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5
+# Datatype: MPI_CHAR.
+# Size Bandwidth (MB/s)
+1 1.14
+2 2.23
+4 4.56
+8 9.18
+16 18.41
+32 36.77
+64 74.20
+128 147.12
+256 275.37
+512 569.29
+1024 1161.92
+2048 2339.97
+4096 4640.06
+8192 9350.01
+16384 18583.90
+32768 23840.66
+65536 34521.83
+131072 39704.04
+262144 41814.18
+524288 44072.94
+1048576 44682.92
+2097152 45122.15
+4194304 45029.99
+
+# OSU MPI-CUDA Latency Test v7.5
+# Datatype: MPI_CHAR.
+# Size Avg Latency(us)
+1 3.31
+2 3.30
+4 3.24
+8 3.36
+16 3.21
+32 3.36
+64 3.24
+128 4.45
+256 4.43
+512 4.56
+1024 4.62
+2048 4.81
+4096 4.92
+8192 5.36
+16384 6.46
+32768 10.14
+65536 11.58
+131072 14.56
+262144 19.77
+524288 31.93
+1048576 56.43
+2097152 102.16
+4194304 181.70
+```
+
+
+## Conclusion
+
+The results show a notable improvement in performance. With the EESSI/2025.06 stack, the EESSI-provided build now delivers lower small-message latency than the PrgEnv-cray build (about 2.3 µs versus 3.3 µs at 8 bytes) and a slightly higher peak bi-directional bandwidth (about 46 GB/s versus 45 GB/s). While additional testing is still required, the current results are highly satisfactory.