Prereqs
- On the Arista 7060-CX-32S-R, for GPU-04 only: enable lossless RoCEv2 on GPU-04's switch ports and the matching FlashBlade uplinks
- Switch-side settings: MTU 9214, PFC for priority 3, DCBX in IEEE mode, QoS trust DSCP
- Use scripts/arista_eos_gpu04_template.txt: replace the placeholders with the real port IDs, apply it, then verify with:
- show priority-flow-control interfaces
- show dcbx interface detail
- show interfaces counters queues
- De-bond GPU-04's two 100G interfaces so each is a discrete interface (the RDMA data path cannot run over a LACP bond); a quick host-side check follows this list.
- Ensure the FlashBlade export (default /fsaai-shared) is available on 10.7.182.0/24 and that s500-data.fsaai.lab resolves to the data VIPs (10.7.182.60–.63).
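A minimal host-side sanity check for the last two items, assuming the 100G interface is named enp65s0f0 (a placeholder; use the real name):

```
# The interface must not be enslaved to a bond/team; the "master" symlink
# only exists while it is enslaved
[ -e /sys/class/net/enp65s0f0/master ] && echo "enp65s0f0 is still in a bond/team" || echo "enp65s0f0 is discrete"

# s500-data.fsaai.lab should resolve to the data VIPs (10.7.182.60-.63)
getent hosts s500-data.fsaai.lab
```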
Configure env for GPU‑04
- Edit env/fsaai-gpu-04.env (an example is sketched after this list):
- Set IFACE0=<100G interface name> (not the bond) and HOST_IP0=<unused 10.7.182.x/24>
- Optionally add IFACE1 and HOST_IP1 for dual-path later
- Optionally pin FB_VIP0/FB_VIP1 (else DNS round robin is used)
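A sketch of what env/fsaai-gpu-04.env might contain; the interface names and host IPs are placeholders, and the optional lines can stay commented out until needed:

```
# Required: first RDMA data path (a discrete 100G interface, not the bond)
IFACE0=enp65s0f0        # placeholder - use the real interface name
HOST_IP0=10.7.182.104   # placeholder - any unused 10.7.182.x/24 address

# Optional: second data path for dual-path later
#IFACE1=enp65s0f1
#HOST_IP1=10.7.182.105

# Optional: pin specific FlashBlade data VIPs (otherwise DNS round robin is used)
#FB_VIP0=10.7.182.60
#FB_VIP1=10.7.182.61
```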
Run setup and validation
- make init
- sudo bash scripts/fb_rdma_setup.sh --env env/fsaai-gpu-04.env
- sudo bash scripts/fb_rdma_validate.sh --env env/fsaai-gpu-04.env
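For orientation only, the end state the setup script produces is roughly equivalent to the manual steps below; the interface name, address, and mount point are placeholders, port 20049 is the conventional NFS-over-RDMA port, and the script itself remains the source of truth:

```
# Bring up the data interface (MTU 9000 and HOST_IP0 from the env file)
ip link set dev enp65s0f0 mtu 9000
ip addr add 10.7.182.104/24 dev enp65s0f0
ip link set dev enp65s0f0 up

# Mount the export over NFSv3/RDMA (mount point here is hypothetical)
mkdir -p /mnt/fsaai-shared
mount -t nfs -o vers=3,proto=rdma,port=20049,nconnect=4 \
  s500-data.fsaai.lab:/fsaai-shared /mnt/fsaai-shared
```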
What success looks like
- nfsstat -m shows mounts with proto=rdma and vers=3
- 8972-byte pings (payload for a 9000-byte MTU, with DF set) succeed to the FB VIP(s) from the matching interface(s)
- The I/O smoke test completes without errors
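The same criteria can be spot-checked by hand, independent of fb_rdma_validate.sh (interface name, VIP, and mount path below are placeholders):

```
# Mount options should show NFSv3 over RDMA
nfsstat -m | grep -E 'vers=3|proto=rdma'

# 8972-byte payload + 28 bytes of IP/ICMP headers = 9000-byte packets,
# sent with Don't Fragment so an undersized MTU on the path fails loudly
ping -M do -s 8972 -c 3 -I enp65s0f0 10.7.182.60

# Small I/O smoke test against the mount (path is hypothetical)
dd if=/dev/zero of=/mnt/fsaai-shared/gpu04_smoke.bin bs=1M count=256 oflag=direct
rm -f /mnt/fsaai-shared/gpu04_smoke.bin
```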
Cleanup (optional)
- sudo bash scripts/fb_rdma_cleanup.sh --env env/fsaai-gpu-04.env
Notes
- RDMA detection is vendor-agnostic (it proceeds if ibv_devices lists any HCA); see the sketch after these notes.
- The scripts fail fast if a specified RDMA interface is part of a bond/team.
- Defaults align with the lab doc: data network 10.7.182.0/24, s500-data.fsaai.lab, /fsaai-shared, MTU=9000, nconnect=4.
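A minimal sketch of the detection check described above; rdma link show is an optional extra (from iproute2) that the scripts do not require:

```
# Vendor-agnostic detection: proceed as long as at least one HCA is listed
ibv_devices

# Optional extra: map RDMA links to their netdevs
rdma link show
```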