Setting Up an NVIDIA BlueField-2 DPU as a Plain NIC: A Tinkering Log

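Background

A friend got hold of an NVIDIA BlueField-2 and handed it to me to play with. Reportedly he logged in to the ARM system inside the DPU and ran an update command, after which the card showed an exclamation mark in the host's Device Manager and the system running on the ARM cores died as well. He does not know the details either; in any case, the card ended up with me.

I asked him to send one working card and one broken card. Both arrived, along with a VMware Edge 310 for me to study.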

Hardware

Card 1

The photos were added afterwards, so please bear with them.

Back label

Model No:BF2M345A

P/N: MBF2M345A-VENOT_ES

S/N: MT219X37294

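Booting is very slow, and one device in Device Manager reports error code 10.

I tried the serial console that the card exposes to the host, but I did not know the account or password, so the only option was to reset the card.

Boot log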

Mellanox BlueField-2 A1 BL1 V1.1
NOTICE:  BL2R: v2.2(release):3.7.1-1-g7a249ba
NOTICE:  BL2R: Built : 18:59:31, Jul 22 2021
NOTICE:  BL2R built for hw (ver 1)
NOTICE:  No CDI given, can't complete Riot operation
NOTICE:  BL2R: Booting BL2
NOTICE:  BL2: v2.2(release):3.7.1-1-g7a249ba
NOTICE:  BL2: Built : 18:59:30, Jul 22 2021
NOTICE:  BL2 built for hw (ver 1)
NOTICE:  Running as MBF2M345A-VENOT_ system
NOTICE:  No SPD detected on MSS0 DIMM0
NOTICE:  No SPD detected on MSS0 DIMM1
NOTICE:  Finished initializing DDR
NOTICE:  DDR POST passed.
NOTICE:  BL31: v2.2(release):3.7.1-1-g7a249ba
NOTICE:  BL31: Built : 18:59:31, Jul 22 2021
NOTICE:  BL31 built for hw (ver 1)

Firmware version

C:\Program Files\Mellanox\WinMFT>mlxfwmanager.exe
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField2
  Part Number:      MBF2M345A-VENOT_ES_Ax
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
  PSID:             MT_0000000809
  PCI Device Name:  mt41686_pciconf0
  Base GUID:        b8cef60300f8d88a
  Base MAC:         b8cef6f8d88a
  Versions:         Current        Available
     FW             24.31.0356     N/A
     PXE            3.6.0401       N/A
     UEFI           14.24.0013     N/A
     UEFI Virtio blk   22.1.0011      N/A
     UEFI Virtio net   21.1.0011      N/A

  Status:           No matching image found

Card 2

Boot log

Mellanox BlueField-2 A1 BL1 V1.1
NOTICE:  BL2R: v2.2(release):3.7.1-1-g7a249ba
NOTICE:  BL2R: Built : 18:59:31, Jul 22 2021
NOTICE:  BL2R built for hw (ver 1)
NOTICE:  No CDI given, can't complete Riot operation
NOTICE:  BL2R: Booting BL2
NOTICE:  BL2: v2.2(release):3.7.1-1-g7a249ba
NOTICE:  BL2: Built : 18:59:30, Jul 22 2021
NOTICE:  BL2 built for hw (ver 1)
NOTICE:  Running as MBF2M345A-VENOT_ system
NOTICE:  No SPD detected on MSS0 DIMM0
NOTICE:  No SPD detected on MSS0 DIMM1
NOTICE:  Finished initializing DDR
NOTICE:  DDR POST passed.
NOTICE:  BL31: v2.2(release):3.7.1-1-g7a249ba
NOTICE:  BL31: Built : 18:59:31, Jul 22 2021
NOTICE:  BL31 built for hw (ver 1)

Firmware version

Device #1:
----------

  Device Type:      BlueField2
  Part Number:      MBF2M345A-VENOT_ES_Ax
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
  PSID:             MT_0000000809
  PCI Device Name:  mt41686_pciconf0
  Base GUID:        b8cef60300fc5446
  Base MAC:         b8cef6fc5446
  Versions:         Current        Available
     FW             24.31.0356     N/A
     PXE            3.6.0401       N/A
     UEFI           14.24.0013     N/A
     UEFI Virtio blk   22.1.0011      N/A
     UEFI Virtio net   21.1.0011      N/A

  Status:           No matching image found

Resources

All of the resources come from NVIDIA's official site.

Documentation

DOCA documentation

https://docs.nvidia.com/networking/dpu-doca/index.html#doca

DOCA 1.5.1 LTS documentation

https://docs.nvidia.com/doca/archive/doca-v1.5.1/index.html

NVIDIA DOCA Installation Guide for Linux

DOCA

https://developer.nvidia.com/doca-downloads

Older DOCA releases can be downloaded here.

NIC firmware

https://linux.mellanox.com/public/repo/

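Step-by-step procedure

The owner asked me to fix the card and make it usable as an ordinary CX6 NIC. I followed the two tutorials below together with the official documentation.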

https://www.bilibili.com/video/BV1Cm421s7sq

https://www.bilibili.com/read/cv32771337

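1. Install Ubuntu

This step needs no explanation.

2. Install the DOCA environment

Just install the latest version; there is no need to install 1.5.1 specifically.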

wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.7.0/host/doca-host_2.7.0-209000-24.04-ubuntu2204_amd64.deb
sudo dpkg -i doca-host_2.7.0-209000-24.04-ubuntu2204_amd64.deb
sudo apt-get update
sudo apt-get -y install doca-all

If the installation hangs at "building initial module", disable Secure Boot in the motherboard firmware settings.
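Because the hang is tied to Secure Boot, it may save time to check its state before starting the install. The sketch below is only an illustration: it assumes the `mokutil` utility is present on the host (it is not part of the original steps) and that its output contains `SecureBoot enabled` or `SecureBoot disabled`. The function name `secure_boot_state` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by `mokutil --sb-state`.
# Assumption: the text contains "SecureBoot enabled" or "SecureBoot disabled".
secure_boot_state() {
  case "$1" in
    *"SecureBoot enabled"*)  echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *)                       echo unknown ;;
  esac
}

# Usage on the host (not executed here):
#   secure_boot_state "$(mokutil --sb-state 2>&1)"
```

If the result is `enabled`, turning Secure Boot off before running the install avoids the hang described above.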

3. Start rshim
sudo systemctl start rshim

4. Connect to the DPU with minicom

If minicom is not installed yet, run sudo apt-get install minicom first.
sudo minicom -D /dev/rshim0/console

5. Reset the DPU's ARM cores

echo "SW_RESET 1" | sudo tee /dev/rshim0/misc
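One detail worth spelling out: in a command of the form `sudo echo ... > file`, the redirection is opened by the calling shell rather than by sudo, so it can fail with a permission error when the target is only writable by root. Piping into `sudo tee` avoids that. The helper below is a minimal sketch; `rshim_misc_write` and the `SUDO` override variable are names invented for this example, not part of the rshim tooling.

```shell
#!/bin/sh
# Hypothetical helper: write one command string to <rshim_dir>/misc.
# SUDO is an assumed override: it defaults to "sudo" and may be set to an
# empty string when the caller is already root.
rshim_misc_write() {
  dir=$1
  msg=$2
  # tee runs under sudo, so the file is opened with root privileges;
  # a plain "> file" redirection would be opened by the calling shell.
  printf '%s\n' "$msg" | ${SUDO-sudo} tee "$dir/misc" >/dev/null
}

# Usage on the host (not executed here):
#   rshim_misc_write /dev/rshim0 "SW_RESET 1"
```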

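6. Flash DOCA 1.5.1 LTS onto the DPU

The DPU must be updated to this version first, and the NIC firmware updated afterwards. If the latest DOCA version is flashed directly, the system fails to boot, as shown in the screenshot.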

bfb-install --rshim rshim0 --bfb DOCA_1.5.1_BSP_3.9.3_Ubuntu_20.04-4.2211-LTS.signed.bfb
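Flashing takes a while, so a quick pre-flight check can catch a missing image file or a stopped rshim service before anything is written. This is a sketch under assumptions: `bfb_preflight` is a hypothetical name, and the check relies on the rshim driver exposing a `boot` node under the device directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check to run before bfb-install.
# Arguments: rshim device directory (e.g. /dev/rshim0) and the BFB image path.
bfb_preflight() {
  rshim_dir=$1
  bfb=$2
  # The rshim driver exposes a "boot" node that receives the image.
  [ -e "$rshim_dir/boot" ] || {
    echo "missing $rshim_dir/boot (is rshim running?)" >&2
    return 1
  }
  # The image must exist and be non-empty.
  [ -s "$bfb" ] || {
    echo "BFB image not found or empty: $bfb" >&2
    return 1
  }
  echo ok
}

# Usage on the host (not executed here):
#   bfb_preflight /dev/rshim0 DOCA_1.5.1_BSP_3.9.3_Ubuntu_20.04-4.2211-LTS.signed.bfb
```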

7. After a successful boot, change the default account's password

Mellanox BlueField-2 A1 BL1 V1.1
NOTICE:  No CDI passed to Riot core!
NOTICE:  BL2R: v2.2(release):3.9.3-4-g43fe858
NOTICE:  BL2R: Built : 19:38:23, Oct 21 2022
NOTICE:  BL2R built for hw (ver 1)
NOTICE:  BL2R: Booting BL2
NOTICE:  BL2: v2.2(release):3.9.3-4-g43fe858
NOTICE:  BL2: Built : 19:38:22, Oct 21 2022
NOTICE:  BL2 built for hw (ver 1)
NOTICE:  Running as MBF2M345A-VENOT_ system
NOTICE:  No SPD detected on MSS0 DIMM0
NOTICE:  No SPD detected on MSS0 DIMM1
NOTICE:  Finished initializing DDR
NOTICE:  DDR POST passed.
NOTICE:  BL31: v2.2(release):3.9.3-4-g43fe858
NOTICE:  BL31: Built : 19:38:22, Oct 21 2022
NOTICE:  BL31 built for hw (ver 1)
UEFI firmware (version BlueField:3.9.3-7-g8f2d8ca built at 19:40:49 on Oct 21 2022)

Default login: ubuntu / ubuntu

New login: ubuntu / Bf112233

8. Back up the NIC firmware

Run the following on the host.

sudo mst status

MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt41686_pciconf0        - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:06:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 01

Firmware backup commands. Adjust the PCI address to match your own system.

flint -d 06:00.0 query full > flint_query.txt
flint -d 06:00.0 hw query > flint_hwinfo.txt
flint -d 06:00.0 ri orig_firmware.mlx
flint -d 06:00.0 dc orig_firmware.ini
flint -d 06:00.0 rrom orig_rom.mlx
mlxburn -d 06:00.0 -vpd > orig_vpd.txt
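Since the PCI address differs between machines, it can be read out of the `mst status` text instead of being typed by hand. The sketch below assumes the `domain:bus:dev.fn=` line format shown in the output above; `mst_pci_addrs` is a hypothetical helper name. It drops the PCI domain so the result has the same `bus:dev.fn` shape used in the backup commands.

```shell
#!/bin/sh
# Hypothetical helper: read `mst status` text on stdin and print the
# bus:dev.fn part of every "domain:bus:dev.fn=" entry (domain stripped).
mst_pci_addrs() {
  sed -n 's/.*domain:bus:dev\.fn=[0-9A-Fa-f]*:\([0-9A-Fa-f]*:[0-9A-Fa-f]*\.[0-9A-Fa-f]*\).*/\1/p'
}

# Usage on the host (not executed here):
#   addr=$(sudo mst status | mst_pci_addrs | head -n 1)
#   flint -d "$addr" query full > flint_query.txt
```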

9. Start the mst service and query the NIC firmware version

# Start the mst service
sudo mst start

Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success

# Check the mst status
sudo mst status

MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt41686_pciconf0        - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:03:00.0 addr.reg=88 data.reg=92 cr_bar.gw_1
                                   Chip revision is: 01
# Query the NIC firmware version
sudo mlxfwmanager

Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField2
  Part Number:      MBF2M345A-VENOT_ES_Ax
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gent
  PSID:             MT_0000000809
  PCI Device Name:  /dev/mst/mt41686_pciconf0
  Base GUID:        b8cef60300f8d88a
  Base MAC:         b8cef6f8d88a
  Versions:         Current        Available     
     FW             24.31.0356     N/A           
     PXE            3.6.0401       N/A           
     UEFI           14.24.0013     N/A           
     UEFI Virtio blk   22.1.0011      N/A           
     UEFI Virtio net   21.1.0011      N/A           

  Status:           No matching image found

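10. Update the NIC firmware to 24.35

This firmware is bundled with DOCA 1.5.1. According to the tutorial author's reply in the comments, that is the last release whose firmware package still includes this card's PSID. So the order is: flash DOCA 1.5.1 first, then upgrade to DOCA 2.7, then upgrade to the latest NIC firmware. I do not know whether this step can be skipped by flashing the latest NIC firmware directly, and I was not willing to try, since the card is not cheap.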

# Update the firmware
sudo /opt/mellanox/mlnx-fw-updater/firmware/mlxfwmanager_sriov_dis_aarch64_41686
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField2
  Part Number:      MBF2M345A-VENOT_ES_Ax
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
  PSID:             MT_0000000809
  PCI Device Name:  /dev/mst/mt41686_pciconf0
  Base GUID:        b8cef60300f8d88a
  Base MAC:         b8cef6f8d88a
  Versions:         Current        Available
     FW             24.31.0356     24.35.2000
     NVMe           N/A            20.4.0001
     PXE            3.6.0401       3.6.0805
     UEFI           14.24.0013     14.28.0016
     UEFI Virtio blk   22.1.0011      22.4.0010
     UEFI Virtio net   21.1.0011      21.4.0010

  Status:           Update required

---------
Found 1 device(s) requiring firmware update...

Perform FW update? [y/N]: y
Device #1: Updating FW ...
FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
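The Current and Available columns in this output are what decide whether an update is offered, so they are handy to extract when repeating the procedure on a second card. The following is a rough sketch that only parses text in the format shown above; `fw_versions` and `fw_update_offered` are hypothetical helper names, not MFT commands.

```shell
#!/bin/sh
# Hypothetical helper: read mlxfwmanager output on stdin and print
# "<current> <available>" taken from the FW row of the Versions table.
fw_versions() {
  awk '$1 == "FW" { print $2, $3; exit }'
}

# Hypothetical helper: succeed when the Available column holds a version
# that differs from Current (that is, it is neither empty nor "N/A").
fw_update_offered() {
  set -- $(fw_versions)
  [ -n "$2" ] && [ "$2" != "N/A" ] && [ "$1" != "$2" ]
}

# Usage (not executed here):
#   sudo mlxfwmanager | fw_versions
```

A result such as `N/A` in the second field corresponds to the "No matching image found" status seen earlier, meaning the updater in use carries no image for this PSID.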

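Extracting the firmware updater from the DPU system (no action needed)

Below is the command for extracting this firmware updater. I have already extracted it, so there is no need to do it again.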
scp ubuntu@192.168.100.2:/opt/mellanox/mlnx-fw-updater/firmware/mlxfwmanager_sriov_dis_aarch64_41686 /

Unpacking the extracted firmware with mft-scripts shows that it does contain this PSID:
81. MT_0000000809  MBF2M345A-VENOT_ES_Ax            NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disa

I then unpacked mlnx-fw-updater_23.10-3.2.2.0_arm64.deb and found the latest firmware, 24.39.3560:
57. MT_0000000809  MBF2M345A-VENOT_ES_Ax            NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disa
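Checking a listing for the card's PSID before flashing is the step that guards against an updater with no matching image. A minimal sketch of that lookup follows, assuming the listing has been saved as text with one image per line in the format of the excerpts above; `listing_has_psid` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given PSID appears as a whole
# whitespace-separated field in a firmware listing read from stdin.
listing_has_psid() {
  psid=$1
  awk -v p="$psid" '
    { for (i = 1; i <= NF; i++) if ($i == p) found = 1 }
    END { exit !found }
  '
}

# Usage (not executed here; listing.txt is an assumed file name):
#   listing_has_psid MT_0000000809 < listing.txt && echo "PSID present"
```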

11. Cold-reboot the host and check the NIC firmware version

The NIC firmware now reports version 24.35.2000.

root@recopec-MS-7D25:/home/recopec# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField2
  Part Number:      MBF2M345A-VENOT_ES_Ax
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
  PSID:             MT_0000000809
  PCI Device Name:  0000:06:00.0
  Base GUID:        b8cef60300f8d88a
  Base MAC:         b8cef6f8d88a
  Versions:         Current        Available     
     FW             24.35.2000     N/A           
     PXE            3.6.0805       N/A           
     UEFI           14.28.0016     N/A           
     UEFI Virtio blk   22.4.0010      N/A           
     UEFI Virtio net   21.4.0010      N/A           

  Status:           No matching image found

With this version the UEFI BIOS still has no NIC mode option, so the upgrade continues.

12. Flash DOCA 2.7 onto the DPU

bfb-install --rshim rshim0 --bfb bf-bundle-2.7.0-33_24.04_ubuntu-22.04_prod.bfb

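The update reports an error while updating the NIC firmware; it can be ignored.

13. After a successful boot, change the default account's password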

Mellanox BlueField-2 A1 BL1 V1.1                                                
NOTICE:  No CDI passed to Riot core!                                            
NOTICE:  BL2R: v2.2(release):4.7.0-25-g5569834                                  
NOTICE:  BL2R: Built : 22:05:22, Apr 26 2024                                    
NOTICE:  BL2R built for hw (ver 1)                                              
NOTICE:  BL2R: Booting BL2                                                      
NOTICE:  BL2: v2.2(release):4.7.0-25-g5569834                                   
NOTICE:  BL2: Built : 22:05:22, Apr 26 2024                                     
NOTICE:  BL2 built for hw (ver 1)                                               
NOTICE:  Running as MBF2M345A-VENOT_ system                                     
NOTICE:  No SPD detected on MSS0 DIMM0                                          
NOTICE:  No SPD detected on MSS0 DIMM1                                          
NOTICE:  Finished initializing DDR                                              
NOTICE:  DDR POST passed.                                                       
NOTICE:  BL31: v2.2(release):4.7.0-25-g5569834                                  
NOTICE:  BL31: Built : 22:05:22, Apr 26 2024                                    
NOTICE:  BL31 built for hw (ver 1), lifecycle GA Non-Secured                    
UEFI firmware (version BlueField:4.7.0-42-g13081ae-BId13127 built at 22:23:12 o)

Default login: ubuntu / ubuntu

New login: ubuntu / Bf1122334455

14. Update the NIC firmware

sudo mst start
sudo mst status
sudo mlxfwmanager                                            
Querying Mellanox devices firmware ...                                           
                                                                                 
Device #1:                                                                       
----------                                                                       
                                                                                 
  Device Type:      BlueField2                                                   
  Part Number:      MBF2M345A-VENOT_ES_Ax                                        
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-pt
  PSID:             MT_0000000809                                                
  PCI Device Name:  /dev/mst/mt41686_pciconf0                                    
  Base GUID:        b8cef60300f8d88a                                             
  Base MAC:         b8cef6f8d88a                                                 
  Versions:         Current        Available                                     
     FW             24.35.2000     N/A                                           
     PXE            3.6.0805       N/A                                           
     UEFI           14.28.0016     N/A                                           
     UEFI Virtio blk   22.4.0010      N/A                                        
     UEFI Virtio net   21.4.0010      N/A                                        
                                                                                 
  Status:           No matching image found

Copy the NIC firmware updater to the DPU
scp mlxfwmanager_sriov_dis_aarch64_41686 ubuntu@192.168.100.2:/home/ubuntu/

Then make it executable and run it on the DPU:

sudo chmod +x mlxfwmanager_sriov_dis_aarch64_41686
sudo ./mlxfwmanager_sriov_dis_aarch64_41686

Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField2
  Part Number:      MBF2M345A-VENOT_ES_Ax
  Description:      NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
  PSID:             MT_0000000809
  PCI Device Name:  /dev/mst/mt41686_pciconf0
  Base GUID:        b8cef60300f8d88a
  Base MAC:         b8cef6f8d88a
  Versions:         Current        Available
     FW             24.35.2000     24.39.3560
     NVMe           N/A            20.4.0001
     PXE            3.6.0805       3.7.0300
     UEFI           14.28.0016     14.32.0017
     UEFI Virtio blk   22.4.0010      22.4.0012
     UEFI Virtio net   21.4.0010      21.4.0013

  Status:           Update required

After a cold reboot, the update shows as complete.

15. Switch to NIC mode

https://docs.nvidia.com/doca/sdk/nvidia+bluefield+modes+of+operation/index.html#src-2609505413_id-.NVIDIABlueFieldModesofOperationv2.7.0-NICModeforBlueField-2

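This is very simple. The official documentation offers several methods, and the most convenient one is to change the setting in the ARM UEFI BIOS.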

  1. Select “Device Manager”.
  2. Select “System Configuration”.
  3. Select “BlueField Modes”.
  4. Set the “NIC Mode” field to NicMode to enable NIC mode.

That did not seem to take effect, so try the following instead.

# Enable NIC mode
sudo mlxconfig -d /dev/mst/mt41686_pciconf0 set INTERNAL_CPU_PAGE_SUPPLIER=1 INTERNAL_CPU_ESWITCH_MANAGER=1 INTERNAL_CPU_IB_VPORT0=1 INTERNAL_CPU_OFFLOAD_ENGINE=1
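All four parameters are set to the same value, so the argument list can be generated instead of retyped. The helper below is a small sketch with a made-up name, `bf2_mode_args`; it only builds the argument string and does not touch the card. Passing 0 is understood to return the card to the default DPU mode, but that is an assumption to verify against the linked NVIDIA page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the four mlxconfig assignments used for the
# BlueField-2 mode switch, all set to the same value (1 = NIC mode as above).
bf2_mode_args() {
  v=$1
  printf 'INTERNAL_CPU_PAGE_SUPPLIER=%s INTERNAL_CPU_ESWITCH_MANAGER=%s INTERNAL_CPU_IB_VPORT0=%s INTERNAL_CPU_OFFLOAD_ENGINE=%s\n' \
    "$v" "$v" "$v" "$v"
}

# Usage (not executed here):
#   sudo mlxconfig -d /dev/mst/mt41686_pciconf0 query
#   sudo mlxconfig -d /dev/mst/mt41686_pciconf0 set $(bf2_mode_args 1)
```

Running the `query` form before and after the change shows whether the new values were stored.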

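After a reboot, the NIC shows up as having no cable plugged in, so it is presumably working now. I had no way to test it any further, so I left it at that and sent the card back to its owner.

All the resources are collected here. If the cloud-drive link has expired, pull the files from my NAS instead, slowly. The official site also has download links for everything, and they are easy to find.

Link: https://pan.baidu.com/s/1UV7XDu6N3P9oROhStSS8hw?pwd=2333
Extraction code: 2333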

https://alist.irec.moe/@login

Username: bf

Password: bf12345

Author: Recopec
Article link: https://blog.irec.moe/nvidia_boyfriend.html
License: This article is licensed under CC BY-NC-SA 3.0 CN.