
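Configuring an NVIDIA BlueField-2 DPU as a Plain NIC: A Tinkering Log
Background
A friend got hold of an NVIDIA BlueField-2 and handed it to me to play with. Supposedly he ran an update command inside the ARM system on the DPU, after which the card showed a warning mark in Windows Device Manager and the OS running on the ARM cores stopped booting. He doesn't know the details either; in any case, it ended up in my hands.
I asked him to send one working card and one broken card. Both arrived, together with a VMware Edge 310 for me to look into.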

The hardware
Card 1
The photos were added after the fact, so please excuse them.


Label on the back
Model No: BF2M345A
P/N: MBF2M345A-VENOT_ES
S/N: MT219X37294
The card boots very slowly, and one device in Device Manager reports error code 10.

I connected over the serial console that the card exposes to the host, but I didn't know the account or password, so the only option was to reset the card.
Boot log
Mellanox BlueField-2 A1 BL1 V1.1
NOTICE: BL2R: v2.2(release):3.7.1-1-g7a249ba
NOTICE: BL2R: Built : 18:59:31, Jul 22 2021
NOTICE: BL2R built for hw (ver 1)
NOTICE: No CDI given, can't complete Riot operation
NOTICE: BL2R: Booting BL2
NOTICE: BL2: v2.2(release):3.7.1-1-g7a249ba
NOTICE: BL2: Built : 18:59:30, Jul 22 2021
NOTICE: BL2 built for hw (ver 1)
NOTICE: Running as MBF2M345A-VENOT_ system
NOTICE: No SPD detected on MSS0 DIMM0
NOTICE: No SPD detected on MSS0 DIMM1
NOTICE: Finished initializing DDR
NOTICE: DDR POST passed.
NOTICE: BL31: v2.2(release):3.7.1-1-g7a249ba
NOTICE: BL31: Built : 18:59:31, Jul 22 2021
NOTICE: BL31 built for hw (ver 1)
Firmware version
C:\Program Files\Mellanox\WinMFT>mlxfwmanager.exe
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
PSID: MT_0000000809
PCI Device Name: mt41686_pciconf0
Base GUID: b8cef60300f8d88a
Base MAC: b8cef6f8d88a
Versions: Current Available
FW 24.31.0356 N/A
PXE 3.6.0401 N/A
UEFI 14.24.0013 N/A
UEFI Virtio blk 22.1.0011 N/A
UEFI Virtio net 21.1.0011 N/A
Status: No matching image found
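When comparing several cards, it can help to pull individual fields out of this output instead of reading it by eye. The following is a minimal sketch, assuming the mlxfwmanager output is piped in as text in the layout shown above. The function names fw_field and fw_current are made up for illustration and are not part of MFT.

```shell
# Print the value of a "Label: value" field from mlxfwmanager output on stdin.
# $1 is the label exactly as printed, e.g. "PSID" or "Base MAC".
fw_field() {
    awk -v key="$1" -F': *' '$1 ~ ("^[ \t]*" key "$") { print $2; exit }'
}

# Print the "Current" column of the FW row (the running NIC firmware version).
fw_current() {
    awk '$1 == "FW" { print $2; exit }'
}
```

On a real system this could be used as `mlxfwmanager | fw_field PSID` or `mlxfwmanager | fw_current`.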
Card 2
Boot log
Mellanox BlueField-2 A1 BL1 V1.1
NOTICE: BL2R: v2.2(release):3.7.1-1-g7a249ba
NOTICE: BL2R: Built : 18:59:31, Jul 22 2021
NOTICE: BL2R built for hw (ver 1)
NOTICE: No CDI given, can't complete Riot operation
NOTICE: BL2R: Booting BL2
NOTICE: BL2: v2.2(release):3.7.1-1-g7a249ba
NOTICE: BL2: Built : 18:59:30, Jul 22 2021
NOTICE: BL2 built for hw (ver 1)
NOTICE: Running as MBF2M345A-VENOT_ system
NOTICE: No SPD detected on MSS0 DIMM0
NOTICE: No SPD detected on MSS0 DIMM1
NOTICE: Finished initializing DDR
NOTICE: DDR POST passed.
NOTICE: BL31: v2.2(release):3.7.1-1-g7a249ba
NOTICE: BL31: Built : 18:59:31, Jul 22 2021
NOTICE: BL31 built for hw (ver 1)
Firmware version
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
PSID: MT_0000000809
PCI Device Name: mt41686_pciconf0
Base GUID: b8cef60300fc5446
Base MAC: b8cef6fc5446
Versions: Current Available
FW 24.31.0356 N/A
PXE 3.6.0401 N/A
UEFI 14.24.0013 N/A
UEFI Virtio blk 22.1.0011 N/A
UEFI Virtio net 21.1.0011 N/A
Status: No matching image found
Resources
All of the resources come from NVIDIA's official sites.
Documentation
DOCA documentation
https://docs.nvidia.com/networking/dpu-doca/index.html#doca
DOCA 1.5.1 LTS documentation
https://docs.nvidia.com/doca/archive/doca-v1.5.1/index.html
NVIDIA DOCA Installation Guide for Linux
DOCA
https://developer.nvidia.com/doca-downloads
Previous DOCA releases can also be downloaded from here.
NIC firmware
https://linux.mellanox.com/public/repo/
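Step-by-step procedure
The owner asked me to fix the cards so they can be used as ordinary CX6 (ConnectX-6) NICs. I followed the two tutorials below plus the official documentation.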
https://www.bilibili.com/video/BV1Cm421s7sq
https://www.bilibili.com/read/cv32771337
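1. Install Ubuntu
This needs no explanation.
2. Install the DOCA environment
Just install the latest version on the host; there is no need to install 1.5.1 specifically.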
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.7.0/host/doca-host_2.7.0-209000-24.04-ubuntu2204_amd64.deb
sudo dpkg -i doca-host_2.7.0-209000-24.04-ubuntu2204_amd64.deb
sudo apt-get update
sudo apt-get -y install doca-all
If the installation hangs at "building initial module", disable Secure Boot in the motherboard firmware settings.
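Since Secure Boot is what blocks the module build, it may be worth checking its state before starting the install. This is a small sketch, assuming the mokutil package is installed and that `mokutil --sb-state` prints a line starting with "SecureBoot enabled" when it is on. The helper name secure_boot_enabled is hypothetical.

```shell
# Interpret the text printed by `mokutil --sb-state`, read from stdin.
# Exit status 0 means Secure Boot is reported as enabled.
secure_boot_enabled() {
    grep -qi '^SecureBoot enabled'
}

# Usage on a real host (not executed here):
#   if mokutil --sb-state | secure_boot_enabled; then
#       echo "Secure Boot is on: disable it before installing doca-all"
#   fi
```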
3. Start rshim
sudo systemctl start rshim
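The rshim device nodes can take a moment to appear after the service starts. Here is a minimal sketch that waits for one, assuming the console shows up as /dev/rshim0/console as used in the next step. wait_for_path is a hypothetical helper, not an rshim tool.

```shell
# Wait until a path exists, polling once per second.
# $1 = path to wait for, $2 = timeout in seconds (default 30)
wait_for_path() {
    path=$1
    timeout=${2:-30}
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Usage on a real host (not executed here):
#   sudo systemctl start rshim && wait_for_path /dev/rshim0/console 30
```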
4. Connect to the DPU with minicom
If minicom is not installed yet, run sudo apt-get install minicom first.
sudo minicom -D /dev/rshim0/console
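5. Reset the DPU's ARM cores
The write itself has to run as root, so pipe through sudo tee; with sudo echo ... > file the redirection is performed by the unprivileged shell and fails.
echo "SW_RESET 1" | sudo tee /dev/rshim0/misc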
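6. Flash the DOCA 1.5.1 LTS image onto the DPU
You must flash this version first and only then update the NIC firmware. If you flash the latest DOCA image directly, the system fails to boot, as the screenshot shows.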
bfb-install --rshim rshim0 --bfb DOCA_1.5.1_BSP_3.9.3_Ubuntu_20.04-4.2211-LTS.signed.bfb
7. After a successful boot, change the default account password
Mellanox BlueField-2 A1 BL1 V1.1
NOTICE: No CDI passed to Riot core!
NOTICE: BL2R: v2.2(release):3.9.3-4-g43fe858
NOTICE: BL2R: Built : 19:38:23, Oct 21 2022
NOTICE: BL2R built for hw (ver 1)
NOTICE: BL2R: Booting BL2
NOTICE: BL2: v2.2(release):3.9.3-4-g43fe858
NOTICE: BL2: Built : 19:38:22, Oct 21 2022
NOTICE: BL2 built for hw (ver 1)
NOTICE: Running as MBF2M345A-VENOT_ system
NOTICE: No SPD detected on MSS0 DIMM0
NOTICE: No SPD detected on MSS0 DIMM1
NOTICE: Finished initializing DDR
NOTICE: DDR POST passed.
NOTICE: BL31: v2.2(release):3.9.3-4-g43fe858
NOTICE: BL31: Built : 19:38:22, Oct 21 2022
NOTICE: BL31 built for hw (ver 1)
UEFI firmware (version BlueField:3.9.3-7-g8f2d8ca built at 19:40:49 on Oct 21 2022)
Default login (user / password): ubuntu / ubuntu
Login after the change (user / password): ubuntu / Bf112233
8. Back up the NIC firmware
Run on the host:
sudo mst status
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt41686_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:06:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 01
Firmware backup commands. Adjust the PCI address to match your own system.
flint -d 06:00.0 query full > flint_query.txt
flint -d 06:00.0 hw query > flint_hwinfo.txt
flint -d 06:00.0 ri orig_firmware.mlx
flint -d 06:00.0 dc orig_firmware.ini
flint -d 06:00.0 rrom orig_rom.mlx
mlxburn -d 06:00.0 -vpd > orig_vpd.txt
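Because the PCI address differs between machines (06:00.0 here), it can be read from the mst status output rather than typed by hand. The sketch below only prints the backup commands so they can be reviewed before running. The function names and the output directory argument are assumptions for illustration, while the flint and mlxburn invocations are the ones listed above.

```shell
# Extract the short PCI address (bus:dev.fn) from `mst status` output on stdin.
# Looks for "domain:bus:dev.fn=0000:06:00.0" and prints "06:00.0".
pci_addr_from_mst_status() {
    sed -n 's/.*domain:bus:dev\.fn=[0-9a-fA-F]\{4\}:\([0-9a-fA-F]\{2\}:[0-9a-fA-F]\{2\}\.[0-7]\).*/\1/p' | head -n 1
}

# Print (do not run) the backup commands for one device.
# $1 = PCI address, $2 = directory for the backup files (hypothetical layout)
print_backup_cmds() {
    addr=$1
    dir=$2
    echo "flint -d $addr query full > $dir/flint_query.txt"
    echo "flint -d $addr hw query > $dir/flint_hwinfo.txt"
    echo "flint -d $addr ri $dir/orig_firmware.mlx"
    echo "flint -d $addr dc $dir/orig_firmware.ini"
    echo "flint -d $addr rrom $dir/orig_rom.mlx"
    echo "mlxburn -d $addr -vpd > $dir/orig_vpd.txt"
}

# Usage on a real host (not executed here):
#   addr=$(sudo mst status | pci_addr_from_mst_status)
#   print_backup_cmds "$addr" ./backup
```

Printing instead of executing keeps the sketch harmless; the printed lines can be pasted into a root shell once the address has been checked.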
9. Start the mst service and query the NIC firmware version
# Start the mst service
sudo mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success
# Check the mst status
sudo mst status
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt41686_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:03:00.0 addr.reg=88 data.reg=92 cr_bar.gw_1
Chip revision is: 01
# Query the NIC firmware version information
sudo mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gent
PSID: MT_0000000809
PCI Device Name: /dev/mst/mt41686_pciconf0
Base GUID: b8cef60300f8d88a
Base MAC: b8cef6f8d88a
Versions: Current Available
FW 24.31.0356 N/A
PXE 3.6.0401 N/A
UEFI 14.24.0013 N/A
UEFI Virtio blk 22.1.0011 N/A
UEFI Virtio net 21.1.0011 N/A
Status: No matching image found
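10. Update the NIC firmware to 24.35
This firmware ships inside the DOCA 1.5.1 image. According to the tutorial author's reply in the comments, that is the last system image that still includes this card's PSID. So the order is: flash DOCA 1.5.1 first, then upgrade to DOCA 2.7, then upgrade to the latest NIC firmware. I don't know whether this step can be skipped by going straight to the latest NIC firmware, and I'm not willing to try, because the card isn't cheap.
# Update the firmware (run inside the DPU's ARM system; the updater is an aarch64 binary)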
sudo /opt/mellanox/mlnx-fw-updater/firmware/mlxfwmanager_sriov_dis_aarch64_41686
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
PSID: MT_0000000809
PCI Device Name: /dev/mst/mt41686_pciconf0
Base GUID: b8cef60300f8d88a
Base MAC: b8cef6f8d88a
Versions: Current Available
FW 24.31.0356 24.35.2000
NVMe N/A 20.4.0001
PXE 3.6.0401 3.6.0805
UEFI 14.24.0013 14.28.0016
UEFI Virtio blk 22.1.0011 22.4.0010
UEFI Virtio net 21.1.0011 21.4.0010
Status: Update required
---------
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
Device #1: Updating FW ...
FSMST_INITIALIZE - OK
Writing Boot image component - OK
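Extracting the firmware updater from the system (no action needed)
Below is the command for extracting this firmware updater. I have already extracted it, so you don't need to repeat this.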
scp ubuntu@192.168.100.2:/opt/mellanox/mlnx-fw-updater/firmware/mlxfwmanager_sriov_dis_aarch64_41686 /
Unpacking the extracted updater with mft-scripts shows that it does contain this PSID:
81. MT_0000000809 MBF2M345A-VENOT_ES_Ax NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disa
I then unpacked mlnx-fw-updater_23.10-3.2.2.0_arm64.deb and found the newest firmware, 24.39.3560, whose listing also includes this PSID:
57. MT_0000000809 MBF2M345A-VENOT_ES_Ax NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disa
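The upgrade order used in this write-up (DOCA 1.5.1 and firmware 24.35.2000 first, then DOCA 2.7 and firmware 24.39.3560) can be expressed as a version check. This is a rough sketch only: the version thresholds are the ones seen on these two cards, the helper names are hypothetical, and it relies on `sort -V` from GNU coreutils.

```shell
# Return 0 if version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Suggest the next step for a card given its current NIC firmware version.
next_step() {
    current=$1
    if version_lt "$current" "24.35.2000"; then
        echo "flash DOCA 1.5.1 LTS, then update NIC firmware to 24.35.2000"
    elif version_lt "$current" "24.39.3560"; then
        echo "flash DOCA 2.7, then update NIC firmware to 24.39.3560"
    else
        echo "NIC firmware is at 24.39.3560 or newer"
    fi
}
```

For example, `next_step 24.31.0356` points at the DOCA 1.5.1 path, which matches the state both cards arrived in.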
11. Cold-boot the machine and check the NIC firmware version
The NIC firmware now reports version 24.35.2000.
root@recopec-MS-7D25:/home/recopec# mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
PSID: MT_0000000809
PCI Device Name: 0000:06:00.0
Base GUID: b8cef60300f8d88a
Base MAC: b8cef6f8d88a
Versions: Current Available
FW 24.35.2000 N/A
PXE 3.6.0805 N/A
UEFI 14.28.0016 N/A
UEFI Virtio blk 22.4.0010 N/A
UEFI Virtio net 21.4.0010 N/A
Status: No matching image found
With this version the UEFI BIOS still has no NIC mode option, so I kept upgrading.
12. Flash DOCA 2.7 onto the DPU
bfb-install --rshim rshim0 --bfb bf-bundle-2.7.0-33_24.04_ubuntu-22.04_prod.bfb
During the flash, an error about updating the NIC FW is reported; it can be ignored.
13. After a successful boot, change the default account password
Mellanox BlueField-2 A1 BL1 V1.1
NOTICE: No CDI passed to Riot core!
NOTICE: BL2R: v2.2(release):4.7.0-25-g5569834
NOTICE: BL2R: Built : 22:05:22, Apr 26 2024
NOTICE: BL2R built for hw (ver 1)
NOTICE: BL2R: Booting BL2
NOTICE: BL2: v2.2(release):4.7.0-25-g5569834
NOTICE: BL2: Built : 22:05:22, Apr 26 2024
NOTICE: BL2 built for hw (ver 1)
NOTICE: Running as MBF2M345A-VENOT_ system
NOTICE: No SPD detected on MSS0 DIMM0
NOTICE: No SPD detected on MSS0 DIMM1
NOTICE: Finished initializing DDR
NOTICE: DDR POST passed.
NOTICE: BL31: v2.2(release):4.7.0-25-g5569834
NOTICE: BL31: Built : 22:05:22, Apr 26 2024
NOTICE: BL31 built for hw (ver 1), lifecycle GA Non-Secured
UEFI firmware (version BlueField:4.7.0-42-g13081ae-BId13127 built at 22:23:12 o)
Default login (user / password): ubuntu / ubuntu
Login after the change (user / password): ubuntu / Bf1122334455
13. Update the NIC firmware
sudo mst start
sudo mst status
sudo mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-pt
PSID: MT_0000000809
PCI Device Name: /dev/mst/mt41686_pciconf0
Base GUID: b8cef60300f8d88a
Base MAC: b8cef6f8d88a
Versions: Current Available
FW 24.35.2000 N/A
PXE 3.6.0805 N/A
UEFI 14.28.0016 N/A
UEFI Virtio blk 22.4.0010 N/A
UEFI Virtio net 21.4.0010 N/A
Status: No matching image found
Copy the NIC firmware updater to the DPU
scp mlxfwmanager_sriov_dis_aarch64_41686 ubuntu@192.168.100.2:/home/ubuntu/
sudo chmod +x mlxfwmanager_sriov_dis_aarch64_41686
sudo ./mlxfwmanager_sriov_dis_aarch64_41686
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M345A-VENOT_ES_Ax
Description: NVIDIA BlueField-2 E-Series Eng. sample DPU; 200GbE single-port QSFP56; PCIe Gen4 x16; Secure Boot Disabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management
PSID: MT_0000000809
PCI Device Name: /dev/mst/mt41686_pciconf0
Base GUID: b8cef60300f8d88a
Base MAC: b8cef6f8d88a
Versions: Current Available
FW 24.35.2000 24.39.3560
NVMe N/A 20.4.0001
PXE 3.6.0805 3.7.0300
UEFI 14.28.0016 14.32.0017
UEFI Virtio blk 22.4.0010 22.4.0012
UEFI Virtio net 21.4.0010 21.4.0013
Status: Update required
After a cold boot, the update shows as complete.
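For scripting this check, the Status line of the tool's output is enough to tell whether an update is still pending. This is a minimal sketch, assuming the output format shown in this post; fw_status and update_required are hypothetical helper names.

```shell
# Read mlxfwmanager output on stdin and print the text after "Status:".
fw_status() {
    sed -n '/^[[:space:]]*Status:/{s/^[[:space:]]*Status:[[:space:]]*//;s/[[:space:]]*$//;p;}' | head -n 1
}

# Return 0 when the output on stdin reports that an update is pending.
update_required() {
    [ "$(fw_status)" = "Update required" ]
}

# Usage inside the DPU (not executed here):
#   sudo mlxfwmanager | fw_status
```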

14. Switch to NIC mode
This is very simple. NVIDIA documents several ways to do it; the most convenient is to change the setting in the ARM-side UEFI BIOS.
- Select “Device Manager”.
- Select “System Configuration”.
- Select “BlueField Modes”.
- Set the “NIC Mode” field to NicMode to enable NIC mode.



The method above did not seem to take effect, so try the following instead.
# Enable NIC mode
mlxconfig -d mt41686_pciconf0 set INTERNAL_CPU_PAGE_SUPPLIER=1 INTERNAL_CPU_ESWITCH_MANAGER=1 INTERNAL_CPU_IB_VPORT0=1 INTERNAL_CPU_OFFLOAD_ENGINE=1
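To confirm that the settings were stored, the configuration can be read back with mlxconfig's q (query) command. The sketch below reads that query output from stdin and checks that each of the four parameters set above ends in (1). It assumes one "NAME VALUE(n)" pair per line in the query output, and nic_mode_params_set is a hypothetical helper name.

```shell
# Check `mlxconfig -d <device> q` output (on stdin) for the four NIC mode
# parameters. Exit status 0 means all four are shown with the value (1).
nic_mode_params_set() {
    awk '
        BEGIN {
            want["INTERNAL_CPU_PAGE_SUPPLIER"] = 1
            want["INTERNAL_CPU_ESWITCH_MANAGER"] = 1
            want["INTERNAL_CPU_IB_VPORT0"] = 1
            want["INTERNAL_CPU_OFFLOAD_ENGINE"] = 1
        }
        # Remember every wanted parameter whose value column ends in "(1)".
        ($1 in want) && $2 ~ /\(1\)$/ { seen[$1] = 1 }
        END {
            bad = 0
            for (p in want) {
                if (!(p in seen)) { print "not set to 1: " p; bad = 1 }
            }
            exit bad
        }
    '
}

# Usage on a real host (not executed here):
#   sudo mlxconfig -d mt41686_pciconf0 q | nic_mode_params_set
```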

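After a reboot, the NIC shows up with a "network cable unplugged" status, so it is probably working normally? I have no way to test it further, so I left it at that and shipped the cards back to the owner.
All the files are collected here. If the Baidu Netdisk link has expired, pull them (slowly) from my NAS instead. The official sites also host every download, so a quick search will turn them up.
Link: https://pan.baidu.com/s/1UV7XDu6N3P9oROhStSS8hw?pwd=2333
Extraction code: 2333
Username: bf
Password: bf12345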