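Enabling ASPM for PCIe on Linux Made Easy

1. ASPM Overview

PCIe power management has two parts: ASPM (Active State Power Management) and software power management. ASPM is power management that the PCIe link performs on its own, without involvement of system software. In the PCIe ASPM state machine, L1 is mandatory while L0s is optional.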

2. Debugging

2.1 How to check the ASPM state

On Linux, the "lspci -vvv" command shows the ASPM state. Look at the LnkCtl line of the device's Express capability.
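As a minimal sketch of reading that state programmatically, the Python snippet below extracts the ASPM field from the LnkCtl line of "lspci -vvv" text. The function name aspm_from_lspci is an assumption made for illustration, and the two sample strings are copied from the lspci output shown in this article. The exact LnkCtl layout can differ between pciutils versions, so treat the pattern as a starting point.

```python
import re


def aspm_from_lspci(text):
    """Return the ASPM field of the first LnkCtl line in lspci -vvv text, or None."""
    match = re.search(r"LnkCtl:\s*ASPM ([^;]+);", text)
    return match.group(1).strip() if match else None


# Sample LnkCtl lines taken from the two lspci dumps in this article.
enabled = "LnkCtl: ASPM L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+"
disabled = "LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+"

print(aspm_from_lspci(enabled))   # L1 Enabled
print(aspm_from_lspci(disabled))  # Disabled
```

On a real system the text would come from running "lspci -vvv -s <address>" and passing its output to the function.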

2.1.1 When ASPM is enabled

The following is an example of a device with PCIe ASPM enabled:

05:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

Subsystem: Atheros Communications Inc. Device 3099

Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

Latency: 0, Cache Line Size: 64 bytes

Interrupt: pin A routed to IRQ 19

Region 0: Memory at dbdf0000 (64-bit, non-prefetchable) [size=64K]

Capabilities: [40] Power Management version 2

Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)

Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-

Address: 00000000 Data: 0000

Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00

DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-

DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-

MaxPayload 128 bytes, MaxReadReq 512 bytes

DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-

LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <512ns, L1 <64us

ClockPM- Surprise- LLActRep- BwNot-

LnkCtl: ASPM L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+

ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Capabilities: [90] MSI-X: Enable- Count=1 Masked-

Vector table: BAR=0 offset=00000000

PBA: BAR=0 offset=00000000

Capabilities: [100] Advanced Error Reporting

UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-

UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-

Capabilities: [140] Virtual Channel

Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00

Kernel driver in use: ath9k

Kernel modules: ath9k

2.1.2 When ASPM is disabled

The following is an example of a device with PCIe ASPM not enabled:

localhost ~ # lspci -vvvv -s 03:00

03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

Subsystem: Atheros Communications Inc. Device 309a

Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

Latency: 0, Cache Line Size: 64 bytes

Interrupt: pin A routed to IRQ 17

Region 0: Memory at f0100000 (64-bit, non-prefetchable) [size=64K]

Capabilities: [40] Power Management version 2

Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)

Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-

Address: 00000000 Data: 0000

Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00

DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-

DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-

MaxPayload 128 bytes, MaxReadReq 512 bytes

DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-

LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <512ns, L1 <64us

ClockPM- Surprise- LLActRep- BwNot-

LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+

ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Capabilities: [90] MSI-X: Enable- Count=1 Masked-

Vector table: BAR=0 offset=00000000

PBA: BAR=0 offset=00000000

Capabilities: [100] Advanced Error Reporting

UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-

UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-

Capabilities: [140] Virtual Channel

Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00

Kernel driver in use: ath9k

Kernel modules: ath9k

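2.1.3 Why is ASPM not enabled on my device?

ASPM should be negotiated automatically between the RC and all EPs. If the query shows that ASPM is disabled on your device, the possible reasons include:

The BIOS has not enabled the ASPM-related settings.

L0s is optional, so your device may support only L1.

The BIOS may have hit some unknown ASPM-related problem.

ASPM needs support from the RC as well as from the EP.

2.2 How to enable ASPM

Most Wi-Fi chips today use a PCIe interface, and most of them end up in handheld devices, where power consumption requirements are strict. Such devices therefore need to support the ASPM L0s state to reach the best power consumption.

2.2.1 How to enable ASPM in the kernel

The operating system generally does not interfere with ASPM, but the kernel can be used to debug the ASPM settings of the PCIe RC/EP. For that purpose the kernel generally needs the CONFIG_PCIEASPM option enabled: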

config PCIEASPM

bool "PCI Express ASPM support(Experimental)"

depends on PCI && EXPERIMENTAL && PCIEPORTBUS

default n

help

This enables PCI Express ASPM (Active State Power Management) and

Clock Power Management. ASPM supports state L0/L0s/L1.

When in doubt, say N.

2.2.1.1 Forcing the ASPM state

ASPM can also be forcibly enabled or disabled with a boot parameter:

pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power

Management.

off Disable ASPM.

force Enable ASPM even on devices that claim not to support it.

WARNING: Forcing ASPM on may cause system lockups.
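To confirm which value the running kernel was booted with, one option is to look for pcie_aspm= in the kernel command line. The sketch below is only an illustration: the function name pcie_aspm_option and the sample command line are assumptions, and on a real system the string would be read from /proc/cmdline.

```python
def pcie_aspm_option(cmdline):
    """Return the value given to pcie_aspm= in a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            return token.split("=", 1)[1]
    return None


# Hypothetical command line; on a live system read it from /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pcie_aspm=force"

print(pcie_aspm_option(sample))  # force
```

A result of None means the parameter was not passed, so the kernel uses its default behavior.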

2.2.2 Enabling ASPM with enable_aspm

enable_aspm is a script that can be used to enable ASPM. For details, see the following link:

http://drvbp1.linux-foundation.org/~mcgrof/scripts/enable-aspm

Only the following three parameters need to be changed:

ROOT_COMPLEX="00:1c.1"

ENDPOINT="03:00.0"

# We'll only enable the last 2 bits by using a mask

# of :3 to setpci, this will ensure we keep the existing

# values on the byte.

#

# Hex Binary Meaning

# -------------------------

# 0 0b00 L0 only

# 1 0b01 L0s only

# 2 0b10 L1 only

# 3 0b11 L1 and L0s

ASPM_SETTING=3

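2.2.3 Enabling ASPM with setpci

The PCIe "Link Control Register" can be read with "lspci -vvv", and the "setpci" tool can modify that register to enable PCIe ASPM.

2.2.3.1 How to read the "Link Control Register"?

The two lowest bits of the "Link Control Register" encode the ASPM state as follows: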

0b00 = L0 only

0b01 = L0s only

0b10 = L1 only

0b11 = L1 and L0s
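The table above can be sketched as a small lookup in Python. The names ASPM_STATES and decode_aspm are assumptions for illustration; the function masks off everything except the two lowest bits, so it can be given the whole register byte.

```python
# Mapping of the two lowest Link Control bits to the ASPM state, as listed above.
ASPM_STATES = {
    0b00: "L0 only",
    0b01: "L0s only",
    0b10: "L1 only",
    0b11: "L1 and L0s",
}


def decode_aspm(link_control):
    """Decode the ASPM state from a Link Control Register value."""
    return ASPM_STATES[link_control & 0b11]


print(decode_aspm(0x40))  # L0 only
print(decode_aspm(0x41))  # L0s only
```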

2.2.3.2 How to find the "Link Control Register"?

First locate the device you are interested in, as shown below:

user@tux ~ $ lspci | grep -i atheros

03:00.0 Network controller: Atheros Communications Inc. Device 0030 (rev 01)

03:00.0 is the bus address. Next, use "lspci -t" to check which RC port the device sits behind.

-[0000:00]-+-00.0
           +-02.0
           +-02.1
           +-03.0
           +-03.2
           +-03.3
           +-19.0
           +-1a.0
           +-1a.1
           +-1a.7
           +-1b.0
           +-1c.0-[0000:02]--
           +-1c.1-[0000:03]----00.0
           +-1c.2-[0000:04]--
           +-1c.3-[0000:05-0c]--
           +-1c.4-[0000:0d-14]--
           +-1d.0
           +-1d.1
           +-1d.2
           +-1d.7
           +-1e.0-[0000:15-18]--+-00.0
           |                    \-00.1
           +-1f.0
           +-1f.1
           +-1f.2
           \-1f.3
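The same parent relationship is also visible in sysfs, where each PCI device directory is nested under its upstream port. The sketch below only manipulates a path string: the function name upstream_port is an assumption, and the sample path is a hypothetical resolved path for the device in this article. On a real system the path would come from os.path.realpath("/sys/bus/pci/devices/0000:03:00.0").

```python
import posixpath


def upstream_port(sysfs_path):
    """Return the name of the parent directory of a resolved sysfs device path."""
    return posixpath.basename(posixpath.dirname(sysfs_path.rstrip("/")))


# Hypothetical resolved sysfs path for endpoint 03:00.0 behind port 00:1c.1.
path = "/sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0"

print(upstream_port(path))  # 0000:00:1c.1
```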

In this case 03:00.0 sits behind 00:1c.1. Run "lspci -s 00:1c.1 -xxx" to dump that device's PCI configuration space. The PCIe specification has an interesting little algorithm for finding the Link Control Register in PCI configuration space. The logic is as follows:

Read the byte at 0x34 (the Capabilities Pointer) and go to the register it points to.

If the byte at that register (the capability ID) is not 0x10, read the byte right after it (the pointer to the next capability) and go to the register that one points to.

Repeat this until you find a register whose first byte is 0x10, the PCI Express capability ID.

The Link Control Register is at the address of that register + 0x10.

Let's analyze a real-world example, specifically the root complex port found above.

user@tux ~ $ sudo lspci -s 00:1c.1 -xxx

00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)

00: 86 80 41 28 07 05 10 00 03 00 04 06 10 00 81 00

10: 00 00 00 00 00 00 00 00 00 03 03 00 30 30 00 00

20: 00 dc 30 df e1 df e1 df 00 00 00 00 00 00 00 00

30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 02 04 00

40: 10 80 41 01 c0 8f 00 00 00 00 10 00 11 2c 11 02

50: 40 00 11 30 e0 a0 18 00 00 00 48 01 00 00 00 00

60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

80: 05 90 01 00 0c 30 e0 fe 69 41 00 00 00 00 00 00

90: 0d a0 00 00 aa 17 ad 20 00 00 00 00 00 00 00 00

a0: 01 00 02 c8 00 00 00 00 00 00 00 00 00 00 00 00

b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

d0: 00 00 00 00 00 00 00 00 80 00 11 08 00 00 00 00

e0: 00 0f c7 00 06 07 08 00 33 00 00 00 00 00 00 00

f0: 00 00 00 00 00 00 00 00 86 0f 05 00 00 00 00 00

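First read address 0x34: it holds 0x40 (this byte is already the pointer, so do not skip to the next byte here). Reading 0x40 gives 0x10. Adding 0x40 + 0x10 = 0x50, so 0x50 is the address of the Link Control Register. The value at 0x50 is 0x40. Its two lowest bits are 00, which means only L0 is enabled and ASPM is completely disabled. To adjust ASPM on this RC, first keep the original value and then OR it with the new ASPM setting.

Note: as it turns out, 0x50 is also the Link Control Register address on ICH6, ICH7, ICH8 and ICH9.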

# Disables ASPM, enables only L0 (this was the existing setting)

sudo setpci -s 00:1c.1 0x50.B=0x40

# Enable L0s only

sudo setpci -s 00:1c.1 0x50.B=0x41

# Enable L1 only

sudo setpci -s 00:1c.1 0x50.B=0x42

# Enable L1 and L0s

sudo setpci -s 00:1c.1 0x50.B=0x43
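The four values above all follow from one rule: keep the upper six bits of the byte read from the register and replace only the two ASPM bits. The sketch below shows that arithmetic and builds the matching setpci command string. The function names with_aspm and setpci_command are assumptions for illustration, and the code only formats a string, it does not touch any hardware.

```python
def with_aspm(current, setting):
    """Return the register byte with its two ASPM bits replaced by `setting`."""
    if not 0 <= setting <= 3:
        raise ValueError("ASPM setting must be between 0 and 3")
    # Clear bits 1:0, keep the remaining bits, then OR in the new setting.
    return (current & 0xFC) | setting


def setpci_command(device, offset, current, setting):
    """Format a byte-wide setpci write in the form used in this article."""
    return "setpci -s %s 0x%02x.B=0x%02x" % (device, offset, with_aspm(current, setting))


# Root port from this article: Link Control at 0x50, current value 0x40.
print(setpci_command("00:1c.1", 0x50, 0x40, 3))  # setpci -s 00:1c.1 0x50.B=0x43
```

Clearing the two low bits before OR-ing matters when the new setting has fewer bits set than the old one, for example when going from 0x43 back to 0x41.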

Now let's adjust the device itself. Its PCIe configuration space looks like this:

user@tux ~ $ sudo lspci -s 03:00.0 -xxx

03:00.0 Network controller: Atheros Communications Inc. Device 0030 (rev 01)

00: 8c 16 30 00 03 01 10 40 01 00 80 02 10 00 00 00

10: 04 00 3e df 00 00 00 00 00 00 00 00 00 00 00 00

20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 16 31

30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00

40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00

50: 05 70 84 01 00 00 00 00 00 00 00 00 00 00 00 00

60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

70: 10 00 02 00 00 87 04 05 10 20 0b 00 11 5c 03 00

80: 41 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00

90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00

a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This example is a little more complex, so we will walk through it line by line:

00: 8c 16 30 00 03 01 10 40 01 00 80 02 10 00 00 00
10: 04 00 3e df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 16 31
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
    ^           ^
    |           |
    0x30        0x34

The byte at 0x34 is 0x40. That is the pointer to the first capability, so we go read 0x40 now.

40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
    ^
    |
    0x40 = 0x01, this is not 0x10 so read the next byte

40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
       ^
       |
       0x41 = 0x50, so go read that register next

50: 05 70 84 01 00 00 00 00 00 00 00 00 00 00 00 00
    ^
    |
    0x50 = 0x05, this is not 0x10, so go read the next byte.
    The next byte 0x51 = 0x70 so we go read that register next.

70: 10 00 02 00 00 87 04 05 10 20 0b 00 11 5c 03 00
    ^
    |
    At last, 0x70 = 0x10. So now we do 0x70 + 0x10 = 0x80 and go read 0x80.

80: 41 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
    ^
    |
    0x80 = 0x41

0x41 = 0b01000001. The two lowest bits are 01, so this device has only ASPM L0s enabled.
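The walk above can be sketched in Python as a short loop over the capability list. This is a minimal illustration, not a replacement for lspci or setpci: the names parse_hexdump and find_link_control are assumptions, and only the dump lines that the walk actually touches are copied from the endpoint dump in this article (all other bytes are left as zero).

```python
PCI_CAP_PTR = 0x34     # Capabilities Pointer in the configuration header
PCI_CAP_ID_EXP = 0x10  # capability ID of the PCI Express capability
LNKCTL_OFFSET = 0x10   # Link Control offset inside that capability


def parse_hexdump(dump):
    """Turn `lspci -xxx` lines such as "40: 01 50 ..." into 256 config bytes."""
    config = bytearray(256)
    for line in dump.strip().splitlines():
        offset, _, data = line.partition(":")
        values = bytes.fromhex(data.replace(" ", ""))
        base = int(offset, 16)
        config[base:base + len(values)] = values
    return bytes(config)


def find_link_control(config):
    """Walk the capability list and return the Link Control address, or None."""
    pos = config[PCI_CAP_PTR] & 0xFC
    seen = set()  # guards against a malformed, looping list
    while pos and pos not in seen:
        seen.add(pos)
        if config[pos] == PCI_CAP_ID_EXP:
            return pos + LNKCTL_OFFSET
        pos = config[pos + 1] & 0xFC  # byte after the ID points to the next capability
    return None


# Relevant lines of the 03:00.0 dump shown in this article.
config = parse_hexdump("""
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 84 01 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 02 00 00 87 04 05 10 20 0b 00 11 5c 03 00
80: 41 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
""")

offset = find_link_control(config)
print(hex(offset), hex(config[offset]))  # 0x80 0x41
```

The loop visits 0x40, 0x50 and 0x70 in turn, the same path as the manual walk, and stops at 0x70 because its first byte is 0x10.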

So use the following commands to change the device's PCIe ASPM state:

# Disables ASPM, enables only L0

sudo setpci -s 03:00.0 0x80.B=0x40

# Enable L0s only (this was the existing setting)

sudo setpci -s 03:00.0 0x80.B=0x41

# Enable L1 only

sudo setpci -s 03:00.0 0x80.B=0x42

# Enable L1 and L0s

sudo setpci -s 03:00.0 0x80.B=0x43