Manage the PCIe module temperature

Coral products that integrate the Edge TPU over PCIe must be operated using the Coral PCIe driver. This driver handles all device communications, but it also allows you to respond to the Edge TPU temperature and configure dynamic frequency scaling (DFS) thresholds. This page describes how you can use these features to maintain an optimal operating temperature with a PCIe-based Edge TPU.

This document applies to only the following products:

Note: To install the Coral PCIe driver, see the "get started" guide for your product (follow the above links).

PCIe parameters overview

The PCIe products listed above do not include a thermal solution to dissipate heat from the system. So in order to sustain maximum performance from the Edge TPU and avoid permanent damage, you must design your system so the Edge TPU always operates below the maximum operating temperature specified in the product datasheet.

To that end, the Coral PCIe driver provides programmable parameters that let you manage the Edge TPU temperature in the following ways:

  • Read the Edge TPU temperature and then, if necessary, activate a cooling solution (such as a fan) or load-balance your work across other Edge TPUs in the system.
  • Use dynamic frequency scaling (DFS)—also known as throttling—to incrementally reduce the Edge TPU operating frequency as it heats up.
  • Shut down the Edge TPU when it reaches a critical temperature (highly recommended).

To employ any combination of these strategies, you need to read or write the Coral PCIe driver parameters defined in the following tables.

Exactly how you can read and write these parameters depends on your operating system, and is explained in the following sections (see the instructions for Linux and for Windows).

Read the Edge TPU temperature

You can periodically read the Edge TPU temperature using the temp parameter, and then respond with your own strategies to cool the system or load-balance your work.

Table 1. Read-only temperature parameter

Parameter: temp
Description: The current Edge TPU junction temperature. On Linux, this is available via device-specific sysfs nodes only (not from the kernel module). On Windows, this is available via performance counters only (not from the Windows Registry).
Units: Millidegree Celsius
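As a minimal sketch of reading this parameter on Linux, the following function reads a temp node and converts the millidegree value to degrees Celsius. The default path assumes the first Edge TPU appears as apex_0 under /sys/class/apex (the sysfs location described in the Linux section of this page); the function name is illustrative and not part of the driver.

```python
from pathlib import Path

# Assumed sysfs node for the first Edge TPU. The number differs for each
# module (apex_0, apex_1, and so on).
DEFAULT_TEMP_NODE = Path("/sys/class/apex/apex_0/temp")


def read_temp_celsius(node: Path = DEFAULT_TEMP_NODE) -> float:
    """Return the Edge TPU junction temperature in degrees Celsius."""
    # The driver reports the temperature in millidegrees Celsius.
    millidegrees = int(node.read_text().strip())
    return millidegrees / 1000.0
```

Calling read_temp_celsius() with no arguments reads the first Edge TPU; pass a different node path to read another module.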

Use dynamic frequency scaling

By default, the Coral PCIe driver runs the Edge TPU at the maximum frequency of 500 MHz. Under some circumstances, extended operation at this frequency can cause overheating, so the PCIe driver includes a power throttling mechanism (known as dynamic frequency scaling, or DFS) that's enabled by default. This system periodically checks the Edge TPU temperature, and each time the temperature reaches one of the "trip points" specified by the parameters in table 2, it halves the Edge TPU operating frequency (from 500 MHz to 250 MHz, then 125 MHz, then 62.5 MHz).

Reducing the operating frequency slows the Edge TPU's inferencing speed, but it also lowers power consumption and helps keep the Edge TPU from reaching higher temperatures at which it may shut down or become permanently damaged.

As long as the chip does not shut down and the Edge TPU returns to lower temperatures, the DFS system restores the operating frequency in the reverse manner—ultimately returning to the maximum operating frequency.
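The throttling behavior described above can be modeled as a pure function of the temperature. This is only an illustrative sketch, not the driver's implementation: the function name is hypothetical, and the default trip points are the default values listed in table 2.

```python
MAX_FREQ_MHZ = 500.0


def dfs_frequency_mhz(temp_mc: int,
                      trip_points_mc=(85000, 90000, 95000)) -> float:
    """Model the DFS frequency for a temperature in millidegrees Celsius."""
    # Each trip point that is reached or exceeded halves the frequency.
    steps = sum(1 for trip in trip_points_mc if temp_mc >= trip)
    return MAX_FREQ_MHZ / (2 ** steps)
```

Because the result depends only on the current temperature, the same function also describes how the frequency is restored as the Edge TPU cools.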

Table 2. Parameters to configure dynamic frequency scaling

Parameter: trip_point0_temp
Description: If the Edge TPU temperature reaches or exceeds this value, the system sets the operating frequency to "reduced" (250 MHz).
Default value: 85000
Units: Millidegree Celsius

Parameter: trip_point1_temp
Description: If the Edge TPU temperature reaches or exceeds this value, the system sets the operating frequency to "low" (125 MHz).
Default value: 90000
Units: Millidegree Celsius

Parameter: trip_point2_temp
Description: If the Edge TPU temperature reaches or exceeds this value, the system sets the operating frequency to "lowest" (62.5 MHz).
Default value: 95000
Units: Millidegree Celsius

Parameter: temp_poll_interval
Description: The interval at which to read the temperature. Setting this to 0 disables DFS completely. This should be several seconds because the temperature reading doesn't change instantly. It also doesn't need to be much larger than the default: the overhead of switching the operating frequency is negligible, so it isn't necessary to implement hysteresis around the trip points.
Default value: 5000
Units: Milliseconds

Whatever values you set for the trip_point* parameters, they must evaluate as follows:

trip_point0_temp <= trip_point1_temp <= trip_point2_temp

If you set values that don't match this logic, the driver silently reverts to the default values in table 2.
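Because the driver reverts silently, it can help to check the ordering yourself before applying new values. The helper below is a hypothetical sketch (it is not part of the driver) that enforces the rule above and raises an error instead of failing quietly.

```python
def checked_trip_points(trip0_mc: int, trip1_mc: int, trip2_mc: int) -> tuple:
    """Return the trip points unchanged if they are correctly ordered."""
    # The driver requires trip_point0_temp <= trip_point1_temp <= trip_point2_temp.
    if not (trip0_mc <= trip1_mc <= trip2_mc):
        raise ValueError(
            f"trip points out of order: {trip0_mc}, {trip1_mc}, {trip2_mc}"
        )
    return (trip0_mc, trip1_mc, trip2_mc)
```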

Note: You cannot manually specify the Edge TPU operating frequency. The Coral PCIe driver always runs the Edge TPU at the maximum frequency (500 MHz), except when it's reduced by DFS, as described above.

Configure the shutdown/warning temperatures

The parameters in table 3 have different behaviors depending on whether you're using the Accelerator Module (the solderable module) or one of the PCIe card modules (such as the Mini PCIe Accelerator or an M.2 Accelerator):

  • Accelerator Module: You can specify temperatures at which certain pins assert to warn you that the Edge TPU has reached that temperature. You can respond in whatever way suits your system, such as enabling a fan or shutting down the module.
  • PCIe card modules: You can specify the temperature at which the Edge TPU will shut down. You will not receive any warnings. If you want to manually respond to temperature changes, you can instead poll the temp parameter in table 1.

Table 3. Parameters to shut down the Edge TPU

Parameter: hw_temp_warn1
Description (PCIe card modules): Not available.
Description (Accelerator Module): If the Edge TPU reaches or exceeds this temperature, the Edge TPU asserts the INTR line.
Default value: 100000
Units: Millidegree Celsius

Parameter: hw_temp_warn1_en
Description (PCIe card modules): Not available.
Description (Accelerator Module): Enables/disables hw_temp_warn1.
Default value: 1
Units: Boolean (1 = enabled, 0 = disabled)

Parameter: hw_temp_warn2
Description (PCIe card modules): If the Edge TPU reaches or exceeds this temperature, the Edge TPU shuts down. [1] When the Edge TPU shuts down, it enters an idle state. Generally, you must then restart your system to resume work with the Edge TPU.
Description (Accelerator Module): If the Edge TPU reaches or exceeds this temperature, the Edge TPU asserts the SD_ALARM line. It's your responsibility to shut down the Accelerator Module.
Default value: 100000
Units: Millidegree Celsius

Parameter: hw_temp_warn2_en
Description (PCIe card modules and Accelerator Module): Enables/disables hw_temp_warn2.
Default value: 1
Units: Boolean (1 = enabled, 0 = disabled)

[1] This parameter is saved to a register in the Edge TPU (as are all parameters) and the shutdown mechanism is fully contained inside the PCIe card module. So even if the host system fails, the Edge TPU will safely shut down if it reaches this temperature.

Notice: The default values for the temperature warnings are conservative. You should change them based on your hardware's thermal properties. Just be sure the Edge TPU junction temperature never exceeds the maximum rating indicated in the product datasheet.
Warning: We strongly recommend that you use hw_temp_warn2 to shut down the Edge TPU before it exceeds the maximum operating temperature specified in the product datasheet. Failure to do so can result in permanent damage to the Edge TPU and surrounding components, and can possibly cause fire and other serious damage, injury, or death.
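On the PCIe card modules, which give no warning before shutdown, polling the temp parameter is the way to react early. The loop below is a rough sketch of that idea: read_temp_mc and set_cooling are hypothetical callbacks you would supply (for example, a sysfs reader and a fan controller), and the threshold is whatever suits your hardware. It complements the hw_temp_warn2 shutdown and does not replace it.

```python
import time
from typing import Callable, Optional


def poll_and_cool(read_temp_mc: Callable[[], int],
                  cooling_on_mc: int,
                  set_cooling: Callable[[bool], None],
                  interval_s: float = 5.0,
                  max_polls: Optional[int] = None) -> None:
    """Poll the temperature and switch cooling on at or above a threshold."""
    polls = 0
    while max_polls is None or polls < max_polls:
        # Enable cooling when the reading (millidegrees Celsius) reaches
        # the threshold, and disable it again once the reading drops below.
        set_cooling(read_temp_mc() >= cooling_on_mc)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

With max_polls left as None the loop runs until the process is stopped; the parameter exists mainly so the loop can be exercised in tests.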

Using the parameters on Linux

On Linux, you can access the Coral PCIe driver parameters with files that are accessible as either kernel module parameters or sysfs nodes:

  • The kernel module parameters are located in this path:
    /sys/module/apex/parameters/

    These parameters are persistent and applied at boot time. This is useful if you have multiple modules for which you want to apply the same settings. For details about how to edit these, see how to specify kernel module parameters.

  • The sysfs nodes for each module are located at paths such as this:
    /sys/class/apex/apex_0

    These sysfs nodes are created by the PCIe driver at boot time and allow you to set different settings for different PCIe modules. The file name includes a unique number for each Edge TPU connected via PCIe (such as apex_0, apex_1, apex_2, and so on).

Note: All kernel module parameters apply at system boot time and apply the same setting to all Edge TPUs, whereas the individual sysfs nodes take immediate effect and apply to separate Edge TPUs.

Whether you decide to use the kernel module parameters or the individual sysfs node parameters, the files that specify each PCIe parameter are named the same as shown in tables 1, 2, and 3 (although the parameter to read the temperature is available only as a sysfs node).
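As an illustration of the per-device sysfs route, the sketch below lists the Edge TPU nodes and reads or writes a single parameter file. The helper names are hypothetical, the code assumes the nodes live under /sys/class/apex as shown above, and writing normally requires root privileges.

```python
from pathlib import Path

# Assumed parent directory of the per-device sysfs nodes.
APEX_CLASS_DIR = Path("/sys/class/apex")


def list_edge_tpus(class_dir: Path = APEX_CLASS_DIR) -> list:
    """Return the node names (apex_0, apex_1, ...) found in class_dir."""
    return sorted(p.name for p in class_dir.glob("apex_*"))


def read_param(device: str, name: str, class_dir: Path = APEX_CLASS_DIR) -> int:
    """Read an integer parameter such as trip_point0_temp from one device."""
    return int((class_dir / device / name).read_text().strip())


def write_param(device: str, name: str, value: int,
                class_dir: Path = APEX_CLASS_DIR) -> None:
    """Write an integer parameter to one device."""
    (class_dir / device / name).write_text(f"{value}\n")
```

For example, write_param("apex_1", "trip_point0_temp", 80000) would lower the first trip point for the second Edge TPU only.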

Using the parameters on Windows

On Windows 10, you can access the Coral PCIe driver parameters using the Windows Registry as follows:

  1. Launch Registry Editor (type "regedit" in the Run window; you must be an administrator).

  2. Open the following path:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\coral\Parameters

    You should see the PCIe parameters as registry values, as shown in figure 1.

    Figure 1. Coral PCIe parameters in Registry Editor
  3. Double-click to edit any of the parameters.

  4. Reboot your system to apply any changes.

Note: Each PCIe parameter available in the Windows Registry applies the same setting to all Edge TPUs; you cannot set different parameters for separate Edge TPUs on Windows.

However, notice that the temp parameter is not available in the Windows Registry, because this parameter changes over time and is read-only. Instead, you can see the current temperature with the Windows Performance Monitor as follows:

  1. Launch Performance Monitor (type "perfmon" in the Run window).

  2. Select Performance Monitor in the left pane, and click Add in the toolbar.

  3. In the Add Counters dialog, select the Coral PCIe Accelerator counter, select which instances you want to view, and then click OK.

    Figure 2. The Add Counters dialog

    The activity chart then shows the Edge TPU temperature over time in degrees Celsius. But notice that the actual value below the chart is in millidegree Celsius (as indicated in table 1).

    Figure 3. The temperature for multiple Edge TPUs in Performance Monitor

You can also get the Edge TPU temperature with the following PowerShell command:

Get-Counter -Counter '\Coral PCIE Accelerator(*)\Temperature'

Or, you can write your own tool to consume performance counter data.
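One possible shape for such a tool is sketched below. It wraps the PowerShell command above, asks PowerShell to emit the counter samples as JSON, and converts each value to degrees Celsius. The function names are hypothetical, and the sketch assumes the counter value is reported in millidegree Celsius (as table 1 indicates) and that the powershell executable is on the PATH.

```python
import json
import subprocess

# Same counter as the PowerShell command above, emitted as JSON.
PS_COMMAND = (
    "(Get-Counter -Counter '\\Coral PCIE Accelerator(*)\\Temperature')"
    ".CounterSamples | Select-Object InstanceName, CookedValue | ConvertTo-Json"
)


def parse_samples(json_text: str) -> dict:
    """Map each counter instance name to its temperature in degrees Celsius."""
    data = json.loads(json_text)
    if isinstance(data, dict):
        # ConvertTo-Json emits a bare object when there is only one sample.
        data = [data]
    return {s["InstanceName"]: s["CookedValue"] / 1000.0 for s in data}


def read_temps_celsius() -> dict:
    """Run the PowerShell command and parse its output (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_samples(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to test the conversion without a Windows machine or an Edge TPU attached.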