Explicit Scaling on Multi


2023-04-04 04:12 | Source: compiled from the web | Views: 265

In this section we describe the SYCL explicit scaling API and provide usage examples on Arctic Sound-based platforms for multi-GPU and multi-stack execution.

Device discovery

Before running an application, run the sycl-ls command to find out which devices are available on the platform, especially when the run is a performance measurement. This ensures that the run does not take a fallback path.
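The check described above can be automated. The following is a minimal sketch, assuming the sycl-ls output has already been captured into a string and that each entry has the form "[backend:index] TYPE : name", as in the listings in this section. The helper name has_gpu_for_backend is hypothetical and is not part of sycl-ls or the SYCL runtime.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of any SYCL tool): returns true when the
// captured sycl-ls listing contains at least one GPU entry for the given
// backend tag, for example "level_zero" or "opencl".
bool has_gpu_for_backend(const std::string &listing,
                         const std::string &backend) {
  const std::string prefix = "[" + backend + ":";
  std::istringstream lines(listing);
  std::string line;
  while (std::getline(lines, line)) {
    // Only consider entries that start with "[<backend>:".
    if (line.compare(0, prefix.size(), prefix) != 0)
      continue;
    const std::size_t close = line.find(']');
    if (close == std::string::npos)
      continue;
    // The device type is the first token after the closing bracket.
    const std::size_t type_pos = line.find_first_not_of(' ', close + 1);
    if (type_pos != std::string::npos && line.compare(type_pos, 3, "GPU") == 0)
      return true;
  }
  return false;
}
```

A launcher script or test harness could call such a helper with "level_zero" and refuse to start a performance run when it returns false.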

Root-device

Intel GPUs are represented as SYCL GPU devices, and as root-devices. Root-devices are best discovered with the sycl-ls tool, for example:

$ sycl-ls
[opencl:0] GPU : Intel(R) OpenCL HD Graphics 3.0 [21.35.020776]
[level_zero:0] GPU : Intel(R) Level-Zero 1.1 [1.1.20776]
[host:0] HOST: SYCL host platform 1.2 [1.2]

Note that sycl-ls shows devices from all platforms of all SYCL backends that are seen by the SYCL runtime. Thus, in the example above, there are two GPU entries corresponding to the single physical GPU (one managed by the OpenCL backend and one by the Level-Zero backend).

To filter the observable root-devices, one can use the environment variable ONEAPI_DEVICE_SELECTOR described in EnvironmentVariables.md:

$ ONEAPI_DEVICE_SELECTOR=level_zero:* sycl-ls
[level_zero:0] GPU : Intel(R) Level-Zero 1.1 [1.1.20776]

If there are multiple GPUs in a system then they will be seen as multiple different root-devices. On Linux these would be multiple SYCL root-devices of the same SYCL platform (representing Level-Zero driver). On Windows* these would appear as root-devices of multiple different SYCL platforms (Level-Zero drivers).

The environment variables CreateMultipleRootDevices=N and NEOReadDebugKeys=1 can be used to emulate multiple GPU cards. For example:

$ CreateMultipleRootDevices=2 NEOReadDebugKeys=1 \
SYCL_DEVICE_FILTER=level_zero sycl-ls
[level_zero:0] GPU : Intel(R) Level-Zero 1.1 [1.1.20776]
[level_zero:1] GPU : Intel(R) Level-Zero 1.1 [1.1.20776]

Sub-device

The Intel® Data Center GPU Max 1350 or 1550 has 2 stacks. The root-device, corresponding to the whole GPU, can be partitioned into 2 sub-devices, each corresponding to a physical stack.

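try {
  vector<device> SubDevices = RootDevice.create_sub_devices<
      cl::sycl::info::partition_property::partition_by_affinity_domain>(
      cl::sycl::info::partition_affinity_domain::numa);
} catch (const cl::sycl::exception &e) {
  // create_sub_devices throws if the device cannot be partitioned this way,
  // for example on a single-stack GPU.
}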

Each call to create_sub_devices returns exactly the same sub-devices, in the same order. To control which sub-devices are exposed by the Level-Zero user-mode driver (UMD), one can use the ZE_AFFINITY_MASK environment variable. Note that partition_by_affinity_domain is the only partitioning supported for Intel GPUs.
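To make the content of such a mask concrete, here is a minimal sketch that decodes a mask string. It assumes the syntax documented for Level Zero: a comma-separated list in which each entry is either a root-device index ("1") or a root-device index and a stack index separated by a dot ("0.1"). The names MaskEntry and parse_affinity_mask are hypothetical; the driver performs this parsing itself, and the sketch only illustrates what the variable encodes.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One entry of an affinity mask (hypothetical type for illustration).
struct MaskEntry {
  int root = 0;    // index of the root-device (GPU card)
  int stack = -1;  // index of one stack, or -1 for the whole root-device
};

// Decodes a mask such as "0.1,1" into its entries. Empty items are skipped;
// std::stoi throws std::invalid_argument on non-numeric input.
std::vector<MaskEntry> parse_affinity_mask(const std::string &mask) {
  std::vector<MaskEntry> entries;
  std::istringstream items(mask);
  std::string item;
  while (std::getline(items, item, ',')) {
    if (item.empty())
      continue;
    const std::size_t dot = item.find('.');
    MaskEntry entry;
    // Without a dot, substr(0, npos) yields the whole item.
    entry.root = std::stoi(item.substr(0, dot));
    if (dot != std::string::npos)
      entry.stack = std::stoi(item.substr(dot + 1));
    entries.push_back(entry);
  }
  return entries;
}
```

Under these assumptions, the mask "0.1,1" decodes to stack 1 of root-device 0 plus all of root-device 1.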

Similarly, next_partitionable and numa are the only affinity domains supported (both do the same thing). The environment variables CreateMultipleSubDevices=N and NEOReadDebugKeys=1 can be used to emulate multiple stacks of a GPU.

Sub-sub-device

Each sub-device (stack) can be further decomposed into a set of sub-sub-devices (Compute Slices). One can create a context associated with a sub-sub-device. In this scheme, the execution resources are limited to the sub-sub-device, giving the program fine-grained control at the compute-slice level. The following code finds all sub-devices and sub-sub-devices of a device:

//==============================================================
// Copyright © 2022 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

using namespace sycl;

int main() {
  sycl::device d(sycl::gpu_selector{});
  std::vector<sycl::device> *subdevices = new std::vector<sycl::device>();
  std::vector<sycl::device> *CCS = new std::vector<sycl::device>();
  // Partitioning schemes supported by the root-device.
  auto part_prop = d.get_info<sycl::info::device::partition_properties>();
  size_t num_of_tiles;
  size_t num_of_ccs;
  if (part_prop.empty()) {
    // The device cannot be partitioned: treat it as a single stack.
    num_of_tiles = 1;
  } else {
    for (size_t i = 0; i < part_prop.size(); i++) {
      if (part_prop[i] ==
          sycl::info::partition_property::partition_by_affinity_domain) {
        // Split the root-device into its stacks (sub-devices).
        auto sub_devices = d.create_sub_devices<
            sycl::info::partition_property::partition_by_affinity_domain>(
            sycl::info::partition_affinity_domain::numa);
        num_of_tiles = sub_devices.size();
        for (size_t j = 0; j < num_of_tiles; j++)
          subdevices->push_back(sub_devices[j]);
        break;
      } else {
        num_of_tiles = 1;
      }
    }
  }
  std::cout
