-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Permalink
Choose a base ref
{{ refName }}
default
Choose a head ref
{{ refName }}
default
Comparing changes
Choose two branches to see what’s changed or to start a new pull request.
If you need to, you can also or
learn more about diff comparisons.
Open a pull request
Create a new pull request by comparing changes across two branches. If you need to, you can also .
Learn more about diff comparisons here.
base repository: openzfs/zfs
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: master
Could not load branches
Nothing to show
Loading
Could not load tags
Nothing to show
{{ refName }}
default
Loading
...
head repository: hpc/zfs
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: zia
Could not load branches
Nothing to show
Loading
Could not load tags
Nothing to show
{{ refName }}
default
Loading
Can’t automatically merge.
Don’t worry, you can still create the pull request.
- 1 commit
- 57 files changed
- 1 contributor
Commits on Jan 17, 2025
-
ZFS Interface for Accelerators (Z.I.A.)
The ZIO pipeline has been modified to allow for external, alternative implementations of existing operations to be used. The original ZFS functions remain in the code as fallback in case the external implementation fails. Definitions: Accelerator - an entity (usually hardware) that is intended to accelerate operations Offloader - synonym of accelerator; used interchangeably Data Processing Unit Services Module (DPUSM) - https://github.com/hpc/dpusm - defines a "provider API" for accelerator vendors to set up - defines a "user API" for accelerator consumers to call - maintains list of providers and coordinates interactions between providers and consumers. Provider - a DPUSM wrapper for an accelerator's API Offload - moving data from ZFS/memory to the accelerator Onload - the opposite of offload In order for Z.I.A. to be extensible, it does not directly communicate with a fixed accelerator. Rather, Z.I.A. acquires a handle to a DPUSM, which is then used to acquire handles to providers. Using ZFS with Z.I.A.: 1. Build and start the DPUSM 2. Implement, build, and register a provider with the DPUSM 3. Reconfigure ZFS with '--with-zia=<DPUSM root>' 4. Rebuild and start ZFS 5. Create a zpool 6. Select the provider zpool set zia_provider=<provider name> <zpool> 7. Select operations to offload zpool set zia_<property>=on <zpool> The operations that have been modified are: - compression - non-raw-writes only - decompression - checksum - not handling embedded checksums - checksum compute and checksum error call the same function - raidz - generation - reconstruction - vdev_file - open - write - close - vdev_disk - open - invalidate - write - flush - close Successful operations do not bring data back into memory after they complete, allowing for subsequent offloader operations reuse the data. This results in only one data movement per ZIO at the beginning of a pipeline that is necessary for getting data from ZFS to the accelerator. When errors ocurr and the offloaded data is still accessible, the offloaded data will be onloaded (or dropped if it still matches the in-memory copy) for that ZIO pipeline stage and processed with ZFS. This will cause thrashing if a later operation offloads data. This should not happen often, as constant errors (resulting in data movement) is not expected to be the norm. Unrecoverable errors such as hardware failures will trigger pipeline restarts (if necessary) in order to complete the original ZIO using the software path. The modifications to ZFS can be thought of as two sets of changes: - The ZIO write pipeline - compression, checksum, RAIDZ generation, and write - Each stage starts by offloading data that was not previously offloaded - This allows for ZIOs to be offloaded at any point in the pipeline - Resilver - vdev_raidz_io_done (RAIDZ reconstruction, checksum, and RAIDZ generation), and write - Because the core of resilver is vdev_raidz_io_done, data is only offloaded once at the beginning of vdev_raidz_io_done - Errors cause data to be onloaded, but will not re-offload in subsequent steps within resilver - Write is a separate ZIO pipeline stage, so it will attempt to offload data The zio_decompress function has been modified to allow for offloading but the ZIO read pipeline as a whole has not, so it is not part of the above list. An example provider implementation can be found in module/zia-software-provider - The provider's "hardware" is actually software - data is "offloaded" to memory not owned by ZFS - Calls ZFS functions in order to not reimplement operations - Has kernel module parameters that can be used to trigger ZIA_ACCELERATOR_DOWN states for testing pipeline restarts. abd_t, raidz_row_t, and vdev_t have each been given an additional "void *<prefix>_zia_handle" member. These opaque handles point to data that is located on an offloader. abds are still allocated, but their payloads are expected to diverge from the offloaded copy as operations are run. Encryption and deduplication are disabled for zpools with Z.I.A. operations enabled Aggregation is disabled for offloaded abds RPMs will build with Z.I.A. Signed-off-by: Jason Lee <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for f42b511 - Browse repository at this point
Copy the full SHA f42b511View commit details
There are no files selected for viewing