InTAR: Inter-Task Auto-Reconfigurable Accelerator Design

We propose a novel accelerator design paradigm on FPGAs: inter-task auto-reconfigurable accelerator (InTAR). InTAR can switch execution patterns automatically based on on-chip memory and computation resources. When a task produces large intermediate data, InTAR pipelines multiple tasks to avoid accessing off-chip memory for these data. Otherwise, InTAR will process tasks sequentially to maximize compute efficiency by eliminating pipeline stalls. Compared with other reconfigurable accelerators, InTAR allows model-specific circuit optimization that keeps only necessary control logic and interconnects. Hence, InTAR requires fewer reconfiguration resources, achieves a high clock frequency, and has a low reconfiguration overhead (10 to 20 ns). Since computations are reconfigured at the task level, InTAR is one of the first works regarding FPGA-based reconfigurable accelerators that support high-level hardware generation tools such as High-Level Synthesis (HLS) for fast accelerator design.

Preprint: https://arxiv.org/abs/2502.08807

Project Structure

/benchmark: contains multi-task DNN kernels that are HDV
/gpt-2-medium: HLS code and bitstreams for hardware emulation and on-board execution of GPT-2 Medium model with InTAR

Dependencies

Software

Vitis/Vivado 2024.1+ for VPK180
Vitis/Vivado 2021.2+ for U280
TAPA: https://github.com/rapidstream-org/rapidstream-tapa
- For the older version (TAPA 2024), Autobridge/RapidStream is integrated. Please install from the source or use apptainer on the VAST Lab cluster.
- For the main version, TAPA is only used for HLS code generation, and you need to use RapidStream for floorplanning. It can still be used to compile the host code.
RapidStream: https://docs.rapidstream-da.com/

Tip

On the VASTlab cluster, you can execute the following to enable the TAPA old version in a container

ml load tapa apptainer
apptainer instance start $(which tapa.sif) tapa-2024 # now the container has the name tapa-2024
apptainer shell instance://tapa-2024

Hardware Platform

Alveo U280: xilinx_u280_xdma_201920_3 (dev, deploy)
Versal VPK180: Custom Platform (Please follow this documentation)

Note

You can still use the most recent U280 platform, but you have to use the most updated TAPA with Rapidstream to regenerate the bitstream for your own FPGA board. It may be easier to flash the older platform with the given link.

Multi-Task Kernel Testbench

Run make <kernel-name>-intrra and ./<kernel-name>-intrra --bitstreams <bitstream-path> to perform on-board execution on Alveo U280. Similar for sequential and dataflow kernels in the corresponding folders.

GPT-2 Medium

U280: make opt350 and ./opt350 <sequence-length> --bitstream bitstreams/opt_kernel_latest.xclbin to execute an older version of kernel (less optimized, lower frequency). make opt350-ultrascale and ./opt350 <sequence-length> --bitstream bitstreams/opt_kernel_xilinx_u280_full.xclbin for an optimized version of kernel.
VPK180: Follow this documentation to generate a custom platform, run hardware emulation using QEMU, and perform bitstream generation.

Name		Name	Last commit message	Last commit date
Latest commit History 167 Commits
benchmark		benchmark
figures		figures
gpt-2-medium		gpt-2-medium
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InTAR: Inter-Task Auto-Reconfigurable Accelerator Design

Project Structure

Dependencies

Software

Hardware Platform

Multi-Task Kernel Testbench

GPT-2 Medium

About

Releases 2

Packages

Contributors 3

Languages

OswaldHe/InTAR

Folders and files

Latest commit

History

Repository files navigation

InTAR: Inter-Task Auto-Reconfigurable Accelerator Design

Project Structure

Dependencies

Software

Hardware Platform

Multi-Task Kernel Testbench

GPT-2 Medium

About

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 3

Languages

Packages