
InTAR: Inter-Task Auto-Reconfigurable Accelerator Design

[FCCM 2025] Code samples for InTAR: Inter-Task Auto-Reconfigurable Accelerator Design for High Data Volume Variation in DNNs

(Figure: InTAR design template)

We propose a novel accelerator design paradigm for FPGAs: the inter-task auto-reconfigurable accelerator (InTAR). InTAR switches execution patterns automatically based on on-chip memory and computation resources. When a task produces large intermediate data, InTAR pipelines multiple tasks to avoid off-chip memory accesses for that data. Otherwise, InTAR processes tasks sequentially to maximize compute efficiency by eliminating pipeline stalls. Compared with other reconfigurable accelerators, InTAR allows model-specific circuit optimization that keeps only the necessary control logic and interconnects. Hence, InTAR requires fewer reconfiguration resources, achieves a high clock frequency, and incurs a low reconfiguration overhead (10 to 20 ns). Since computation is reconfigured at the task level, InTAR is one of the first FPGA-based reconfigurable accelerator designs that supports high-level hardware generation tools such as High-Level Synthesis (HLS) for fast accelerator design.

Preprint: https://arxiv.org/abs/2502.08807

Project Structure

  • /benchmark: multi-task DNN kernels with high data volume variation (HDV)
  • /gpt-2-medium: HLS code and bitstreams for hardware emulation and on-board execution of the GPT-2 Medium model with InTAR

Dependencies

Software

  • Vitis/Vivado 2024.1+ for VPK180
  • Vitis/Vivado 2021.2+ for U280
  • TAPA: https://github.com/rapidstream-org/rapidstream-tapa
    • For the older version (TAPA 2024), AutoBridge/RapidStream is integrated. Please install from source or use Apptainer on the VAST Lab cluster.
    • For the main version, TAPA is only used for HLS code generation, and you need RapidStream for floorplanning. TAPA can still be used to compile the host code.
  • RapidStream: https://docs.rapidstream-da.com/

Tip

On the VAST Lab cluster, you can run the following to start the older TAPA version in a container:

ml load tapa apptainer
apptainer instance start $(which tapa.sif) tapa-2024 # now the container has the name tapa-2024
apptainer shell instance://tapa-2024
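
When you are done, you can stop the container instance (standard Apptainer usage, not specific to this repository):

apptainer instance stop tapa-2024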

Hardware Platform

Note

You can still use the most recent U280 platform, but you then have to use the latest TAPA with RapidStream to regenerate the bitstream for your own FPGA board. It may be easier to flash the older platform from the provided link.

Multi-Task Kernel Testbench

Run make <kernel-name>-intrra and then ./<kernel-name>-intrra --bitstreams <bitstream-path> to perform on-board execution on an Alveo U280; a concrete example is shown below. The sequential and dataflow kernels in the corresponding folders are built and run the same way.
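
For example, for a hypothetical benchmark kernel named mykernel (the kernel name and bitstream path below are illustrative; substitute the ones from your build):

make mykernel-intrra
./mykernel-intrra --bitstreams bitstreams/mykernel_intrra.xclbin  # illustrative path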

GPT-2 Medium

(Figure: InTAR GPT-2 design)

  • U280: run make opt350 and ./opt350 <sequence-length> --bitstream bitstreams/opt_kernel_latest.xclbin to execute an older version of the kernel (less optimized, lower frequency), as in the example after this list. Run make opt350-ultrascale and ./opt350 <sequence-length> --bitstream bitstreams/opt_kernel_xilinx_u280_full.xclbin for the optimized version of the kernel.

  • VPK180: Follow this documentation to generate a custom platform, run hardware emulation using QEMU, and perform bitstream generation.
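
For example, an on-board U280 run of the older kernel with a sequence length of 512 (the sequence length here is purely illustrative):

make opt350
./opt350 512 --bitstream bitstreams/opt_kernel_latest.xclbin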
