LAMP: Tools for creating application-specific FPGA coprocessors
Field Programmable Gate Arrays (FPGAs) have begun to appear as accelerators for general computation. Their potential for massive parallelism, high on-chip memory bandwidth, and customizable interconnection networks all contribute to demonstrated 100-1000x increases in application performance relative to current PCs. FPGA coprocessors have been available in niche markets for years, and are now appearing in mainstream supercomputers from vendors including Cray and Silicon Graphics. Available development tools, however, do not serve developers of computing applications. Traditional FPGA design tools meet the gate-level needs of logic designers, but present a computing model that vanishingly few software developers can use. Likewise, logic designers understand the logic structures needed for high computing performance, but rarely know the biology, biochemistry, or other application areas that need acceleration. Logic designers and application developers must both participate in creating efficient, useful accelerators, but current tools do not support their different kinds of participation. This work presents two major sets of contributions. The first is proof by example that FPGAs give 100-1000x speedups for large families of applications in bioinformatics and computational biology (BCB), including sequence alignment, molecule docking, and string analysis. These demonstrations also provide the beginnings of a library of reusable computing structures. The second set of contributions appears as novel features of accelerator design tools based on Logic Architecture by Model Parameterization (LAMP). The LAMP tools address broad, customizable families of applications, not point solutions to narrow problem statements. LAMP also separates the logic designers, who create efficient hardware computing structures, from the application specialists, who tailor the accelerator to specific members of the application family.
This separation enables accelerator hardware customization without requiring hardware design skills. Finally, LAMP provides mechanisms for automating the tradeoff between complexity and quantity of parallel processing elements (PEs), allowing fewer large PEs or larger numbers of small ones, subject to the FPGA's resource constraints. This creates a unique ability to allocate the FPGA's computing resources differently for each member of an application family, according to the datatypes and functions specific to that family member. Performance results based on prototype LAMP tools are presented, using sample BCB applications.
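
The tradeoff between PE complexity and PE count can be illustrated with a minimal sketch. This is not the LAMP implementation; it only assumes that each PE variant has a fixed per-instance resource cost and that the FPGA has a fixed budget. The names `PEVariant` and `max_pe_count`, the two resource types, and all numbers are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEVariant:
    """Hypothetical description of one processing-element design."""
    name: str
    logic_cells: int   # logic cells used by one PE (illustrative figure)
    ram_blocks: int    # on-chip RAM blocks used by one PE (illustrative figure)


def max_pe_count(pe: PEVariant, total_cells: int, total_rams: int,
                 overhead_cells: int = 0) -> int:
    """Largest number of identical PEs that fits the assumed resource budget."""
    # Cells reserved for control and I/O logic are not available to PEs.
    usable_cells = max(total_cells - overhead_cells, 0)
    by_cells = usable_cells // pe.logic_cells
    # A PE that needs no RAM is limited by logic cells alone.
    by_rams = total_rams // pe.ram_blocks if pe.ram_blocks else by_cells
    return min(by_cells, by_rams)


if __name__ == "__main__":
    # Made-up budget and PE costs; not taken from any real device or design.
    small = PEVariant("small PE", logic_cells=120, ram_blocks=0)
    large = PEVariant("large PE", logic_cells=450, ram_blocks=1)
    for pe in (small, large):
        count = max_pe_count(pe, total_cells=50_000, total_rams=100,
                             overhead_cells=5_000)
        print(f"{pe.name}: {count} instances fit")
```

Under these assumed costs, the same budget holds many small PEs or fewer large ones, which is the kind of per-application allocation decision the abstract describes as automated.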