Current Project(s)
Characterizing the Efficiency vs Accuracy Trade-off for Long-Context NLP Models
With many real-world applications of Natural Language Processing (NLP) involving long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs efficiency trade-off on two widely used long-sequence models – Longformer-Encoder-Decoder [1] and Big Bird [2] – during fine-tuning and inference on four datasets from the SCROLLS benchmark [3]. To study how this trade-off differs across input sequence lengths and model sizes, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget.
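As a rough illustration of the inference-side measurement, the sketch below times generation at the four input lengths for a single checkpoint. The model id (assumed here to be the Hugging Face LED base checkpoint), the generation settings, and the omission of power measurement are simplifications, not the harness used in the study.

```python
# Minimal timing sketch: sweep input lengths and record per-example generation latency.
# Model id and generation settings are illustrative assumptions.
import time
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "allenai/led-base-16384"  # assumed checkpoint; Big Bird would be swapped in similarly
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device).eval()

document = "..."  # placeholder for one long input document
for max_len in (1024, 2048, 3072, 4096):
    inputs = tokenizer(document, truncation=True, max_length=max_len,
                       return_tensors="pt").to(device)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=128)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"input length {max_len}: {time.perf_counter() - start:.2f} s per example")
```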
Published results from this project can be found on arXiv.
Past Projects
Hardware Accelerator Composer
Hardware accelerators are increasingly employed to perform advanced calculations for technical applications ranging from audio analysis to artificial intelligence. To simplify the design and acceleration of parallelizable algorithms, the composer framework was created as an intuitive and powerful FPGA-targeting alternative to existing accelerator development frameworks. This study accelerates the frequently-used General Matrix Multiply (GEMM) workload with the composer framework to highlight its ability to create designs that outperform both a CPU baseline and comparable HLS-generated designs targeting the same FPGAs. However, these performance gains can come at the cost of considerable effort on the user's part. To expedite and guide the design process on the composer framework, an analytical model was created that accurately predicts the end-to-end performance of accelerator designs targeting Amazon's AWS EC2 f1.2xlarge instance. The accuracy of the analytical model allows users to estimate the minimum input sizes needed for their accelerator designs to surpass a CPU baseline, as well as to predict the performance gain from each implemented optimization.
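The analytical model itself is specific to the composer framework and the f1.2xlarge target and is not reproduced here; as a hedged illustration of the general form such a model can take, the sketch below estimates GEMM runtime as the larger of compute time and host-to-FPGA transfer time, with all parameter values assumed rather than measured.

```python
# Illustrative only (not the model from this study): a roofline-style estimate of
# GEMM accelerator runtime as the max of compute time and data-transfer time.
def gemm_runtime_estimate(M, N, K,
                          macs_per_cycle=256,      # assumed number of parallel MAC units
                          clock_hz=250e6,          # assumed accelerator clock frequency
                          bytes_per_elem=4,        # single-precision elements
                          link_bytes_per_s=12e9):  # assumed host-to-FPGA bandwidth
    compute_s = (M * N * K) / (macs_per_cycle * clock_hz)
    transfer_s = (M * K + K * N + M * N) * bytes_per_elem / link_bytes_per_s
    return max(compute_s, transfer_s)

# Example: estimated runtime for a 4096 x 4096 x 4096 GEMM under these assumptions.
print(f"{gemm_runtime_estimate(4096, 4096, 4096):.3f} s")
```

A model of this form also makes it easy to locate the input size at which the estimate drops below a measured CPU baseline, which is the kind of break-even question the study's model answers for composer designs.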
Elasticity of Parallel Work-Time Tasks
The traditional real-time task model often assumes static task characteristics in order to provide schedulability guarantees for a given taskset. However, the elastic scheduling model for sequential real-time tasks developed by Buttazzo et al. [1] shows that tasks can dynamically change their periods at runtime in order to overcome overload conditions or adapt to user inputs. This project aims to extend Buttazzo's elastic task model by allowing parallel real-time tasks to dynamically change their periods (period elasticity) or computational loads (computational elasticity) while ensuring the entire taskset remains schedulable. Additionally, the equivalence of computational and period elasticity is also explored.
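For background, the sequential elastic model of [1] shrinks each task's utilization in proportion to its elastic coefficient whenever the nominal utilizations exceed a desired bound. The sketch below shows this basic compression rule in a simplified single-pass form that ignores minimum-period constraints; it is not the parallel extension developed in this project.

```python
# Simplified sketch of Buttazzo-style elastic compression (period elasticity only).
# When nominal utilizations exceed the desired bound U_d, each task gives up
# utilization in proportion to its elastic coefficient, stretching its period.
def compress(tasks, U_d):
    # tasks: list of (C_i, T_i0, e_i) = WCET, nominal period, elastic coefficient
    U_nom = sum(C / T for C, T, _ in tasks)
    if U_nom <= U_d:
        return [T for _, T, _ in tasks]          # already within the bound: keep nominal periods
    E_sum = sum(e for _, _, e in tasks)
    new_periods = []
    for C, T, e in tasks:
        U_i = C / T - (U_nom - U_d) * e / E_sum  # compressed utilization
        new_periods.append(C / U_i)              # corresponding stretched period
    return new_periods

# Example: three tasks compressed to a total utilization of 0.6.
print(compress([(1.0, 4.0, 1.0), (2.0, 6.0, 2.0), (1.0, 8.0, 1.0)], U_d=0.6))
```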
Published results from this project can be found here.
3D Dual-Particle Tracking
Nanoscopic optical rulers can be used to determine the separation distance of binding targets, binding kinetics, conformational changes, or even the orientation of a targeting construct. However, the length scales of FRET and plasmonic resonance energy transfer are limited to maximum distances of 10 nm and 80 nm, respectively. To further extend the accessible distance range, we propose a new dual-particle tracking technique that can simultaneously localize two spectrally distinct targets in three dimensions with a separation distance of up to 400 nm and 20 nm accuracy. This technique will be applied to observe DNA flexing dynamics in free solution with DNA constructs of different lengths.
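To make the measured quantity concrete, the sketch below computes the 3D separation between two independently localized emitters together with a first-order uncertainty estimate; the positions, precisions, and error model are illustrative assumptions, not values from this work.

```python
# Illustrative only: 3D separation of two independently localized emitters, with a
# simple uncertainty estimate assuming independent, isotropic localization errors
# (not the error model used in the actual dual-particle tracking technique).
import numpy as np

def separation(p1_nm, p2_nm, sigma1_nm, sigma2_nm):
    p1, p2 = np.asarray(p1_nm, float), np.asarray(p2_nm, float)
    d = np.linalg.norm(p1 - p2)               # separation distance in nm
    sigma_d = np.hypot(sigma1_nm, sigma2_nm)  # combined localization error along the separation axis
    return d, sigma_d

# Example: two emitters localized roughly 340 nm apart with ~15 nm precision each.
d, sigma = separation((0, 0, 0), (200, 250, 120), 15.0, 15.0)
print(f"separation = {d:.0f} ± {sigma:.0f} nm")
```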
Published results from this project can be found here.