Reliability of Accelerators against SDCs
Studying the reliability of state-of-the-art accelerators against silent data corruptions (SDCs) and exploring the design space of mitigations
Silent Data Corruptions in hardware
Guide: Prof. Mengjia Yan
In collaboration with AMD and Prof. Joel Emer
We have studied the effects of SDCs on CNN accelerators by analyzing faulty model weights and their effect on accuracy. We are also analyzing various fault models for systolic array-based LLM attention accelerators and studying the performance-hardware tradeoffs of various mitigation techniques. Finally, we are developing an in-house simulator to model the effects of hardware faults in LLM attention.
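As a rough illustration of what a faulty model weight can look like, here is a minimal sketch of one common fault model: a single bit flip in a float32 weight. This is an assumption chosen for illustration, not necessarily the fault model used in this project, and the names flip_bit and inject_fault are hypothetical helpers, not part of the in-house simulator.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of `value` in its IEEE 754 float32 encoding.

    Bit 31 is the sign, bits 30-23 the exponent, bits 22-0 the mantissa.
    Assumed fault model: exactly one bit flips, nothing else changes.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask toggles only the chosen bit.
    as_int ^= 1 << bit
    # Reinterpret the corrupted bytes as a float32 again.
    (faulty,) = struct.unpack("<f", struct.pack("<I", as_int))
    return faulty


def inject_fault(weights, index, bit):
    """Return a copy of `weights` with one bit flipped in one element."""
    faulty = list(weights)
    faulty[index] = flip_bit(faulty[index], bit)
    return faulty


if __name__ == "__main__":
    weights = [0.5, -1.25, 2.0]  # made-up values, for illustration only
    for bit in (0, 30, 31):
        print(bit, inject_fault(weights, 0, bit)[0])
```

In this sketch, flipping the lowest mantissa bit of 0.5 changes it by only 2**-24, flipping the sign bit gives -0.5, and flipping the top exponent bit gives 2**127. The impact of a single flip therefore depends heavily on which bit is hit, which is one reason accuracy is examined alongside the corrupted weights.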