Shaolin Xie | publications

2020

A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix–Matrix Multiplication Accelerator Park, Dong-Hyeon, Pal, Subhankar, Feng, Siying, Gao, Paul, Tan, Jielun, Rovinski, Austin, Xie, Shaolin, Zhao, Chun, Amarnath, Aporva, Wesley, Timothy, Beaumont, Jonathan, Chen, Kuan-Yu, Chakrabarti, Chaitali, Taylor, Michael Bedford, Mudge, Trevor, Blaauw, David, Kim, Hun-Seok, and Dreslinski, Ronald G. IEEE Journal of Solid-State Circuits 2020

2019

Evaluating Celerity: A 16-nm 695 Giga-RISC-V Instructions/s Manycore Processor With Synthesizable PLL Rovinski, Austin, Zhao, Chun, Al-Hawaj, Khalid, Gao, Paul, Xie, Shaolin, Torng, Christopher, Davidson, Scott, Amarnath, Aporva, Vega, Luis, Veluri, Bandhav, Rao, Anuj, Ajayi, Tutu, Puscar, Julian, Dai, Steve, Zhao, Ritchie, Richmond, Dustin, Zhang, Zhiru, Galton, Ian, Batten, Christopher, Taylor, Michael B., and Dreslinski, Ronald G. IEEE Solid-State Circuits Letters 2019
A 1.4 GHz 695 Giga Risc-V Inst/s 496-Core Manycore Processor With Mesh On-Chip Network and an All-Digital Synthesized PLL in 16nm CMOS Rovinski, Austin, Zhao, Chun, Al-Hawaj, Khalid, Gao, Paul, Xie, Shaolin, Torng, Christopher, Davidson, Scott, Amarnath, Aporva, Vega, Luis, Veluri, Bandhav, Rao, Anuj, Ajayi, Tutu, Puscar, Julian, Dai, Steve, Zhao, Ritchie, Richmond, Dustin, Zhang, Zhiru, Galton, Ian, Batten, Christopher, Taylor, Michael B., and Dreslinski, Ronald G. In 2019 Symposium on VLSI Circuits 2019
A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm Pal, Subhankar, Park, Dong-hyeon, Feng, Siying, Gao, Paul, Tan, Jielun, Rovinski, Austin, Xie, Shaolin, Zhao, Chun, Amarnath, Aporva, Wesley, Timothy, Beaumont, Jonathan, Chen, Kuan-Yu, Chakrabarti, Chaitali, Taylor, Michael, Mudge, Trevor, Blaauw, David, Kim, Hun-Seok, and Dreslinski, Ronald In 2019 Symposium on VLSI Technology 2019

2018

Progress in a novel architecture for high performance processing Zhang, Zhiwei, Liu, Meng, Liu, Zijun, Du, Xueliang, Xie, Shaolin, Ma, Hong, Ding, Guangxin, Ren, Weili, Zhou, Fabiao, Sun, Wenqin, and others Japanese Journal of Applied Physics 2018
Fast and efficient deep sparse multi-strength spiking neural networks with dynamic pruning Chen, Ruizhi, Ma, Hong, Xie, Shaolin, Guo, Peng, Li, Pin, and Wang, Donglin In 2018 International Joint Conference on Neural Networks (IJCNN) 2018
FBNA: A Fully Binarized Neural Network Accelerator Guo, Peng, Ma, Hong, Chen, Ruizhi, Li, Pin, Xie, Shaolin, and Wang, Donglin In 2018 28th International Conference on Field Programmable Logic and Applications (FPL) 2018
The BaseJump Manycore Accelerator Network Xie, Shaolin, and Taylor, Michael Bedford arXiv preprint arXiv:1808.00650 2018
Low Latency Spiking ConvNets with Restricted Output Training and False Spike Inhibition Chen, Ruizhi, Ma, Hong, Guo, Peng, Xie, Shaolin, Li, Pin, and Wang, Donglin In 2018 International Joint Conference on Neural Networks (IJCNN) 2018
Extreme Datacenter Specialization for Planet-Scale Computing: ASIC Clouds Xie, Shaolin, Davidson, Scott, Magaki, Ikuo, Khazraee, Moein, Vega, Luis, Zhang, Lu, and Taylor, Michael B. ACM SIGOPS Operating Systems Review 2018
Parallel Polar Encoding in 5G Communication Guo, Yang, Xie, Shaolin, Liu, Zijun, Yang, Lei, and Wang, Donglin In 2018 IEEE Symposium on Computers and Communications (ISCC) 2018
Parallel filtering method and corresponding apparatus Wang, Donglin, Yin, Leizu, Yang, Yongyong, Xie, Shaolin, and Wang, Tao 2018
The Celerity open-source 511-core RISC-V tiered accelerator fabric: Fast architectures and design methodologies for fast chips Davidson, Scott, Xie, Shaolin, Torng, Christopher, Al-Hawaj, Khalid, Rovinski, Austin, Ajayi, Tutu, Vega, Luis, Zhao, Chun, Zhao, Ritchie, Dai, Steve, and others IEEE Micro 2018

2017

Celerity: An open source RISC-V tiered accelerator fabric Ajayi, Tutu, Al-Hawaj, Khalid, Amarnath, Aporva, Dai, Steve, Davidson, Scott, Gao, Paul, Liu, Gai, Lotfi, Atieh, Puscar, Julian, Rao, Anuj, and others In Hot Chips: A Symposium on High Performance Chips 2017
Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm Ajayi, Tutu, Al-Hawaj, Khalid, Amarnath, Aporva, Dai, Steve, Davidson, Scott, Gao, Paul, Liu, Gai, Lotfi, Atieh, Puscar, Julian, Rao, Anuj, and others 2017
A self-indexed register file for efficient arithmetical computing hardware Yang, Lei, Xie, Shaolin, Liu, Zijun, Du, Xueliang, and Wang, Donglin In 2017 9th Computer Science and Electronic Engineering (CEEC) 2017
A reconfigurable ASIC-like image polyphase interpolation implementation method Yang, Lei, Guo, Ruoshan, Xie, Shaolin, and Wang, Donglin In 2017 7th IEEE International Conference on Electronics Information and Emergency Communication (ICEIEC) 2017

2016

MaPU: A novel mathematical computing architecture Wang, Donglin, Xie, Shaolin, and others In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) 2016
Methods and devices for multi-granularity parallel FFT butterfly computation Wang, Donglin, Wang, Tao, Xie, Shaolin, Hao, Jie, and Yin, Leizu 2016
Data access method and device for parallel FFT computation Xie, Shaolin, Wang, Donglin, Lin, Xiao, Hao, Jie, Xue, Xiaojun, Wang, Tao, and Yin, Leizu 2016
Parallel bit reversal devices and methods Xie, Shaolin, Wang, Donglin, Hao, Jie, Wang, Tao, and Yin, Leizu 2016

2015

Multi-granularity parallel storage system Wang, Donglin, Liu, Zijun, Xue, Xiaojun, Zhang, Xing, Zhang, Zhiwei, and Xie, Shaolin 2015
Multi-granularity parallel storage system and storage Wang, Donglin, Xie, Shaolin, Xue, Xiaojun, Liu, Zijun, and Zhang, Zhiwei 2015