Abstract: We propose an efficient quantum subroutine for matrix multiplication that computes a state vector encoding the entries of the product of two matrices in superposition. The subroutine ...
Abstract: Analog computing-in-memory accelerators promise ultra-low-power, on-device AI by reducing data transfer and energy usage. Yet inherent device variations and high energy consumption for ...
This repository contains the CUDA kernels for general matrix-matrix multiplication (GEMM) and the corresponding performance analysis. The correctness of the CUDA kernels is guaranteed for any matrix ...
Sparse general matrix-matrix multiplication (SpGEMM) is fundamental to numerous scientific applications. Traditional hash-based approaches fail to strike a trade-off between reducing hash collisions ...