Data Compression and Vectorization of Matrix Multiplication on HXDSP

Open Access

Issue: MATEC Web Conf., Volume 173 (2018), 2018 International Conference on Smart Materials, Intelligent Manufacturing and Automation (SMIMA 2018)
Article Number: 03008
Number of page(s): 5
Section: Digital Signal and Image Processing
DOI: https://doi.org/10.1051/matecconf/201817303008
Published online: 19 June 2018
