03 Block Pruning
- Problems with weight pruning: the irregularity of sparse matrices limits the maximum performance and energy efficiency achievable on hardware accelerators.
- Block-sparse formats store blocks contiguously in memory, reducing irregular memory accesses.
- If the maximum-magnitude weight of a block is below the current threshold, we set all the weights in that block to zero.
- For block pruning, we need to modify the starting slope to account for the number of elements in a block \((N_b)\).
- Start slope for weight pruning: \(\Theta_w=\Theta\)
- Start slope for block pruning: \(\Theta_b=\Theta_w \times \sqrt[4]{N_b}\)
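
The block rule and the slope scaling above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names `block_prune` and `block_start_slope`, the list-of-lists weight layout, and the example values are all hypothetical, and the threshold is passed in directly rather than derived from a pruning schedule.

```python
def block_prune(weights, block_rows, block_cols, threshold):
    """Zero every block whose largest |weight| is below `threshold`.

    `weights` is a list of equal-length rows (hypothetical layout).
    Returns a pruned copy; the input is left untouched.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            # Clip at the matrix edge so partial blocks are handled too.
            rows = range(r0, min(r0 + block_rows, n_rows))
            cols = range(c0, min(c0 + block_cols, n_cols))
            block_max = max(abs(weights[r][c]) for r in rows for c in cols)
            if block_max < threshold:
                for r in rows:
                    for c in cols:
                        pruned[r][c] = 0.0
    return pruned


def block_start_slope(theta_w, n_b):
    """Theta_b = Theta_w * N_b ** (1/4), with N_b elements per block."""
    return theta_w * n_b ** 0.25


# Made-up 4x4 matrix, pruned in 2x2 blocks with threshold 0.25.
demo_weights = [
    [0.10, -0.20, 0.90, 0.30],
    [0.05, 0.15, -0.40, 0.20],
    [0.70, 0.10, 0.01, -0.02],
    [0.20, -0.30, 0.03, 0.00],
]
demo_pruned = block_prune(demo_weights, 2, 2, threshold=0.25)
for row in demo_pruned:
    print(row)
```

In this example the top-left and bottom-right blocks are zeroed because their largest magnitudes (0.2 and 0.03) fall below 0.25. The other two blocks are kept whole, including their small entries.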
Bank-Balanced Sparsity (BBS): split the data into banks and remove 50% of the weights within each bank.
![[Pasted image 20221208131112.png]]
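
A minimal sketch of the bank-balanced idea for a single row of weights follows. The notes only say that 50% of the weights in each bank are removed; dropping the smallest-magnitude weights is an assumption here, and the function name `bank_balanced_prune`, its parameters, and the example row are hypothetical.

```python
def bank_balanced_prune(row, bank_size, drop_fraction=0.5):
    """Split `row` into banks of `bank_size` and zero the same fraction in each.

    Assumption: the weights removed are those with the smallest magnitude.
    Returns a pruned copy of the row.
    """
    pruned = list(row)
    for start in range(0, len(row), bank_size):
        indices = list(range(start, min(start + bank_size, len(row))))
        n_drop = int(len(indices) * drop_fraction)
        # Order the bank's positions from smallest to largest magnitude.
        smallest_first = sorted(indices, key=lambda i: abs(row[i]))
        for i in smallest_first[:n_drop]:
            pruned[i] = 0.0
    return pruned


# Made-up row of 8 weights, split into two banks of 4.
demo_row = [0.8, -0.1, 0.3, 0.05, 0.02, 0.6, -0.7, 0.1]
print(bank_balanced_prune(demo_row, bank_size=4))
```

Every bank ends up with the same number of non-zero weights, so the sparsity is spread evenly across the row instead of being concentrated wherever the small weights happen to sit.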