1. INTRODUCTION
Two popular neural network compression methods are filter pruning and low-rank approximation. Pruning typically starts with filter importance estimation, in which a dedicated algorithm assigns an importance score to each filter of the pretrained model. There are several ways to estimate filter importance: some methods (e.g., [1], [2], [3]) estimate the contribution of each filter to some loss and use it as a proxy for importance, while others rely on the statistics of the output feature maps [4]. The least important filters are then removed from the model, either iteratively or all at once, and the resulting compressed model is then fine-tuned to recover any loss in performance.
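As a concrete illustration of this pipeline, the sketch below scores the filters of a single convolutional layer by the L1 norm of their weights (one common importance proxy, not the specific criterion of [1]–[4]) and removes the lowest-scoring ones in one shot. The PyTorch layer type, the `keep_ratio` parameter, and the helper names are hypothetical choices for this example.

```python
# Minimal sketch of one-shot, norm-based filter pruning (illustrative only;
# not the exact criterion of any cited method).
import torch
import torch.nn as nn

def filter_importance(conv: nn.Conv2d) -> torch.Tensor:
    # Importance proxy: L1 norm of each filter's weights,
    # yielding one score per output channel.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_conv(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    # Rank filters by importance and keep the top fraction.
    scores = filter_importance(conv)
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(scores, descending=True)[:n_keep]
    # Build a thinner layer holding only the surviving filters.
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned
```

In a full pipeline, the input channels of each downstream layer must be pruned to match the surviving filters, after which the compressed model is fine-tuned as described above.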