II. AI Hardware Architecture
https://www.knime.com/blog/a-friendly-introduction-to-deep-neural-networks
https://machine-learning.paperspace.com/wiki/activation-function
https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/
https://arxiv.org/pdf/1704.04861
NVIDIA GPU architecture white papers: https://www.NVIDIA.cn/technologies/
In-Datacenter Performance Analysis of a Tensor Processing Unit
[An in-depth look at Google’s first Tensor Processing Unit (TPU)](https://cloud.google.com/blog/products/ai-machine-learning/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu)
Google Tensor G3: The new chip that gives your Pixel an AI upgrade
Wikipedia-Tensor Processing Unit
A Domain-Specific Supercomputer for Training Deep Neural Networks
[1] Chen T, Du Z, Sun N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[C]//International Conference on Architectural Support for Programming Languages & Operating Systems. ACM, 2014. DOI:10.1145/2541940.2541967.
[2] Chen Y, Luo T, Liu S, et al. DaDianNao: A Machine-Learning Supercomputer[C]//2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. [2024-04-14]. DOI:10.1109/MICRO.2014.58.
[3] Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., Feng, X., Chen, Y., Temam, O., 2015. ShiDianNao: shifting vision processing closer to the sensor, in: Proceedings of the 42nd Annual International Symposium on Computer Architecture. Presented at the ISCA ’15: The 42nd Annual International Symposium on Computer Architecture, ACM, Portland Oregon, pp. 92–104. https://doi.org/10.1145/2749469
[4] Liu, D., Chen, T., Liu, S., Zhou, J., Zhou, S., Temam, O., Feng, X., Zhou, X., Chen, Y., 2015. PuDianNao: A Polyvalent Machine Learning Accelerator, in: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. Presented at the ASPLOS ’15: Architectural Support for Programming Languages and Operating Systems, ACM, Istanbul Turkey, pp. 369–381. https://doi.org/10.1145/2694344
[5] Liu S, Du Z, Tao J, et al. Cambricon: An Instruction Set Architecture for Neural Networks[C]//ACM/IEEE International Symposium on Computer Architecture. 2016.
[6] Cambricon BANG C/C++ Programming Guide
[7] Chen Yunji, Li Ling, Li Wei, Guo Qi, Du Zidong, 2020. Intelligent Computing Systems (智能计算系统). China Machine Press
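[1] PKU Weiming Supercomputing Team. "PKU Weiming Supercomputing Team Introductory Lecture on High-Performance Computing (1): Overview." Bilibili, [https://www.bilibili.com/video/BV1814y1g7YC/]
[2] Specialized Architectures and the AI Software Stack (1). Zhihu, [https://zhuanlan.zhihu.com/p/387269513]
[3] "AMD’s CDNA 3 Compute Architecture." Chips and Cheese, [https://chipsandcheese.com/2023/12/17/amds-cdna-3-compute-architecture/]
[4] The CUDA Ecosystem Is the Real Moat Behind NVIDIA's AI Dominance: In-Depth Analysis 2024. WeChat, [https://mp.weixin.qq.com/s/VGej8Jjags5v0JsHIuf_tQ]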
[1] "David Patterson: A Decade of Machine Learning Accelerators: Lessons Learned and Carbon Footprint" YouTube, [https://www.youtube.com/watch?v=PLK3pGELbSs]
[2] "A Decade of TPU Evolution: Google's Ten Lessons Learned" Zhihu, [https://zhuanlan.zhihu.com/p/573794328]
IV. Inference Systems & Engines
Deep Learning Inference in Meta Data Centers: Characterization, Performance Optimizations and Hardware Implications
Clipper: A Low-Latency Online Prediction Serving System
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
TensorFlow-Serving: Flexible, High-Performance ML Serving
Optimal Aggregation Policy for Reducing Tail Latency of Web Search
A Survey of Model Compression and Acceleration for Deep Neural Networks
CSE 599W: System for ML - Model Serving
https://developer.NVIDIA.com/deep-learning-performance-training-inference
DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING
Learning both Weights and Connections for Efficient Neural Networks
DEEP LEARNING DEPLOYMENT WITH NVIDIA TENSORRT
Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
8-bit Inference with TensorRT
microsoft/AI-System
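Inference Systems & Engines
NCNN, OpenVINO, TensorRT, MediaPipe, ONNX: Which Inference Deployment Architecture Is the Strongest?
[AI System] Chapter 8: Deep Learning Inference Systems
[AI] The Overall Architecture of Inference Systems and Inference Engines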
Model Inference Serving with Triton: How to Build Your Own Inference Engine on Triton?
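Tengine-Kit Face Detection and Facial Landmarks
Crazy Rockets: How to Integrate Huawei HMS ML Kit Face Detection and Gesture Recognition to Build a Hit Mini-Game
Recording the Full Workflow of Training My Own Neural Network Model
The Overall Architecture of Inference Systems and Inference Engines
PyTorch-ONNX-TensorRT Model Conversion Tutorial and Examples
A Basic Introduction to MindSpore (昇思)
PaddlePaddle (飞桨) Product Overview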
https://github.com/microsoft/AI-System
J. Mao, X. Chen, K. W. Nixon, C. Krieger, and Y. Chen, “MoDNN: Local distributed mobile computing system for deep neural network,” in Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE), Mar. 2017, pp. 1396–1401.
Z. Zhao, K. M. Barijough, and A. Gerstlauer, “DeepThings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 11, pp. 2348–2359, Nov. 2018.
1.Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012
2.Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (June 2017), 84–90. https://doi.org/10.1145/3065386
3.Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961
4.Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
5.Mohamed S Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O’Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C Ling, et al. DLA: Compiler and FPGA overlay for neural network inference acceleration. In International Conference on Field Programmable Logic and Applications, pages 411–4117. IEEE, 2018.
1.François Chollet. Xception: Deep learning with depthwise separable convolutions. arXiv:1610.02357, 2016.
2.Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. arXiv:1611.05431, 2016.
3.Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
4.Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
5.Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
6.Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: 2017 IEEE International Conference on Computer Vision (ICCV), IEEE (2017) 2755–2763
7.Zhang, X., Zou, J., Ming, X., He, K., Sun, J.: Efficient and accurate approximations of nonlinear convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2015) 1984–1992
8.Zhang, X., Zou, J., He, K., Sun, J.: Accelerating very deep convolutional networks for classification and detection. IEEE transactions on pattern analysis and machine intelligence 38(10) (2016) 1943–1955
9.Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., Shelhamer, E.: cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)
10.Das, D., Avancha, S., Mudigere, D., Vaidynathan, K., Sridharan, S., Kalamkar, D., Kaul, B., Dubey, P.: Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint arXiv:1602.06709 (2016)
11.Ioannou, Y., Robertson, D., Cipolla, R., Criminisi, A.: Deep roots: Improving cnn efficiency with hierarchical filter groups. arXiv preprint arXiv:1605.06489 (2016)
12.Zhang, T., Qi, G.J., Xiao, B., Wang, J.: Interleaved group convolutions for deep neural networks. In: International Conference on Computer Vision. (2017)
13.Xie, G., Wang, J., Zhang, T., Lai, J., Hong, R., Qi, G.J.: IGCV2: Interleaved structured sparse convolutional neural networks. arXiv preprint arXiv:1804.06202 (2018)
14.Sun, K., Li, M., Liu, D., Wang, J.: IGCV3: Interleaved low-rank group convolutions for efficient deep neural networks. arXiv preprint arXiv:1806.00178 (2018)
15.Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI. Volume 4. (2017)
16.O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
17.S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.
18.B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012, 2017.
1.M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
2.I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv preprint arXiv:1609.07061, 2016.
3.F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, and K. Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <1mb model size. arXiv preprint arXiv:1602.07360, 2016.
4.S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
5.M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014.
6.Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
7. J. Jin, A. Dundar, and E. Culurciello. Flattened convolutional neural networks for feedforward acceleration. arXiv preprint arXiv:1412.5474, 2014.
8.A. Khosla, N. Jayadevaprakash, B. Yao, and L. Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
9.J. Krause, B. Sapp, A. Howard, H. Zhou, A. Toshev, T. Duerig, J. Philbin, and L. Fei-Fei. The unreasonable effectiveness of noisy data for fine-grained recognition. arXiv preprint arXiv:1511.06789, 2015.
10.R. Avenash and P. Vishawanth. Semantic segmentation of satellite images using a modified cnn with hard-swish activation function. In VISIGRAPP, 2019.
11. Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR, 2017.
12.Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In ECCV, 2016.
13.Jonathan Huang, Vivek Rathod, Derek Chow, Chen Sun, and Menglong Zhu. TensorFlow object detection api, 2017.
14.Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. CoRR, abs/1706.05587, 2017.
15.Matthias Holschneider, Richard Kronland-Martinet, Jean Morlet, and Ph Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets: Time-Frequency Methods and Phase Space, pages 289–297. 1989.
16.Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2013.
17.George Papandreou, Iasonas Kokkinos, and Pierre-Andre Savalle. Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection. In CVPR, 2015.
18.T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
19.C. Liu, B. Zoph, J. Shlens, W. Hua, L. Li, L. Fei-Fei, A. L. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. CoRR, abs/1712.00559, 2017.
20.H. Liu, K. Simonyan, and Y. Yang. DARTS: differentiable architecture search. CoRR, abs/1806.09055, 2018.
21.W. Liu, A. Rabinovich, and A. C. Berg. Parsenet: Looking wider to see better. CoRR, abs/1506.04579, 2015.
22. J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
23.S. Mehta, M. Rastegari, A. Caspi, L. G. Shapiro, and H. Hajishirzi. Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part X, pages 561–580, 2018.
24.S. Mehta, M. Rastegari, L. G. Shapiro, and H. Hajishirzi. Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network. CoRR, abs/1811.11431, 2018.
25.H. Park, Y. Yoo, G. Seo, D. Han, S. Yun, and N. Kwak. Concentrated-comprehensive convolutions for lightweight semantic segmentation. CoRR, abs/1812.04920, 2018.
26.H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. CoRR, abs/1802.03268, 2018.
27.P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions. CoRR, abs/1710.05941, 2017.
28.F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, and K. Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <1mb model size. CoRR, abs/1602.07360, 2016.
29.J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng. Quantized convolutional neural networks for mobile devices. CoRR, abs/1512.06473, 2015.
30.S. Zhou, Z. Ni, X. Zhou, H. Wen, Y. Wu, and Y. Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. CoRR, abs/1606.06160, 2016.
31.Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, and Anurag Ranjan. Fastvit: A fast hybrid vision transformer using structural reparameterization. arXiv preprint arXiv:2303.14189, 2023.
1.Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR. (2017)
2.He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: ECCV. (2014)
3.Ess, A., Muller, T., Grabner, H., Van Gool, L.J.: Segmentation-based urban traffic scene understanding. In: BMVC. (2009)
4.Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: CVPR. (2015)
5.Xiang, Y., Fox, D.: DA-RNN: Semantic mapping with data associated recurrent neural networks. Robotics: Science and Systems (RSS) (2017)
6.Chollet, F.: Xception: Deep learning with depthwise separable convolutions. CVPR (2017)
7.Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. ICLR (2016)
8.Yu, F., Koltun, V., Funkhouser, T.: Dilated residual networks. CVPR (2017)
9.Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: Icnet for real-time semantic segmentation on high-resolution images. arXiv preprint arXiv:1704.08545 (2017)
10.Dai, J., He, K., Sun, J.: Convolutional feature masking for joint object and stuff segmentation. In: CVPR. (2015)
11.Tao Lei, Yu Zhang, and Yoav Artzi. Training rnns as fast as cnns. In EMNLP, 2018.
12.Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. In ICLR, 2017.
13.Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, and Jitendra Malik. Semantic contours from inverse detectors. In ICCV, 2011.
14.Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
15.M. Siam, M. Gamal, M. Abdel-Razek, S. Yogamani, and M. Jagersand. RTSeg: Real-time semantic segmentation comparative study. In 2018 25th IEEE International Conference on Image Processing (ICIP).
1.Anonymous. Snas: stochastic neural architecture search. In Submitted to International Conference on Learning Representations, 2019. under review.
2.K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
3.G. Huang, S. Liu, L. van der Maaten, and K. Q. Weinberger. Condensenet: An efficient densenet using learned group convolutions. group, 3(12):11, 2017.
4.E. Jang, S. Gu, and B. Poole. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016.
5.D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
6.H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018.
7.X. Zhang, X. Zhou, M. Lin, and J. Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. arXiv preprint arXiv:1707.01083, 2017.
8.T. Veniat and L. Denoyer. Learning time/memory-efficient deep architectures with budgeted super networks. arXiv preprint arXiv:1706.00046, 2017.
9.T.-J. Yang, A. Howard, B. Chen, X. Zhang, A. Go, M. Sandler, V. Sze, and H. Adam. Netadapt: Platform-aware neural network adaptation for mobile applications. Energy, 41:46, 2018.
10.Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. Designing energy-efficient convolutional neural networks using energy aware pruning. arXiv preprint arXiv:1611.05128, 2016.
11.Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc Le. Learning transferable architectures for scalable image recognition. In CVPR, pages 8697–8710, 2018.
12.Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. ICLR, 2018.
13.Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. arXiv preprint arXiv:1807.11164, 2018.
14.Mingxing Tan and Quoc V Le. Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, 2019.
15.M. Tan, B. Chen, R. Pang, V. Vasudevan, and Q. V. Le. Mnasnet: Platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626, 2018.
16. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
17.Piotr Dollár, Mannat Singh, and Ross Girshick. Fast and accurate model scaling. arXiv preprint arXiv:2103.06877, 2021.
18.Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. ICLR, 2019.
19.Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, and Jianchao Yang. Atomnas: Fine-grained end-to-end neural architecture search. ICLR, 2020.
20.Linnan Wang, Yiyang Zhao, Yuu Jinnai, Yuandong Tian, and Rodrigo Fonseca. Neural architecture search using deep neural networks and monte carlo tree search. In AAAI, 2020.
1.Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. CVPR, 2017.
2.Kornblith, S., Shlens, J., and Le, Q. V. Do better imagenet models transfer better? CVPR, 2019.
3.Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical Report, 2009.
4.Lin, H. and Jegelka, S. Resnet with one-neuron hidden layers is a universal approximator. NeurIPS, pp. 6172–6181, 2018.
5.Ma, N., Zhang, X., Zheng, H.-T., and Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. ECCV, 2018.
6.Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. V. Learning transferable architectures for scalable image recognition. CVPR, 2018.
7.Zagoruyko, S. and Komodakis, N. Wide residual networks. BMVC, 2016.
8.Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. Learning deep features for discriminative localization. CVPR, pp. 2921–2929, 2016.
9.Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. ICLR, 2018.
10.Xie, Q., Luong, M.-T., Hovy, E., and Le, Q. V. Self training with noisy student improves imagenet classification. CVPR, 2020.
11.Zhang, H., Cisse, M., Dauphin, Y. N., and Lopez-Paz, D. Mixup: Beyond empirical risk minimization. ICLR, 2018.
12. Ridnik, T., Lawen, H., Noy, A., Baruch, E. B., Sharir, G., and Friedman, I. Tresnet: High performance gpu dedicated architecture. arXiv preprint arXiv:2003.13630, 2020.
13.Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. CVPR, 2018.
14.Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The expressive power of neural networks: A view from the width. NeurIPS, 2018.
1.Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. In ICLR, 2019.
2.Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, and Qi Tian. Data-free learning of student networks. In ICCV, 2019.
3.Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs. In ICLR, 2016.
4.Xinyu Gong, Shiyu Chang, Yifan Jiang, and Zhangyang Wang. Autogan: Neural architecture search for generative adversarial networks. In ICCV, 2019.
5.Kai Han, Jianyuan Guo, Chao Zhang, and Mingjian Zhu. Attribute-aware attention model for fine-grained representation learning. In ACM MM, 2018.
6.Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
7.Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. In ICLR, 2019.
8.Mingzhu Shen, Kai Han, Chunjing Xu, and Yunhe Wang. Searching for accurate binary neural architectures. In ICCV Workshops, 2019.
9.Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In CVPR, pages 8697–8710, 2018.
10.Shan You, Chang Xu, Chao Xu, and Dacheng Tao. Learning from multiple teacher networks. In SIGKDD, 2017.
11.Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
12.Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141, 2018.
13.Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European conference on computer vision (ECCV), pages 116–131, 2018.
14.Sachin Mehta and Mohammad Rastegari. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178, 2021.
15.Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, et al. Convolutional neural networks with low-rank regularization. arXiv preprint arXiv:1511.06067, 2015.
1.Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, and Quoc V Le. Attention augmented convolutional networks. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 3286–3295, 2019.
2.Chun-Fu Chen, Quanfu Fan, and Rameswar Panda. CrossVit: Cross-attention multi-scale vision transformer for image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
3.Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
4.Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, and Zicheng Liu. Mobile-former: Bridging mobilenet and transformer. arXiv preprint arXiv:2108.05895, 2021.
5.François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258, 2017.
6.Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 113–123, 2019.
7.Zihang Dai, Hanxiao Liu, Quoc V Le, and Mingxing Tan. Coatnet: Marrying convolution and attention for all data sizes. arXiv preprint arXiv:2106.04803, 2021.
8.Stéphane d’Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, and Levent Sagun. Convit: Improving vision transformers with soft convolutional inductive biases. arXiv preprint arXiv:2103.10697, 2021.
9.Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
10.Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pp. 6105–6114. PMLR, 2019.
11.Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, and Ross Girshick. Early convolutions help transformers see better. Advances in Neural Information Processing Systems, 34:30392–30400, 2021.
12.Yufei Xu, Qiming Zhang, Jing Zhang, and Dacheng Tao. Vitae: Vision transformer advanced by exploring intrinsic inductive bias. Advances in Neural Information Processing Systems, 34:28522–28535, 2021.
13.Qinglong Zhang and Yu-Bin Yang. Rest: An efficient transformer for visual recognition. Advances in Neural Information Processing Systems, 34:15475–15485, 2021.
14.Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
15.Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509, 2019.
16.Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451, 2020.
17.Apoorv Vyas, Angelos Katharopoulos, and François Fleuret. Fast transformers with clustered attention. Advances in Neural Information Processing Systems, 33:21665–21674, 2020.
18.Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
19.Boris T Polyak and Anatoli B Juditsky. Acceleration of stochastic approximation by averaging. SIAM journal on control and optimization, 30(4):838–855, 1992.
20.Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, and Lihi Zelnik-Manor. Imagenet-21k pretraining for the masses. arXiv preprint arXiv:2104.10972, 2021.
1.Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.
2.Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. Dynamic relu. In ECCV, 2020.
3.Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
4.Stéphane d’Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, and Levent Sagun. Convit: Improving vision transformers with soft convolutional inductive biases. arXiv preprint arXiv:2103.10697, 2021.
5.Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
6.Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, and Baining Guo. Cswin transformer: A general vision transformer backbone with cross-shaped windows. arXiv preprint arXiv:2107.00652, 2021.
7.Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
8.Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, and Matthijs Douze. Levit: a vision transformer in convnet’s clothing for faster inference. arXiv preprint arXiv:2104.01136, 2021.
9.Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
10.Geoffrey E. Hinton. How to represent part-whole hierarchies in a neural network. CoRR, abs/2102.12627, 2021.
11.Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
12.Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
13.Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
14.Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. Cvt: Introducing convolutions to vision transformers, 2021.
15.Daquan Zhou, Qi-Bin Hou, Y. Chen, Jiashi Feng, and S. Yan. Rethinking bottleneck structure for efficient mobile network design. In ECCV, August 2020.
1.Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
2.Hugo Touvron, Matthieu Cord, and Hervé Jégou. Deit iii: Revenge of the vit. arXiv preprint arXiv:2204.07118, 2022.
3.Zizheng Pan, Jianfei Cai, and Bohan Zhuang. Fast vision transformers with hilo attention. arXiv preprint arXiv:2205.13213, 2022.
4.Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, and Le Hou. Talking-heads attention. arXiv preprint arXiv:2003.02436, 2020.
5.Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, and Shuicheng Yan. Inception transformer. arXiv preprint arXiv:2205.12956, 2022.
6.Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, and Chunhua Shen. Topformer: Token pyramid transformer for mobile semantic segmentation, 2022.
7.Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan Arik, and Tomas Pfister. Nested hierarchical transformer: Towards accurate, data-efficient and interpretable visual understanding. 2022.
8.Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, and Shuicheng Yan. Metaformer is actually what you need for vision. arXiv preprint arXiv:2111.11418, 2021.
9.Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, and Jonni Kanerva. Sparse is enough in scaling transformers. Advances in Neural Information Processing Systems, 34:9895–9907, 2021.
10.Sachin Mehta and Mohammad Rastegari. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178, 2021.
1.Khalid Ashraf, Bichen Wu, Forrest N. Iandola, Matthew W. Moskewicz, and Kurt Keutzer. Shallow networks for high-accuracy road object-detection. arXiv:1606.01561, 2016.
2.Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561, 2015.
3.Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274, 2015a.
4.Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. arXiv:1310.1531, 2013.
5.Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Shijian Tang, Erich Elsen, Bryan Catanzaro, John Tran, and William J. Dally. Dsd: Regularizing deep neural networks with dense-sparse-dense training flow. arXiv:1607.04381, 2016b.
6.C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun. Neuflow: A runtime reconfigurable dataflow processor for vision. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pages 109–116, 2011.
7.M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014.
8.M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision, pages 525–542, 2016.
9.S. Williams, A. Waterman, and D. Patterson. Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM, 52(4):65–76, 2009.
10.B. Wu, A. Wan, X. Yue, and K. Keutzer. Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. arXiv preprint arXiv:1710.07368, 2017.
11.K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
12.K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
13.S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456, 2015.
14.S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. International Conference on Learning Representations (ICLR), 2016.
15.A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks(ICCV 2019)
IR-Net: Forward and Backward Information Retention for Highly Accurate Binary Neural Networks(CVPR 2020)
Towards Unified INT8 Training for Convolutional Neural Network(CVPR 2020)
Rotation Consistent Margin Loss for Efficient Low-bit Face Recognition(CVPR 2020)
DMS: Differentiable diMension Search for Binary Neural Networks(ICLR 2020 Workshop)
Nagel, Markus, et al. "A white paper on neural network quantization." arXiv preprint arXiv:2106.08295 (2021).
Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018)
The Most Complete Guide on the Web: Low-Bit Quantization of Network Models https://zhuanlan.zhihu.com/p/453992336
Practical Quantization in PyTorch
Jacob, Benoit, et al. "Quantization and training of neural networks for efficient integer-arithmetic-only inference." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
Wu, Hao, et al. "Integer quantization for deep learning inference: Principles and empirical evaluation." arXiv preprint arXiv:2004.09602 (2020).
Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M. W., & Keutzer, K. (2021). A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630.
Wu, H., Judd, P., Zhang, X., Isaev, M., & Micikevicius, P. (2020). Integer quantization for deep learning inference: Principles and empirical evaluation. arXiv preprint arXiv:2004.09602.
8-bit Inference with TensorRT
Jianping Gou et al. Knowledge Distillation: A Survey. https://doi.org/10.1007/s11263-021-01453-z
Hinton et al. Distilling the Knowledge in a Neural Network. http://arxiv.org/abs/1503.02531
Longhui Wei et al. Circumventing outlier of autoaugment with knowledge distillation. https://doi.org/10.1007/978-3-030-58580-8_36
Caruana et al. Model compression. https://doi.org/10.1145/1150402.1150464
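Model Compression (Part 1): Knowledge Distillation (Distilling Knowledge) https://www.jianshu.com/p/a6d87b338bcf
DeiT: Attention Can Be Distilled Too https://www.cnblogs.com/ZOMI/p/16496326.html
AI Framework Deployment Solutions: Model Conversion
AI Technical Solutions (Personal Summary)
Artificial Intelligence Systems (System for AI): Lecture Introduction
[AI] The Model Conversion Module of Inference Engines
Differences Between PyTorch and TensorFlow in Padding Implementation
Converting and Optimizing Training Models into Inference Models
Optimizing TensorFlow Computation Graphs with Grappler
Dead Code Elimination
AI Compiler Frontend Optimization, Part 2 (Notes)
PyTorch Official Tutorials (Chinese Edition)
MindSpore Tutorials
TensorFlow Core
Saving and Loading Keras Models
Exploring ONNX Models: Practice and Solutions for Dynamic Input Sizes
PyTorch Review Notes: Exporting ONNX Models with Dynamic and Static Inputs
Learning PyTorch, Part 19: Loading and Saving Models (Serialization and Deserialization)
A Summary of Open-Source AI Model Serialization
ONNX Study Notes
A Deep Dive into CoreML Model Definitions
Swift loves TensorFlow and CoreML
What Is Protobuf?
Protobuf Syntax Guide
FlatBuffers Explained: Schema
FlatBuffers, the Foundation of MNN Model Storage: The Secret Behind Unreadable MNN Model Files
Huawei MindSpore Detailed Tutorial (Part 1)
How to Load a Model Trained on a GPU into CPU (System) Memory?
11 Saving and Loading Models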
LeCun Y, Bottou L. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324. DOI:10.1109/5.726791.
Fukushima, Kunihiko and Sei Miyake. “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition.” (1982).
Bouvrie J. Notes on Convolutional Neural Networks[J]. Neural Nets, 2006.
Krizhevsky A, Sutskever I, Hinton G. ImageNet Classification with Deep Convolutional Neural Networks[J]. Advances in neural information processing systems, 2012, 25(2). DOI:10.1145/3065386.
Goodfellow, I., Bengio, Y., & Courville, A. Deep Learning. MIT Press, 2016.
Optimization Algorithms for Convolutional Neural Networks
Winograd, Shmuel. Arithmetic complexity of computations. Vol. 33. SIAM, 1980.
Lavin, Andrew, and Scott Gray. "Fast algorithms for convolutional neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
A simple python module for computing minimal Winograd convolution algorithms for use with convolutional neural networks
video: Fast Algorithms for Convolutional Neural Networks by Andrew Lavin and Scott Gray
video: Even Faster CNNs Exploring the New Class of Winograd Algorithms
Understanding ‘Winograd Fast Convolution’
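A Detailed Explanation of the Winograd Acceleration Algorithm for Convolution
Understanding the Winograd Convolution Acceleration Algorithm in One Article
A Detailed Explanation of How Winograd Transformation Matrices Are Generated
AI Algorithm Fundamentals [4]: Principles of the Winograd Algorithm
[DL] The Winograd Fast Convolution Algorithm
MegEngine Inference Convolution Optimization: Im2col and Winograd Optimization
A Pure Python Implementation of Winograd Convolution
The Winograd Optimization Algorithm
V. AI Framework Core Modules
Explained Simply: The Relationship Between AI Frameworks and Computation Graphs
4.1. Design Background and Role of Computation Graphs
[AI] Overall Architecture of Inference Systems and Inference Engines
On Data Layout in Deep Learning Frameworks
Building an AI Inference Engine from Scratch (Series)
All in One Article: Theory and Practice of High-Performance Inference Engines (TensorRT)
Serialization: FlatBuffers
[AI] The Model Conversion Module of Inference Engines
Deep Learning Model Conversion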
deep-learning-model-convertor
hb_mapper_tools_guide
Model Conversion: From PyTorch to TFLite
AI Framework Deployment Solutions: Model Conversion
Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification
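Introductory Model Deployment Tutorial (Part 1): Introduction to Model Deployment
Introductory Model Deployment Tutorial (Part 3): PyTorch to ONNX in Detail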
[1] Boris Ginsburg, Sergei Nikolaev, Paulius Micikevicius. (2017). TRAINING WITH MIXED PRECISION. Retrieved from https://on-demand.gputechconf.com/gtc/2017/presentation/s7218-training-with-mixed-precision-boris-ginsburg.pdf.
[2] Wikipedia. Half-precision floating-point format. Retrieved from https://en.wikipedia.org/wiki/Half-precision_floating-point_format.
[3] The Hugging Face Authors. (2024). Methods and tools for efficient training on a single GPU. Retrieved from https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one
[1] Li S, Zhao Y, Varma R, et al. PyTorch distributed: Experiences on accelerating data parallel training[J]. arXiv preprint arXiv:2006.15704, 2020.
[2] Rajbhandari S, Rasley J, Ruwase O, et al. ZeRO: Memory optimizations toward training trillion parameter models[C]//SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020: 1-16.
[3] Li M, Zhou L, Yang Z, et al. Parameter server for distributed machine learning[C]//Big learning NIPS workshop. 2013, 6(2).
[1] The PyTorch Authors. (2024). Getting Started with Distributed Data Parallel. Retrieved from https://pytorch.org/tutorials/intermediate/ddp_tutorial.html.
[1] Rajbhandari S, Rasley J, Ruwase O, et al. ZeRO: Memory optimizations toward training trillion parameter models[C]//SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020: 1-16.
[2] Rajbhandari S, Ruwase O, Rasley J, et al. ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning[C]//Proceedings of the international conference for high performance computing, networking, storage and analysis. 2021: 1-14.
[3] Lv K, Yang Y, Liu T, et al. Full parameter fine-tuning for large language models with limited resources[J]. arXiv preprint arXiv:2306.09782, 2023.