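Query result: Liu Qi, Huang Zi, Chen Luyan, Hu Fuqiao. Convolution-based detection models acceleration based on GPU [J]. Computer Applications and Software (计算机应用与软件), 2016, 33(5): 226-230.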
Chinese title
基于GPU的卷积检测模型加速
Journal section
Algorithms
Abstract views
762
English title
CONVOLUTION-BASED DETECTION MODELS ACCELERATION BASED ON GPU
Authors
Liu Qi (刘琦), Huang Zi (黄咨), Chen Luyan (陈璐艳), Hu Fuqiao (胡福乔)
Affiliation
Key Laboratory of System Control and Information Processing, Ministry of Education of China, Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
Keywords
Convolution-based detection model; computer vision; GPU
Funding
National Natural Science Foundation of China (61175009); Shanghai Industry-University-Research Cooperation Project (沪CXY-2013-82)
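Author information
Liu Qi, master's student; main research areas: computer vision and parallel computing. Huang Zi, master's student. Chen Luyan, master's student. Hu Fuqiao, associate professor.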
Abstract
In recent years, convolution-based detection models (CDM), such as deformable part-based models (DPM) and convolutional neural networks (CNN), have achieved tremendous success in computer vision. These models support large-scale machine learning training and achieve high robustness and recognition performance. However, the heavy computational cost of the convolution operations in training and evaluation restricts their further application in many practical scenarios. In this paper, we accelerate convolution-based detection models at both the algorithm and the hardware level, using mathematical theory and parallelisation techniques. At the algorithm level, we reduce computational complexity by converting convolution in the spatial domain into pointwise multiplication in the frequency domain. At the hardware level, graphics processing unit (GPU) parallelisation further reduces computation time. Experiments on the public PASCAL VOC dataset show that, compared with a multi-core CPU, the proposed algorithm speeds up the convolution process by a factor of 2.13 to 4.31 on a single commodity GPU.
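The algorithm-level idea in the abstract, replacing spatial-domain convolution with pointwise multiplication in the frequency domain, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1-D signal for brevity (the paper's models operate on image feature maps), runs on the CPU in pure Python, and uses a simple recursive radix-2 FFT. The function names `fft`, `direct_convolve`, and `fft_convolve` are hypothetical and chosen only for this illustration.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_convolve(signal, kernel):
    """Reference spatial-domain (full) convolution, O(N * M) multiplications."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_convolve(signal, kernel):
    """Full convolution via the convolution theorem."""
    out_len = len(signal) + len(kernel) - 1
    # Zero-pad both inputs to a power of two that holds the full result,
    # so the circular convolution computed by the FFT equals the linear one.
    size = 1
    while size < out_len:
        size *= 2
    freq_signal = fft(list(signal) + [0.0] * (size - len(signal)))
    freq_kernel = fft(list(kernel) + [0.0] * (size - len(kernel)))
    # Pointwise multiplication in the frequency domain.
    product = [a * b for a, b in zip(freq_signal, freq_kernel)]
    # The inverse transform is unnormalised, so divide by the padded size.
    result = fft(product, inverse=True)
    return [value.real / size for value in result[:out_len]]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    kernel = [0.0, 1.0, 0.5]
    print(direct_convolve(signal, kernel))
    print(fft_convolve(signal, kernel))
```

Both functions return the same values up to floating-point rounding. For long inputs the FFT route needs on the order of N log N operations instead of N * M, which is the source of the algorithm-level saving; a GPU version would typically delegate the transforms to a parallel FFT library rather than use a hand-written recursion like this one.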