Development tool:
File size: 218 KB
Downloads: 0
Upload date: 2013-01-14
Description:
by Kiefer Kuah
April 2007
Intel Software Solutions Group

Abstract
Intel® SSE4 is a new set of Single Instruction Multiple Data (SIMD) instructions that will be introduced in the 45nm Next Generation Intel® Core™2 processor family (Penryn). The new instructions improve the performance and energy efficiency of a broad range of applications. This white paper describes how video encoders can use Intel SSE4 instructions to achieve 1.6x to 3.8x speedups in integer motion vector search, a frequently used motion estimation function.

Contents
1. Introduction
2. Motion Estimation Using MPSADBW and PHMINPOSUW
3. Results
4. Conclusion
A. SSE2-Optimized Function for 4x4 Blocks
B. Intel® SSE4-Optimized Function for 4x4 Blocks
C. SSE2-Optimized Function for 8x8 Blocks
D. Intel® SSE4-Optimized Function for 8x8 Blocks
E. SSE2-Optimized Function for 16x16 Blocks
F. Intel® SSE4-Optimized Function for 16x16 Blocks

1. Introduction
Intel® Streaming SIMD Extensions 4 (Intel® SSE4) is a new set of SIMD instructions designed to improve the performance of applications such as video encoders, image processing, and 3D games. Intel SSE4 builds upon the Intel® 64 and IA-32 instruction set, the most popular and broadly used computer architecture for developing 32-bit and 64-bit applications. Intel SSE4 will be introduced in the 45nm Next Generation Intel® Core™2 processor family (Penryn).

This white paper describes how video encoders can benefit from Intel SSE4 instructions, achieving 1.6x to 3.8x speedups in integer motion vector search, a frequently used motion estimation function. Three block sizes, 4x4, 8x8, and 16x16, are used to represent some of the variations found in motion estimation and to illustrate how the code can be adapted to each of them.

2. Motion Estimation Using MPSADBW and PHMINPOSUW
Motion estimation is one of the main bottlenecks in video encoders. It searches reference frames for the best match to each block and often accounts for about 40% of the total CPU cycles consumed by an encoder. The quality of this search is one of the factors that determine the compression ratio and the quality of the encoded video, which is why the search is a frequent target of algorithmic and SIMD optimizations aimed at faster encoding.

An unoptimized version of the block matching function for the 4x4 block size is shown in Figure 2-1. The example code in this paper performs only the integer motion vector search of the motion estimation stage.

Figure 2-1. Unoptimized Version of Integer Block Matching Function
...
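The listing does not reproduce the code of Figure 2-1, so what follows is not the author's function. It is a minimal C sketch, under stated assumptions, of what an unoptimized integer full search for a 4x4 block computes, together with scalar reference models of the two SSE4.1 instructions named in the section title. MPSADBW (intrinsic `_mm_mpsadbw_epu8`) produces eight 16-bit sums of absolute differences (SADs) between one 4-byte group and eight overlapping 4-byte groups, and PHMINPOSUW (intrinsic `_mm_minpos_epu16`) returns the smallest of eight unsigned 16-bit words together with its index. The helper names (`sad4x4`, `full_search_4x4`, `mpsadbw_ref`, `phminposuw_ref`, `best_of_8_4x4`), the SAD cost, the frame layout with one shared `stride`, and the one-MPSADBW-per-row scheme are hypothetical choices for illustration; the scalar models stand in for the real instructions so the sketch compiles without SSE4.1 support.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: SAD between the 4x4 current block and one 4x4
   candidate. Both pointers address a top-left pixel in frames that share
   the same `stride`. */
static unsigned sad4x4(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sad += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Unoptimized full search (assumed form): try every integer displacement in
   [-range, range] x [-range, range] around `ref` (the co-located position)
   and keep the lowest SAD; ties keep the first candidate in raster order.
   The caller must keep the whole search window inside the reference frame. */
static unsigned full_search_4x4(const uint8_t *cur, const uint8_t *ref,
                                int stride, int range, int *best_dx, int *best_dy)
{
    assert(range >= 0 && best_dx != NULL && best_dy != NULL);
    unsigned best = ~0u;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            unsigned sad = sad4x4(cur, ref + dy * stride + dx, stride);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    return best;
}

/* Scalar model of MPSADBW: word i is the SAD between the 4 bytes
   src[4*s .. 4*s+3] and dst[4*d+i .. 4*d+i+3], where s = imm[1:0] and
   d = imm[2]. */
static void mpsadbw_ref(const uint8_t dst[16], const uint8_t src[16],
                        int imm, uint16_t out[8])
{
    int s = (imm & 3) * 4;
    int d = ((imm >> 2) & 1) * 4;
    for (int i = 0; i < 8; i++) {
        unsigned sum = 0;
        for (int j = 0; j < 4; j++)
            sum += (unsigned)abs(dst[d + i + j] - src[s + j]);
        out[i] = (uint16_t)sum;
    }
}

/* Scalar model of PHMINPOSUW: minimum of eight unsigned words in bits 15:0,
   its index in bits 18:16; ties go to the lowest index. */
static uint32_t phminposuw_ref(const uint16_t v[8])
{
    unsigned idx = 0;
    for (unsigned i = 1; i < 8; i++)
        if (v[i] < v[idx])
            idx = i;
    return ((uint32_t)idx << 16) | v[idx];
}

/* Hypothetical SSE4-style scoring of 8 horizontally adjacent 4x4 candidates:
   one MPSADBW per block row, word-wise accumulation of the 8 partial SADs,
   then one PHMINPOSUW. `ref` is the leftmost candidate; 11 bytes per row
   are read. */
static uint32_t best_of_8_4x4(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint16_t acc[8] = {0};
    for (int y = 0; y < 4; y++) {
        uint8_t window[16] = {0}, block[16] = {0};
        uint16_t row[8];
        memcpy(window, ref + y * stride, 11); /* 11 bytes cover 8 positions */
        memcpy(block, cur + y * stride, 4);   /* one row of the current block */
        mpsadbw_ref(window, block, 0, row);   /* imm = 0: both offsets are 0 */
        for (int i = 0; i < 8; i++)
            acc[i] = (uint16_t)(acc[i] + row[i]); /* at most 4 * 4 * 255 */
    }
    return phminposuw_ref(acc);
}
```

In this scheme, scoring eight candidate positions takes four MPSADBW operations, a few 16-bit adds, and one PHMINPOSUW, instead of 8 x 16 = 128 scalar absolute differences followed by a scalar compare loop. Real SSE4 code would load 16 bytes per row and call the intrinsics directly rather than the scalar models.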