Real-time GPU-based software beamformer designed for advanced imaging methods research

3 Author(s)
Yiu, B.Y.S. (Med. Eng. Program, Univ. of Hong Kong, Hong Kong, China); Tsang, I.K.H.; Yu, A.C.H.

High computational demand is a known technical hurdle for real-time implementation of advanced imaging methods such as synthetic aperture imaging (SAI) and plane wave imaging (PWI), which work with the pre-beamform data of each array element. In this paper, we present the development of a software beamformer for SAI and PWI with real-time parallel processing capacity. Our beamformer design comprises a pipelined group of graphics processing units (GPUs) hosted within the same computer workstation. During operation, each available GPU is assigned to perform demodulation and beamforming for one frame of pre-beamform data acquired from one transmit firing (e.g., a point firing for SAI). To facilitate parallel computation, the GPUs have been programmed to treat the calculation of depth pixels from the same image scanline as a block of processing threads that can be executed concurrently; each GPU repeats this process for all scanlines to obtain the entire frame of image data, i.e., a low-resolution image (LRI). To reduce the processing latency caused by repeated access to each GPU's global memory, we have made use of each thread block's fast shared memory (to store an entire line of pre-beamform data during demodulation), created texture memory pointers, and utilized global memory caches (to stream repeatedly used data samples during beamforming). Based on this beamformer architecture, a prototype platform has been implemented for SAI and PWI, and its LRI processing throughput has been measured for test datasets with a 40 MHz sampling rate, 32 receive channels, and imaging depths between 5 and 15 cm. When using two Fermi-class GPUs (GTX-470), our beamformer can compute LRIs of 512-by-255 pixels at over 3200 fps and 1300 fps for imaging depths of 5 cm and 15 cm, respectively. This processing throughput is roughly 3.2 times higher than that of a Tesla-class GPU (GTX-275).
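The work decomposition described in the abstract (one thread per depth pixel, one thread block per scanline, one frame per transmit firing) can be illustrated with a minimal CPU-only sketch. This is not the authors' CUDA implementation: it shows plain delay-and-sum for a plane wave firing and omits demodulation, apodization, sample interpolation, and all GPU memory optimizations. The speed of sound, the element pitch, and every function name below are assumptions made for illustration; only the sampling rate and channel count come from the abstract.

```python
import math

C = 1540.0      # assumed speed of sound in tissue, m/s (not stated in the abstract)
FS = 40e6       # sampling rate, Hz (from the abstract)
N_CH = 32       # receive channels (from the abstract)
PITCH = 0.3e-3  # assumed element pitch, m (hypothetical value)


def element_x(ch):
    """Lateral position of receive element `ch`, with the array centred at x = 0."""
    return (ch - (N_CH - 1) / 2) * PITCH


def sample_index(ch, x, z):
    """Nearest sample index for the echo from pixel (x, z) on channel `ch`.

    Assumes a 0-degree plane wave: transmit path length is z, receive path
    is the straight line from the pixel back to the element.
    """
    t = (z + math.hypot(x - element_x(ch), z)) / C
    return int(round(t * FS))


def beamform_pixel(rf, x, z):
    """Delay-and-sum for one depth pixel (the work of one GPU thread)."""
    total = 0.0
    for ch in range(N_CH):
        idx = sample_index(ch, x, z)
        if 0 <= idx < len(rf[ch]):
            total += rf[ch][idx]
    return total


def beamform_scanline(rf, x, depths):
    """All depth pixels of one scanline (the work of one thread block)."""
    return [beamform_pixel(rf, x, z) for z in depths]


def beamform_frame(rf, xs, depths):
    """All scanlines of one low-resolution image (one transmit firing)."""
    return [beamform_scanline(rf, x, depths) for x in xs]


def synth_point_target(x0, z0, n_samples):
    """Hypothetical test data: a unit impulse per channel from one point scatterer."""
    rf = [[0.0] * n_samples for _ in range(N_CH)]
    for ch in range(N_CH):
        rf[ch][sample_index(ch, x0, z0)] = 1.0
    return rf
```

As a quick check, placing a synthetic point target on a scanline and beamforming that scanline should give the largest sum at the target's own depth pixel, because that is the only pixel whose delays line up on every channel. On a GPU, the loop inside `beamform_scanline` is what would be spread across concurrent threads.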

Published in:

Ultrasonics Symposium (IUS), 2010 IEEE

Date of Conference:

11-14 Oct. 2010