By Topic

Multiplierless Algorithm for Multivariate Gaussian Random Number Generation in FPGAs

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
David B. Thomas ; Department of Electrical and Electronic Engineering, Imperial College London, London, U.K. ; Wayne Luk

The multivariate Gaussian distribution is used to model random processes with distinct pair-wise correlations, such as stock prices that tend to rise and fall together. Multivariate Gaussian vectors with length n are usually produced by first generating a vector of n independent Gaussian samples, then multiplying with a correlation inducing matrix requiring O(n2) multiplications. This paper presents a method of generating vectors directly from the uniform distribution, removing the need for an expensive scalar Gaussian generator, and eliminating the need for any multipliers. The method relies only on small read-only memories and adders, and so can be implemented using only logic resources (lookup-tables and registers), saving multipliers, and block-memory resources for the numerical simulation that the multivariate generator is driving. The new method provides a ten times increase in performance (vectors/second) over the fastest existing field-programmable gate array generation method, and also provides a five times improvement in performance per resource over the most efficient existing method. Using this method, a single 400-MHz Virtex-5 FPGA can generate vectors ten times faster than an optimized implementation on a 1.2-GHz graphics processing unit, and a hundred times faster than vectorized software on a general purpose quad core 2.2-GHz processor.

Published in:

IEEE Transactions on Very Large Scale Integration (VLSI) Systems  (Volume:21 ,  Issue: 12 )