Preparing for GTC
This week, I am pleased to introduce a guest blogger. My colleague Brooks Moses is getting ready for a conference:
Colin has graciously let me use this space to tell you about the things I’m doing to get ready for my presentation at NVIDIA’s GPU Technology Conference on May 15th. So, hello! …
Now, wait a minute, you’re probably wondering: what are we doing talking about GPUs? Isn’t this an embedded software blog? And indeed it is; the systems that I’m working on come from an odd corner of the world called High Performance Embedded Computing. The control system in an unmanned aircraft might be equivalent to a multi-server compute cluster, but it has size, weight, and power constraints just like any other embedded system. The challenge is to get as much computational performance as you can within those constraints, and when you really need a lot of compute power, the performance-per-watt numbers on a GPU start to look very attractive.
However, making GPUs work in an embedded system is really difficult on a lot of levels. In absolute terms, they draw a lot of power: about 70 W for a typical embedded chip. Our friends at GE Intelligent Platforms, Curtiss-Wright, and Mercury Computer have been building hardware that can support GPUs and keep them cool in a flight-capable system, and some of the tricks they’ve used are pretty intense. (Ask Curtiss-Wright about their air-flow-through technology!) But even when you have the hardware in place, that’s not the end of the problems: GPUs are also notoriously difficult to program if you want to get maximum performance.
That’s where we come in, with Mentor’s Sourcery VSIPL++ library. It’s a library for writing embedded high-performance signal- and image-processing applications, designed so that the user can write simple code that’s hardware-independent, and we can make their code go fast on many different hardware platforms. We’ve been working on the GPU version for several years now, but we also have versions for a couple of different CPUs, and we did a version for IBM’s Cell/B.E. processor back when that was the hot new thing. Like the hardware builders, we’ve put a lot of tricks into making this actually work and work well, and that’s part of what I’m going to be presenting. I’m trying to pack a lot of technical detail into the presentation, so people can really learn from what we’ve done and use some of the same tricks in their own code.
No presentation is complete without a good demo, though, and that’s the other part of what I’m hurrying to get finished by the 15th. We have a sample Synthetic Aperture Radar computation that we’ve been using as a benchmark for a number of years, and a brand-new NVIDIA GTX-680 GPU that arrived in our lab just last week (it’s the first card they’ve released with the new “Kepler” architecture). We’ll be presenting some benchmark numbers on how well that combination works. I’ve only run a couple of quick things so far, but the first look indicates that it’s substantially faster than the last-generation Tesla C2050 card we’ve been using.
So, if you’re in the San Jose, California area on the 15th, come by the conference and listen to the presentation, or stop by our booth. I’d be very glad to see you! And, if you can’t make it, come back here in a couple of weeks, when I’ll be posting a recap of the presentation for loyal readers.