Get packing
I have frequently observed that a key difference between embedded and desktop programming is variability: every Windows PC is essentially the same, whereas every embedded system is different. This variability has a number of implications: tools need to be more sophisticated and flexible; programmers need to be ready to accommodate the specific requirements of their system; and standard programming languages are mostly non-ideal for the job.
I have written on a number of occasions [like here] about the non-ideal nature of standard programming languages for embedded applications. A specific aspect that can give trouble is control of optimization …
Optimization is a big topic. Broadly it is a set of processes and algorithms that enable a compiler to advance from translating code from [say] C into assembly language to translating an algorithm expressed in C into a functionally identical one expressed in assembly. This is a subtle, but important difference. Again, I have explored this before.
A key aspect of optimization is memory utilization. Typically, a decision has to be made in the trade-off between having fast code or small code – it is rare to have the best of both worlds. This decision also applies to data. The way that data is stored in memory affects its access time. With a 32-bit CPU, if everything is aligned with word boundaries, access time is fast; this is termed “unpacked” data. Alternatively, if bytes of data are stored as efficiently as possible, it may take more effort to retrieve them, and hence access time is slower; this is “packed” data. So, you have much the same choice as with code: compact data that is slow to access, or a bit of wasted memory but fast access to data.
Most embedded compilers have a switch to select what kind of code generation and optimization is required. However, there may be a situation where you decide to have all your data unpacked for speed, but have certain data structures where you would rather save memory by packing. Or perhaps you pack all the data and have certain items which you want unpacked either for speed or for sharing with other software. For these situations, many embedded compilers feature two extension keywords – packed and unpacked – which override the appropriate code generation options. It is unlikely that you would use both keywords in one program, as only one of the two code generation options can be active at any one time.
Colin,
One other point worth mentioning – something that surprised me when I first observed it…
Sometimes optimizing for size can actually result in improved performance (speed) as well. I guess the “size” here is really code size rather than data size, but I’ve still seen it work that way.
A somewhat contrived example… suppose you have a switch statement with cases for ‘A’ through ‘Z’, and suppose the (dumb) compiler implements it as a series of cascaded if-else statements. (I did mention this was a “contrived” example with a “dumb” compiler, didn’t I?)
When told to optimize for (code) size, it would probably use a jump table, which would also run faster.
I’ve seen more intricate real-world examples of this effect; unfortunately I don’t have any at hand, hence my silly example. But I wanted to mention this because most of my experience and intuition says that when it comes to optimization you have to “rob Peter to pay Paul” — but that isn’t always the case.
I did a course on Embedded Software Optimization, including a case study involving Mentor tools, a couple of years back. The course is located here:
http://www.eetimes.com/electrical-engineers/education-training/courses/4000148/Fundamentals-of-Embedded-Software-Optimization
Thanks for the input Dan.
I wrote about switch statement optimization a while back: https://blogs.mentor.com/colinwalls/blog/2009/05/27/assembly-language-is-always-smallestfastest-not/
BTW, that course you referenced is not from Mentor, but is good anyway.
Hi Colin,
I guess I wasn’t clear. I didn’t mean to imply that Mentor sourced / authored the course (I did), but if you go to slide 38, the start of the case study, you’ll see it involves Nucleus & the Mentor tools (compiler, Edge Suite profiler, etc.)
Ah right. I see. I didn’t go through all the slides. Thanks for the clarification.