Portable (standards-compliant) systems software is usually associated with unavoidable overhead from the standards-prescribed interface. For example, consider the POSIX Threads standard facility for using thread-specific data (TSD) to implement multithreaded code. The first TSD reference must be preceded by pthread_getspecific( ), typically implemented as a function or macro with 40-50 instructions. This paper proposes a method that uses the runtime specialization'facility of the Tempo program specializer to convert such unavoidable source code into simple memory references of one or two instructions for execution. Consequently, the source code remains standard compliant and the executed code's performance is similar to direct global variable access. Measurements show significant performance gains over a range of code sizes. A random number generator (10 lines of C) shows a speedup of 4.8 times on a SPARC and 2.2 times on a Pentium. A time converter (2,800 lines) was sped up by 14 and 22 percent, respectively, and a parallel genetic algorithm system (14,000 lines) was sped up by 13 and 5 percent.