Software thread integration (STI) is a compilation technique that enables efficient use of an application's fine-grain idle time on generic processors without special hardware support. With STI, a primary function is automatically interleaved with a secondary function to create a single, implicitly multithreaded function. This reduces context switching, which both improves performance and provides very fine-grain concurrency. In this work, we extend STI techniques to address two challenges. First, we reduce the response time for interrupts and other high-priority threads by introducing polling servers into integrated threads. Second, we enable integration with long host threads, expanding the domain of STI. We derive methods to evaluate thread response time in systems with and without these new integration methods. We demonstrate these concepts by integrating various threads in a sample hard-real-time system on a highly constrained microcontroller: an inexpensive 20 MHz 8-bit AVR that generates monochrome NTSC video while servicing a high-speed (115.2 kbaud) serial communication link. We have built and tested this system, achieving graphics rendering speed-ups of 3.99× to 13.5×.
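As a rough illustration of the polling-server idea, the following C sketch simulates a host loop (filling a video line buffer) with a poll of a pending serial byte placed every few steps. It is a minimal sketch under stated assumptions, not the integration method of this work: `sim_uart_t`, `poll_uart`, and `render_line_integrated` are hypothetical names, the UART is simulated rather than a real AVR register, and a run-time modulo test stands in for the static interleaving that a compiler performing STI would generate. The sketch only shows that the response latency, counted in host steps, is bounded by the polling interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UART receive flag; not a real AVR API. */
typedef struct {
    bool    byte_ready;  /* set when a simulated byte arrives */
    uint8_t data;        /* the simulated received byte */
    size_t  serviced;    /* number of bytes handled by the polling server */
} sim_uart_t;

/* Polling server: handles the pending byte, if there is one. */
static void poll_uart(sim_uart_t *u)
{
    if (u->byte_ready) {
        u->serviced++;
        u->byte_ready = false;
    }
}

/*
 * Host thread with a polling server placed every poll_interval steps.
 * A single byte "arrives" just before host step arrival_step.
 * Returns the number of host steps executed between the arrival and
 * the poll that serviced it, or -1 if the byte was never serviced.
 */
static long render_line_integrated(uint8_t *line, size_t n,
                                   size_t poll_interval,
                                   size_t arrival_step,
                                   sim_uart_t *u)
{
    long latency = -1;

    assert(poll_interval > 0);
    for (size_t i = 0; i < n; i++) {
        if (i == arrival_step) {      /* simulated asynchronous event */
            u->byte_ready = true;
            u->data = 0x55;
        }

        line[i] = (i & 1u) ? 0xFF : 0x00;   /* host work: one pixel byte */

        if ((i + 1) % poll_interval == 0) { /* integrated polling point */
            bool was_ready = u->byte_ready;
            poll_uart(u);
            if (was_ready) {
                latency = (long)(i - arrival_step) + 1;
            }
        }
    }
    return latency;
}
```

With 64 host steps and a poll every 8 steps, a byte arriving at step 0 waits 8 steps and one arriving at step 3 waits 5; with a single poll at the end of the host function, a byte arriving at step 0 waits all 64 steps. Shortening the polling interval trades host throughput for a tighter response bound.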