On-the-Fly Syntax Highlighting: Generalisation and Speed-Ups


Abstract:

On-the-fly syntax highlighting involves the rapid association of visual secondary notation with each character of a language derivation. This task has grown in importance due to the widespread use of online software development tools, which frequently display source code and rely heavily on efficient syntax highlighting mechanisms. In this context, resolvers must address three key demands: speed, accuracy, and development costs. Speed constraints are crucial for ensuring usability, providing responsive feedback for end users and minimizing system overhead. At the same time, precise syntax highlighting is essential for improving code comprehension. Achieving such accuracy, however, requires the ability to perform grammatical analysis, even in cases of varying correctness. Additionally, the development costs associated with supporting multiple programming languages pose a significant challenge. The technical challenges in balancing these three aspects explain why developers today experience significantly worse code syntax highlighting online compared to what they have locally. The current state of the art relies on leveraging programming languages’ original lexers and parsers to generate syntax highlighting oracles, which are used to train base Recurrent Neural Network models. However, questions of generalisation remain. This paper addresses this gap by extending the previous work's validation dataset to six mainstream programming languages, thus providing a more thorough evaluation. In response to limitations related to evaluation performance and training costs, this work introduces a novel Convolutional Neural Network (CNN) based model, specifically designed to mitigate these issues. Furthermore, this work addresses a previously unexplored area: the performance gains when deploying such models on GPUs.
The evaluation demonstrates that the new CNN-based implementation is significantly faster than existing state-of-the-art methods, while still delivering the same near-perfect ac...
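
To illustrate what a syntax highlighting oracle might look like, the following is a minimal sketch and not the paper's implementation. It assumes Python as the target language and uses only the standard library's `tokenize` module, so it performs lexical analysis only, whereas the abstract describes oracles built from both the original lexers and parsers. The class names (`keyword`, `identifier`, and so on) and the helpers `token_class` and `oracle_labels` are hypothetical, introduced here for illustration. The output is one highlighting class per character, which is the kind of per-character labelling a model could be trained to reproduce.

```python
import io
import keyword
import tokenize


def token_class(tok):
    """Map a token to a hypothetical highlighting class (None = leave plain)."""
    if tok.type == tokenize.NAME:
        return "keyword" if keyword.iskeyword(tok.string) else "identifier"
    if tok.type == tokenize.NUMBER:
        return "number"
    if tok.type == tokenize.STRING:
        return "string"
    if tok.type == tokenize.COMMENT:
        return "comment"
    if tok.type == tokenize.OP:
        return "operator"
    return None  # NEWLINE, INDENT, DEDENT, ENDMARKER, etc.


def oracle_labels(source):
    """Return a list with one highlighting class per character of `source`."""
    # Absolute offset of the first character of each line, split the same
    # way the tokenizer's readline splits it.
    line_starts = []
    pos = 0
    for line in io.StringIO(source):
        line_starts.append(pos)
        pos += len(line)

    labels = ["plain"] * len(source)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        cls = token_class(tok)
        if cls is None:
            continue
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        begin = line_starts[start_row - 1] + start_col
        end = line_starts[end_row - 1] + end_col
        for i in range(begin, end):
            labels[i] = cls
    return labels


if __name__ == "__main__":
    code = "x = 1  # hi\n"
    for ch, label in zip(code, oracle_labels(code)):
        print(repr(ch), label)
```

Note that this sketch relies on the tokenizer accepting its input, so unlike the models described in the abstract it does not handle code of varying correctness.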
Published in: IEEE Transactions on Software Engineering (Volume: 51, Issue: 2, February 2025)
Page(s): 355 - 370
Date of Publication: 26 November 2024
