I. INTRODUCTION
The evolution of computer technology has witnessed the emergence of various architectures since the advent of first-generation computers in the 1940s, and continual efforts have been made to enhance computer performance. One popular technique used in current CPU, microprocessor, and microcontroller designs is instruction pipelining, which can significantly increase the number of instructions executed in a given time interval [1], [2]. Our design philosophy follows the RISC principle of favoring a small, simple set of instructions, each of which can be executed in roughly the same amount of time; accordingly, the architecture retains a highly simplified instruction set [3].

Pipelining is a fundamental aspect of RISC processors that mimics the workflow of a manufacturing assembly line. By processing different instructions in different stages simultaneously, the processor completes more instructions in a given span of time [4]. RISC processors lend themselves to pipelining more readily than CISC processors, whose traditional instruction cycles tend to waste CPU resources on additional services, such as reading from or writing to memory or input/output devices, that leave the CPU idle. Although pipelining does not reduce the latency of an individual instruction, it has proven highly effective at improving program throughput. As computer systems continue to evolve, technological advances such as faster circuits and improved organization, including the addition of instruction pipelines to processors, are used to achieve higher performance [5], [6], [7].

A three-stage pipelined processor operates as follows. In the first cycle, the first instruction is fetched; in the second cycle, it moves to the decode stage while the second instruction is fetched; in the third cycle, the first instruction executes while the second is decoded and the third is fetched. The first instruction thus completes after three cycles, and thereafter one instruction completes every cycle.
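To make this fill behavior concrete, the short C sketch below simulates stage occupancy for a generic three-stage (fetch, decode, execute) pipeline cycle by cycle. The stage names, instruction count, and output format are illustrative assumptions and are not tied to the specific processor presented later in this paper.

```c
/*
 * Minimal sketch of the fill behavior of a generic three-stage pipeline
 * (fetch, decode, execute). Illustrative only; the stage names and the
 * instruction count are assumptions, not details of the processor
 * described in this paper.
 */
#include <stdio.h>

#define NUM_INSTRUCTIONS 5
#define NUM_STAGES       3   /* fetch, decode, execute */

int main(void)
{
    const char *stage_name[NUM_STAGES] = { "FETCH", "DECODE", "EXECUTE" };

    /* Instruction i enters the fetch stage in cycle i and advances one
     * stage per cycle, so it leaves the execute stage in cycle i + 2.
     * After the three-cycle fill, one instruction completes every cycle. */
    int total_cycles = NUM_INSTRUCTIONS + NUM_STAGES - 1;

    for (int cycle = 1; cycle <= total_cycles; cycle++) {
        printf("cycle %d:", cycle);
        for (int s = 0; s < NUM_STAGES; s++) {
            int instr = cycle - s;  /* instruction occupying stage s this cycle */
            if (instr >= 1 && instr <= NUM_INSTRUCTIONS)
                printf("  I%d in %s", instr, stage_name[s]);
        }
        printf("\n");
    }
    return 0;
}
```

Running the sketch shows the first instruction finishing execution in cycle 3 and each subsequent instruction completing one cycle later, matching the behavior described above.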