This paper uses a theoretical model to approximate the delay of different characteristic two primitives used in an elliptic curve scalar multiplier architecture (ECSMA) implemented on k input lookup table (LUT)-based field-programmable gate arrays. Approximations are used to determine the delay of the critical paths in the ECSMA. This is then used to theoretically estimate the optimal number of pipeline stages and the ideal placement of each stage in the ECSMA. This paper illustrates suitable scheduling for performing point addition and doubling in a pipelined data path of the ECSMA. Finally, detailed analyses, supported with experimental results, are provided to design the fastest scalar multiplier over generic curves. Experimental results for GF(2163) show that, when the ECSMA is suitably pipelined, the scalar multiplication can be performed in only 9.5 μs on a Xilinx Virtex V. Notably the design has an area which is significantly smaller than other reported high-speed designs, which is due to the better LUT utilization of the underlying field primitives.