This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing languages. The model is supported in the ISA by number of instructions to create and manage abstract concurrency. The paper estimates the cost of supporting these instructions in silicon. The model and its implementation uses dynamic parameterisation of concurrency creation, where a single instruction captures asynchronous remote function execution, mutual exclusion and the execution of a general concurrent loop structure and all associated communication. Concurrent loops may be dependent or independent, bounded or unbounded and may be nested arbitrarily. Hierarchical concurrency allows compilers to restructure and parallelise sequential code to meet the strict constraints on the model, which provide its freedom from deadlock and locality of communication. Communication is implicit in both the model and micro-architecture, due to the dynamic distribution of concurrency. The result is location-independent binary code that may execute on any number of processors. Simulation and analysis of the micro-architecture indicate that the model is a strong candidate for the exploitation of many-core processors. The results show near-linear speedup over two orders of magnitude of processor scaling, good energy efficiency and tolerance to large latencies in asynchronous operations. This is true for both independent threads as well as for reductions.