The Large Hadron Collider (LHC) at CERN, the European Organization for Nuclear Research, will produce unprecedented volumes of data when it starts operation in 2007. To provide for its computational needs, the LHC Computing Grid (LCG) is being deployed as a worldwide computational grid service, providing the middleware upon which the physics analysis for the LHC will be carried out. In 2003, the first versions of this middleware were deployed, based on the middleware produced by the European Data Grid (EDG) project. In 2004, the LCG-2 release, which consisted of the EDG middleware with some minor modifications, was deployed for use by the LHC experiments. A series of data challenges run by these experiments was the first real production use of LCG. During the data challenges, many issues and problems were exposed which had not shown up in more limited tests. The deployment, service and development teams worked closely with the experiments to understand these issues. Some of the problems were solved during the data challenges, but others revealed fundamental problems with the middleware as deployed in LCG-2. One of these fundamental problems was the performance under real load of the catalog component provided by EDG, the replica location service. To solve this problem, a new component was designed: the LCG File Catalog (LFC). The LFC moves away from the replica location service model used in previous LCG releases, towards a hierarchical namespace modelled on a UNIX file system. It also adds functionality which was missing from the earlier catalog and had been requested by the experiments. This paper presents the architecture and implementation of the LFC and evaluates it in a series of performance tests, with up to forty million entries and one hundred requesting threads from multiple clients. The results show good scalability up to the limits of these tests, and compare favourably with other grid catalog implementations.