Using a cluster of PCs, workstations, or similar machines (called nodes) to implement a database server brings two major benefits: high scalability and parallel processing capability. Before such a database server can be put into actual use, however, two problems have to be solved. The first is how to cope with data skew, which can degrade system performance significantly. The second is how to connect a node to, or disconnect it from, the database server without affecting users. One general solution to both problems is to redistribute the data. Unfortunately, this would take the data offline for a long time, and numerous applications, such as those for reservations, finance, process control, hospitals, police, and armed forces, cannot afford to have data offline for any significant amount of time. We therefore address the problem of balancing data load online, i.e., balancing data load concurrently with users' reading and writing of the database. The main contributions are an effective approach for this purpose and a comprehensive performance study of the possible alternatives.