In this paper we explore the potential of using application data dependency information to reduce the average memory consumption of distributed streaming applications. By analyzing data dependencies at runtime, we can infer which data items will not influence the application's output. This information is then incorporated into the garbage collector, extending the garbage identification problem to cover not only data items that are unreachable, but also data items that are dropped without being fully processed. We present three garbage collection algorithms, each of which uses different data dependency information. We implement the algorithms and compare their performance for a color tracker application. Our results show that these algorithms not only substantially reduce the average memory usage but also improve the overall performance of the application. The results also indicate that the garbage identification algorithms that achieve a low memory footprint make their garbage identification decisions locally, and that they perform best when they base these decisions on best-effort global information obtained from other components of the distributed application.
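As a rough illustration of the idea of collecting items that are still reachable but can no longer affect the output, the following minimal sketch assumes a timestamped buffer shared by several consumers. It is not one of the three algorithms from the paper. The `Channel` class, the `report` protocol, and the consumer names are all hypothetical, chosen only to show a local decision made from best-effort information supplied by other components.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical buffer of timestamped items shared by several consumers."""

    items: dict[int, bytes] = field(default_factory=dict)
    # Lowest timestamp each consumer says it may still request (assumed protocol).
    low_marks: dict[str, int] = field(default_factory=dict)

    def put(self, ts: int, payload: bytes) -> None:
        self.items[ts] = payload

    def report(self, consumer: str, next_needed_ts: int) -> None:
        # Best-effort: marks only move forward, so a late or stale report
        # can delay collection but cannot make it discard a needed item.
        prev = self.low_marks.get(consumer, next_needed_ts)
        self.low_marks[consumer] = max(prev, next_needed_ts)

    def collect(self) -> list[int]:
        """Drop items no known consumer can still need; return their timestamps."""
        if not self.low_marks:
            return []  # nothing is known about consumers, so keep everything
        horizon = min(self.low_marks.values())
        dead = sorted(ts for ts in self.items if ts < horizon)
        for ts in dead:
            del self.items[ts]
        return dead


if __name__ == "__main__":
    ch = Channel()
    for ts in range(5):
        ch.put(ts, b"frame")
    ch.report("tracker", 3)
    ch.report("display", 2)
    print(ch.collect())      # timestamps below the slowest consumer's mark
    print(sorted(ch.items))  # items that may still be requested
```

In this sketch the buffer decides locally what to discard, while the marks it relies on come from other components and may lag behind their true progress. That lag costs memory but not correctness, under the stated assumption that each consumer only ever moves its mark forward.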