Many data stream sources are prone to dramatic spikes in volume, with data items arriving in bursts. Peak load during a spike can be orders of magnitude higher than typical load, and processing every arriving data item can exceed the available memory. It then becomes necessary to shed load by dropping some fraction of the unprocessed data items during the spike. We consider the problem of load shedding for continuous sliding-window join-aggregation queries over data streams when the available system memory may be insufficient to hold the entire query state, and we model load shedding as the insertion of drop operators into the query plan. We then present a new semantic load shedding strategy. Its key idea is to partition the domain of the join attribute into sub-domains and to filter out certain input tuples based on their join values, guided by simple statistics maintained over the data streams.
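
The partitioning idea can be illustrated with a minimal Python sketch. It is not the strategy as specified in the paper: the class name `SemanticShedder`, the equal-width partitioning, the stream labels `"R"` and `"S"`, and the scoring rule (the product of the two per-stream tuple counts in a sub-domain, used as a rough estimate of join output) are all assumptions made here for illustration.

```python
from collections import Counter


class SemanticShedder:
    """Hypothetical drop operator that filters tuples by join-value sub-domain."""

    def __init__(self, lo, hi, num_parts):
        # Assumed: a numeric join attribute with domain [lo, hi),
        # split into num_parts equal-width sub-domains.
        self.lo = lo
        self.hi = hi
        self.num_parts = num_parts
        # Simple statistics: tuples seen per sub-domain, for each input stream.
        self.counts = {"R": Counter(), "S": Counter()}
        # Until plan() is called, every sub-domain is kept (no shedding).
        self.kept = set(range(num_parts))

    def part_of(self, value):
        """Map a join value to the index of its sub-domain."""
        width = (self.hi - self.lo) / self.num_parts
        idx = int((value - self.lo) / width)
        # Clamp out-of-range values into the first or last sub-domain.
        return min(max(idx, 0), self.num_parts - 1)

    def observe(self, stream, value):
        """Update the statistics with one tuple's join value."""
        self.counts[stream][self.part_of(value)] += 1

    def plan(self, keep_parts):
        """Keep the keep_parts sub-domains with the highest estimated output."""
        def score(p):
            # Assumed estimate: matching pairs grow with both counts.
            return self.counts["R"][p] * self.counts["S"][p]

        ranked = sorted(range(self.num_parts), key=score, reverse=True)
        self.kept = set(ranked[:keep_parts])

    def admit(self, value):
        """Return True if the tuple passes the drop operator."""
        return self.part_of(value) in self.kept


# Usage: gather statistics, then shed all but the two most productive sub-domains.
shedder = SemanticShedder(lo=0, hi=100, num_parts=4)
for v in [5, 10, 60, 61, 62]:
    shedder.observe("R", v)
for v in [7, 55, 58, 70, 90]:
    shedder.observe("S", v)
shedder.plan(keep_parts=2)
```

In this sketch the operator drops whole sub-domains rather than random tuples, so the tuples that survive are those most likely to find join partners on the other stream. How many sub-domains to keep would in practice be derived from the memory budget, which is left outside the sketch.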