A Survey and Taxonomy on Energy-Aware Data Management Strategies in Cloud Environment

During the past ten years, the energy consumption problem in cloud-related environments has attracted substantial attention in the research and industrial communities. Researchers have conducted many surveys on energy efficiency issues from different perspectives. These surveys can be classified into five categories: surveys on the energy efficiency of the whole cloud-related system, surveys on the energy efficiency of a certain level or component of the cloud, surveys on all energy-efficient strategies, surveys on a certain energy efficiency technique, and other energy efficiency related surveys. However, to the best of our knowledge, surveys on energy-aware data management strategies in cloud-related environments are absent. In this paper, we conduct a comprehensive survey of energy-aware data management strategies in cloud-related environments, such as data classification, data placement and data replication strategies. Compared with existing reviews on energy efficiency in cloud-related environments, we are the first to survey the energy consumption problem from the data management perspective. Furthermore, we classify the energy-aware data management strategies from different perspectives. This survey and taxonomy demonstrate the potential for reducing energy consumption at the data management level of a cloud storage system, which opens up more room for energy reduction and helps to ultimately achieve energy proportionality. Moreover, this survey and taxonomy of the energy efficiency issue from the data management perspective is an important supplement to existing surveys on energy efficiency in cloud-related environments.


I. INTRODUCTION
Energy consumption is one of the most serious problems in cloud-related environments, especially in data centers. Statistics indicate that the data volume is increasing at a rate of 50% every year, and the total data volume is predicted to reach 40ZB in 2020 [1] (as shown in Figure 1). The exponential growth of the data volume leads to an increasingly serious energy consumption problem. It has been reported in literature [2] that the energy consumed by data centers will be more than 1000TWh in 2013-2025, which will surpass the total energy consumption of Japan and Germany. In addition, the energy consumption of data centers and their cooling equipment will reach 5% of the total energy consumption of the world. Furthermore, the increasing energy consumption will produce high carbon and GHG (Greenhouse Gas) emissions [3], which will result in serious environmental pollution.
The associate editor coordinating the review of this manuscript and approving it for publication was Honghao Gao.
As shown in Figure 2, a large number of studies have focused on energy efficiency in cloud-related environments and have made remarkable achievements with respect to energy consumption. However, the growing energy consumption is still an urgent issue in the cloud computing field. Many scholars have tried to find new room for reducing energy consumption by conducting surveys on energy consumption from different perspectives. In article [117], we classified the existing surveys on energy consumption in cloud-related environments into five categories: surveys on the energy efficiency of the whole cloud-related system, surveys on the energy efficiency of a certain level or component of the cloud, surveys on all energy-efficient strategies, surveys on a certain energy efficiency technique, and other energy efficiency related surveys. According to our observations, no article has presented a survey and taxonomy of energy-aware data management strategies. Therefore, this paper conducts a survey and taxonomy of energy-aware data management strategies in cloud-related environments, which is an important supplement to the current energy efficiency surveys. Furthermore, the survey and taxonomy on improving energy efficiency through data management strategies, such as data classification, data placement and data replication, can offer more opportunities to reduce the energy consumption in cloud-related environments, especially for data-intensive applications in ever-growing data centers.

II. OUR PREVIOUS WORK AND FOCUS OF THIS SURVEY
As mentioned before, many related surveys on energy efficiency in cloud-related environments have been published over the past five years. We examined this research from the survey perspective [117] and classified all of the surveys into five categories: surveys on the energy efficiency of the whole cloud-related system [40]-[50], surveys on the energy efficiency of a certain level or component of the cloud [39], [51]-[59], surveys on all energy-efficient strategies [60]-[64], surveys on a certain energy efficiency technique [65]-[76], and other energy efficiency related surveys [77]-[80]. In addition, the different categories of surveys are summarized from five aspects: the title, survey focus, perspective, target system and publication year. From the summary and statistics on the existing surveys on the energy efficiency of cloud-related environments, we have four observations: (1) energy efficiency surveys are related to many aspects of cloud-related systems; (2) surveys on certain energy-efficient strategies in cloud-related systems have attracted the most attention; (3) research on the energy efficiency issue remains popular and is ongoing; and (4) surveys on energy-aware data management strategies in cloud-related systems are absent.
Based on our previous observations, we focus this survey on energy-aware data management strategies. The data management strategies in cloud-related environments usually include data classification strategies, data placement algorithms and data replication techniques. Therefore, we review the energy-aware data management strategies from the data classification, data placement and data replication perspectives, with the aim of providing a new perspective and more room for improving the energy efficiency of cloud-related systems.
Our goal is to discover ways to optimize energy efficiency in cloud-related environments. To find more opportunities to reduce the energy consumption in data centers, we address the lack of surveys on energy-aware data management strategies in cloud-related environments. The main contributions of this paper can be summarized as follows.
1) A survey and taxonomy of energy saving strategies that employ data classification strategies, data layout policies, and data replication techniques are conducted from different perspectives.
2) The taxonomy and summary of the energy-aware data management strategies are presented thereafter.
3) Observations are deduced from the statistics on the taxonomy and the summary of the energy-aware data management strategies.
4) Finally, future directions for reducing energy consumption are comprehensively suggested.
94280 VOLUME 8, 2020
The remainder of this paper is organized as follows. Energy saving strategies utilizing data classification, energy-aware data layout policies, and energy saving strategies utilizing data replication techniques are described in Sections III, IV and V, respectively. The summary and observations of the energy-aware data management strategies are presented in Section VI. We conclude the paper and set forth our future work in the final section, Section VII.

III. ENERGY SAVINGS STRATEGIES THROUGH DATA CLASSIFICATION
Energy-aware data classification strategies usually divide the storage system into different zones and classify the data into different classes according to certain rules. Different types of data are then stored in their corresponding zones, and energy savings are achieved by managing the power states of the different zones. T. Xie first proposed a striping-based energy-aware (SEA) strategy for data placement in RAID storage systems [81], in which the storage system is divided into a Hot Disk Zone and a Cold Disk Zone. The Hot Disk Zone stores the popular data, and the Cold Disk Zone stores the unpopular data. Disks in the Hot Disk Zone run at high transfer rates and high power consumption rates, while disks in the Cold Disk Zone run at low transfer rates and low power consumption rates. Analysis and simulation results show that the proposed SEA mechanism can noticeably reduce energy consumption with only a little performance degradation. However, only mathematical analysis and simulation experiments have been conducted to evaluate the energy efficiency of the SEA mechanism. According to an analysis of Yahoo traces, the data access patterns in a Hadoop cluster are significantly heterogeneous. R. T. Kaushik et al. designed the GreenHDFS mechanism, which classifies the data by their temperature and divides the Hadoop cluster into multiple zones (a Hot Zone and a Cold Zone). The temperature of the data changes according to its availability or performance requirements. GreenHDFS exploits the heterogeneity of the data and employs data classification techniques to place the data into the corresponding zone according to their temperature. Simulation results from three months of real traces from Yahoo show that energy consumption can be reduced by 26% solely by keeping the Cold Zone at a low power consumption rate [82]. GreenHDFS has also not been evaluated in real cloud environments in that paper, so its reported energy efficiency is not yet fully convincing. 
Similar energy-aware data classification policies for a cloud storage system named Lightning were also designed by Kaushik's team [83]. Inspired by the work of Kaushik's team, we proposed an anticipation-based green data classification strategy named AGDC, in which a neural network is employed to predict the temperature of the data. Based on the predicted temperature, the data are classified into cold data, seasonal hot data and hot data. The cloud storage system is also divided into corresponding zones. Simulation experiments based on Gridsim show that the AGDC mechanism lowered energy consumption by 16% at the expense of increasing the average response time by 0.005s. AGDC has advantages when compared with the TDCS integrated general classification algorithm [84]. However, the accuracy of the temperatures predicted by the neural network affects the energy efficiency of AGDC. In [88], the rack is divided into an Active-Zone and a Sleep-Zone, and data are stored in the corresponding zone according to the regularity and frequency of data access. Simulation results obtained from the MATLAB and Gridmix environments show that the proposed algorithm can save up to 39.01% of energy consumption. The performance degradation was not analyzed in the paper. In reference [85], an energy-efficient algorithm based on data classification for a cloud storage system is proposed by Z. Tao et al. They divided the cloud storage area into a HotZone, a ColdZone and a Reduplication Zone. The data are stored in the corresponding zone based on their repetition and activity factor characteristics. The experimental results show that the proposed algorithm improves the energy utilization rate by nearly 25%. Furthermore, the algorithm performs especially well when the system load is light. However, having three zones may induce frequent data migration, which will result in performance degradation. A dynamic data aggregation algorithm for green cloud computing is proposed in reference [86]. 
According to the data access pattern, the data and the nodes are aggregated and stored dynamically. By managing the power states of the storage nodes, the energy consumption can be reduced while QoS is taken into account. This approach suffers from the same problem as that in reference [85]. Aiming to reduce the energy consumption in cloud storage systems, Dr. Long designed static and dynamic file layout, replica and data layout policies [87]. The static file layout strategy (SFLS) first divides the data into hot files and big files according to their access frequency and service time, and the disks are correspondingly divided into different groups. The I/O requests are distributed to the different disk groups according to the access frequency and service time. The results obtained from the Cloudsim simulator demonstrate that SFLS can reduce power consumption by over 35% compared with the default HDFS. An evaluation of the energy efficiency in real cloud environments is also absent from the paper. More recently, Yadav et al. designed three adaptive energy-aware algorithms to minimize the energy consumption and to reduce SLA violations, in which real workload traces are utilized to validate their feasibility [118]. However, the testing was limited to the simulation stage. Wang et al. designed pipsCloud for the management and processing of remote sensing big data [119].
According to the above description, we summarize the taxonomy of the energy-aware data classification strategies in terms of the data classification criteria, the zone divisions, the experimental datasets, the experimental environment, the energy effectiveness and the publication year, as shown in Table 1.
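The zone-based idea shared by the strategies above can be illustrated with a minimal sketch. It is not the algorithm of any surveyed paper: the `FileStats` fields, the function name `classify_by_temperature`, and both thresholds are hypothetical choices made only to show how access statistics might map files to a hot or a cold zone.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    accesses_in_window: int       # hypothetical: accesses during the last observation window
    days_since_last_access: int   # hypothetical: age of the most recent access


def classify_by_temperature(files, min_hot_accesses=10, max_hot_idle_days=30):
    """Split files into a hot zone and a cold zone (thresholds are assumed values)."""
    zones = {"hot": [], "cold": []}
    for f in files:
        # A file counts as hot only if it is both frequently and recently accessed.
        is_hot = (f.accesses_in_window >= min_hot_accesses
                  and f.days_since_last_access <= max_hot_idle_days)
        zones["hot" if is_hot else "cold"].append(f.name)
    return zones
```

In such a scheme, the nodes or disks holding the cold list are the candidates for a low-power state; a real system would also need a policy for moving data between zones when its temperature changes.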

IV. ENERGY AWARE DATA LAYOUT POLICIES
To carry out a gear-shifting mechanism in storage systems and to achieve power proportionality, it is important to place the data in a reasonable way. H. Amur designed a robust and flexible power-proportional storage system named Rabbit [89]. Rabbit utilizes the equal-work data layout policy, which places the primary replica on the first ten nodes, the second replica on the next ten nodes, and so on. The formulated policy and its implementation in the Rabbit prototype verified its power proportionality. However, the data placement policy in Rabbit does not consider write requests when nodes are inactive. The performance costs of write access in low gear were evaluated for both Rabbit and PARAID, and the results showed that PARAID offers better performance when dealing with a frequently updated dataset [90]. Similar to Rabbit, Accordion, a data placement mechanism proposed in literature [91], uses elaborate data replication strategies to smooth gear shifting among the nodes. Substantial experiments conducted in the Hadoop DFS show that the Accordion mechanism can improve the power-proportional performance by 20% compared with the Rabbit mechanism. N. Maheshwari proposed a dynamic energy-efficient data placement and cluster reconfiguration algorithm for the MapReduce framework [92], in which nodes are turned on or off according to the current workload and the extent to which the requirements are satisfied. Data are created or deleted to improve the performance or to save power while the nodes are turned on or off. Simulation experiments on Gridsim demonstrated that the proposed algorithm can save 33% of energy consumption under the average workload and up to 54% under low workloads. However, experiments in real cloud environments are absent. A semantic data placement algorithm designed for archival-by-accident workloads is described in reference [93]. 
They divided the data into access groups according to semantic or incidental labels, including the file system placement, timestamps, the authors of a LaTeX document, the file type, etc. Grouping the data ensures that fast, consecutive accesses to the same group do not require an extra disk to spin up, which achieves power savings. Experiments on data from the California Department of Water Resources show that a 30% hit rate can result in at least 12% power savings. Similarly, R. Reddy et al. designed a data layout for a power-efficient archival storage system, in which an access-aware intelligent data layout mechanism is provided [94]. A two-tier architecture that consists of online and offline disks is designed to store the archival data on the spin-down disks. The results obtained from experiments with real-world archival traces showed that the optimized data layout algorithm can achieve power savings of up to 78% compared with the random data placement policy. Moreover, a Semi-RAID data layout policy based on a sequential data access pattern is proposed by X. Li et al. [95], in which a grouping strategy is employed. The grouping strategy keeps only part of the array active and leaves the rest of the array in standby status. The analysis and experiments show that, for a typical video surveillance application, the proposed grouping strategy can achieve power savings of up to 28%. In recent years, energy-aware data placement algorithms have attracted the attention of Chinese scholars. In 2013, Y. W. Xiao et al. integrated the data placement policy with the node scheduling strategy for energy savings [96]. They proposed a heuristic data placement policy and two node scheduling algorithms, which use a greedy algorithm to find a plan that turns on the minimum number of nodes to cover the maximum number of data blocks. 
Simulation experiments conducted on Cloudsim showed that the proposed algorithm can save energy under the constrained budget QoS requirement. Aiming at heterogeneous Hadoop clusters, a snakelike data placement mechanism (SLDP) is proposed in [97], [98]. SLDP first divides the storage nodes into virtual storage tiers (VSTs) and then circuitously places the data into the VSTs based on the hotness of the data, which enables effective power control of the nodes according to their hotness and thereby achieves energy savings. The experimental results from two real data-intensive applications demonstrate that SLDP is energy efficient, saves space, and is well suited to heterogeneous Hadoop environments. An energy consumption optimization aware data placement layout is proposed by J. Song [99], in which the data are distributed to the nodes according to their processing abilities. The analysis and experiments performed on modified Hadoop environments (LocalHadoop and Neo-Hadoop) demonstrated that the proposed data placement policy has a great advantage over the uniform Hash algorithm and has a narrow advantage over the uniform Hash algorithm with stronger fairness. PowerCass, a dynamically adaptive mechanism, is designed in literature [100], in which nodes are divided into three groups: active, dormant and sleepy. These three groups aim to respond to high, medium and low workloads, respectively. In addition, the data are dynamically distributed among the groups according to the workload situation. Experiments conducted on Apache Cassandra demonstrate that the energy savings can reach 66% compared with the unmodified Cassandra. More recently, Song et al. designed a Modulo-based data placement algorithm to optimize the energy consumption of MapReduce systems [120]. 
To reduce the wasted energy, Modulo places data with the goals of "fairness of size", "fairness of range", and "best adaptability", which are energy efficient without introducing additional costs or delaying data loading. However, the three algorithms are designed for MapReduce-related applications, which means that they may not be readily adapted to other kinds of applications. Because the default data layout schemes are not energy efficient under the new resource management and allocation framework (YARN) in the HDFS system, literature [121] proposed a new data layout scheme that exploits the heterogeneity of the computing resource characteristics. Servers are sorted into three sets (termed the high-performance set, the energy-efficient set and the inefficient set). Data blocks are placed in the high-performance set and the energy-efficient set, and the replicas are placed in the inefficient set. A comparison of experimental results shows that the new data layout scheme can significantly reduce the energy consumption at the cost of a slightly higher mean response time of the jobs. The new data layout scheme is also designed for MapReduce framework related systems and therefore also lacks scalability.
As a whole, the taxonomy of the energy-aware data placement policies is summarized in Table 2.
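Several of the layout policies above rely on keeping only a subset of nodes powered on while every data block remains reachable. The following minimal sketch shows a generic greedy set-cover heuristic for choosing such a subset. It is an illustration under stated assumptions, not the scheduling algorithm of [96] or of any other surveyed work; the function name `greedy_active_nodes` and the `replica_map` structure are hypothetical.

```python
def greedy_active_nodes(replica_map):
    """Pick nodes greedily until every block has at least one replica on an active node.

    replica_map: dict mapping a block id to the set of node ids holding a replica
    (a hypothetical input format chosen for this sketch).
    """
    # Invert the mapping: node id -> set of blocks stored on that node.
    node_blocks = {}
    for block, nodes in replica_map.items():
        for node in nodes:
            node_blocks.setdefault(node, set()).add(block)

    uncovered = set(replica_map)
    active = []
    while uncovered:
        if not node_blocks:
            raise ValueError("some block has no replica on any node")
        # Choose the node covering the most uncovered blocks;
        # iterating in sorted order makes ties resolve to the smallest node id.
        best = max(sorted(node_blocks), key=lambda n: len(node_blocks[n] & uncovered))
        gained = node_blocks[best] & uncovered
        if not gained:
            raise ValueError("some block has no replica on any node")
        active.append(best)
        uncovered -= gained
    return active
```

Nodes that are not returned are candidates for a low-power state. The greedy choice does not guarantee a minimum-size subset, and a practical scheduler would also have to respect load and QoS constraints.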

V. ENERGY SAVINGS STRATEGIES UTILIZING DATA REPLICATION TECHNIQUE
Generally, replication techniques are utilized to assure data availability, accelerate data access, improve workload balance and enhance system performance. Recently, data replication techniques have also been utilized to reduce the energy consumption in cloud-related environments. W. Lang et al. first utilized the Chained Declustering (CD) replication strategy to conduct energy management in reference [101]. In this strategy, the replicas are placed in the CD ring, which enables the system to turn off some nodes while ensuring data accessibility in light load conditions. Furthermore, they discuss and resolve the problem of load balance among the remaining active nodes. Experiments conducted on a constructed system with 1000 nodes verify the energy efficiency of the proposed method. Jacob Leverich and Kozyrakis at Stanford University take advantage of the existing replicas in the Hadoop system and design a covering subset, which contains sufficient nodes to ensure data availability. The nodes outside the covering subset can then be set to inactive status to save energy during periods of low server utilization. Experimental results obtained from a Hadoop cluster show that the fractional configuration among the nodes can save 9% to 50% of energy consumption, at the expense of slight system performance degradation. Replication as a tool to save energy in RAID systems is investigated in literature [102]. They proposed a novel approach named iRGS, which employs a replication strategy to allow gradual gear shifting among the disks according to the varying workload. The applied replication strategy sets up super RAID Groups (RDGs) and ordinary RDGs that replicate data from each other at different rates, so that in every gear the energy consumption and system performance can be traded off while data availability and users' requirements are assured. A practical power-proportional data center storage system named Sierra is proposed by E. 
Thereska et al. [103]. The goal of the replica layout in Sierra is to maintain g available copies using only g/r of the servers, where r is the total number of replicas. The replicas are placed using a power-aware grouping pattern, which relaxes the naive grouping constraint to some degree to achieve energy savings while introducing a reasonable tradeoff among energy savings, rebuild parallelism and load balancing. A distributed virtual log (DVL) is utilized to record the updates to replicas that are powered down or have failed, which ensures write consistency. To evaluate Sierra, a full prototype is implemented and tested using live traces from Hotmail servers. The experimental results demonstrated that 23% of the energy consumption can be saved. An energy-efficient replication mechanism based on node addresses was proposed by Y. Y. Liu, in which the nodes are sequentially addressed by their racks and places. The replicas are not placed randomly among the racks but on sequentially addressed nodes in the same rack. The data replicas are placed on the node with the fewest replicas until it is full. Therefore, when the load is light, the accesses are skewed toward a subset of the nodes in the cloud storage system, and powering down the remaining nodes achieves energy savings. In addition, experiments conducted on a constructed Hadoop cluster verified its energy efficiency. By leveraging data access behavior, a power-aware data replication strategy is proposed in article [105]. According to the 80/20 rule (80% of the data accesses are often served by 20% of the data), they replicate the small amount of frequently accessed data on the hot nodes and keep these nodes always in the active state. The remaining 80% of the data are placed on the cold nodes, which are in a low power state. 
An access trace generated with a Zipf-distributed data access pattern and experiments performed on a simulator consisting of 16 data storage nodes and 1 metadata server demonstrate that the designed replication strategy is energy efficient. A replica management mechanism for energy savings, in which determining the replication factor is one of three phases, was designed by S. Q. Long et al. [107]. The number of replicas influences the energy consumption, because more replicas usually mean that more storage space is occupied and more energy is consumed. The replica management strategy therefore aims to minimize the number of replicas. When the number of replicas in the cloud storage system exceeds the minimum number, the replica deletion policy is carried out according to the throughput of the data nodes holding the replicas. Experimental results obtained on the Cloudsim simulator show that the proposed algorithm has an advantage over the existing schemes in terms of energy consumption. They also proposed a replication management strategy named MORM to optimize multiple objectives (including an energy consumption index) of a cloud storage cluster. The replication factor and layout are improved by an artificial immune algorithm. The MORM mechanism tries to find near-optimal solutions that balance the trade-offs among file availability, load variance, mean service time, access latency and energy consumption. Substantial experiments on a constructed Hadoop cluster showed that the MORM mechanism outperforms the default replication management mechanism in Hadoop in terms of load balance and response time. X. L. Cui et al. dealt with the energy and fault-tolerance problems using shadow replication [109], in which the main process is associated with a suite of shadow processes. In addition, a profit-based optimization model is utilized to determine the optimal speed of the task in order to reduce energy consumption while maximizing profits. 
Experiments on a self-developed evaluation framework with three different benchmarks verified the energy efficiency of shadow replication. Based on users' access characteristics, a dynamic energy-aware replication management strategy was investigated by Z. Y. Wang et al. [110]. The main idea of the proposed strategy is to transform the users' access characteristics so that they can be used to compute the access hotness of a block. According to the integrated hotness, when the hotness of DataNode n in a Hadoop cluster falls below a certain threshold, the node is put to sleep. Then, the data on the sleepy DataNode are temporarily replicated on the emergent DataNode. Because a considerable number of nodes are asleep when the system workload is light, energy savings can be achieved. Energy efficiency and network consumption are combined in the replication mechanism in [111]. In this research, the data are replicated closer to the data consumers, which may be a promising way to minimize bandwidth usage and network delays and to save energy. The RM (Replication Management) model located in the Datacenter DB computes the update and access rates in previous intervals and predicts their future values. The replication decision is then made based on the predicted values, with the aim of saving energy and network consumption. Experiments conducted on the self-developed GreenCloud simulator demonstrate its effectiveness with respect to energy consumption and network usage. Based on the consistent hashing distribution, a power-proportional replication mechanism named GreenCHT is proposed in [12]. In this approach, the replicas are organized in virtual tiers. The first replica of an object is placed in tier 0, the second replica in tier 1, the third replica in tier 2, and so on. 
A power-mode predictive model is employed to predict the load of the next period, which determines the power mode, that is, which tiers should be active to handle the load and which tiers should be powered down to save energy. In addition, log-replicas are designed to address the write consistency problem. Traces of twelve real enterprise data center workloads collected from Microsoft Cambridge servers were used in the evaluation. The prototype implemented in Sheepdog demonstrates that the power-proportional replication mechanism can reduce the energy consumption by 31%-60% under different workloads at the expense of a 4-5 ms higher response time. However, the replication management mechanism in GreenCHT does not consider the heterogeneity of the objects, and all of the objects have the same number of replicas. In actual applications, the files in a cloud storage system are heterogeneous with respect to certain properties, especially with respect to their popularity. Hot files need more replicas to serve the requests, and cold files need fewer replicas to save energy and storage space. Exploiting this heterogeneity, an energy-aware adaptive file replication mechanism for data-intensive systems named EAFR is designed in [113], in which the number of replicas of a file is decided by its hotness. The replica selection also takes the heterogeneity of the servers into account: the server with more capacity is selected first. Moreover, the hot files are stored on hot servers that run at high power rates to achieve quick responses, and the cold data are stored on cold servers that run at low power rates to save energy. Trace-driven experiments conducted on the Palmetto Cluster of Clemson University demonstrate that more than 150kWh of energy per day can be saved in a cluster consisting of 300 servers. 
Based on and inspired by the energy-efficient replica placement strategy named Superset, which was proposed by X. Y. Luo in her PhD thesis [106], Z. L. Shi designed an energy-aware replica management strategy, which includes the replica factor decision, the replica selection strategy and the replica placement algorithm [113]. A file's hotness is computed according to its life cycle characteristics and access rate, and the number of replicas is determined by the hotness of the file, so files with different degrees of hotness have different numbers of replicas. Then, the file sets and node sets are divided according to the files' hotness. Files are organized using super sets, which ensures that different numbers of replicas are stored in the different node sets. The replica placement strategy supports the power-proportional gear-shifting mechanism while guaranteeing file availability. Experiments conducted on the Cloudsim simulator showed that the proposed management strategy can lower energy consumption by up to 16% compared with the non-power-aware replica strategy. In 2015, D. Kliazovich et al. briefly analyzed an energy-aware replication management strategy [114]. In 2016, a comprehensive survey of data replication techniques in cloud storage systems was presented in [115], which points out that energy-aware replication is one of the future directions of replication strategies. Recently, article [122] formulated the replication problem as an optimization problem and used a hybrid metaheuristic algorithm that combines the global search capability of the Particle Swarm Optimization (PSO) algorithm and the local search capability of Tabu Search (TS) to achieve high-quality solutions. Simulation experiments conducted in MATLAB indicated that the proposed method outperforms other optimization algorithms in terms of energy consumption and costs. 
However, experiments in a real cloud environment are lacking, which lowers the confidence in the proposed method. An energy-aware and adaptive fog storage mechanism that uses spatio-temporal content popularity was designed in literature [125]. In this mechanism, user data demand, energy consumption and node distance are considered when deciding whether to replicate data to a node. Testbed results verified the energy efficiency and adaptability of this approach. However, the storage, processing and network features have not been considered in the current work. Some further related works, which are not described in the body text, can be used as references [124]-[134].
Taxonomy of the energy savings strategies utilizing data replication technique is summarized in Table 3.
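The hotness-driven replica factor and the superset-style layout described for [113] can be illustrated with a minimal sketch. It is not the algorithm of [113]: the hotness formula (access rate damped by an exponential age decay), the half-life, the logarithmic mapping from hotness to replica count, and all names (`FileStats`, `hotness`, `replica_count`, `place`) are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    accesses_per_hour: float  # recent access rate
    age_days: float           # time since creation (proxy for life cycle stage)


def hotness(f: FileStats, half_life_days: float = 30.0) -> float:
    """Hypothetical hotness score: access rate damped by file age."""
    decay = 0.5 ** (f.age_days / half_life_days)
    return f.accesses_per_hour * decay


def replica_count(score: float, min_replicas: int = 1, max_replicas: int = 4) -> int:
    """Assumed mapping: cold files keep the minimum number of replicas,
    hotter files gain one extra replica per order of magnitude of hotness."""
    if score <= 1.0:
        return min_replicas
    extra = int(math.log10(score)) + 1
    return min(max_replicas, min_replicas + extra)


def place(files, num_node_sets: int = 4):
    """Replica k of a file is stored in node set k. Node set 0 therefore
    holds every file, and each higher set holds a subset of the set below
    it, so higher sets can be powered down without losing availability."""
    node_sets = [[] for _ in range(num_node_sets)]
    for f in files:
        n = replica_count(hotness(f), max_replicas=num_node_sets)
        for k in range(n):
            node_sets[k].append(f.name)
    return node_sets


if __name__ == "__main__":
    demo = [
        FileStats("cold.log", 0.5, 10),
        FileStats("warm.db", 20, 0),
        FileStats("hot.idx", 5000, 0),
    ]
    for k, names in enumerate(place(demo)):
        print(f"node set {k}: {names}")
```

In this sketch, shifting to a lower gear corresponds to powering down the highest-numbered node sets first; because node set 0 keeps one replica of every file, availability is preserved at the lowest gear, which mirrors the power-proportional behaviour the strategy aims for.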

VI. SUMMARY AND OBSERVATIONS OF THE ENERGY-AWARE DATA MANAGEMENT STRATEGIES
The above survey of energy-aware data management strategies demonstrates that it is possible to reduce energy consumption through elaborately designed data management strategies. Furthermore, energy-aware data management strategies provide more space and opportunities to reduce energy consumption in cloud-related environments, which makes them an important supplement to other energy-saving techniques, such as resource allocation, workload consolidation, VM scheduling, VM migration, VM consolidation and workload characterization. According to the above investigation, energy-aware data management strategies usually fall into three categories: data classification, data layout and data replication techniques. In addition, we have the following observations.
Observation 1: Data replication is the main technique of the energy-aware data management strategies
To the best of our knowledge, the published articles on energy-aware data management strategies fall into three categories: data classification, data layout and data replication techniques. The numbers of articles in the three categories are shown in Figure 3.
As shown in Figure 3, among the energy-aware data management strategies, data classification mechanisms account for 27%, data layout policies for 29%, and data replication techniques for 44%. The largest share of the studies therefore addresses energy-aware data replication techniques, which implies that data replication is the most important technique in the energy-aware data management field, especially for solving power-proportionality problems.
Observation 2: Energy-aware data management strategies have been comprehensively and thoroughly investigated
Since the concept of the cloud originated in 2008, energy-aware data management strategies have also aroused concern. Scholars continued investigating this field, and the number of publications reached its peak in 2015. We searched every possible keyword about energy and data management to find the related published papers. Unfortunately, we found no published papers on the topic in 2016 and 2017. More recently, five papers were published in 2018 and one paper in 2019. Accordingly, we speculate that the research on reducing energy consumption through data management strategies has matured, and that the related methods have been comprehensively and thoroughly investigated. The detailed numbers of papers published during the past ten years, shown in Figure 4, support this speculation. Innovative research methods should be explored in the future.
Observation 3: Workloads adopted when evaluating the methods usually fall into four categories
With respect to evaluating the energy efficiency of data management strategies, the workloads adopted usually fall into four categories: synthetic workloads generated according to access patterns, workloads generated from real I/O traces, real-life I/O traces, and benchmarks. The numbers of papers in the four categories are shown in Figure 5. Among these categories, synthetic workloads and benchmark-based workloads are used more often than the other two.
Observation 4: Experimental environments usually fall into four categories
To evaluate the energy efficiency of the proposed data management strategies, four kinds of experimental environments are usually used: prototype systems, simulators or simulator extensions, self-constructed cloud environments and actual business cloud environments. The numbers of papers that use each kind of experimental environment are provided in Figure 6.
As shown in Figure 6, the most common environments employed in evaluation experiments are simulators and self-constructed cloud environments. The use of self-designed prototype systems is limited because of the large amount of development work that these systems require. Likewise, well-known actual business cloud environments are used in only a small fraction of the studies because of the large amount of debugging work involved.

VII. CONCLUSION AND FUTURE DIRECTIONS
Based on our previous work and our observations on the existing surveys of energy efficiency strategies in cloud-related environments, we focus this survey on energy-aware data management strategies. Energy-aware data management strategies are classified into three categories: data classification strategies, data layout policies and data replication techniques. For every category, we investigate the main ideas, test data, experimental environments and publication years of the strategies. Finally, the observations made while reviewing the energy-aware data management strategies are presented.
There are three research directions for energy saving techniques in our future work.

Future direction 1: Reducing energy consumption through multi-level combination
As energy-saving strategies in cloud-related environments have been comprehensively investigated over the past ten years, there is little room left to reduce energy consumption at any single level of cloud-related systems. Furthermore, the individual energy-efficient strategies may conflict with each other when they are integrated into real cloud systems, which makes the designed strategies inefficient. Therefore, how to combine the energy efficiency strategies at every level into a holistic energy-efficient framework is one of the future directions.
Future direction 2: Combine data classification, data layout and data replication techniques to pursue further energy saving
Based on our investigation of the energy-aware data management strategies in cloud-related systems, single data management strategies, which utilize only data classification, data layout or data replication, have been thoroughly studied, and effective energy savings have been achieved. Combining different data management strategies into an integrated energy-efficient framework may open up more room for energy savings, which is one of the future research directions in the energy efficiency field.
Future direction 3: Combine energy-aware data management strategies with other traditional energy efficient techniques
Data management is an important supplementary technique for energy efficiency in cloud-related environments. Since both energy-aware data management and traditional energy-efficient strategies have been thoroughly investigated, combining data layout policies and data replication with energy-aware scheduling algorithms, DVFS techniques and energy gear-shifting mechanisms may be one of the future directions for further improving energy efficiency in cloud-related systems.