Offline and Real-Time Deadline-Aware Scheduling and Resource Allocation Algorithms Favoring Big Data Transmission Over Cognitive CRANs

Big data is generated from various sources, such as the Internet of things, social media, databases, wearables, smart cars, and so on, and is characterized by five V’s: volume, value, variety, velocity, and veracity. Transmitting big data to secondary users (SUs) over a cognitive cloud radio access network (CRAN) offers multiple benefits and critical challenges. To address these limitations, we have designed two deadline-aware, non-preemptive algorithms that maximize the sum of weighted data transferred by the network over admission, time scheduling, spectrum, and remote radio head (RRH) allocation decisions. Each data request can have a different size, target bit error rate (BER), minimum signal-to-noise ratio (SNR) requirement, and deadline, incorporating the simultaneous provision of various types of big data and ordinary data jointly. Furthermore, our formulation considers all five V’s of big data. The first algorithm we propose is an offline batch scheduling (OFB) algorithm, which assumes that all data requests are available at the time of optimization. While this sub-optimal algorithm has a lower complexity and can be implemented in larger networks than the global optimum algorithm, it is not practical for real-time applications since it requires collecting all data requests beforehand for joint scheduling. Thus, our second one is a sub-optimal online real-time scheduling (ONR) algorithm that performs admission and resource allocation on-the-fly using predictions of upcoming data requests and future availability of spectrum channels. After deriving these two algorithms, we conduct a thorough performance analysis and derive bounds on their objective values compared to the global optimum. We then demonstrate their effectiveness in achieving higher weighted sums of transferred data and prioritizing SUs with big data requests over existing alternatives through extensive numerical comparisons.


I. INTRODUCTION
We are in the big data era. In the past decade, an immense amount of new data generated by the proliferation of smart mobile phones, the internet of things, wireless smart meters, and cloud computing has led to wireless big data [1], [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Quansheng Guan.
Data generation rates are neither decreasing nor stable [3]. On the contrary, wireless networks are expected to face significant growth in wireless big data due to emerging services such as the internet of everything (IoE) and holographic telepresence. According to the International Telecommunication Union Radiocommunication Sector (ITU-R) [4], total mobile data traffic is expected to experience 77-fold growth in ten years, increasing from 57 exabytes ($10^{18}$ bytes) per month in 2020 to 4394 exabytes per month in 2030. This estimate is supported by [5], which reports that total mobile traffic reached 59 exabytes per month at the end of 2020. In fact, the third quarter of 2021 witnessed a data generation rate of 80 exabytes per month. It is predicted that each subscriber will demand and/or generate almost 257.1 gigabytes of data traffic per month by 2030 [4].
Big data refers to large, complex datasets that are difficult to process and analyze using traditional methods. Some of the main categories of big data include: (i) high-resolution audio and video streaming, (ii) data generated by social networking websites such as Instagram, Facebook, Twitter, and Flickr, (iii) mobile TV, (iv) real-time gaming and control, (v) high-speed downloading, and (vi) online remote monitoring.
These categories are expected to continue growing in the coming years, and they are characterized by five main features, often referred to as the "5 V's of big data" [6], [7]:
1) Volume: Big data sets are typically massive, ranging from hundreds of gigabytes to petabytes in size.
2) Velocity: Big data must be transmitted quickly to meet the time-sensitive needs of various applications.
3) Variety: Big data comes in many different forms, from structured data in databases to unstructured data in social media feeds.
4) Value: Big data has significant priority and usefulness with the potential to create value for businesses and organizations, but it must be properly analyzed and interpreted.
5) Veracity: Big data quality can be compromised by errors, inconsistencies, or biases, so it is important to ensure data accuracy and reliability.
A fundamental challenge is to support these big data characteristics in future wireless networks. In fact, they impose tremendous technical burdens on designing efficient networks [8]. Traditional networks are inadequate for dealing with big data.
The traditional cellular network, also known as the radio access network (RAN), consists of numerous standalone base stations (BSs). Each BS covers a limited geographical area, and multiple BSs work together to provide seamless network coverage. Each BS is responsible for processing and transmitting its own signal to and from the mobile device, and for forwarding data between the mobile device and the core network through the backhaul. However, the current RAN architecture has some drawbacks. Each BS has its own cooling system, backhaul transportation, backup battery, and monitoring system, which can be costly to build and maintain. Moreover, due to limited spectral resources, network operators "reuse" the frequency among different base stations, which can lead to interference between neighboring cells and affect network performance.
To address the challenges posed by big data, new technologies such as cloud radio access network (CRAN) [9] offer a flexible and promising infrastructure. The CRAN comprises three main components: a centralized pool of baseband processing units (BBUs), distributed remote radio heads (RRHs), and high-bandwidth, low-latency wired or wireless fronthaul links that connect the BBU pool and RRHs. In contrast to traditional base stations, the BBU is separated from its corresponding RRH, providing an efficient structure for cloud-based resource sharing.
The CRAN architecture has several distinct characteristics that set it apart from other cellular architectures. First, it promotes large-scale centralized deployment by enabling many RRHs to connect to a centralized BBU pool. Second, it supports collaborative radio technologies, allowing any BBU to communicate with any other BBU within the BBU pool with high bandwidth and low latency. Finally, it provides real-time virtualization capability, which ensures that resources in the pool can be dynamically allocated to base station software stacks, such as 4G/3G/2G function modules from different vendors, based on network load. This paper tackles the significant challenges of downlink big data transmission for unlicensed or secondary users (SUs) in cognitive CRANs. The aim is to maximize the total sum weighted transferred data while taking into account the five V characteristics of big data. To achieve this, the paper simultaneously optimizes SU selection, the association of remote radio heads (RRHs) with selected SUs, allocation of temporarily available spectrum, deadline-aware nonpreemptive time scheduling, and adaptive modulation to account for time-varying channels between each SU and the connected RRHs. This is a complex and challenging problem, involving a high-dimensional mixed continuous and integer program of highly non-convex nature.
Before summarizing the contributions of this paper, we review the prior art on this topic.

A. RELATED WORKS
Given our design focus, we classify the relevant literature into four categories: Big data transmission, RRH and spectrum allocation, user selection, and time scheduling.

1) BIG DATA TRANSMISSION
Reference [10] focuses on utilizing big data for machine learning applications that require large amounts of data. To reduce the transmission of wasteful data that does not significantly impact the learning algorithm's performance, they combine edge and cloud computing. This approach involves caching selected data content on various RRHs and BBU pools, which is determined based on predictions of the demanded data's content.
In [11], the big data transmission problem in a wireless network is addressed, taking into account link capacity constraints, current loads of links, requested data sizes, and network delay limits. The goal is to optimize service/waiting time and throughput of the network. To achieve this, a new centralized algorithm is designed to carry out routing and scheduling simultaneously.
In multimedia big data wireless services, meeting deterministic constraints on service delay is challenging, especially when bandwidth and transmit power are constrained.
To tackle this issue, reference [12] substitutes the deterministic delay constraint with a statistical one for software-defined radios over 5G networks. They solve the optimization problem over routing, cache placement, and power allocation decisions and demonstrate that three techniques should be jointly utilized. Specifically, (i) network function virtualization is exploited to find optimal data transmission paths, (ii) information-centric network concept derives optimal caching locations for big data, and (iii) software-defined networks (SDN) help allocate resources dynamically.
Overall, these references propose innovative approaches to address the challenges of big data transmission in wireless networks. By utilizing edge and cloud computing, designing centralized algorithms for routing and scheduling, and leveraging techniques such as network function virtualization, information-centric networking, and SDN, these approaches aim to optimize performance and throughput while reducing wasteful data transmission and meeting service delay constraints.
A variety of techniques have been proposed to address the challenges associated with transmitting big data wirelessly. For instance, Terahertz transmission has been suggested in [13] as a way to communicate big data between autonomous vehicles, thereby increasing network capacity due to its tremendous bandwidth. In [14], the authors study multiple parts of a wireless network infrastructure to efficiently transmit geographically distributed big data to data centers, including servers inside a data center, different data centers, backbone, and access networks.
Big data transmission has also been investigated under different wireless network architectures such as CRANs, SDNs, 5G, wireless sensor networks, D2D communication, and 6G integrated space-air-ground networks, as discussed in [2], [15], [16], [17], [18], [19], and [20], respectively. Reference [21] introduces a cooperative cache-based strategy on ground stations to reduce the load on satellite links and their latency. To ensure the confidentiality of big data transmission while sharing tasks between graphic processing units across various ground stations, compression techniques were adopted.
Moreover, transfer control protocol (TCP) with simultaneous data transmission in multiple paths is introduced as a promising transport layer protocol for big data applications in [22]. This approach offers improved reliability and throughput over traditional TCP, making it a suitable candidate for large-scale big-data transmission.
Reference [23] treats video traffic as the dominating real-time big data application, and designs a new scheduling policy for packet transmission such that more users are simultaneously served without degrading current users' experiences. This algorithm offers a guaranteed improvement in the total number of served users. This achievement is a result of the proper assignment of big data requests and the corresponding bandwidth on each server on a small time scale. The problem of deadline-aware bandwidth allocation is investigated in a wired setup by [24], where both an offline batch scheduling algorithm and online dynamic scheduling were derived to ensure acceptance of a maximum number of big data requests. Upon solving the posed optimization problem, admission and scheduling decisions, data rates, and path selection for every admitted request are determined. The allocated bandwidth may be varied in an adaptive fashion at any time during big data transmission. Contrary to [24], we consider a scenario where both big data and non-big data requests arrive simultaneously and we aim to assign a larger priority to big data requests. Furthermore, our model is wireless instead of wired. Finally, we strive to maximize the weighted sum of transferred data instead of the number of served users.

2) RRH AND SPECTRUM ALLOCATION
Reference [25] jointly assigns RRHs and allocates virtual machines (VMs) to minimize the total delay, including the task execution time on the BBU pool and the transmission delay to the corresponding RRH cluster, over programmable hierarchical CRANs. Energy consumption for CRANs is minimized in [26], [27], [28], and [29], where RRH selection is considered. The BBU pool performs joint RRH selection, RRH-user association, transmit beamforming, and VM allocation in [26] over CRANs with limited fronthaul capacity. A new model of energy usage for the BBU pool is derived in [27] using empirical data collected from a programmable CRAN testbed. Once the model is fixed, power-bandwidth assignment and active VM selection are carried out. The goal of the power-bandwidth assignment is to meet the quality of service (QoS) for users, while VM assignment is performed to minimize energy usage. A heuristic green energy-aware RRH selection algorithm is derived for coordinated multi-point (CoMP) communication over CRANs in [28]. In [30], an RRH clustering algorithm is proposed to jointly perform load balancing and maximize coverage range in the CRAN. RRH clusters are formed by mapping as many RRHs with different traffic as possible to each BBU while minimizing the number of active BBUs. Furthermore, the optimal spectrum assignment problem is solved in each cluster by a genetic algorithm to maximize the communicated traffic load subject to the overall energy consumption. The RRH-BBU mapping is also studied in [31] and [32]. A traffic anticipation model is leveraged in [31] to assign certain RRHs to every BBU. In addition to this offline approach, a real-time BBU-RRH mapping is also derived to provide load balancing while maintaining QoS upon the arrival of every data request. Joint user association and RRH-BBU mapping subject to QoS constraints are carried out in [32].
An orthogonal frequency-division multiple access (OFDMA) based CRAN is used for downlink data transmission in [33], where the weighted sum rate is maximized in two successive steps. First, RRHs, spectrum, and users are allocated given a fixed transmission power. Then, transmit power is optimized for the given spectrum, RRHs, and users. Reference [34] enhances sum capacity by jointly assigning time-frequency resources and RRHs, where RRH cooperation, i.e., CoMP, is assumed over the CRAN. Spectrum trading between network and service providers in a virtual CRAN is investigated in [35]. The virtual CRAN comprises a set of separate RRH-BBUs, but the BBUs are assumed to be integrated into one BBU pool. Full-duplex CRANs are examined in [36] and [37], where RRH selection is carried out.

3) USER SELECTION
The concept of user selection in wireless communications involves selecting the users with the best channel quality at any given time, so that system resources are allocated to those who can best exploit them. This approach leads to improved system capacity and performance. While this concept has been around for a long time, machine learning has recently been deployed to achieve it. For example, in [38], power allocation using deep unsupervised learning is performed first, followed by user selection.
In addition to this, several studies have been conducted on user selection in CRANs. One such study [39] focuses on maximizing the weighted sum rate in CRAN by jointly selecting users and their corresponding beamforming vectors. To achieve this, the study finds the maximal independent sets in the user selection graph while optimizing the beamforming vectors for every possible user to multi-antenna RRH assignments. Similarly, user selection has also been performed in conjunction with RRH and spectrum allocation [33]. Additionally, another study [40] performs user selection to minimize network power consumption in full duplex CRANs while meeting QoS requirements.

4) TIME SCHEDULING
Reference [41] performs time and power allocation when users' requests arrive in real-time and must be served within a specific deadline and signal-to-interference plus noise ratio (SINR). This work strives to maximize power efficiency while minimizing per-processor power consumption. Thus, it formulates a maximization problem with a weighted sum of power efficiency and processors power consumption. Optimization parameters are power allocation and processor scheduling in CRANs. A maximum transmission time minimization with constraints on spectrum and power resources and tolerable delay is studied in [42] where VMs are optimally allocated. A real-time BBU and RRH assignment is considered in [43]. The backhaul design problem of the CRAN is formulated in [44] to maximize a weighted sum of energy and spectral efficiency by a joint allocation of power and time slots for RRHs.

B. OUR CONTRIBUTION
Our proposed approach addresses several challenges in wireless big data transmission that have not been jointly investigated in existing literature. Specifically, we optimize jointly over SU selection, RRHs association to selected SUs, allocation of temporarily available spectrum, deadline-aware non-preemptive time scheduling, and adaptive modulation to maximize the weighted sum of transferred data. In addition, we take into account the 5 V features of wireless big data in our optimization problem. To address the Volume feature, we include data size in the objective function. To address the Value feature, we assign different priorities to each data request. To address the Velocity, Variety, and Value features, we consider different hard deadlines for the completion of data delivery to various users. Finally, to address the Veracity and Variety features, we use minimum signal-to-noise ratio (SNR) and a target bit error rate. By considering these factors in our optimization problem, we can better allocate resources and improve the efficiency of wireless big data transmission.
Wireless big data transmission requires a significant amount of bandwidth, which makes unlicensed spectrum allocation particularly challenging. The spectrum crunch is caused by both primary user activity and spectrum scarcity. To address this issue, we propose an offline, non-preemptive scheduling algorithm that assumes all requests are collected by the BBU pool and then jointly scheduled. However, real-time user admission and online resource allocation are also needed. Therefore, we leverage predictions of possible upcoming data requests and spectrum availability to make decisions on the fly. For example, by analyzing data request history, we can reserve resources for SUs that have a higher impact on the objective function. In addition, we constrain the maximum number of SUs that an RRH can serve based on the energy supply of each RRH. Finally, we use adaptive modulation to adjust the transmission rate due to the variable nature of RRH-SU links. To summarize, we face several challenges in wireless big data transmission, and we propose remedies for each of these challenges, which are summarized in Table 1.
Before delving into our proposed algorithms, it is important to highlight the main differences between our work and the algorithm proposed in [45]. While our system model and optimization problem remain the same, our contribution lies in the development of new algorithms. Reference [45] presents a dynamic programming algorithm that achieves global optimality but suffers from high complexity, making it suitable only for small networks. Additionally, it is an offline algorithm that requires all upcoming data requests to be collected before scheduling, causing unacceptable delays for real-time applications.
In contrast, we present two new algorithms: an offline sub-optimal algorithm with lower complexity that can handle larger networks than [45], and an online algorithm that can schedule new requests as they arrive without any delay. Our main novelty lies in these new algorithms, rather than the system model. However, there are two minor differences in our system model compared to [45]. Firstly, we consider a frequency-selective fading model, whereas [45] assumes frequency-flat fading across all subcarriers. Secondly, our proposed algorithms are non-preemptive, meaning they can stop serving a low-priority user midway to serve a higher-priority user, while [45] uses a preemptive algorithm that serves each admitted user completely before serving the next.
Regarding the objective function, [45] considers the weighted sum of data transferred divided by the largest service time of admitted users. In this work, we focus solely on the weighted sum of data transferred, as omitting the largest service time from the objective function leads to more favorable, low-complexity, and real-time algorithms. Nevertheless, our objective function still emphasizes serving big data requests in two ways. Firstly, big data requests lead to a large increase in transferred data, which explicitly appears in the objective. Secondly, we can assign higher priority weights to big data requests to ensure they are served first.
Given the points mentioned above, the contributions of our work can be summarized as follows.
• We propose an objective function that prioritizes big data requests while still accommodating ordinary data requests. Our objective function maximizes the weighted sum of transferred data over decisions involving SU selection, SU-RRH associations, channel allocations, and deadline-aware time scheduling, subject to minimum SNR and maximum target bit error rate (BER) constraints. We have incorporated all five characteristics of big data in our optimization problem as follows:
1) Volume: The objective function directly encourages larger volumes of data.
2) Velocity: We have incorporated velocity by considering the deadline parameter of each data request.
3) Variety: We have modeled variety by allowing for different types of data demands with varying BER, SNR requirements, and deadlines.
4) Value: The priority factor assigned to each user in the objective function captures the value aspect of big data.
5) Veracity: The priority factor, target BER, and deadline for each data request are used to incorporate veracity.
To the best of our knowledge, the proposed objective function has not been previously reported in the big data literature. Furthermore, the set of parameters we optimize over is novel and not covered in the literature except for [45], whose major differences from the current work were elaborated above.
• We present two algorithms to solve the optimization problem in different scenarios. Firstly, assuming all data requests arrive before running the scheduling algorithm, we propose an offline batch method to sub-optimally solve the problem. Secondly, we consider a scenario where real-time decisions are made on admission and resource allocation upon the arrival of every data request. In this case, we propose an online algorithm that takes advantage of probabilistic predictions of upcoming data requests and the availability of channels. Our online algorithm is designed to adapt to any prediction method, regardless of its quality.
• We rigorously analyze the performance of both the batch and real-time algorithms, and derive a bound on their performance compared to the globally optimal solution. We also evaluate the complexities of both algorithms.
To further validate our proposed algorithms, we conduct extensive simulations and compare their performance to existing alternatives using various metrics. Our simulation results demonstrate the superior performance of our proposed algorithms over existing alternatives.

C. PAPER ORGANIZATION
The rest of this paper is organized as follows. Section II introduces the system model. Section III poses the optimization problem. Section IV presents the proposed offline batch (OFB) scheduling algorithm, while the online real-time (ONR) scheduling algorithm is derived in Section V. Section VI carries out a rigorous analysis of our two proposed algorithms in terms of both performance and complexity. Simulation results are illustrated in Section VII and conclusions are drawn in Section VIII. All the acronyms used in the paper are listed in Table 2. Table 3 presents the notation used in the following sections.
VOLUME 11, 2023

II. SYSTEM MODEL
Our network is composed of a macro cell, which is overlaid with the cognitive CRAN architecture based on a set $\mathcal{R}$ of small-cell RRHs. Macro and small cells are deployed to serve licensed primary users (PUs) and unlicensed SUs belonging to set $\mathcal{U}$, respectively. This model uses a mutually synchronized time slot structure for PUs and SUs, in which time slot $t$ spans the time interval $[(t-1)\Delta t,\ t\Delta t)$. The value of $\Delta t$ generally depends on the subcarrier spacing; for example, in the IEEE 802.11 family, $\Delta t = 9$ µs [46], and the recommended $\Delta t$ for 5G is reported in [47]. Time is divided into periods, where each period comprises many time slots. Except in our online algorithm, data requests in each period are assumed to be scheduled independently, and thus the proposed optimization is carried out independently for each period. To serve selected SUs, RRHs are distributed in the service area and connected to the BBU pool via high-speed, low-latency ideal fronthaul links [48]. The maximum number of SUs that RRH $r \in \mathcal{R}$ can serve in time slot $t$ of period $n$ is a given parameter indexed by $r$, $n$, and $t$. The BBU pool has perfect knowledge of path loss and shadowing between every SU and all RRHs. However, it has access only to the statistics of small-scale fading. It should be mentioned that once scheduling is completed, every RRH estimates the channel to its assigned SUs with almost perfect accuracy in order to carry out the needed precoding. However, only statistical knowledge is utilized for our scheduling algorithms.
The available spectrum in the network is divided into $S$ equal channels, each with bandwidth $\Delta f$. Channel $s \in \{1, \ldots, S\}$ is denoted by $f_s$. The unused channels are available for SUs and are arranged in the spectrum pool [49], [50], where set $\mathcal{S}^{n,t}$ denotes the channels available in time slot $t$ of period $n$. To ensure tractability of the problem formulation, we utilize an orthogonal multiple access scheme, where only one SU or PU can use a particular frequency band $s \in \{1, 2, \ldots, S\}$ in each time slot. Every SU $u$ may request a different type of data with different QoS requirements. We model the QoS requirements by the target bit error rate $\mathrm{BER}_u^{\mathrm{tar}}$, the minimum satisfactory SNR $\gamma_u$, the priority of the SU $\alpha_u$, and the deadline $T_u^n$ for receiving the whole requested data in period $n$. We suppose SU $u$ requests data of length $L_u \times L$, where $L_u \in \mathbb{N}$ is the number of data frames and $L$ is the standard frame size. The frame size is about 1500 bytes for Ethernet II and IEEE 802.3, or 2304 bytes for WLAN, and may be higher for extended versions [46]. This user can start to receive data from time slot $t_u^n$ of period $n$. Subsequently, we describe every user's QoS demand with a 6-tuple: $(\mathrm{BER}_u^{\mathrm{tar}},\ L_u \times L,\ t_u^n,\ T_u^n,\ \gamma_u,\ \alpha_u)$. Let $\mathcal{R}_u^{n,t}$ and $\mathcal{S}_u^{n,t}$ denote the RRHs and channels allocated to SU $u$ in time slot $t$ of period $n$. Also, define $\mathcal{R}_u^n := \cup_t \mathcal{R}_u^{n,t}$ and $\mathcal{S}_u^n := \cup_t \mathcal{S}_u^{n,t}$ as the resources allocated to SU $u$ in period $n$. To ensure fairness in resource allocation, the maximum number of channels allocated to each SU is limited to $s_{\max}$. Moreover, the set of time slots in which $u$ receives service in period $n$ is denoted by $\mathcal{T}_u^n := \{t \mid \mathcal{R}_u^{n,t} \neq \emptyset \wedge \mathcal{S}_u^{n,t} \neq \emptyset\}$. $\mathcal{T}_u^n$ may comprise several separate time slots due to the unavailability of spectrum for unlicensed users in certain time slots.
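The request and allocation notation above can be mirrored in a small data model. The following Python sketch is only illustrative: the class and field names (`DataRequest`, `Allocation`, `service_slots`, and so on) are hypothetical and do not come from the paper, and the 1500-byte default frame size is simply the Ethernet figure quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataRequest:
    """QoS 6-tuple of one SU request in a period (field names are illustrative)."""
    su_id: int
    ber_target: float    # target bit error rate
    num_frames: int      # L_u, number of requested frames
    release_slot: int    # t_u^n, first slot in which service may start
    deadline_slot: int   # T_u^n, slot by which all data must be delivered
    min_snr: float       # gamma_u, minimum satisfactory SNR (linear scale)
    priority: float      # alpha_u, priority weight

    def size_bits(self, frame_bytes: int = 1500) -> int:
        """Requested data length L_u x L in bits for an assumed frame size."""
        return self.num_frames * frame_bytes * 8


@dataclass
class Allocation:
    """Per-slot RRH and channel sets granted to one SU in a period."""
    rrhs: dict[int, set[int]] = field(default_factory=dict)      # slot -> RRH ids
    channels: dict[int, set[int]] = field(default_factory=dict)  # slot -> channel ids

    def service_slots(self) -> set[int]:
        """Slots in which both an RRH and a channel are assigned."""
        return {t for t in self.rrhs if self.rrhs[t] and self.channels.get(t)}


req = DataRequest(su_id=1, ber_target=1e-3, num_frames=4,
                  release_slot=2, deadline_slot=10, min_snr=5.0, priority=2.0)
alloc = Allocation(rrhs={3: {0}, 4: set(), 6: {0, 1}},
                   channels={3: {2}, 4: {1}, 6: {2, 5}})
print(req.size_bits(), sorted(alloc.service_slots()))
```

In this toy instance slot 4 has a channel but no RRH, so it is not a service slot; the service set is therefore non-contiguous, as the text allows.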
For simplicity, we assume all RRHs and SUs have a single antenna. Let us denote the small-scale fading between RRH $r$ and SU $u$ in frequency band $s$ by $h_{r,u}^s \in \mathbb{C}$. Furthermore, we represent the combined effects of transmit and receive antenna gains as well as path loss and shadowing by $d_{r,u}^s \in \mathbb{R}^+$. The instantaneous SNR at the receiver of SU $u \in \mathcal{U}$ when associated with RRH $r$ and channel $s$, $\gamma_{r,u}^s$, is given in (1). In (1), $P_{r,u}$ is the transmit power of RRH $r$ to SU $u$, $\sigma^2$ denotes the background noise power spectral density, and the SNR gap represents the mismatch between theoretical and practical SNR values for achieving a given information rate [51], [52]. Assuming adaptive modulation and coding (AMC) is utilized for each SU and maximum ratio beamforming is performed by the RRHs associated with each SU, the approximated spectral efficiency for user $u$ at frequency $s$ in period $n$ and time slot $t$ is defined in (2), where $\mathbf{h}$ denotes the set of all small-scale fading coefficients $h_{r,u}^s$. It should be mentioned that we also use an indicator function $I_{\mathcal{X}}(x)$, which returns 1 if $x \in \mathcal{X}$ is true, and 0 otherwise. Subsequently, the number of bits communicated to user $u$ at time slot $t$ of period $n$ is given by (3).
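To illustrate how the quantities in this section chain together (SNR, spectral efficiency, bits per slot), the Python sketch below uses an assumed textbook form rather than the paper's exact expressions: a single RRH-SU link, an SNR scaled down by the SNR gap, a Shannon-type $\log_2(1 + \mathrm{SNR})$ efficiency that is zeroed when the minimum SNR is not met, and a bit count equal to efficiency times bandwidth times slot length. All function and parameter names are hypothetical.

```python
import math


def snr_with_gap(tx_power, gain, fading, noise_psd, bandwidth_hz, snr_gap):
    """Link SNR divided by the SNR gap (assumed form, single RRH-SU link)."""
    noise_power = noise_psd * bandwidth_hz
    return tx_power * gain * abs(fading) ** 2 / (snr_gap * noise_power)


def spectral_efficiency(snr, min_snr):
    """Bits/s/Hz; zero when the minimum-SNR requirement is not met (assumption)."""
    return math.log2(1.0 + snr) if snr >= min_snr else 0.0


def bits_per_slot(efficiencies, bandwidth_hz, slot_s):
    """Bits delivered in one slot, summed over the allocated channels."""
    return sum(efficiencies) * bandwidth_hz * slot_s


# Toy, dimensionless numbers chosen only to make the arithmetic easy to follow.
snr = snr_with_gap(tx_power=15.0, gain=1.0, fading=1 + 0j,
                   noise_psd=0.5, bandwidth_hz=1.0, snr_gap=2.0)
eff = spectral_efficiency(snr, min_snr=5.0)
print(snr, eff, bits_per_slot([eff, eff], bandwidth_hz=2.0e5, slot_s=1e-3))
```

With multiple cooperating RRHs the combined SNR would depend on the beamforming model, which this sketch deliberately leaves out.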

III. PROBLEM FORMULATION
Our optimization problem simultaneously performs SU admission as well as assignment of RRHs, channels, and time slots to admitted SUs so that their QoS demands are satisfied.
To rigorously define our optimization problem, we need to introduce the concept of a disjunctive set of SUs. In period $n$, a subset of $\mathcal{U}$ is a disjunctive set of SUs if its members can be served simultaneously in that single period with the available resources. Thus, any disjunctive set of SUs should satisfy constraints (4a)-(4f) for a given set of $\mathcal{R}_u^{n,t}$, $\mathcal{S}_u^{n,t}$, and $\mathcal{T}_u^n$. Constraint (4a) ensures that the service time slots for SU $u$ all fall in the acceptable integer interval $[t_u^n, T_u^n]$. Constraint (4b) ensures that the resources allocated to SU $u$ are sufficient for communicating all its requested data bits. Constraint (4c) enforces orthogonal frequency allocation among SUs, while (4d) guarantees that only spectrum bands unused by PUs are allocated to SUs. Constraint (4e) limits the number of channels allocated to SU $u$ to $s_{\max}$. Finally, (4f) ensures that no RRH exceeds its service capacity. We use the symbol $\subseteq_D$ to denote a disjunctive subset.
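The six conditions described above can be read as a feasibility test on a candidate allocation. The following Python sketch is one possible encoding under stated assumptions: the container layouts, the `bits` callback, and the name `is_disjunctive` are hypothetical, and the channel limit in (4e) is applied to the union of channels an SU uses over the whole period.

```python
from collections import Counter


def is_disjunctive(requests, alloc, bits, free_channels, s_max, rrh_cap):
    """Check the conditions described for (4a)-(4f) on a candidate SU set.

    requests: su -> (release_slot, deadline_slot, required_bits)
    alloc: su -> {slot: (rrh_set, channel_set)}
    bits: callable (su, slot, rrh_set, channel_set) -> bits delivered
    free_channels: slot -> channels left unused by PUs
    rrh_cap: (rrh, slot) -> maximum number of SUs the RRH can serve
    """
    channel_use = Counter()  # (slot, channel) -> number of SUs using it
    rrh_load = Counter()     # (rrh, slot) -> number of SUs served
    for su, (release, deadline, required) in requests.items():
        delivered = 0.0
        used_channels = set()
        for slot, (rrhs, chans) in alloc.get(su, {}).items():
            if not rrhs or not chans:
                continue                  # not a service slot
            if not release <= slot <= deadline:
                return False              # (4a) outside the allowed interval
            if not chans <= free_channels.get(slot, set()):
                return False              # (4d) channel occupied by a PU
            delivered += bits(su, slot, rrhs, chans)
            used_channels |= chans
            channel_use.update((slot, c) for c in chans)
            rrh_load.update((r, slot) for r in rrhs)
        if delivered < required:
            return False                  # (4b) request not fully served
        if len(used_channels) > s_max:
            return False                  # (4e) too many channels for one SU
    if any(n > 1 for n in channel_use.values()):
        return False                      # (4c) a channel is shared in a slot
    return all(n <= rrh_cap.get(k, 0) for k, n in rrh_load.items())  # (4f)


requests = {1: (1, 3, 100), 2: (2, 4, 100)}
free = {1: {0, 1}, 2: {0, 1}, 3: {0, 1}, 4: {0}}
cap = {(r, t): 1 for r in (0, 1) for t in range(1, 5)}
rate = lambda su, slot, rrhs, chans: 60 * len(chans)  # toy bits per channel
good = {1: {1: ({0}, {0}), 2: ({0}, {0})}, 2: {2: ({1}, {1}), 3: ({1}, {1})}}
clash = {1: {1: ({0}, {0}), 2: ({0}, {0})}, 2: {2: ({1}, {0}), 3: ({1}, {1})}}
print(is_disjunctive(requests, good, rate, free, 2, cap),
      is_disjunctive(requests, clash, rate, free, 2, cap))
```

In the second allocation both SUs use channel 0 in slot 2, so the orthogonality condition fails even though every other condition holds.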
Our optimization goal is to find a disjunctive set of SUs and their corresponding resource allocation and schedules such that the sum of weighted transferred data is maximized in a given period, as stated in (5). The QoS of the admitted SUs is guaranteed because we optimize over disjunctive sets only. In period $n$, the optimal disjunctive set of selected SUs is denoted by $\mathcal{U}_n^*$, and the corresponding optimal allocated resources are denoted by $\mathcal{R}_{u^*}^n$, $\mathcal{S}_{u^*}^n$, and $\mathcal{T}_{u^*}^n$ for $u^* \in \mathcal{U}_n^*$. When the available time, spectrum, and RRH resources are sufficient to serve all SUs, the objective function attains its maximum value, $\sum_{u \in \mathcal{U}} \alpha_u L_u$, i.e., $\mathcal{U}_n^* = \mathcal{U}$. However, resource scarcity introduces a bottleneck. Thus, only a subset of SUs is usually admitted and served. Given that $L_u$ appears in the objective in (5), big data requests are favored, as serving them leads to larger objective values. The priority coefficients $\alpha_u$ add another degree of flexibility to our optimization. These coefficients allow us to change the priorities of different SUs as necessary. For example, they can be set to favor big data users, or to favor a subset of premium users over others, and so on.
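For reference, the objective described in this paragraph can be written compactly as follows. This is a reconstruction from the surrounding prose, not a copy of the paper's (5); the label $\mathcal{U}'$ for the candidate disjunctive set is introduced here purely for illustration.

```latex
\max_{\substack{\mathcal{U}' \subseteq_D \mathcal{U} \\ \{\mathcal{R}_u^{n,t},\, \mathcal{S}_u^{n,t},\, \mathcal{T}_u^{n}\}_{u \in \mathcal{U}'}}}
\ \sum_{u \in \mathcal{U}'} \alpha_u L_u
\qquad \text{with} \qquad
\sum_{u \in \mathcal{U}'} \alpha_u L_u \;\le\; \sum_{u \in \mathcal{U}} \alpha_u L_u .
```

The bound on the right holds because $\alpha_u L_u \ge 0$ for every SU, and it is met with equality when $\mathcal{U}$ itself is a disjunctive set, which is the resource-rich case mentioned in the text.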
The problem investigated in this paper is similar to the one studied in [45], where it was proven to be NP-hard. Although [45] solved the problem to global optimality using dynamic programming, their approach is only suitable for small networks with few resources and SUs. To address this limitation, we propose a low-complexity sub-optimal offline batch (OFB) scheduling algorithm that aims to solve the optimization problem in a greedy manner. However, to prevent a significant loss in performance compared to the global optimum, we also incorporate a substitution mechanism into OFB. This technique allows OFB to replace previously admitted users with low utilities with a user of significantly higher utility [53].
The OFB algorithm assumes that all data requests from the secondary users for a given period (denoted as n) are received during the previous period (denoted as n − 1). These requests are then processed jointly in a batch mode to determine their admission and scheduling variables. However, this approach can become a bottleneck, particularly when the period length is long. To address this issue, we propose an online real-time (ONR) scheduling algorithm that evaluates and either accepts or rejects new requests as soon as they arrive. Additionally, the required resources are reserved immediately. In this paper, we provide a detailed description of the OFB algorithm in Section IV and introduce the ONR algorithm in Section V. We also conduct a comprehensive performance evaluation of both algorithms in Section VI.

IV. PROPOSED OFFLINE BATCH SCHEDULING (OFB)
Both the OFB and ONR scheduling algorithms are nonpreemptive, which means that data transmission to any user can be delayed or interrupted to serve other users. These delays may occur if other users have stricter deadlines, higher priorities for receiving service, or there is a lack of spectrum channels due to PUs' activity.
OFB operates at the Baseband Unit (BBU) pool where all incoming data requests are collected for the next service period. At the end of the current service period, OFB schedules all requests jointly and provides the list of admitted users along with their allocated RRHs, spectrum channels, and time slots for the next service period. By using this approach, OFB can optimize system performance by considering all incoming data requests together and allocating resources accordingly. However, since it is non-preemptive, there is a possibility of some requests being delayed or interrupted, which can result in higher latency for some users.
Scheduling algorithms sort and schedule SUs in some order based on some criterion. The sorting criterion varies greatly for different algorithms. For example, SUs may be sorted based on $T^n_u$ in ascending order, which gives priority to the SUs with the earliest deadline. This leads to the well-known offline greedy earliest deadline first (EDF) algorithm [54], [55]. In another approach, SUs are sorted based on their achievable data rate per unit resource, which is equivalent to greedily solving a Knapsack problem. Based on our objective function in (5), we use the scaled requested data size, $\alpha_u L_u L$, as our sorting criterion. Denoting the sorting order by $\preceq$, we have $u \preceq u'$ if $\alpha_u L_u \ge \alpha_{u'} L_{u'}$. It means that $u$ has priority over $u'$ and should be scheduled first. We hasten to add that when an algorithm reaches the global optimum, as in [45], sorting is unnecessary.
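The two sorting criteria discussed above can be contrasted in a few lines. The following is a minimal Python sketch, not the authors' implementation: the `Request` record and the helper names `ofb_order` and `edf_order` are hypothetical, and the common scaling factor $L$ is omitted because it does not change the order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    su: str        # SU identifier (hypothetical field)
    alpha: float   # priority coefficient alpha_u
    length: int    # requested data size L_u
    deadline: int  # latest acceptable time slot T_u^n


def ofb_order(requests: List[Request]) -> List[Request]:
    """Sort by weighted data size alpha_u * L_u, largest first."""
    return sorted(requests, key=lambda r: r.alpha * r.length, reverse=True)


def edf_order(requests: List[Request]) -> List[Request]:
    """Sort by deadline, earliest first (the EDF criterion)."""
    return sorted(requests, key=lambda r: r.deadline)


# Illustrative values only; they are not taken from the paper.
requests = [
    Request("a", alpha=1.0, length=1000, deadline=9),
    Request("b", alpha=2.0, length=400, deadline=3),
    Request("c", alpha=1.0, length=50, deadline=5),
]
print([r.su for r in ofb_order(requests)])  # ['a', 'b', 'c']
print([r.su for r in edf_order(requests)])  # ['b', 'c', 'a']
```

In this toy input the small request `c` is served early under EDF but last under the weighted-size order, which is the behavior that favors big data requests.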

Algorithm 1 The Proposed OFB for Period n.
Input: $\forall u \in U$: $\mathrm{BER}^{tar}_u$, $L_u \times L$, $t^n_u$, $T^n_u$, $\gamma_u$, $\alpha_u$; $\forall t$: $S^{n,t}$; $\forall (r, t)$: $r^{n,t}$; and $\zeta$.
Output: $U^{OFB}_n$ and corresponding allocated resources.

For sub-optimal approaches, initialization is critical as it can lead to sub-optimal solutions with significantly different objective values. Thus, sorting ensures that we start the algorithm with a good initialization. Algorithm 1 summarizes the OFB scheduling algorithm. First, SUs are sorted based on the earliest time slot in which they can receive service, which is $t^n_u$. OFB starts from the earliest time slot and iteratively increments time slots. At each time slot, OFB attempts to schedule as many SUs as possible up to the current time slot by considering all unscheduled SUs in the sorted order.
For every unscheduled SU and every time slot, OFB goes through two cases. In case 1, OFB attempts to schedule the SU whose turn has come by utilizing the remaining available resources. If there are enough resources, the SU is scheduled, and OFB moves to the next unscheduled SU. If the remaining resources are not sufficient, case 2 is invoked. In case 2, OFB checks to see if any set of previously admitted users, whose contribution to the objective is considerably lower than the current SU, can be dismissed so that the current SU can be scheduled instead.
Once all SUs have been considered, OFB returns the set of admitted users, denoted by $U^{OFB}_n$, along with the corresponding resource allocations. OFB is run independently at the beginning of each service period. Overall, the OFB scheduling algorithm aims to optimize the cognitive CRAN's performance by efficiently allocating resources to all SUs in order to maximize the weighted sum rate of SUs.

We elaborate on the OFB algorithm pseudo-code next. OFB first sorts the SUs in descending order of $\alpha_u L_u$ in Line 2. Scheduling is performed iteratively for time slots between $\min_{u \in U} t^n_u$ and $\max_{u \in U} T^n_u$. OFB begins with the smallest acceptable time slot $\min_{u \in U} t^n_u$, checks if it can schedule any new SUs, and then increments the time slot until it reaches the largest value $\max_{u \in U} T^n_u$. The parameter $t_e$ keeps track of the time slot currently being considered. In Line 5, every unscheduled SU is considered in the sorted order. In Line 6, those SUs whose acceptable data communication interval $[t^n_u, T^n_u]$ contains $t_e$ but are not yet scheduled are considered. For any such SU, Case 1 and Case 2 are performed successively.

[Case 1 in Algorithm 1, Line 8: $\forall t \in R^{n,t}_u$: $r^{n,t} \leftarrow r^{n,t} - 1$]
In Case 1, each subset $s_u$ of $\{1, \ldots, S\}$ with size $s_{\max}$ becomes a spectrum resource candidate for SU $u$. Considering each $s_u$ sequentially, the latest possible starting service time of SU $u$, referred to as $t'_1$, is evaluated such that $t_e$ becomes the service ending time slot. The $s_u$ which yields the maximum $t'_1$, i.e., $t_1 = \max_{s_u} t'_1 = \max_{s_u} \min_{t \in T^n_u} t$, is selected as the allocated channels. Clearly, the starting service time, i.e., $\min_{t \in T^n_u} t$, should be greater than or equal to $t^n_u$. For each $t \in [t_1, t_e]$, we store $\{f_s \mid s \in s_u\} \cap S^{n,t}$ in $S^{n,t}_u$, if there is at least one RRH that meets the minimum received SNR requirement for this SU. When $S^{n,t}_u \neq \emptyset$, each RRH $r$ that meets $\gamma^s_{r,u} > \gamma_u$ and has free capacity to serve SU $u$ is associated with $u$, and its identity is stored in $R^{n,t}_u$. Finally, if $S^{n,t}_u \neq \emptyset$, $t$ is stored in $T^n_u$. If no such combination of $s_u$ and assigned RRHs can be found that can complete serving $u$ before $t_e$, then Case 2 is executed.

[Case 2 in Algorithm 1, recoverable fragments:
26 Sort $U^{OFB}_n$ based on starting time of service in descending order
27 If the first term in the RHS of the above relation is selected, sort the $w_k$ in ascending order, and apply this order to $U'$
39 for $k \in K$ do
40 Sort $U'_k$ based on starting service times in descending order
45 Run Lines 8-16 in Case 1, by considering $r^{n,t}_k$ and $S^{n,t}_k$ as available resources; go to Line 50]

We define the replacement factor, $0 \le \zeta < 1$, where $\zeta = 0$ enforces no substitution, and $\zeta$ near 1 increases the chance of replacement. Case 2 determines the subset of admitted SUs with the minimum sum of weighted data lengths, denoted by $U'$, which can be removed from $U^{OFB}_n$ in order to provide enough resources to serve SU $u$. The following optimization problem is solved to determine the aforementioned $U'$ if it exists:
The search space for finding $U'$ can be further limited by implicit constraints. Constraint (6d) means that every $U'$ such that $\exists u' \in U' : \alpha_{u'} L_{u'} > \zeta \alpha_u L_u$ cannot be substituted. Moreover, we should exclude every $U'$ such that $\exists u' \in U' : T^n_{u'} \cap [t^n_u, t_e] = \emptyset$. Therefore, $U'$ should also satisfy the following: The substitution of $U'$ found in Case 2 with the current SU $u$ is performed only if $\sum_{u' \in U'} \alpha_{u'} L_{u'}$ is smaller than $\zeta \alpha_u L_u$. This means that substitution is carried out if the objective function is increased by at least $(1 - \zeta)\alpha_u L_u$. Next, let us elaborate on how Case 2 works. By executing Lines 26-49, OFB finds a "sub-optimal" solution for $U'$ in the following manner. First, for each SU $u'$ such that $T^n_{u'} \cap [t^n_u, t_e] \neq \emptyset$, time slot $\min_{t \in T^n_{u'}} t + 1$ is stored in the auxiliary set $K$ for future processing, in Line 30. The stored time slots in $K$ are sorted in decreasing order. The dynamic program is performed in Lines 31-38. For each $k \in K$, we form the candidate SUs for replacement, $U'_k$, to free the time slots $[k, t_e]$. The candidates $U'_k$ are initialized to the empty set, and their corresponding contribution to the objective in (5), denoted by $w_k$, is initialized to $+\infty$. Utilizing the dynamic relation in Line 36, $U'_k$ and $w_k$ for all $k \in K$ are iteratively updated. Once these iterations are completed, the sets $U'_k$ are sorted in ascending order of their weights $w_k$. Beginning with the smallest weight $w_{\min}$, one checks if dismissing the corresponding set $U'_{\min}$ can free up enough resources to serve $u$. This is checked in Lines 45-49. If the answer is positive and if $\zeta$ times $\alpha_u L_u L$ has a greater value than $w_{\min}$, then the substitution is carried out in Lines 51 to 55. Otherwise, the next smallest weight $w_k$ is checked in Lines 38 and 39. The set $U^{temp}_n$ is an auxiliary set that stores all those SUs that were admitted at least once when OFB was running. If an SU is deleted from $U^{OFB}_n$ in Case 2, it is not deleted from $U^{temp}_n$.
This set will be used for performance evaluation in Section VI.
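The substitution test of Case 2 can be written compactly. The following is a minimal Python sketch under stated assumptions: the function names `substitution_allowed` and `objective_gain` are hypothetical, each weight stands for a product $\alpha_u L_u$, and the resource-feasibility check of Lines 45-49 is deliberately left out.

```python
from typing import Iterable


def substitution_allowed(candidate_weight: float,
                         dismissed_weights: Iterable[float],
                         zeta: float) -> bool:
    """Return True if the dismissed set U' may be replaced by the candidate.

    candidate_weight is alpha_u * L_u of the SU being scheduled, and
    dismissed_weights holds alpha_u' * L_u' for every user in U'.
    """
    return sum(dismissed_weights) < zeta * candidate_weight


def objective_gain(candidate_weight: float,
                   dismissed_weights: Iterable[float]) -> float:
    """Change in the objective if the substitution is carried out."""
    return candidate_weight - sum(dismissed_weights)


# Illustrative numbers only: a candidate of weight 100 against two
# previously admitted users of weights 20 and 15.
print(substitution_allowed(100.0, [20.0, 15.0], zeta=0.41))  # True
print(substitution_allowed(100.0, [20.0, 15.0], zeta=0.0))   # False
print(objective_gain(100.0, [20.0, 15.0]))                   # 65.0
```

With $\zeta = 0.41$ the gain of 65 exceeds the guaranteed minimum $(1 - \zeta)\alpha_u L_u = 59$, while $\zeta = 0$ disables substitution altogether.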
Our proposed OFB is a generalization of the greedy method in [53] for weighted interval selection problem. OFB is sub-optimal from several aspects: (i) It greedily schedules SUs who can be served at the earliest deadline. (ii) Rejections are greedy and permanent at every given time slot t e . Once rejected for a given t e , the SU should wait for t e to be incremented before it gets a second chance of being scheduled. (iii) The dynamic program in Case 2 provides only a sub-optimal solution of (6). Still, it performs satisfactorily in our numerical results compared to existing schemes.

A. A SIMPLE EXAMPLE FOR CASE 2 OF ALGORITHM 1
We consider a simple CRAN where S = 3, s max = 1, and ∀u ∈ U : α u = 1. To maintain the simplicity of exposition, we assume all three spectrum channels are available in all time slots. Furthermore, we assume RRHs have enough capacity to serve all demanding SUs in every time slot as long as the minimum SNR requirement is satisfied. Since we focus on Case 2 in this example, we assume that the answer to Case 1 was negative meaning that there are not enough channels to serve SU u alongside the already scheduled users. Therefore, Case 2 aims to find a subset of low-utility users which can be dropped in favor of the to be scheduled user u thus increasing the objective value.

V. PROPOSED ONLINE REAL-TIME SCHEDULING (ONR)
The OFB scheduling algorithm assumes that all data requests for time period $n + 1$ arrive in period $n$. Hence, in the worst case, a user has to wait for one period before it is either scheduled or rejected. If the time periods are long, this waiting time is unacceptable for most real-time applications. Thus, we consider the same optimization problem as in (5) but assume that any request arriving in period $n + 1$ should be either scheduled in period $n + 1$ or immediately rejected. Unlike OFB, ONR assumes that the BBU pool has no prior knowledge of which SUs will request to be served in period $n + 1$. Furthermore, the BBU pool has no knowledge about the availability of channels in period $n + 1$. As soon as ONR receives a data request with the 6-tuple description $(\mathrm{BER}^{tar}_u, L_u \times L, t^{n+1}_u, T^{n+1}_u, \gamma_u, \alpha_u)$ in period $n + 1$, it executes a real-time admission control to check if sufficient resources are available to admit this request. If the request is accepted, ONR allocates the corresponding resources immediately. The lack of prior knowledge on the number and specification of upcoming data requests degrades the performance of ONR compared to that of OFB. To reduce this degradation, statistics of SUs' activities and channel availability are exploited in ONR, as we describe next.

A. SU's ACTIVITIES AND SPECTRUM AVAILABILITY PREDICTION MODEL
To enhance ONR performance, the algorithm employs statistical information on both SUs' request arrivals and channel availability in previous periods. Let $P_n(u)$ and $P_n(s)$ denote the probabilities for arrival of a request by SU $u$ and availability of channel $s$ in period $n$, respectively. We assume that these probabilities are independent across SUs and channels. In our model, confidence intervals are considered for the ratio of each of these probabilities over two consecutive periods. We assume
$$\frac{1}{\alpha} \le \frac{P_{n+1}(u)}{P_n(u)} \le \alpha, \tag{8}$$
where $\alpha \ge 1$ is the confidence interval factor. When $\alpha$ is close to one, we have achieved a very good prediction of the SU's request arrival probability for the next period. When $\alpha$ gets large, our prediction has a very low accuracy and the request arrival probability ranges from near zero to close to one. Nevertheless, our proposed ONR can work with any general prediction algorithm as long as a bound like (8) can be obtained with a specific known $\alpha$. Similarly, for the availability of channel $s$, we have
$$\frac{1}{\beta} \le \frac{P_{n+1}(s)}{P_n(s)} \le \beta, \tag{9}$$
where $\beta \ge 1$ is the confidence interval factor for channel availability. Equation (9) has the same properties as (8).
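The effect of the confidence interval factor can be illustrated numerically. The following is a minimal Python sketch that assumes the ratio of a probability over two consecutive periods lies between $1/\alpha$ and $\alpha$ (and likewise for $\beta$); the helper name `next_period_bounds` is hypothetical, and the upper bound is clipped to one because the quantity is a probability.

```python
from typing import Tuple


def next_period_bounds(p_now: float, factor: float) -> Tuple[float, float]:
    """Interval for the next-period probability given the current one.

    Assumes the ratio of consecutive-period probabilities lies in
    [1 / factor, factor], with factor >= 1.
    """
    if not 0.0 <= p_now <= 1.0:
        raise ValueError("p_now must be a probability")
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return p_now / factor, min(1.0, p_now * factor)


print(next_period_bounds(0.5, 1.0))  # (0.5, 0.5): perfect prediction
print(next_period_bounds(0.5, 4.0))  # (0.125, 1.0): very loose prediction
```

A factor of one pins the next-period probability to the current value, whereas a large factor stretches the interval toward the whole range from near zero to one.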
A similar confidence interval model has been used in [23]. However, the bounds there are with respect to expected values instead of probabilities. In the literature, different models have been investigated for wireless traffic prediction [56]. Yet, our proposed algorithm can work with any general traffic anticipation approach. The inaccuracy of the predictions can be well modeled by $\alpha$ and $\beta$. Here, we assume time-invariant (or fixed) uncertainty factors for all periods.
ONR is presented as Algorithm 2 and works as follows. As soon as the data request of an SU $u'$ arrives at $t^{n+1}_{u'}$, ONR first runs OFB on $U^{OFB}_n \cup \{u'\}$, i.e., on the set of SUs selected in the previous ($n$-th) period together with $u'$, assuming the same availability of channels and RRHs as in the $n$-th period. In effect, ONR first checks whether this request would have been admitted had it arrived in the previous period. If the answer is positive, we assign $u'$ to $U^{op}_{n+1}$ as a candidate for acceptance in period $n + 1$. This decision is made based on the available resources in the $n$-th period, so it will incur a performance degradation in period $n + 1$. It is possible that, given the already admitted requests in period $n + 1$, $U^{op}_{n+1}$ is not a disjunctive subset of $U$. Thus, we perform two more purging steps. If the new data request passes these two steps successfully, it will be admitted and scheduled. First, we run a Bernoulli experiment with success probability $p$ that we will optimally tune later. If the Bernoulli experiment is a success, we keep the new request as a possible scheduling candidate; otherwise, the request is rejected. Finally, we run OFB on the set of already accepted requests plus $u'$, i.e., on $U^{ONR}_{n+1} \cup \{u'\}$, with resources in period $n$. If the new request is selected by OFB, we admit the new request. Then, we drop all SUs that belonged to the previous $U^{ONR}_{n+1}$ but are no longer in the new $U^{ONR}_{n+1}$. Resources for this new $U^{ONR}_{n+1}$ are allocated by OFB. The complexity and performance of both OFB and ONR are rigorously derived in the next section.
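The three admission steps described above can be sketched as follows. This is a simplified Python illustration, not Algorithm 2 itself: `onr_admit` is a hypothetical name, the `ofb` argument is an assumed callable that maps a set of SUs to the subset OFB would select, and resource allocation and channel availability are abstracted away. The toy scheduler `toy_ofb` (keep the two heaviest users) only stands in for OFB.

```python
import random
from typing import Callable, Dict, Set


def onr_admit(new_su: str,
              prev_selected: Set[str],
              accepted: Set[str],
              ofb: Callable[[Set[str]], Set[str]],
              p: float,
              draw: Callable[[], float] = random.random) -> Set[str]:
    """Return the accepted set after processing one arriving request."""
    # Step 1: would the request have been admitted in the previous period?
    if new_su not in ofb(prev_selected | {new_su}):
        return set(accepted)
    # Step 2: Bernoulli experiment with success probability p.
    if draw() >= p:
        return set(accepted)
    # Step 3: rerun the scheduler on the accepted requests plus the new one.
    selected = ofb(accepted | {new_su})
    if new_su in selected:
        return set(selected)  # users no longer selected are dropped
    return set(accepted)


# Toy stand-in for OFB: keep the two users with the largest weights.
weights: Dict[str, float] = {"a": 5.0, "b": 3.0, "c": 9.0}


def toy_ofb(users: Set[str]) -> Set[str]:
    return set(sorted(users, key=lambda u: weights[u], reverse=True)[:2])


result = onr_admit("c", {"a", "b"}, {"a"}, toy_ofb, p=0.5, draw=lambda: 0.0)
print(sorted(result))  # ['a', 'c']
```

Passing `draw` explicitly makes the Bernoulli step reproducible in this toy run; by default the sketch draws from `random.random`.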

VI. PERFORMANCE AND COMPLEXITY ANALYSIS FOR OFB AND ONR
Here, we rigorously evaluate the performance of OFB and ONR by deriving bounds on how far the objective values of these algorithms are from the global optimum given by $\sum_{u \in U^*_n} \alpha_u L_u$. Here, $U^*_n$ denotes the set of admitted users at the global optimum of the $n$-th period. The following theorem summarizes our results on OFB performance.
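Theorem 1: The proposed OFB algorithm is guaranteed to achieve an objective value bounded below by 0.17 times the global optimum of (5), that is
$$\sum_{u \in U^{OFB}_n} \alpha_u L_u \ge \left(3 - 2\sqrt{2}\right) \sum_{u \in U^*_n} \alpha_u L_u, \tag{10}$$
where the optimum value for $\zeta$ is given by $-1 + \sqrt{2} \approx 0.41$. Proof: Please see Appendix A.
Performance analysis for ONR is summarized in the following theorem.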
Theorem 2: The proposed ONR algorithm is guaranteed to achieve an expected objective value lower bounded by The optimal values for p and ζ are given by 7 ≈ 0.28, respectively. Proof: Please see Appendix B. It needs to be mentioned that the bound in Theorem 2 is derived assuming p < 1. Thus, the Bernoulli experiment has a nonzero probability of rejecting a particular user. If p = 1, the bound in Theorem 2 becomes trivial as it will amount to left hand side of (11) to be greater than some negative value which is obvious; Please check Appendix B to verify this. If a stronger bound is derived then we can also allow for p = 1. To summarize, the Bernoulli experiment is not a fundamental block of our proposed algorithm. It only allows us to derive a non-trivial bound on ONR performance.

A. COMPLEXITY ANALYSIS
OFB's complexity is given by $O\big(|U| \log_2(|U|) + \max_{u \in U} T^n_u \times |U| \times (A_1 + A_2)\big)$, where $A_1$ and $A_2$ are the computational complexities of Case 1 and Case 2, respectively. $A_1$ is given by $A_1 = \binom{S}{s_{\max}} \times \max_{u \in U}\left(T^n_u - t^n_u\right) \times |R|$. In Case 2, the complexity of the for loops in Line 28 and Line 32 is on the order of $|U|$. Moreover, the complexity of the sort instructions in Lines 38 and 40 is on the order of $|U| \log_2(|U|)$. Furthermore, the complexities of Lines 33-37 and Lines 39-49 are $O(|U|^2)$ and $O\big(|U|^2 \times \max_{u \in U} T^n_u + |U|^2 \log_2(|U|) + |U| \times A_1\big)$, respectively. To sum up, overall complexity of OFB is on

ONR runs OFB in Lines 3 and 9 for $|U|$ times. Thereby, ONR's complexity is on the order of

To find the global optimum of the optimization problem in (5), one should resort to exhaustive search. The number of possible resource allocations for SU $u$, denoted by $N_u$, is given by |R| r × $T^n_u - t^n_u$. So, the computational complexity of an exhaustive search is $\prod_{u \in U} N_u$.

VII. NUMERICAL RESULTS
Our proposed OFB and ONR algorithms outperform existing alternatives in the literature, and we present a comprehensive numerical analysis to support this claim. We perform our analysis using two different setups. Firstly, we generate a sample CRAN to demonstrate the significant differences between OFB and the currently available alternatives. Secondly, we conduct Monte Carlo simulations to evaluate the average performance of OFB and ONR. These simulations enable us to assess their performance under various scenarios and network conditions.
We investigate the ratio of transferred data over total data requests, the percentage of scheduled SUs, and the percentage of allocated channels and assigned RRHs as performance metrics in both setups. Moreover, our main focus is on the impact of the proposed algorithms on big data users. To address this, we select a suitable value for $L_u$ in the range $[2^5, 2^{20}]$ for each SU $u$. We partition this range into 5 equal sub-intervals, with each sub-interval represented by a distinct value of $\rho$. Specifically, we obtain $\rho$ by dividing the rightmost point in each sub-interval by $2^{20}$, and the resulting values of $\rho$ are 0.2, 0.4, 0.6, 0.8, and 1, respectively. Notably, larger values of $\rho$ indicate higher data demands for the SUs in that sub-interval, and the sub-interval with $\rho = 1$ corresponds to the big data SUs.
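The construction of the $\rho$ values can be checked with a short computation. The following Python sketch assumes five equal-width sub-intervals of $[2^5, 2^{20}]$; the function name `rho_values` is hypothetical. Because the range starts at $2^5$ rather than zero, the computed ratios equal 0.2, 0.4, 0.6, 0.8, and 1 only after rounding.

```python
from typing import List


def rho_values(lo: int = 2 ** 5, hi: int = 2 ** 20, bins: int = 5) -> List[float]:
    """Right endpoint of each equal-width sub-interval, divided by hi."""
    width = (hi - lo) / bins
    return [(lo + k * width) / hi for k in range(1, bins + 1)]


rounded = [round(r, 1) for r in rho_values()]
print(rounded)  # [0.2, 0.4, 0.6, 0.8, 1.0]
```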
Simulation setups are determined next. We consider the service area of the CRAN to be a $2000 \times 2000\ \mathrm{m}^2$ area with multiple RRHs serving the SUs and a single RRH serving the PUs. The RRHs and SUs are uniformly and independently distributed within this square area. These SUs are assumed to be either static or have low mobility. Simulation parameters are summarized in Table 4. The capacities of the backhaul and fronthaul links are assumed to be sufficiently large to support all data flow in the CRAN with negligible delay. Assuming an urban environment, the RRH-SU channel coefficients follow the path loss model $PL[\mathrm{dB}] = 30.58 + 36.7 \log_{10} d_{r,u} - a_0$, where $d_{r,u} > 1.135$ m is the distance between RRH $r$ and SU $u$. Log-normal shadowing with 8 dB variance is considered [58]. Parameter $a_0$ is a correction factor that accounts for different RRH and SU antenna heights. The total bandwidth is 20 MHz, which is divided into $S = 100$ channels having an equal bandwidth of 200 kHz each. The utilization rate of each channel by PUs varies from 40 to 60 percent [59]. The duration of channel occupation by a PU is modeled by an exponential random variable with mean dwell time $10^3 \times t$. Finally, we consider $\mathrm{BER}^{tar}_u$ to be from the set $\{10^{-3}, 10^{-5}, 10^{-6}\}$, and $\gamma_u$ from the set $\{0, 3, 5\}$ [dB] for requests with audio, video, and text, respectively [60].
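The path loss model and the channelization stated above are easy to evaluate numerically. The following Python sketch is illustrative only: the function name `path_loss_db` is hypothetical, shadowing is not modeled, and the default $a_0 = 0$ is an assumption made for the example rather than a value from the paper.

```python
import math


def path_loss_db(distance_m: float, a0_db: float = 0.0) -> float:
    """PL[dB] = 30.58 + 36.7 * log10(d) - a0, valid for d > 1.135 m."""
    if distance_m <= 1.135:
        raise ValueError("model is stated for distances above 1.135 m")
    return 30.58 + 36.7 * math.log10(distance_m) - a0_db


# Channel bandwidth: 20 MHz split into S = 100 equal channels.
channel_bandwidth_hz = 20e6 / 100

print(round(path_loss_db(10.0), 2))   # 67.28
print(round(path_loss_db(100.0), 2))  # 103.98
print(channel_bandwidth_hz)           # 200000.0
```

Each tenfold increase in distance adds 36.7 dB of loss, which is why only nearby RRHs typically satisfy an SU's minimum SNR requirement.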
The global optimum of (5) can be determined through efficient exhaustive search methods like branch and bound. However, these approaches become impractical for medium to large network sizes. Therefore, we compare our proposed methods against existing suboptimal alternatives. Specifically, we evaluate two scheduling algorithms, earliest deadline first (EDF) [55], [61], [62], [63], [64] and earliest ending time first (EEF). EDF has been proven to achieve a total number of admitted SUs at least half of the global optimum [65]. Thus, we compare against three different algorithms: EDF, EDF_ζ , and EEF. EDF_ζ is an algorithm that allows for some users to be dropped in favor of users who increase the objective by at least a 1 − ζ value. Furthermore, we compare our proposed ONR against ONR/EDF_ζ , ONR/EDF, and ONR/EEF. It should be noted that ONR utilizes successive applications of OFB to determine the admitted users and their resource allocations. Hence, ONR/EDF, for instance, represents the online algorithm that utilizes successive EDF runs instead of OFB runs as the primary building block.

A. ONE CRAN REALIZATION
An instance realization of the coverage area is shown in Fig. 2a for a CRAN with $|R| = 20$ RRHs of small cells and $|U| = 15$ SUs. Here, we assume there are only $S = 5$ channels with $f = 1$ MHz. Parameters $t^n_u$ and $T^n_u$ for these SUs are shown in Fig. 2b when they make requests with the lengths shown in Fig. 2c. Every RRH has enough capacity to simultaneously support all SUs, i.e., $\forall r, t : r^{n,t} \ge 15$. Furthermore, when SU $u$ is selected, it is assigned to all RRHs with indicator function $I_{\mathbb{R}^+}(\gamma^s_{r,u} - \gamma_u) = 1$. For this CRAN, the percentage of total transferred data and the percentage of scheduled SUs are shown for different resource scheduling algorithms in Fig. 2d. As illustrated in this figure, the proposed OFBs with $\zeta = 0.41$ and $\zeta = 1$ achieve the highest transferred data percentages, respectively. However, these two algorithms serve a smaller percentage of SUs than EEF. The SUs selected by the algorithms EEF, OFB with $\zeta = 0.41$, OFB with $\zeta = 1$, and EDF are, respectively, {10, 13, 4, 1, 5, 9, 11}, {10, 13, 4, 7, 9, 11}, {10, 8, 7, 5, 11, 9}, and {10, 2, 1, 5, 9, 11}. These results show that algorithms with $\zeta \neq 0$ serve those SUs requesting larger volumes of data with a higher priority, while algorithms with $\zeta = 0$, namely EEF, serve a larger number of SUs.

B. MONTE CARLO SIMULATIONS
Next, we evaluate the average performance of OFB and ONR over $10^4$ random CRAN realizations. These results are averaged over different values of $U$, $R$, $S$, $s_{\max}$, $r^{n,t}$, and the availability distribution of channels. As mentioned earlier, the performance of OFB depends on $\zeta$. Fig. 3 illustrates the percentages of total transferred data, scheduled SUs, and usage of channels and RRHs for all offline algorithms with respect to $\zeta$. It should be mentioned that OFB and EDF_ζ with $\zeta = 0$ are equivalent to EEF and EDF, respectively. It is demonstrated that OFB performs better in the percentage of total transferred data over the whole range of $\zeta$ and that its maximum occurs at $\zeta \approx 0.41$, which is also expected from Theorem 1. This improvement in OFB's performance is also a direct consequence of the fact that OFB makes more efficient use of the spectrum, as corroborated in Fig. 3c. The best percentages of transferred data are 35.00%, 32.99%, 29.92%, and 28.93%, which are achieved by OFB with $\zeta = 0.41$, EEF, EDF_ζ with $\zeta = 0.6$, and EDF, respectively. Fig. 3b shows that EDF_ζ achieves approximately 2% more total scheduled SUs compared to OFB. However, OFB performs better in terms of transferred data percentage as it achieves 29.38% versus EDF_ζ's 29.00%. Upon increasing $\zeta$, both algorithms become inclined to schedule SUs with higher volumes of data requests. As a result, the percentage of the total scheduled SUs decreases. We have observed that the EEF and EDF algorithms utilize more channels and RRHs compared to our proposed OFB and ONR algorithms. However, they lack the flexibility to properly select SUs with higher data requests. These algorithms are designed to maximize the number of scheduled SUs, often at the cost of lower data transfer percentages.
This inflexibility results in suboptimal solutions for big data transmission, as they do not take into account the data prioritization and QoS requirements of the selected SUs.
Given that PUs' activity patterns vary in time, they cause the available spectrum for SUs to vary in time as well. We use $\eta$ to express the availability of channels. The percentages of total transferred data, scheduled SUs, and usage of channels and RRHs are depicted in Fig. 4 versus $\eta$ for OFB with $\zeta = 0.41$, EEF, EDF_ζ with $\zeta = 0.6$, and EDF. It illustrates that OFB achieves a better percentage of total transferred data while maintaining the percentage of total scheduled SUs near that of EDF_ζ with $\zeta = 0.6$. By increasing $\eta$, the percentages of total transferred data, scheduled SUs, and RRH utilization improve for all algorithms. Yet, OFB maintains the best performance in the percentage of total transferred data for all $\eta$ values.
The performance of ONR is a function of ζ and p as well as the parameters α and β. Fig. 5 illustrates the percentage of total transferred data with respect to ζ and p for (α, β) = (1, 1) and (1.1, 1.1), respectively. It can be observed that maximum performance is achieved when (ζ, p) = (0.28, 0.36) and (ζ, p) = (0.28, 0.33), in these two scenarios. ONR algorithm is simulated with these optimal values of p and plotted in Fig. 6 and Fig. 7 for (α, β) = (1, 1) and (1.1, 1.1), respectively. By comparing these two figures, it is deduced that by increasing α and β, all performance criteria degrade. This is a direct consequence of increased uncertainty about requesting SUs and availability of channels in period n + 1, which was also predicted by Theorem 2. In Fig. 6, for (α, β) = (1, 1), the maximum percentage of total transferred data is given by 23.31%, 16.03%, 19.38%, and 13.59% for ONR, ONR/EEF, ONR/EDF_ζ = 0.71, and ONR/EDF, respectively. In Fig. 7, for (α, β) = (1.1, 1.1), the maximum percentage of the total transferred data is given by 16.20%, 10.01%, 11.87%, and 7.49% for ONR, ONR/EEF, ONR/EDF_ζ = 0.79, and ONR/EDF, respectively. The results corroborate a higher percentage of total transferred data for ONR versus all alternatives. This improvement occurs due to a higher utilization of channels, flexibility in SUs' selection due to ζ , and applying our prior knowledge of SUs activity and channels availability probabilities. Similar to offline batch algorithms, ONR/EEF and ONR/EDF have better performances in percentage of total scheduled SUs.

C. BIG DATA REQUESTS
Both OFB and ONR were designed to improve service quality for big data requests. Here, we evaluate both OFB and ONR for big data services. Recalling that all requested data sizes are divided into five equal ranges in the interval $[2^5, 2^{20}]$, where each range is identified by a different $\rho$, one deduces that the sub-interval with $\rho = 1$ contains the largest 20% of the requested data sizes and represents big data users. In Fig. 8, the percentage of the total scheduled SUs is plotted versus $\rho$. The results are plotted for $\zeta = 0$, $\zeta = 1$, and the optimal values of $\zeta$ from Theorems 1 and 2 for OFB and ONR, respectively. The results show that both OFB and ONR schedule more big data requests compared to existing alternatives. By increasing $\zeta$, OFB and ONR give a higher priority to big data requests, so the largest percentage of admitted big data demands occurs at $\zeta = 1$. However, $\zeta = 0.41$ also performs satisfactorily on big data. These observations are corroborated numerically in Figs. 8a and 8b for OFB and ONR, respectively.

VIII. CONCLUSION
We addressed the problem of selecting SUs, associating them with RRHs, allocating channels, and performing deadline-aware non-preemptive time scheduling over the cognitive CRAN. Our objective is to find an optimal disjunctive set of SUs with corresponding resource allocation to maximize overall weighted data transmission while ensuring QoS parameters for big data transmission. We prioritized SUs based on the requested big data type, which is multiplied by data length in the objective function, to customize this problem for big data transmission. Furthermore, we considered the 5V characteristics of big data in our work.
To solve this problem, we proposed the OFB and ONR algorithms, which support QoS for data requests of selected SUs, including target bit error level, minimum signal-to-noise ratio (SNR), and deadline to receive data. The performance of OFB is within a factor of $3 - 2\sqrt{2}$ of the globally optimal solution, and that of ONR is within the factor given in Theorem 2.
We evaluated the performance of our proposed algorithms through simulations, which demonstrate that they outperform the EEF and EDF algorithms in total transferred data and big data transmission. Specifically, our proposed algorithms achieve better performance in terms of maximizing overall weighted data transmission, ensuring QoS for data requests, and improving the efficiency of spectrum utilization.

APPENDIX A PROOF OF THEOREM 1
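First, we analyze the relation between $U^{OFB}_n$ and $U^{temp}_n$. According to the OFB algorithm, each $u \in U^{OFB}_n$ has been admitted either through Line 18 or 52 of Algorithm 1. Obviously, each SU $u$ that is finally admitted belongs to $U^{OFB}_n$. These users are all members of $U^{temp}_n$ as well. However, $U^{temp}_n$ also contains those SUs that were once admitted but were later dropped according to Line 52 of OFB. In this Line, $u$ is accepted and the set $U'$ of previously admitted SUs is rejected if $\zeta \alpha_u L_u > \sum_{u' \in U'} \alpha_{u'} L_{u'}$. Due to this substitution, the objective function increases by at least $(1 - \zeta)\alpha_u L_u$. We can write this as
$$\alpha_u L_u - \sum_{u' \in U'} \alpha_{u'} L_{u'} \ge (1 - \zeta)\,\alpha_u L_u.$$
Finally, we arrive at
$$\sum_{u \in U^{OFB}_n} \alpha_u L_u \ge (1 - \zeta) \sum_{u \in U^{temp}_n} \alpha_u L_u. \tag{13}$$
Next, we derive a bound between $U^{temp}_n$ and $U^*_n$. We can write the following inequality
$$\sum_{u \in U^*_n} \alpha_u L_u \le \sum_{u \in U^*_n \setminus U^{temp}_n} \alpha_u L_u + \sum_{u \in U^{temp}_n} \alpha_u L_u. \tag{14}$$
For every $u \in U^*_n \setminus U^{temp}_n$, this SU was not admitted because there was a set of users $U' \subseteq U^{temp}_n$ with $\zeta \alpha_u L_u \le \sum_{u' \in U'} \alpha_{u'} L_{u'}$ that blocked it. Moreover, for two different SUs $u, v \in U^*_n \setminus U^{temp}_n$, the corresponding sets $U'$ and $V'$ are disjoint. To show this, we consider two cases. Either the schedules of $u$ and $v$ in the global optimum share a time slot or they do not share any time slots. If they share time slots, then they should be scheduled on different frequency channels. Hence, they will interfere with disjoint $U'$, $V'$. If they do not share time slots, then they can be scheduled on the same frequency channels. Let us assume there exists an SU $w \in U^{temp}_n$ which belongs to both $U'$ and $V'$. Then, either $u$ or $v$ will end before $w$. According to the while loop in Line 4 of Algorithm 1, either $u$ or $v$ should belong to $U^{temp}_n$, which is not the case, voiding this assumption. As a result, $U'$ and $V'$ are guaranteed to be disjoint. Given this disjointness, we can write $\zeta \sum_{u \in U^*_n \setminus U^{temp}_n} \alpha_u L_u \le \sum_{u \in U^{temp}_n} \alpha_u L_u$. Combining this with (14), we arrive at
$$\sum_{u \in U^*_n} \alpha_u L_u \le \frac{1 + \zeta}{\zeta} \sum_{u \in U^{temp}_n} \alpha_u L_u. \tag{15}$$
Finally, we combine (13) and (15) to arrive at
$$\sum_{u \in U^{OFB}_n} \alpha_u L_u \ge \frac{\zeta (1 - \zeta)}{1 + \zeta} \sum_{u \in U^*_n} \alpha_u L_u. \tag{16}$$
By taking the derivative of $\zeta \frac{1 - \zeta}{1 + \zeta}$ and setting it to zero, we obtain two values for $\zeta$, namely $-1 - \sqrt{2}$ and $-1 + \sqrt{2}$. The first one is negative and hence not a valid choice. Thus, $\zeta = -1 + \sqrt{2}$.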
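The maximization of $\zeta(1 - \zeta)/(1 + \zeta)$ used in this proof can be verified numerically. The following Python sketch is a sanity check rather than part of the proof; the function name `ofb_bound` and the grid resolution are arbitrary choices made for illustration.

```python
import math


def ofb_bound(zeta: float) -> float:
    """Value of zeta * (1 - zeta) / (1 + zeta)."""
    return zeta * (1.0 - zeta) / (1.0 + zeta)


# Closed-form maximizer obtained by setting the derivative to zero.
zeta_star = math.sqrt(2.0) - 1.0

# Coarse grid search over (0, 1) as an independent check.
grid = [i / 10000 for i in range(1, 10000)]
zeta_grid = max(grid, key=ofb_bound)

print(round(zeta_star, 4))             # 0.4142
print(round(zeta_grid, 4))             # 0.4142
print(round(ofb_bound(zeta_star), 4))  # 0.1716
```

At the maximizer the value equals $(\sqrt{2} - 1)^2 = 3 - 2\sqrt{2} \approx 0.17$, matching the constant quoted in Theorem 1.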

APPENDIX B PROOF OF THEOREM 2
To derive the performance bound for ONR, we derive successive bounds on how much objective value we lose in going from $U^*_{n+1}$ to $U^{ONR}_{n+1}$ at every step of Fig. 9. Then, we combine the corresponding losses to derive Theorem 2. This proof idea is borrowed from [66]. However, our ONR is different from their proposed online algorithm and thus demands a separate in-depth analysis. First, we assume that all admitted users can only be scheduled on a set $S_a \subset S$ of size $|S_a| = s_{\max}$. It is notable that the data request probability for SUs and the availability of channels are independent, so the joint probability of a data request by SU $u$ in period $n$ and availability of the set $S_a$ of channels in the $n$-th period of the CRAN is given by
$$P_n(u, S_a) = P_n(u) \prod_{s \in S_a} P_n(f_s). \tag{17}$$
Lemma 1: By using (8) and (9) in (17), we have
$$\frac{1}{\alpha \beta^{s_{\max}}} \le \frac{P_{n+1}(u, S_a)}{P_n(u, S_a)} \le \alpha \beta^{s_{\max}}.$$
First, we characterize the loss in going from $U^*_{n+1}$ to $U^*_n$.
Lemma 2: The following inequality holds
Proof: where in the second inequality, Lemma 1 was applied.
Next, we characterize the loss in going from $U^*_n$ to $U^{OFB}_n$. Taking expected values from both sides of (16) we arrive at

Fig. 9. Performance degradation of the proposed ONR scheduling algorithm from the optimal solution, showing how much the objective value degrades in going from $U^*_{n+1}$ to $U^{ONR}_{n+1}$ at every step.
The following lemma determines the loss in going from $U^{\mathrm{OFB}}_n$ to $U^{\mathrm{op}}_{n+1}$. Lemma 3: The following inequality holds Proof: We define $A_u$ as the event that $u$ is disjunctive with $U^{\mathrm{OFB}}_n \setminus \{u\}$. Based on Algorithm 2, $u \in U^{\mathrm{OFB}}_n$ if $u$ requests data in period $n$, $S_a$ is available in period $n$, and $u$ is disjunctive with $U^{\mathrm{OFB}}_n \setminus \{u\}$. Consequently, $P(u \in U^{\mathrm{OFB}}_n) = P_n(u, S_a, A_u)$. Likewise, $u$ is a member of $U^{\mathrm{op}}_{n+1}$ if $u$ requests data in period $n+1$, $S_a$ is available in period $n+1$, and $u$ is disjunctive with $U^{\mathrm{OFB}}_n \setminus \{u\}$. As a result, $P(u \in U^{\mathrm{op}}_{n+1}) = P_{n+1}(u, S_a, A_u)$. So, we have:

$$\begin{aligned}
P\big(I_{U^{\mathrm{OFB}}_n}(u) = 1 \,\big|\, S_a\big) \times \alpha_u L_u
&= P_n(u, S_a, A_u)\, \alpha_u L_u \\
&= P(A_u)\, P_n(u, S_a \mid A_u)\, \alpha_u L_u \\
&= P(A_u)\, P_n(u, S_a)\, \alpha_u L_u \\
&\le \alpha\beta^{s_{\max}}\, P(A_u)\, P_{n+1}(u, S_a)\, \alpha_u L_u \\
&= \alpha\beta^{s_{\max}}\, P(A_u)\, P_{n+1}(u, S_a \mid A_u)\, \alpha_u L_u \\
&= \alpha\beta^{s_{\max}}\, P_{n+1}(u, S_a, A_u)\, \alpha_u L_u \\
&= \alpha\beta^{s_{\max}}\, P\big(u \in U^{\mathrm{op}}_{n+1} \,\big|\, S_a\big)\, \alpha_u L_u.
\end{aligned} \tag{23}$$

Here we assume that $A_u$ is independent of $u$ requesting data and of the availability of channels in periods $n$ and $n+1$. Summing (23a) and (23b) over all $u \in U$ will yield the lemma's inequality. Lemma 4: We have the following equality Proof: We know that if u ∈ U Lemma 5: We have the following inequality Proof: Upon applying Lemma 4 to the left-hand side (LHS) of (26), it suffices to prove the following According to the proof of Lemma 3, we have $P(u \in U^{\mathrm{OFB}}_n) = P_n(u, S_a, A_u)$ and $P(u \in U^{\mathrm{op}}_{n+1}) = P_{n+1}(u, S_a, A_u)$. So, similar to (23), we have:

$$\begin{aligned}
P\big(I_{U^{\mathrm{OFB}}_n}(u) = 1 \,\big|\, S_a\big) \times \alpha_u L_u
&= P_n(u, S_a, A_u)\, \alpha_u L_u \\
&= P(A_u)\, P_n(u, S_a \mid A_u)\, \alpha_u L_u \\
&= P(A_u)\, P_n(u, S_a)\, \alpha_u L_u \\
&\ge \frac{1}{\alpha\beta^{s_{\max}}}\, P(A_u)\, P_{n+1}(u, S_a)\, \alpha_u L_u \\
&= \frac{1}{\alpha\beta^{s_{\max}}}\, P(A_u)\, P_{n+1}(u, S_a \mid A_u)\, \alpha_u L_u \\
&= \frac{1}{\alpha\beta^{s_{\max}}}\, P_{n+1}(u, S_a, A_u)\, \alpha_u L_u.
\end{aligned} \tag{28}$$

Summing (28a) and (28b) over all $u \in U$ will yield (27).
where $D(u', U^{\mathrm{ONR}}_{n+1})$ means that $u'$ is not disjunctive with $U^{\mathrm{ONR}}_{n+1}$. The inequality in (30b) is derived by an application of Lemma 5. Now, we simplify the second term on the right-hand side (RHS) of (30b). We know that the SUs $u' \in U^{\mathrm{OFB}}_n$ satisfying $D(u', U^{\mathrm{ONR}}_{n+1})$ are mutually disjunctive and are jointly admitted and scheduled in period $n$ given the resources available in period $n$. Therefore, the reason these SUs do not belong to $U^{\mathrm{ONR}}_{n+1}$ is that their weighted data size is smaller than that of the SUs appearing in $U^{\mathrm{ONR}}_{n+1}$. Next, we assume that each SU $\nu \in U^{\mathrm{ONR}}_{n+1}$ has caused the absence of a set $C_\nu \subseteq U^{\mathrm{OFB}}_n$ from $U^{\mathrm{ONR}}_{n+1}$. In other words, $C_\nu$ is the part of the SUs $u' \in U^{\mathrm{OFB}}_n$ satisfying $D(u', U^{\mathrm{ONR}}_{n+1})$ that is omitted from $U^{\mathrm{ONR}}_{n+1}$ due to not being disjunctive with $\nu \in U^{\mathrm{ONR}}_{n+1}$. Consequently, we have $U^{\mathrm{ONR}}_{n+1}$ is a disjunctive set. So, for two different SUs $\nu$ and $\nu'$ in $U^{\mathrm{ONR}}_{n+1}$, we have $C_\nu \cap C_{\nu'} = \emptyset$. Therefore, we can write the following $(1 - \zeta)$ Subsequently, we have Upon substituting this inequality into the second term on the RHS of (30b), the proof of Lemma 6 is completed.
Next, we combine Lemmas 1-4 and Lemma 6 to arrive at We maximize the RHS of the bound in (32) with respect to both $p$ and $\zeta$. Taking the derivative of the RHS with respect to $p$ and setting it equal to zero yields $p = \frac{7-\sqrt{17}}{8\sqrt{\alpha\beta^{s_{\max}}}}$. Then, we take the derivative with respect to $\zeta$ and set it equal to zero, which yields $\zeta = \frac{\sqrt{17}-3}{4}$. Substituting these values of $p$ and $\zeta$ into (32) completes the proof of Theorem 2.