TY - JOUR
T1 - Cost-aware learning and optimization for opportunistic spectrum access
AU - Gan, Chao
AU - Zhou, Ruida
AU - Yang, Jing
AU - Shen, Cong
N1 - Funding Information:
Manuscript received April 9, 2018; revised September 17, 2018 and November 21, 2018; accepted November 27, 2018. Date of publication December 10, 2018; date of current version March 7, 2019. The work of C. Gan and J. Yang was supported in part by the National Science Foundation under Grant ECCS-1650299. The work of R. Zhou and C. Shen was supported by the National Natural Science Foundation of China under Grant 61631017. The associate editor coordinating the review of this paper and approving it for publication was A. B. MacKenzie. (Corresponding author: Jing Yang.) C. Gan and J. Yang are with the School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA 16802 USA (e-mail: cug203@psu.edu; yangjing@psu.edu).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/3
Y1 - 2019/3
N2 - In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We consider a discrete-time model in which the time axis is partitioned into frames, each consisting of a sensing phase followed by a transmission phase. During the sensing phase, the user can sense a subset of channels sequentially before deciding to use one of them in the following transmission phase. We assume the channel states alternate between busy and idle from frame to frame according to independent Bernoulli random processes. To capture the inherent uncertainty in channel sensing, we model the reward of each transmission over an idle channel as a random variable, and we associate random costs with the sensing and transmission actions. Our objective is to understand how the costs and reward of the actions affect the optimal behavior of the user in both offline and online settings, and to design corresponding opportunistic spectrum access strategies that maximize the expected cumulative net reward (i.e., reward minus cost). We start with an offline setting where the statistics of the channel status, costs, and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. With these insights, we then study the online setting, where the statistical information of the channels, costs, and reward is unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales as O(log T). We also establish a matching lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis.
AB - In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We consider a discrete-time model in which the time axis is partitioned into frames, each consisting of a sensing phase followed by a transmission phase. During the sensing phase, the user can sense a subset of channels sequentially before deciding to use one of them in the following transmission phase. We assume the channel states alternate between busy and idle from frame to frame according to independent Bernoulli random processes. To capture the inherent uncertainty in channel sensing, we model the reward of each transmission over an idle channel as a random variable, and we associate random costs with the sensing and transmission actions. Our objective is to understand how the costs and reward of the actions affect the optimal behavior of the user in both offline and online settings, and to design corresponding opportunistic spectrum access strategies that maximize the expected cumulative net reward (i.e., reward minus cost). We start with an offline setting where the statistics of the channel status, costs, and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. With these insights, we then study the online setting, where the statistical information of the channels, costs, and reward is unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales as O(log T). We also establish a matching lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis.
UR - http://www.scopus.com/inward/record.url?scp=85058183635&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058183635&partnerID=8YFLogxK
U2 - 10.1109/TCCN.2018.2885790
DO - 10.1109/TCCN.2018.2885790
M3 - Article
AN - SCOPUS:85058183635
SN - 2332-7731
VL - 5
SP - 15
EP - 27
JO - IEEE Transactions on Cognitive Communications and Networking
JF - IEEE Transactions on Cognitive Communications and Networking
IS - 1
M1 - 8570777
ER -