
Massive MIMO Channel Prediction Using Recurrent Neural Networks

In this work, the authors demonstrate a low-complexity channel prediction method using neural networks. Specifically, they explore the power of recurrent neural networks with long short-term memory (LSTM) cells in analyzing time series data to achieve accurate channel prediction.

Published on Aug 12, 2020


Massive MIMO is regarded as one of the most promising wireless communication technologies because of properties such as high user capacity, increased spectral efficiency, and diversity. Given the exponential increase in connected devices, these properties are of great importance for the current 5G-IoT era and for future telecommunication networks. However, outdated channel state information (CSI), caused by variations in the channel response under high mobility and rich scattering, is a major problem facing massive MIMO systems. Outdated CSI occurs when the channel changes after the transmitter has obtained information about it but before transmission takes place, which degrades network performance. In this work, we demonstrate a low-complexity channel prediction method using neural networks. Specifically, we explore the power of recurrent neural networks with long short-term memory (LSTM) cells in analyzing time series data. We review various neural network-based channel prediction methods available in the literature and compare their complexity and performance metrics. Results indicate that the proposed method outperforms conventional systems by substantially lowering the complexity associated with channel prediction.

INDEX TERMS: Massive MIMO, Neural networks, RNN, Artificial Intelligence, Channel state information.


Multiple input multiple output (MIMO) employs multiple antennas at the transmitter and/or receiver. This technology has highly desirable properties such as high throughput, high spectral efficiency, and multiplexing gains [1]. MIMO has evolved from a research concept into a real-world technology and has been integrated into state-of-the-art wireless network standards such as IEEE 802.11n, 3GPP long-term evolution (LTE), and LTE-Advanced (E-UTRA) [2]. In a massive MIMO (mMIMO) system, the number of antennas increases to hundreds. mMIMO is regarded as one of the most promising wireless communication technologies because of its high user capacity [3], which is a key requirement for 5G-IoT and beyond [4][5][6].
Nevertheless, mMIMO and wireless communication in general face numerous challenges, driven largely by the growing number of users and connected devices [7][8][9][10]. Outdated channel state information (CSI), caused by variations in the channel response under high mobility and rich scattering, is a major problem facing mMIMO systems [3]. Outdated CSI occurs when the channel changes after information about it has been obtained but before transmission. Solutions proposed in the literature to combat outdated CSI are mainly divided into passive and sub-optimal methods [11]. Passive methods compensate for the performance loss at the cost of wireless resources (frequency, time, power), while sub-optimal methods treat imperfect CSI as a constraint and aim to recover only part of the performance [12].
For example, in time division duplex (TDD) systems the same frequency is used for both uplink and downlink, so the channel is reciprocal. The downlink channel is therefore estimated from the uplink channel in conventional TDD systems [13]. However, the growing number of devices, driven by population growth and an IoT era in which everything is connected to everything, has forced engineers to use ultra-high bands, including millimeter-wave and terahertz bands, in wireless communication [14]. Channel coherence time is significantly reduced in ultra-high bands and becomes shorter than the pilot transmission time. Consequently, the conventional uplink-downlink channel state acquisition method provides outdated CSI, resulting in significant degradation of system performance. In light of this, an efficient and reliable channel state prediction system is needed.
Moreover, regarding channel modeling for mMIMO channel prediction, the Rayleigh distance between the transmitter and receiver in MIMO is defined as $2L^2/\lambda$, where $L$ is the antenna dimension and $\lambda$ is the carrier wavelength [15]. Unlike MIMO, mMIMO has a large number of antennas, which may cause the distance between the receiver and transmitter to be shorter than the Rayleigh distance. Therefore, the far-field assumption defined in [16] cannot be used, and we instead use the near-field assumption defined in [17]. Moreover, non-stationarity is also considered in mMIMO due to the changing antennas and the varying physical environment [16][17].
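As a rough illustration of why the near-field regime becomes relevant for large arrays, the sketch below evaluates the Rayleigh distance $2L^2/\lambda$. The function name, the 1 m aperture, and the 28 GHz carrier are assumptions chosen only for illustration; they are not values taken from this work.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rayleigh_distance(aperture_m, carrier_hz):
    """Return the Rayleigh distance 2 * L^2 / lambda in metres."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return 2.0 * aperture_m ** 2 / wavelength


# Hypothetical example: a 1 m array at 28 GHz (illustrative values only).
distance = rayleigh_distance(1.0, 28e9)
print(f"Rayleigh distance: {distance:.1f} m")
```

Because the distance grows with the square of the aperture, doubling the array size quadruples the range within which the far-field assumption no longer holds.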
Neural networks (NN), as an AI technique, are an effective and recently proposed approach for combating outdated CSI without wasting resources [18]. They are highly valued because their data-driven nature avoids explicit parameter estimation. Channel prediction is viewed as a revolutionary future technology and has therefore attracted the attention of many researchers [3]. Additionally, channel prediction allows transmit parameters such as transmit power, coding rate, transmit antennas, and carrier frequency to be selected according to the instantaneous condition of the channel, which will greatly improve the performance of adaptive wireless communication systems (which are the future of effective wireless communication).
In this work, we concentrate on implementing channel state prediction in mMIMO using artificial intelligence, specifically recurrent neural networks (RNNs). Fig. 1 depicts a mMIMO BS with hundreds of antennas and multiple users.
RNNs are very effective at processing time series data. Since channel response data is closely related to time series data, we regard mMIMO channel prediction using RNNs as a technology with great potential that will have a major impact on wireless technology. Moreover, we compare the performance metrics of conventional CSI prediction processes with those of RNN-based prediction. Lastly, an RNN model utilizing LSTMs is designed. The main contributions of this work are:

  1. Propose a low cost mMIMO RNN-based CSI predictor.

  2. Provide quick answers about RNN-based mMIMO CSI prediction.

  3. Demonstrate performance metrics between conventional CSI predictors and RNN-based predictors, in terms of complexity and cost.

  4. Develop a mMIMO channel prediction method for 128 transmit antennas.

  5. Review recent and common neural network channel prediction schemes.

The remainder of this work is organized as follows. In Section II we review the literature related to this topic. In Section III we present the mMIMO system model. Section IV discusses commonly used machine learning MIMO channel prediction models and their limitations, while Section V explains the RNN-based mMIMO channel prediction process using the RNN predictor. Section VI discusses the mMIMO channel prediction process. In Section VII we discuss the simulation results, and finally the conclusion is presented in Section VIII.





ACM: adaptive coded modulation
AI: artificial intelligence
AOA: angle of arrival
AOD: angle of departure
AP: access point
BER: bit error rate
BS: base station
CNN: convolutional neural network
CSI: channel state information
ESPRIT: estimation of signal parameters via rotational invariance techniques
FDD: frequency division duplex
IEP: interval of effective prediction
LSTM: long short-term memory
LTE: long-term evolution
MIMO: multiple input multiple output
MUSIC: multiple signal classification
NMSE: normalized mean square error
OFDM: orthogonal frequency division multiplexing
RNN: recurrent neural network
SCNet: sparse complex-valued neural network
TDD: time division duplex
ULA: uniform linear array

FIGURE 1. Massive multiple input multiple output (mMIMO) model.


According to [19], frequency division duplexing (FDD) based networks are superior to TDD-based networks due to their low latency and high anti-interference properties. Nevertheless, the computation and feedback required to predict downlink channel state information (DL-CSI) are the main constraints on advancing performance in FDD cellular networks. To solve this problem, the authors in [19] propose a deep learning convolutional long short-term memory network (convLSTM-net) that predicts DL-CSI from uplink channel state information (UL-CSI). The proposed convLSTM-net consists of two modules. The first is the feature extraction module, responsible for learning spatial and temporal correlations between DL-CSI and UL-CSI. The second is the prediction module, which maps the extracted features to the reconstruction of DL-CSI. Moreover, the authors compare the performance of convolutional neural networks (CNN) and long short-term memory networks (LSTM) with that of convLSTM-net. The performance simulation was divided into two parts. In part one, the hyperparameters of the proposed convLSTM-net are analyzed to investigate their effects on prediction performance. In part two, the experiment is performed in both the time and frequency domains to determine the proper environment for accurate prediction of DL-CSI. According to Wang et al. [19], the results show that convLSTM-net outperforms both CNN and LSTM in DL-CSI prediction using UL-CSI in cellular FDD networks. However, the proposed convLSTM-net has high complexity, as is evident from its design.
In addition, authors in [20] claim that in an FDD mMIMO system, the acquisition of downlink CSI is a complex task due to the overheads required for downlink training and uplink feedback. The authors propose a sparse complex-valued neural network (SCNet) system used to map uplink to downlink. Unlike the previous system, this paradigm is modeled in the complex domain and can learn the complex-valued mapping function by off-line training. According to the authors, numerical results suggest that SCNet outperforms conventional deep network-based channel prediction in terms of prediction accuracy and robustness over complex wireless channels.
Authors in [21] propose an ESPRIT-based parameter prediction model for narrow-band MIMO systems that exploit both temporal and spatial correlations in practical MIMO channels. The model estimates channel parameters by employing a vector transmit spatial signature model and two-dimensional ESPRIT. According to the authors, the proposed scheme is suitable for both two-dimensional azimuths only and three-dimensional MIMO spatial channel models.
Authors in [22] utilize back-propagation (BP) in a multilayered neural network to model a multi-time channel prediction system. The paradigm is used to effectively predict CSI and enhance mMIMO performance, power control, and artificial noise physical layer security design. Additionally, the authors use an early stopping criterion to prevent over-fitting of BP in neural networks. According to the authors, a comparison of the predicted normalized mean square error (NMSE) indicates that the proposed model improves performance. Additionally, a sparse channel construction model that saves system resources without deteriorating performance is proposed.
According to [23], an mMIMO channel is characterized by non-stationarity and quick variations. Therefore, using conventional methods to obtain CSI results in outdated CSI and consequently degrades the performance of the network. Authors in [23] propose a channel prediction algorithm for massive MIMO based on a first-order Taylor expansion model to handle these channel characteristics. Moreover, a channel prediction model with estimation and prediction sections is proposed to derive an interval of effective prediction (IEP). According to the authors, numerical simulations of the proposed algorithm indicate that reliable channel prediction can be achieved within the IEP.
By matching time-varying wireless fading channels to transmitter parameters, known as adaptive modulation, system throughput is considerably improved. Authors in [24] propose a channel prediction scheme using pilot symbol assisted modulation for MIMO Rayleigh fading channels. Moreover, the effect of the channel prediction error on the bit error rate of a transmit beam-former is analyzed. According to [24], the results obtained indicate a critical value, under which the adaptive modulator can consider the predicted channel as perfect, and above which categorical attention of the channel’s imperfection must be accounted for at the transmitter.
Authors in [13] propose a split-brain autoencoder system, which is a modified regular autoencoder for learning. The paradigm divides the network into two disjoint parts, each of which performs complex channel prediction tasks. According to the authors, the method produced state-of-the-art results on several large-scale learning benchmarks. Moreover, authors in [1] provide a comprehensive survey on the application of recurrent neural networks (RNN) in channel prediction. The complexity and performance of the predictors are compared using numerical results.
Adaptive coded modulation (ACM) is a promising method used to enhance spectral efficiency in a time-varying mobile channel with no effect on the targeted bit error rate (BER) [25]. However, the transmitter must have perfect up-to-date channel characteristics for ACM to work. Authors in [25] propose a linear fading-envelope predictor to predict the CSI. Moreover, authors in [26] propose a CSI prediction algorithm for OFDM that determines time-delays and Doppler frequencies of each propagation path. According to the authors, the method requires less feedback information and has better mean-squared error performance than previous methods.
The model proposed in this work differs from the works reviewed above in that it strives to provide a low-cost, low-complexity CSI predictor with the ability to support future wireless technologies such as mMIMO and mm-Wave. Key contributions are:

  1. Propose a low-complexity, low-cost mMIMO RNN-based CSI predictor.

  2. Demonstrate performance metrics between conventional CSI predictors and RNN-based predictors, in terms of complexity and cost.

  3. Develop an RNN-based predictor for mMIMO.

In the next section, we will discuss the mMIMO system model.


In a time-varying mMIMO system, there are $M_T$ transmit antennas and $M_R$ receive antennas, where at any instant $M_T \leq M_R$ [23]. Each antenna transmits over $N$ time slots in a given transmission, where $N \geq M_T$. Considering an instantaneous transmit and receive signal in mMIMO, the baseband received signal can be written as:

$$\mathbf{r}(t) = \mathbf{H}(t)\mathbf{s}(t) + \mathbf{n}(t), \tag{1}$$

where $\mathbf{r}(t)=[r_1(t),\ldots,r_{N_r}(t)]^T$ is the $N_r \times 1$ vector of the received signal at time $t$ ($N_r$ is the number of receive antennas), $\mathbf{s}(t)=[s_1(t),\ldots,s_{N_t}(t)]^T$ is the $N_t \times 1$ vector of the transmitted signal at time $t$ ($N_t$ is the number of transmit antennas), and $\mathbf{n}(t)$ is the additive noise vector. $\mathbf{H}(t)=[h_{n_r n_t}(t)]_{N_r \times N_t}$ is the matrix of the continuous channel impulse response, and $h_{n_r n_t} \in \mathbb{C}^{1 \times 1}$ is the flat fading channel gain between transmit antenna $n_t$ and receive antenna $n_r$, with $1 \leq n_r \leq N_r$ and $1 \leq n_t \leq N_t$. Due to multipath fading, feedback delays, and processing delays, the CSI obtained at the transmitter may be outdated before it can be used. That is, $\mathbf{H}(t) \neq \mathbf{H}(t+\tau)$, which leads to performance degradation of adaptive communication systems [2]. The goal of channel prediction is to produce, at time $t$, an estimate $\hat{\mathbf{H}}(t+\tau)$ that is as close as possible to the actual value at $(t+\tau)$. That is, $\hat{\mathbf{H}}(t+\tau) \rightarrow \mathbf{H}(t+\tau)$. Closely related MIMO prediction techniques are briefly discussed in the subsequent section.


Apart from RNN-based predictors, several other methods have been proposed for mMIMO CSI prediction. Parametric and autoregressive channel prediction models are the most popular techniques [1]. In this section, these models are briefly discussed to give the reader a clear distinction from the proposed work.


As stated in [6][27], popular multipath fading models represent a single-antenna channel as a superposition of complex sinusoids:

$$h(t) = \sum_{p=1}^{P} \alpha_p e^{j(\omega_p t + \phi_p)}, \tag{2}$$

where $\alpha_p$ is the complex amplitude, $\omega_p$ is the Doppler frequency shift in radians of the $p^{th}$ sinusoid, $\phi_p$ is its phase, $j$ is the imaginary unit ($j^2=-1$), and $P$ is the total number of scattered sinusoids. Introducing spatial parameters extends the single-antenna model in equation (2) to the MIMO propagation model in equation (3):

$$\mathbf{H}(t) = \sum_{p=1}^{P} \alpha_p \mathbf{a}_r(\theta_p) \mathbf{a}_t^T(\psi_p) e^{j(\omega_p t + \phi_p)}, \tag{3}$$

where $\theta_p$ and $\psi_p$ are the angle of arrival (AOA) and angle of departure (AOD), respectively, and $\mathbf{a}_r$ and $\mathbf{a}_t$ are the response vectors of the receiver and transmitter antenna arrays, respectively. For a uniform linear array (ULA) with $M$ equally spaced antenna elements, $\mathbf{a}$ can be written as:

$$\mathbf{a}(x) = \left[1, e^{-j\frac{2\pi}{\lambda} d \sin(x)}, \ldots, e^{-j\frac{2\pi}{\lambda}(M-1) d \sin(x)}\right]^T, \tag{4}$$

where $x$ can be either the angle of arrival or the angle of departure, $\lambda$ is the wavelength of the sub-carrier frequency, and $d$ is the distance between antennas. According to [1], multipath parameters change slowly compared to the channel fading rate. Therefore, future CSI up to a certain period can be obtained by simply extrapolating the known multipath parameters. Hence, channel prediction in mMIMO using equation (3) reduces to parameter prediction, that is, a model that predicts the total number of scatterers, the angles of arrival and departure, and the Doppler shift for each path (i.e. $\{\hat{\alpha}_p, \hat{\omega}_p, \hat{\theta}_p, \hat{\psi}_p\}_{p=1}^{\hat{P}}$). In the following section, we model a prediction procedure for these parameters.
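The following is a minimal sketch of equations (3) and (4) in plain Python, followed by a small demonstration of how CSI becomes outdated as the delay grows. The two propagation paths, the half-wavelength spacing, the array sizes, and the helper names (`steering_vector`, `channel_matrix`, `nmse`) are all hypothetical choices made for illustration.

```python
import cmath
import math


def steering_vector(angle_rad, num_elements, spacing, wavelength):
    """ULA response vector of equation (4) as a list of complex numbers."""
    phase_step = -2.0 * math.pi / wavelength * spacing * math.sin(angle_rad)
    return [cmath.exp(1j * phase_step * m) for m in range(num_elements)]


def channel_matrix(t, paths, n_r, n_t, spacing, wavelength):
    """Equation (3): sum over paths of alpha * a_r(theta) a_t(psi)^T * exp(j(omega t + phi))."""
    H = [[0j] * n_t for _ in range(n_r)]
    for alpha, omega, phi, theta, psi in paths:
        a_r = steering_vector(theta, n_r, spacing, wavelength)
        a_t = steering_vector(psi, n_t, spacing, wavelength)
        gain = alpha * cmath.exp(1j * (omega * t + phi))
        for i in range(n_r):
            for k in range(n_t):
                H[i][k] += gain * a_r[i] * a_t[k]
    return H


def nmse(H_true, H_est):
    """Normalized squared error between two equally sized matrices."""
    num = sum(abs(a - b) ** 2 for ra, rb in zip(H_true, H_est) for a, b in zip(ra, rb))
    den = sum(abs(a) ** 2 for ra in H_true for a in ra)
    return num / den


# Two hypothetical paths: (alpha, omega [rad/s], phi, AOA, AOD).
paths = [(1.0, 2 * math.pi * 50.0, 0.0, 0.3, -0.2),
         (0.5, -2 * math.pi * 80.0, 1.0, -0.7, 0.6)]
H_now = channel_matrix(0.0, paths, n_r=2, n_t=4, spacing=0.5, wavelength=1.0)
for delay in (0.0005, 0.002, 0.005):
    H_later = channel_matrix(delay, paths, 2, 4, 0.5, 1.0)
    print(f"delay {delay * 1e3:.1f} ms -> NMSE of outdated CSI: {nmse(H_later, H_now):.3f}")
```

Reusing `H_now` as the estimate of the later channel is exactly the outdated-CSI situation described earlier; a predictor aims to reduce this error by extrapolating the path parameters instead.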

1) Parametric model prediction procedure

  1. Step 1:
    We define $K$ independent discrete-time channels $\{\mathbf{H}[k] \mid k=1,\ldots,K\}$, sampled from the continuous channel response $\mathbf{H}(t)$. We can therefore build a large matrix containing all the required translational invariance structure in all dimensions. According to [21], a block-Hankel matrix with dimensions $N_r Q \times N_t S$ is represented as follows:

    $$\hat{\mathbf{D}} = \begin{bmatrix} \mathbf{H}[1] & \mathbf{H}[2] & \cdots & \mathbf{H}[S]\\ \mathbf{H}[2] & \mathbf{H}[3] & \cdots & \mathbf{H}[S+1]\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{H}[Q] & \mathbf{H}[Q+1] & \cdots & \mathbf{H}[K] \end{bmatrix}, \tag{5}$$

    where $Q$ is the size of the Hankel matrix and $S=K-Q+1$. We use equation (5) to calculate a spatio-temporal covariance matrix $\hat{\mathbf{C}}$:

    $$\hat{\mathbf{C}} = \frac{\hat{\mathbf{D}}\hat{\mathbf{D}}^H}{N_t S}, \tag{6}$$

    where $(\cdot)^H$ represents the Hermitian (conjugate) transpose.

  2. Step 2:
    We estimate the number of dominant scattering sources $\hat{P}$ using the minimum description length (MDL) criterion described in [28]:

    $$\hat{P} = \arg\min_{z=1,\ldots,(N_r Q-1)} \left[ S\log(\lambda_z) + \frac{(z^2 + z)\log(S)}{z} \right], \tag{7}$$

    where $\lambda_z$ represents the $z^{th}$ eigenvalue of $\hat{\mathbf{C}}$.

  3. Step 3:
    Algorithms such as multiple signal classification (MUSIC) and estimation of signal parameters by rotational invariance techniques (ESPRIT) are used to find $\{\hat{\omega}_p, \hat{\theta}_p, \hat{\psi}_p\}_{p=1}^{\hat{P}}$ from $\hat{\mathbf{C}}$.

  4. Step 4:
    Now that we have $\{\hat{\omega}_p, \hat{\theta}_p, \hat{\psi}_p\}_{p=1}^{\hat{P}}$, we calculate $\{\hat{\alpha}_p\}_{p=1}^{\hat{P}}$ by substituting the former parameters into equation (3), which gives the equation below.

    $$\mathbf{H}(\tau) = \sum_{p=1}^{P} \alpha_p \mathbf{a}_r(\theta_p) \mathbf{a}_t^T(\psi_p) e^{j(\omega_p \tau + \phi_p)}, \tag{8}$$

    where $\tau$ is the number of time steps predicted.
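Step 1 of the procedure can be sketched as follows. This is an illustrative plain-Python construction of the block-Hankel matrix of equation (5) and the covariance of equation (6); the toy sample values and the function names `block_hankel` and `covariance` are placeholders rather than part of the original method. Indices are zero-based, so `samples[0]` plays the role of $\mathbf{H}[1]$.

```python
def block_hankel(samples, Q):
    """Stack K channel matrices (each N_r x N_t) into the N_r*Q x N_t*S matrix of equation (5)."""
    K = len(samples)
    S = K - Q + 1
    n_r = len(samples[0])
    D = []
    for q in range(Q):              # block row
        for r in range(n_r):        # row inside the block
            row = []
            for s in range(S):      # block column (q, s) holds samples[q + s]
                row.extend(samples[q + s][r])
            D.append(row)
    return D


def covariance(D, n_t, S):
    """Equation (6): C = D D^H / (N_t * S)."""
    size = len(D)
    scale = 1.0 / (n_t * S)
    return [[scale * sum(a * b.conjugate() for a, b in zip(D[i], D[k]))
             for k in range(size)] for i in range(size)]


# Toy data: K = 5 samples of a 2 x 2 channel (values are arbitrary placeholders).
K, Q, n_r, n_t = 5, 3, 2, 2
samples = [[[complex(k + r, c - k) for c in range(n_t)] for r in range(n_r)]
           for k in range(K)]
D = block_hankel(samples, Q)
C = covariance(D, n_t, K - Q + 1)
print(len(D), len(D[0]), len(C))
```

The eigenvalues of `C` would then feed the MDL criterion of Step 2; an eigenvalue routine is left out here to keep the sketch dependency-free.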

From the process described above, it is evident that this model has several constraints. The estimation process is tedious and has high complexity due to the manipulation of high-order matrices. Moreover, equations (3) and (4) are highly dependent on the type of array used. That is, if a different kind of array is used, then equations (3) and (4) must be adjusted accordingly. Finally, the obtained prediction becomes outdated faster in a continuously changing environment than in a static one.


The mMIMO time-varying channel can alternatively be formulated as an autoregressive (AR) process, where Kalman filters (KF) are used to compute the AR coefficients of a linear predictor that estimates future CSI from current and past CSI [1]. The AR predictor for mMIMO can be represented as:

$$\hat{\mathbf{H}}[t+1] = \sum_{p=1}^{P} \mathbf{A}_p \bigcirc \mathbf{H}[t-p+1], \tag{9}$$

where $\mathbf{A}_p = \{a^p_{n_r n_t}\}$ is an $N_r \times N_t$ AR coefficient matrix such that $a^p_{n_r n_t}$ is the $p^{th}$ AR coefficient of the channel between transmitter $n_t$ and receiver $n_r$, $P$ here denotes the order of the AR model, and $\bigcirc$ denotes element-wise multiplication. Other predictors proposed in the literature include maximum-likelihood (ML) estimation, least-squares (LS) estimation, and minimum-mean-square-error (MMSE) estimation.
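To make the element-wise structure of equation (9) concrete, the sketch below applies a second-order AR predictor to a toy $1 \times 2$ channel. As a simplifying assumption, the AR coefficients are set analytically for entries made of two complex sinusoids (a case where an order-2 recursion is exact) instead of being estimated with a Kalman filter; the Doppler values and function names are hypothetical.

```python
import cmath


def ar_predict(coeffs, history):
    """One-step AR prediction following equation (9).

    coeffs[p] is the coefficient matrix A_{p+1}; history[p] is H[t - p],
    so history[0] is the most recent channel matrix.
    """
    rows, cols = len(history[0]), len(history[0][0])
    pred = [[0j] * cols for _ in range(rows)]
    for A, H in zip(coeffs, history):
        for i in range(rows):
            for k in range(cols):
                pred[i][k] += A[i][k] * H[i][k]   # element-wise product
    return pred


def channel(t, dopplers):
    """Toy 1 x N_t channel: each entry is the sum of two complex sinusoids."""
    return [[cmath.exp(1j * w1 * t) + cmath.exp(1j * w2 * t) for (w1, w2) in dopplers]]


dopplers = [(0.10, -0.25), (0.30, 0.05)]   # hypothetical Doppler shifts, radians per sample
# For h[t] = z1^t + z2^t the exact recursion is h[t+1] = (z1 + z2) h[t] - z1 z2 h[t-1].
A1 = [[cmath.exp(1j * w1) + cmath.exp(1j * w2) for (w1, w2) in dopplers]]
A2 = [[-cmath.exp(1j * (w1 + w2)) for (w1, w2) in dopplers]]

t = 10
predicted = ar_predict([A1, A2], [channel(t, dopplers), channel(t - 1, dopplers)])
actual = channel(t + 1, dopplers)
error = max(abs(p - a) for p, a in zip(predicted[0], actual[0]))
print(f"max prediction error: {error:.2e}")
```

In practice the coefficients are unknown and must be estimated from noisy observations, which is where the complexity and accuracy limitations discussed in this section arise.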

CSI prediction using the above models faces a number of challenges, such as high complexity, low accuracy, lack of generality, limitation to single-step prediction, and unreliability. Hence, such methods are only suitable for small-scale estimation [29]. The CSI prediction process in RNNs is summarized in the subsequent section.


In this section, we discuss the RNN model, a powerful machine learning technique that has shown great potential in predicting time series data. RNNs are well suited to this task because they learn not only from the training data but also from the history of past events. There are different RNN models; Fig. 3 depicts the commonly known Jordan network. A simplified RNN consists of an input layer with $N_i$ neurons, a hidden layer with $N_h$ neurons, and an output layer with $N_o$ outputs. Each connection between the input layer, the hidden layer, and the output layer is assigned a weight. Let $w_{ln}$ denote the weight between the $n^{th}$ input and the $l^{th}$ hidden neuron, and $v_{ol}$ the weight between the $l^{th}$ hidden neuron and the $o^{th}$ output neuron, such that $1 \leq n \leq N_i$, $1 \leq l \leq N_h$, and $1 \leq o \leq N_o$. Therefore, we can construct an $N_h \times N_i$ weight matrix $\mathbf{W}$, shown in (10).

$$\mathbf{W} = \begin{bmatrix} w_{11} & \cdots & w_{1N_i}\\ \vdots & \ddots & \vdots \\ w_{N_h 1} & \cdots & w_{N_h N_i} \end{bmatrix}. \tag{10}$$

FIGURE 2. Activation functions for a neural network designed to introduce non-linearity.

FIGURE 3. Conventional recurrent neural network system.

The input activation vector is denoted as $\mathbf{x}(t)=[x_1(t),\ldots,x_{N_i}(t)]^T$, while the recurrent or feedback component is modeled as $\mathbf{f}(t)=[f_1(t),\ldots,f_{N_h}(t)]^T$. Therefore, the input to the hidden layer can be represented by the following equation:

$$\mathbf{z}_h(t) = \mathbf{W}\mathbf{x}(t) + \mathbf{f}(t) + \mathbf{b}_h, \tag{11}$$

where $\mathbf{b}_h=[b_1^h,\ldots,b_{N_h}^h]^T$ represents the bias in the hidden layer. In addition, we use an $N_h \times N_o$ matrix $\mathbf{F}$ to map the previous output $\mathbf{y}(t-1)=[y_1(t-1),\ldots,y_{N_o}(t-1)]^T$ to the feedback component of the current hidden-layer input. Hence,

$$\mathbf{f}(t)=\mathbf{F}\mathbf{y}(t-1). \tag{12}$$

The behavior of a neural network is determined by an activation function (AF). The activation function introduces nonlinearity into the system, which means the output is not simply a constant scaling of the input (i.e. the rate of change is not constant across all independent variables). This enables the network to solve complex problems. Common activation functions include linear, rectified linear, hyperbolic tangent, and leaky rectified linear functions (see Fig. 2). In this work, we use the sigmoid function to introduce nonlinearity. The sigmoid function is defined as:

$$S(x)= \frac{1}{1+e^{-x}}. \tag{13}$$

Substituting equations (11) and (12) into equation (13), we get the following equation:

$$\mathbf{h}(t)=S(\mathbf{z}_h(t))=S(\mathbf{W}\mathbf{x}(t)+\mathbf{F}\mathbf{y}(t-1)+\mathbf{b}_h). \tag{14}$$

$\mathbf{z}_h(t)$ is of size $N_h \times 1$; therefore, equation (14) is evaluated element-wise. That is, $S(\mathbf{z}_h)=[S(z_1),\ldots,S(z_{N_h})]^T$. A weight matrix $\mathbf{V}$ with dimensions $N_o \times N_h$ exists between the hidden and the output layer. The input to the output layer therefore becomes $\mathbf{z}_o(t)=\mathbf{V}\mathbf{h}(t)+\mathbf{b}_y$, where $\mathbf{b}_y$ is the bias vector for the output layer, of size $N_o \times 1$. The output equation is derived as:

$$\mathbf{y}(t)=S(\mathbf{z}_o(t))=S(\mathbf{V}\mathbf{h}(t)+ \mathbf{b}_y). \tag{15}$$
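A minimal sketch of one forward step through the Jordan-style network of equations (11) to (15) is given below, using only the Python standard library. The layer sizes, the random weights, and the helper names are assumptions made for illustration; no training is performed.

```python
import math
import random


def sigmoid(x):
    """Equation (13) with the negative exponent."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def jordan_step(x, y_prev, W, F, V, b_h, b_y):
    """One time step of equations (11)-(15)."""
    f = matvec(F, y_prev)                                         # (12) feedback from previous output
    z_h = [a + b + c for a, b, c in zip(matvec(W, x), f, b_h)]    # (11)
    h = [sigmoid(z) for z in z_h]                                 # (14)
    z_o = [a + b for a, b in zip(matvec(V, h), b_y)]
    return [sigmoid(z) for z in z_o]                              # (15)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N_i, N_h, N_o = 3, 4, 2          # hypothetical layer sizes
W = random_matrix(N_h, N_i, rng)
F = random_matrix(N_h, N_o, rng)
V = random_matrix(N_o, N_h, rng)
b_h, b_y = [0.0] * N_h, [0.0] * N_o

y = [0.0] * N_o                  # no previous output at the first step
for x in ([0.1, 0.2, 0.3], [0.0, -0.4, 0.5], [0.9, 0.1, -0.2]):
    y = jordan_step(x, y, W, F, V, b_h, b_y)
print(y)
```

Note how the output of each step is fed back through `F` into the next step, which is what gives the network its memory of past inputs.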

However, RNNs suffer from vanishing and exploding gradient problems. The vanishing gradient problem arises during back-propagation, when the partial derivative of the loss function with respect to a weight progressively diminishes and therefore has almost no effect on the weights during gradient descent. The exploding gradient problem, on the other hand, arises when large error gradients accumulate, causing large updates to the network weights during training. LSTM (long short-term memory) networks are a special type of RNN that uses gates to overcome the problems experienced by conventional RNNs. The gates in LSTM networks include input, output, and forget gates; they provide better control of gradient flow and better preservation of long-range dependencies. In this work, we utilize LSTMs as shown in Fig. 4. The equations that describe the functionality of an LSTM unit cell are given below.

$$i_t = \sigma(W^{(i)}x_t + U^{(i)}h_{t-1}), \tag{16}$$

$$f_t = \sigma(W^{(f)}x_t + U^{(f)}h_{t-1}), \tag{17}$$

$$o_t = \sigma(W^{(o)}x_t + U^{(o)}h_{t-1}), \tag{18}$$

$$\hat{c}_t = \tanh(W^{(c)}x_t + U^{(c)}h_{t-1}), \tag{19}$$

$$c_t = f_t \bigcirc c_{t-1} + i_t \bigcirc \hat{c}_t, \tag{20}$$

$$h_t = o_t \bigcirc \tanh(c_t). \tag{21}$$

$\bigcirc$ denotes element-wise multiplication. $W$ is the weight matrix between the current input and the hidden layer, and $U$ is the recurrent connection between the previous and the current hidden states. $\hat{c}_t$ is a candidate state calculated from the current input and the previous hidden state. $c_t$ is the internal memory of the cell, which is the sum of (previous memory $\bigcirc$ forget gate) and (newly computed candidate state $\bigcirc$ input gate).
The first operation in an LSTM layer is to decide whether information is kept or discarded. This operation is done in the forget gate, which produces a number between 0 and 1, where 1 means keep the information and 0 means forget it completely. The next step is to decide which new information will be stored in the cell state. This is done in two parts: first, a sigmoid function called the input gate determines which values will be updated; then a tanh layer determines the candidate cell states to be added. Next, the previous cell state $c_{t-1}$ is updated, that is, $c_t = f_t \bigcirc c_{t-1} + i_t \bigcirc \hat{c}_t$, where $f_t$ makes the old state forget the information discarded in the forget gate. Finally, the output of the cell is produced by the output gate, a sigmoid layer that determines which part of the cell state will be output. The cell state is passed through a tanh layer, which forces its values to lie between -1 and 1, and the output of the output gate is multiplied by the output of the tanh layer to give the output of the cell.
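The gate operations described above can be sketched as a single LSTM time step following equations (16) to (21). This is an illustrative, bias-free implementation in plain Python that mirrors the equations as written; the weight values, dimensions, and function names are arbitrary assumptions, and a practical model would use a deep learning framework instead.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lstm_step(x, h_prev, c_prev, W, U):
    """One LSTM step. W and U are dicts of matrices keyed by 'i', 'f', 'o', 'c'."""
    def pre(gate):
        # W acts on the current input, U on the previous hidden state.
        return [a + b for a, b in zip(matvec(W[gate], x), matvec(U[gate], h_prev))]

    i = [sigmoid(z) for z in pre("i")]                    # (16) input gate
    f = [sigmoid(z) for z in pre("f")]                    # (17) forget gate
    o = [sigmoid(z) for z in pre("o")]                    # (18) output gate
    c_hat = [math.tanh(z) for z in pre("c")]              # (19) candidate state
    c = [fk * ck + ik * chk                               # (20) new cell state
         for fk, ck, ik, chk in zip(f, c_prev, i, c_hat)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]      # (21) cell output
    return h, c


n_in, n_hid = 1, 2                                        # hypothetical sizes
W = {g: [[0.5] * n_in for _ in range(n_hid)] for g in "ifoc"}
U = {g: [[0.1] * n_hid for _ in range(n_hid)] for g in "ifoc"}

h, c = [0.0] * n_hid, [0.0] * n_hid                       # zero initial states
for sample in (0.2, -0.1, 0.4):
    h, c = lstm_step([sample], h, c, W, U)
print(h, c)
```

With all weights set to zero every gate evaluates to 0.5 and the candidate state to 0, so the cell state simply halves at each step; this is a convenient sanity check on the update rule.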
The forget gate helps an LSTM cell decide which information should be discarded and which should be used to update the model's parameters at each time step. This helps the model guard against vanishing and exploding gradients. To see this, suppose that the gradient of the error $E$ with respect to some weight $W$ vanishes at some time step $k<T$:

$$\sum_{t=1}^{k}\frac{\partial E_t}{\partial W} \rightarrow 0, \tag{22}$$

then, for the gradient not to vanish, a suitable parameter is found for the next time step such that:

$$\frac{\partial E_{k+1}}{\partial W} \nrightarrow 0. \tag{23}$$

The presence of the activation vector in the forget gate allows it to find such a parameter.
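A rough numerical intuition for this argument is sketched below. It is a deliberately simplified view that ignores the weight matrices and the indirect dependence of the gates on earlier states: in a sigmoid RNN each back-propagation step multiplies the gradient by a sigmoid slope of at most 0.25, whereas along the cell-state path of equation (20) each step multiplies it by the forget-gate value $f_t$. The value 0.95 is a hypothetical forget-gate activation.

```python
def path_gradient(factor, steps):
    """Product of `steps` identical per-step factors along a back-propagation path."""
    result = 1.0
    for _ in range(steps):
        result *= factor
    return result


steps = 20
sigmoid_bound = 0.25    # maximum slope of the sigmoid, reached at x = 0
forget_gate = 0.95      # hypothetical forget-gate activation close to 1

print(f"sigmoid path after {steps} steps:    {path_gradient(sigmoid_bound, steps):.3e}")
print(f"cell-state path after {steps} steps: {path_gradient(forget_gate, steps):.3e}")
```

The first product collapses toward zero within a few steps, while the second stays at a usable magnitude as long as the forget gate remains close to 1.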
The data set used is divided into two parts, a training set and a prediction set. The training set is used to train the model through backpropagation (BP). The model takes in the input $\mathbf{x}$ and the desired output $\mathbf{y}$, produces an output $\hat{\mathbf{y}}$, and then calculates the cost $C=\|\mathbf{y}-\hat{\mathbf{y}}\|^2$. The error is then propagated back through the network, causing the weights to adjust iteratively until convergence is obtained and a minimum cost is achieved. The initialization process required before prediction with LSTMs can begin is described below.

FIGURE 4. Long Short Term Memory (LSTM) unit cell.


  1. Randomly initialize $\mathbf{W}, \mathbf{V}, \mathbf{b}_h, \mathbf{b}_y$.

  2. Define the sequence length. This is the number of historical samples considered when performing the prediction.

  3. Set the future prediction period. This defines the number of time units predicted into the future.

  4. Set the number of epochs. This is the number of times the entire data set is passed through the network.

  5. Set the batch size. This is the number of samples, out of the total, that are grouped and processed at a time.

  6. Preprocess the data. Data is scaled and cleaned, i.e. NaNs and empty rows are removed.

  7. Split the data into validation and training data sets.

  8. Create an LSTM model.
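Steps 2, 3, and 7 of the list above can be sketched as a simple windowing routine. The function names, the toy series, and the window sizes are hypothetical; the 60% training fraction follows the split reported in the text that follows.

```python
def make_windows(series, seq_len, horizon):
    """Pair each window of `seq_len` past samples with the sample `horizon` steps ahead."""
    inputs, targets = [], []
    for start in range(len(series) - seq_len - horizon + 1):
        end = start + seq_len
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


def split(items, train_fraction):
    """Chronological split: the first part is for training, the rest is held out."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


series = [float(v) for v in range(20)]       # placeholder for one channel coefficient over time
X, y = make_windows(series, seq_len=5, horizon=2)
X_train, X_val = split(X, 0.6)
y_train, y_val = split(y, 0.6)
print(len(X), len(X_train), len(X_val))
```

The split is kept chronological rather than shuffled so that the held-out windows always lie in the future of the training windows, which matches how a channel predictor would be used.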


The data set $\mathbf{h}$ is divided into two parts, a training set (60%) and a testing/prediction set (40%). The training set is used to train the model through backpropagation (BP). Referring to Fig. 4, the model takes in the input $\mathbf{x} = \mathbf{h}(t)$ and the desired output $\mathbf{y} = \mathbf{h}(t + D)$, where $D$ is the number of steps to be predicted into the future. The model then calculates the cost $C$ using the mean squared error (MSE):

$$C=\frac{1}{T} \sum_{t=1}^{T}\|\hat{\mathbf{h}}(t + D)-\mathbf{h}(t + D)\|^2, \tag{24}$$

where $T$ is the total number of channel samples. The error is then propagated back through the network, causing the weights to adjust iteratively through gradient descent until convergence is obtained and a minimum cost is achieved. Initial states such as $c_{t-1}$ and $h_{t-1}$ are generally initialized with zeros, while the weights are randomly initialized. The testing set is then used to measure the accuracy of the network. A 2-layer mMIMO RNN-based predictor utilizing LSTM was trained using 200,000 samples of $\mathbf{h}$ from 128 antennas, generated with a Rayleigh distribution. In this simulation, Python 3.0 was used, with Keras 2.3.1 as the deep learning framework together with Keras-Applications 1.0.8 and Keras-Preprocessing 1.1.0. A summary of the network, indicating the number of parameters and the network topology, is given in Table 2. The Adam optimizer was employed with an initial learning rate of 0.01 and a batch size of 256. Moreover, we use dropout regularization to prevent overfitting.
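The cost of equation (24) can be illustrated on synthetic Rayleigh-fading samples as follows. This sketch makes several assumptions that are not from the original simulation: scalar taps stand in for the 128-antenna vectors, the samples are drawn independently with unit average power, and the "prediction" is a naive baseline that simply reuses the outdated sample.

```python
import random


def rayleigh_samples(count, rng):
    """Complex Gaussian taps whose magnitude is Rayleigh distributed (unit average power)."""
    scale = 0.5 ** 0.5
    return [complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in range(count)]


def mse_cost(predicted, actual):
    """Equation (24): mean of the squared error magnitudes over T samples."""
    return sum(abs(p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


rng = random.Random(1)
h = rayleigh_samples(1000, rng)
D = 1                                         # hypothetical prediction horizon in samples

# Naive baseline: reuse the outdated sample h(t) as the guess for h(t + D).
baseline = mse_cost(h[:-D], h[D:])
print(f"baseline cost: {baseline:.3f}")
```

A trained predictor is useful only if its cost on held-out data falls below such a baseline; with temporally correlated channel data, as opposed to the independent samples used here, the baseline itself would be lower.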

TABLE 2. Proposed LSTM network properties.

| Layer (type) | Output Shape | Param No. |
| --- | --- | --- |
| lstm 10 (LSTM) | (None, 128, 32) | |
| dropout 1 (Dropout) | (None, 128, 32) | |
| lstm 11 (LSTM) | (None, 16) | |
| dropout 2 (Dropout) | (None, 16) | |
| dense 3 (Dense) | (None, 64) | |
| Total params: | | |
| Trainable params: | | |
| Non-trainable params: | | |
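As a rough guide to how the parameter counts of a network with the output shapes in Table 2 arise, the sketch below applies the standard formulas for LSTM and fully connected layers with bias terms. The values it produces are not the authors' reported figures: the number of input features per time step (`n_features`) is not stated in the text and is set to 1 here purely as an assumption, and the function names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Standard LSTM layer: 4 gates, each with input weights,
    recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Fully connected layer: weight matrix plus bias vector."""
    return input_dim * units + units

n_features = 1  # ASSUMPTION: features per time step, not given in the text

counts = {
    "first LSTM (32 units)": lstm_params(n_features, 32),
    "second LSTM (16 units)": lstm_params(32, 16),
    "dense (64 units)": dense_params(16, 64),
}
total = sum(counts.values())  # dropout layers add no parameters

for name, value in counts.items():
    print(f"{name}: {value}")
print(f"total: {total}")
```

Only the first LSTM count depends on the assumed input width; the second LSTM and the dense layer are fixed by the output shapes of the layers that precede them.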

Fig. 5 shows the characteristics of the mMIMO channel, where $a$ and $b$ are the real and imaginary parts of the channel, respectively. As can be observed, the channel resembles a random signal because of the random nature of the environment. However, the data also forms a time series with some repetition. Fig. 6 shows the historical and true future values of the real part of the channel data used to train the model. Similarly, the imaginary part, with the same historical and future windows, is fed into the model for training.
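The separation of the channel into real and imaginary parts and the construction of history/future windows can be sketched as follows. This is a simplified illustration under stated assumptions: the coefficients are drawn as i.i.d. complex Gaussian samples (so that the magnitude is Rayleigh distributed), which ignores the temporal correlation of a real mobile channel, and the window lengths `history=128` and `future=64` as well as the function names are hypothetical.

```python
import numpy as np

def rayleigh_channel(n_samples, n_antennas, seed=0):
    """i.i.d. complex Gaussian coefficients with unit average power;
    their magnitude follows a Rayleigh distribution."""
    rng = np.random.default_rng(seed)
    real = rng.standard_normal((n_samples, n_antennas))
    imag = rng.standard_normal((n_samples, n_antennas))
    return (real + 1j * imag) / np.sqrt(2)

def history_future_windows(series, history, future):
    """Slide over a 1-D series and return (history, future) window pairs."""
    n = len(series) - history - future + 1
    past = np.stack([series[i:i + history] for i in range(n)])
    ahead = np.stack([series[i + history:i + history + future] for i in range(n)])
    return past, ahead

h = rayleigh_channel(n_samples=2000, n_antennas=128)
a, b = h.real, h.imag   # real part a and imaginary part b

# Windows for the first antenna; the same windowing is applied to both parts
past_a, future_a = history_future_windows(a[:, 0], history=128, future=64)
past_b, future_b = history_future_windows(b[:, 0], history=128, future=64)
```

Treating the two parts as separate real-valued series keeps the model input real, and the complex prediction can be recovered afterwards as $\hat{a} + j\hat{b}$.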

FIGURE 5. Real and imaginary parts of the channel (a is real, b is imaginary).

FIGURE 6. History and future true channel data for massive MIMO.

Fig. 7 shows the training and validation loss after 10 epochs with a batch size of 500. Both the training loss and the validation loss gradually converge, which indicates that the model is neither overfitting nor underfitting; we attribute this to the dropout layers listed in Table 2. Since the losses are still on a downward trend, the accuracy of the model can be improved by increasing the number of epochs.

FIGURE 7. mMIMO training and validation loss.

Moreover, Figs. 8 and 9 show the predictions of the real and imaginary parts of the channel, respectively. The results show that the model can estimate the channel’s characteristics with lower complexity than the other methods mentioned in Section IV. It can also be observed from the figures that the predictor is not perfectly accurate. To improve its accuracy, the amount of training data, the number of iterations, and the number of LSTM layers can be increased, or different regularizers can be used.

FIGURE 8. Real part prediction of the massive MIMO channel.

FIGURE 9. Imaginary part prediction of the massive MIMO channel.


Considering that mMIMO is a technology that has shown great potential for enhancing future wireless communication, this work has demonstrated that RNN-based CSI prediction is an ideal technology for boosting the performance of mMIMO systems by lowering complexity and increasing accuracy. Therefore, we intend to further this work by refining mMIMO CSI prediction using different configurations of RNNs utilizing LSTMs or GRUs and by improving the accuracy of the designed predictor.


[1] Jiang, W., & Schotten, H. D. (2019). Neural network-based fading channel prediction: A comprehensive overview. IEEE Access, 7, 118112–118124.

[2] Zheng, J., & Rao, B. D. (2008). Capacity analysis of MIMO systems using limited feedback transmit precoding schemes. IEEE Transactions on Signal Processing, 56(7), 2886–2901.

[3] Jiang, W., & Schotten, H. D. (2020). Deep learning for fading channel prediction. IEEE Open Journal of the Communications Society, 1, 320–332.

[4] Al-Turjman, F., & Lemayian, J. P. (2020). Intelligence, security, and vehicular sensor networks in internet of things (IoT)-enabled smart-cities: An overview. Computers & Electrical Engineering, 87, 106776.

[5] Lemayian, J. P., & Al-Turjman, F. (2019). Intelligent IoT communication in smart environments: An overview. In Artificial intelligence in IoT (pp. 207–221). Springer.

[6] Hamamreh, J. M., Hajar, A., & Abewa, M. (2020). Orthogonal frequency division multiplexing with subcarrier power modulation for doubling the spectral efficiency of 6G and beyond networks. Transactions on Emerging Telecommunications Technologies, 31(4), e3921.

[7] Al-Turjman, F., Lemayian, J. P., Alturjman, S., & Mostarda, L. (2019). Optimal placement for 5G drone-BS using SA and GA. In Drones in IoT-enabled spaces (pp. 43–58). CRC Press.

[8] Al-Turjman, F., Lemayian, J. P., Alturjman, S., & Mostarda, L. (2019). Enhanced deployment strategy for the 5G drone-BS using artificial intelligence. IEEE Access, 7, 75999–76008.

[9] Lemayian, J. P., & Hamamreh, J. M. (2020). Autonomous first response drone-based smart rescue system for critical situation management in future wireless networks.

[10] Lemayian, J. P., & Hamamreh, J. M. (2019). First responder drones for critical situation management. 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), 1–6.

[11] Hamamreh, J. M., Furqan, M. Haji, Ali, Z., & Sidhu, G. A. S. (2017). Enhancing the security performance of OSTBC using pre-equalicodization. 2017 International Conference on Frontiers of Information Technology (FIT), 294–298.

[12] Love, D. J., Heath, R. W., Lau, V. K., Gesbert, D., Rao, B. D., & Andrews, M. (2008). An overview of limited feedback in wireless communication systems. IEEE Journal on Selected Areas in Communications, 26(8), 1341–1365.

[13] Zhang, R., Isola, P., & Efros, A. A. (2017). Split-brain autoencoders: Unsupervised learning by cross-channel prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1058–1067.

[14] Poncha, L. J., Abdelhamid, S., Alturjman, S., Ever, E., & Al-Turjman, F. (2018). 5G in a convergent internet of things era: An overview. 2018 IEEE International Conference on Communications Workshops (ICC Workshops), 1–6.

[15] Furqan, M. Haji, Hamamreh, J. M., & Arslan, H. (2016). Secret key generation using channel quantization with SVD for reciprocal MIMO channels. 2016 International Symposium on Wireless Communication Systems (ISWCS), 597–602.

[16] Yaghjian, A. (1984). Approximate formulas for the far field and gain of open-ended rectangular waveguide. IEEE Transactions on Antennas and Propagation, 32(4), 378–384.

[17] Yaghjian, A. (1986). An overview of near-field antenna measurements. IEEE Transactions on Antennas and Propagation, 34(1), 30–45.

[18] Hamamreh, J. M., El-Sallabi, H., Abdallah, M., & Qaraqe, K. (2013). Advance in adaptive modulation for fading channels.

[19] Wang, J., Ding, Y., Bian, S., Peng, Y., Liu, M., & Gui, G. (2019). UL-CSI data driven deep learning for predicting DL-CSI in cellular FDD systems. IEEE Access, 7, 96105–96112.

[20] Yang, Y., Gao, F., Li, G. Y., & Jian, M. (2019). Deep learning-based downlink channel prediction for FDD massive MIMO system. IEEE Communications Letters, 23(11), 1994–1998.

[21] Adeogun, R. O., Teal, P. D., & Dmochowski, P. A. (2013). Parametric channel prediction for narrowband mobile MIMO systems using spatio-temporal correlation analysis. 2013 IEEE 78th Vehicular Technology Conference (VTC Fall), 1–5.

[22] Liao, R.-F., Wen, H., Wu, J., Song, H., Pan, F., & Dong, L. (2018). The Rayleigh fading channel prediction via deep learning. Wireless Communications and Mobile Computing, 2018.

[23] Peng, W., Zou, M., & Jiang, T. (2017). Channel prediction in time-varying massive MIMO environments. IEEE Access, 5, 23938–23946.

[24] Zhou, S., & Giannakis, G. B. (2004). How accurate channel prediction needs to be for transmit-beamforming with adaptive modulation over Rayleigh MIMO channels? IEEE Transactions on Wireless Communications, 3(4), 1285–1294.

[25] Oien, G., Holm, H., & Hole, K. J. (2004). Impact of channel prediction on adaptive coded modulation performance in Rayleigh fading. IEEE Transactions on Vehicular Technology, 53(3), 758–769.

[26] Wong, I. C., & Evans, B. L. (2005). Joint channel estimation and prediction for OFDM systems. GLOBECOM’05. IEEE Global Telecommunications Conference, 2005, 4, 5 pp.

[27] Jaradat, A. M., Hamamreh, J. M., & Arslan, H. (2020). OFDM with hybrid number and index modulation. IEEE Access, 8, 55042–55053.

[28] Huang, L., Long, T., Mao, E., & So, H.-C. (2009). MMSE-based MDL method for accurate source number estimation. IEEE Signal Processing Letters, 16(9), 798–801.

[29] Luo, C., Ji, J., Wang, Q., Chen, X., & Li, P. (2018). Channel state information prediction for 5G wireless communications: A deep learning approach. IEEE Transactions on Network Science and Engineering.

Authors Bio.

Joel P. Lemayian:

Received the B.Sc. degree in electrical and electronics engineering from Middle East Technical University, Turkey, in 2017. He is presently pursuing the master’s (M.Sc.) degree in electrical and computer engineering. He is currently with Antalya Bilim University, Turkey.

He has worked as a research assistant at both Middle East Technical University and Antalya Bilim University, in the IoT lab and the Neuroscience lab, respectively. He is an author of numerous journal articles, conference papers and book chapters. His research interests include UAVs, 5G communication networks, artificial intelligence, machine learning, and Internet of Things (IoT) applications.

Jehad M. Hamamreh:

Received the B.Sc. degree in electrical and telecommunication engineering from An-Najah University, Nablus, in 2013, and the Ph.D. degree in electrical-electronics engineering and cyber systems from Istanbul Medipol University, Turkey, in 2018. He was a Researcher with the Department of Electrical and Computer Engineering, Texas A&M University at Qatar. He is currently an Assistant Professor with the Electrical and Electronics Engineering Department, Antalya International (Bilim) University, Turkey.

His current research interests include wireless physical and MAC layers security, orthogonal frequency-division multiplexing multiple-input multiple-output systems, advanced waveforms design, multi-dimensional modulation techniques, IoT, 5G & 6G and orthogonal/non-orthogonal multiple access schemes for future wireless systems. He is a Regular Reviewer for various refereed journals as well as a TPC Member for several international conferences.

Simulation codes

The simulation codes used to generate the results presented in this paper can be accessed here.
