Abstract. With the widespread use of 4G networks, the corresponding baseband processing has become a critical issue. The currently recognized 4G standard is LTE-A. In LTE-A baseband processing, the physical layer algorithms are the biggest bottleneck for current processors, which makes application-specific integrated circuit (ASIC) design necessary. This article introduces a dedicated communication coprocessor (TxCP) that provides bit-level acceleration of the LTE-A physical uplink shared/control channel (PUSCH/PUCCH) algorithms. It supports PUSCH/PUCCH CRC, turbo encoding, convolutional encoding with user-definable generator equations, data-channel and control-channel rate matching, channel interleaving, scrambling, and modulation with QPSK, 16QAM and 64QAM. To give the coprocessor a degree of flexibility, its internal controller supports a variety of modes so that some of the algorithm modules can be run separately. The programming model of the processor is relatively simple, and the user does not need to go through a complicated design process.

1 Introduction

In the physical layer processing flow of LTE-A, the encoding process involves a large amount of small-bitwidth, bit-level processing, such as high-density data operations (channel estimation), small-bitwidth data interleaving (rate matching, channel interleaving), and some specific complex algorithms (Turbo decoding, Viterbi decoding). These algorithms have different application modes in different channels. The coding channels in LTE/LTE-A are also divided into various types, such as PUSCH, PUCCH, and PRACH [1]. In this article, we mainly support the coding and modulation process of the PUSCH and PUCCH. LTE physical layer solutions face high peak-rate demands. LTE requires a downlink peak rate of 100 Mbps and an uplink peak rate of 50 Mbps [2], which may be relatively easy for baseband processing designs to meet. LTE-A, however, requires a downlink peak rate of 1 Gbps and an uplink peak rate of 500 Mbps [3], which poses considerable challenges for baseband processing.

At present, baseband processing is mainly based on general-purpose processors (GPPs), FPGAs, digital signal processors (DSPs), and coprocessors. For the bit-level processing of physical layer coding, however, GPPs and DSPs have difficulty reaching the required speed. For large-scale deployments, the cost of FPGAs is a problem, and FPGAs also raise significant power consumption problems. For this reason, ASICs have been specifically designed for hardware acceleration of the PDSCH, PBCH and PMCH.

In this article, we present a coprocessor (TxCP) designed specifically to accelerate the encoding process for the PUSCH and PUCCH. The entire encoding process involves the cyclic redundancy check (CRC), turbo encoding, convolutional encoding, rate matching of data information and control information, channel interleaving, scrambling, and modulation with three modulation schemes. In addition, to give the operation of TxCP some flexibility, some of its internal algorithm modules can be scheduled separately.

The rest of this paper is organized as follows. Section 2 describes the overall architecture, the internal modules, and the operation of the controller of TxCP. Section 3 presents the results of the TxCP performance evaluation, including the synthesis results obtained with the Synopsys Design Compiler (DC) tool. Finally, we summarize our work in Section 4.

2 TxCP Architecture

The TxCP is internally composed of six configurable submodules for the LTE/LTE-A physical layer algorithm (CRC, TEC, CCE, RM, CIM, SMC). The configuration of each submodule is controlled by a unified top-level controller.

2.1 Overall Architecture

The TxCP is divided into three main parts: the controller, the memory, and the algorithm modules. The TxCP is internally controlled by the controller. The memory stores the input data of the TxCP and the intermediate processing results of the algorithm modules. The six algorithm modules accelerate the PUSCH algorithms. The overall architecture is shown in Figure 1.

2.2 Cyclic Redundancy Check

The cyclic redundancy check (CRC) module performs the CRC computations used in 4G communication. It consists of four basic modules (CRC-8, CRC-16, CRC-24A, and CRC-24B) and a control module. CRC-8 is mainly used for checking CQI information. CRC-16 is mainly used for verifying data transmitted on the downlink control channel and the broadcast channel. CRC-24A and CRC-24B are mainly used for verifying PUSCH data. The module supports both big-endian and little-endian storage. For multi-code-block checks, it supports both CRC-24A and CRC-24B. Because the input interface is 8 bits wide, the module pads the high-order bits with zeros for code blocks whose bit length is not an integer multiple of 8. The check bits are appended to the data information for output.
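The CRC attachment described above can be illustrated with a small bit-serial software model. This is only a sketch: the generator polynomials are taken from TS 36.212 [5], while the function names `crc_bits` and `attach_crc` are hypothetical, and the 8-bit input interface, zero padding, and endianness handling of the hardware module are not modeled.

```python
# Generator polynomials from 3GPP TS 36.212 (leading term included).
CRC_POLYS = {
    "CRC-24A": (24, 0x1864CFB),
    "CRC-24B": (24, 0x1800063),
    "CRC-16": (16, 0x11021),
    "CRC-8": (8, 0x19B),
}


def crc_bits(bits, kind):
    """Return the parity bits (highest-order first) for a list of 0/1 bits."""
    length, poly = CRC_POLYS[kind]
    reg = 0
    # Shift in the message followed by `length` zeros (polynomial long division).
    for b in list(bits) + [0] * length:
        reg = (reg << 1) | b
        if reg >> length:        # the top bit is set: subtract the generator
            reg ^= poly
    return [(reg >> i) & 1 for i in range(length - 1, -1, -1)]


def attach_crc(bits, kind):
    """Append the parity bits to the message, as the module does on output."""
    return list(bits) + crc_bits(bits, kind)
```

A receiver can check a block by running `crc_bits` over the message plus its parity bits; the result is all zeros when no error is detected.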

2.3 Turbo Encoding Coprocessor

The turbo encoding coprocessor (TEC) implements the turbo encoding of data information. In accordance with the requirements of the LTE-A protocol [5], this module has three output streams: the original (systematic) information, the direct check (parity) information, and the check information obtained by first interleaving the data in memory and then encoding it. The original information is obtained by directly outputting the input data, and the direct check information is obtained by encoding the input data, so these two streams are output in parallel. The third stream is obtained by reading all the data into memory, interleaving it, and encoding it once the interleaving is complete. Together, the three streams implement the rate-1/3 turbo coding function.
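The three-stream structure can be sketched in software as follows. This is a minimal model under stated assumptions, not the TEC implementation: the constituent encoder polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) and the quadratic permutation polynomial interleaver follow TS 36.212 [5], trellis termination (the tail bits) is deliberately omitted, and the function names are hypothetical.

```python
def rsc_encode(bits):
    """Recursive systematic constituent encoder; returns the parity stream only."""
    s1 = s2 = s3 = 0                      # s1 holds the most recent register bit
    parity = []
    for c in bits:
        a = c ^ s2 ^ s3                   # feedback: 1 + D^2 + D^3
        parity.append(a ^ s1 ^ s3)        # feedforward: 1 + D + D^3
        s1, s2, s3 = a, s1, s2
    return parity


def qpp_interleave(bits, f1, f2):
    """Quadratic permutation polynomial interleaver: pi(i) = (f1*i + f2*i^2) mod K."""
    k = len(bits)
    return [bits[(f1 * i + f2 * i * i) % k] for i in range(k)]


def turbo_encode(bits, f1, f2):
    """Return (systematic, parity 1, parity 2) without trellis termination."""
    systematic = list(bits)
    parity1 = rsc_encode(systematic)
    parity2 = rsc_encode(qpp_interleave(systematic, f1, f2))
    return systematic, parity1, parity2
```

For the smallest LTE block size, K = 40, the standard specifies f1 = 3 and f2 = 10, so a call would look like `turbo_encode(block, 3, 10)`.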

2.4 Convolutional Coding Engine

The convolutional coding engine (CCE) adopts a more general solution. Besides the LTE-A scenario, the CCE covers all rate-1/4, rate-1/3, and rate-1/2 convolutional codes with a constraint length of 7, and it supports both tail-biting and non-tail-biting convolutional coding. To this end, the CCE lets the user configure the generator equations for convolutional coding through registers in the TxCP. In the LTE-A solution, the CCE therefore only needs to be configured for coding rate 1/3. Figure 2 shows the convolutional coding scheme in LTE-A, whose three generator polynomials are 133 (octal), 171 (octal), and 165 (octal) [5]. To achieve rate 1/3, the configuration register for the fourth generator is set to 0, and the other three registers are set to the corresponding values.
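The register-configurable encoder can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than the CCE design: `conv_encode` is a hypothetical name, the generator list stands in for the four configuration registers (a value of 0 disables a stream), and the non-tail-biting branch simply starts from the all-zero state without appending flush bits. The third LTE generator is taken as 165 (octal) from TS 36.212 [5].

```python
def conv_encode(bits, generators, tail_biting=True):
    """Constraint-length-7 convolutional encoder with configurable generators.

    `generators` holds up to four 7-bit polynomials; the most significant bit
    taps the current input bit. Assumes at least 6 input bits when tail-biting.
    """
    bits = list(bits)
    state = 0                              # six previous bits, most recent in bit 5
    if tail_biting:
        # Preload the register with the last six input bits.
        for b in bits[-6:]:
            state = (state >> 1) | (b << 5)
    active = [g for g in generators if g]
    streams = [[] for _ in active]
    for b in bits:
        window = (b << 6) | state          # current bit plus six previous bits
        for out, g in zip(streams, active):
            out.append(bin(window & g).count("1") & 1)   # parity of tapped bits
        state = window >> 1
    return streams


# LTE-A configuration: rate 1/3, fourth generator disabled.
LTE_GENERATORS = [0o133, 0o171, 0o165, 0]
```

Calling `conv_encode(block, LTE_GENERATORS)` returns three output streams of the same length as `block`; supplying a non-zero fourth generator would give a rate-1/4 code.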

2.5 Rate Matching

The rate matching module supports rate matching for both the data channel and the control channel, and it also performs code block concatenation. The module first interleaves each of the three data streams from the turbo encoder or the convolutional encoder separately (the data channel and the control channel are interleaved differently) and pads them with null bits. The data is then passed to the bit collection, selection and transmission stage of rate matching. The bit collection stage writes the three data streams into a circular buffer. During bit selection, the null bits are discarded, and the result is output according to the start bit and length given in the configuration [5].
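The data-channel path of this process can be sketched as follows. The code follows our reading of the turbo-coded rate matching in TS 36.212 [5] (32-column sub-block interleaver, null padding at the front, interlaced parity streams in the circular buffer), but it is a simplified model: the start position is passed in directly instead of being derived from the redundancy version, the control-channel interleaving pattern and code block concatenation are left out, and all function names are hypothetical.

```python
NULL = None  # stands for the null filler bits

# Inter-column permutation for turbo-coded streams (5-bit bit reversal).
PERM = [int(format(c, "05b")[::-1], 2) for c in range(32)]


def subblock_interleave(d, third_stream=False):
    """Interleave one stream; assumes a non-empty input."""
    cols = 32
    rows = -(-len(d) // cols)                      # ceiling division
    size = rows * cols
    y = [NULL] * (size - len(d)) + list(d)         # nulls are padded at the front
    if third_stream:
        # The second parity stream uses a shifted read-out pattern.
        return [y[(PERM[k // rows] + cols * (k % rows) + 1) % size]
                for k in range(size)]
    # Write row by row, permute the columns, read column by column.
    return [y[r * cols + PERM[c]] for c in range(cols) for r in range(rows)]


def rate_match(d0, d1, d2, start, length):
    """Collect three streams into a circular buffer and select `length` bits."""
    v0 = subblock_interleave(d0)
    v1 = subblock_interleave(d1)
    v2 = subblock_interleave(d2, third_stream=True)
    # Bit collection: systematic part first, then the parity streams interlaced.
    buf = v0 + [b for pair in zip(v1, v2) for b in pair]
    out, k = [], start
    while len(out) < length:
        bit = buf[k % len(buf)]
        if bit is not NULL:                        # null bits are skipped
            out.append(bit)
        k += 1
    return out
```

When `length` exceeds the number of coded bits, the read pointer wraps around the buffer and bits are repeated; when it is smaller, the remaining bits are punctured.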


2.6 Channel Interleaving Module

The channel interleaving module (CIM) interleaves four different kinds of information: RI, CQI, data, and ACK [6]. It needs to support three modulation schemes (QPSK, 16QAM, and 64QAM), the different cyclic prefix modes, and a different interleaving pattern for each kind of information. The module uses a 10K memory with a 6-bit data width as the physical mapping of the interleaving matrix, and maps interleaving matrices with different numbers of columns (9, 10, 11, 12) into this memory. The information is written row by row and read out column by column, and it is passed to the downstream module through an 8-bit output interface. For part of the RI, ACK, and CQI information, bits are fetched cyclically from 32-bit registers instead of memory, which reduces the dynamic power spent on memory accesses and lowers overall power consumption.
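The row-in, column-out access pattern of the interleaving matrix can be shown with a reduced model. The sketch below covers only the data path: each matrix entry is a group of Qm bits (2, 4, or 6 for QPSK, 16QAM, or 64QAM), and the placement of RI, the overwriting by ACK, and the memory mapping used in the CIM are not modeled. The name `channel_interleave` and its parameters are hypothetical.

```python
def channel_interleave(bits, qm, columns):
    """Write Qm-bit groups row by row, then read them column by column."""
    symbols = [bits[i:i + qm] for i in range(0, len(bits), qm)]
    if len(bits) % qm or len(symbols) % columns:
        raise ValueError("input must fill the interleaving matrix completely")
    rows = len(symbols) // columns
    out = []
    for c in range(columns):
        for r in range(rows):
            out.extend(symbols[r * columns + c])
    return out
```

With `columns` set to 9, 10, 11, or 12, the same routine covers the four matrix widths mentioned above.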

2.7 Scrambling and Modulation Coprocessor

The scrambling and modulation coprocessor (SMC) scrambles and modulates the data output by the channel interleaver. The SMC generates a pseudo-random sequence in a defined way from a configured initial value. The generated sequence is then XORed with the input data to scramble it, and finally the scrambled result is modulated. In this design, the modulation supports three schemes: QPSK, 16QAM and 64QAM. In LTE-A, the placeholders X/Y are used in the ACK and RI information. During scrambling, the positions of the X/Y placeholders must be identified in order to obtain the correct symbols for modulation, so a mechanism for identifying the ACK and RI information is added to the SMC. This identification is based on information provided by the CIM.
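A software sketch of the scrambling and QPSK mapping steps is given below. It uses the length-31 Gold sequence and the QPSK constellation defined in TS 36.211 [1]; the initial value `c_init` is treated as a plain input, the special handling of the X/Y placeholders is omitted, and the 16QAM and 64QAM mappers are not shown. The function names are hypothetical and do not describe the SMC hardware.

```python
import math


def gold_sequence(c_init, length):
    """Pseudo-random sequence c(n) built from two length-31 m-sequences."""
    nc = 1600                                   # initial offset discarded by the standard
    x1 = [1] + [0] * 30
    x2 = [(c_init >> i) & 1 for i in range(31)]
    for n in range(nc + length - 31):
        x1.append(x1[n + 3] ^ x1[n])
        x2.append(x2[n + 3] ^ x2[n + 2] ^ x2[n + 1] ^ x2[n])
    return [x1[n + nc] ^ x2[n + nc] for n in range(length)]


def scramble(bits, c_init):
    """XOR the input bits with the pseudo-random sequence."""
    return [b ^ c for b, c in zip(bits, gold_sequence(c_init, len(bits)))]


def qpsk(bits):
    """Map bit pairs to QPSK symbols; expects an even number of bits."""
    a = 1 / math.sqrt(2)
    return [complex(a * (1 - 2 * bits[i]), a * (1 - 2 * bits[i + 1]))
            for i in range(0, len(bits), 2)]
```

Because scrambling is a plain XOR, applying `scramble` twice with the same `c_init` returns the original bits, which is also how a receiver descrambles.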

2.8 TxCP Controller

The controller is responsible for overall control within the TxCP, covering the entire PUSCH channel encoding process and the modulation process. Through a finite state machine, TxCP supports seven operating modes: Data-only mode, Data + UCI mode, UCI-only mode, CRC mode, TEC mode, CCE mode, and CIM mode. The first three are the control modes for the PUSCH.

The input throughput $Thp_{in}$ is calculated as shown in equation (1), where $D_{in}$ is the total input data and $T$ is the total time from TxCP startup to the sending of the interrupt request:

$$Thp_{in} = \frac{D_{in}}{T} \tag{1}$$

The output data in unit time $Thp_{out}$ is calculated as shown in equation (2), where $D_{out}$ is the total output data of the test case:

$$Thp_{out} = \frac{D_{out}}{T} \tag{2}$$

As shown in Table 1, TxCP achieves high throughput in Data-only mode and Data + UCI mode. One of the main reasons why TxCP performs poorly in UCI-only mode is that the input data in this mode is quite small, while the rate matching module and the channel interleaving module interleave and reuse the data multiple times, producing a large amount of output data. The input throughput in this mode is therefore very low. The corresponding $Thp_{out}$, however, is high, from which we can conclude that the test case producing the maximum throughput reused the data many times.
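Because equations (1) and (2) share the same $T$, dividing $Thp_{out}$ by $Thp_{in}$ gives the ratio $D_{out}/D_{in}$, that is, how much the test case expands its input. The short calculation below applies this to the figures reported in Table 1; the helper name `expansion_ratio` is ours and is used only for illustration.

```python
# (mode, maximum Thp_in in Mbps, Thp_out of the same test case in Mbps), from Table 1
TABLE_1 = [
    ("Data-only", 297.44, 1866.30),
    ("Data+UCI", 289.64, 1769.27),
    ("UCI-only", 0.39, 5128.78),
]


def expansion_ratio(thp_in, thp_out):
    """D_out / D_in; T cancels because both throughputs use the same T."""
    return thp_out / thp_in


for mode, thp_in, thp_out in TABLE_1:
    print(f"{mode}: output is {expansion_ratio(thp_in, thp_out):.1f} times the input")
```

The ratio is about 6 for the two data-carrying modes and over 13,000 for UCI-only mode, which is consistent with the heavy data reuse described above.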

**Table 1. Maximum throughput of TxCP in different operating modes.**

<table>
<thead>
<tr>
<th>TxCP operating mode</th>
<th>Maximum $Thp_{in}$ (Mbps)</th>
<th>$Thp_{out}$ of corresponding test case (Mbps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data-only</td>
<td>297.44</td>
<td>1866.30</td>
</tr>
<tr>
<td>Data+UCI</td>
<td>289.64</td>
<td>1769.27</td>
</tr>
<tr>
<td>UCI-only</td>
<td>0.39</td>
<td>5128.78</td>
</tr>
</tbody>
</table>

Table 2. The synthesis results of TxCP.

<table>
<thead>
<tr>
<th>Technology</th>
<th>TSMC 28nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>Area</td>
<td>25</td>
</tr>
<tr>
<td>Power</td>
<td>20</td>
</tr>
<tr>
<td>Frequency</td>
<td>19</td>
</tr>
</tbody>
</table>

TxCP not only has high throughput but also performs well in area and power consumption. TxCP was synthesized with the Design Compiler (DC) tool provided by Synopsys. The results obtained are shown in Table 2.

4 Summary

As a solution for the LTE-A PUSCH/PUCCH, TxCP contains six modules for communication physical layer algorithms. Some of the algorithm modules use a generic design to give TxCP a degree of scalability and compatibility. The TxCP is controlled by an internal controller that supports seven operating modes. Three modes are dedicated to the PUSCH of LTE-A, and the remaining four modes schedule individual submodules within TxCP to cover more application scenarios. TxCP performs well: running at 600 MHz, it reaches a maximum input throughput of 297.44 Mbps. TxCP also performs very well in area and power consumption.

Acknowledgments

This work is supported by the Strategic Priority Research Program of Chinese Academy of Sciences (under Grant XDA-06010402).

References

1. 3GPP TS 36.211, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation (Release 11).
5. 3GPP TS 36.212, Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding (Release 11).