FFT Block (with Cyclic prefix insertion and removal) Control Class
Overview
The FFT block is an RFNoC block that accepts signed complex 16-bit data at its input and computes the forward or reverse FFT of the input data, outputting signed complex 16-bit data at its output.
The FFT length is configured via the length parameter, up to a maximum which depends on the instantiation on the FPGA. Use the function get_max_fft_length to determine the maximum supported FFT length.
The length will be coerced to the closest power of two which is smaller than length. The block will output packets of the same length in the desired format as configured via the API.
The block can be configured to add cyclic prefixes (typically when performing an inverse FFT, i.e. repeating a part of the generated time domain signal) or to remove cyclic prefixes (typically when performing a forward FFT, i.e. removing a part of the input time domain signal). This feature makes this block suitable for OFDM (de-)modulation.
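The effect of cyclic prefix insertion and removal on a single symbol can be illustrated with a minimal host-side sketch. The helper names `insert_cp` and `remove_cp` are hypothetical and are not part of the UHD API; the sketch only models the sample reordering described above, not the FPGA implementation.

```python
def insert_cp(symbol, cp_len):
    """Prepend a copy of the last cp_len samples (OFDM transmitter side)."""
    if not 0 <= cp_len <= len(symbol):
        raise ValueError("cp_len must be between 0 and the symbol length")
    return symbol[len(symbol) - cp_len:] + symbol


def remove_cp(samples, cp_len):
    """Drop the first cp_len samples (OFDM receiver side)."""
    if not 0 <= cp_len <= len(samples):
        raise ValueError("cp_len must be between 0 and the number of samples")
    return samples[cp_len:]


# Stand-in for 8 time-domain samples at the output of an iFFT.
symbol = [0, 1, 2, 3, 4, 5, 6, 7]
with_cp = insert_cp(symbol, 2)
print(with_cp)                          # [6, 7, 0, 1, 2, 3, 4, 5, 6, 7]
print(remove_cp(with_cp, 2) == symbol)  # True
```

Removing the same number of samples that was inserted recovers the original symbol, which is why a transmitter and receiver must agree on the CP lengths.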
See RFNoC FFT block example in the uhd repository for how to use the API.
Features
- Implements FFT and iFFT operations.
- Based on the Xilinx FFT IP core (see the Xilinx Product Guide)
- Configured for continuous streaming through the FFT block
- Runtime-configurable transform sizes: \(N=2^m, m∈\{1,2,…,16\}\)
- Computes an (i)FFT across the full instantaneous bandwidth of any USRP through parallel multi-sample per FPGA clock cycle processing (added in UHD 4.9).
- Cyclic prefix (CP) insertion and removal (added in UHD 4.8), enabling OFDM modulation
- CP lengths can be varied as a function of the OFDM symbol index through a runtime-configurable "CP schedule"
- Compile-time parameters enable or disable inclusion of FPGA logic, for example to
  - compute the magnitude or squared magnitude of the FFT output (useful for power spectral density estimation)
  - switch between computing or bypassing the (i)FFT at runtime
  - add/remove cyclic prefixes
- The FFT window can span across multiple CHDR packets (added in UHD 4.8)
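The idea of a "CP schedule" can be sketched as a list of CP lengths that is indexed by the OFDM symbol number. The function `cp_lengths` below is a hypothetical illustration, not a UHD call. The `repeat` flag mirrors the CP_INSERTION_REPEAT / CP_REMOVAL_REPEAT compile-time parameters described later in this document; modeling "stops when the list is finished" as a CP length of 0 is an assumption of this sketch.

```python
from itertools import cycle, islice


def cp_lengths(schedule, num_symbols, repeat=True):
    """Return the CP length applied to each of num_symbols consecutive symbols."""
    if not schedule:
        return [0] * num_symbols
    if repeat:
        # The list wraps around once its last entry has been used.
        return list(islice(cycle(schedule), num_symbols))
    # Assumption: after a non-repeating list is exhausted, no CP is applied.
    return (list(schedule) + [0] * num_symbols)[:num_symbols]


# Illustrative schedule: a longer CP on the first symbol of every group of three.
print(cp_lengths([4, 2, 2], 7))                # [4, 2, 2, 4, 2, 2, 4]
print(cp_lengths([4, 2, 2], 5, repeat=False))  # [4, 2, 2, 0, 0]
```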
Theory of Operation
The RFNoC FFT block can operate on signals at very wide bandwidth by processing multiple samples per cycle (NIPC, Number of Items Per Cycle). That is, it can be used at master clock rates (MCR) that well exceed the FPGA clock rate. This capability is achieved by instantiating multiple so-called (i)FFT pipelines in parallel.

An (i)FFT pipeline computes either an iFFT or an FFT and can implement additional operations. Some of these operations require inclusion of FPGA logic at compile time. Some of these operations are run-time configurable as can be seen in the software interface documentation below. The RFNoC FFT block also offers the option to bypass the (i)FFT operations (pass-through of the input signal).
An (i)FFT pipeline is internally structured as shown below. Some of the blocks are present only if the respective compile-time parameter is enabled.

These are the operations each block implements:
- CP removal: removal of a configurable number of samples preceding each FFT window, as required in an OFDM receiver
- (i)FFT: FFT or iFFT operation, implemented by wrapping a Xilinx FFT block configured in pipelined streaming I/O architecture.
- FFT (forward FFT): the frequency domain samples X(k) are computed from the time domain sequence x(n) through \( X(k)=\frac{1}{S} \sum_{n=0}^{N-1} x(n) e^{-j2πnk/N}, \; k=0,1,…,N-1 \), where S is the scaling factor.
- iFFT (inverse FFT): \( x(n)=\frac{1}{S} \sum_{k=0}^{N-1} X(k) e^{j2πnk/N}, \; n=0,1,…,N-1 \)
- Bit reversal: Xilinx IP is configured for outputs in bit-reversed order. This block arranges the output into natural order.
- FFT shift: shifts DC to the center of the FFT output.
- CP insertion: prepending a copy of a configurable number of last samples of an iFFT output as required in an OFDM transmitter.
- (Squared) Magnitude (\(|\cdot|, |\cdot|^2\)): computes the (squared) magnitude of each output sample.
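The operations listed above can be mimicked with a small reference model, for example to cross-check captured block output on the host. This is a sketch under assumptions: all function names are hypothetical, the transform is a direct O(N²) evaluation of the sums above rather than the pipelined Xilinx core, and the scaling factor S is passed in directly without modeling how the hardware applies it.

```python
import cmath


def dft(x, inverse=False, scale=1.0):
    """Evaluate the FFT (or iFFT) sum directly, dividing by the scaling factor S."""
    n_fft = len(x)
    sign = 1 if inverse else -1
    return [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_fft)
            for n in range(n_fft)) / scale
        for k in range(n_fft)
    ]


def bit_reverse_indices(n_fft):
    """Index i maps to the integer whose log2(n_fft)-bit pattern is reversed."""
    bits = n_fft.bit_length() - 1  # n_fft is assumed to be a power of two >= 2
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n_fft)]


def to_natural_order(bit_reversed):
    """Rearrange bit-reversed output into natural bin order."""
    natural = [None] * len(bit_reversed)
    for i, r in enumerate(bit_reverse_indices(len(bit_reversed))):
        natural[r] = bit_reversed[i]
    return natural


def fft_shift(bins):
    """Move DC (bin 0) to the center of the output."""
    half = len(bins) // 2
    return bins[half:] + bins[:half]


def magnitude_sq(bins):
    """Squared magnitude of each output sample."""
    return [b.real ** 2 + b.imag ** 2 for b in bins]


n_fft = 8
# Complex tone with exactly one cycle per FFT window, so it lands in bin 1.
tone = [cmath.exp(2j * cmath.pi * n / n_fft) for n in range(n_fft)]
spectrum = dft(tone)                       # forward FFT with S = 1
power = magnitude_sq(fft_shift(spectrum))  # DC now sits at index n_fft // 2
peak = max(range(n_fft), key=lambda k: power[k])
print(peak)                                # 5, i.e. one bin above DC
print(bit_reverse_indices(n_fft))          # [0, 4, 2, 6, 1, 5, 3, 7]
```

With S = 1 for the forward transform and S = N for the inverse transform, `dft(dft(x), inverse=True, scale=len(x))` returns the original sequence up to rounding error.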
FPGA Compile-Time Configuration
Users can configure the block's capabilities at FPGA compile time by setting parameters in the RFNoC image core YAML file where the block is instantiated.
User-Definable Parameters
User-definable FPGA compile-time parameters are documented in the block's top-level SystemVerilog file rfnoc_block_fft.sv and listed below.
// Module: rfnoc_block_fft
//
// Description:
//
// RFNoC block for multichannel FFT/IFFT plus cyclic prefix insertion/removal.
//
// User Parameters:
//
// THIS_PORTID : Control crossbar port to which this block is connected
// CHDR_W : AXIS-CHDR data bus width
// MTU : Log2 of maximum transmission unit
// NIPC : Number of samples/items per clock cycle to
// process internally.
// NUM_PORTS : Total number of FFT channels
// NUM_CORES : Number of individual cores to instantiate.
// Setting to 1 means all ports use a shared core
// and therefore all ports share the same control
// logic and all ports must be used simultaneously.
// Setting to NUM_PORTS means that each port will
// use its own core, and therefore each port can
// be configured and used independently. NUM_PORTS
// must be a multiple of NUM_CORES.
// MAX_FFT_SIZE_LOG2 : Log2 of maximum configurable FFT size. That is,
// the FFT size is exactly 2**fft_size_log2.
// EN_CP_INSERTION : Controls whether to include the cyclic prefix
// insertion logic. If included, EN_FFT_ORDER must
// be 1.
// EN_CP_REMOVAL : Controls whether to include the cyclic prefix
// removal logic.
// MAX_CP_LIST_LEN_INS_LOG2 : Log2 of max length of cyclic prefix insertion
// list. Actual max is 2**MAX_CP_LIST_LEN_INS_LOG2.
// MAX_CP_LIST_LEN_REM_LOG2 : Log2 of max length of cyclic prefix removal
// list. Actual max is 2**MAX_CP_LIST_LEN_REM_LOG2.
// CP_INSERTION_REPEAT : Enable repeating the CP insertion list. When 1,
// the list repeats. When 0, CP insertion will
// stop when the list is finished.
// CP_REMOVAL_REPEAT : Enable repeating the CP removal list. When 1,
// the list repeats. When 0, CP removal will
// stop when the list is finished.
// EN_FFT_BYPASS : Controls whether to include the FFT bypass logic.
// EN_FFT_ORDER : Controls whether to include the FFT reorder logic.
// EN_MAGNITUDE : Controls whether to include the magnitude
// output calculation logic.
// EN_MAGNITUDE_SQ : Controls whether to include the
// magnitude-squared output calculation logic.
// USE_APPROX_MAG : Controls whether to use the low-resource
// approximate calculation (1) or the more exact
// and more resource-intensive calculation (0) for
// the magnitude calculation.
//
The block description file fft.yml lists the default parameter values.
parameters:
NIPC: 1
NUM_PORTS: 1
NUM_CORES: 1
MAX_FFT_SIZE_LOG2: 10
EN_CP_REMOVAL: 1
EN_CP_INSERTION: 1
MAX_CP_LIST_LEN_INS_LOG2: 5
MAX_CP_LIST_LEN_REM_LOG2: 5
CP_INSERTION_REPEAT: 1
CP_REMOVAL_REPEAT: 1
EN_FFT_BYPASS: 0
EN_FFT_ORDER: 1
EN_MAGNITUDE: 0
EN_MAGNITUDE_SQ: 1
USE_APPROX_MAG: 1
The table below summarizes the valid parameter ranges. A range without upper bound implies that the maximum value for a parameter is dictated by the available FPGA resources, but not by the FPGA IP design.
| Compile-Time Parameter | Range | Comment |
| NIPC | \( 2^n, n ∈ \{0, 1, 2, …\} \) | |
| NUM_PORTS | 1, 2, 3, ... | Must be integer-divisible by NUM_CORES |
| NUM_CORES | 1, 2, 3, ... | |
| MAX_FFT_SIZE_LOG2 | 10, 11, …, 16 | 1k FFT … 64k FFT |
| EN_CP_REMOVAL | 0, 1 | |
| EN_CP_INSERTION | 0, 1 | |
| MAX_CP_LIST_LEN_INS_LOG2 | 1, 2, 3, ... | |
| MAX_CP_LIST_LEN_REM_LOG2 | 1, 2, 3, ... | |
| CP_INSERTION_REPEAT | 0, 1 | |
| CP_REMOVAL_REPEAT | 0, 1 | |
| EN_FFT_BYPASS | 0, 1 | |
| EN_FFT_ORDER | 0, 1 | |
| EN_MAGNITUDE | 0, 1 | |
| EN_MAGNITUDE_SQ | 0, 1 | |
| USE_APPROX_MAG | 0, 1 | Estimate of the magnitude that saves FPGA resources. |
Choosing Parameter Values
It is important to understand the high-level architecture of the RFNoC FFT block to pick proper compile-time parameters. The figure below introduces the different components that an RFNoC FFT block is composed of.

NUM_CORES, NUM_PORTS, NIPC
These parameters make it possible to reduce FPGA resource utilization by exploiting the fact that multiple channels may be processed using the same compile-time or run-time parameters.
An RFNoC FFT block can be configured to contain multiple (i)FFT pipelines. Multiple (i)FFT pipelines are useful to process multiple channels within a single RFNoC FFT block. This is more resource-efficient than processing each channel in a dedicated RFNoC FFT block. More specifically, an RFNoC FFT block is internally subdivided into one or multiple FFT cores. Each FFT core is further subdivided into one or multiple FFT pipeline wrappers. Finally, each FFT pipeline wrapper can contain one or multiple FFT pipelines as shown in the figure above.
All FFT cores share the same compile-time configuration but can have individual runtime configurations. For instance, the FFT direction and size or the CP length may be chosen differently per FFT core at runtime. The maximum FFT size, on the other hand, is determined at compile time and shared across all FFT cores. The number of input ports NUM_PORTS to the RFNoC FFT block must be integer-divisible by the number of FFT cores. The first set of NUM_CHAN = NUM_PORTS / NUM_CORES ports will be processed by the first FFT core, the second set of NUM_PORTS / NUM_CORES ports by the second FFT core, and so forth.
All channels allocated to an FFT core must be used simultaneously, otherwise the FFT core will stall.
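The allocation of ports to FFT cores described above can be written down as a short sketch. The function name `core_for_port` is hypothetical, and the sketch assumes that ports and cores are numbered from 0 and that ports are assigned to cores in contiguous groups, as the text suggests.

```python
def core_for_port(port, num_ports, num_cores):
    """Return (core index, channel index within that core) for a block port."""
    if num_cores < 1 or num_ports % num_cores != 0:
        raise ValueError("NUM_PORTS must be integer-divisible by NUM_CORES")
    if not 0 <= port < num_ports:
        raise ValueError("port index out of range")
    num_chan = num_ports // num_cores  # channels handled by each FFT core
    return port // num_chan, port % num_chan


# Four ports shared by two cores: ports 0-1 go to core 0, ports 2-3 to core 1.
for port in range(4):
    print(port, core_for_port(port, num_ports=4, num_cores=2))
```

Ports that map to the same core index share one runtime configuration and must be streamed together.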
Each channel processed by an FFT core is assigned its dedicated FFT pipeline wrapper. The parameter NIPC controls the number of samples per FPGA clock cycle to be processed by the FFT pipeline wrapper. The number of FFT pipelines per FFT core equals NIPC. NIPC can be computed from ceil(MCR / CE), where CE is the compute engine clock rate (see: https://kb.ettus.com/RFNoC_Frequently_Asked_Questions#What_are_the_clock_frequencies.3F) and MCR is the master clock rate. Some margin is also needed: as a rule of thumb, ceil(MCR / CE) / (MCR / CE) > 1.05 should be satisfied.
Example: Assume we are using an MCR of 491.52 MS/s on USRP X410 (CE = 266.667 MHz). Then MCR / CE = 1.84, ceil(MCR / CE) = 2 and ceil(MCR / CE) / (MCR / CE) = 1.09, so NIPC = 2 is sufficient.
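The rule of thumb above can be turned into a small helper. This is a sketch, not an official sizing tool: the name `choose_nipc` is hypothetical, and it combines the margin rule with the constraint from the parameter table that NIPC must be a power of two by doubling NIPC until the margin exceeds the threshold.

```python
def choose_nipc(mcr_hz, ce_hz, min_margin=1.05):
    """Smallest power-of-two NIPC with NIPC / (MCR / CE) above min_margin."""
    ratio = mcr_hz / ce_hz
    nipc = 1
    while nipc / ratio <= min_margin:
        nipc *= 2
    return nipc, nipc / ratio


nipc, margin = choose_nipc(491.52e6, 266.667e6)
print(nipc, round(margin, 2))  # 2 1.09
```

For ratios such as MCR / CE = 2.5, ceil() alone would give 3, which is not a valid NIPC value; the loop returns 4 in that case.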
Choosing compile-time configurations to minimize FPGA resource utilization:
- Enable only the logic that is required (example: CP removal / addition, (squared) magnitude computation, pick the minimal maximum FFT size required by the application)
- Process as many channels as possible within a single RFNoC FFT block and FFT core. Suppose you want to process N channels. Then the most resource-efficient configuration is to use a single RFNoC FFT block with NUM_PORTS = N and NUM_CORES = 1. Of course, this is only possible if all channels share the same compile and runtime configuration and are processed simultaneously.
Dependent compile-time parameters
- EN_FFT_ORDER and EN_CP_INSERTION: The bit reversal, FFT shift and CP insertion blocks share FPGA logic.
- EN_CP_INSERTION = 1 requires EN_FFT_ORDER = 1, otherwise the FPGA compile will fail.
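The ranges and dependencies documented above can be checked before starting a lengthy FPGA build. The following is a minimal sketch; the function `check_fft_params` is hypothetical and only covers the constraints stated in this document, so passing it does not guarantee a successful build.

```python
def check_fft_params(params):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    nipc = params["NIPC"]
    if nipc < 1 or nipc & (nipc - 1):
        problems.append("NIPC must be a power of two")
    ports, cores = params["NUM_PORTS"], params["NUM_CORES"]
    if ports < 1 or cores < 1 or ports % cores:
        problems.append("NUM_PORTS must be a positive multiple of NUM_CORES")
    if not 10 <= params["MAX_FFT_SIZE_LOG2"] <= 16:
        problems.append("MAX_FFT_SIZE_LOG2 must be in 10..16")
    if params["EN_CP_INSERTION"] == 1 and params["EN_FFT_ORDER"] != 1:
        problems.append("EN_CP_INSERTION = 1 requires EN_FFT_ORDER = 1")
    return problems


# Subset of the default values listed in fft.yml.
defaults = {
    "NIPC": 1,
    "NUM_PORTS": 1,
    "NUM_CORES": 1,
    "MAX_FFT_SIZE_LOG2": 10,
    "EN_CP_INSERTION": 1,
    "EN_FFT_ORDER": 1,
}
print(check_fft_params(defaults))                          # []
print(check_fft_params({**defaults, "EN_FFT_ORDER": 0}))
```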
Example FPGA Compile-Time Configuration
An example of a typical OFDM transmitter and receiver configuration covering 400 MHz of RF bandwidth on a USRP X410 is shown below. A maximum FFT size of 8192 is configured, leading to a minimum subcarrier spacing of 60 kHz. At the transmit side, cyclic prefix insertion is enabled. At the receive side, cyclic prefix removal is enabled.
| Compile-Time Parameter | Value Transmitter | Value Receiver |
| NIPC | 2 | 2 |
| NUM_PORTS | 2 | 2 |
| MAX_FFT_SIZE_LOG2 | 13 | 13 |
| EN_CP_REMOVAL | 0 | 1 |
| EN_CP_INSERTION | 1 | 0 |
| EN_MAGNITUDE | 0 | 0 |
| EN_MAGNITUDE_SQ | 0 | 0 |
Advanced Topics
Xilinx FFT Block Configuration
The RFNoC FFT block is based on a Xilinx FFT block. Different configurations for different maximum FFT sizes have been prepared. The respective Xilinx settings for IP core generation can be obtained by opening the respective .xci files in Xilinx Vivado (example for the maximal 16k FFT: xfft_16k_16b.xci).
An example is shown below.

Implementing Different Subcarrier Spacings in OFDM Systems
When implementing an OFDM system, there is typically a target subcarrier spacing (the spacing between adjacent FFT frequency bins).
Case 1 : FPGA design does not implement sample rate conversion through DUC/DDC

Assume the (i)FFT block is directly connected to the radio block, i.e., the input rate to the FFT block or the output rate of the FFT block equals the master clock rate (MCR). The subcarrier spacing \( Δf_{sc} \) is
\( Δf_{sc} = MCR / N_{FFT} \)
For instance, on USRP X410 a master clock rate of 245.76 MHz and an FFT size \( N_{FFT} \) of 8192 result in a \( Δf_{sc}=30 kHz \) subcarrier spacing.
Case 2 : FPGA design does implement sample rate conversion through DUC/DDC

Assume the (i)FFT block is connected to the radio block through a DDC/DUC block implementing a sample rate conversion factor R. The subcarrier spacing \( Δf_{sc} \) is
\( Δf_{sc} = MCR / (R ⋅ N_{FFT}) \)
For instance, on USRP X410 a master clock rate of 245.76 MHz, an FFT size of 4096 and a sample rate conversion factor of R=2 result in a 30 kHz subcarrier spacing.
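Both cases can be covered by one small function, with R = 1 representing a design without sample rate conversion. The function name `subcarrier_spacing_hz` is hypothetical; the sketch only reproduces the arithmetic of the two examples above.

```python
def subcarrier_spacing_hz(mcr_hz, n_fft, rate_conversion=1):
    """Subcarrier spacing MCR / (R * N_FFT) in Hz."""
    return mcr_hz / (rate_conversion * n_fft)


print(subcarrier_spacing_hz(245.76e6, 8192))                     # 30000.0 (case 1)
print(subcarrier_spacing_hz(245.76e6, 4096, rate_conversion=2))  # 30000.0 (case 2)
```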
Approximate Processing Latency
The approximate processing latency of the RFNoC FFT block in cycles \( N_{Lat} \) is
\( N_{Lat}≈NIPC⋅(N_{FFT}+N_{CP}) \)
The latency \( T_{Lat} \) in seconds depends on the compute engine clock rate CE
\( T_{Lat}≈N_{Lat}/CE \)
Example for a 400 MHz USRP X410 OFDM configuration:
- MCR = 491.52 MS/s
- CE = 266.667 MHz
- NIPC = 2
- N_FFT=8192, N_CP=576 (subcarrier spacing = 60 kHz)
\( T_{Lat}≈N_{Lat}/CE≈17536/(266.667⋅10^6) s≈66 µs \)
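The latency estimate can be reproduced numerically as follows. The function name `fft_latency` is hypothetical, and the result is only as accurate as the approximation formula above.

```python
def fft_latency(nipc, n_fft, n_cp, ce_hz):
    """Approximate latency as (cycles, seconds) using N_Lat = NIPC * (N_FFT + N_CP)."""
    cycles = nipc * (n_fft + n_cp)
    return cycles, cycles / ce_hz


cycles, seconds = fft_latency(nipc=2, n_fft=8192, n_cp=576, ce_hz=266.667e6)
print(cycles)                    # 17536
print(round(seconds * 1e6, 1))   # latency in microseconds
```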
Time Stamping
CHDR data packets containing FFT input or output data can comprise time stamps. The figure below illustrates how these are computed for the example:
- OFDM configuration – cyclic prefixes are added or removed
- Burst containing 2 OFDM symbols

- For the first packet in a burst:
The incoming packet contains a time stamp for the start of the burst that is provided by either the host or the radio block. This time stamp is in units of the master clock rate. The packet at the output of the (i)FFT block contains this exact same timestamp without modifications. In other words, the FFT block doesn't adjust the timestamp to compensate for the insertion or removal of the cyclic prefix.
- For subsequent packets in a burst:
The FFT block increments the time stamp for the start of the burst by the number of samples per packet (SPP).
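The time stamp rule described above can be sketched as follows. The function name `packet_timestamps` is hypothetical, and the sketch assumes that every packet in the burst carries exactly SPP samples and that one time stamp tick corresponds to one sample.

```python
def packet_timestamps(burst_start_ticks, spp, num_packets):
    """Time stamp of each packet in a burst: the first is unchanged, later ones add SPP."""
    return [burst_start_ticks + k * spp for k in range(num_packets)]


print(packet_timestamps(burst_start_ticks=1000, spp=1024, num_packets=4))
# [1000, 2024, 3048, 4072]
```

Because the block does not compensate for inserted or removed cyclic prefix samples, these values follow the packet count only and not the position of the symbols in the original time-domain signal.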
Behavior In Case of Streaming Errors
It is possible that transmit or receive packets are dropped on the transport link that connects USRP and host computer. This section discusses how to detect and handle these situations.
RX Streaming (FFT operation)
- After each receive call, check for the out_of_sequence flag in the stream metadata.
- If the receive call metadata indicates the data is out of sequence, then the application can use the time_spec to determine where this data goes in the sequence of expected data and take appropriate action (e.g., discard the data up to the start of the next symbol).
- Streaming can continue as before after this point.
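One way to "discard the data up to the start of the next symbol" is to derive the position within a symbol from the packet time stamp. The sketch below is a hypothetical helper, not part of UHD. It assumes the time_spec has already been converted to sample ticks, that time stamps advance by one tick per received sample as described under Time Stamping, and that every symbol delivered to the host has the same length.

```python
def samples_to_discard(burst_start_ticks, packet_ticks, symbol_len):
    """Leading samples to drop so that the next kept sample starts a symbol."""
    offset = (packet_ticks - burst_start_ticks) % symbol_len
    return (symbol_len - offset) % symbol_len


# 4096-point FFT output received in 1024-sample packets, burst starting at tick 0.
print(samples_to_discard(0, 5120, 4096))  # 3072: the packet starts 1024 samples into a symbol
print(samples_to_discard(0, 8192, 4096))  # 0: the packet is already symbol-aligned
```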
TX Streaming (iFFT operation)
- The application sets up a thread to receive and monitor asynchronous messages (recv_async_msg()) from the TX streamer.
- When an async message reports an underflow or sequence error, then the application should end the current burst and wait for the burst ack.
- The FFT block will automatically end the data stream and flush any data out of the FFT block. This may cause the last packet to be sent with corrupted data, since part of an input symbol was lost, but should leave the FFT block in a good state for the next burst.
- If the application doesn't receive a burst ack, then it should reset the FFT block, and reconfigure it before proceeding.
- After this the application can restart streaming from where it left off or at some later point in time as needed by the application.
Timed Commands
The FFT block does not support timed commands. That is, none of its runtime-configurable properties can be configured in a deterministically timed fashion.
Configuring the CHDR Packet Lengths for Initial Debugging
Debugging can be simplified by configuring the CHDR packet length such that an integer multiple of the CHDR packet length equals the FFT size. The FFT functionality does not break if this recommendation is not followed. Example: In case of a 4096 point FFT, the CHDR packet length could be chosen equal to 1024 samples per packet.
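Packet lengths that satisfy this recommendation can be enumerated with a short sketch. The function name `aligned_spp_choices` and the `max_spp` limit are hypothetical; the actual maximum number of samples per packet depends on the transport and the MTU of the design.

```python
def aligned_spp_choices(fft_size, max_spp):
    """Samples-per-packet values for which a whole number of packets fills one FFT."""
    upper = min(fft_size, max_spp)
    return [spp for spp in range(1, upper + 1) if fft_size % spp == 0]


choices = aligned_spp_choices(fft_size=4096, max_spp=2000)
print(choices[-3:])  # [256, 512, 1024]
```

For a 4096-point FFT and an assumed limit of 2000 samples per packet, the largest aligned choice is 1024, which matches the example above.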
Known Limitations
- MAX_FFT_SIZE_LOG2 = 16 (64k FFT) can be configured but has not been validated in hardware
- MAX_FFT_SIZE_LOG2 = 15 (32k FFT): using this maximum-size FFT implementation in an 8 point FFT configuration does not work properly.