Cyclostationary Signal Processing (CSP) and Parallelized Computing using GPUs

Aug 16, 2023 · 3 min read

CSP Background

In a previous Spectric blog post, Cyclostationary Signal Processing (CSP) was explored by implementing the Strip Spectral Correlation Analyzer (SSCA) to analyze the cyclostationary properties of radio frequency (RF) signals. We found that the SSCA has the ability to blindly detect cyclic frequency features such as signal baud rate, chip rate, and center frequency.

The Computational Cost of the SSCA

The SSCA provides robust and detailed information about signals without any a priori information, but at a heavy computational cost. The algorithms used to generate the SSCA require large amounts of matrix-based math at every step. Consider the following computational load:

A block of N+Np samples is loaded into a data vector: x[n] = [ x[1] x[2] … x[N+Np] ]
We then generate X, the N x Np data matrix where:
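As a minimal sketch of this step, assume row n of X holds the Np consecutive samples starting at sample n, so each row is a window that slides forward by one sample. That layout, and the helper name build_data_matrix, are assumptions made for illustration and may not match the exact definition used here.

```python
def build_data_matrix(x, N, Np):
    """Return an N x Np matrix whose row n is x[n : n + Np] (assumed layout)."""
    if len(x) < N + Np:
        raise ValueError("need at least N + Np samples")
    # Each row is a length-Np window that slides forward by one sample.
    return [list(x[n:n + Np]) for n in range(N)]


# Tiny illustrative sizes; real runs use far larger N and Np.
N, Np = 4, 3
x = list(range(1, N + Np + 1))   # stand-in for samples x[1] ... x[N+Np]
X = build_data_matrix(x, N, Np)
for row in X:
    print(row)
```

In array code, NumPy's sliding_window_view can produce the same windows without copying the samples.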

Before we even begin processing this data, we can recognize that the amount of work grows with the product of N and Np, so increasing either value quickly inflates the processing time. The plot below shows common values for N and Np and how they increase the total number of points computed by the SSCA:
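The size of the problem can be tabulated directly, since the SSCA output has one point per entry of the N x Np matrix. The sizes below are hypothetical values picked only to show the trend, not the ones used in the plot.

```python
def ssca_point_count(N, Np):
    """Number of points in the N x Np SSCA output matrix."""
    return N * Np


# Hypothetical sizes chosen only to show how the count grows.
for N in (2**14, 2**16, 2**18):
    for Np in (32, 64, 128):
        print(f"N={N:>7}  Np={Np:>4}  points={ssca_point_count(N, Np):>11,}")
```

Doubling both N and Np quadruples the number of points, before counting the FFT work needed to produce each one.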

Furthermore, the resulting Spectral Correlation Function (SCF) matrix Sx(q,k) must be mapped onto the bifrequency plane Sx(f,𝛼), where:

f is the spectral frequency
𝛼 is the cycle frequency

The mapping is given by the following equations:

Where:

This mapping phase to the bifrequency plane is critical, as it allows for the cyclic cumulants of the SCF to be used in a meaningful way when computing the Cyclic Feature Function (CFF) and other methods of analysis.
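For reference, the mapping can be sketched in code under the convention commonly given in the SSCA literature: f = k/(2·Np) − q/(2·N) and 𝛼 = k/Np + q/N, in normalized units of cycles per sample, with k indexing the Np channelizer bins and q indexing the N strip-FFT bins, both centered on zero. This convention is an assumption and may differ in sign or index range from the one used in this post; the function name is hypothetical.

```python
def ssca_bin_to_bifrequency(q, k, N, Np):
    """Map SSCA indices (q, k) to normalized (f, alpha).

    Assumed convention: -Np/2 <= k < Np/2 and -N/2 <= q < N/2,
    with frequencies expressed in cycles per sample.
    """
    if not -(Np // 2) <= k < Np // 2:
        raise ValueError("k outside the channelizer bin range")
    if not -(N // 2) <= q < N // 2:
        raise ValueError("q outside the strip-FFT bin range")
    f = k / (2 * Np) - q / (2 * N)   # spectral frequency
    alpha = k / Np + q / N           # cycle frequency
    return f, alpha


N, Np = 4096, 64
print(ssca_bin_to_bifrequency(q=0, k=8, N=N, Np=Np))    # (0.0625, 0.125)
print(ssca_bin_to_bifrequency(q=256, k=0, N=N, Np=Np))  # (-0.03125, 0.0625)
```

Multiplying both values by the sample rate converts them to Hz.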

While it is clear that the computational load grows with the amount of data processed, there is a solution to the ballooning compute time: matrix-based computations can be massively parallelized on multi-threaded computing architectures such as Graphics Processing Units (GPUs). Here we introduce the specific platform used in this exercise: CUDA.

CUDA, Parallel Computing Platforms, and the SSCA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for use on GPUs. The benefits of CUDA-based processing are:

Accelerated processing: CUDA provides parallel processing capabilities that allow faster execution of computationally intensive tasks. Our SSCA computations will execute faster than on a traditional CPU.
Massive parallelism: CUDA enables massive parallelism by using thousands of cores within the GPU. This allows us to compute different parts of our matrix simultaneously, provided that multiple threads do not access the same data.
Reduced latency: CUDA-based processing can reduce the latency of computations by offloading data into the GPU's memory to perform computation-intensive tasks, freeing up the CPU for other work.
Improved scalability: CUDA-based processing can be scaled up to handle larger data sets and more complex computations. Our SSCA computation grows very quickly, so CUDA-based computing will help the algorithm scale to larger sizes.
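The data-independence condition behind the parallelism point can be illustrated on the CPU. In this sketch each worker transforms one row and touches no other row's data, which is the same property that lets a GPU assign rows (or individual elements) to separate threads. The naive DFT is only a stand-in for a per-row SSCA step, and the thread pool is an analogy rather than CUDA code; in standard CPython, threads will not speed up this kind of CPU-bound work.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft(row):
    """Naive DFT of one row; stands in for any per-row SSCA step."""
    n = len(row)
    out = []
    for m in range(n):
        out.append(sum(v * cmath.exp(-2j * cmath.pi * m * i / n)
                       for i, v in enumerate(row)))
    return out


def process_rows_parallel(X, workers=4):
    """Transform every row independently; no row reads or writes another row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dft, X))   # map() preserves row order


X = [[1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, -1]]
Y = process_rows_parallel(X)
```

Because the rows share nothing, the parallel result is identical to processing them one after another, and no locking is needed.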

Results of Implementing a CUDA based SSCA

To test parallelizing the SSCA algorithm, the CPU-based code was rewritten using CUDA-enabled libraries, and computation times were benchmarked on an Intel i7-9700K @ 3.6GHz and an NVIDIA 1080 GPU. The CUDA-based implementation yielded fantastic results, and its output was verified to match the CPU-based counterpart to within 1e-08. The following charts document the gains in performance, demonstrating the power of applying parallelized computing to computationally expensive algorithms:

Looking at the SSCA processing times, we see that as the SSCA matrix size grows, the processing time scales out of control. The CPU-based computation quickly reaches a point well beyond what is reasonable for any time-critical system.
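An agreement check like the 1e-08 comparison between CPU and GPU outputs can be sketched as follows. The tolerance is treated here as an absolute, element-wise bound, which is an assumption, and the sample values are made up for illustration rather than taken from the benchmark. Both matrices are assumed to have the same shape.

```python
def max_abs_error(a, b):
    """Largest element-wise |a - b| over two equally shaped 2-D matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def results_match(a, b, tol=1e-8):
    """True when every element of a and b agrees to within tol."""
    return max_abs_error(a, b) <= tol


# Made-up complex values standing in for CPU and GPU SCF outputs.
cpu = [[1 + 2j, 3 - 1j], [0.5j, -2.0]]
gpu = [[1 + 2j + 3e-9, 3 - 1j], [0.5j, -2.0 - 4e-9j]]
print(results_match(cpu, gpu))
```

Small differences of this size are expected when the same arithmetic is carried out in a different order on different hardware.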

The GPU/CUDA-based implementation is, on average, two orders of magnitude faster than the CPU-based solution, and the performance gains increase as the algorithm scales to larger matrix sizes. These processing times make implementation on time-critical systems feasible, which had always been a challenge for the SSCA, and give the algorithm a path forward for use in real-time blind signal detection and identification. They also mean that the GPU-based results can quickly be fed into a Machine Learning signal detection model trained on cyclostationary spectra. Will that be next? Tune in to the Spectric blog to find out!
