Two Ways to Run a Neural Receiver
There is nothing technically wrong with running a neural receiver in the cloud. NVIDIA proved it works — their CNN+GNN architecture achieves near-LMMSE BER performance when executed on an A100 GPU connected to a radio unit over high-bandwidth fronthaul. The Aerial RAN Computer Pro platform is built on exactly this premise: ship the raw IQ data from the RU to a GPU cluster at the DU, run inference there, send back the decoded results.
The problem is not whether this works in a lab. The problem is whether the economics hold as the network densifies — and in 5.5G and 6G, they don't.
*[Diagram: cloud inference path. RU (FFT, CP removal) → 25–50 Gbps eCPRI fiber per radio unit → DU GPU cluster running the neural receiver (A100: channel estimation + equalization + demapping) → LLRs → LDPC decode at the DU. Works for macro. Breaks at density.]*

*[Diagram: edge inference path. RU (FFT, CP removal, plus neural receiver on INT8 silicon: channel estimation, equalization, and demapping in one pass) → ~2 Gbps of LLRs over standard Ethernet → DU runs LDPC decode only. Works for macro. Scales to density.]*
The Physics Problem That Changes the Math
The core issue is propagation. Higher frequencies travel shorter distances — sub-6 GHz signals propagate hundreds of meters, millimeter-wave bands used in 5.5G fade meaningfully within 100–300 meters outdoors, and sub-terahertz bands targeted for 6G lose usable signal within tens of meters. This is not a deployment choice. It is physics.
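A back-of-envelope Friis calculation makes the frequency dependence concrete. The sketch below is illustrative Python, not drawn from any vendor data; it computes free-space path loss only, and real urban links at mmWave and sub-terahertz fade faster still once blockage and atmospheric absorption are added.

```python
import math

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss (dB) from the Friis formula:
    FSPL = 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)."""
    c = 3e8  # speed of light, m/s
    return (20 * math.log10(distance_m)
            + 20 * math.log10(freq_hz)
            + 20 * math.log10(4 * math.pi / c))

# Loss at 200 m for three representative bands
for label, f in [("3.5 GHz (sub-6)", 3.5e9),
                 ("28 GHz (mmWave)", 28e9),
                 ("140 GHz (sub-THz)", 140e9)]:
    print(f"{label:18s} FSPL @ 200 m: {fspl_db(200, f):5.1f} dB")

# 28 GHz loses ~18 dB more than 3.5 GHz at the same distance
# (20*log10(28/3.5) ≈ 18 dB), before blockage and absorption,
# which hit mmWave and sub-THz links far harder than sub-6 GHz.
```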
The structural consequence: achieving consistent mmWave coverage in dense environments requires roughly 60 small cells per square mile, versus 1–3 macro sites per square mile for sub-6 GHz. That is a 20–60× multiplier in the number of radio units deployed per unit area.
If a sub-1 GHz network requires ~50,000 cell sites to cover the continental US, equivalent mmWave 5.5G coverage in dense areas would require millions of small cell sites. Each site is a radio unit. Each radio unit running the cloud inference model needs its own 25–50 Gbps fiber run to a DU.
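The multiplication is worth doing explicitly. A minimal sketch using the figures quoted above (the site counts are this article's; the midpoint choice is an assumption):

```python
# Site-count arithmetic using the figures quoted above (illustrative).
macro_sites_per_sq_mile = 2        # midpoint of the 1-3 sub-6 GHz range
small_cells_per_sq_mile = 60       # dense mmWave coverage target

density_multiplier = small_cells_per_sq_mile / macro_sites_per_sq_mile
print(f"Density multiplier: {density_multiplier:.0f}x")        # ~30x at midpoint

sub6_sites_us = 50_000             # continental-US sub-1 GHz baseline
mmwave_sites_us = sub6_sites_us * density_multiplier
print(f"Equivalent mmWave sites: ~{mmwave_sites_us / 1e6:.1f}M")  # ~1.5M

# Under the cloud inference model, each of those sites needs its own
# dedicated 25-50 Gbps eCPRI fiber run back to a DU.
```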
This is where the cloud inference model hits a wall. Running dedicated eCPRI fiber to 20–40 small cells on a single city block is not economically viable. The fronthaul cost alone — before you factor in the GPU compute at the DU — makes dense mmWave deployment prohibitive. The cloud inference architecture was designed for macro cell topologies. It was never meant to scale to small cell density.
5.5G Is Already Forcing the Issue
"5.5G" — 5G-Advanced, 3GPP Release 18/19 — is not a roadmap slide. Commercial launches are underway in China and the Middle East. Operators are activating mmWave bands for stadiums, transportation hubs, enterprise campuses, and manufacturing floors. Massive MIMO arrays of 64T64R and 128T128R are standard in 5.5G RU hardware, meaning more spatial streams, more PUSCH processing, and more compute demand per radio unit.
The traditional DSP pipeline — LMMSE channel estimation, K-Best MIMO equalization, QAM soft demapping — burns 500–700 mW per radio unit just for the uplink receiver path. Running that across 30–40 small cells per block makes the power envelope untenable for operators with net-zero commitments. The cloud inference model doesn't solve this; it relocates the compute to a GPU cluster that burns far more per inference than purpose-built silicon would.
The Market Numbers Behind the Density Wave
The global small cell 5G network market was $2.76 billion in 2023 and is projected to grow at 70.7% CAGR through 2030 (Grand View Research), with a separate Fortune Business Insights estimate placing it at $74.6 billion by 2032. Over 3 million small cells were already deployed worldwide in 2023, with over 20% of new deployments that year already using mmWave technology.
| Year | Phase | Global Small Cells | Market Value |
|---|---|---|---|
| 2023 | 5G sub-6 / early mmWave | ~3M installed base | ~$2.8B |
| 2025 | 5G mmWave scaling | Growing rapidly | ~$7.5B |
| 2030 | 5G mmWave + early 6G R&D | Tens of millions globally | $75–126B |
| 2032–35 | 6G commercial rollout begins | Hundreds of millions (potential) | Nascent — TBD |
Sources: Grand View Research ($125.5B by 2030 at 70.7% CAGR), Fortune Business Insights ($74.6B by 2032 at 38.7% CAGR), MarketGrowthReports, PatentPC.
Per-site installation costs run $10,000–$15,000 in dense cities. Net-new small cell deployments are projected to exceed 1 million per year globally by 2030. This is durable physical capex: operators must build out their licensed spectrum or forfeit it under use-it-or-lose-it license terms. The densification wave is not optional.
Why Edge Inference Wins at Density
The case for moving inference to the radio unit is not primarily about AI performance. The neural receiver's BER parity with LMMSE is already validated by NVIDIA research. The case is about deployment economics at the scale that 5.5G and 6G require.
Reducing the fronthaul from 25–50 Gbps of IQ samples to ~2 Gbps of LLRs, a roughly 10–25× compression, means dense small cell networks can run over standard multi-gigabit Ethernet instead of dedicated eCPRI fiber. That removes the single largest capex line item in any mmWave deployment. The economics of densification only close with edge inference.
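The compression claim is easy to sanity-check with a back-of-envelope calculation. In the sketch below, every parameter (carrier width, layer counts, bit widths) is an illustrative assumption rather than a vendor or 3GPP figure; the point is only that both rates land in the same ballpark as the numbers quoted above.

```python
# Back-of-envelope fronthaul rates (illustrative; all parameters are
# assumptions -- actual rates depend on carrier configuration,
# functional split, and compression scheme).

# Split-7.2-style frequency-domain IQ: samples * layers * 2 (I/Q) * bits
sample_rate_hz = 122.88e6     # ~100 MHz 5G NR carrier
iq_bits = 9                   # compressed IQ bit width
layers = 16                   # spatial streams shipped to the DU
iq_gbps = sample_rate_hz * layers * 2 * iq_bits / 1e9
print(f"IQ fronthaul:  ~{iq_gbps:.0f} Gbps")   # ~35 Gbps, in the 25-50 range

# LLR output: resource elements/s * layers * bits/symbol * LLR width
res_per_sec = 3276 * 28_000   # 273 PRBs x 12 subcarriers, 30 kHz SCS
ul_layers = 2                 # assumed small cell uplink rank
bits_per_sym = 4              # 16QAM
llr_bits = 4                  # quantized LLR width
llr_gbps = res_per_sec * ul_layers * bits_per_sym * llr_bits / 1e9
print(f"LLR fronthaul: ~{llr_gbps:.1f} Gbps")  # ~2.9 Gbps, near the ~2 quoted
```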
Purpose-built INT8 inference silicon draws under 150 mW, versus 500–700 mW for the legacy DSP receiver path, and versus the kilowatts of GPU compute required to run cloud inference for the same number of sites. At 40 small cells per block, the power delta is not incremental. It determines whether the network can be powered at all without dedicated infrastructure.
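In concrete terms, using only the figures in this article plus A100 board power (an assumption about the cloud-side hardware):

```python
# Per-block uplink receiver power, using the figures in this article
# (illustrative; 40 small cells on one block).
cells = 40
legacy_dsp_mw = 600        # midpoint of the 500-700 mW range
int8_edge_mw = 150         # sub-150 mW purpose-built INT8 silicon

print(f"Legacy DSP per block: {cells * legacy_dsp_mw / 1000:.0f} W")  # 24 W
print(f"INT8 edge per block:  {cells * int8_edge_mw / 1000:.0f} W")   # 6 W

# Cloud inference for the same 40 cells shares a DU GPU cluster; a single
# A100 alone has a 300-400 W board power budget, an order of magnitude
# above the entire block's edge-inference receiver power.
```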
Edge inference means the radio unit operates independently of fronthaul availability. Cloud inference means a fiber cut or DU failure takes down every radio unit behind it. For private 5G networks in industrial facilities, this is a tier-1 reliability requirement, not a nice-to-have.
NVIDIA validated the neural receiver on A100 GPUs, the wrong silicon for a small cell radio unit. Moving inference to the edge requires purpose-built hardware: hardwired INT8 inference logic that runs the 730K-parameter forward pass at sub-1 W inside the RU itself. That silicon does not exist as a commercial product yet.
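Whether sub-1 W is plausible for a 730K-parameter network can be sanity-checked with a simple energy model. Every number in the sketch below (the MAC count per group of resource elements, the energy per INT8 MAC) is a loudly labeled assumption, not measured silicon:

```python
# Why sub-1 W is plausible for a hardwired 730K-parameter INT8 receiver.
# All numbers below are assumptions for illustration, not measured silicon.
params = 730_000
res_per_sec = 3276 * 28_000      # resource elements/s (100 MHz, 30 kHz SCS)

# Suppose the network amortizes to roughly 10x its parameter count in
# MACs per group of ~100 REs processed together (a modeling assumption).
macs_per_group = 10 * params
groups_per_sec = res_per_sec / 100
macs_per_sec = macs_per_group * groups_per_sec   # ~6.7e12 MAC/s

pj_per_int8_mac = 0.1            # plausible for dedicated INT8 datapaths
watts = macs_per_sec * pj_per_int8_mac * 1e-12
print(f"~{macs_per_sec / 1e12:.1f} TMAC/s at {pj_per_int8_mac} pJ/MAC "
      f"=> ~{watts:.2f} W core compute power")   # ~0.67 W
```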
Bottom Line
Cloud inference was the right starting point. It let the industry validate that neural receivers work — that a CNN+GNN can replace LMMSE and K-Best in the uplink receiver chain without sacrificing BER performance. That validation is complete.
The next step is not faster cloud inference. It is moving inference to where the physics of 5.5G and 6G demand it: inside the radio unit, running on purpose-built silicon, at the power budget and form factor a small cell can actually sustain. A market growing from $2.8B to $75–126B in under a decade is one forcing function. The physics of millimeter-wave propagation is another. Together, they make edge inference not just technically attractive but economically necessary.
The neural receiver is validated. The silicon to run it at the edge doesn't exist yet as a product. That's what neuraRAN is building.