2025.3.0 Broadcast DE Performance Issues with Live inputs on Windows 11 25H2 - Latest working versions?

Hi there,

We have a single Workstation setup with a TRX50, RTX4090 and two Decklink 8K Cards running Windows 11 and Aximmetry 2025.3.0 Broadcast DE rendering in 2160p30. 2 T CAM, 1 V CAM and two external HDSDI live inputs in 1080p30.

Until recently the setup ran fine with CPU and GPU loads well in the green (60-70%). That was until Microsoft forced a background update to Win11 25H2. This caused all kinds of problems, notably forcing us to also upgrade Blackmagic Desktop Video and the Decklink firmware on the cards, as well as the RTX drivers. All of which made it even worse, and the system became unstable. Unfortunately there is no readily available backup of the last working configuration, so we had to wipe the system and start from scratch...

Now Windows 11 25H2 is out of the question, so we went back to 24H2. But there also seem to be issues with the Decklink drivers. What happens is whenever we load up one of our previously working compositions the GPU load goes well over 100%. It also happens if we build a new composition with just the 3+3 mixed camera compound and the Unreal file. As soon as we add two additional live inputs the GPU load goes through the roof.

So what is the latest stable combination of Windows 11 version, Decklink Drivers+Firmware and RTX drivers you have found to work without these issues in practice?

EDIT: Windows 11 24H2 didn't work either, we had to go back to 23H2 to at least get some of the GPU performance back. What we also found is that relieving the Decklink card of some work by taking the PGM signal directly from one of the HDMI outputs of the RTX 4090, instead of letting the Decklink card convert it to HDSDI, will at least drop the idle GPU load with 3 cameras and 2 live inputs back down to about 70%. It still spikes way beyond 100% and causes frame drops and delayed cuts whenever we switch cameras or use graphics overlays though. What should we be looking for here?

   Stefan Reck

Comments

Stefan Reck

So digging deeper into this it gets harder to pinpoint the issue... We decided to rule out hardware and firmware issues and started over with a clean install to test things:

Hardware:

- TRX 50 Mainboard 

- Threadripper 24 Core CPU

- 128 GB RAM

- TUF Gaming RTX4090

- 2x Decklink 8K Pro with 15.3 Firmware

Software:

- Win11 23H2

- Aximmetry 2026.2.0 Broadcast and Film.

So as a test we loaded up the Aximmetry Sports Studio from the Demo sets with three virtual cameras. Rendering in 2160p30, Camera inputs in 2160p30 on one Decklink card, output frame buffered to HDSDI through an output of the same Decklink card. Two additional live inputs for the virtual screens in this set are taken in from the second decklink card, one in 2160p and one in 1080p. 

Without the live inputs the set runs well, but as soon as both of the live inputs are added the GPU hits over 100% and the system starts dropping frames, especially when switching cameras. Interestingly it does make a difference whether we select the camera devices directly in the virtual camera inputs or use video input modules to wire them into the test inputs. The latter takes up more GPU resources - but why? 

We can improve things a bit by shuffling the inputs around on the Decklink cards and taking the rendered output directly from the 4090, but shouldn't this work right out of the box? What kind of tests can we do here to get to the root cause?

 

JohanF

You’re likely saturating the PCIe bus or the CPU to GPU memory transfer bandwidth. These issues can be very hard to troubleshoot and pin down. The difference between the input methods might be that one of them is set to 10-bit? My only recommendation would be to check which physical PCIe slots you’re using and check if there are any BIOS settings you can tweak. Is ReBar available on AMD?

Stefan Reck

All cameras and live inputs are 8-bit SDR. I can verify that by peeking them in the flow.

The mainboard we are using has x16 PCIe slots for the GPU and the capture cards and a total of 48 lanes.

At this point I am mainly looking for a benchmark to compare against so I can rule out faulty hardware or a firmware bug. I don't recall the existing setup ever taking such a performance hit from merely activating two additional live video inputs...

JohanF

All I’m saying is that this sounds exactly like the PCIe bus performance issues we’ve been seeing. It’s not just down to how many lanes you have, it’s about how fast the system can shuffle all the data between the CPU and GPU and back again (the Decklink capture cards don’t interface directly with the GPU). That’s why I recommend enabling ReBar, or the AMD equivalent.

Adam@Aximmetry

Hi Stefan,

Because your setup was functioning correctly before the Windows update, the issue is likely related to a low-level system change. Possible causes include driver, firmware, or BIOS changes, PCIe topology changes, or Windows power management settings. Since the system load drops when you bypass the DeckLink SDI output and output directly from the RTX 4090, the bottleneck is most likely somewhere in the DeckLink I/O pipeline, possibly due to limited PCIe bandwidth.

Regarding Johan's comments: the limitation can also be the CPU to GPU memory transfer bandwidth, so enabling ReBar or the AMD equivalent also seems like a valid troubleshooting step.

To narrow it down, please run a simplified bandwidth stress test. First, assign only the DeckLink output ports in Aximmetry, leaving all inputs unassigned. Then gradually increase the output resolution, frame rate, and bit depth. Make a note of the exact format where bandwidth saturation occurs and frame drops begin.

Next, repeat the test in reverse by assigning only the input ports. Finally, test the combined input/output configuration. This should help identify whether the limitation is input-related, output-related, or caused by the combined I/O load.
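To make "gradually increase" concrete, the formats can be ordered by their uncompressed data rate before stepping through them. The following is only a rough sketch: the candidate resolutions, frame rates and bit depths are illustrative assumptions rather than the list of modes the DeckLink actually offers, 4:2:2 sampling is assumed, and `data_rate_mb_s` and `build_ladder` are hypothetical helper names.

```python
from itertools import product

# Illustrative candidates only; replace with the modes your card and Aximmetry offer.
RESOLUTIONS = {"1080p": (1920, 1080), "2160p": (3840, 2160), "DCI 4K": (4096, 2160)}
FRAME_RATES = (30, 60)
BIT_DEPTHS = (8, 10)


def data_rate_mb_s(width, height, fps, bit_depth):
    """Uncompressed 4:2:2 payload in MB/s (10^6 bytes per second).

    4:2:2 carries two samples per pixel on average. Row padding and
    driver-specific packing are ignored, so real traffic can be higher.
    """
    bits_per_pixel = 2 * bit_depth
    return width * height * fps * bits_per_pixel / 8 / 1e6


def build_ladder(ports=1):
    """Return (total MB/s, label) steps sorted from lightest to heaviest."""
    steps = []
    for (name, (w, h)), fps, depth in product(RESOLUTIONS.items(), FRAME_RATES, BIT_DEPTHS):
        rate = ports * data_rate_mb_s(w, h, fps, depth)
        steps.append((rate, f"{name} {fps} fps {depth}-bit x{ports}"))
    return sorted(steps)


for rate, label in build_ladder(ports=4):
    print(f"{rate:8.0f} MB/s  {label}")
```

Working down the printed list and noting the first step at which frames drop gives a single number per test (output only, input only, combined) that can be compared between runs.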

For comparison, we ran the same test on our internal system, which uses a Blackmagic DeckLink 8K Pro capture card installed in a 4-lane PCIe slot on a Gigabyte X570S AERO G motherboard. Note that since the DeckLink 8K Pro is natively a PCIe Gen 3 x8 device, using it in a 4-lane slot effectively reduces its maximum available bandwidth by half.

In Aximmetry Broadcast DE 2025.3.0, we mapped all four SDI output ports of the DeckLink card and tested the highest format the system could output without dropping frames. The maximum stable result was DCI 4Kp30 10-bit (4:2:2) on all four outputs simultaneously. This bandwidth requirement closely matches the theoretical limit of a PCIe Gen 3 x4 slot, so the result is consistent with the expected performance in Aximmetry.
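For anyone who wants to repeat this comparison with their own numbers, here is a back-of-the-envelope sketch. It assumes a raw payload of 20 bits per pixel for 10-bit 4:2:2 (the in-memory packing used by the driver may be larger) and uses the standard PCIe Gen 3 figures of 8 GT/s per lane with 128b/130b encoding. The function names are hypothetical.

```python
def pcie_gen3_gb_s(lanes):
    """Theoretical PCIe Gen 3 rate in GB/s: 8 GT/s per lane, 128b/130b encoding.

    Packet and protocol overhead are not included, so achievable
    throughput is lower than this figure.
    """
    return 8e9 * (128 / 130) * lanes / 8 / 1e9


def stream_gb_s(width, height, fps, bits_per_pixel):
    """Uncompressed video payload of one stream in GB/s (10^9 bytes per second)."""
    return width * height * fps * bits_per_pixel / 8 / 1e9


# DCI 4Kp30 10-bit 4:2:2 on four outputs, as in the test described above.
total = 4 * stream_gb_s(4096, 2160, 30, 20)
limit = pcie_gen3_gb_s(4)

print(f"four outputs:          {total:.2f} GB/s")
print(f"Gen 3 x4 theoretical:  {limit:.2f} GB/s")
print(f"Gen 3 x8 theoretical:  {pcie_gen3_gb_s(8):.2f} GB/s")
print(f"share of the x4 limit: {total / limit:.0%}")
```

Swapping in the format at which your own system starts dropping frames, and the lane count your cards actually negotiate, gives a like-for-like figure to compare against.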

We suggest checking the actual PCIe link width and speed of your DeckLink cards, then comparing the available PCIe bandwidth with the saturation point observed in your tests. When installed in the correct slots, the DeckLink cards should operate at PCIe Gen 3 x8. If your results are lower than expected for your hardware, please review the driver, firmware, and BIOS settings, or continue with hardware-level troubleshooting.
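As a partial illustration of such a link check, the sketch below reads the negotiated PCIe generation and width of the GPU through `nvidia-smi` and flags a link that is running below its maximum. This covers only the RTX 4090 end of the transfers; the DeckLink cards need a separate tool and are not handled here. The helper names are hypothetical, and the GPU may lower its link generation at idle for power saving, so the values are best read while a composition is running.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY = ("name,pcie.link.gen.current,pcie.link.width.current,"
         "pcie.link.gen.max,pcie.link.width.max")


def parse_link_report(csv_text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, gen, width, gen_max, width_max = (f.strip() for f in line.rsplit(",", 4))
        gpus.append({
            "name": name,
            "gen": int(gen),
            "width": int(width),
            "gen_max": int(gen_max),
            "width_max": int(width_max),
        })
    return gpus


def is_degraded(gpu):
    """True if the link currently runs below its reported maximum."""
    return gpu["gen"] < gpu["gen_max"] or gpu["width"] < gpu["width_max"]


def read_gpu_links():
    """Run nvidia-smi (must be on PATH) and return the parsed link report."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_link_report(result.stdout)
```

On the workstation, `for gpu in read_gpu_links(): print(gpu, is_degraded(gpu))` would list each GPU with its current link. A width below 16 under load would point at slot choice or lane sharing rather than at Aximmetry.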

Warmest regards,
Ádám