Discussion: NVIDIA Display Driver (nvlddmkm.sys) DPC Latency

Status
Not open for further replies.

Hellbovine

Well-Known Member
Note: This post summarizes the entire thread so you do not have to read over 700 replies!

UPDATE (AUGUST 11TH, 2023)
You are probably reading this because you used LatencyMon and it revealed an issue in the Nvidia graphics driver. Everything you need to know is contained in this post, so keep reading. This thread started as an IRQ conflict, but evolved as we learned more, and eventually resulted in the discovery of several bugs causing high latency, which shows up as stutters, freezes, crashes, and audio problems.

After months of research and testing, this thread prompted Nvidia and Microsoft to finally acknowledge these issues, which have plagued gamers and audio enthusiasts for many years. This thread also resulted in the creation of a guide that helps people optimize their computers for lower latency. It is important that we make a distinction here because half of these Deferred Procedure Call (DPC) issues are due to Nvidia and Microsoft bugs, while the other half come from computers that need optimizing.

The purpose of this thread was to identify the cause of the malfunctioning nvlddmkm.sys driver and find a tweak to mitigate it. While there were no reliable fixes discovered, Nvidia later added it to their open issues tracker, so we then waited for their response. Several updates were released to fix various bugs, and then an update specifically for the 3xxx series (Ampere architecture) was included because those cards had additional issues that were making the latency even worse. These fixes are included in drivers dated July 18th, 2023 or newer, and while legacy cards still receive security updates, it is unlikely they will get any other fixes, such as those related to DPC latency.

Other problems affecting Nvidia were bugs in the Desktop Window Manager (DWM) and Timeout Detection and Recovery (TDR) features. Microsoft has several preview fixes (link1, link2, link3, link4) for these, but it may be a while before they are available to everyone, and additional updates might be needed after the public has a chance to provide feedback. If the updates are finalized in time, they will be included in the ISO images (link5, link6) that Microsoft posts near the end of each year, and the best advice would be to do a clean install of Windows using those updated builds, and the latest Nvidia driver. Only Windows 10 and 11 will be receiving these updates.

For all other DPC issues, solutions have existed forever in the form of tweaking a computer for more performance. The Gaming Lounge (link7) has a huge list of important information regarding this. The takeaway here is to optimize, which means learning how to cleanly install the operating system, properly installing drivers and firmware, adjusting the BIOS for low latency, and using NTLite to slim down Windows.

Hopefully the Nvidia driver will be fully resolved by the start of 2024, but if the issues persist then all we can do is encourage everyone to submit support tickets to Nvidia and the Microsoft Feedback Hub, contact computer and gaming sites to have them write articles about it, and go to social media to continually remind these companies of the problem until it is addressed.

ORIGINAL POST (JULY 15TH, 2022)
LatencyMon reveals that nvlddmkm.sys (Nvidia Graphics driver kernel) has DPC latency spikes up to 800 microseconds fairly frequently. This is an extremely common issue that can be found all over Google, but I have not come across any solutions, except for a nonsensical one.

According to a post at LinusTechTips forum (link) this is happening because the graphics card is sharing an IRQ with a problematic device. On my computer, msinfo32.exe says the graphics card is sharing IRQ 16 with a motherboard USB host controller.
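To make the sharing check less of an eyeball exercise, here is a minimal Python sketch that groups devices by IRQ and reports only the IRQs used by more than one device. It assumes you copy the (IRQ, device) pairs by hand from the IRQ list in msinfo32.exe. The function name and the sample rows are made up for illustration, apart from the IRQ 16 pairing described above.

```python
from collections import defaultdict


def shared_irqs(assignments):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for irq, device in assignments:
        by_irq[irq].append(device)
    return {irq: devices for irq, devices in by_irq.items() if len(devices) > 1}


# Illustrative rows, hand-copied from msinfo32. Only the IRQ 16 pairing
# comes from this post; the IRQ 19 row is a hypothetical filler entry.
sample = [
    (16, "NVIDIA graphics card"),
    (16, "USB host controller"),
    (19, "SATA controller"),
]

print(shared_irqs(sample))
```

A shared IRQ shown this way is only a suspect, not proof that the sharing causes the latency.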

The solution is to force the IRQ to be reassigned, but the steps are extremely clunky. There has to be a better way than this? I already have an idea in mind that I will test tomorrow, where I will go into my BIOS and disable USB ports until I find the ones tied to this controller, and then move hardware around and reinstall Windows. This seems better than doing the steps listed in the solution, but may not help.
 
Or you can use one of the Nvidia driver slimmer programs. Most of them can not only remove a lot of the garbage but also enable MSI mode.
 
Thank you for the replies, was a really busy couple of days so I haven't been able to follow-up on anything yet, but I'll check this out when I can get back on my PC.

I did see references to the MSI utility a few times on different forums, but honestly I just assumed they were talking about MSI the motherboard manufacturer, lol. I thought they were talking about like the MSI Afterburner program or whatever it's called :p
 
I think we have all been there with MSI thinking one thing when it's the other.
 
That won't make a difference. The driver has to support switching IRQs, and some devices don't. Second, most drivers default to assignments based on PCI lanes (bus controller). When you have a laptop or small form-factor (SFF) PC, it's set by hardware design.

You need a special program like MSI, or hacking the registry. Unlike the DOS days, modern installers don't let you pick IRQ.
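For the registry route, here is a rough read-only Python sketch. It assumes the per-device location Microsoft documents for message-signaled interrupts: a `MessageSignaledInterruptProperties` key under the device's `Device Parameters\Interrupt Management` key, holding an `MSISupported` DWORD. The device instance path in the example is a placeholder, not a real device, and both function names are made up. The sketch only builds the key path and, on Windows, reads the value. It never writes anything.

```python
def msi_properties_subkey(device_instance_path):
    """Build the HKLM subkey that holds a device's MSI settings."""
    return "\\".join([
        r"SYSTEM\CurrentControlSet\Enum",
        device_instance_path,
        r"Device Parameters\Interrupt Management",
        "MessageSignaledInterruptProperties",
    ])


def read_msi_supported(device_instance_path):
    """Return the MSISupported DWORD, or None if the key or value is absent.

    Windows only: winreg is imported here so the path helper works anywhere.
    """
    import winreg

    subkey = msi_properties_subkey(device_instance_path)
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, "MSISupported")
            return value
    except FileNotFoundError:
        return None


# Placeholder instance path. Copy the real one from Device Manager
# (Details tab, "Device instance path") for your own card.
example = r"PCI\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_00\0"
print(msi_properties_subkey(example))
```

A missing value here most likely just means nothing has set MSI mode for that device in the registry, so treat the result as a hint rather than proof of how the interrupt is actually being delivered.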
 
Maybe. But mbk1969 says in his thread:
You see, for MSI-mode must participate: chipset, device and device drivers.

It would be easier to remap other devices which are more flexible. Any real-time chipset (graphics, audio, network) wants to be tied to their parent bus controller for lowest latency. Other I/O devices (SATA, USB) are less picky because disk I/O is slower than real-time work.

I think some PC configs are always doomed if you depend on the onboard chipsets, and the real answer is spending money on add-on cards.
Don't own a "performance rig", but you see a ton of complaints where no one can figure out a DPC solution. The motherboard vendors are optimizing for gamers, and not for the audiophile markets.
 
Maybe try moving the card to another PCIe slot, if it's a low-end GPU? Or flashing an older BIOS?
Does disabling the USB hub "fix" the problem?

It mostly should be fixed by the OEM with a BIOS update.
 
Just posting an update for anyone following the thread:

I spent a while troubleshooting. First I disabled all 12 usb slots via my bios, with no change in DPC. Then I also went into device manager and disabled the usb host that was sharing an IRQ with my GPU, and again no change. After spending a bunch of time figuring out which physical usb slots correspond to the bios settings and also the different OS usb hosts, it ended up just verifying that the problem isn't IRQ related. But at least I learned some new stuff and eliminated a suspect from the list.

I haven't tried the MSI vs IRQ thing yet, but with these initial troubleshooting results I doubt it will make a difference. I'll still try anyway, along with other things the next time I get a day to go at it. I'm going to focus on Nvidia next, trying different drivers, specifically the non-DCH ones. In LatencyMon the DirectX kernel also appears as one of the "issues", but I haven't investigated that yet since my theory is it's only spiking because of the Nvidia file. Also, DX is only spiking to like 80, which is well within an excellent DPC range--it only stands out because the rest of my system's DPC is so low in comparison.
 
Thanks for the update
 
I'm hoping someone could help me out please:

I want to try updating to the latest chipset driver for my discontinued board to see if that changes anything. But here's the problem... A few years ago Intel did a site-wide wipe of all their legacy/discontinued drivers. I tried using the Wayback Machine to look up the old links, but without success. I may be able to use chipset drivers from the Microsoft Catalog, but that catalog confuses me when it comes to drivers: no matter what I search for, it comes up with way too many results, and I don't know how to figure out which ones actually apply to my system. So I literally have to download a dozen or two of the latest ones that seem appropriate and try them all, which usually doesn't end up successful for me.

I've never had to deal with this legacy issue before because I always kept current on my hardware for gaming and so I've always been able to find official drivers from the usual places without problem. And in a worst case scenario I had all the drivers saved to a USB for *if* they did get discontinued... But this PC I'm working on is a frankenstein build, it was a combination of parts from two computers, one of which didn't belong to me, and so I don't have backups of some of the drivers.

The motherboard is an Intel Corporation DZ77SL-50K, and in the Device Manager the chipset shows up with the following in the USB controllers:
Intel(R) 7 Series/C216 Chipset Family USB Enhanced Host Controller - 1E2D
Intel(R) 7 Series/C216 Chipset Family USB Enhanced Host Controller - 1E26
Intel(R) USB 3.0 eXtensible Host Controller - 1.0 (Microsoft)

I tried narrowing things down by just searching for "1E2D" and "1E26" in both the catalog and Google, but it's not coming up with anything usable so far. Just a lot of outdated threads from other people experiencing issues with the same board.
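One thing that may narrow the search: full PCI hardware IDs tend to be more selective than the bare four-character suffix. The Python sketch below builds them, assuming "1E2D" and "1E26" are the PCI device IDs of the two controllers and that the vendor ID is Intel's 8086. The "Hardware Ids" property on the Details tab in Device Manager should confirm or correct both assumptions. The function name is made up for illustration.

```python
INTEL_VENDOR_ID = "8086"  # PCI vendor ID assigned to Intel


def pci_hardware_id(device_id, vendor_id=INTEL_VENDOR_ID):
    """Format a vendor/device pair the way Device Manager lists hardware IDs."""
    return f"PCI\\VEN_{vendor_id.upper()}&DEV_{device_id.upper()}"


# Suffixes taken from the Device Manager names above; treating them as
# PCI device IDs is an assumption to verify on the Details tab.
for device_id in ("1E2D", "1E26"):
    print(pci_hardware_id(device_id))
```

Searching for the resulting strings, for example PCI\VEN_8086&DEV_1E2D, may cut down the noise compared to searching for "1E2D" alone.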
 
Well, I'm stumped. I tried everything. I opened literally every single Google search result for "nvlddmkm.sys" in new tabs and read through all of them, trying every proposed solution.

Messed with bios, disabled stuff in device manager, changed drivers multiple times, switched to MSI mode, messed with all four HPET related settings, and a bunch of other things I'm forgetting now since my brain is fried from spending all day on this.

It's such a common issue it seems, I don't understand how it can go on for years without being addressed by now. I feel like this bug is probably the main culprit behind many of the performance issues causing people to hang onto certain versions of W10 because as a workaround some specific combinations of driver versions along with a certain version of W10 makes the issue go away. So the real question then is, where is the issue--in Nvidia's drivers, or in W10. I had no such problem on my XP computer, so I'm leaning towards something in the OS causing a problem.

This thread on Reddit (link) was probably the most active of all the places I found regarding this issue.

I renamed this thread to better represent the newer findings. If anyone has any ideas I'm all ears. I did all the basic stuff already, like disabling Defender, SysMain, the indexer, etcetera. The thing that sticks out the most here is that even in the Reddit thread, the OP stated the same thing I did: using the basic Microsoft display driver with the default W10 install doesn't cause any problems. It's only after the Nvidia driver gets installed that it goes haywire (not using GeForce Experience either). I also specifically used the non-DCH driver and that made no difference. I'm also totally offline, so it's not like the Microsoft Store is messing with my control panel, nor is Windows Update doing anything.

I've tried 6 different driver versions now and they're all the same. I could keep just going further and further back until some old driver eliminates the issue maybe, but that still doesn't solve the problem. There has to be a registry setting in the OS or Nvidia causing the conflict. For example, something along the lines of a power saving feature that got added at some point, or whatever, some new feature that hasn't matured yet or doesn't work right on certain hardware or bios/OS configurations... I'm sure there's a way to truly fix the problem, rather than just workaround it. The problem with going really far back in drivers too is you then lose all the bugfixes and such that each of those drivers resolved, and you just end up trading one problem for another, so I'd rather spend time figuring out what the root issue is instead.

One thing that bothers me a lot is how the OS StartMenuExperience seems to tie into the Nvidia drivers. To see what I mean for yourself, go into the Nvidia Control Panel, open the "Desktop" menu at the top, and select "Display GPU Activity Icon in Notification Area". Then left-click the new icon that gets added to your tray, and you can see the following running on the GPU:

searchapp.exe
textinputhost.exe

My theory is that those are the real culprits. Which makes a lot of sense, because frankly the new startmenu is kind of a disaster. I've seen too many bugs and quirks in it already before this issue even arose, so it's clearly not stable in the OS.
 
sniff the output of the nvidia installer with regfromapp to see what it does with the registry.
make a backup of the current power scheme, install driver then compare default and "after" powerplans with PowerSettingsExplorer or quickcpu's very good power settings tool. if you see new entries/values you can alter them and see what happens. you can do that with PowerSettingsExplorer too.

edits
dont matter about the documentation, as long as you see the entry and its values you can fiddle around with them. prolly better doing it through the api with quickcpu or PowerSettingsExplorer.

might be better to duplicate your required plan, set that copy to Active, run the installer then the tools cos if you fkup you can easily go back to your default plan. quickcpu much better as it has more options to play around with, delete duplicate export backup rename change descriptions.
suggest playing around with new entries/values first, up down, see what happens.
MT_ used to do stuff with the registry but now he uses the api and custom powerplans.

have a look at bitsums own power plan, just need Park Control to add it, see if that sets nvidia stuff with the driver already installed.
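If you'd rather do the before/after comparison without extra tools, here is a minimal Python sketch using only the standard library. It assumes you saved two text dumps of the active plan, for example with `powercfg /query > before.txt` before installing the driver and `powercfg /query > after.txt` afterwards. The function name and file names are made up, and the sample text is a stand-in rather than real powercfg output.

```python
import difflib


def changed_lines(before, after):
    """Return only the lines removed ("- ") or added ("+ ") between two dumps."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Stand-in text. Real input would be the contents of the two dump files, e.g.
#   before = open("before.txt").read()
#   after = open("after.txt").read()
before = "setting A\nsetting B"
after = "setting A\nsetting B\nsetting C"

for line in changed_lines(before, after):
    print(line)
```

Anything that shows up with a "+ " prefix after the install is a candidate entry to experiment with in the tools mentioned above.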
 
Yeah good idea. I was already in a place I could do this right now, so I just checked the Windows power plan registry tree, and there is one new key that's added, for adaptive display, but it is empty, presumably because I don't have an Nvidia gsync monitor which probably uses that key. What I mean by empty is that it doesn't add any AC or DC values or anything, it's just a placeholder.

I hadn't looked into the Nvidia keys (the ones in HKLM\Software\NVIDIA Corporation\) until just now, to see if anything can be tweaked there. It's kind of barren though, so I'm not sure anything will come out of that section unless I can find documentation on that stuff.
 
I submitted a ticket to Nvidia to see what they say. It'll probably take a few days before I can get it elevated to a senior tech though, since I have to jump through all the typical hoops of people trying to tell you to run sfc and all that garbage.

I read through such a colossal number of links today, and looking at how different everyone's hardware is, we all still have the same issue, and the only two common factors are Nvidia+W10. I have a feeling that it's a way more widespread issue than it might seem, because the majority of computer users don't use LatencyMon and so it's just flying under the radar. I'm curious how many of our forum members here have Nvidia cards and would find the same issue if they checked, without having realized it. I'll keep troubleshooting it tomorrow, spent a solid 10 hours on it today so I'm beat.
 