Discussion: NVIDIA Display Driver (nvlddmkm.sys) DPC Latency

Status
Not open for further replies.
Hellbovine, I don't know if this is a thing yet, but it might be worth collecting all the possible fixes into a single post. I know nothing is conclusive, and some things work for some and do nothing for others, but I've also been reading up on this issue on Google and found some interesting things, like deleting the Nvidia audio driver solving this for some. (It didn't for me, but to be fair, I don't think I was able to really delete it without Windows reinstalling it. I have neither the time nor the knowledge to go really deep, and I also don't want to risk the current state of my OS.) Maybe it's worth all of us trying these one by one; even if none of them works, some of us might have interesting stuff in common that could get us closer. Clanger, please feel free to delete this post after Hellbovine acknowledges it to reduce clutter. Thanks.
 
So just to clarify, you have no Nvidia spiking, and the highest spike you got on a system that has all drivers installed, including Nvidia, was less than 200? If you answer yes to that, then the next question I have is, does your same machine still get similar results on a clean install of a *default* Windows (no modifications) with the Nvidia driver?
Yes for the first question, edited my earlier post for the second question with new tests.
 
FWIW, I've deleted my Nvidia drivers with DDU in Safe Mode, Windows 11 loaded driver version 456.71 for me (without GeForce Experience) and it seems the worst of the latency issues are gone - the highest spike I get from nvlddmkm.sys is 650. The test has been running for 5 minutes so I'd like to think it really has got better. I'll just stick with these drivers until I feel like experimenting again in any case.
By "the issue is gone" I mean I'm not getting the audio pops and crackles anymore, yet I know 650 is still higher than it should be. At least it's not audibly bothering me anymore.
 
Ok I am done with another giant round of testing. Quick recap:

Situation I started out with:
  • Windows 22H2 installation.
  • 13900K, MSI WIFI FORCE Z690 motherboard with the most recent BIOS.
  • Got a new 4090, installed it and had some problems booting.
  • Got it to work after a couple of reboot cycles (seemed to be a BIOS hiccup).
  • Tested DPC latency to make sure everything works now after reinstalling NVIDIA driver.
  • DPC latency turned out to be too high and spiking.
  • Switched back to my old 3080 TI and reinstalled drivers, same thing.
  • Came to the conclusion that DPC latency was bad for a while without me noticing.
  • Figured out that I can create crazy spikes by copying files from/to my M.2 SSDs, resulting in complete audio dropouts and mouse stuttering.
  • Tried to fix the issue, couldn't figure out what's causing it.
  • Ordered a new mainboard (ASUS Z790 GAMING E), fresh installation of 22H2 had spikes too but not that crazy and no audio dropouts / stuttering.
  • Installed 21H2, got some spikes but less than on 22H2.
  • Played around with MSI mode and IrqPolicy and got a perfectly working 21H2 system.

Working 21H2 installation after messing with MSI mode and IrqPolicy (can't recreate it, more on that later):
1672161634172.png
1672161702116.png
1672161683356.png

From there on I've tried to figure out what the actual problem is and how I fixed it. Here are my observations:
  • Fresh installation of an optimized and slimmed-down Windows called "Ghost Spectre 21H2": same DPC latency spikes as on my 21H2 before the MSI / IrqPolicy tweaks. No audio dropouts or stuttering, but 800-1600 µs spikes from the nvidia driver from time to time.
  • Installed all the drivers that I've installed on my working 21H2 installation, no change.
  • Messed around with the MSI / IrqPolicy settings, tried basically everything imaginable... no change. Can't reproduce what I did to my 21H2 main installation.
  • Installed the same Ghost Spectre version on another SSD, did the same things. Same result: DPC spikes from the nvidia driver.
  • Powered off my computer and removed all M.2 SSDs and all but one of the regular SSDs.
  • Booted the first broken Ghost Spectre installation and messed around with MSI / IrqPolicy again... and got a perfectly working installation. No DPC latency spikes from the nvidia driver anymore. Looks the same as on the pictures above.
  • Reinstalled all the hard drives again and booted up the Ghost Spectre installation I just fixed. Still works perfectly.
  • Tried the same things on the broken Ghost Spectre installation... can't get it to work.
  • Tried comparing installed driver versions, interrupt mappings, MSI modes, IrqPolicies etc. between the broken and the working Ghost Spectre installation. Can't see any differences. Even the interrupts were assigned the same.
Uninstalled all drives again today and did the following:
  • Fresh installation of Ghost Spectre 21H2 without any other drives in the system... DPC latency spikes.
  • Tried all the MSI / IrqPolicy things again (with no other drives in the system)... can't get it to work.
  • Reinstalled with all USB devices uninstalled (except mouse and keyboard). No change.
  • Reset BIOS, RAM on stock speed, setting PCIe Gen manually. Reinstalled. No change.
  • Removed secondary display, got a new display port cable, disabled GSYNC, disabled resizable BAR support. No change (maybe a bit lower amount of DPC calls from the nvidia driver?)
  • Installed Ghost Spectre 20H2 (Windows 10). No change, even a bit worse DPC latency from the nvidia driver.
  • Removed front panel USB, connected the SSD I'm doing the testing on to the port used by the working installation's SSD. Reinstalled Ghost Spectre... no change.
  • Reassembled the PC and called it a day.
I'm pretty much out of ideas. This is beyond frustrating, as it is just trial and error without any fix in sight. I'm starting to doubt the measurements of LatencyMon, to be honest. My new RAM should arrive soon and I will do a quick test on the broken installation. But that's pretty much it; I don't feel like investing more time into this problem. Maybe it's just a buggy LatencyMon reporting wrong data: I don't have any audio dropouts or stuttering on the "broken" installations, just the reported spikes of the nvidia driver and the reported high number of DPC calls from said driver.

What's interesting is that on the "broken" installations all CPU cores process interrupt requests / DPC:

1672163119143.png

and the DPC count of the nvidia driver is insanely high:

1672163149557.png

That's an image of just one minute of running LatencyMon on a "broken" installation. Compare that to the screenshot above from my main installation running for 7 minutes. Again, I am not sure anymore if the LatencyMon readings can be trusted.

So in the end I now have 2 "working" installations (official 21H2 and Ghost Spectre 21H2) with extremely low reported DPC latency and a very low DPC count from the nvidia driver... and no clue how to recreate them. I also have the "broken" installation of Ghost Spectre that's (in theory) identical to the working installation. This has to be a software problem or simply LatencyMon reporting crap.

So long... hope you guys find a fix for your latency issues.
 
I'll just stick with these drivers until I feel like experimenting again in any case...I know 650 is still higher than it should be. At least it's not audibly bothering me anymore...
Yeah that's a perfectly reasonable choice to make. I want to use your comment as a way to address our lurker friends that might be tearing their hair out too. I tried to clarify this (link) recently, but I can make it more clear here since it's important stuff that I should have discussed earlier:

On a computer that has no general DPC issues, it is "normal" for some of the Windows 10/11 drivers, such as ntoskrnl, to spike up to 200, and for the Nvidia driver to spike up to about 500-800. This is the best anyone can get, without us finding a fix in this thread, or until Microsoft and Nvidia address it. You can lower all of these numbers further by reducing the overhead of Windows, but so far there hasn't been a tweak that solves the Nvidia spikes specifically.

If you have achieved these numbers and want to stop messing with it, then this is a respectable place to quit and move on with your life. To reiterate though, these Nvidia spikes are still not acceptable for audio production or gaming, but because the numbers are low enough at least half (or more) of all users won't notice the consequences. Once you sit down at a machine that has general DPC issues combined with the Nvidia bug, that's when you get into the territory where it's not even usable because the consequences get so bad that everyone notices.

I attached some screenshots from my machine as an example. These were taken on a fresh install of W10 21H2 using my customized image. Main_OS is a screenshot of what it looks like with every driver installed, except for Nvidia. The Main_Nvidia screenshot is what happens when the Nvidia driver is then installed. This is a good opportunity to remind everyone, if your LatencyMon is bad *before* you install Nvidia, then that's your bigger problem and you need to go fix that first before joining this thread. I'm currently working on a guide for people that need help with their non-Nvidia DPC problems since we know those can be solved at least.
 

Attachments

  • Main_OS.png (31 KB)
  • Main_Nvidia.png (38.2 KB)
I attached some screenshots from my machine as an example.

What's the DPC count of the nvlddmkm.sys driver on your system after 5 minutes? On my broken installations it's in the thousands after like 10-30 seconds. On my working installations it's very low (336 in the screenshot above after 7 minutes).
 
What's the DPC count of the nvlddmkm.sys driver on your system after 5 minutes? On my broken installations it's in the thousands after like 10-30 seconds. On my working installations it's very low (336 in the screenshot above after 7 minutes).
9799. I attached the screenshot here. It's from the same 5 min run as that other recent post.
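Since the runs quoted here have different lengths (7 minutes versus 5 minutes), it may help to put the DPC counts on a per-minute scale before comparing them. A minimal sketch, using only the two counts mentioned in these posts; the function name is made up for illustration:

```python
def dpc_per_minute(dpc_count: int, minutes: float) -> float:
    """Average number of DPCs per minute over one LatencyMon run."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return dpc_count / minutes


# Counts quoted in the thread: 336 DPCs in a 7 minute run on a "working"
# installation, 9799 DPCs in a 5 minute run on the other machine.
working = dpc_per_minute(336, 7)
reported = dpc_per_minute(9799, 5)

print(f"working:  {working:.0f} DPCs/min")
print(f"reported: {reported:.0f} DPCs/min")
print(f"ratio:    {reported / working:.1f}x")
```

This only normalizes for run length. It says nothing about what the driver was doing during each run, so the ratio is a rough comparison at best.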

Yeah, there's a crazy huge difference in fixed versus broken. I really cannot understand how it's gone along for so many years without a fix. Everyone from gamers, to game designers, to game testers, to Microsoft, to Nvidia, someone should have noticed this...What's so bananas about it all, is that this manifests into very real, noticeable problems in-game. I recognize that at lower spikes it's easy to not be affected by them, but we have way too many users here, and on Google, that have completely unusable systems, with constant spikes in the thousands. This is some serious negligence going on by Microsoft/Nvidia. The Nvidia driver is also making several other drivers go up, in both spikes and DPC counts too. Ntoskrnl for example almost doubles when the Nvidia driver is installed.
 

Attachments

  • Driver_Nvidia.PNG (87.9 KB)
9799. I attached the screenshot here. It's from the same 5 min run as that other recent post.
Yeah that's also very high. If I can trust my LatencyMon readings you have some issues on your system as well (I mean you knew it already :p ).

I really cannot understand how it's gone along for so many years without a fix.
Probably like this: user reports bug to NVIDIA and gets the response "please contact Microsoft as this seems to be a bug in Windows". User reports bug to Microsoft and gets the response "please contact NVIDIA as this is related to the NVIDIA driver".

This user is me btw! :)

I would really like to see more data from other people with similar hardware. Also I would like to know how I fixed the issue... this has to be some software configuration thing. I wish I could dive deeper into the problem with debug tools and compare the difference between the working and the broken installations. But where to start and how to debug?
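One low-tech place to start with that comparison could be to write down the same list of settings on both installations and let a script point out what differs. A minimal sketch, assuming the values are collected by hand; the setting names and values below are hypothetical placeholders, not data from this thread:

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]


def diff_snapshots(working: Snapshot, broken: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (setting, working_value, broken_value) for every setting that differs."""
    missing = "<missing>"
    differences = []
    for key in sorted(set(working) | set(broken)):
        w = working.get(key, missing)
        b = broken.get(key, missing)
        if w != b:
            differences.append((key, w, b))
    return differences


# Hypothetical, hand-filled example values (not measurements from this thread).
working = {"gpu.driver_version": "A", "gpu.msi_mode": "on", "gpu.irq_policy": "unset"}
broken = {"gpu.driver_version": "A", "gpu.msi_mode": "on", "power.plan": "Balanced"}

for key, w, b in diff_snapshots(working, broken):
    print(f"{key}: working={w} broken={b}")
```

Settings recorded on only one side show up as `<missing>`, which is a quick check that both lists were actually filled in the same way.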
 
Yeah that's a perfectly reasonable choice to make. I want to use your comment as a way to address our lurker friends that might be tearing their hair out too. I tried to clarify this recently (link), but I can make it more clear here since it's important stuff that I should have discussed earlier... (shortened)
To be fair, I'm also using my PC for audio production mostly, and with this latency I get no audible clicks and pops in my DAW at all with the buffer of my Audient id14 mkII set to 64 samples, which I think is pretty effing reasonable. I've really listened hard.
And obviously, if there's a reasonable fix that allows me to use the latest graphics drivers and/or get the latency even lower, I'm in! So it's not over for me yet for sure.
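As a rough sanity check on why a 64-sample buffer can survive spikes of this size, the buffer length can be converted into time. A minimal sketch, assuming the latency figures quoted in this thread are in microseconds (the unit LatencyMon reports) and trying a few common sample rates, since the rate actually used is not stated here:

```python
def buffer_duration_us(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time covered by one audio buffer, in microseconds."""
    if buffer_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("buffer size and sample rate must be positive")
    return buffer_samples / sample_rate_hz * 1_000_000


# The sample rates are assumptions; only the 64-sample buffer is from the post.
for rate in (44_100, 48_000, 96_000):
    budget = buffer_duration_us(64, rate)
    print(f"{rate} Hz: {budget:.0f} us per 64-sample buffer")
```

At 44.1 or 48 kHz the buffer covers well over 1000 µs, so a single spike of around 650 fits inside one buffer period, while spikes above 1000 use up most of it. Real audio stacks add their own buffering, so treat this as a ballpark rather than a hard limit.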
 
Ok I am done with another giant round of testing...
I love everything about this post, lots of great points, lots of relevant information and testing, and good testing methodology too. This is the type of stuff I've been doing offline as well, and I'm learning a lot and feel like I'm making progress, but it's just so slow and time consuming. Below are some of the interesting topics I've come across which relate to aurox87's findings:

NEW TECHNOLOGY
Newer technology seems to suffer more than older tech. I'm noticing that people on nvme/m.2 drives, users with $500 CPUs and GPUs, using SLI mode, and even multiple monitors, all tend to have far more problems than people using older technology. I don't know why this is, maybe it's just bad BIOS settings, since that is something overlooked quite often.

MULTIPLE MONITORS
Something totally odd, is that I've seen several complaints on forums about HDMI cables and multiple monitors. A few people said that switching from HDMI to DVI fixed their problems with DPC on their second monitor, or that the problems go away if they remove the 2nd monitor. It probably isn't HDCP as the culprit here, because DVI supports that too, so I was wondering if maybe some types of cables ignore power saving features in the OS, or perhaps when the power plans get dynamically adjusted during install that it changes some settings whether it detects HDMI or not?

DISK DRIVES
Disk drives continually act as a common source of general DPC latency for people, which compounds the Nvidia bug and makes it so much worse than either issue is on its own. I have a feeling there's some sort of IDE/AHCI related bug that got introduced with Vista, or we just didn't notice until AHCI had become mainstream. Or maybe it's some other related component, but why are drives such a problem on modern OSes?

CPU AFFINITY
Replicating a fix is the main issue here. I truly believe a few people have used Microsoft's interrupt affinity tool to fix their problems. What confuses me however, is those same people can't replicate it consistently. I also cannot replicate their success on my machine, in fact my GPU just ignores any mask I set. I haven't figured this part out yet. This leads me to think that one of the reasons why Microsoft discontinued this tool many years ago, is because it became unreliable due to changes in the OS.

I have a few theories of why this is happening. First, there's the possibility of misattributing a fix. It could be that people who fixed their problem did set a mask on the GPU, but they also set masks for other drivers. It could be that it's really the other drivers being moved around that fixed the issue, not the GPU mask.

The second theory is that Microsoft no longer promotes this tool because it's not the right way to handle affinity anymore, and now interrupts are handled by power plans. I know for a fact there are many CPU/Interrupt settings in the power plans, and it could be that those settings are acting as an override to the GPU affinity mask, which is why we cannot replicate the fix since these plans are interfering. I want to investigate power plans more when time permits, as I think that's where the universal fix is, but I just haven't had time lately. I will get to it though.

GHOST SPECTRE ISO
Just my personal musings about these gaming images...I plan on testing all of these one day, to see if I can steal any tweaks. But, from what I've seen in preliminary research before I made my own image, is that many of these are not good, for 2 main reasons.

One, they are overhyped, and the reality is the performance is nowhere near as improved as they claim it to be. I saw a benchmark video by TechYesCity (link) and the Ghost Spectre image didn't do any better than a super simple one that TechYesCity slapped together by removing a few components. Basically, we can all achieve the same or better results using NTLite, with the benefit of it being totally transparent, and customizable to our personal preferences.

The other thing is, I've been hunting for tweaks on the internet for so many years now and have seen it all...Still today in 2022 people continue to try and use the same deprecated tweaks from the XP era, many of which never even worked on XP. I have a feeling stuff like these gaming images is full of a ton of placebo tweaks, as I've been told they integrate a lot of registry keys.

I felt like this was all worth mentioning because the level of misinformation on the internet for gamers is completely out of control in today's age. All it takes is some testing and we can see that most tweaks just don't actually do anything, or make performance worse, or don't do what people think they do. My point here is to all the lurkers reading this, try to take the time to test a few things and see if a resource you are using is credible and reliable. If you pick 5 random tweaks and test them, and only 1 of them works for example, run far away and don't listen to anything else that person/website has to say.

Also, if your testing method is how it "feels" then you're doing it wrong. Get a benchmark program and do a before/after there, as well as a before/after in LatencyMon. If it's a tweak that does something else, like disables Indexer, well go and check inside Windows if it's actually disabled. Find a way to evaluate if the tweaks you are being fed do what the person says they do. You'll be surprised just how often most people are actually feeding you crap.
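The before/after idea can be kept honest by writing down the highest spike from several runs on each side of a tweak and comparing summaries instead of single readings. A minimal sketch, assuming the values are typed in by hand from LatencyMon; the numbers below are hypothetical examples, not results from this thread:

```python
from statistics import median
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Median and worst case of the highest spike recorded in each run."""
    if not samples:
        raise ValueError("need at least one run")
    return {"median": median(samples), "worst": max(samples)}


def compare(before: Sequence[float], after: Sequence[float]) -> Dict[str, float]:
    """Change in each summary value; negative means the tweak lowered it."""
    b, a = summarize(before), summarize(after)
    return {name: a[name] - b[name] for name in b}


# Hypothetical highest-spike readings from five runs each, before and after a tweak.
before = [620, 650, 1040, 700, 980]
after = [610, 640, 655, 690, 600]

for name, delta in compare(before, after).items():
    print(f"{name}: {delta:+}")
```

Using several runs per side matters because single LatencyMon runs can differ a lot from one day to the next even when nothing was changed.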

Another side note about Ghost Spectre: it uses heavy disk compression, and that's probably why it has DPC issues. I know there are some arguments on the internet that support the use of compression in some places, such as memory compression, which can theoretically speed things up, but in practice I have never found this to be true in the real world. All things similar to prefetcher, indexer, sysmain, and disk compression have only ever caused problems for me, on every system I've ever touched, including old spinner drives.
 
Altho i agree in general, if it helps debunk......... i have 2 nvme drives, my cpu was released in april 2022, yeah the graphics card is what some would class old, and my bios latest update was a few months ago, and the chip firmware is the same, but in a way i am one of the fortunate ones not to notice issue.

I know this may be a LONG shot, but... I just noticed that in my motherboard's 2803 BIOS update, there was an fTPM issue that caused stuttering. Seeing that, and obviously the stuttering and drops some people get, and as I said, it's a long shot... could it even be something as stupid as a TPM issue causing the spikes?
 
conflicts.PNG
Your SATA controller is sharing the same IRQ as your audio controller, which could be the issue.
Try:
1. Reinstalling or changing the SATA or HD Audio drivers.
2. Setting MSI mode on both of them.
3. Using Microsoft's Interrupt-Affinity Policy tool to bind them to separate cores.
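For the last step, "separate cores" comes down to two affinity bitmasks that share no bits, with bit N standing for logical core N. A minimal sketch of just that arithmetic; the core numbers are hypothetical, and how the tool stores or applies the mask is not covered here:

```python
from typing import Iterable


def affinity_mask(cores: Iterable[int]) -> int:
    """Build a bitmask with bit N set for every logical core N in `cores`."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def masks_overlap(a: int, b: int) -> bool:
    """True if the two masks share at least one core."""
    return (a & b) != 0


# Hypothetical assignment: SATA controller on core 2, audio controller on core 3.
sata = affinity_mask([2])
audio = affinity_mask([3])

print(f"SATA mask:  {sata:#x}")
print(f"audio mask: {audio:#x}")
print("overlap" if masks_overlap(sata, audio) else "no overlap")
```

Setting a mask does not guarantee the device honors it, so it is worth re-checking the interrupt distribution in LatencyMon afterwards.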

I googled what PEG10-460D is and why it is on the same line as the GPU.
PEG10-460D appears to be an internal Intel driver for managing the PCI Express link within Windows.
Try disabling PEG ASPM and PCI-E link state power management in the BIOS to reduce latency.
Try disabling all settings under Platform Misc Configuration.
Please don't try to delete the PEG10 - 460D driver, as it can crash Windows entirely.
You can try updating the chipset drivers or installing an older version.
I'm using an AMD platform and see the same thing on all PCI-E devices, but there it's called PCI-to-PCI Bridge.
 
Altho i agree in general, if it helps debunk......... i have 2 nvme drives, my cpu was released in april 2022, yeah the graphics card is what some would class old, and my bios latest update was a few months ago, and the chip firmware is the same, but in a way i am one of the fortunate ones not to notice issue.

I know this may be a LONG shot, but... I just noticed that in my motherboard's 2803 BIOS update, there was an fTPM issue that caused stuttering. Seeing that, and obviously the stuttering and drops some people get, and as I said, it's a long shot... could it even be something as stupid as a TPM issue causing the spikes?
Disabling TPM on my AM4 platform reduced latency by 20% and increased fps in CPU bound games by 5%.
Useless feature if you don't encrypt hard drives.
 
Altho i agree in general, if it helps debunk......... i have 2 nvme drives, my cpu was released in april 2022, yeah the graphics card is what some would class old, and my bios latest update was a few months ago, and the chip firmware is the same, but in a way i am one of the fortunate ones not to notice issue.

I know this may be a LONG shot, but... I just noticed that in my motherboard's 2803 BIOS update, there was an fTPM issue that caused stuttering. Seeing that, and obviously the stuttering and drops some people get, and as I said, it's a long shot... could it even be something as stupid as a TPM issue causing the spikes?
I removed the TPM on my old board because I hated that Windows 11 tried to force it down my throat. Bypassed and removed! I hate things getting forced on me, like my wife trying to hide onions in every supper meal haha
 
Well, it seems like I'm not completely out of the woods yet with Nvidia driver 456.71.
Latest results.
The Nvidia driver still had a spike above 1000. However, I also had a "Highest measured interrupt to process latency" spike above 1000 that is (seemingly) unrelated to the Nvidia driver. What could this be?
 
Well, it seems like I'm not completely out of the woods yet with Nvidia driver 456.71.
Latest results.
The Nvidia driver still had a spike above 1000. However, I also had a "Highest measured interrupt to process latency" spike above 1000 that is (seemingly) unrelated to the Nvidia driver. What could this be?
If you can try a few things...disable your network driver and close out your Nvidia program from the background and try again to see what happens. Let me know.....

Always make sure this is done on a fresh restart.
 
If you can try a few things...disable your network driver and close out your Nvidia program from the background and try again to see what happens. Let me know.....

Always make sure this is done on a fresh restart.
Yes, thank you. I'm happy to try these, although I'm noticing high inconsistency between LatencyMon measurements (I'm not blaming LatencyMon for the inconsistencies, it just goes to show how fragile this whole issue is). Yesterday I was fine (not great, but around 600); I changed nothing today and now I'm above 1000 again. This is such a complex and mysterious issue that I'm starting to feel it certainly doesn't depend on a single factor, and that there may not be a simple universal fix. What I'm sure about is that going from the most recent Nvidia drivers to 456.71 (which Windows automatically installed for me after the DDU removal) made everything tremendously better.
 