Discussion: NVIDIA Display Driver (nvlddmkm.sys) DPC Latency

Status
Not open for further replies.
furthest back they are listing is 460.79. hang on, think ive found them. thanks for that, i'll try them :)
what a bleedin palaver :rolleyes:
 
i exercised the ol' google-fu.
That's the best thing to do :p Did you use NVCleanstall and remove the bloat? That might make a difference, and enable the msi tweak too. I am not having latency issues on Win10 21H2 and nvidia 522.25 using NVCleanstall + msi enabled :)
 
i will look into the msi tool. im stuck with an F cpu, the rest of my cpus all have igpu so im hoping to rebuild this month/early january and the way its going i might just say stuff it and stick with igpu, isnt as if im gaming or streaming 4k video.
 
I am sure it's not an Nvidia problem anymore, since some people on the same driver are perfectly fine whereas others are not. This is why I came to the conclusion that it comes down to power delivery.

The only way I could really test this, besides my own conclusions, would be with a duplicate system built to the exact same specs, identical in everything except the power supply.
It is not necessary. Most of the common platforms like Intel LGA and AMD AM use the same drivers for their hardware. For example, the chipset drivers for the Ryzen 3600 and 5600 could be the same version, and the same goes for any other popular hardware series (RTX series cards etc).
It would be great if people started sharing their Acronis True Image system backups, along with a list of the hardware they used and something like an "audit log" enabled on Windows. This way people with similar hardware could install the backup and browse which tweak/driver version/config/Windows build etc helped to eliminate the issue.
 
Try 441.41 and 456.71, according to this post these versions have lower latency than the latest versions.
I have tried 441.41 with disablewritecombining - the result on ASUS DUAL RTX 2060 and WIN 10 17763 LTSC is TERRIBLE. Spikes up to 1100, whereas the 526 driver spikes to 280 at most.
 
Here is my result after 5 minutes right after starting up.
This is interesting for sure. So it looks like your Nvidia driver only spiked up to 79, which is excellent (anything under 100 is). The next biggest culprits were ntoskrnl and dxgkrnl, which is expected. Even on a fresh install without any drivers installed those will spike a little, but they were still good. I'd be happy if I could achieve these results. On XP nothing on the machine ever spiked above 50, but I don't expect these more bloated OSs to ever achieve that again except in niche scenarios, so anything under 200 is probably the best target for most users.
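For anyone who wants to sort their own numbers the same way, here is a rough sketch of the rule of thumb above in Python. The cut-offs (under 100 is excellent, under 200 is a reasonable target) come from this post; the function name, the label wording, and the assumption that the values are LatencyMon's highest DPC execution times in microseconds are mine.

```python
def rate_spike(highest_us: float) -> str:
    """Classify a driver's highest DPC spike using the thread's rule of thumb.

    Assumption: the value is in microseconds, as shown by LatencyMon.
    """
    if highest_us < 100:
        return "excellent"
    if highest_us < 200:
        return "acceptable"
    return "high"


# Spike values quoted earlier in this thread.
readings = {
    "Nvidia driver (result above)": 79,
    "526 driver": 280,
    "441.41 driver": 1100,
}

for label, value in readings.items():
    print(f"{label}: {value} -> {rate_spike(value)}")
```

It is only a labelling aid; it says nothing about why a driver is spiking.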

Savitarax also achieved outstanding results through manual core distribution, but this isn't something we can recommend for the average user as it's too advanced, and it's also not something we can integrate into an image or automate, so I'd like to strive for an easier solution.

Based on Necrosaro's and Savitarax's posts, it sounds like this is a conflict issue at its root. Something in the OS or the Nvidia driver is taking too long to process and is causing a bottleneck to occur. So by Savitarax manually moving the Nvidia driver to another core it acted as a workaround to solve the problem, because it's removing the bottleneck (we just don't yet know what the bottleneck is). If Necrosaro's tweaks are responsible for his low DPC, then it's highly likely we can eventually have a definitive solution if we find the component removal or tweak that eliminated the conflict.

Necrosaro, if you have a chance in the future, be it days or even weeks from now, would you be able to do a clean install of Windows (without any tweaks) on that same machine, with only the required drivers installed and check the LatencyMon results? This would greatly help to rule out a lot of stuff. For example, if a clean install has bad DPC on your machine then we know right away that the solution lies within your tweaks, but if a clean install results in equally good DPC then we can ignore all the NTLite and post-install tweaks and focus on hardware and bios instead as that is going to be where the solution is.

I followed the link AeonX posted and in one of the replies a guy asked how the OP achieved such low DPC, to which the guy responded with 3 more links. I checked those links out briefly and there was actually a lot of good information in them, surprisingly. I already do a lot of their tips, but I'm definitely going to add those links to my todo list and work my way through all of them to see what else I can come up with that helps. Everyone should be wary of the screenshot posted on that forum showing stupidly low DPC though, because it was only a 10 second snapshot which is easily fudged and meaningless.
 
I have gotten my machine almost perfect (for me at least) with just two things left to do. The next time I do a format I will post results.

I cannot quite remember if I have ever done a LatencyMon test with just the basics installed on later systems. Usually I just get to work cleaning it up. This has always been my thing ever since my Win 98 days.

However if anything changes I will be letting everyone know.

On second thought, if I cloned my SSD and put it on another drive I could just do a swap, right?
 
(please don't quote this post in your replies, it's too much spam, isolate the specific text you want to focus on instead)

This is a long read, meant to serve as a breakdown of all 8 pages in this thread so far, and an update on current theories and tests, to help brainstorm how to solve this problem. It's probably only worth reading if you are the type of person that likes to troubleshoot and wants to try and tackle this issue too. If so, please read on and see if you can figure out what I cannot. Hopefully this post will spark someone to go, "Oh what about..." and come up with something that not many people know of. For example, in this post I talk about "Data Collector Sets", which is something most users don't know exists, and perhaps someone can point out other similar things that are generally unknown/buried in the OS that may be the culprit here. I tried to break up my thought stream into sections below for easier digestion.

DPC COUNT
What I think is really interesting, and what I'm hopeful will lead to the solution, is a specific column in LatencyMon that is easy to overlook since we're all normally focused on the millisecond/microsecond numbers instead. In the "Drivers" tab there is a "DPC Count" column, and basically this number increases on each driver file individually, each time LatencyMon detects a moment when that driver got tied up for too long.

Comparing a standard Windows install to Necrosaro's, I noticed his "DPC Count" values are ridiculously low in general when I also run it for 5 minutes. A specific example is the LatencyMon driver itself (rspLLL64.sys), which has an unusually high DPC Count on a standard Windows install. It's this way across the whole board: almost every single one of Necrosaro's various driver files is very low in comparison.

This got me thinking that something in Windows must be hooking into all of the drivers in some way, probably a type of monitoring process, and is what's slowing everything down. Graphics cards and Network adapters in particular are extremely sensitive to latency and they tend to show up with very high DPC numbers when there's a problem, even if the problem isn't coming from the graphics or network drivers.

NEW TESTING INSIGHTS
So with all that being said, I went and ran another test where I uninstalled the Nvidia driver and ran LatencyMon again to check some other sets of data this time, rather than just looking for DPC spikes. We already know for sure that DPC latency is usually "normal" until the Nvidia driver is installed. However, what I saw this time while paying attention to the "DPC Count" column is that the LatencyMon driver's DPC Count is 3x higher than it is on Necrosaro's system, even though I didn't have the Nvidia driver installed yet. This tells me that something in the OS is causing a problem here, and Necrosaro has that conflict disabled/removed. ntoskrnl also has a high "DPC Count" in general, however mine was still fairly close to Necrosaro's.
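To make this kind of side-by-side comparison repeatable, something like the following Python sketch could be used. It takes per-driver DPC Count values copied by hand from two LatencyMon runs of the same length and lists the drivers whose count is at least some multiple of the reference system's. The counts below are made-up placeholders, not anyone's real results, and the function names and the 3x default are just illustrative choices.

```python
def dpc_count_ratios(mine: dict, reference: dict) -> dict:
    """Return my_count / reference_count for every driver present in both runs."""
    ratios = {}
    for driver, count in mine.items():
        ref = reference.get(driver)
        if ref:  # skip drivers that are missing or have a zero count in the reference
            ratios[driver] = count / ref
    return ratios


def flag_outliers(ratios: dict, threshold: float = 3.0) -> list:
    """List drivers whose DPC Count is at least `threshold` times the reference."""
    return sorted(d for d, r in ratios.items() if r >= threshold)


# Placeholder numbers for illustration only.
my_run = {"rspLLL64.sys": 3000, "ntoskrnl.exe": 11000, "dxgkrnl.sys": 500}
reference_run = {"rspLLL64.sys": 1000, "ntoskrnl.exe": 10000, "storport.sys": 400}

ratios = dpc_count_ratios(my_run, reference_run)
print(flag_outliers(ratios))
```

Both runs need the same duration and a similar workload, otherwise the ratios mean very little.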

On that note, I did notice that Necrosaro's USB and Storport drivers had a much higher DPC Count than on my machine, so there's a problem there on his system, perhaps requiring a BIOS update, a BIOS setting tweak, or updated drivers for those. Otherwise, I don't think we'll ever be able to get the ntoskrnl DPC Count lowered without some fixes on Microsoft's side, because Necrosaro and I both have highly tweaked systems, and we both approached our tweaking differently too, yet neither of us could fix ntoskrnl.

A test that someone could do which might help us find a solution, is to try each of the 4 different xml presets in NTLite, and see if one of them fixes the issue. Then if we identify a preset that does, we could compare the differences between that preset and the one above it to narrow down which removal was responsible for the fix. The same concept can apply to custom presets as well. Eventually I can get around to doing this, but I already have so much on my plate right now testing other things that it'll be a long time before I get to it.
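The narrowing-down step can be sketched in Python like this. It assumes each preset can be reduced to a plain set of removed component names and that the presets are ordered from lightest to most aggressive; the preset and component names below are invented placeholders, not NTLite's real ones, and reading the actual xml files is left out.

```python
def suspects(presets: list, fixed: dict):
    """Find the first preset that fixes the issue and what it removes
    beyond the preset just before it.

    presets: list of (name, set_of_removed_components), lightest first.
    fixed:   dict mapping preset name -> True if the DPC issue went away.
    """
    previous = set()
    for name, removed in presets:
        if fixed[name]:
            return name, removed - previous
        previous = removed
    return None, set()


# Hypothetical presets and test outcomes, for illustration only.
presets = [
    ("preset_1", {"component_a"}),
    ("preset_2", {"component_a", "component_b"}),
    ("preset_3", {"component_a", "component_b", "component_c", "component_d"}),
    ("preset_4", {"component_a", "component_b", "component_c", "component_d", "component_e"}),
]
fixed = {"preset_1": False, "preset_2": False, "preset_3": True, "preset_4": True}

name, candidates = suspects(presets, fixed)
print(name, sorted(candidates))
```

The candidate list would then be tested one removal at a time on top of the last preset that did not fix the problem.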

DATA COLLECTOR SETS
For anyone that's an old school computer builder/tweaker, you'll probably remember the days of adjusting the "PCI Latency" and AGP video card settings in the bios. These latency settings were things that could change the performance of the system, because each device needs fair time to process data, but if you give any device too much or too little time then it negatively affects everything else, meaning it needed to be carefully balanced on a per-system basis. In modern bios this is no longer a thing because we're on PCI express and so forth now, but the underlying concept still remains in Windows. If a monitoring tool is interfering with a device it will cause that device to take too long to process its data, causing system-wide DPC issues since DPC is something that snowballs from one device to another.

So I went digging in Windows, trying to reverse engineer whatever Necrosaro tweaked to remove this conflict. I started messing with the various AutoLoggers. You can access them in W10 from the user interface at Start > Windows Administrative Tools > Performance Monitor > Data Collector Sets > Event Trace Sessions (as well as Startup Event Trace Sessions).

I disabled all of the ones I could, but there are 3 in total that you need to take over permissions on first, otherwise you cannot disable them (I didn't do that yet, maybe the fix is in those). This didn't fix the issue unfortunately, however it did seem to lower DPC latency, improve system responsiveness, lower memory usage, and shorten reboot times. This is just anecdotal right now since I didn't do thorough testing. I am mostly just looking for the "big fix" at the moment, I just want to see that Nvidia driver stop spiking.

I think if we continue to follow this train of thought it will be fruitful. I know there are several threads already where people discuss AutoLoggers, and quite a few veterans here disable them all. At this time my leading theory is that one of these types of things in the OS is causing the issue. Maybe not AutoLoggers, but something similar that is silently working in the background unless someone removed/disabled it via NTLite or registry keys.

WINDOWS AS A CULPRIT
Necrosaro may be right about the Nvidia driver not being at fault. Something that I noticed recently is that a clean install of Windows has an alarming number of errors in the event log in daily usage. There are clearly a lot of bugs that still need ironing out in W10/W11, especially in the Edge browser, which appears with errors every day.

The theory of the OS being the culprit is also supported by how each new operating system tends to be worse for gaming performance/DPC in general, because more and more loggers and OS troubleshooters are being added with each new Windows, adding more and more possibilities for conflicts and interference, thus DPC issues.

Furthermore, this theory also ties into the current known bug with the new 22H2 version of W10 and W11 that just released, where Nvidia identified that Microsoft had enabled debugger tools in Windows by mistake, causing noticeable performance issues in the graphics driver. I'm thinking there are more unidentified problems like this, existing prior to 22H2. Why Nvidia drivers are so sensitive compared to AMD, I don't know.

All of these background processes, loggers, troubleshooters, etcetera, are where a lot of the bloat in the number of threads/handles is coming from with each new Windows. When you look at the difference between XP and W7 for example, it was a colossal increase in the amount of background activity, because starting with Vista they began adding a ton of these monitoring tools, and it has progressively grown to be insanely high in W10/W11 now. The simplest example I can give is that XP SP3 idled at the desktop on less than 200 MB of memory, while W10 idles using over 1,300 MB (even with prefetch/superfetch disabled).

Most Windows processes don't have anything to do with the user experience, they are just monitoring/debugging/telemetry for Microsoft, which is why as a user there really isn't anything "new" you gain by using newer operating systems (besides hardware compatibility), because most of the new "features" in general are really only there for Microsoft's sake. Whatever performance improvements Microsoft adds in each new Windows get overshadowed by all the extra bloat that comes with it. The only reason newer operating systems continue to perform well is because people are buying new hardware over time, and the advancements in hardware have been substantial in the last 20 years. A crappy SSD compared to a 10,000 RPM Raptor drive for example is night and day. Even a crappy processor in a laptop today can easily outperform a gaming processor on a desktop from 10 years ago.

WHAT WE KNOW SO FAR
We have had a number of people replying with things that helped the issue for them. These steps have varied wildly between users, and I think the main takeaway here is that in the long chain of drivers that are being negatively affected by DPC issues these users are simply figuring out which ones were causing the biggest problems for their machine, and fixing that. However, these aren't resulting in a universal fix. The closest thing to a universal fix is manually moving drivers around to different cores, but this is very complex and too advanced for most people. We need a better solution, or Microsoft needs to patch things up so that Windows moves things around better on its own, or for Nvidia to make their driver less sensitive.
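For readers unfamiliar with what "moving things to different cores" looks like in practice: affinity settings are commonly written as a bitmask in which bit n stands for logical processor n (Windows process affinity masks work this way). The Python sketch below only does that arithmetic. How Savitarax actually applied his settings is not described in this post, so treat the helper names and the example cores as illustrative assumptions.

```python
def affinity_mask(cores) -> int:
    """Build a bitmask with one bit set per logical processor index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indexes start at 0")
        mask |= 1 << core
    return mask


def cores_in_mask(mask: int) -> list:
    """Inverse of affinity_mask: list the processor indexes set in a mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Example: restrict something to logical processors 2 and 3.
mask = affinity_mask([2, 3])
print(hex(mask), cores_in_mask(mask))
```

Which cores to pick is the hard part, and it varies per machine, which is exactly why this is difficult to turn into a universal fix.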

Some things users have reported that help:
- Disable HDCP in Nvidia driver
- Try different Nvidia drivers
- Disable unused devices in BIOS
- Put graphics card in MSI mode
- Use high performance power plan
- Disable Windows Defender
- Set Nvidia control panel to maximum performance
- Change HPET settings
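As a concrete illustration of the "MSI mode" item: Windows stores the message-signaled interrupt setting per device in the registry, under the device's "Interrupt Management\MessageSignaledInterruptProperties" key, in a DWORD value named MSISupported. Below is a read-only Python sketch that builds that key path and, on Windows only, reads the value. The device instance ID is a placeholder you would replace with your own card's ID from Device Manager, and the function names are made up for this example. It does not change anything.

```python
import sys


def msi_key_path(device_instance_id: str) -> str:
    """Registry path (under HKEY_LOCAL_MACHINE) holding a device's MSI setting."""
    return (
        "SYSTEM\\CurrentControlSet\\Enum\\"
        + device_instance_id
        + "\\Device Parameters\\Interrupt Management"
        + "\\MessageSignaledInterruptProperties"
    )


def read_msi_supported(device_instance_id: str):
    """Return the MSISupported value, or None if unavailable (read-only)."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg

    try:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE, msi_key_path(device_instance_id)
        ) as key:
            value, _value_type = winreg.QueryValueEx(key, "MSISupported")
            return value
    except OSError:
        return None  # key or value missing, or access denied


# Placeholder instance ID: VEN_10DE is Nvidia's PCI vendor ID, the rest is made up.
example_id = "PCI\\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_00\\0"
print(msi_key_path(example_id))
print(read_msi_supported(example_id))
```

A value of 1 means MSI is enabled for that device; None here just means the sketch could not read it.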

Overall though, I can say with certainty that none of the list items above are the root cause of the underlying DPC problem. These things are just fixing top-level DPC issues, while under the hood something in the OS is clearly causing DPC issues system-wide. Not everyone is blatantly affected, because some machines happen to be splitting up the drivers onto different cores better than others, creating fewer conflicts at the base of everything for those people. In other words, so far everyone is resolving symptoms, not the root cause. It's like taking DayQuil for a cold: it does absolutely nothing to help you get rid of the virus, it just alleviates some symptoms.

Essentially, I think we are well beyond "simple" fixes here. The root problem likely has nothing to do with things like driver versions or anything that the average user can tackle. The solution is to figure out what deeply buried process in the OS is interfering with these sensitive drivers, which would then make fixing all other DPC issues so much easier, because right now this deep-level DPC problem is just exacerbating all the other more easily fixable DPC issues that are layered on top. DPC is really complicated in this way. It's very much something that has to be chipped away at in layers, which is why the issue varies from person to person, with some people having spikes of up to 300, while others are at 50,000 or more.
 
On that note, I did notice that Necrosaro's USB and Storport drivers had a much higher DPC Count than on my machine, so there's a problem there on his system, perhaps requiring a BIOS update, a BIOS setting tweak, or updated drivers for those. Otherwise, I don't think we'll ever be able to get the ntoskrnl DPC Count lowered without some fixes on Microsoft's side, because Necrosaro and I both have highly tweaked systems, and we both approached our tweaking differently too, yet neither of us could fix ntoskrnl.


This is due to the outdated drivers for my motherboard. There won't be any new ones, and Windows 11 will be the final operating system for this machine. Luckily the Win 10 drivers did work. Also, my 18 terabyte drive is quite full, so that could be the issue as well, with not enough free space available to do the work.
 
Hellbovine wants you to donate your PC "for science".
We will chat about it and go through everything my system has gone through, instead of cluttering up the forums with our drivel.

Easier to dissect a PC than anything else. Maybe I have the holy grail haha
 
i did a LatencyMon run and it had spikes with dx and nvlddmkm.sys but lasso said that it was perfectly fine, no issues

 