Are there any plans to allow DCS to use multiple cores?


MobiSev

Recommended Posts

After the latest update DCS.exe now defaults (on my PC at least) to use all cores! Thank you ED!

 

Ty. That's good news.

 

Does this support depend on certain maps?

 

Or is the OP's question answered by that?

 

That's a bit of a jump to conclusions, isn't it?

 

But the initial views, and the realization of the mistake, do offer a better outlook right now.

 

 

EDIT: I've seen your posts in other threads now, JimmyWA. I believe this thread is about DCS multi-threaded optimizations (even though the title is confusing), not about the CPU Affinity bug.

 

 

Vulkan is actually pretty bad for smaller game studios. Sure, it's amazing for studios that focus on game engines, but it offers little gain over DX12 while requiring tons of expertise.

 

 

DCS isn't exactly handled by a small studio, and the Vulkan API had already been chosen over DX12. Others and I were cheering for the Vulkan API early on as well, but I'm sure they compared both options in practice, weighed all the other reasons, and decided for themselves what the optimal way is.


Edited by Worrazen

Modules: A-10C I/II, F/A-18C, Mig-21Bis, M-2000C, AJS-37, Spitfire LF Mk. IX, P-47, FC3, SC, CA, WW2AP, CE2. Terrains: NTTR, Normandy, Persian Gulf, Syria

 


More appropriate thread to reply to: From the DCS Update Patch Notes

 

No. DCS is still only a two-threaded program (in regard to the simulation itself): rendering, physics, AI and so on run on the same single thread. Windows just shuffles them across multiple cores (suboptimal scheduling) for some reason, for some people, which causes the easily misinterpreted results in Task Manager.

 

Actually, I didn't necessarily say that this type of scheduling is less optimal ... although I agree I was jumping around between many theories, speculations and changes of opinion during that forum thread.

 

Thread bouncing may just be a normal thing that was always present; maybe it happens far more rapidly on newer systems than on older ones, I'm not sure. On most CPUs there isn't much of a difference, but it does depend on the architecture. It apparently becomes a problem when the scheduler doesn't handle newer, very different architectures properly, such as AMD's CCX design: a thread may bounce arbitrarily to a completely different CCX for no reason, the relevant cache data then has to be transferred from the original CCX (and the rest of the core-level caches) to the new one, and the thread has to wait for those caches to be filled before resuming execution on the new CCX, which produces a stutter. That's how I remember it, I think.

 

One thing I worded wrongly, or too harshly at first (even though I eventually wrote many paragraphs to explain it, the main point could have been lost along the way): Task Manager does show core utilization correctly, so it's not totally wrong as some people reading those posts may think. But it is strictly utilization, with zero thread context, and neither Task Manager nor Microsoft explains or indicates this (except maybe in the developer docs), so most people, including me, interpret it wrongly. I believe this is a mass confusion, because almost everyone in reviews, tutorials and troubleshooting videos looks at those Task Manager graphs one way or another, and may be doing it all wrong. The same goes for memory: everyone looks at the Memory number there on the left, and that's wrong as well. Commit Charge is the number to watch, because that's the number at which things start crashing, the number that triggers the Resource Exhaustion dialog, to put it simply.

 

 

One of the biggest problems with this thread-bouncing phenomenon is really in the human-analysis area. Unless there's a better, easier, faster way I must have missed, it's awfully tricky to figure out with the tools I experimented with, including WPA. Fiddling with the correct filtering settings in WPA is a pain in the butt for me as a relatively inexperienced user so far; even if I get it right, I don't know that I got it right, so I have to double-check everything ...

 

 

Key points:

 

1. Thread bouncing is a global phenomenon, not a DCS-specific one.

2. Threads can be programmatically pinned to a particular CPU core, if the devs so desire.

3. As users, we can only set the CPU affinity for a whole process, not for a specific thread within a process (???). Such an ability would have made things clear in any CPU core utilization utility and shown perfectly what is going on (of course, the more cores you have, the more detailed that representation would be), and it would have saved me a ton of posting and explaining.

 

 

Back to your quote:

 

But that's not exactly accurate either. "Two-threaded" is something we only say for convenience in practice, counting the threads that do enough work to be worth mentioning. The wording really needs to be precise here, because this is complex enough that any slight confusion or misunderstanding can make a big difference. It's not two-threaded in regard to the simulation; if by "simulation" you mean just the physics part, then that isn't multi-threaded right now and probably never will be, because of the nature of the task as we know it: each calculation relies on the result of the previous one.

 

It's very important that we define what we mean by the term "simulation": simulating physics, simulating AI, simulating sound, simulating weather, simulating traffic ... and I know you probably meant it correctly; the wording is just the other part, so that others who don't know yet don't get confused.

 

 

Here are some of the possible parallelization opportunities, generally speaking. The most obvious is the first case, then the second case for parallel tasks, but these things aren't immediately known: the parallelization potential of some code is hard to figure out and gauge up front. None of it was done on purpose to be "better for single-core"; that's just how things were programmed before multi-core was a thing, when it didn't matter whether a task was serial or parallelizable. Many of these things aren't immediately recognizable for what they are because it's all sandwiched together. That's why it requires a lot of resources: you have to figure out how it works, then how it could work, and look for potential gains from separating things apart, or de-sandwiching. Sometimes it's such a big deal that it basically means rewriting whole batches of code rather than tackling bits one by one, AFAIK.

 

But the wording again ... even a parallel task is only parallel from a higher point of view; each of the subtasks of a parallel task is itself a serial task, so this can get confusing.

 

Maybe it would be possible to split physics calculations between aircraft. So, does the AI physics run on a different thread?

 

We could thread all the AI stuff as one big task and then treat it as serial, but what if part of it CAN be parallelized? What if there are more separate serial tasks inside the AI stuff? AI airplane physics is probably not related to AI SAM-site radar logic, or to the AI command logic that orders units around, is it? IMO, only if something has to do with sync (so everything is at the right place in every single frame) would it have to be kept on the same thread. However, multiplayer is each PC simming its own aircraft and the server mashing it all together; if that's acceptable there, then maybe it could be so in local singleplayer too, even though I disagree with this because of the problems I think it introduces. Multiplayer is probably quite a bit off and shouldn't be used for testing or as a basis.


 


i don’t think we need to worry about thread bouncing.

modern operating systems will bounce a thread between cores “on purpose” to spread the heat load and keep the chip cooler. this enables faster clock rates and more time in the boost modes. there is no performance impact to “bouncing threads” so this is a good thing to do.

 

this is one of the reasons we caution users against playing around with core affinities. in most cases the operating system knows "way more" about the intricacies of the hardware than the user does, and it is able to make better choices in how to utilize the compute resources.


forgot to mention: yes, threads can be programmatically set to a particular CPU core, but devs shouldn't do that. it can really mess things up. even microsoft recommends against it.

 

microsoft says,

"Thread affinity forces a thread to run on a specific subset of processors. Setting thread affinity should generally be avoided, because it can interfere with the scheduler's ability to schedule threads effectively across processors. This can decrease the performance gains produced by parallel processing."

 

i'd be very surprised if DCS took this route.


microsoft says,

"Thread affinity forces a thread to run on a specific subset of processors. Setting thread affinity should generally be avoided, because it can interfere with the scheduler's ability to schedule threads effectively across processors. This can decrease the performance gains produced by parallel processing."

 

We've gone over that, yes. If you were able to pull the point out of my post above: the only "problem" with thread bouncing arises when a person tries to dissect what is what, where, and how much; there is no easy, straightforward way to do that, as far as I know right now.

 

How is "parallel processing" going to help you with a serial workload? Just having a "parallel processor" and putting one thread on it to jump around isn't going to speed anything up, as we already established. They're overblowing the parallelization thing; it could be PR hype, to make 10 million gamers think Windows is "parallelizing their game threads".

 

In simple logic, the real reason may not be performance, IMO; any gain may be small. It may be due to some cache preparation or some look-ahead the CPU could be doing. The heat/power explanation isn't that clear either, but it makes sense: it could be purely about distributing heat across the heatspreader and avoiding one part of the CPU getting hotter than the rest. But that seems so far removed from anything important: all this thread bouncing and scheduling, just for some heat spreading? If it doesn't decrease performance, and indeed it doesn't, then why not; but it's kind of weird that they don't just say so honestly and instead keep hyping up this parallel obsession.

 

Let's say you have a quad-core CPU. If you pinned DCS's hungriest thread to CPU0, it would simply stay there. Considering you're not doing much else with the PC while playing DCS, CPU0 wouldn't be used much by anything else; and if it were, the scheduler would clear the way and move other threads off so that the DCS main thread could take as much of CPU0 as it wants. The scheduler would still do its thing for all the rest of the DCS threads and the other programs/OS. This setup should be tested against the fully thread-jumpy one to see what the real performance, power, heat and stability differences are. One popular tech reviewer could try such a thing and hopefully sort this confusion out once and for all, if only there were a way to do per-thread affinity without needing developer builds or in-game support. Otherwise I could have done this test already and we'd get to the bottom of it right here.

 

 

Lastly, one performance benefit of thread affinity is that you could use it with a particular overclocked core. Some CPUs have cores of varying quality inside: some cores are more stable than others, some perform better, some are more overclockable. A pro gamer could figure out which core is the best on a particular CPU and overclock just that core. Because it's only one core, it wouldn't heat up the PC as much or draw as much power, and it would probably be more stable since you're not touching the less stable cores, yet you'd get ALL the benefits in DCS and similar software that relies on single-core performance. This is actually a more feasible practical scenario, but to prove it, the DCS developers would first need to open up support for it in the settings.

----

 

 

Your first post was exactly what I was figuring out, and reconfirmed recently with the refresh this thread triggered in my mind: it's all a power/heat thing, not a maximization of performance.

 

Please read my posts carefully, as I put a lot of time into them. I was in no way, shape or form saying that anyone has to configure CPU affinity; that's only done as a tweak in case things don't work right by default, and in cases of troubleshooting and analysis.

 

Besides, other people on this forum experimented with CPU affinity a long time ago already, and there are whole threads about just that.


Edited by Worrazen


 


there is no performance impact to “bouncing threads” so this is a good thing to do.

There is a huge impact, because each time a thread awakens on a different core, it has lost its L1 and L2 cache data and has to load it again from DRAM ...

 

 

microsoft says,

"Thread affinity forces a thread to run on a specific subset of processors. Setting thread affinity should generally be avoided, because it can interfere with the scheduler's ability to schedule threads effectively across processors. This can decrease the performance gains produced by parallel processing."

 

i'd be very surprised if DCS took this route.

That would be true if the Windows scheduler were smarter ...

 

I don't know if it still happens in Matlab, but back in 2013, Matlab solved a matrix inverse faster with affinity forced to only one core than when I let it use the 4 cores. And if you looked at the Task Manager, all 4 cores were at almost 100% utilization ...


Edited by cercata

How is "parallel processing" going to help you with a serial workload?

I once watched a presentation from Naughty Dog about how they organized everything into jobs, which were serial tasks. Once they knew the dependencies and shared data between jobs, they assigned those jobs to different cores.

 

That approach is brilliant, but they can do it because they know the processor they're running on, and they know which cores are available to them.

 

If Windows had a game mode where one process got 75% of the cores and everything else got the remaining 25%, and it had an efficient way of creating threads and assigning them to "virtual cores", devs could achieve something similar. If there were more "virtual cores" than real cores available, the scheduler would have to find a way to map them, assigning several virtual cores to one real core.

 

But Windows doesn't have that ... so devs are out of luck :(


There is a huge impact, because each time a thread awakens on a different core, it has lost its L1 and L2 cache data and has to load it again from DRAM ...

 

 

 

That would be true if the Windows scheduler were smarter ...

 

I don't know if it still happens in Matlab, but back in 2013, Matlab solved a matrix inverse faster with affinity forced to only one core than when I let it use the 4 cores. And if you looked at the Task Manager, all 4 cores were at almost 100% utilization ...

 

Well, I didn't see a huge impact with this test.

 

 

There's one weird effect in this program: apparently it sometimes doesn't run at full speed. In the first test you can see the green bar is missing a bit in the multi-threaded section, so it gets beaten by the single-threaded one, even though in this case both are forced onto their own respective cores.

When I set the affinity of the multi-threaded one to more than one CPU, this behavior switches around. It may be some kind of external factor, so it should be treated as error margin, as if there's no difference, at least in this specific video case.

 

Process Lasso will be quite handy ... I thought I'd contribute a bit and took out a monthly subscription to the Pro version to assist with testing even more :)


Edited by Worrazen


 


Well, I didn't see a huge impact with this test.

Interesting tool

 

The impact depends on what the threads do and on their memory usage pattern. Matrix calculations are an extreme example of what gets hit the most by thread bouncing; there is also data sharing between threads in that case.

 

Process Lasso is an amazing tool, but what it does should really be integrated into Windows, because any change to the Windows scheduler via an update risks breaking your setup completely.


Edited by cercata

Okay, I got some results after digging into WPR/WPA for about 6 hours this morning, going through the MS docs/tutorials. I was rusty after a whole week away, but I got back into it enough to actually display things correctly. I still need to make screenshots and label everything, which may take another third of the day. But before I begin, one thing that's very important with any kind of deep brainstorming on a PC: take a break outside in the beautiful weather :)

 

--

 

How many CPU cores it's using, and what the thread behavior is, can be shown in one screenshot. But to go beyond totals and summaries and really show how things work, I would need to record a video of the whole mission replay played side by side with the charts in WPA, with an added timeline cursor on the chart. That will take some video editing to add the cursor, plus figuring out the sync so it moves correctly in proportion to the DCS mission replay time and the events on the chart. Secondly, all this side-by-side stuff is very wide; I might have to turn the monitor vertical and use above/below window positioning, though I've never tried that yet.


Edited by Worrazen


 


vulkan won't do much at all. vulkan doesn't magically transform a single-threaded simulation into a multi-threaded simulation.

it only lowers rendering API overhead and allows sending commands to the GPU in a multi-threaded manner. however, rendering is only a tiny portion of what the game does in the main thread, so the increase in performance will be marginal.

what needs to be separated into other threads is physics and AI.

 

 

No, it also has potential for that


Okay, I got some results after digging into WPR/WPA for about 6 hours this morning, going through the MS docs/tutorials. I was rusty after a whole week away, but I got back into it enough to actually display things correctly. I still need to make screenshots and label everything, which may take another third of the day. But before I begin, one thing that's very important with any kind of deep brainstorming on a PC: take a break outside in the beautiful weather :)

 

--

 

How many CPU cores it's using, and what the thread behavior is, can be shown in one screenshot. But to go beyond totals and summaries and really show how things work, I would need to record a video of the whole mission replay played side by side with the charts in WPA, with an added timeline cursor on the chart. That will take some video editing to add the cursor, plus figuring out the sync so it moves correctly in proportion to the DCS mission replay time and the events on the chart. Secondly, all this side-by-side stuff is very wide; I might have to turn the monitor vertical and use above/below window positioning, though I've never tried that yet.

 

Check out this video, which shows performance improvements similar to what I get in VR (I go from an average of 20 fps with 1 core to 40 fps with all cores in use, with all graphics settings on high and shadows on low).

 

i9 12900KS, ASUS ROG Maximus Z790 APEX, 64 GB DDR5 4700 MHz, ASUS RTX 4090, water cooled, C: NVMe SSD, D: DCS on SSD, TM HOTAS Warthog Stick & Throttle, Crosswind rudder pedals, 2 x Thrustmaster MFDs on LCD screens, Varjo Aero VR, Logitech game controller


Check out this video, which shows performance improvements similar to what I get in VR (I go from an average of 20 fps with 1 core to 40 fps with all cores in use, with all graphics settings on high and shadows on low).

 

 

I did see that video somewhere. It's good that you get an improvement, but I never had the affinity bug; I'm not sure what would set it to one CPU only. It always used all CPUs for me, and I don't always have HT enabled either, and I still have an older traditional Intel CPU, not the newer CCD/CCX designs.

 

Looking at the thread breakdown in the single-core test below, it's pretty obvious why you get a perf boost when switching to multi-core. (Posting it any minute now.)

 

 

vulkan won't do much at all. vulkan doesn't magically transform a single-threaded simulation into a multi-threaded simulation.

it only lowers rendering API overhead and allows sending commands to the GPU in a multi-threaded manner. however, rendering is only a tiny portion of what the game does in the main thread, so the increase in performance will be marginal.

what needs to be separated into other threads is physics and AI.

 

Except that the DX11 "rendering" overhead is PRETTY BIG: enough to send you down below 20 FPS.

 

It's not just proper render-command multithreading; it's also much lower driver overhead, and draw calls that cost orders of magnitude less.


 


Actually ... I keep figuring this out, and it keeps on going, and I keep learning more and more. So, back to square one: I won't be doing any quick screenshots.

 

Three times now I've wanted to post just a quick example, but in the cause of doing it right, I keep raising the bar for what counts as an okay and correct enough "quick example with just basic details".

 

I keep coming back to the point I had earlier: I really shouldn't be doing anything quick with this.

 

I wanted something quick just to add to this thread's discussion, but by the time I finish, it'll be way over scope and more suitable for its own proper forum thread.

 

The whole point of this will be just tinkering with where some optimizations could be done (not that the developers don't know), but as something to fiddle with and talk about while we wait, and a good way to learn these tools so I can use them on all the other non-DCS stuff when debugging/troubleshooting in general.


 


  • 3 weeks later...
there is no performance impact to “bouncing threads” so this is a good thing to do.

There is a huge impact, because each time a thread awakens on a different core, it has lost its L1 and L2 cache data and has to load it again from DRAM ...

 

theoretically yes, but for generalized workloads (games, sims, netflix, even DCS), anything that is not 100% compute, it makes no difference in practice.

 

you see this all the time when users post their perf data after spending hours and hours "tweaking", pinning dcs to a subset of cores, or trying to get the process lasso magic to take.

 

we don't have low level control of the scheduling so all of this is wasted effort.


Just rewrite it all in low level assembly language :)

New hotness: i7 9700K 4.8 GHz, 32 GB DDR4, 2080 Ti, :joystick: TM Warthog, TrackIR, HP Reverb (formerly CV1)

Old-N-busted: i7 4720HQ ~3.5GHZ, +32GB DDR3 + Nvidia GTX980m (4GB VRAM) :joystick: TM Warthog. TrackIR, Rift CV1 (yes really).

