2080Ti and Random Crashes - ED Forums
 


Old 01-26-2019, 01:18 AM   #1
FragBum
Senior Member
 
Join Date: Jun 2016
Location: QLD, AUS.
Posts: 2,012
Default 2080Ti and Random Crashes

Long story short, I upgraded my GPU to a shiny new 2080Ti. Apart from the somewhat smaller form factor compared to my 1080Ti I didn't give it much thought, all this whilst going through some really odd image issues in VR (that's another story, from the 2.5.4.x changes). One thing I noticed from the time I installed the 2080Ti is that the backplate got very hot whilst running DCS, 60+ C.

Running old school settings netted me slightly more eye candy but still the same ole same ole FPS. Alright, so I did some tweaking and experimenting with settings and found my rig could mostly do 90FPS in Rift on NTTR and Caucasus, even PG and Normandy but not as consistently, with some interesting observations, more on that later.

So what I observe happening is, for example on the Caucasus map, the CPU would sit at about 17% and the GPU at about 50% with 45FPS, then the CPU would ramp up to 25% and the GPU to 80% and voila, 90FPS.

This meant GPU backplate temps rose a few more degrees. My previous 1080Ti ported some of the heat out the back; the new 2080Ti does not (all the 3 fan designs look much of a muchness here) and so it increases the temperature inside the case. Anyone else think not porting at least some of the heat out the back is a bad idea?

The crashing was consistent and brutal: BSOD and a restart. After a few of these, and with a fairly hot day yesterday, the system crashed again, but this time when I fired up Task Manager I had 8GB of RAM missing. I'm not sure how many StarForce points I racked up because of the "Total System Memory" changes.

Platform is X99, so 4 memory slots either side of the CPU, all close to the GPU backplate and getting hotter than before. Front RAM set is at about 50C, rear RAM set at about 55C.

The temporary fix: drop RAM speed from 2400 to 2200. So far stable, but long term I need to revise the system cooling, especially for the GPU and RAM. So if you upgraded recently (and I can't say if the same applies to other 20 series GPUs) it might be worth checking temps etc. if you're getting seemingly random crashes.
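
For anyone who wants to log temps while chasing crashes like these, here is a minimal Python sketch built around `nvidia-smi`. The query fields are standard `nvidia-smi` properties, but the 80 C warning level and the function names are assumptions made up for illustration. Note that this reads the GPU core sensor only; it says nothing about backplate or RAM temperatures, which need a separate probe or motherboard sensor.

```python
import subprocess

# Standard nvidia-smi query fields. The 80 C warning level is an arbitrary
# assumption for illustration, not a figure from this thread.
QUERY = "temperature.gpu,utilization.gpu,fan.speed"
WARN_AT_C = 80


def parse_sample(line):
    """Turn one CSV line such as '62, 80, 45' into a dict of ints.

    Assumes all three fields are numeric; some cards report fan speed
    as '[N/A]', which this sketch does not handle.
    """
    temp, util, fan = (int(field.strip()) for field in line.split(","))
    return {"temp_c": temp, "util_pct": util, "fan_pct": fan}


def read_gpu():
    """Ask nvidia-smi for one sample from the first GPU (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def too_hot(sample, limit=WARN_AT_C):
    """True when the core temperature is at or above the chosen limit."""
    return sample["temp_c"] >= limit
```

Calling `read_gpu()` in a loop with a short sleep while DCS is running would give a rough log to compare against the time of a crash.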

As a side note, dropping the RAM speed noticeably affected performance... such is life.

HTH.
__________________
Control is an illusion which usually shatters at the least expected moment.
Gazelle Mini-gun version is endorphins with rotors. See above.
DCS Rig. ASUS X99Pro i7 5930K@4.7Ghz Ino3D RTX2080Ti@1950MHz 8*8GB DDR4@2.2GHz SSD EVGA 850W G2 Rift CV1 Windows 10Pro bu0836x and Frankensteined Cyclic Collective and Pedals.
Old 01-26-2019, 06:09 AM   #2
Headwarp
Member
 
 
Join Date: Dec 2015
Posts: 698
Default

Do you have custom fan curves set for your GPU?

I have mine set to be at 100% by 45C using EVGA Precision X1; you can also set custom fan curves in MSI Afterburner. My card sounds like a quiet vacuum cleaner when I'm gaming but it stays pretty cool. Backplates can get a little warm, but 60C shouldn't hurt your GPU. It might throttle down slightly at that point though. 55C shouldn't kill your RAM either, but that doesn't mean there isn't a faulty memory module in need of an RMA or replacement. Did you try re-seating the RAM to be sure nothing jiggled loose inside?
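
To show what a curve like that amounts to, here is a small Python sketch of a fan curve as straight-line interpolation between set points. Only the "100% by 45C" point comes from the post above; the lower points and the function name are made-up placeholders, and this is not how Precision X1 or Afterburner store their curves internally.

```python
# (temperature C, fan %) points. Only the last point (100% by 45C) comes from
# the post above; the other two are made-up placeholders.
CURVE = [(30, 30), (40, 70), (45, 100)]


def fan_percent(temp_c, curve=CURVE):
    """Linearly interpolate the fan duty for a temperature, clamped at both ends."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    # Walk each pair of neighbouring points and interpolate inside the first
    # segment whose upper temperature covers temp_c.
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
    return float(curve[-1][1])
```

With these placeholder points, anything at or above 45C runs the fans flat out, which matches the "quiet vacuum cleaner" description.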
__________________
Win 10 pro 64bit i7 8700k @ 4.9ghz (all cores) - EVGA RTX 2080Ti XC Ultra, 32GB DDR4 3200 16CAS, 970 Evo 250Gb boot drive, pny 480gb sata 3 SSDx2, 4TB HDD - Acer Predator x34 (3440x1440@100hz), Samsung Odyssey WMR, - Peripherals = msffb2 for heli/prop planes, warthog (for jets), -Warthog throttle, Logitech G13, MFG Crosswinds

Last edited by Headwarp; 01-26-2019 at 06:14 AM.
Old 01-26-2019, 08:18 AM   #3
Tinkickef
Member
 
Join Date: May 2016
Location: UK - North Yorkshire
Posts: 438
Default

Sorry to be that guy. I really hope I am wrong and you have just jiggled a RAM module loose.

However...


Prepare for black screen and RMA. When you go BSoD and it suddenly fails to recover, it's time to pull the card and try rebooting with your onboard graphics.
The GPU and VRAM are rated at 85C. Unfortunately, from what I have heard, it is the VRAM overheating and failing. Nvidia are now sourcing their memory from Samsung afaik.

No one knows what the real problem is, as Nvidia are blowing smoke over the whole deal, including the rate of failure. They reckoned that some "test cards" got sent out by mistake, but that does not seem plausible as third party cards are also affected.

As for failure rate, the best info I found on the forums came from non-partisan retailers themselves, who reckoned the overall return rate on the 2080 series was running at 3.5%, with just under 50% of that purely because the customer changed their mind and sent it back. So the problem is not a common one; it's just that the cards are usually failing within two months or so, making it appear artificially widespread.
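
As a rough worked example of those figures (a sketch only: it treats "just under 50%" as exactly 50%, and the remaining returns are not necessarily all hardware faults):

```python
return_rate = 0.035        # overall return rate quoted above
changed_mind_share = 0.50  # "just under 50%", so treat this as an upper bound

# Share of all cards sold that came back for some reason other than a change
# of mind. Because changed_mind_share is an upper bound, this is a lower bound.
other_returns = return_rate * (1 - changed_mind_share)
print(f"returned for other reasons: at least {other_returns:.2%} of cards sold")
```

So on those numbers a little over 1.75% of cards sold came back for reasons other than a change of mind.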

In case you missed the threads on my particular failure.... you will see the similarities.

VR section.

https://forums.eagle.ru/showthread.php?t=229469

Hardware section.

https://forums.eagle.ru/showthread.php?t=230077


Edit. New card just arrived. Currently doing a full system image backup before installing it just in case. I learned my lesson!
__________________
System spec: i7 4790k, mainboard - Asus Z97 pro gamer, PSU EVGA Supernova Gold G3 850w. GPU Asus Dual OC 2080TI, 32GB HyperX Fury DDR3 1866 RAM, Samsung evo 850 boot ssd, Crucial MX500 dedicated Oculus Driver and DCS storage SSD, Seagate 1TB hybrid drive for other stuff, Oculus Rift, warthog, crosswind, Buttkicker Gamer 2, dual Inatek 4port USB 3.0 expansion cards.

Last edited by Tinkickef; 01-26-2019 at 12:51 PM.
Old 02-06-2019, 10:16 AM   #4
FragBum
Senior Member
 
Join Date: Jun 2016
Location: QLD, AUS.
Posts: 2,012
Default

Well, fortunately the GPU is still working, but I fear the motherboard is failing. I'm getting random boot failures along the lines of, think Holly from Red Dwarf.

PC (from Windows Boot Manager): I can't boot today John, there's no boot drive.

Me: But I just checked from the command line, C:\ is still there. Do a repair again...

PC: User account "John" password?

Me: Yes, here is my password (types it in).

PC: Sorry John, no boot drive, can't boot today.

Now it seems everything is correct, I did all the MBR fixes. It now boots after between 4 and 10 restarts. This is the 3rd SSD that has gone weird.

The irony for me is the "no boot drive found" message when all the command line tools say the drive is fine, and yet it seems to work for a while after re-installing Windows, then becomes intermittent.
Old 02-06-2019, 02:56 PM   #5
Headwarp
Member
 
 
Join Date: Dec 2015
Posts: 698
Default

Sounds like the grief my 2500k would give me at times before I upgraded. I attributed it to the lack of driver updates for my EVGA mobo in that rig since 2013. With quite a bit of searching and investigating the individual components of the mobo, I found some success with drivers directly from Intel for the SATA controller, as well as network drivers that were somewhat problematic. It's... "fun" installing drivers meant for 8.1 in Windows 10.

I can't say I miss those moments with my old 2nd gen rig. Unsure if that's the case with you. But my SSDs from that old rig are still working like a charm in this build.
Old 02-06-2019, 03:08 PM   #6
tintifaxl
Senior Member
 
Join Date: Mar 2009
Posts: 1,625
Default

I remember only too well the first Sandy Bridge boards, which had a flawed SATA controller that would fail after some time.

Same symptoms. But I knew what was wrong...
__________________
Windows 10 64bit, Intel i5-7600@4.6Ghz, 32 Gig RAM, AMD Nitro+ 580, 1 TB SSD, 43" 2160p@1440p monitor.
Old 02-06-2019, 11:14 PM   #7
FragBum
Senior Member
 
Join Date: Jun 2016
Location: QLD, AUS.
Posts: 2,012
Default

Okay, looking at it, I started getting BSODs a few months back, WHEA and watchdog timer errors, which I dismissed as maybe software errors. But there's that, and the fact that the motherboard never did run XMP with 2 different sets of RAM and wouldn't do its automatic overclock right from the get go, although RAM and CPU perform very well with a mild manual OC when it boots.

The BIOS/UEFI was updated to the latest version only a week ago. I was thinking it might be an OS/BIOS issue, but even with all OC settings dropped out it still has this boot issue.

Thinking about it, I have had this weird boot problem since day one, although at the time I put it down to using an Intel 750 series PCIe NVMe SSD. It's just gotten much worse recently, and that's with SATA SSD boot drives now. I've just recently pulled the NVMe drive out; prior to that the NVMe drive had performed very well as a data drive.

It does point to a dodgy motherboard. Good news is it's still under warranty!

I'll wait and see what the supplier says about this.
Old 02-07-2019, 09:04 AM   #8
BitMaster
Veteran
 
 
Join Date: Oct 2013
Location: SW-Germany
Posts: 5,483
Default

WHEA and WatchDog are very much indicating your RAM is beyond what the combo will take.

I have had dozens upon dozens of those when I tried to get mine running XMP on this board.

Mind you, have those a couple times and you have a real good chance of a corrupted OS.

Windows and any OS in general do not like RAM errors at all, sooner or later the problems get worse, even at 2133MHz then.


Try to downclock your RAM and see what happens. I had to settle for 3000MHz; 3200MHz gives me your errors, and anything faster is bound to crash in less than 30min with a BSOD, or won't even beep or boot.
__________________
Asus Strix Z370-E - Intel i7-8700K@5.2G_delidded - 32GB 3200/3600MHz - Asus 1080GTX-Ti Poseidon 2050/12006- 1x 960Evo 250GB - 2x 850Pro 256GB Raid-0 - 1 x 840 Evo 250GB PageFile - 2x Seagate 2TB - Heatkiller IV - MoRa3-360LT@4x180mm fans - Corsair AXi-1200 - TiR5-Pro - Warthog Hotas - Saitek Combat Pedals - Asus PG278Q 27" WQHD Gsync 144Hz@60Hz - Oculus Rift VR - Win10Pro64 - Slave to the Machine
Old 02-07-2019, 09:52 AM   #9
FragBum
Senior Member
 
Join Date: Jun 2016
Location: QLD, AUS.
Posts: 2,012
Default

Yes BM, this is another possibility that I have not discounted, as every now and again I have seen a RAM module drop out of the BIOS and/or OS when booted.

TBH the CPU, MB and GPUs have been great performers but the RAM side isn't so good. I have used two kits (separately, of course): Trident Z 3200, which runs at 2200 and used to run at 2400 but has never run at its rated XMP level of 3200 on this MB, and Corsair Vengeance 2666, which would run at 2200 but never at its XMP of 2666.

So I have been down-clocking my RAM attempting to find a stable state, and today put the CPU back up to 4.7GHz. So far so good with RAM at 2200.

Crashes could be memory or memory controller, 2 sets of RAM similar results. Cause?

The boot issue could be affecting this too, so more testing.
__________________
Control is an illusion which usually shatters at the least expected moment.
Gazelle Mini-gun version is endorphins with rotors. See above.
DCS Rig. ASUS X99Pro i7 5930K@4.7Ghz Ino3D RTX2080Ti@1950MHz 8*8GB DDR4@2.2GHz SSD EVGA 850W G2 Rift CV1 Windows 10Pro bu0836x and Frankensteined Cyclic Collective and Pedals.
FragBum is offline   Reply With Quote
Old 02-07-2019, 06:29 PM   #10
BitMaster
Veteran
 
 
Join Date: Oct 2013
Location: SW-Germany
Posts: 5,483
Default

Yikes,

once you had those, chances are that your OS already has errors that will keep on crashing it even if your RAM is stable now. You will see what I mean if it is true with yours, no doubt.

FragBum, you have a 4-channel monster, so there is not much need to increase bandwidth. Yours at 2133 is as fast in bandwidth as ours at 4266, so lean back and relax. Not much need to OC or force XMP.
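
The arithmetic behind that comparison, as a quick Python sketch. It uses the usual theoretical-peak formula (channels x 8 bytes per transfer x transfer rate); the function name is made up for illustration, and real-world throughput and latency are not captured here.

```python
BYTES_PER_TRANSFER = 8  # one DDR4 channel is 64 bits wide


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak in GB/s (10^9 bytes/s); measured throughput is lower."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000


quad_2133 = peak_bandwidth_gbs(4, 2133)  # X99, four channels at DDR4-2133
dual_4266 = peak_bandwidth_gbs(2, 4266)  # dual-channel board at DDR4-4266
quad_2200 = peak_bandwidth_gbs(4, 2200)  # the 2200 setting mentioned in this thread

print(quad_2133, dual_4266, quad_2200)
```

Both of the first two come out at about 68.3 GB/s, and quad-channel at 2200 is about 70.4 GB/s, so on paper the down-clocked X99 setup gives up very little bandwidth.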

I assume you have 4 modules and 8 slots, so why don't you swap them all over to the other slots? Even though the manual says No-No, give it a try; sometimes this cures bad termination and other creepy symptoms that cause RAM to fail.

You can start with 1 module in one of those slots that only get occupied if you have more than 4 modules, like A1, B1, C1, D1. Usually you would use A2 B2 D2 and C2.

Give it a try.

One more tip if you do RAM testing.

!!!!!!!!!!!!!!!!DO NOT BOOT INTO W I N D O W S !!!!!!!!!!!!!!!!

Make yourself a bootable USB stick with Linux Mint or Ubuntu and test there. It's already a lot if it boots w/o crashing. Once on the desktop, just run stressapptest -s 3600 in a console and lean back.
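
If you want to script that run and keep the result, a minimal Python wrapper might look like the sketch below. The `-s` (seconds) and `-M` (megabytes to test) flags are stressapptest options; the function names are made up, and the check for a line starting with "Status: PASS" is an assumption about the tool's summary output that is worth confirming against your installed version.

```python
import subprocess


def build_command(seconds=3600, megabytes=None):
    """Build the stressapptest command line; -s is the run time in seconds."""
    cmd = ["stressapptest", "-s", str(seconds)]
    if megabytes is not None:
        cmd += ["-M", str(megabytes)]  # -M: amount of RAM to test, in MB
    return cmd


def passed(output):
    """True if the summary contains a line starting with 'Status: PASS'.

    The exact wording of the status line is an assumption; check it
    against the output of your stressapptest version.
    """
    return any(line.startswith("Status: PASS") for line in output.splitlines())


def run_test(seconds=3600):
    """Run stressapptest (must be installed) and report whether it passed."""
    result = subprocess.run(build_command(seconds), capture_output=True, text=True)
    return passed(result.stdout)
```

A short `run_test(60)` first is a cheap way to confirm the tool is installed before committing to the full hour.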

If you boot into Win, you are asking for trouble.