this post was submitted on 01 May 2025


I recently built a new PC to be used as a server. For months now, I have been getting unexplained crashes, sometimes after a few minutes, sometimes after a few days, where the PC just reboots without any trace in the logs. The logs show only the normal occasional status messages and then, a few seconds later, a normal boot process.

This is slowly driving me crazy because I just can't make out the issue. I have tried multiple different Linux installs, swapped out the SSD and PSU, and ran a RAM test, but this behaviour still persists.

Today something was different. Instead of rebooting, it showed me this blue screen, this time finally with a log. But I still can't seem to make out the issue. Some quick internet searches turn up very vague answers: everything from software to hardware, and from PSU to CPU.

Can any Linux wizard help me fix my problem? Link to the log

Update: I have now faced an even weirder issue. I booted up, installed cpupower like a comment suggested, installed man to look up its documentation, and then the screen froze, so I was forced to reboot the PC by pressing the power button for 3s. When I booted back up, my bash history was reset to a state from a few days back (~/.bash_history mod time from 2 days ago), even though I have rebooted several times since then and have not had any persistence errors like this. man was also not installed anymore. Even weirder is that cpupower was still installed. So it seems like some data was saved, while other files were discarded. I will now use a second SSD and try to replicate this. I now suspect some kind of storage issue, even though the two SSDs in question have never caused issues in my laptop. This seems scary; I have never witnessed such a weirdly corrupted Linux install, ever.

[–] mjhelto@lemm.ee 5 points 2 days ago (1 children)

Just my two cents as someone who does this a lot myself: only change one thing at a time when testing troubleshooting suggestions. I know the reply suggested a few things in succession, but that was showing progressive steps to confirm and identify the underlying cause. Doing them all at once fails to correctly identify the root cause at best, and at worst may introduce new problems.

I say this again, as someone who notoriously does this all the time. It's a time-saver reflex, but one that will bite you in the ass eventually.

[–] Molecular5869@feddit.org 4 points 2 days ago (1 children)

Yes, I went too fast because I have been sitting on this for months now. Normally I would only change one thing at a time, but with this situation it can take anywhere from 5 minutes to multiple days to test one single thing. If it doesn't crash for 48 hours, it might be because I fixed the issue, or it might just be a coincidence and it will crash in hour 49 ¯_(ツ)_/¯.

But you're right, I will attempt it the right way when I find the time, even though it will probably take weeks 😮‍💨.
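
To put a rough number on the "hour 49" worry, here is a minimal Python sketch. It assumes the crashes arrive at random with a constant average rate (a Poisson process), which is only a guess about this machine. The function names and the example mean times between crashes are hypothetical, not measured values.

```python
import math


def prob_no_crash(hours: float, mean_hours_between_crashes: float) -> float:
    """Chance that a still-broken machine survives `hours` without crashing.

    Assumption: crashes follow a Poisson process with the given mean spacing.
    """
    return math.exp(-hours / mean_hours_between_crashes)


def hours_needed(mean_hours_between_crashes: float, max_fluke: float = 0.05) -> float:
    """Crash-free hours to wait so that a still-broken machine would only
    get this far by luck with probability `max_fluke`."""
    return -mean_hours_between_crashes * math.log(max_fluke)


# Hypothetical mean times between crashes, in hours (not measured values).
for mean in (12, 24, 48):
    fluke = prob_no_crash(48, mean)
    wait = hours_needed(mean)
    print(f"mean {mean:>2} h: P(48 h clean by luck) = {fluke:.1%}, "
          f"wait {wait:.0f} h for 95% confidence")
```

Under these assumptions, if the machine crashed on average once every 24 hours, a clean 48-hour run would still happen by luck about 13.5% of the time, and it would take roughly 72 crash-free hours per change to get that below 5%. That is why testing one change at a time gets so slow with rare crashes.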

[–] mjhelto@lemm.ee 1 points 2 days ago (1 children)

I know it sucks, but I'm glad you seem to have corrected the problem. As someone who does more harm than good with Linux systems myself, I think fixing a Linux issue without completely reinstalling the OS is impressive, and you should be proud to have accomplished such a feat!

[–] Molecular5869@feddit.org 3 points 2 days ago (1 children)

Well, I've not fixed anything yet 😅. It was sadly just a hypothetical. Sorry if that wasn't clear from the comment.

[–] mjhelto@lemm.ee 4 points 2 days ago

Well I'm still rooting for your success!