Fresh install not recognizing nvme

I just did a fresh install of Garuda, and everything is great except that it's not recognizing an nvme drive that I have installed. I noticed this in the live environment but figured I would do the install and then try to fix it.

I have a secondary nvme drive that's the same size as my main system one (1TB). In my old install, this was recognized as nvme1n1, but it's no longer recognized. Here's the output of lsblk now:

sda           8:0    0   1.8T  0 disk
└─sda1        8:1    0   1.8T  0 part
zram0       254:0    0  31.3G  0 disk [SWAP]
nvme0n1     259:0    0 953.9G  0 disk
├─nvme0n1p1 259:1    0   300M  0 part /boot/efi
└─nvme0n1p2 259:2    0 953.6G  0 part /var/cache

I haven't made any BIOS changes.

I have the UUID of the device (from blkid) in my old fstab, if that's helpful.

Here's my garuda-inxi. Thanks in advance for any help.

Kernel: 6.0.12-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=14889f51-4b3d-4a50-b2bf-c106e143d926 rw [email protected]
quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
loglevel=3 ibt=off
Desktop: KDE Plasma v: 5.26.4 tk: Qt v: 5.15.7 info: latte-dock
wm: kwin_x11 vt: 1 dm: SDDM Distro: Garuda Linux base: Arch Linux
Type: Desktop System: ASUS product: N/A v: N/A serial: <superuser required>
Mobo: ASUSTeK model: TUF GAMING X570-PRO (WI-FI) v: Rev X.0x
serial: <superuser required> UEFI: American Megatrends v: 4021
date: 08/10/2021
Info: model: AMD Ryzen 5 5600X bits: 64 type: MT MCP arch: Zen 3 gen: 4
level: v3 note: check built: 2021-22 process: TSMC n7 (7nm)
family: 0x19 (25) model-id: 0x21 (33) stepping: 0 microcode: 0xA201016
Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 3 MiB desc: 6x512 KiB
L3: 32 MiB desc: 1x32 MiB
Speed (MHz): avg: 4200 min/max: 2200/5279 boost: enabled scaling:
driver: acpi-cpufreq governor: performance cores: 1: 4200 2: 4200 3: 4200
4: 4200 5: 4200 6: 4200 7: 4200 8: 4200 9: 4200 10: 4200 11: 4200 12: 4200
bogomips: 100803
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M]
vendor: Gigabyte driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s
lanes: 16 ports: active: DP-1,HDMI-A-1 empty: DP-2,HDMI-A-2
bus-ID: 0c:00.0 chip-ID: 1002:73df class-ID: 0300
Device-2: Logitech HD Pro Webcam C920 type: USB
driver: snd-usb-audio,uvcvideo bus-ID: 5-2:3 chip-ID: 046d:082d
class-ID: 0102 serial: <filter>
Display: x11 server: X.Org v: 21.1.5 with: Xwayland v: 22.1.6
compositor: kwin_x11 driver: X: loaded: amdgpu unloaded: modesetting,radeon
alternate: fbdev,vesa dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 7280x2160 s-dpi: 96 s-size: 1924x571mm (75.75x22.48")
s-diag: 2007mm (79.01")
Monitor-1: DP-1 mapped: DisplayPort-0 pos: primary,right
model: AOC U34G2G4R3 serial: <filter> built: 2021 res: 3440x1440 hz: 60
dpi: 110 gamma: 1.2 size: 797x334mm (31.38x13.15") diag: 864mm (34")
modes: max: 3440x1440 min: 720x400
Monitor-2: HDMI-A-1 mapped: HDMI-A-0 pos: left model: OLED55-H1
serial: <filter> built: 2020 res: 3840x2160 hz: 60 dpi: 81 gamma: 1.2
size: 1209x680mm (47.6x26.77") diag: 1639mm (64.5") ratio: 16:9 modes:
max: 3840x2160 min: 720x400
API: OpenGL v: 4.6 Mesa 22.3.1 renderer: AMD Radeon RX 6750 XT (navi22
LLVM 14.0.6 DRM 3.48 6.0.12-zen1-1-zen) direct render: Yes
Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel
pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 0c:00.1 chip-ID: 1002:ab28
class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 0e:00.4 chip-ID: 1022:1487 class-ID: 0403
Device-3: Logitech HD Pro Webcam C920 type: USB
driver: snd-usb-audio,uvcvideo
Sound API: ALSA v: k6.0.12-zen1-1-zen running: yes
Sound Server-1: PulseAudio v: 16.1 running: no
Sound Server-2: PipeWire v: 0.3.63 running: yes
Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel pcie: gen: 2
speed: 5 GT/s lanes: 1 bus-ID: 05:00.0 chip-ID: 8086:2723 class-ID: 0280
IF: wlp5s0 state: down mac: <filter>
Device-2: Intel Ethernet I225-V vendor: ASUSTeK driver: igc v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 1 port: N/A bus-ID: 06:00.0
chip-ID: 8086:15f3 class-ID: 0200
IF: enp6s0 state: up speed: 100 Mbps duplex: full mac: <filter>
Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 1-4:3
chip-ID: 8087:0029 class-ID: e001
Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
Local Storage: total: 2.75 TiB used: 216.19 GiB (7.7%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: TeamGroup
model: T-FORCE TM8FP8001T size: 953.87 GiB block-size: physical: 512 B
logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
rev: V9002s77 temp: 39.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 QVO 2TB
size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
type: SSD serial: <filter> rev: 2B6Q scheme: GPT
ID-1: / raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 608 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-4: /var/log raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-5: /var/tmp raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 31.26 GiB used: 3 MiB (0.0%) priority: 100
dev: /dev/zram0
System Temperatures: cpu: 42.0 C mobo: N/A gpu: amdgpu temp: 56.0 C
mem: 62.0 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Processes: 333 Uptime: 8m wakeups: 0 Memory: 31.26 GiB
used: 5.24 GiB (16.8%) Init: systemd v: 252 default: graphical
tool: systemctl Compilers: gcc: 12.2.0 clang: 14.0.6 Packages: pm: pacman
pkgs: 1975 libs: 552 tools: octopi,paru Shell: fish v: 3.5.1 default: Bash
v: 5.1.16 running-in: yakuake inxi: 3.3.24
Garuda (2.6.10-1):
System install date:     2022-12-16
Last full system update: 2022-12-16
Is partially upgraded:   No
Relevant software:       NetworkManager
Windows dual boot:       No/Undetected
Snapshots:               Snapper
Failed units:

Your second partition (nvme0n1p2) is mounted at /var/cache.

ls -la /var/cache

Please post the full output of lsblk next time :slight_smile:

Thanks for the quick reply!

That is the full output of my lsblk.

What I'm expecting to see is a second drive that's nvme1n1, in addition to nvme0n1. Each of those should be 953.6G. Instead I'm only seeing one, which is what my main system is installed on.

How did you boot the system without /home?
How did you install?
Did you change something in Calamares?

zram0       254:0    0  13,5G  0 disk [SWAP]
nvme0n1     259:0    0 953,9G  0 disk
├─nvme0n1p1 259:1    0   260M  0 part /boot/efi
├─nvme0n1p2 259:2    0    16M  0 part
├─nvme0n1p3 259:3    0 252,6G  0 part
├─nvme0n1p4 259:4    0  1000M  0 part
├─nvme0n1p5 259:5    0 348,5G  0 part /var/tmp
│                                     /var/log
│                                     /var/cache
│                                     /srv
│                                     /root
│                                     /home
│                                     /
└─nvme0n1p6 259:6    0 351,5G  0 part

Welp, I'm very confused, because that was absolutely the full output of my lsblk when I created this post.


But when I run it now, I get more. Here's the full output:

sda           8:0    0   1.8T  0 disk
└─sda1        8:1    0   1.8T  0 part
zram0       254:0    0  31.3G  0 disk [SWAP]
nvme0n1     259:0    0 953.9G  0 disk
├─nvme0n1p1 259:1    0   300M  0 part /boot/efi
└─nvme0n1p2 259:2    0 953.6G  0 part /var/log

I installed through the basic Calamares installer, and chose to replace the full disk. When I installed, Calamares was only seeing one of my two nvme drives, and the SSD installed in my system. It didn't even see the other nvme I have in there. But my Garuda install at the time was seeing it.

So you must add the disk in /etc/fstab or mount it in a file browser.
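If you go the fstab route, an entry like this would do it. This is only a hypothetical example: substitute the UUID from your old fstab, and whatever mount point and filesystem the drive actually uses.

```
# /etc/fstab -- hypothetical example entry for the second drive
# <device>                                  <mount>    <fs>  <options>         <dump> <pass>
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/data  ext4  defaults,noatime  0      2
```

After saving, sudo mount -a will mount it without a reboot.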

What was the old install?

How are the two nvme drives connected? Directly to the motherboard, or is there a drive enclosure, etc.?

It would be good to go through your BIOS anyway. Double-check any settings related to SATA, NVMe, or RAID. Update the thread with any findings you are unsure about.

Check dmesg and sudo parted -l too, to see if other tools are able to detect the drive.
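When sifting through dmesg, a small filter helps. This is just a sketch (the function name and keyword list are mine, not standard tooling); it reads stdin, so it works on live dmesg output or a saved log.

```shell
# Sift NVMe-related problem lines out of the kernel log.
# Reads stdin; the keyword list is a rough heuristic.
scan_nvme_log() {
    grep -Ei 'nvme.*(duplicate|fail|error|timeout)'
}

# On a live system:
#   sudo dmesg | scan_nvme_log
```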

Just to confirm: from a hardware standpoint, nothing has changed between the last installation and this one?


Just a shot in the dark...
If nothing else works, try the linux-lts kernel.
This is only because a while back, in kernel 5.18, there was a bug (solved in 5.19) where in some cases one of two identical NVMe disks was not recognized.
As I said, it was fixed, so take it only as an option...


Sometimes when your hardware is not being detected properly, resetting the BIOS to the factory defaults can correct the problem. You will of course need to change some of the settings after a factory reset to make things Linux-compatible (disable Secure Boot, set SATA to AHCI, etc.).


Thanks for the replies, y'all! I'll answer some now, and I'll mess with bios later and post again.

It was also Garuda KDE Dragonized. I hadn't been able to get any kernel other than lts to work for months and I didn't feel like messing with it, so I chose to do a fresh reinstall. (Zen kernel is working after the reinstall.)

Both connected directly to the motherboard.

Technically that's correct. I did switch from an Nvidia to an AMD GPU a day before the install. But after doing that, I booted into my old install a few times, and verified that the drive was showing up.

But this is making me think that I should just double check the connection. I'll remove and reseat the drive as a test.


sudo parted -l
Model: ATA Samsung SSD 870 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
1      1049kB  2000GB  2000GB  ext4               hidden

Model: T-FORCE TM8FP8001T (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
1      2097kB  317MB   315MB   fat32              boot, esp
2      317MB   1024GB  1024GB  btrfs        root

Model: Unknown (unknown)
Disk /dev/zram0: 33.6GB
Sector size (logical/physical): 4096B/4096B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system     Flags
1      0.00B  33.6GB  33.6GB  linux-swap(v1)

Let me know if I should be checking dmesg in some other way:

sudo dmesg | grep nvme
[    5.082904] nvme nvme0: pci function 0000:01:00.0
[    5.082944] nvme nvme1: pci function 0000:04:00.0
[    5.300956] nvme nvme0: failed to set APST feature (2)
[    5.301276] nvme nvme1: failed to set APST feature (2)
[    5.337908] nvme nvme0: allocated 64 MiB host memory buffer.
[    5.338246] nvme nvme1: allocated 64 MiB host memory buffer.
[    5.435270] nvme nvme0: 7/0/0 default/read/poll queues
[    5.435614] nvme nvme1: 7/0/0 default/read/poll queues
[    5.469710] nvme nvme1: globally duplicate IDs for nsid 1
[    5.469712] nvme nvme1: VID:DID 10ec:5763 model:T-FORCE TM8FP8001T firmware:V9002s77
[    5.484987]  nvme0n1: p1 p2
[    5.571335] BTRFS: device fsid 14889f51-4b3d-4a50-b2bf-c106e143d926 devid 1 transid 666 /dev/nvme0n1p2 scanned by systemd-udevd (242)
[    9.433164] BTRFS info (device nvme0n1p2): using crc32c (crc32c-intel) checksum algorithm
[    9.433169] BTRFS info (device nvme0n1p2): using free space tree
[    9.452904] BTRFS info (device nvme0n1p2): enabling ssd optimizations
[   10.111472] BTRFS info (device nvme0n1p2: state M): use zstd compression, level 3
[   10.111475] BTRFS info (device nvme0n1p2: state M): turning on async discard
[   50.258524] BTRFS info: devid 1 device path /dev/nvme0n1p2 changed to /dev/disk/by-uuid/14889f51-4b3d-4a50-b2bf-c106e143d926 scanned by Thread (pooled) (6364)
[   78.421541] BTRFS info: devid 1 device path /dev/disk/by-uuid/14889f51-4b3d-4a50-b2bf-c106e143d926 changed to /dev/nvme0n1p2 scanned by mount (7563)

Here's what I found:

  • SATA port enabled
  • SATA mode is AHCI
  • NVMe RAID mode disabled
  • AMI native NVMe driver support enabled

I think this is the smoking gun right here:

Back in February, a check for duplicate IDs was added to the upstream kernels (see here), which started showing up in the 5.18 kernel. Someone opened an issue on the kernel bug tracker back in May (216049 – Unable to boot system, won't mount because of missing partitions because of "nvme nvme1: globally duplicate IDs for nsid 1") with pretty much the same issue you are describing.

Here is another thread with someone who has this problem: arch linux - globally duplicate IDs for nsid - Unix & Linux Stack Exchange

The gist of what is happening: some manufacturers are producing disks that are not uniquely labeled the way they should be, and after this update the kernel basically refuses to acknowledge a device with a duplicate label. This also explains why you did not have any issues on the LTS kernel (the LTS kernel is still back on 5.15).

If you read through the bug report, you will see there are a few kernel patches floating around to try to resolve the problem, but a complicating factor appears to be that this seems to be an error on the drive vendor's part--not necessarily something that should be fixed in the kernel.

Keith Busch 2022-05-30 14:26:46 UTC
The change to prevent duplicates was on purpose. Duplicates break udev's ability to create reliable by-id symlinks, which can cause data corruption.

The EUI/NGUID/UUID from the nvme controller are supposed to be globally unique and are set by the vendor. If different namespaces report the same EUI64, then your vendor has messed it up.

Except for perhaps vendor specific tools, this is not a user controllable identifier.

If your vendor can't fix it, then we would have to quirk the driver for your vendor:device to ignore the bogus values.
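You can see the identifiers your drives actually report by reading what the kernel exposes in sysfs. The sketch below is mine, not standard tooling; the sysfs root is parameterized only so it can be exercised without real hardware, and nvme-cli's nvme id-ns shows the same fields if you prefer.

```shell
# Print the vendor-assigned identifiers (eui / nguid / uuid) for every NVMe
# namespace the kernel knows about. If two namespaces print the same value,
# you are hitting the "globally duplicate IDs" check described above.
list_nvme_ids() {
    root="${1:-/sys/class/nvme}"    # overridable so the sketch can be tested
    for ns in "$root"/nvme*/nvme*n*; do
        [ -d "$ns" ] || continue
        for id in eui nguid uuid; do
            if [ -r "$ns/$id" ]; then
                printf '%s %s: %s\n' "${ns##*/}" "$id" "$(cat "$ns/$id")"
            fi
        done
    done
}

list_nvme_ids    # prints nothing if no NVMe sysfs entries are present
```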

Check to see if there is a firmware update available for the drives you picked up. Updating firmware for NVMe drives on Linux is not as simple as on a Windows box, where you just click your way through a wizard, but that doesn't mean it can't be done. Check this article for updating SSD firmware on Linux: Solid state drive - ArchWiki. Or, who knows, you might get lucky with fwupd, which is very easy to use: fwupd - ArchWiki
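The fwupd check is only a couple of commands. A sketch, assuming the fwupd package is installed; whether anything is offered depends entirely on the vendor publishing firmware to the LVFS, which TeamGroup may not do.

```shell
# Ask fwupd whether it knows of any firmware updates. The guard is only
# there so the sketch degrades gracefully where fwupd is not installed.
check_fwupd() {
    if command -v fwupdmgr >/dev/null 2>&1; then
        fwupdmgr refresh        # fetch current update metadata from the LVFS
        fwupdmgr get-updates    # list updates for any detected device
    else
        echo "fwupdmgr not found - install the fwupd package first"
    fi
}
```

Run check_fwupd in a terminal; if an update is listed, fwupdmgr update applies it (NVMe firmware updates are usually staged and activated on reboot).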

Obviously those resources will only be helpful if TeamGroup has released a firmware update to address this issue in the first place.

If not, your options are:

  • Build a custom kernel with one of the available patches.
  • Get back on the LTS kernel until the issue is resolved either in the upstream kernel or when a firmware update is released for the disk.
  • Change out the disk--maybe swap with a friend or something, or better yet reach out to TeamGroup and see what they can do for you.
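For the LTS option above, the switch is quick on an Arch-based system like Garuda. A sketch, assuming the standard Arch package names and the update-grub helper Garuda ships (grub-mkconfig works everywhere):

```shell
# Install the LTS kernel alongside linux-zen and add it to the boot menu.
# Guarded only so the sketch degrades gracefully on non-Arch systems.
install_lts() {
    if command -v pacman >/dev/null 2>&1; then
        sudo pacman -S --needed linux-lts linux-lts-headers
        sudo update-grub    # or: sudo grub-mkconfig -o /boot/grub/grub.cfg
    else
        echo "pacman not found - this applies to Arch-based systems only"
    fi
}
```

After running install_lts, reboot and pick the LTS entry from the GRUB menu.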

Wow, you've solved a 6-month riddle for me, thank you, @BluishHumility!!!

This tracks exactly with the issues I've been seeing. I searched, and there's not a firmware update for the Teamgroup drives I'm using. I've got a new WD Black drive coming today that I'll switch out, and I'll just find something else fun to use that other SSD for. :slight_smile:


This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.