Fresh install not recognizing nvme

I just did a fresh install of Garuda, and everything is great except that it's not recognizing an nvme drive that I have installed. I noticed this in the live environment but figured I would do the install and then try to fix it.

I have a secondary nvme drive that's the same size as my main system one (1TB). In my old install, this was recognized on nvme1n1, but it's no longer recognized. Here's the output of lsblk now:

lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0   1.8T  0 disk
└─sda1        8:1    0   1.8T  0 part
zram0       254:0    0  31.3G  0 disk [SWAP]
nvme0n1     259:0    0 953.9G  0 disk
├─nvme0n1p1 259:1    0   300M  0 part /boot/efi
└─nvme0n1p2 259:2    0 953.6G  0 part /var/cache

I haven't made any BIOS changes.

I have the blkid of the device from my old fstab if that's helpful.

Here's my garuda-inxi. Thanks in advance for any help.

System:
Kernel: 6.0.12-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=14889f51-4b3d-4a50-b2bf-c106e143d926 rw rootflags=subvol=@
quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
loglevel=3 ibt=off
Desktop: KDE Plasma v: 5.26.4 tk: Qt v: 5.15.7 info: latte-dock
wm: kwin_x11 vt: 1 dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Desktop System: ASUS product: N/A v: N/A serial: <superuser required>
Mobo: ASUSTeK model: TUF GAMING X570-PRO (WI-FI) v: Rev X.0x
serial: <superuser required> UEFI: American Megatrends v: 4021
date: 08/10/2021
CPU:
Info: model: AMD Ryzen 5 5600X bits: 64 type: MT MCP arch: Zen 3 gen: 4
level: v3 note: check built: 2021-22 process: TSMC n7 (7nm)
family: 0x19 (25) model-id: 0x21 (33) stepping: 0 microcode: 0xA201016
Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 3 MiB desc: 6x512 KiB
L3: 32 MiB desc: 1x32 MiB
Speed (MHz): avg: 4200 min/max: 2200/5279 boost: enabled scaling:
driver: acpi-cpufreq governor: performance cores: 1: 4200 2: 4200 3: 4200
4: 4200 5: 4200 6: 4200 7: 4200 8: 4200 9: 4200 10: 4200 11: 4200 12: 4200
bogomips: 100803
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M]
vendor: Gigabyte driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s
lanes: 16 ports: active: DP-1,HDMI-A-1 empty: DP-2,HDMI-A-2
bus-ID: 0c:00.0 chip-ID: 1002:73df class-ID: 0300
Device-2: Logitech HD Pro Webcam C920 type: USB
driver: snd-usb-audio,uvcvideo bus-ID: 5-2:3 chip-ID: 046d:082d
class-ID: 0102 serial: <filter>
Display: x11 server: X.Org v: 21.1.5 with: Xwayland v: 22.1.6
compositor: kwin_x11 driver: X: loaded: amdgpu unloaded: modesetting,radeon
alternate: fbdev,vesa dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 7280x2160 s-dpi: 96 s-size: 1924x571mm (75.75x22.48")
s-diag: 2007mm (79.01")
Monitor-1: DP-1 mapped: DisplayPort-0 pos: primary,right
model: AOC U34G2G4R3 serial: <filter> built: 2021 res: 3440x1440 hz: 60
dpi: 110 gamma: 1.2 size: 797x334mm (31.38x13.15") diag: 864mm (34")
modes: max: 3440x1440 min: 720x400
Monitor-2: HDMI-A-1 mapped: HDMI-A-0 pos: left model: OLED55-H1
serial: <filter> built: 2020 res: 3840x2160 hz: 60 dpi: 81 gamma: 1.2
size: 1209x680mm (47.6x26.77") diag: 1639mm (64.5") ratio: 16:9 modes:
max: 3840x2160 min: 720x400
API: OpenGL v: 4.6 Mesa 22.3.1 renderer: AMD Radeon RX 6750 XT (navi22
LLVM 14.0.6 DRM 3.48 6.0.12-zen1-1-zen) direct render: Yes
Audio:
Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel
bus-ID: 5-2:3 pcie: chip-ID: 046d:082d gen: 4 speed: 16 GT/s class-ID: 0102
lanes: 16 serial: <filter> bus-ID: 0c:00.1 chip-ID: 1002:ab28
class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 0e:00.4 chip-ID: 1022:1487 class-ID: 0403
Device-3: Logitech HD Pro Webcam C920 type: USB
driver: snd-usb-audio,uvcvideo
Sound API: ALSA v: k6.0.12-zen1-1-zen running: yes
Sound Server-1: PulseAudio v: 16.1 running: no
Sound Server-2: PipeWire v: 0.3.63 running: yes
Network:
Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel pcie: gen: 2
speed: 5 GT/s lanes: 1 bus-ID: 05:00.0 chip-ID: 8086:2723 class-ID: 0280
IF: wlp5s0 state: down mac: <filter>
Device-2: Intel Ethernet I225-V vendor: ASUSTeK driver: igc v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 1 port: N/A bus-ID: 06:00.0
chip-ID: 8086:15f3 class-ID: 0200
IF: enp6s0 state: up speed: 100 Mbps duplex: full mac: <filter>
Bluetooth:
Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 1-4:3
chip-ID: 8087:0029 class-ID: e001
Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
Drives:
Local Storage: total: 2.75 TiB used: 216.19 GiB (7.7%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: TeamGroup
model: T-FORCE TM8FP8001T size: 953.87 GiB block-size: physical: 512 B
logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
rev: V9002s77 temp: 39.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 QVO 2TB
size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
type: SSD serial: <filter> rev: 2B6Q scheme: GPT
Partition:
ID-1: / raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 608 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-4: /var/log raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-5: /var/tmp raw-size: 953.57 GiB size: 953.57 GiB (100.00%)
used: 216.19 GiB (22.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 31.26 GiB used: 3 MiB (0.0%) priority: 100
dev: /dev/zram0
Sensors:
System Temperatures: cpu: 42.0 C mobo: N/A gpu: amdgpu temp: 56.0 C
mem: 62.0 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info:
Processes: 333 Uptime: 8m wakeups: 0 Memory: 31.26 GiB
used: 5.24 GiB (16.8%) Init: systemd v: 252 default: graphical
tool: systemctl Compilers: gcc: 12.2.0 clang: 14.0.6 Packages: pm: pacman
pkgs: 1975 libs: 552 tools: octopi,paru Shell: fish v: 3.5.1 default: Bash
v: 5.1.16 running-in: yakuake inxi: 3.3.24
Garuda (2.6.10-1):
System install date:     2022-12-16
Last full system update: 2022-12-16
Is partially upgraded:   No
Relevant software:       NetworkManager
Windows dual boot:       No/Undetected
Snapshots:               Snapper
Failed units:

The second partition (nvme0n1p2) is shown mounted only at /var/cache.

ls -la /var/cache

Please post the full output from lsblk next time :slight_smile:

Thanks for the quick reply!

That is the full output of my lsblk.

What I'm expecting to see is a second drive that's nvme1n1, in addition to nvme0n1. Each of those should be 953.6G. Instead I'm only seeing one, which is what my main system is installed on.

How did you boot the system without /home?
How did you install?
Did you change something in Calamares?

lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
zram0       254:0    0  13,5G  0 disk [SWAP]
nvme0n1     259:0    0 953,9G  0 disk
├─nvme0n1p1 259:1    0   260M  0 part /boot/efi
├─nvme0n1p2 259:2    0    16M  0 part
├─nvme0n1p3 259:3    0 252,6G  0 part
├─nvme0n1p4 259:4    0  1000M  0 part
├─nvme0n1p5 259:5    0 348,5G  0 part /var/tmp
│                                     /var/log
│                                     /var/cache
│                                     /srv
│                                     /root
│                                     /home
│                                     /
└─nvme0n1p6 259:6    0 351,5G  0 part

Welp, I'm very confused, because that was absolutely the full output of my lsblk when I created this post:

image

But when I run it now, I get more. Here's the full output:

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0   1.8T  0 disk
└─sda1        8:1    0   1.8T  0 part
zram0       254:0    0  31.3G  0 disk [SWAP]
nvme0n1     259:0    0 953.9G  0 disk
├─nvme0n1p1 259:1    0   300M  0 part /boot/efi
└─nvme0n1p2 259:2    0 953.6G  0 part /var/log
                                      /srv
                                      /var/cache
                                      /home
                                      /var/tmp
                                      /root
                                      /

I installed through the basic Calamares installer, and chose to replace the full disk. When I installed, Calamares was only seeing one of my two nvme drives, and the SSD installed in my system. It didn't even see the other nvme I have in there. But my Garuda install at the time was seeing it.

So you must add the disk to /etc/fstab or mount it in the file browser.

What was the old install?

How are the two nvme drives connected? Directly to the motherboard, or is there a drive enclosure, etc.

It would be good to go through your BIOS anyway. Double-check any settings related to SATA, NVME, or RAID. Update the thread with any findings you are unsure about.

Check dmesg and sudo parted -l too, to see if other tools are able to detect the drive.
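
One more check along these lines, as a rough sketch rather than anything used in this thread: compare the NVMe controllers registered under /sys/class/nvme with the namespaces that show up under /sys/block. The function name and the optional sysfs-root argument are made up for illustration, and the name matching assumes the usual single-path naming where controller nvme1 exposes nvme1n1 (with NVMe multipath the numbering can differ).

```shell
# Print every NVMe controller that has no namespace block device.
# $1 is the sysfs root (defaults to /sys); the parameter is hypothetical and
# exists only so the function can be pointed at a test directory.
controllers_without_namespaces() (
    sysroot="${1:-/sys}"
    for ctrl in "$sysroot"/class/nvme/nvme*; do
        [ -e "$ctrl" ] || continue          # glob matched nothing
        name=${ctrl##*/}                    # e.g. nvme1
        found=no
        for ns in "$sysroot"/block/"$name"n*; do
            [ -e "$ns" ] && found=yes       # e.g. /sys/block/nvme1n1
        done
        [ "$found" = yes ] || echo "$name"
    done
)
```

Calling controllers_without_namespaces with no argument reads the live /sys. A controller printed here was probed by the kernel but has no usable disk behind it, which narrows the problem to the namespace rather than the slot or the cabling.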

Just to confirm: from a hardware standpoint, nothing has changed between the last installation and this one?

Just a shot in the dark...
If nothing else works, try the linux-lts kernel.
I only suggest this because, a while back, kernel 5.18 had a bug (fixed in 5.19) where, in some cases, one of two identical nvme disks was not recognized.
As said, it was fixed, so take it only as an option...

Sometimes when your hardware is not being detected properly, resetting the BIOS to the factory defaults can correct the problem. You will of course need to change some of the settings after a factory reset to make things Linux compatible (disable Secure Boot, set the SATA mode to AHCI, etc.).

Thanks for the replies, y’all! I’ll answer some now, and I’ll mess with bios later and post again.

It was also Garuda KDE Dragonized. I hadn’t been able to get any kernel other than lts to work for months and I didn’t feel like messing with it, so I chose to do a fresh reinstall. (Zen kernel is working after the reinstall.)

Both connected directly to the motherboard.

Technically that’s correct. I did switch from an Nvidia to an AMD GPU a day before the install. But after doing that, I booted into my old install a few times, and verified that the drive was showing up.

But this is making me think that I should just double check the connection. I’ll remove and reseat the drive as a test.

Results:

sudo parted -l
Model: ATA Samsung SSD 870 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
1      1049kB  2000GB  2000GB  ext4               hidden


Model: T-FORCE TM8FP8001T (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
1      2097kB  317MB   315MB   fat32              boot, esp
2      317MB   1024GB  1024GB  btrfs        root


Model: Unknown (unknown)
Disk /dev/zram0: 33.6GB
Sector size (logical/physical): 4096B/4096B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system     Flags
1      0.00B  33.6GB  33.6GB  linux-swap(v1)

Let me know if I should be checking dmesg in some other way:

sudo dmesg | grep nvme
[    5.082904] nvme nvme0: pci function 0000:01:00.0
[    5.082944] nvme nvme1: pci function 0000:04:00.0
[    5.300956] nvme nvme0: failed to set APST feature (2)
[    5.301276] nvme nvme1: failed to set APST feature (2)
[    5.337908] nvme nvme0: allocated 64 MiB host memory buffer.
[    5.338246] nvme nvme1: allocated 64 MiB host memory buffer.
[    5.435270] nvme nvme0: 7/0/0 default/read/poll queues
[    5.435614] nvme nvme1: 7/0/0 default/read/poll queues
[    5.469710] nvme nvme1: globally duplicate IDs for nsid 1
[    5.469712] nvme nvme1: VID:DID 10ec:5763 model:T-FORCE TM8FP8001T firmware:V9002s77
[    5.484987]  nvme0n1: p1 p2
[    5.571335] BTRFS: device fsid 14889f51-4b3d-4a50-b2bf-c106e143d926 devid 1 transid 666 /dev/nvme0n1p2 scanned by systemd-udevd (242)
[    9.433164] BTRFS info (device nvme0n1p2): using crc32c (crc32c-intel) checksum algorithm
[    9.433169] BTRFS info (device nvme0n1p2): using free space tree
[    9.452904] BTRFS info (device nvme0n1p2): enabling ssd optimizations
[   10.111472] BTRFS info (device nvme0n1p2: state M): use zstd compression, level 3
[   10.111475] BTRFS info (device nvme0n1p2: state M): turning on async discard
[   50.258524] BTRFS info: devid 1 device path /dev/nvme0n1p2 changed to /dev/disk/by-uuid/14889f51-4b3d-4a50-b2bf-c106e143d926 scannedby Thread (pooled) (6364)
[   78.421541] BTRFS info: devid 1 device path /dev/disk/by-uuid/14889f51-4b3d-4a50-b2bf-c106e143d926 changed to /dev/nvme0n1p2 scannedby mount (7563)

Here’s what I found:

  • SATA port enabled
  • SATA mode is AHCI
  • NVMe RAID mode disabled
  • AMI native NVMe driver support enabled

I think this line from your dmesg output is the smoking gun: "nvme nvme1: globally duplicate IDs for nsid 1".

Back in February, a check for duplicate IDs was added to the upstream kernel (see here), and it started showing up in the 5.18 kernel. Someone opened an issue on the kernel bug tracker back in May (https://bugzilla.kernel.org/show_bug.cgi?id=216049) describing pretty much the same problem you are seeing.
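
To see whether a given machine is tripping this check, one option is to filter the kernel log for that message. This is only a sketch: the function name is made up, and the pattern assumes the message wording matches what appears in the dmesg output posted in this thread.

```shell
# Read kernel log text on stdin and print each NVMe controller for which the
# kernel logged the "globally duplicate IDs" message (one name per line).
nvme_duplicate_id_controllers() {
    sed -n 's/.*nvme \(nvme[0-9][0-9]*\): globally duplicate IDs.*/\1/p' | sort -u
}
```

Typical use would be: sudo dmesg | nvme_duplicate_id_controllers. Empty output means the message was not found in the current kernel log.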

Here is another thread with someone who has this problem: arch linux - globally duplicate IDs for nsid - Unix & Linux Stack Exchange

The gist is that some manufacturers ship disks whose identifiers are not unique the way they should be, and since this change the kernel basically refuses to expose a device that reports a duplicate ID. This also explains why you did not have any issues on the LTS kernel (the LTS kernel is still back on 5.15).
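
A quick way to tell which side of that change a running kernel is on, sketched under the assumption (taken from this thread, not verified here) that 5.18 is the first affected release. The function name is made up for illustration.

```shell
# Succeeds if kernel release $1 (e.g. "6.0.12-zen1-1-zen") is at least
# version $2 (e.g. "5.18"). Relies on "sort -V" for version ordering.
kernel_at_least() (
    have=${1%%-*}      # drop the "-zen1-1-zen" style suffix
    want=$2
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
)
```

For example: kernel_at_least "$(uname -r)" 5.18 && echo "this kernel includes the duplicate ID check".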

If you read through the bug report, you will see there are a few kernel patches floating around to try to resolve the problem, but a complicating factor is that this appears to be an error on the drive vendor’s part, not necessarily something that should be fixed in the kernel.

Keith Busch 2022-05-30 14:26:46 UTC
The change to prevent duplicates was on purpose. Duplicates break udev’s ability to create reliable by-id symlinks, which can cause data corruption.

The EUI/NGUID/UUID from the nvme controller are supposed to be globally unique and are set by the vendor. If different namespaces report the same EUI64, then your vendor has messed it up.

Except for perhaps vendor specific tools, this is not a user controllable identifier.

If your vendor can’t fix it, then we would have to quirk the driver for your vendor:device to ignore the bogus values.
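
If you want to see the clash for yourself, a hedged sketch: collect one identifier per controller and report any value that more than one device returns. The helper name is made up. The commented usage assumes nvme-cli is installed and that its id-ns output has an "eui64" line in the form "eui64 : <value>"; treat that field layout as an assumption and adjust it to what your version prints.

```shell
# Read lines of the form "<device> <identifier>" on stdin and print every
# identifier reported by more than one device, followed by those devices.
find_shared_ids() {
    awk '
        { devs[$2] = (devs[$2] ? devs[$2] "," : "") $1; n[$2]++ }
        END { for (id in n) if (n[id] > 1) print id, devs[id] }
    ' | sort
}

# Hypothetical usage (needs root and nvme-cli):
# for c in /dev/nvme0 /dev/nvme1; do
#     printf '%s %s\n' "$c" "$(sudo nvme id-ns "$c" -n 1 | awk '/^eui64/ {print $3}')"
# done | find_shared_ids
```

An all-zero value generally means the drive does not populate that field at all, so look at the other identifiers (nguid, uuid) before drawing conclusions from it.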

Check to see if there is a firmware update available for the drives you picked up. Updating firmware for nvme drives on Linux is not as simple as on a Windows box, where you just click your way through a wizard, but that doesn’t mean it can’t be done. Check this article for updating SSD firmware on Linux: Solid state drive - ArchWiki. Or who knows, you might get lucky with fwupd, which is very easy to use: fwupd - ArchWiki

Obviously those resources will only be helpful if the drive vendor (TeamGroup, going by your dmesg output) has released a firmware update to address this issue in the first place.
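
For the fwupd route, the check could look roughly like this. It is a sketch, not a guarantee that fwupd supports these particular drives; the wrapper function name is made up, while fwupdmgr refresh and fwupdmgr get-updates are the stock fwupd commands.

```shell
# Ask fwupd whether it knows of firmware updates for any device it supports.
# Returns 127 if fwupdmgr is not installed.
check_firmware_updates() {
    if ! command -v fwupdmgr >/dev/null 2>&1; then
        echo "fwupdmgr not found; install the fwupd package first" >&2
        return 127
    fi
    # Download current metadata, then list devices with pending updates.
    # get-updates exits non-zero when nothing is available.
    fwupdmgr refresh && fwupdmgr get-updates
}
```

If a device is listed, fwupdmgr update applies the update. If the drive is not listed at all, the vendor most likely does not publish firmware through fwupd and you would need their own tool, if one exists.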

If not, your options are:

  • Build a custom kernel with one of the available patches.
  • Get back on the LTS kernel until the issue is resolved either in the upstream kernel or when a firmware update is released for the disk.
  • Change out the disk; maybe swap with a friend or something, or better yet reach out to the vendor (TeamGroup, going by your dmesg output) and see what they can do for you.

Wow, you've solved a 6-month riddle for me, thank you, @BluishHumility!!!

This tracks exactly with the issues I've been seeing. I searched, and there's not a firmware update for the Teamgroup drives I'm using. I've got a new WD Black drive coming today that I'll switch out, and I'll just find something else fun to use that other SSD for. :slight_smile:
