System:
Kernel: 5.19.8-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=9d94f1a7-6727-49e7-ada1-357e62c1d7ef rw rootflags=subvol=@
quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
resume=UUID=0e3b11bb-b58e-4b86-b090-7c8791dbc60b loglevel=3
amdgpu.ppfeaturemask=0xffffffff
Desktop: KDE Plasma v: 5.25.5 tk: Qt v: 5.15.6 wm: kwin_wayland vt: 2
dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Desktop System: Gigabyte product: AB350-Gaming 3 v: N/A
serial: <superuser required>
Mobo: Gigabyte model: AB350-Gaming 3-CF serial: <superuser required>
UEFI: American Megatrends LLC. v: F52h date: 07/27/2022
Battery:
Device-1: sony_controller_battery_06:ed:69:89:63:25 model: N/A serial: N/A
charge: N/A status: full
CPU:
Info: model: AMD Ryzen 5 4500 bits: 64 type: MT MCP arch: Zen 2 gen: 3
built: 2020-22 process: TSMC n7 (7nm) family: 0x17 (23) model-id: 0x60 (96)
stepping: 1 microcode: 0x8600106
Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 3 MiB desc: 6x512 KiB
L3: 8 MiB desc: 2x4 MiB
Speed (MHz): avg: 3220 high: 3593 min/max: N/A cores: 1: 2972 2: 2840
3: 2947 4: 2937 5: 3593 6: 3593 7: 3593 8: 2994 9: 3593 10: 3593 11: 2994
12: 2994 bogomips: 86242
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed mitigation: untrained return thunk; SMT enabled with STIBP
protection
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, STIBP:
always-on, RSB filling, PBRSB-eIBRS: Not affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
vendor: XFX Pine driver: amdgpu v: kernel arch: GCN-4 code: Arctic Islands
process: GF 14nm built: 2016-20 pcie: gen: 3 speed: 8 GT/s lanes: 16
ports: active: DP-1,DVI-D-1,HDMI-A-1 empty: DP-2,DP-3 bus-ID: 01:00.0
chip-ID: 1002:67df class-ID: 0300
Display: wayland server: X.org v: 1.21.1.4 with: Xwayland v: 22.1.3
compositor: kwin_wayland driver: X: loaded: amdgpu unloaded: modesetting
alternate: fbdev,vesa gpu: amdgpu d-rect: 5760x1080 display-ID: 0
Monitor-1: DP-1 pos: center res: 1920x1080 size: N/A modes: N/A
Monitor-2: DVI-D-1 pos: primary,left res: 1920x1080 size: N/A modes: N/A
Monitor-3: HDMI-A-1 pos: right res: 1920x1080 size: N/A modes: N/A
OpenGL: renderer: AMD Radeon RX 590 Series (polaris10 LLVM 14.0.6 DRM
3.47 5.19.8-zen1-1-zen) v: 4.6 Mesa 22.1.7 direct render: Yes
Audio:
Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
vendor: XFX Pine driver: snd_hda_intel bus-ID: 3-1.4.3:12 v: kernel pcie:
chip-ID: 046d:0a8f class-ID: 0300 gen: 1 speed: 2.5 GT/s lanes: 16
link-max: gen: 3 speed: 8 GT/s bus-ID: 01:00.1 chip-ID: 1002:aaf0
class-ID: 0403
Device-2: AMD Renoir Radeon High Definition Audio driver: snd_hda_intel
v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 link-max: gen: 4
speed: 16 GT/s bus-ID: 08:00.1 chip-ID: 1002:1637 class-ID: 0403
Device-3: AMD Family 17h/19h HD Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
link-max: gen: 4 speed: 16 GT/s bus-ID: 08:00.6 chip-ID: 1022:15e3
class-ID: 0403
Device-4: Logitech H390 headset with microphone type: USB
driver: hid-generic,snd-usb-audio,usbhid
Sound Server-1: ALSA v: k5.19.8-zen1-1-zen running: yes
Sound Server-2: sndio v: N/A running: no
Sound Server-3: PulseAudio v: 16.1 running: no
Sound Server-4: PipeWire v: 0.3.57 running: yes
Network:
Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
vendor: Gigabyte driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
lanes: 1 port: e000 bus-ID: 04:00.0 chip-ID: 10ec:8168 class-ID: 0200
IF: eno1 state: up speed: 100 Mbps duplex: full mac: <filter>
Bluetooth:
Device-1: Cambridge Silicon Radio Bluetooth Dongle (HCI mode) type: USB
driver: btusb v: 0.8 bus-ID: 1-7:3 chip-ID: 0a12:0001 class-ID: e001
Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
Drives:
Local Storage: total: 3.18 TiB used: 749.79 GiB (23.0%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Crucial model: CT500P2SSD8
size: 465.76 GiB block-size: physical: 512 B logical: 512 B
speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: P2CR048
temp: 41.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Crucial model: CT500MX500SSD1
size: 465.76 GiB block-size: physical: 4096 B logical: 512 B
speed: 6.0 Gb/s type: SSD serial: <filter> rev: 023
ID-3: /dev/sdb maj-min: 8:16 vendor: Crucial model: CT500MX500SSD1
size: 465.76 GiB block-size: physical: 4096 B logical: 512 B
speed: 6.0 Gb/s type: SSD serial: <filter> rev: 023
ID-4: /dev/sdc maj-min: 8:32 type: USB vendor: Western Digital model: WD
My Passport 0748 size: 1.82 TiB block-size: physical: 512 B
logical: 512 B type: N/A serial: <filter> rev: 1019 scheme: GPT
Partition:
ID-1: / raw-size: 431.13 GiB size: 431.13 GiB (100.00%) used: 159.46 GiB
(37.0%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%) used: 608 KiB
(0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 431.13 GiB size: 431.13 GiB (100.00%) used: 159.46
GiB (37.0%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-4: /var/log raw-size: 431.13 GiB size: 431.13 GiB (100.00%) used: 159.46
GiB (37.0%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-5: /var/tmp raw-size: 431.13 GiB size: 431.13 GiB (100.00%) used: 159.46
GiB (37.0%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
Kernel: swappiness: 10 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 31.22 GiB used: 0 KiB (0.0%) priority: 100
dev: /dev/zram0
ID-2: swap-2 type: partition size: 34.34 GiB used: 0 KiB (0.0%)
priority: -2 dev: /dev/nvme0n1p3 maj-min: 259:3
Sensors:
System Temperatures: cpu: 38.6 C mobo: 26.0 C gpu: amdgpu temp: 41.0 C
Fan Speeds (RPM): cpu: 0 fan-1: 0 fan-3: 0 gpu: amdgpu fan: 647
Power: 12v: N/A 5v: N/A 3.3v: 1.69 vbat: 1.58 gpu: amdgpu watts: 45.01
Info:
Processes: 359 Uptime: 8m wakeups: 1 Memory: 31.22 GiB used: 3.36 GiB
(10.8%) Init: systemd v: 251 default: graphical tool: systemctl
Compilers: gcc: 12.2.0 alt: 11 clang: 14.0.6 Packages: pacman: 2174
lib: 556 Shell: fish v: 3.5.1 default: Bash v: 5.1.16 running-in: konsole
inxi: 3.3.20
Garuda (2.6.7-1):
  System install date: 2022-09-11
  Last full system update: 2022-09-14
  Is partially upgraded: No
  Relevant software: NetworkManager
  Windows dual boot: No/Undetected
  Snapshots: Snapper
  Failed units:
Hi, I am trying to run Stable Diffusion on AMD hardware, an RX 590 Fatboy. I installed ROCm following the GPGPU guide on the Arch Wiki, and below I include some output that seems to indicate that my GPU is supported.
This is the terminal output when I try to run the program:
python scripts/txt2img.py --prompt "some text" --outdir ./output
/home/binarydepth/.conda/envs/ldm/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libc10_cuda.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Traceback (most recent call last):
  File "scripts/txt2img.py", line 345, in <module>
    main()
  File "scripts/txt2img.py", line 241, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "scripts/txt2img.py", line 51, in load_model_from_config
    pl_sd = torch.load(ckpt, map_location="cpu")
  File "/home/binarydepth/.conda/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/binarydepth/.conda/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
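
The traceback fails inside `torch.load` while reading the checkpoint on the CPU, before anything touches the GPU. The load key `'v'` would be consistent with `model.ckpt` being a small text file that starts with the letter "v", for example a Git LFS pointer (those begin with `version https://git-lfs.github.com/spec/v1`), rather than the real weights. That is only a guess from the error message. Here is a minimal sketch to check what the file actually is; the function name `describe_checkpoint` is made up for illustration and is not part of Stable Diffusion or PyTorch.

```python
from pathlib import Path

# Leading bytes of a Git LFS pointer file (a small text stub, not the real data).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def describe_checkpoint(path):
    """Return (kind, size_in_bytes) for a file, judged only by its first bytes.

    Hypothetical helper: it does not load the model, it only peeks at the header.
    """
    file = Path(path)
    size = file.stat().st_size
    with file.open("rb") as handle:
        head = handle.read(64)

    if head.startswith(LFS_POINTER_PREFIX):
        kind = "git-lfs-pointer"  # text stub; the actual weights were never downloaded
    elif head.startswith(b"PK"):
        kind = "zip-archive"      # zip-based format written by newer torch.save
    elif head.startswith(b"\x80"):
        kind = "pickle"           # legacy torch.save output starts with a pickle PROTO opcode
    elif head.lstrip().lower().startswith((b"<!doctype", b"<html")):
        kind = "html"             # e.g. a saved login or error page
    else:
        kind = "unknown"
    return kind, size
```

Calling `describe_checkpoint("models/ldm/stable-diffusion-v1/model.ckpt")` from the repository root should report a multi-gigabyte `zip-archive` or `pickle` file if the download is intact. A file of a few hundred bytes reported as `git-lfs-pointer` would mean the weights still need to be downloaded.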
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3380.4)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Extensions function suffix AMD
Platform Host timer resolution 1ns
Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name Ellesmere
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0 AMD-APP (3380.4)
Driver Version 3380.4 (PAL,HSAIL)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Radeon RX 590 Series
Device PCI-e ID (AMD) 0x67df
Device Topology (AMD) PCI-E, 0000:01:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 36
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1545MHz
Graphics IP (AMD) 8.0
Device Partition (core)
Max number of sub-devices 36
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple (kernel) 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 8589934592 (8GiB)
Global free memory (AMD) 8321000 (7.936GiB) 0 (0B)
Global memory channels (AMD) 8
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 7301444403 (6.8GiB)
Unified memory for Host and Device No
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 6571299840 (6.12GiB)
Preferred total size of global vars 8589934592 (8GiB)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 456340275 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 64
Max number of read/write image args 64
Max number of pipe args 16
Max active pipe reservations 16
Max pipe packet size 3006477107 (2.8GiB)
Local memory type Local
Local memory size 65536 (64KiB)
Local memory size per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 7301444403 (6.8GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 262144 (256KiB)
Max size 8388608 (8MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Number of P2P devices (AMD) 0
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1663158834042297879ns (Wed Sep 14 08:33:54 2022)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 4
Max real-time compute queues (AMD) 1
Max real-time compute units (AMD) 0
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Ellesmere
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Ellesmere
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Ellesmere
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.1
ICD loader Profile OpenCL 3.0
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 4500 6-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 4500 6-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 0
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32732472(0x1f37538) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32732472(0x1f37538) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32732472(0x1f37538) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx803
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 590 Series
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26591(0x67df)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1545
BDFID: 256
Internal Node ID: 1
Compute Unit: 36
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8388608(0x800000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***