Installing the NVIDIA graphics driver on CentOS 8 (hitting pitfalls and fixing them along the way)
I recently installed the NVIDIA driver on CentOS 8 and ran into a series of problems. I hope we can work through them and learn together. Without further ado, here are the details.

1 First confirm the kernel and distribution versions, then the GPU model
(1) uname -a
Linux localhost.localdomain 4.18.0-147.el8.x86_64 #1 SMP Wed Dec 4 21:51:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
(2) cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)
(3) lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GK208M [GeForce GT 730M] (rev a1)

2 Download the matching driver from the official site
NVIDIA Advanced Driver Search: https://www.nvidia.cn/Download/index.aspx?lang=cn
After clicking Search, the result is version 418.113, released 2019.11.5. The page also lists the release highlights, the supported products and other information:
Linux x64 (AMD64/EM64T) Display Driver
Version: 418.113
Release date: 2019.11.5
Operating system: Linux 64-bit
Language: Chinese (Simplified)
File size: 104.78 MB

3 Installation
chmod +x NVIDIA-Linux-x86_64-418.113.run   # make the file executable
As root, switch to text mode with init 3 (equivalent to systemctl isolate multi-user.target), then run the installer:
init 3
./NVIDIA-Linux-x86_64-418.113.run

3.1 The first error
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
nouveau is the open-source driver for NVIDIA cards that ships with many Linux distributions. It must be disabled before the NVIDIA driver can be installed. After you click 'OK', the installer offers to disable nouveau by adding a modprobe configuration file. Choose 'Yes', and the following files are created:
/usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
Both files have the same contents:
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
Running the installer again then prints:
WARNING: One or more modprobe configuration files to disable Nouveau are already present at:/usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf,/etc/modprobe.d/nvidia-installer-disable-nouveau.conf. Please be sure you have rebooted your system since these files were written. If you have rebooted, then Nouveau may be enabled for other reasons, such as being included in the system initial ramdisk or in your X configuration file.
Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
The catch is that this does not actually disable nouveau. The installer still reports the same Nouveau error as before, and lsmod confirms that the module is loaded (all commands in this post are run as root):
[root@localhost ***]# lsmod | grep nouveau
nouveau 2215936 2
mxm_wmi 16384 1 nouveau
i2c_algo_bit 16384 2 i915,nouveau
drm_kms_helper 217088 2 i915,nouveau
ttm 110592 1 nouveau
drm 524288 15 drm_kms_helper,i915,ttm,nouveau
wmi 32768 3 wmi_bmof,mxm_wmi,nouveau
video 45056 3 thinkpad_acpi,i915,nouveau
The procedure that works is:
(1) Disable nouveau on the kernel command line through GRUB. Edit /etc/default/grub (vim /etc/default/grub) and add the following to GRUB_CMDLINE_LINUX:
rd.driver.blacklist=nouveau nouveau.modeset=0
Then regenerate the GRUB configuration (on UEFI systems the file is /boot/efi/EFI/centos/grub.cfg instead):
grub2-mkconfig -o /boot/grub2/grub.cfg
(2) Add a blacklist line at the end of /usr/lib/modprobe.d/dist-blacklist.conf or /etc/modprobe.d/blacklist.conf. /etc/modprobe.d/ is the directory meant for local changes; files under /usr/lib/modprobe.d/ belong to packages and may be replaced on update. These are the original contents of /usr/lib/modprobe.d/dist-blacklist.conf:
vim /usr/lib/modprobe.d/dist-blacklist.conf
#
# Listing a module here prevents the hotplug scripts from loading it.
# Usually that'd be so that some other driver will bind it instead,
# no matter which driver happens to get probed first. Sometimes user
# mode tools can also control driver binding.
#
# Syntax: see modprobe.conf(5).
#

# watchdog drivers
blacklist i8xx_tco

# framebuffer drivers
blacklist aty128fb
blacklist atyfb
blacklist radeonfb
blacklist i810fb
blacklist cirrusfb
blacklist intelfb
blacklist kyrofb
blacklist i2c-matroxfb
blacklist hgafb
blacklist nvidiafb
Append this line at the end and save:
blacklist nouveau
(3) Back up the current initramfs image, which still contains nouveau:
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
(4) Rebuild the initramfs with dracut:
dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
(5) Reboot, then run lsmod | grep nouveau to confirm that nouveau is no longer loaded.
Run the installer again:
./NVIDIA-Linux-x86_64-418.113.run

3.2 The second error
ERROR: Unable to find the kernel source tree for the currently running kernel.
Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
The installer cannot find the kernel source tree. Check whether kernel-headers and kernel-devel are installed, and at which versions:
[root@localhost ***]# dnf list kernel-headers
Installed Packages
kernel-headers.x86_64 4.18.0-305.19.1.el8_4 @baseos
[root@localhost ***]# dnf list kernel-headers
Installed Packages
kernel-headers.x86_64 4.18.0-193.28.1.el8_2 @BaseOS
Available Packages
kernel-headers.x86_64 4.18.0-305.19.1.el8_4 BaseOS
[root@localhost ***]# dnf list kernel-devel
kernel-devel.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-headers is installed at version 4.18.0-193.28.1.el8_2, but kernel-devel is not installed yet. Fixing this takes two things. First, both kernel-headers and kernel-devel must be installed. Second, their versions must match the kernel version shown by uname -a.
A plain dnf install kernel-devel installs the newest 4.18.0 build, 4.18.0-305.19.1.el8_4, not the build for the running kernel. Installing the packages for the running kernel would also work, but I could not find kernel-headers and kernel-devel for that version online. That left upgrading the kernel as the only option.
I ran dnf distro-sync, which brings every package, the kernel included, to the latest version in the repositories. This is similar to yum update, except that distro-sync also downgrades packages when the repositories carry an older version. After it finished, I rebooted and checked the versions again:
[root@localhost ***]# uname -a
Linux localhost.localdomain 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Wed Sep 15 15:39:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ***]# dnf list kernel-headers
kernel-headers.x86_64 4.18.0-305.19.1.el8_4 @baseos
[root@localhost ***]# dnf list kernel-devel
Installed Packages
kernel-devel.x86_64 4.18.0-305.19.1.el8_4 @baseos
After the upgrade the running kernel is 4.18.0-305.19.1.el8_4. It is now the default boot kernel and has its own boot menu entry. The kernel-headers and kernel-devel versions now match the version shown by uname -a.
Note: do not simply run dnf install kernel-4.18.0-305.19.1.el8_4. When I did that, the kernel package was installed but no boot entry was generated, so the new kernel could not be booted. Use dnf distro-sync instead.
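Before each installer run, it is worth confirming the two preconditions from 3.1 and 3.2: nouveau is not loaded, and kernel-devel is installed for exactly the running kernel. Below is a minimal POSIX sh sketch of such a check. It is not part of the original procedure, and the function names nouveau_loaded, devel_matches and preflight are hypothetical. It assumes rpm -q prints packages in its default name-version-release.arch form, so the package for the running kernel shows up as kernel-devel-$(uname -r).

```sh
#!/bin/sh
# Pre-install checks for the NVIDIA .run installer (a sketch, not the
# author's procedure). The helpers take command output as arguments so
# they can be tried on saved text; preflight() feeds them live output.

# Succeeds if the given `lsmod` output lists nouveau as a loaded module.
nouveau_loaded() {
    # lsmod prints the module name in the first column; modules that
    # merely use nouveau list it in the last column and are ignored.
    printf '%s\n' "$1" | awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Succeeds if `rpm -q kernel-devel` output ($2) contains the package
# built for the running kernel release ($1, as printed by `uname -r`).
devel_matches() {
    printf '%s\n' "$2" | grep -qxF "kernel-devel-$1"
}

# Live check; run it before ./NVIDIA-Linux-x86_64-*.run
preflight() {
    running=$(uname -r)
    if nouveau_loaded "$(lsmod)"; then
        echo "nouveau is loaded: blacklist it, rebuild the initramfs with dracut, reboot" >&2
        return 1
    fi
    if ! devel_matches "$running" "$(rpm -q kernel-devel)"; then
        echo "kernel-devel-$running is not installed (see 3.2)" >&2
        return 1
    fi
    echo "preflight OK for $running"
}
```

To use it, source the file (for example `. ./preflight.sh`, a hypothetical file name) and call preflight after the reboot in step (5). Start the installer only if it prints "preflight OK". The check looks at kernel-devel only, because that package provides /usr/src/kernels/<version>, the directory the installer's make runs in (see the log in 3.3.4).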
kernel-headers download: https://pkgs.org/download/kernel-headers
kernel-devel download: https://pkgs.org/download/kernel-devel
Continue the installation:
./NVIDIA-Linux-x86_64-418.113.run

3.3 The third error
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
Checking to see whether the nvidia kernel module was successfully built
The kernel module build failed. The two main fixes suggested online are to downgrade the kernel (reference 3) or to download the latest driver (reference 4).

3.3.1 After a lot of head-scratching, I found no dnf command that rolls an upgrade back as a whole. dnf downgrade only works on a specific package, so in the end I tried it by hand.
The kernel packages after the upgrade:
[root@localhost ***]# dnf --showduplicates list kernel | expand
Installed Packages
kernel.x86_64 4.18.0-147.el8 @anaconda
kernel.x86_64 4.18.0-305.19.1.el8_4 @baseos
Available Packages
kernel.x86_64 4.18.0-305.3.1.el8 baseos
kernel.x86_64 4.18.0-305.7.1.el8_4 baseos
kernel.x86_64 4.18.0-305.10.2.el8_4 baseos
kernel.x86_64 4.18.0-305.12.1.el8_4 baseos
kernel.x86_64 4.18.0-305.17.1.el8_4 baseos
kernel.x86_64 4.18.0-305.19.1.el8_4 baseos
4.18.0-147.el8 is the original kernel, and 4.18.0-305.19.1.el8_4 is the kernel installed by the upgrade.
[root@localhost ***]# dnf list kernel*
Installed Packages
kernel.x86_64 4.18.0-147.el8 @anaconda
kernel.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-core.x86_64 4.18.0-147.el8 @anaconda
kernel-core.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-devel.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-headers.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-modules.x86_64 4.18.0-147.el8 @anaconda
kernel-modules.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-tools.x86_64 4.18.0-305.19.1.el8_4 @baseos
kernel-tools-libs.x86_64 4.18.0-305.19.1.el8_4 @baseos
Available Packages
kernel-abi-stablelists.noarch 4.18.0-305.19.1.el8_4 baseos
kernel-cross-headers.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-debug.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-debug-core.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-debug-devel.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-debug-modules.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-debug-modules-extra.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-doc.noarch 4.18.0-305.19.1.el8_4 baseos
kernel-modules-extra.x86_64 4.18.0-305.19.1.el8_4 baseos
kernel-rpm-macros.noarch 125-1.el8 appstream
kernelshark.x86_64 2.7-9.el8 appstream
Both kernel-core and kernel-modules have the old and the new version installed side by side. Time to experiment.
Step 1: dnf remove kernel-4.18.0-305.19.1.el8_4, then reboot. The boot entry for 4.18.0-305.19.1.el8_4 was still there and still booted, and uname -a still reported 4.18.0-305.19.1.el8_4.
Step 2: delete 3dfed34393c14fd091784d3c4f08ca02-4.18.0-305.19.1.el8_4.x86_64.conf under /boot/loader/entries/ and vmlinuz-4.18.0-305.19.1.el8_4.x86_64 under /boot. Then make 4.18.0-147.el8.x86_64 the default boot kernel:
grubby --set-default /boot/vmlinuz-4.18.0-147.el8.x86_64
This "clever" maneuver restored the system to its original state. Reinstalling the driver, however, failed exactly as before.
In fact there is no need to downgrade. You can simply select the previous kernel from the boot menu, although all other packages stay at their newer, synced versions.

3.3.2 Download the latest driver from the official site
Driver 418.113 was released on 2019.11.5, while the 4.18 kernel predates it (upstream Linux 4.18 was released in August 2018). The 418.113 release highlights also include "Fixed kernel module build problems with Linux kernel 5.4.0 release candidates". So this is already a fairly recent driver for this kernel. I also tried a 418.56 driver obtained elsewhere and got the same error.

3.3.3 Upgrade again
With no other way out, I upgraded again with dnf distro-sync, and a new problem appeared. Only the kernel package kernel-4.18.0-305.19.1.el8_4 was installed. There was no boot entry for it, and neither /boot/loader/entries nor /boot contained the corresponding entry or vmlinuz file.
The cause was the downgrade in 3.3.1: it removed only the kernel package itself and left the related kernel packages in place. The correct procedure is:
Step 1: dnf remove kernel-4.18.0-305.19.1.el8_4
Step 2: dnf remove kernel-core-4.18.0-305.19.1.el8_4 (dnf asks you to confirm removal of dependent packages such as kernel-modules; confirm)
Step 3: remove the remaining 4.18.0-305.19.1.el8_4 packages, such as kernel-tools and kernel-tools-libs. Do not remove kernel-headers; it was only upgraded.
After these three steps, the boot entry under /boot/loader/entries/ and /boot/vmlinuz-4.18.0-305.19.1.el8_4.x86_64 are removed automatically. After a reboot, dnf distro-sync can bring everything back to the latest version.

3.3.4 Check the log
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
Checking to see whether the nvidia kernel module was successfully built
Using: nvidia-installer ncurses v6 user interface
-> Detected 4 CPUs online; setting concurrency level to 4.
-> Tagging shared libraries with chcon -t textrel_shlib_t.
-> Installing NVIDIA driver version 470.74.
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Performing CC check.
-> Kernel source path: '/lib/modules/4.18.0-305.19.1.el8_4.x86_64/source'
-> Kernel output path: '/lib/modules/4.18.0-305.19.1.el8_4.x86_64/build'
-> Performing Compiler check.
-> Performing Dom0 check.
-> Performing Xen check.
-> Performing PREEMPT_RT check.
-> Performing vgpu_kvm check.
-> Cleaning kernel module build directory.
executing: 'cd ./kernel; /usr/bin/make -k -j4 clean NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/source" SYSOUT="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/build"'...
rm -f -r conftest
make[1]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[2]: Leaving directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[1]: Leaving directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
-> Building kernel modules
executing: 'cd ./kernel; /usr/bin/make -k -j4 NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/source" SYSOUT="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/build"'...
make[1]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64/Makefile:984: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.
make[2]: Leaving directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[1]: *** [Makefile:157: sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make: *** [Makefile:80: modules] Error 2
-> ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
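Every failure in this post ends with a pointer to /var/log/nvidia-installer.log, and each one has a recognizable message. As a convenience, here is a small sh sketch that maps those messages to the fix that worked here. It is not from the original post. The function name diagnose_log and the hint texts are made up, and it assumes the error messages quoted in this post appear verbatim in the log.

```sh
#!/bin/sh
# Map the nvidia-installer.log errors seen in this post to the fix that
# worked here (a sketch; hint texts are summaries, not installer output).

diagnose_log() {
    log=${1:-/var/log/nvidia-installer.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    status=1    # stays 1 ("false") until a known message is found
    # Each rule line: fixed search string|hint to print
    while IFS='|' read -r pattern hint; do
        if grep -qF -- "$pattern" "$log"; then
            echo "$hint"
            status=0
        fi
    done <<'EOF'
Nouveau kernel driver is currently in use|nouveau: blacklist it, rebuild the initramfs with dracut, reboot (3.1)
Unable to find the kernel source tree|kernel source: install kernel-devel for the running kernel (3.2)
Cannot generate ORC metadata|ORC metadata: dnf install elfutils-libelf-devel (3.3.4)
implicit declaration of function 'vga_tryget'|vga_tryget: this driver release does not build here, try another one (3.4)
EOF
    return "$status"
}
```

With no argument, diagnose_log reads /var/log/nvidia-installer.log. For the log above it would print only the elfutils-libelf-devel hint. It returns non-zero when none of the known messages is present, so an unfamiliar failure still needs a manual look at the log.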
The log says that libelf-dev, libelf-devel or elfutils-libelf-devel is missing. On CentOS the package is elfutils-libelf-devel, so I installed only that one:
[root@localhost ***]# dnf install elfutils-libelf-devel
================================================================================
 Package                 Architecture  Version          Repository  Size
================================================================================
Installing:
 elfutils-libelf-devel   x86_64        0.182-3.el8      baseos      59 k
Installing dependencies:
 zlib-devel              x86_64        1.2.11-17.el8    baseos      58 k

Transaction Summary
================================================================================
Install  2 Packages

Total download size: 116 k
Installed size: 171 k
Is this ok [y/N]:
After confirming, reboot and run the installer again.

3.4 The fourth problem
3.4.1
ERROR: An error occurred while performing the step: "Checking to see whether the nvidia kernel module was successfully built". See /var/log/nvidia-installer.log for details.
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
Straight to the log:
Using: nvidia-installer ncurses v6 user interface
-> Detected 4 CPUs online; setting concurrency level to 4.
-> Tagging shared libraries with chcon -t textrel_shlib_t.
-> Installing NVIDIA driver version 418.113.
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Kernel source path: '/lib/modules/4.18.0-305.19.1.el8_4.x86_64/source'
-> Kernel output path: '/lib/modules/4.18.0-305.19.1.el8_4.x86_64/build'
-> Performing Compiler check.
-> Performing Dom0 check.
-> Performing Xen check.
-> Performing PREEMPT_RT check.
-> Performing vgpu_kvm check.
-> Cleaning kernel module build directory.
executing: 'cd ./kernel; /usr/bin/make -k -j4 clean NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/source" SYSOUT="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/build"'...
rm -f -r conftest
make[1]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[2]: Leaving directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[1]: Leaving directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
-> Building kernel modules
executing: 'cd ./kernel; /usr/bin/make -k -j4 NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/source" SYSOUT="/lib/modules/4.18.0-305.19.1.el8_4.x86_64/build"'...
make[1]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64'
SYMLINK /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv-kernel.o
SYMLINK /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia-modeset/nv-modeset-kernel.o
CONFTEST: INIT_WORK CONFTEST: remap_pfn_range CONFTEST: follow_pfn CONFTEST: hash__remap_4k_pfn CONFTEST: vmap CONFTEST: set_pages_uc CONFTEST: list_is_first CONFTEST: set_memory_uc CONFTEST: set_memory_array_uc CONFTEST: change_page_attr CONFTEST: pci_get_class CONFTEST: pci_choose_state CONFTEST: vm_insert_page CONFTEST: acpi_device_id CONFTEST: acquire_console_sem CONFTEST: console_lock CONFTEST: kmem_cache_create CONFTEST: on_each_cpu CONFTEST: smp_call_function CONFTEST: acpi_evaluate_integer CONFTEST: ioremap_cache CONFTEST: ioremap_wc CONFTEST: acpi_walk_namespace CONFTEST: pci_domain_nr CONFTEST: pci_dma_mapping_error CONFTEST: sg_alloc_table CONFTEST: sg_init_table CONFTEST: pci_get_domain_bus_and_slot CONFTEST: get_num_physpages CONFTEST: efi_enabled CONFTEST: proc_create_data CONFTEST: pde_data CONFTEST: proc_remove CONFTEST: pm_vt_switch_required CONFTEST: xen_ioemu_inject_msi CONFTEST: phys_to_dma CONFTEST: get_dma_ops CONFTEST: write_cr4 CONFTEST: of_get_property CONFTEST: of_find_node_by_phandle CONFTEST: of_node_to_nid CONFTEST: pnv_pci_get_npu_dev CONFTEST: of_get_ibm_chip_id CONFTEST: for_each_online_node CONFTEST: node_end_pfn CONFTEST: pci_bus_address CONFTEST: pci_stop_and_remove_bus_device CONFTEST: pci_remove_bus_device CONFTEST: request_threaded_irq CONFTEST: register_cpu_notifier CONFTEST: cpuhp_setup_state CONFTEST: dma_map_resource CONFTEST: backlight_device_register CONFTEST: register_acpi_notifier CONFTEST: timer_setup CONFTEST: pci_enable_msix_range CONFTEST: compound_order CONFTEST: do_gettimeofday CONFTEST: dma_direct_map_resource CONFTEST: vmf_insert_pfn CONFTEST: remap_page_range CONFTEST: address_space_init_once CONFTEST: kbasename CONFTEST: fatal_signal_pending CONFTEST: list_cut_position CONFTEST: vzalloc CONFTEST: wait_on_bit_lock_argument_count CONFTEST: bitmap_clear CONFTEST: usleep_range CONFTEST: radix_tree_empty CONFTEST: radix_tree_replace_slot CONFTEST: pnv_npu2_init_context CONFTEST: drm_dev_unref CONFTEST: drm_reinit_primary_mode_group CONFTEST: get_user_pages_remote CONFTEST: get_user_pages CONFTEST: drm_gem_object_lookup CONFTEST: drm_atomic_state_ref_counting CONFTEST: drm_driver_has_gem_prime_res_obj CONFTEST: drm_atomic_helper_connector_dpms CONFTEST: drm_connector_funcs_have_mode_in_name CONFTEST: drm_framebuffer_get CONFTEST: drm_gem_object_get CONFTEST: drm_dev_put CONFTEST: is_export_symbol_gpl_of_node_to_nid CONFTEST: is_export_symbol_present_swiotlb_map_sg_attrs CONFTEST: is_export_symbol_present_swiotlb_dma_ops CONFTEST: i2c_adapter CONFTEST: pm_message_t CONFTEST: irq_handler_t CONFTEST: acpi_device_ops CONFTEST: acpi_op_remove CONFTEST: outer_flush_all CONFTEST: proc_dir_entry CONFTEST: scatterlist CONFTEST: sg_table CONFTEST: file_operations CONFTEST: vm_operations_struct CONFTEST: atomic_long_type CONFTEST: file_inode CONFTEST: task_struct CONFTEST: kuid_t CONFTEST: dma_ops CONFTEST: swiotlb_dma_ops CONFTEST: dma_map_ops CONFTEST: noncoherent_swiotlb_dma_ops CONFTEST: vm_fault_present CONFTEST: vm_fault_has_address CONFTEST: backlight_properties_type CONFTEST: vmbus_channel_has_ringbuffer_page
CONFTEST: kmem_cache_has_kobj_remove_work CONFTEST: sysfs_slab_unlink CONFTEST: fault_flags CONFTEST: atomic64_type CONFTEST: address_space CONFTEST: backing_dev_info CONFTEST: mm_context_t CONFTEST: vm_ops_fault_removed_vma_arg CONFTEST: node_states_n_memory CONFTEST: drm_bus_present CONFTEST: drm_bus_has_bus_type CONFTEST: drm_bus_has_get_irq CONFTEST: drm_bus_has_get_name CONFTEST: drm_driver_has_legacy_dev_list CONFTEST: drm_driver_has_set_busid CONFTEST: drm_crtc_state_has_connectors_changed CONFTEST: drm_init_function_args CONFTEST: drm_mode_connector_list_update_has_merge_type_bits_arg CONFTEST: drm_helper_mode_fill_fb_struct CONFTEST: drm_master_drop_has_from_release_arg CONFTEST: drm_driver_unload_has_int_return_type CONFTEST: kref_has_refcount_of_type_refcount_t CONFTEST: drm_atomic_helper_crtc_destroy_state_has_crtc_arg CONFTEST: drm_crtc_helper_funcs_has_atomic_enable CONFTEST: drm_mode_object_find_has_file_priv_arg CONFTEST: dma_buf_owner CONFTEST: drm_connector_list_iter CONFTEST: drm_atomic_helper_swap_state_has_stall_arg CONFTEST: drm_driver_prime_flag_present CONFTEST: dom0_kernel_present CONFTEST: nvidia_vgpu_hyperv_available CONFTEST: nvidia_vgpu_kvm_build CONFTEST: nvidia_grid_build CONFTEST: drm_available CONFTEST: drm_atomic_available CONFTEST: is_export_symbol_gpl_refcount_inc CONFTEST: is_export_symbol_gpl_refcount_dec_and_test
CC [M] /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv-frontend.o
CC [M] /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv-instance.o
CC [M] /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv.o
CC [M] /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv-acpi.o
/tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv.c: In function 'nvidia_probe':
/tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv.c:4129:5: error: implicit declaration of function 'vga_tryget'; did you mean 'vga_get'? [-Werror=implicit-function-declaration]
     vga_tryget(VGA_DEFAULT_DEVICE, VGA_RSRC_LEGACY_MASK);
     ^~~~~~~~~~
     vga_get
CC [M] /tmp/selfgz9906/NVIDIA-Linux-x86_64-418.113/kernel/nvidia/nv-chrdev.o
Further errors follow, such as drm/drmP.h: No such file or directory, 'NULL' undeclared and field 'base' has incomplete type. For the first error, look at the local vgaarb.h:
cat /usr/src/kernels/4.18.0-305.19.1.el8_4.x86_64/include/linux/vgaarb.h
#ifndef LINUX_VGA_H
#define LINUX_VGA_H

#include

/* Legacy VGA regions */
#define VGA_RSRC_NONE          0x00
#define VGA_RSRC_LEGACY_IO     0x01
#define VGA_RSRC_LEGACY_MEM    0x02
#define VGA_RSRC_LEGACY_MASK   (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM)
/* Non-legacy access */
#define VGA_RSRC_NORMAL_IO     0x04
#define VGA_RSRC_NORMAL_MEM    0x08

/* Passing that instead of a pci_dev to use the system "default"
 * device, that is the one used by vgacon. Archs will probably
 * have to provide their own vga_default_device();
 */
#define VGA_DEFAULT_DEVICE     (NULL)

struct pci_dev;

/* For use by clients */

/**
 * vga_set_legacy_decoding
 *
 * @pdev: pci device of the VGA card
 * @decodes: bit mask of what legacy regions the card decodes
 *
 * Indicates to the arbiter if the card decodes legacy VGA IOs,
 * legacy VGA Memory, both, or none. All cards default to both,
 * the card driver (fbdev for example) should tell the arbiter
 * if it has disabled legacy decoding, so the card can be left
 * out of the arbitration process (and can be safe to take
 * interrupts at any time.
 */
#if defined(CONFIG_VGA_ARB)
extern void vga_set_legacy_decoding(struct pci_dev *pdev,
                                    unsigned int decodes);
#else
static inline void vga_set_legacy_decoding(struct pci_dev *pdev,
                                           unsigned int decodes) { };
#endif

#if defined(CONFIG_VGA_ARB)
extern int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible);
#else
static inline int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible) { return 0; }
#endif

/**
 * vga_get_interruptible
 * @pdev: pci device of the VGA card or NULL for the system default
 * @rsrc: bit mask of resources to acquire and lock
 *
 * Shortcut to vga_get with interruptible set to true.
 *
 * On success, release the VGA resource again with vga_put().
 */
static inline int vga_get_interruptible(struct pci_dev *pdev,
                                        unsigned int rsrc)
{
       return vga_get(pdev, rsrc, 1);
}

/**
 * vga_get_uninterruptible - shortcut to vga_get()
 * @pdev: pci device of the VGA card or NULL for the system default
 * @rsrc: bit mask of resources to acquire and lock
 *
 * Shortcut to vga_get with interruptible set to false.
 *
 * On success, release the VGA resource again with vga_put().
 */
static inline int vga_get_uninterruptible(struct pci_dev *pdev,
                                          unsigned int rsrc)
{
       return vga_get(pdev, rsrc, 0);
}

#if defined(CONFIG_VGA_ARB)
extern void vga_put(struct pci_dev *pdev, unsigned int rsrc);
#else
#define vga_put(pdev, rsrc)
#endif

#ifdef CONFIG_VGA_ARB
extern struct pci_dev *vga_default_device(void);
extern void vga_set_default_device(struct pci_dev *pdev);
extern int vga_remove_vgacon(struct pci_dev *pdev);
#else
static inline struct pci_dev *vga_default_device(void) { return NULL; };
static inline void vga_set_default_device(struct pci_dev *pdev) { };
static inline int vga_remove_vgacon(struct pci_dev *pdev) { return 0; };
#endif

/*
 * Architectures should define this if they have several
 * independent PCI domains that can afford concurrent VGA
 * decoding
 */
#ifndef __ARCH_HAS_VGA_CONFLICT
static inline int vga_conflicts(struct pci_dev *p1, struct pci_dev *p2)
{
       return 1;
}
#endif

#if defined(CONFIG_VGA_ARB)
int vga_client_register(struct pci_dev *pdev, void *cookie,
                        void (*irq_set_state)(void *cookie, bool state),
                        unsigned int (*set_vga_decode)(void *cookie, bool state));
#else
static inline int vga_client_register(struct pci_dev *pdev, void *cookie,
                                      void (*irq_set_state)(void *cookie, bool state),
                                      unsigned int (*set_vga_decode)(void *cookie, bool state))
{
        return 0;
}
#endif

#endif /* LINUX_VGA_H */
So the local vgaarb.h declares vga_get but not vga_tryget, which is why the build fails.

3.4.2 I tried another version, 418.56. Its log shows error: "NV_BUILD_MODULE_INSTANCES" is not defined, plus the same vga_get/vga_tryget error.

3.4.3 Look for other versions on the official site
I picked the latest legacy GPU driver, 390.144, released 2021.7.20. During installation it asks "Install NVIDIA's 32-bit compatibility libraries?". I chose 'Yes' and continued, and the installation succeeded:
[root@localhost ***]# nvidia-smi
Mon Sep 27 21:07:30 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.144                Driver Version: 390.144                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730M     Off  | 00000000:04:00.0 N/A |                  N/A |
| N/A   44C    P8    N/A /  N/A |      0MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
390.144 installs successfully but 418.113 does not, possibly because that driver version does not match this kernel version. If anyone knows the exact cause, I would welcome an explanation.

3.4.4 Although 390.144 installed successfully, it is probably too old for the CUDA version I am running, which complained that the driver was too old. So I uninstalled it and installed 470.74, using the same procedure as for 390.144. The output for 470.74:
[root@localhost ~]$ nvidia-smi
Wed Sep 29 21:40:08 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 N/A |                  N/A |
| N/A   44C    P8    N/A /  N/A |      3MiB /  2004MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The CUDA version shown is 11.4. nvidia-smi reports the highest CUDA version this driver supports.

4 Afterword
Uninstalling the NVIDIA driver:
./NVIDIA-Linux-x86_64-470.74.run --uninstall
./NVIDIA-Linux-x86_64-470.74.run --update   # update the driver
After uninstalling, nvidia-smi reports command not found, which confirms that the driver has been removed.
Appendix: mapping between NVIDIA driver versions and CUDA versions: Release Notes :: CUDA Toolkit Documentation

References:
1 Disabling the nouveau driver on CentOS 7. https://blog.csdn.net/qq_37296212/article/details/114265216
2 ERROR: Unable to find the kernel source tree for the currently running kernel – CentOS / RHEL / AlmaLinux. https://linuxconfig.org/error-unable-to-find-the-kernel-source-tree-for-the-currently-running-kernel-centos-rhel
3 NVIDIA driver on CentOS 7.5: unable to find the kernel source tree for current running kernel; nvidia-smi has faild. https://blog.csdn.net/HaixWang/article/details/90408538
4 Solved: ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nv_ (一个处女座的程序猿, CSDN blog)
5 Error when installing the NVIDIA driver: An error occurred while performing the step: "Building kernel modules" (muli, CSDN blog)
6 CentOS 7: downgrading the kernel version (weixin_33842328, CSDN blog)
8 redhat - Nvidia driver installation on RHEL 8 - Stack Overflow