    drm/amdkfd: Fix circular locking dependency warning · d69fd951
    Mukul Joshi authored
    
    
    [  150.887733] ======================================================
    [  150.893903] WARNING: possible circular locking dependency detected
    [  150.905917] ------------------------------------------------------
    [  150.912129] kfdtest/4081 is trying to acquire lock:
    [  150.917002] ffff8f7f3762e118 (&mm->mmap_sem#2){++++}, at:
                                     __might_fault+0x3e/0x90
    [  150.924490]
                   but task is already holding lock:
    [  150.930320] ffff8f7f49d229e8 (&dqm->lock_hidden){+.+.}, at:
                                    destroy_queue_cpsch+0x29/0x210 [amdgpu]
    [  150.939432]
                   which lock already depends on the new lock.
    
    [  150.947603]
                   the existing dependency chain (in reverse order) is:
    [  150.955074]
                   -> #3 (&dqm->lock_hidden){+.+.}:
    [  150.960822]        __mutex_lock+0xa1/0x9f0
    [  150.964996]        evict_process_queues_cpsch+0x22/0x120 [amdgpu]
    [  150.971155]        kfd_process_evict_queues+0x3b/0xc0 [amdgpu]
    [  150.977054]        kgd2kfd_quiesce_mm+0x25/0x60 [amdgpu]
    [  150.982442]        amdgpu_amdkfd_evict_userptr+0x35/0x70 [amdgpu]
    [  150.988615]        amdgpu_mn_invalidate_hsa+0x41/0x60 [amdgpu]
    [  150.994448]        __mmu_notifier_invalidate_range_start+0xa4/0x240
    [  151.000714]        copy_page_range+0xd70/0xd80
    [  151.005159]        dup_mm+0x3ca/0x550
    [  151.008816]        copy_process+0x1bdc/0x1c70
    [  151.013183]        _do_fork+0x76/0x6c0
    [  151.016929]        __x64_sys_clone+0x8c/0xb0
    [  151.021201]        do_syscall_64+0x4a/0x1d0
    [  151.025404]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  151.030977]
                   -> #2 (&adev->notifier_lock){+.+.}:
    [  151.036993]        __mutex_lock+0xa1/0x9f0
    [  151.041168]        amdgpu_mn_invalidate_hsa+0x30/0x60 [amdgpu]
    [  151.047019]        __mmu_notifier_invalidate_range_start+0xa4/0x240
    [  151.053277]        copy_page_range+0xd70/0xd80
    [  151.057722]        dup_mm+0x3ca/0x550
    [  151.061388]        copy_process+0x1bdc/0x1c70
    [  151.065748]        _do_fork+0x76/0x6c0
    [  151.069499]        __x64_sys_clone+0x8c/0xb0
    [  151.073765]        do_syscall_64+0x4a/0x1d0
    [  151.077952]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  151.083523]
                   -> #1 (mmu_notifier_invalidate_range_start){+.+.}:
    [  151.090833]        change_protection+0x802/0xab0
    [  151.095448]        mprotect_fixup+0x187/0x2d0
    [  151.099801]        setup_arg_pages+0x124/0x250
    [  151.104251]        load_elf_binary+0x3a4/0x1464
    [  151.108781]        search_binary_handler+0x6c/0x210
    [  151.113656]        __do_execve_file.isra.40+0x7f7/0xa50
    [  151.118875]        do_execve+0x21/0x30
    [  151.122632]        call_usermodehelper_exec_async+0x17e/0x190
    [  151.128393]        ret_from_fork+0x24/0x30
    [  151.132489]
                   -> #0 (&mm->mmap_sem#2){++++}:
    [  151.138064]        __lock_acquire+0x11a1/0x1490
    [  151.142597]        lock_acquire+0x90/0x180
    [  151.146694]        __might_fault+0x68/0x90
    [  151.150879]        read_sdma_queue_counter+0x5f/0xb0 [amdgpu]
    [  151.156693]        update_sdma_queue_past_activity_stats+0x3b/0x90 [amdgpu]
    [  151.163725]        destroy_queue_cpsch+0x1ae/0x210 [amdgpu]
    [  151.169373]        pqm_destroy_queue+0xf0/0x250 [amdgpu]
    [  151.174762]        kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
    [  151.180577]        kfd_ioctl+0x223/0x400 [amdgpu]
    [  151.185284]        ksys_ioctl+0x8f/0xb0
    [  151.189118]        __x64_sys_ioctl+0x16/0x20
    [  151.193389]        do_syscall_64+0x4a/0x1d0
    [  151.197569]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  151.203141]
                   other info that might help us debug this:
    
    [  151.211140] Chain exists of:
                     &mm->mmap_sem#2 --> &adev->notifier_lock --> &dqm->lock_hidden
    
    [  151.222535]  Possible unsafe locking scenario:
    
    [  151.228447]        CPU0                    CPU1
    [  151.232971]        ----                    ----
    [  151.237502]   lock(&dqm->lock_hidden);
    [  151.241254]                                lock(&adev->notifier_lock);
    [  151.247774]                                lock(&dqm->lock_hidden);
    [  151.254038]   lock(&mm->mmap_sem#2);
    
    Fix the warning by making sure get_user() is not called with the dqm
    lock (&dqm->lock_hidden) held while reading SDMA stats. get_user() can
    trigger a page fault, and the fault path takes mmap_sem, which
    completes the circular locking scenario shown above.
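
    The lock-ordering idea can be illustrated with a minimal user-space
    sketch. This is not the actual patch: struct queue, struct dqm,
    read_user_counter() and destroy_queue_sketch() are hypothetical
    stand-ins, and a pthread mutex stands in for the dqm lock. The point
    is only that the read which may fault happens before the lock is
    taken, and only the already-fetched value is used under the lock.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the real amdkfd types. */
struct queue {
    const uint64_t *user_counter;   /* plays the role of a user-space pointer */
    uint64_t sdma_past_activity;    /* accumulated stats, protected by dqm->lock */
    int active;                     /* protected by dqm->lock */
};

struct dqm {
    pthread_mutex_t lock;           /* plays the role of the dqm lock */
};

/*
 * Stand-in for a get_user()-style read. In the kernel such a read may page
 * fault and take mmap_sem, so it must not run with the dqm lock held.
 */
static int read_user_counter(const uint64_t *uptr, uint64_t *val)
{
    if (uptr == NULL)
        return -1;
    *val = *uptr;
    return 0;
}

static int destroy_queue_sketch(struct dqm *dqm, struct queue *q)
{
    uint64_t counter = 0;
    int ret;

    /* Step 1: do the possibly faulting read with no lock held. */
    ret = read_user_counter(q->user_counter, &counter);

    /* Step 2: take the lock and use only the value fetched above. */
    pthread_mutex_lock(&dqm->lock);
    if (ret == 0)
        q->sdma_past_activity += counter;
    q->active = 0;
    pthread_mutex_unlock(&dqm->lock);

    return ret;
}
```

    With this ordering the dqm lock is never held while the faulting path
    runs, so no dependency from the dqm lock to mmap_sem is created.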
    
    Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>