1. 01 Nov, 2020 40 commits
    • r8169: fix issue with forced threading in combination with shared interrupts · fa67cc69
      Heiner Kallweit authored
      [ Upstream commit 2734a24e6e5d18522fbf599135c59b82ec9b2c9e ]
      
      As reported by Serge, flag IRQF_NO_THREAD causes an error if the
      interrupt is actually shared and the other driver(s) don't have this
      flag set. This situation can occur if a PCI(e) legacy interrupt is
      used in combination with forced threading.
      There's no good way to deal with this properly, so we have to
      remove flag IRQF_NO_THREAD. To fix the original forced threading
      issue, switch to napi_schedule().
      
      Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading")
      Link: https://www.spinics.net/lists/netdev/msg694960.html
      
      Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
      Link: https://lore.kernel.org/r/b5b53bfe-35ac-3768-85bf-74d1290cf394@gmail.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: act_mpls: Add softdep on mpls_gso.ko · 62d9cec6
      Guillaume Nault authored
      TCA_MPLS_ACT_PUSH and TCA_MPLS_ACT_MAC_PUSH might be used on GSO
      packets. Such packets will thus require mpls_gso.ko for segmentation.
      
      v2: Drop dependency on CONFIG_NET_MPLS_GSO in Kconfig (from Jakub and
          David).
      
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/1f6cab15bbd15666795061c55563aaf6a386e90e.1603708007.git.gnault@redhat.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipa: command payloads already mapped · 2bc5d5c3
      Alex Elder authored
      [ Upstream commit df833050cced27e1b343cc8bc41f90191b289334 ]
      
      IPA transactions describe actions to be performed by the IPA
      hardware.  Three cases use IPA transactions:  transmitting a socket
      buffer; providing a page to receive packet data; and issuing an IPA
      immediate command.  An IPA transaction contains a scatter/gather
      list (SGL) to hold the set of actions to be performed.
      
      We map buffers in the SGL for DMA at the time they are added to the
      transaction.  For skb TX transactions, we fill the SGL with a call
      to skb_to_sgvec().  Page RX transactions involve a single page
      pointer, and that is recorded in the SGL with sg_set_page().  In
      both of these cases we then map the SGL for DMA with a call to
      dma_map_sg().
      
      Immediate commands are different.  The payload for an immediate
      command comes from a region of coherent DMA memory, which must
      *not* be mapped for DMA.  For that reason, gsi_trans_cmd_add()
      sort of hand-crafts each SGL entry added to a command transaction.
      ...
    • net: hns3: Clear the CMDQ registers before unmapping BAR region · 1336d288
      Zenghui Yu authored
      [ Upstream commit e3364c5ff3ff975b943a7bf47e21a2a4bf20f3fe ]
      
      When unbinding the hns3 driver with the HNS3 VF, I got the following
      kernel panic:
      
      [  265.709989] Unable to handle kernel paging request at virtual address ffff800054627000
      [  265.717928] Mem abort info:
      [  265.720740]   ESR = 0x96000047
      [  265.723810]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  265.729126]   SET = 0, FnV = 0
      [  265.732195]   EA = 0, S1PTW = 0
      [  265.735351] Data abort info:
      [  265.738227]   ISV = 0, ISS = 0x00000047
      [  265.742071]   CM = 0, WnR = 1
      [  265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000
      [  265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000
      [  265.764314] Internal error: Oops: 96000047 [#1] SMP
      [  265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206
      [  265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
      [  265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
      [  265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300
      [  265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300
      [  265.861992] sp : ffff80004c983b50
      [  265.881411] pmr_save: 000000e0
      [  265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500
      [  265.889744] x27: 0000000000000000 x26: 0000000000000000
      [  265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90
      [  265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150
      [  265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90
      [  265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004
      [  265.916198] x17: 0000000000000000 x16: ffff800011ae32e8
      [  265.916201] x15: 0000000000000028 x14: 0000000000000002
      [  265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8
      [  265.946619] x11: ffff80004c983b50 x10: 0000000000000000
      [  265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000
      [  265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff
      [  265.951917] x5 : ffff80004c983930 x4 : 0000000000000001
      [  265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100
      [  265.964487] x1 : 0000000000000000 x0 : ffff800054627000
      [  265.964491] Call trace:
      [  265.964494]  hclgevf_cmd_uninit+0xbc/0x300
      [  265.964496]  hclgevf_uninit_ae_dev+0x9c/0xe8
      [  265.964501]  hnae3_unregister_ae_dev+0xb0/0x130
      [  265.964516]  hns3_remove+0x34/0x88 [hns3]
      [  266.009683]  pci_device_remove+0x48/0xf0
      [  266.009692]  device_release_driver_internal+0x114/0x1e8
      [  266.030058]  device_driver_detach+0x28/0x38
      [  266.034224]  unbind_store+0xd4/0x108
      [  266.037784]  drv_attr_store+0x40/0x58
      [  266.041435]  sysfs_kf_write+0x54/0x80
      [  266.045081]  kernfs_fop_write+0x12c/0x250
      [  266.049076]  vfs_write+0xc4/0x248
      [  266.052378]  ksys_write+0x74/0xf8
      [  266.055677]  __arm64_sys_write+0x24/0x30
      [  266.059584]  el0_svc_common.constprop.3+0x84/0x270
      [  266.064354]  do_el0_svc+0x34/0xa0
      [  266.067658]  el0_svc+0x38/0x40
      [  266.070700]  el0_sync_handler+0x8c/0xb0
      [  266.074519]  el0_sync+0x140/0x180
      
      It looks like the BAR memory region has already been unmapped by the
      time we start clearing the CMDQ registers in it, which is pretty bad,
      and the kernel happily kills itself because of a Current EL Data Abort
      (on arm64).
      
      Moving the CMDQ uninitialization a bit earlier fixes the issue for me.
      
      Fixes: 862d969a ("net: hns3: do VF's pci re-initialization while PF doing FLR")
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netem: fix zero division in tabledist · 7fb8fbce
      Aleksandr Nogikh authored
      [ Upstream commit eadd1befdd778a1eca57fad058782bd22b4db804 ]
      
      Currently it is possible to craft a special netlink RTM_NEWQDISC
      command that can result in jitter being equal to 0x80000000. It is
      enough to set the 32 bit jitter to 0x02000000 (it will later be
      multiplied by 2^6) or just set the 64 bit jitter via
      TCA_NETEM_JITTER64. This causes an overflow during the generation of
      uniformly distributed numbers in tabledist(), which in turn leads to
      division by zero (sigma != 0, but sigma * 2 is 0).
      
      The related fragment of code needs 32-bit division - see commit
      9b0ed891 ("netem: remove unnecessary 64 bit modulus"), so switching to
      64 bit is not an option.
      
      Fix the issue by keeping the value of jitter within the range that can
      be adequately handled by tabledist() - [0;INT_MAX]. As a negative
      standard deviation makes no sense, take the absolute value of the passed
      value and cap it at INT_MAX. Inside tabledist(), switch to unsigned
      32-bit arithmetic in order to prevent overflows.
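      
      The arithmetic can be modelled in userspace C. This is a minimal sketch
      under stated assumptions, not the netem source: uniform_span(),
      clamp_jitter() and uniform_dist() are hypothetical helpers that only
      mirror what the text describes (a 32-bit sigma, a uniform offset of
      rnd % (2 * sigma), and jitter clamped to [0;INT_MAX]).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The uniform "span" 2 * sigma in unsigned 32-bit arithmetic.
 * With sigma = 0x80000000 it wraps to 0: the zero divisor in the bug. */
uint32_t uniform_span(uint32_t sigma)
{
	return 2u * sigma;
}

/* Keep jitter in [0, INT_MAX]: take the absolute value (INT64_MIN has
 * no positive counterpart, so treat it as INT64_MAX), then cap it. */
int32_t clamp_jitter(int64_t jitter)
{
	if (jitter < 0)
		jitter = (jitter == INT64_MIN) ? INT64_MAX : -jitter;
	if (jitter > INT_MAX)
		jitter = INT_MAX;
	return (int32_t)jitter;
}

/* Uniform value in [mu - sigma, mu + sigma - 1] from a random word.
 * For sigma in [1, INT_MAX], 2 * sigma <= 0xFFFFFFFE, so it is nonzero. */
int64_t uniform_dist(int64_t mu, int32_t sigma, uint32_t rnd)
{
	assert(sigma >= 0);            /* callers pass a clamped value */
	if (sigma == 0)
		return mu;
	return (int64_t)(rnd % uniform_span((uint32_t)sigma)) + mu - sigma;
}
```

      In this model, passing the unclamped value 0x80000000 straight to
      uniform_span() yields 0, the zero divisor described above, while
      clamp_jitter() keeps sigma at or below INT_MAX so the span stays nonzero.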
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mlxsw: core: Fix memory leak on module removal · 25259932
      Ido Schimmel authored
      [ Upstream commit adc80b6cfedff6dad8b93d46a5ea2775fd5af9ec ]
      
      Free the devlink instance during the teardown sequence in the non-reload
      case to avoid the following memory leak.
      
      unreferenced object 0xffff888232895000 (size 2048):
        comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s)
        hex dump (first 32 bytes):
          00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de  ........".......
          10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff  .P.2.....P.2....
        backtrace:
          [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0
          [<00000000ee30129d>] devlink_alloc+0xff/0x760
          [<0000000092ab3e5d>] 0xffffffffa042e5b0
          [<000000004f3f8a31>] 0xffffffffa042f6ad
          [<0000000092800b4b>] 0xffffffffa0491df3
          [<00000000c4843903>] local_pci_probe+0xcb/0x170
          [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0
          [<00000000a8e0de75>] really_probe+0x2c5/0xf90
          [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340
          [<00000000bcc95e05>] device_driver_attach+0x294/0x300
          [<000000000e2bc177>] __driver_attach+0x167/0x2f0
          [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0
          [<000000003cd5a91e>] driver_attach+0x45/0x60
          [<000000000041ce51>] bus_add_driver+0x3b8/0x720
          [<00000000f5215476>] driver_register+0x230/0x4e0
          [<00000000d79356f5>] __pci_register_driver+0x190/0x200
      
      Fixes: a22712a9 ("mlxsw: core: Fix devlink unregister flow")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Vadim Pasternak <vadimp@nvidia.com>
      Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ibmvnic: fix ibmvnic_set_mac · d6f6e3f9
      Lijun Pan authored
      [ Upstream commit 8fc3672a8ad3e782bac80e979bc2a2c10960cbe9 ]
      
      Jakub Kicinski brought up a concern in ibmvnic_set_mac().
      ibmvnic_set_mac() does this:
      
      	ether_addr_copy(adapter->mac_addr, addr->sa_data);
      	if (adapter->state != VNIC_PROBED)
      		rc = __ibmvnic_set_mac(netdev, addr->sa_data);
      
      So if state == VNIC_PROBED, the user can assign an invalid address to
      adapter->mac_addr, and ibmvnic_set_mac() will still return 0.
      
      The fix is to validate the Ethernet address at the beginning of
      ibmvnic_set_mac(), and to move the ether_addr_copy() call into the
      "adapter->state != VNIC_PROBED" branch.
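      
      The corrected ordering can be sketched in userspace C. The validity rule
      matches the kernel's is_valid_ether_addr() (not multicast, not all
      zeros); struct vnic_adapter, the second state VNIC_OPEN, hw_set_mac()
      and set_mac() are hypothetical stand-ins for the driver's structures
      and for __ibmvnic_set_mac(), not the driver code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical adapter states; only VNIC_PROBED appears in the commit. */
enum vnic_state { VNIC_PROBED, VNIC_OPEN };

struct vnic_adapter {
	enum vnic_state state;
	uint8_t mac_addr[ETH_ALEN];
	int hw_set_calls;          /* how often the stubbed hw update ran */
};

/* Same rule as is_valid_ether_addr(): not multicast, not all zeros. */
static bool valid_ether_addr(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];
	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Stub standing in for __ibmvnic_set_mac(); always succeeds here. */
static int hw_set_mac(struct vnic_adapter *ad, const uint8_t *a)
{
	(void)a;
	ad->hw_set_calls++;
	return 0;
}

/* Fixed flow: reject bad addresses up front, then copy and program
 * the address only once the adapter has left VNIC_PROBED. */
int set_mac(struct vnic_adapter *ad, const uint8_t *addr)
{
	assert(ad != NULL && addr != NULL);
	if (!valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	if (ad->state != VNIC_PROBED) {
		memcpy(ad->mac_addr, addr, ETH_ALEN);
		return hw_set_mac(ad, addr);
	}
	return 0;
}
```

      With this shape, an invalid address is rejected in every state instead
      of being stored silently while the adapter is still in VNIC_PROBED.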
      
      Fixes: c26eba03 ("ibmvnic: Update reset infrastructure to support tunable parameters")
      Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
      Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ibmveth: Fix use of ibmveth in a bridge. · 4606d351
      Thomas Bogendoerfer authored
      [ Upstream commit 2ac8af0967aaa2b67cb382727e784900d2f4d0da ]
      
      The check for the source MAC address in ibmveth_is_packet_unsupported()
      is wrong. Commit 6f227543 wanted to silence messages for loopback
      packets, but it now also suppresses bridged frames. The hypervisor
      accepts such frames; otherwise bridging wouldn't work at all.
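      
      The difference can be sketched with two hypothetical predicates. This
      assumes, beyond what the text states, that after the fix only the
      destination-address (loopback) test remains; unsupported_old() and
      unsupported_new() are illustrative names, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical pre-fix shape: drop loopback frames (destination is our
 * own MAC) and also any frame whose source MAC is not ours. */
bool unsupported_old(const uint8_t *dst, const uint8_t *src,
		     const uint8_t *dev_addr)
{
	assert(dst && src && dev_addr);
	if (memcmp(dst, dev_addr, ETH_ALEN) == 0)
		return true;                          /* loopback */
	return memcmp(src, dev_addr, ETH_ALEN) != 0;  /* hits bridged frames */
}

/* Hypothetical post-fix shape: only loopback frames are rejected, so a
 * bridged frame carrying another host's source MAC is passed on. */
bool unsupported_new(const uint8_t *dst, const uint8_t *src,
		     const uint8_t *dev_addr)
{
	assert(dst && src && dev_addr);
	(void)src;
	return memcmp(dst, dev_addr, ETH_ALEN) == 0;
}
```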
      
      Fixes: 6f227543 ("ibmveth: Detect unsupported packets before sending to the hypervisor")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gtp: fix an use-before-init in gtp_newlink() · b520e574
      Masahiro Fujiwara authored
      [ Upstream commit 51467431200b91682b89d31317e35dcbca1469ce ]
      
      *_pdp_find() from gtp_encap_recv() would trigger a crash when a peer
      sends GTP packets while a new GTP device is being created.
      
      RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp]
      <SNIP>
      Call Trace:
       <IRQ>
       gtp_encap_recv+0xc2/0x2e0 [gtp]
       ? gtp1_pdp_find.isra.0+0x90/0x90 [gtp]
       udp_queue_rcv_one_skb+0x1fe/0x530
       udp_queue_rcv_skb+0x40/0x1b0
       udp_unicast_rcv_skb.isra.0+0x78/0x90
       __udp4_lib_rcv+0x5af/0xc70
       udp_rcv+0x1a/0x20
       ip_protocol_deliver_rcu+0xc5/0x1b0
       ip_local_deliver_finish+0x48/0x50
       ip_local_deliver+0xe5/0xf0
       ? ip_protocol_deliver_rcu+0x1b0/0x1b0
      
      gtp_encap_enable() should be called after gtp_hashtable_new(); otherwise
      *_pdp_find() will access the uninitialized hash table.
      
      Fixes: 1e3a3abd ("gtp: make GTP sockets in gtp_newlink optional")
      Signed-off-by: Masahiro Fujiwara <fujiwara.masahiro@gmail.com>
      Link: https://lore.kernel.org/r/20201027114846.3924-1-fujiwara.masahiro@gmail.com
      Signed-off-by: Ja...
    • cxgb4: set up filter action after rewrites · 9921e777
      Raju Rangoju authored
      [ Upstream commit 937d8420588421eaa5c7aa5c79b26b42abb288ef ]
      
      The current code sets up the filter action field before the
      rewrites are set up. When the action 'switch' is used
      with rewrites, this may result in the first few packets
      that get switched out not having the rewrites applied.
      
      So, make sure the filter action is set up along with the rewrites,
      or only after everything else for the rewrites has been set up.
      
      Fixes: 12b276fb ("cxgb4: add support to create hash filters")
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • chelsio/chtls: fix tls record info to user · b97638e0
      Vinay Kumar Yadav authored
      [ Upstream commit 4f3391ce8f5a69e7e6d66d0a3fc654eb6dbdc919 ]
      
      chtls_pt_recvmsg() receives an skb with a TLS header followed by a
      subsequent skb with data, and it needs to finalize the data copy
      whenever the next skb with a TLS header is available. Instead, the
      current TLS header is overwritten by the next available TLS header,
      which ends up corrupting user buffer data. Fix this by finalizing the
      current record whenever the next skb contains a TLS header.
      
      v1->v2:
      - Improved commit message.
      
      Fixes: 17a7d24a ("crypto: chtls - generic handling of data and hdr")
      Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
      Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • chelsio/chtls: fix memory leaks in CPL handlers · eb592f2a
      Vinay Kumar Yadav authored
      [ Upstream commit 6daa1da4e262b0cd52ef0acc1989ff22b5540264 ]
      
      CPL handler functions chtls_pass_open_rpl() and
      chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE
      so that the caller frees the skb, avoiding a leak.
      
      Fixes: cc35c88a ("crypto : chtls - CPL handler definition")
      Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
      Link: https://lore.kernel.org/r/20201025194228.31271-1-vinay.yadav@chelsio.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • chelsio/chtls: fix deadlock issue · c3208dec
      Vinay Kumar Yadav authored
      [ Upstream commit 28e9dcd9172028263c8225c15c4e329e08475e89 ]
      
      In chtls_pass_establish() we hold the child socket lock using
      bh_lock_sock(), and add_to_reap_list() then tries to take it again
      with bh_lock_sock(), causing a deadlock. Remove bh_lock_sock() from
      add_to_reap_list(), as the lock is already held.
      
      Fixes: cc35c88a ("crypto : chtls - CPL handler definition")
      Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
      Link: https://lore.kernel.org/r/20201025193538.31112-1-vinay.yadav@chelsio.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally. · b334112f
      Vasundhara Volam authored
      [ Upstream commit 825741b071722f1c8ad692cead562c4b5f5eaa93 ]
      
      In the AER or firmware reset flow, if we are in fatal error state or
      if pci_channel_offline() is true, we don't send any commands to the
      firmware because the commands will likely not reach the firmware and
      most commands don't matter much because the firmware is likely to be
      reset imminently.
      
      However, the HWRM_FUNC_RESET command is different and we should always
      attempt to send it.  In the AER flow for example, the .slot_reset()
      call will trigger this fw command and we need to try to send it to
      effect the proper reset.
      
      Fixes: b340dc680ed4 ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Re-write PCI BARs after PCI fatal error. · f739fc7e
      Vasundhara Volam authored
      [ Upstream commit f75d9a0aa96721d20011cd5f8c7a24eb32728589 ]
      
      When a PCIe fatal error occurs, the internal latched BAR addresses
      in the chip get reset even though the BAR register values in config
      space are retained.
      
      pci_restore_state() will not rewrite the BAR addresses if the
      BAR address values are valid, causing the chip's internal BAR addresses
      to stay invalid.  So we need to zero the BAR registers during PCIe fatal
      error to force pci_restore_state() to restore the BAR addresses.  These
      write cycles to the BAR registers will cause the proper BAR addresses to
      latch internally.
      
      Fixes: 6316ea6d ("bnxt_en: Enable AER support.")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Invoke cancel_delayed_work_sync() for PFs also. · 7fe9514c
      Vasundhara Volam authored
      [ Upstream commit 631ce27a3006fc0b732bfd589c6df505f62eadd9 ]
      
      As part of commit b148bb238c02
      ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
      cancel_delayed_work_sync() is called only for VFs to fix a possible
      crash by cancelling any pending delayed work items. It was assumed
      by mistake that the flush_workqueue() call on the PF would flush
      delayed work items as well.
      
      As flush_workqueue() does not cancel pending delayed work items,
      extend the fix to PFs. This avoids a system crash if there are any
      pending delayed work items in fw_reset_task() during the driver's
      .remove() call.
      
      Unify the workqueue cleanup logic for both PF and VF by calling
      cancel_work_sync() and cancel_delayed_work_sync() directly in
      bnxt_remove_one().
      
      Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one(). · bfbbfb50
      Vasundhara Volam authored
      [ Upstream commit 21d6a11e2cadfb8446265a3efff0e2aad206e15e ]
      
      A recent patch has moved the workqueue cleanup logic before
      calling unregister_netdev() in bnxt_remove_one().  This caused a
      regression because the workqueue can be restarted if the device is
      still open.  Workqueue cleanup must be done after unregister_netdev().
      The workqueue will not restart itself after the device is closed.
      
      Call bnxt_cancel_sp_work() after unregister_netdev() and
      call bnxt_dl_fw_reporters_destroy() after that.  This fixes the
      regression and the original NULL ptr dereference issue.
      
      Fixes: b16939b59cc0 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Check abort error state in bnxt_open_nic(). · 0b17de4d
      Michael Chan authored
      [ Upstream commit a1301f08c5acf992d9c1fafddc84c3a822844b04 ]
      
      bnxt_open_nic() is called during configuration changes that require
      the NIC to be closed and then opened.  This call is protected by
      rtnl_lock.  Firmware reset can be happening at the same time.  Only
      critical portions of the entire firmware reset sequence are protected
      by the rtnl_lock.  It is possible that bnxt_open_nic() can be called
      when the firmware reset sequence is aborting.  In that case,
      bnxt_open_nic() needs to check if the ABORT_ERR flag is set and
      abort if it is.  The configuration change that resulted in the
      bnxt_open_nic() call will fail but the NIC will be brought to a
      consistent IF_DOWN state.
      
      Without this patch, if bnxt_open_nic() were to continue in this error
      state, it may crash like this:
      
      [ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
      [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0
      [ 1648.659813] Oops: 0000 [#1] SMP
      [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common
      [ 1648.660063]  tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse
      [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G           OE  ------------   3.10.0-1152.el7.x86_64 #1
      [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020
      [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000
      [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>]  [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
      [ 1648.663171] RSP: 0018:ffff94f55df1fba8  EFLAGS: 00010202
      [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000
      [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0
      [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008
      [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0
      [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40
      [ 1648.667695] FS:  00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000
      [ 1648.668447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0
      [ 1648.669966] Call Trace:
      [ 1648.670730]  [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en]
      [ 1648.671496]  [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en]
      [ 1648.672263]  [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en]
      [ 1648.673031]  [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
      [ 1648.673793]  [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en]
      [ 1648.674550]  [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0
      [ 1648.675306]  [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0
      [ 1648.676061]  [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60
      [ 1648.676810]  [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0
      [ 1648.677548]  [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0
      [ 1648.678282]  [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500
      [ 1648.679016]  [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0
      [ 1648.679745]  [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a
      [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7
      [ 1648.681986] RIP  [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
      [ 1648.682724]  RSP <ffff94f55df1fba8>
      [ 1648.683451] CR2: 0000000000000000
      
      Fixes: ec5d31e3 ("bnxt_en: Handle firmware reset status during IF_UP.")
      Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • efivarfs: Replace invalid slashes with exclamation marks in dentries. · c328793e
      Michael Schaller authored
      commit 336af6a4686d885a067ecea8c3c3dd129ba4fc75 upstream.
      
      Without this patch, efivarfs_alloc_dentry() creates dentries with
      slashes in their names if the respective EFI variable has slashes in
      its name. This in turn causes EIO on getdents64, which prevents a
      complete directory listing of /sys/firmware/efi/efivars/.
      
      This patch replaces the invalid slashes with exclamation marks, as
      kobject_set_name_vargs() does for /sys/firmware/efi/vars/, so that
      dentries are named consistently under /sys/firmware/efi/vars/ and
      /sys/firmware/efi/efivars/.
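      
      The renaming rule itself is a single-character substitution. The
      helper below is a hypothetical userspace sketch in the spirit of the
      kernel's strreplace() string helper; it is not the efivarfs code.

```c
#include <assert.h>
#include <string.h>

/* Replace every `old` character in s with `new_ch`, in place, and
 * return s. Hypothetical helper; `old` must not be the terminator. */
char *replace_char(char *s, char old, char new_ch)
{
	char *p = s;

	assert(s != NULL && old != '\0');
	while ((p = strchr(p, old)) != NULL)  /* next remaining occurrence */
		*p++ = new_ch;
	return s;
}
```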
      
      Signed-off-by: Michael Schaller <misch@google.com>
      Link: https://lore.kernel.org/r/20200925074502.150448-1-misch@google.com
      
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: dann frazier <dann.frazier@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/copy_mc: Introduce copy_mc_enhanced_fast_string() · 61ececc8
      Dan Williams authored
      commit 5da8e4a658109e3b7e1f45ae672b7c06ac3e7158 upstream.
      
      The motivations to rework memcpy_mcsafe() are that the benefit of
      doing slow and careful copies is obviated on newer CPUs, and that the
      current opt-in list of CPUs to instrument recovery is broken relative to
      those CPUs.  There is no need to keep an opt-in list up to date on an
      ongoing basis if pmem/dax operations are instrumented for recovery by
      default. With recovery enabled by default, the old "mcsafe_key" opt-in
      to careful copying can be made a "fragile" opt-out, where the "fragile"
      list takes steps to not consume poison across cachelines.
      
      The discussion with Linus made clear that the current "_mcsafe" suffix
      was imprecise to a fault. The operations that are needed by pmem/dax are
      to copy from a source address that might throw #MC to a destination that
      may write-fault, if it is a user page.
      
      So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate
      the separate precautions taken on source and destination.
      copy_mc_to_kernel() is introduced as a non-SMAP version that does not
      expect write-faults on the destination, but is still prepared to abort
      with an error code upon taking #MC.
      
      The original copy_mc_fragile() implementation had negative performance
      implications since it did not use the fast-string instruction sequence
      to perform copies. For this reason copy_mc_to_kernel() fell back to
      plain memcpy() to preserve performance on platforms that did not indicate
      the capability to recover from machine check exceptions. However, that
      capability detection was not architectural, and now that some platforms
      can recover from fast-string consumption of memory errors, the memcpy()
      fallback causes these more capable platforms to fail.
      
      Introduce copy_mc_enhanced_fast_string() as the fast default
      implementation of copy_mc_to_kernel() and finalize the transition of
      copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'.
      With this in place, copy_mc_to_kernel() is fast and recovery-ready by
      default regardless of hardware capability.
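
      The resulting backend selection can be sketched as a pure decision
      function. It assumes the fast-string path is gated on the CPU's ERMS
      (enhanced REP MOVSB) feature and that plain memcpy() remains only as a
      last resort; struct model_platform and the BACKEND_* names are
      hypothetical stand-ins for the quirk flag and the feature check, and
      the copies themselves are not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the platform quirk and the CPU feature check. */
struct model_platform {
	bool fragile_quirk;	/* platform must "copy carefully" */
	bool has_erms;		/* CPU advertises enhanced REP MOVSB (assumed gate) */
};

enum model_backend {
	BACKEND_FRAGILE,	/* copy_mc_fragile(): opt-out quirk */
	BACKEND_FAST_STRING,	/* copy_mc_enhanced_fast_string(): default */
	BACKEND_PLAIN_MEMCPY,	/* assumed last resort when neither applies */
};

/* The fragile copy is a quirk, the fast-string copy is the default. */
static enum model_backend model_pick_backend(const struct model_platform *p)
{
	if (p->fragile_quirk)
		return BACKEND_FRAGILE;
	if (p->has_erms)
		return BACKEND_FAST_STRING;
	return BACKEND_PLAIN_MEMCPY;
}
```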
      
      Thanks to Vivek for identifying that copy_user_generic() is not suitable
      as the copy_mc_to_user() backend since the #MC handler explicitly checks
      ex_has_fault_handler(). Thanks to the 0day robot for catching a
      performance bug in the x86/copy_mc_to_user implementation.
      
       [ bp: Add the "why" for this change from the 0/2th message, massage. ]
      
      Fixes: 92b0729c ("x86/mm, x86/mce: Add memcpy_mcsafe()")
      Reported-by: Erwin Tsaur <erwin.tsaur@intel.com>
      Reported-by: 0day robot <lkp@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Tested-by: Erwin Tsaur <erwin.tsaur@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dan Williams's avatar
      x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() · a092869e
      Dan Williams authored
      commit ec6347bb43395cb92126788a1a5b25302543f815 upstream.
      
      In reaction to a proposal to introduce a memcpy_mcsafe_fast()
      implementation, Linus points out that memcpy_mcsafe() is poorly named
      relative to communicating the scope of the interface: specifically,
      which addresses are valid to pass as source and destination, and which
      faults / exceptions are handled.
      
      Of particular concern is that even though x86 might be able to handle
      the semantics of copy_mc_to_user() with its common copy_user_generic()
      implementation, other archs likely need / want an explicit path for this
      case:
      
        On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
        >
        > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
        > >
        > > However now I see that copy_user_generic() works for the wrong reason.
        > > It works because the exception on the source address due to poison
        > > looks no different than a write fault on the user address to the
        > > caller, it's still just a short copy. So it makes copy_to_user() work
        > > for the wrong reason relative to the name.
        >
        > Right.
        >
        > And it won't work that way on other architectures. On x86, we have a
        > generic function that can take faults on either side, and we use it
        > for both cases (and for the "in_user" case too), but that's an
        > artifact of the architecture oddity.
        >
        > In fact, it's probably wrong even on x86 - because it can hide bugs -
        > but writing those things is painful enough that everybody prefers
        > having just one function.
      
      Replace a single top-level memcpy_mcsafe() with either
      copy_mc_to_user(), or copy_mc_to_kernel().
      
      Introduce an x86 copy_mc_fragile() name as the rename for the
      low-level x86 implementation formerly named memcpy_mcsafe(). It is used
      as the slow / careful backend that is supplanted by a fast
      copy_mc_generic() in a follow-on patch.
      
      One side-effect of this reorganization is that separating copy_mc_64.S
      to its own file means that perf no longer needs to track dependencies
      for its memcpy_64.S benchmarks.
      
       [ bp: Massage a bit. ]
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <stable@vger.kernel.org>
      Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
      Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Randy Dunlap's avatar
      x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled · 18703f74
      Randy Dunlap authored
      commit 035fff1f7aab43e420e0098f0854470a5286fb83 upstream.
      
      Fix build error when CONFIG_ACPI is not set/enabled by adding the header
      file <asm/acpi.h> which contains a stub for the function in the build
      error.
      
          ../arch/x86/pci/intel_mid_pci.c: In function ‘intel_mid_pci_init’:
          ../arch/x86/pci/intel_mid_pci.c:303:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
            acpi_noirq_set();
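
      The fix relies on the usual configured-out stub pattern: when the
      feature is disabled, the header still provides the function as an
      empty static inline, so callers compile without #ifdefs of their own.
      In this sketch MODEL_CONFIG_ACPI, model_acpi_noirq_set() and
      model_intel_mid_pci_init() are hypothetical stand-ins for CONFIG_ACPI,
      acpi_noirq_set() and intel_mid_pci_init().

```c
/* MODEL_CONFIG_ACPI stands in for CONFIG_ACPI (left undefined here). */
#ifdef MODEL_CONFIG_ACPI
static int model_acpi_noirq;
static inline void model_acpi_noirq_set(void) { model_acpi_noirq = 1; }
#else
/* Configured out: an empty stub keeps callers compiling. */
static inline void model_acpi_noirq_set(void) { }
#endif

/* Caller modeled on intel_mid_pci_init(): no #ifdef needed here. */
static int model_intel_mid_pci_init(void)
{
	model_acpi_noirq_set();
	return 0;
}
```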
      
      Fixes: a912a758 ("x86/platform/intel-mid: Move PCI initialization to arch_init()")
      Link: https://lore.kernel.org/r/ea903917-e51b-4cc9-2680-bc1e36efa026@infradead.org
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Reviewed-by: Jesse Barnes <jsbarnes@google.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org	# v4.16+
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jesse Barnes <jsbarnes@google.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Nick Desaulniers's avatar
      arm64: link with -z norelro regardless of CONFIG_RELOCATABLE · 4b0a9591
      Nick Desaulniers authored
      commit 3b92fa7485eba16b05166fddf38ab42f2ff6ab95 upstream.
      
      With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n,
      CONFIG_RELOCATABLE=n, we observe the following failure when trying to
      link the kernel image with LD=ld.lld:
      
      error: section: .exit.data is not contiguous with other relro sections
      
      ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This
      was previously fixed, but only for CONFIG_RELOCATABLE=y.
      
      Fixes: 3bbd3db8 ("arm64: relocatable: fix inconsistencies in linker script and options")
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20201016175339.2429280-1-ndesaulniers@google.com
      
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marc Zyngier's avatar
      arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs · dfaa0f7d
      Marc Zyngier authored
      commit 39533e12063be7f55e3d6ae21ffe067799d542a4 upstream.
      
      Commit 606f8e7b ("arm64: capabilities: Use linear array for
      detection and verification") changed the way we deal with per-CPU errata
      by only calling the .matches() callback until one CPU is found to be
      affected. At this point, .matches() stops being called, and .cpu_enable()
      will be called on all CPUs.
      
      This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be
      mitigated.
      
      In order to address this, forcefully call the .matches() callback from a
      .cpu_enable() callback, which brings us back to the original behaviour.
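
      The failure mode and the fix can be simulated in user space. In this
      sketch all names are hypothetical, and .matches() is assumed to carry
      the per-CPU enabling side effect (as the commits describe for the
      ARCH_WORKAROUND code); the framework loop stops calling .matches() at
      the first affected CPU, and with the fix .cpu_enable() re-runs
      .matches() on every CPU.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU state: which CPUs are affected / mitigated. */
static bool model_affected[MODEL_NR_CPUS];
static bool model_mitigated[MODEL_NR_CPUS];

/* .matches(): detection, with the enabling side effect attached (assumed). */
static bool model_matches(int cpu)
{
	if (model_affected[cpu])
		model_mitigated[cpu] = true;
	return model_affected[cpu];
}

/* The fix: .cpu_enable() forcefully calls .matches() on its CPU. */
static void model_cpu_enable(int cpu)
{
	model_matches(cpu);
}

/*
 * Framework behaviour described above: stop calling .matches() once one
 * affected CPU is found, then call .cpu_enable() on all CPUs.
 */
static void model_detect_and_enable(bool with_fix)
{
	bool found = false;

	for (int cpu = 0; cpu < MODEL_NR_CPUS && !found; cpu++)
		found = model_matches(cpu);

	if (found && with_fix)
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			model_cpu_enable(cpu);
}
```

      Without the fix only the first affected CPU ends up mitigated; with it,
      every affected CPU is.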
      
      Fixes: 606f8e7b ("arm64: capabilities: Use linear array for detection and verification")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marc Zyngier's avatar
      arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs · 0ccd5c2c
      Marc Zyngier authored
      commit 18fce56134c987e5b4eceddafdbe4b00c07e2ae1 upstream.
      
      Commit 73f38166 ("arm64: Advertise mitigation of Spectre-v2, or lack
      thereof") changed the way we deal with ARCH_WORKAROUND_1, by moving most
      of the enabling code to the .matches() callback.
      
      This has the unfortunate effect that the workaround only gets enabled on
      the first affected CPU, and on no other.
      
      In order to address this, forcefully call the .matches() callback from a
      .cpu_enable() callback, which brings us back to the original behaviour.
      
      Fixes: 73f38166 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kees Cook's avatar
      fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum · 4720b25e
      Kees Cook authored
      commit 06e67b849ab910a49a629445f43edb074153d0eb upstream.
      
      The "FIRMWARE_EFI_EMBEDDED" enum is a "where", not a "what". It
      should not be distinguished separately from just "FIRMWARE", as this
      confuses the LSMs about what is being loaded. Additionally, there was
      no actual validation of the firmware contents happening.
      
      Fixes: e4c2c0ff ("firmware: Add new platform fallback mechanism and firmware_request_platform()")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Acked-by: Scott Branden <scott.branden@broadcom.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20201002173828.2099543-3-keescook@chromium.org
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ard Biesheuvel's avatar
      efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure · 8b23af0e
      Ard Biesheuvel authored
      commit d32de9130f6c79533508e2c7879f18997bfbe2a0 upstream.
      
      Currently, on arm64, we abort on any failure from efi_get_random_bytes()
      other than EFI_NOT_FOUND when it comes to setting the physical seed for
      KASLR, but ignore such failures when obtaining the seed for virtual
      KASLR or for early seeding of the kernel's entropy pool via the config
      table. This is inconsistent, and may lead to unexpected boot failures.
      
      So let's permit any failure for the physical seed, and simply report
      the error code if it does not equal EFI_NOT_FOUND.
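
      The new policy for the physical seed can be sketched as follows. The
      status enum and model_phys_seed() are hypothetical stand-ins for
      efi_status_t and the libstub code, and returning 0 to mean "no
      physical seed" is an assumption of this model; the point is that no
      status aborts the boot, and only statuses other than EFI_NOT_FOUND are
      reported as errors.

```c
#include <stdio.h>

/* Hypothetical status codes standing in for efi_status_t values. */
enum model_efi_status {
	MODEL_EFI_SUCCESS,
	MODEL_EFI_NOT_FOUND,	/* no EFI_RNG_PROTOCOL available */
	MODEL_EFI_DEVICE_ERROR,	/* any other failure */
};

/*
 * Returns the seed to use (0 = no physical seed, assumed) and sets
 * *reported when the failure was reported as an error. Never fatal.
 */
static unsigned long model_phys_seed(enum model_efi_status status,
				     unsigned long random_bytes,
				     int *reported)
{
	*reported = 0;
	if (status == MODEL_EFI_SUCCESS)
		return random_bytes;
	if (status != MODEL_EFI_NOT_FOUND) {
		fprintf(stderr, "efi_get_random_bytes() failed (%d)\n",
			(int)status);
		*reported = 1;
	}
	return 0;	/* keep booting without a physical seed */
}
```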
      
      Cc: <stable@vger.kernel.org> # v5.8+
      Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Rasmus Villemoes's avatar
      scripts/setlocalversion: make git describe output more reliable · 865013fc
      Rasmus Villemoes authored
      commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream.
      
      When building for an embedded target using Yocto, we're sometimes
      observing that the version string that gets built into vmlinux (and
      thus what uname -a reports) differs from the path under /lib/modules/
      where modules get installed in the rootfs, but only in the length of
      the -gabc123def suffix. Hence modprobe always fails.
      
      The problem is that Yocto has the concept of "sstate" (shared state),
      which allows different developers/buildbots/etc. to share build
      artifacts, based on a hash of all the metadata that went into building
      that artifact - and that metadata includes all dependencies (e.g. the
      compiler used etc.). That normally works quite well; usually a clean
      build (without using any sstate cache) done by one developer ends up
      being binary identical to a build done on another host. However, one
      thing that can cause two developers to end up with different builds
      [and thus make one's vmlinux package incompatible with the other's
      kernel-dev package], which is not captured by the metadata hashing, is
      this `git describe`: The output of that can be affected by
      
      (1) git version: before 2.11, git defaulted to a minimum of 7; since
      2.11 (git.git commit e6c587) the default is dynamic, based on the
      number of objects in the repo
      (2) hence even if both run the same git version, the output can differ
      based on how many remotes are being tracked (or just lots of local
      development branches or plain old garbage)
      (3) and of course somebody could have a core.abbrev config setting in
      ~/.gitconfig
      
      So in order to avoid `uname -a` output relying on such random details
      of the build environment which are rather hard to ensure are
      consistent between developers and buildbots, make sure the abbreviated
      sha1 always consists of exactly 12 hex characters. That is consistent
      with the current rule for -stable patches, and is almost always enough
      to identify the head commit unambiguously - in the few cases where it
      does not, the v5.4.3-00021- prefix would certainly nail it down.
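
      The change itself lives in a shell script; this C fragment only
      illustrates the invariant it enforces: the "-g<sha>" suffix is cut
      from the full commit id at exactly 12 hex characters, so git version,
      repository size and core.abbrev no longer matter.
      model_sha_suffix() is a hypothetical helper, not part of the kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "-g" plus exactly the first 12 characters of the full sha1.
 * Returns 0 on success, -1 if the id is shorter than 12 characters or
 * the output buffer cannot hold "-g" + 12 chars + NUL.
 */
static int model_sha_suffix(const char *full_sha, char *out, size_t outlen)
{
	if (strlen(full_sha) < 12 || outlen < sizeof("-g") + 12)
		return -1;
	snprintf(out, outlen, "-g%.12s", full_sha);
	return 0;
}
```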
      
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Matthew Wilcox (Oracle)'s avatar
      io_uring: Convert advanced XArray uses to the normal API · 6f4c9772
      Matthew Wilcox (Oracle) authored
      commit 5e2ed8c4f45093698855b1f45cdf43efbf6dd498 upstream.
      
      There are no bugs here that I've spotted, it's just easier to use the
      normal API and there are no performance advantages to using the more
      verbose advanced API.
      
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Matthew Wilcox (Oracle)'s avatar
      io_uring: Fix XArray usage in io_uring_add_task_file · f7b24bee
      Matthew Wilcox (Oracle) authored
      commit 236434c3438c4da3dfbd6aeeab807577b85e951a upstream.
      
      The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't
      allocate memory using GFP_NOWAIT, it would leak the reference to the file
      descriptor.  Also the node pointed to by the xas could be freed between
      the call to xas_load() under the rcu_read_lock() and the acquisition of
      the xa_lock.
      
      It's easier to just use the normal xa_load/xa_store interface here.
      
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      [axboe: fix missing assign after alloc, cur_uring -> tctx rename]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Matthew Wilcox (Oracle)'s avatar
      io_uring: Fix use of XArray in __io_uring_files_cancel · efce965a
      Matthew Wilcox (Oracle) authored
      commit ce765372bc443573d1d339a2bf4995de385dea3a upstream.
      
      We have to drop the lock during each iteration, so there's no advantage
      to using the advanced API.  Convert this to a standard xa_for_each() loop.
      
      Reported-by: syzbot+27c12725d8ff0bfe1a13@syzkaller.appspotmail.com
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jens Axboe's avatar
      io_uring: no need to call xa_destroy() on empty xarray · 5ee3fea0
      Jens Axboe authored
      commit ca6484cd308a671811bf39f3119e81966eb476e3 upstream.
      
      The kernel test robot reports this lockdep issue:
      
      [child1:659] mbind (274) returned ENOSYS, marking as inactive.
      [child1:659] mq_timedsend (279) returned ENOSYS, marking as inactive.
      [main] 10175 iterations. [F:7781 S:2344 HI:2397]
      [   24.610601]
      [   24.610743] ================================
      [   24.611083] WARNING: inconsistent lock state
      [   24.611437] 5.9.0-rc7-00017-g0f2122045b9462 #5 Not tainted
      [   24.611861] --------------------------------
      [   24.612193] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [   24.612660] ksoftirqd/0/7 [HC0[0]:SC1[3]:HE0:SE0] takes:
      [   24.613086] f00ed998 (&xa->xa_lock#4){+.?.}-{2:2}, at: xa_destroy+0x43/0xc1
      [   24.613642] {SOFTIRQ-ON-W} state was registered at:
      [   24.614024]   lock_acquire+0x20c/0x29b
      [   24.614341]   _raw_spin_lock+0x21/0x30
      [   24.614636]   io_uring_add_task_file+0xe8/0x13a
      [   24.614987]   io_uring_create+0x535/0x6bd
      [   24.615297]   io_uring_setup+0x11d/0x136
      [   24.615606]   __ia32_sys_io_uring_setup+0xd/0xf
      [   24.615977]   do_int80_syscall_32+0x53/0x6c
      [   24.616306]   restore_all_switch_stack+0x0/0xb1
      [   24.616677] irq event stamp: 939881
      [   24.616968] hardirqs last  enabled at (939880): [<8105592d>] __local_bh_enable_ip+0x13c/0x145
      [   24.617642] hardirqs last disabled at (939881): [<81b6ace3>] _raw_spin_lock_irqsave+0x1b/0x4e
      [   24.618321] softirqs last  enabled at (939738): [<81b6c7c8>] __do_softirq+0x3f0/0x45a
      [   24.618924] softirqs last disabled at (939743): [<81055741>] run_ksoftirqd+0x35/0x61
      [   24.619521]
      [   24.619521] other info that might help us debug this:
      [   24.620028]  Possible unsafe locking scenario:
      [   24.620028]
      [   24.620492]        CPU0
      [   24.620685]        ----
      [   24.620894]   lock(&xa->xa_lock#4);
      [   24.621168]   <Interrupt>
      [   24.621381]     lock(&xa->xa_lock#4);
      [   24.621695]
      [   24.621695]  *** DEADLOCK ***
      [   24.621695]
      [   24.622154] 1 lock held by ksoftirqd/0/7:
      [   24.622468]  #0: 823bfb94 (rcu_callback){....}-{0:0}, at: rcu_process_callbacks+0xc0/0x155
      [   24.623106]
      [   24.623106] stack backtrace:
      [   24.623454] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.9.0-rc7-00017-g0f2122045b9462 #5
      [   24.624090] Call Trace:
      [   24.624284]  ? show_stack+0x40/0x46
      [   24.624551]  dump_stack+0x1b/0x1d
      [   24.624809]  print_usage_bug+0x17a/0x185
      [   24.625142]  mark_lock+0x11d/0x1db
      [   24.625474]  ? print_shortest_lock_dependencies+0x121/0x121
      [   24.625905]  __lock_acquire+0x41e/0x7bf
      [   24.626206]  lock_acquire+0x20c/0x29b
      [   24.626517]  ? xa_destroy+0x43/0xc1
      [   24.626810]  ? lock_acquire+0x20c/0x29b
      [   24.627110]  _raw_spin_lock_irqsave+0x3e/0x4e
      [   24.627450]  ? xa_destroy+0x43/0xc1
      [   24.627725]  xa_destroy+0x43/0xc1
      [   24.627989]  __io_uring_free+0x57/0x71
      [   24.628286]  ? get_pid+0x22/0x22
      [   24.628544]  __put_task_struct+0xf2/0x163
      [   24.628865]  put_task_struct+0x1f/0x2a
      [   24.629161]  delayed_put_task_struct+0xe2/0xe9
      [   24.629509]  rcu_process_callbacks+0x128/0x155
      [   24.629860]  __do_softirq+0x1a3/0x45a
      [   24.630151]  run_ksoftirqd+0x35/0x61
      [   24.630443]  smpboot_thread_fn+0x304/0x31a
      [   24.630763]  kthread+0x124/0x139
      [   24.631016]  ? sort_range+0x18/0x18
      [   24.631290]  ? kthread_create_worker_on_cpu+0x17/0x17
      [   24.631682]  ret_from_fork+0x1c/0x28
      
      which is complaining about xa_destroy() grabbing the xa lock in an
      IRQ disabling fashion, whereas the io_uring use cases aren't interrupt
      safe. This is really an xarray issue, since it should not assume the
      lock type. But for our use case, since we know the xarray is empty at
      this point, there's no need to actually call xa_destroy(). So just get
      rid of it.
      
      Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hillf Danton's avatar
      io-wq: fix use-after-free in io_wq_worker_running · 0ca6ce23
      Hillf Danton authored
      commit c4068bf898ddaef791049a366828d9b84b467bda upstream.
      
      The smart syzbot has found a reproducer for the following issue:
      
       ==================================================================
       BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
       BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
       BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline]
       BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
       Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771
      
       CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x198/0x1fd lib/dump_stack.c:118
        print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
        __kasan_report mm/kasan/report.c:513 [inline]
        kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
        check_memory_region_inline mm/kasan/generic.c:186 [inline]
        check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
        instrument_atomic_write include/linux/instrumented.h:71 [inline]
        atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
        io_wqe_inc_running fs/io-wq.c:301 [inline]
        io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
        schedule_timeout+0x148/0x250 kernel/time/timer.c:1879
        io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580
        kthread+0x3b5/0x4a0 kernel/kthread.c:292
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
       Allocated by task 7768:
        kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
        kasan_set_track mm/kasan/common.c:56 [inline]
        __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
        kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594
        kmalloc_node include/linux/slab.h:572 [inline]
        kzalloc_node include/linux/slab.h:677 [inline]
        io_wq_create+0x57b/0xa10 fs/io-wq.c:1064
        io_init_wq_offload fs/io_uring.c:7432 [inline]
        io_sq_offload_start fs/io_uring.c:7504 [inline]
        io_uring_create fs/io_uring.c:8625 [inline]
        io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694
        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
       Freed by task 21:
        kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
        kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
        kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
        __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
        __cache_free mm/slab.c:3418 [inline]
        kfree+0x10e/0x2b0 mm/slab.c:3756
        __io_wq_destroy fs/io-wq.c:1138 [inline]
        io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146
        io_finish_async fs/io_uring.c:6836 [inline]
        io_ring_ctx_free fs/io_uring.c:7870 [inline]
        io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954
        process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
        worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
        kthread+0x3b5/0x4a0 kernel/kthread.c:292
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
       The buggy address belongs to the object at ffff8882183db000
        which belongs to the cache kmalloc-1k of size 1024
       The buggy address is located 140 bytes inside of
        1024-byte region [ffff8882183db000, ffff8882183db400)
       The buggy address belongs to the page:
       page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db
       flags: 0x57ffe0000000200(slab)
       raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700
       raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
        ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      which is down to the comment below,
      
      	/* all workers gone, wq exit can proceed */
      	if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
      		complete(&wqe->wq->done);
      
      because there might be multiple instances of wqe in a wq and we would wait
      for every worker in every wqe to go home before releasing wq's resources
      on destroying.
      
      To that end, rework wq's refcount by making it independent of the tracking
      of workers because after all they are two different things, and keeping
      it balanced when workers come and go. Note the manager kthread, like
      other workers, now holds a reference to wq during its lifetime.
      
      Finally, to help destroy wq, check IO_WQ_BIT_EXIT upon creating a worker
      and do nothing for exiting wq.
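
      The reworked lifetime rule can be sketched with C11 atomics. This is a
      user-space model, not the io-wq code: struct model_wq and its helpers
      are hypothetical, 'done' stands in for complete(&wq->done) and
      'exiting' for IO_WQ_BIT_EXIT. Each worker (and the manager) holds its
      own reference on the wq, independent of per-wqe worker counts, and
      whoever drops the last one signals completion.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct model_wq {
	atomic_int refs;	/* one per worker/manager, plus the creator */
	bool done;		/* stands in for complete(&wq->done) */
	bool exiting;		/* stands in for IO_WQ_BIT_EXIT */
};

static void model_wq_get(struct model_wq *wq)
{
	atomic_fetch_add(&wq->refs, 1);
}

static void model_wq_put(struct model_wq *wq)
{
	/* atomic_fetch_sub() returns the old value: 1 means we were last */
	if (atomic_fetch_sub(&wq->refs, 1) == 1)
		wq->done = true;
}

/* Creating a worker takes a reference, unless the wq is already exiting. */
static bool model_create_worker(struct model_wq *wq)
{
	if (wq->exiting)
		return false;
	model_wq_get(wq);
	return true;
}
```

      Because workers in different wqes all pin the same wq, the wq cannot
      be released while any of them is still running.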
      
      Cc: stable@vger.kernel.org # v5.5+
      Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com
      Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sebastian Andrzej Siewior's avatar
      io_wq: Make io_wqe::lock a raw_spinlock_t · 4863be65
      Sebastian Andrzej Siewior authored
      commit 95da84659226d75698a1ab958be0af21d9cc2a9c upstream.
      
      During a context switch the scheduler invokes wq_worker_sleeping() with
      disabled preemption. Disabling preemption is needed because it protects
      access to `worker->sleeping'. As an optimisation it avoids invoking
      schedule() within the schedule path as part of possible wake up (thus
      preempt_enable_no_resched() afterwards).
      
      The io-wq has been added to the mix in the same section with disabled
      preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping()
      acquires a spinlock_t. Also, within schedule() the spinlock_t must be
      acquired after tsk_is_pi_blocked(); otherwise it will block on the
      sleeping lock again while scheduling out.
      
      While playing with `io_uring-bench' I didn't notice a significant
      latency spike after converting io_wqe::lock to a raw_spinlock_t. The
      latency was more or less the same.
      
      In order to keep the spinlock_t it would have to be moved after the
      tsk_is_pi_blocked() check which would introduce a branch instruction
      into the hot path.
      
      The lock is used to maintain the `work_list' and wakes one task up at
      most. Should io_wqe_cancel_pending_work() cause latency spikes while
      searching for a specific item, it would need to drop the lock during
      iterations.
      revert_creds() is also invoked under the lock. According to debugging,
      cred::non_rcu is 0. Otherwise, revert_creds() would have to be moved
      outside of the locked section because put_cred_rcu()->free_uid()
      acquires a sleeping lock.
      
      Convert io_wqe::lock to a raw_spinlock_t.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jens Axboe's avatar
      io_uring: reference ->nsproxy for file table commands · b6a6d1df
      Jens Axboe authored
      commit 9b8284921513fc1ea57d87777283a59b05862f03 upstream.
      
      If we don't get and assign the namespace for the async work, then certain
      paths just don't work properly (like /dev/stdin, /proc/mounts, etc.).
      Anything that references the current namespace of the given task should
      be assigned for async work on behalf of that task.
      
      Cc: stable@vger.kernel.org # v5.5+
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jens Axboe's avatar
      io_uring: don't rely on weak ->files references · 511abcea
      Jens Axboe authored
      commit 0f2122045b946241a9e549c2a76cea54fa58a7ff upstream.
      
      Grab actual references to the files_struct. To avoid circular reference
      issues due to this, we add a per-task note that keeps track of what
      io_uring contexts a task has used. When the task execs or exits its
      assigned files, we cancel requests based on this tracking.
      
      With that, we can grab proper references to the files table, and no
      longer need to rely on stashing away ring_fd and ring_file to check
      if the ring_fd may have been closed.
      
      Cc: stable@vger.kernel.org # v5.5+
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jens Axboe's avatar
      io_uring: enable task/files specific overflow flushing · fdc84c9b
      Jens Axboe authored
      commit e6c8aa9ac33bd7c968af7816240fc081401fddcd upstream.
      
      This allows us to selectively flush out pending overflows, depending on
      the task and/or files_struct being passed in.
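
      The selective match can be sketched as a filter over overflowed
      requests. Everything here is hypothetical: struct model_req stands in
      for the request, and treating a NULL task or files argument as "match
      any" is an assumption suggested by the "and/or" wording, not a
      description of the actual io_uring code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record: the submitting task and its file table. */
struct model_req {
	const void *task;	/* stands in for struct task_struct * */
	const void *files;	/* stands in for struct files_struct * */
};

/* NULL filter = any (assumed); otherwise the request must match it. */
static bool model_overflow_match(const struct model_req *req,
				 const void *task, const void *files)
{
	if (task && req->task != task)
		return false;
	if (files && req->files != files)
		return false;
	return true;
}

/* Count how many of n overflowed requests a given flush would cover. */
static size_t model_count_flushable(const struct model_req *reqs, size_t n,
				    const void *task, const void *files)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (model_overflow_match(&reqs[i], task, files))
			count++;
	return count;
}
```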
      
      No intended functional changes in this patch.
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
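The selective flush from "io_uring: enable task/files specific overflow flushing" can be sketched as a filter over the overflow list. This assumes, as a simplification for illustration, that a NULL task or files argument means "match any" and that entries which do not match stay queued in their original order. `struct overflow_entry`, `struct overflow_list`, `entry_matches()` and `overflow_flush()` are hypothetical names, not the io_uring API.

```c
#include <stddef.h>

/* Hypothetical model of queued CQ overflow entries; not the kernel's types. */
struct overflow_entry {
	const void *task;  /* stands in for the submitting task */
	const void *files; /* stands in for the submitter's files_struct */
	int user_data;
};

struct overflow_list {
	struct overflow_entry e[32];
	size_t n;
};

/* A NULL filter matches every entry; otherwise the pointers must be equal. */
static int entry_matches(const struct overflow_entry *e,
			 const void *task, const void *files)
{
	if (task && e->task != task)
		return 0;
	if (files && e->files != files)
		return 0;
	return 1;
}

/*
 * Copy matching entries' user_data to @out (at most @cap of them), keep the
 * rest queued in order, and return how many were flushed.
 */
static size_t overflow_flush(struct overflow_list *l, const void *task,
			     const void *files, int *out, size_t cap)
{
	size_t kept = 0, flushed = 0;

	for (size_t i = 0; i < l->n; i++) {
		if (flushed < cap && entry_matches(&l->e[i], task, files))
			out[flushed++] = l->e[i].user_data;
		else
			l->e[kept++] = l->e[i];
	}
	l->n = kept;
	return flushed;
}
```

Passing NULL for both filters flushes everything, which matches the pre-patch behaviour the commit says it keeps ("no intended functional changes").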
    • io_uring: return cancelation status from poll/timeout/files handlers · 3de61f9b
      Jens Axboe authored

      commit 76e1b6427fd8246376a97e3227049d49188dfb9c upstream.
      
      Return whether we found and canceled requests or not. This is in
      preparation for using this information; no functional changes in this
      patch.

      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • io_uring: unconditionally grab req->task · f34e674f
      Jens Axboe authored

      commit e3bc8e9dad7f2f83cc807111d4472164c9210153 upstream.
      
      Sometimes we assign a weak reference to req->task, and sometimes we grab
      a real reference. Clean this up by always grabbing a reference, and drop
      the flag that tracked which case applied.

      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
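The effect of "io_uring: unconditionally grab req->task" can be sketched with a plain counter. In the kernel, task references are taken with get_task_struct() and dropped with put_task_struct(); the `struct task`, `struct request`, `task_get()`, `task_put()`, `req_alloc()` and `req_free()` below are hypothetical stand-ins. The sketch only shows the shape of the change: every request takes a reference when it is set up and drops it when it is freed, so no flag is needed to remember which case applies.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted task; not the kernel's task_struct. */
struct task {
	int refs;
};

struct request {
	struct task *task; /* always a counted reference, so no tracking flag */
	int user_data;
};

static void task_get(struct task *t) { t->refs++; }
static void task_put(struct task *t) { t->refs--; }

/* Every request takes its own reference to the submitting task. */
static struct request *req_alloc(struct task *submitter, int user_data)
{
	struct request *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	req->task = submitter;
	task_get(req->task);
	req->user_data = user_data;
	return req;
}

/* Every request drops that reference on free, with no state to check first. */
static void req_free(struct request *req)
{
	task_put(req->task);
	free(req);
}
```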
    • io_uring: stash ctx task reference for SQPOLL · bf030598
      Jens Axboe authored

      commit 2aede0e417db846793c276c7a1bbf7262c8349b0 upstream.
      
      We can grab a reference to the task instead of stashing away the task
      files_struct. This is doable without creating a circular reference
      between the ring fd and the task itself.

      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>