1. 15 Jul, 2021 2 commits
  2. 07 Apr, 2021 2 commits
  3. 23 Jan, 2021 1 commit
  4. 17 Dec, 2020 1 commit
  5. 13 Oct, 2020 3 commits
    • Darrick J. Wong's avatar
      xfs: annotate grabbing the realtime bitmap/summary locks in growfs · ace74e79
      Darrick J. Wong authored
      
      
      Use XFS_ILOCK_RT{BITMAP,SUM} to annotate grabbing the rt bitmap and
      summary locks when we grow the realtime volume, just like we do most
      everywhere else.  This shuts up lockdep warnings about grabbing the
      ILOCK class of locks recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.9.0-rc4-djw #rc4 Tainted: G           O
      --------------------------------------------
      xfs_growfs/4841 is trying to acquire lock:
      ffff888035acc230 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]
      
      but task is already holding lock:
      ffff888035acedb0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&xfs_nondir_ilock_class);
        lock(&xfs_nondir_ilock_class);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      ace74e79
    • Darrick J. Wong's avatar
      xfs: make xfs_growfs_rt update secondary superblocks · 7249c95a
      Darrick J. Wong authored
      
      
      When we call growfs on the data device, we update the secondary
      superblocks to reflect the updated filesystem geometry.  We need to do
      this for growfs on the realtime volume too, because a future xfs_repair
      run could try to fix the filesystem using a backup superblock.
      
      This was observed by the online superblock scrubbers while running
      xfs/233.  One can also trigger this by growing an rt volume, cycling the
      mount, and creating new rt files.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      7249c95a
    • Darrick J. Wong's avatar
      xfs: fix realtime bitmap/summary file truncation when growing rt volume · f4c32e87
      Darrick J. Wong authored
      
      
      The realtime bitmap and summary files are regular files that are hidden
      away from the directory tree.  Since they're regular files, inode
      inactivation will try to purge what it thinks are speculative
      preallocations beyond the incore size of the file.  Unfortunately,
      xfs_growfs_rt forgets to update the incore size when it resizes the
      inodes, with the result that inactivating the rt inodes at unmount time
      will cause their contents to be truncated.
      
      Fix this by updating the incore size when we change the ondisk size as
      part of updating the superblock.  Note that we don't do this when we're
      allocating blocks to the rt inodes because we actually want those blocks
      to get purged if the growfs fails.
      
      This fixes corruption complaints from the online rtsummary checker when
      running xfs/233.  Since that test requires rmap, one can also trigger
      this by growing an rt volume, cycling the mount, and creating rt files.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      f4c32e87
  6. 23 Sep, 2020 1 commit
    • Chandan Babu R's avatar
      xfs: Set xfs_buf's b_ops member when zeroing bitmap/summary files · c54e14d1
      Chandan Babu R authored
      
      
      In xfs_growfs_rt(), we enlarge bitmap and summary files by allocating
      new blocks for both files. For each of the new blocks allocated, we
      allocate an xfs_buf, zero the payload, log the contents and commit the
      transaction. Hence these buffers will eventually find themselves
      appended to list at xfs_ail->ail_buf_list.
      
      Later, xfs_growfs_rt() loops across all of the new blocks belonging to
      the bitmap inode to set the bitmap values to 1. In doing so, it
      allocates a new transaction and invokes the following sequence of
      functions,
        - xfs_rtfree_range()
          - xfs_rtmodify_range()
            - xfs_rtbuf_get()
              We pass '&xfs_rtbuf_ops' as the ops pointer to xfs_trans_read_buf().
              - xfs_trans_read_buf()
      	  We find the xfs_buf of interest in per-ag hash table, invoke
      	  xfs_buf_reverify() which ends up assigning '&xfs_rtbuf_ops' to
      	  xfs_buf->b_ops.
      
      On the other hand, if xfs_growfs_rt_alloc() had allocated a few blocks
      for the bitmap inode and returned with an error, all the xfs_bufs
      corresponding to the new bitmap blocks that have been allocated would
      continue to be on xfs_ail->ail_buf_list list without ever having a
      non-NULL value assigned to their b_ops members. An AIL flush operation
      would then trigger the following warning message to be printed on the
      console,
      
        XFS (loop0): _xfs_buf_ioapply: no buf ops on daddr 0x58 len 8
        00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        CPU: 3 PID: 449 Comm: xfsaild/loop0 Not tainted 5.8.0-rc4-chandan-00038-g4d8c2b9de9ab-dirty #37
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        Call Trace:
         dump_stack+0x57/0x70
         _xfs_buf_ioapply+0x37c/0x3b0
         ? xfs_rw_bdev+0x1e0/0x1e0
         ? xfs_buf_delwri_submit_buffers+0xd4/0x210
         __xfs_buf_submit+0x6d/0x1f0
         xfs_buf_delwri_submit_buffers+0xd4/0x210
         xfsaild+0x2c8/0x9e0
         ? __switch_to_asm+0x42/0x70
         ? xfs_trans_ail_cursor_first+0x80/0x80
         kthread+0xfe/0x140
         ? kthread_park+0x90/0x90
         ret_from_fork+0x22/0x30
      
      This message indicates that the xfs_buf had its b_ops member set to
      NULL.
      
      This commit fixes the issue by assigning "&xfs_rtbuf_ops" to b_ops
      member of each of the xfs_bufs logged by xfs_growfs_rt_alloc().
      Signed-off-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      c54e14d1
  7. 21 Sep, 2020 1 commit
    • Chandan Babu R's avatar
      xfs: Set xfs_buf type flag when growing summary/bitmap files · 72cc9513
      Chandan Babu R authored
      
      
      The following sequence of commands,
      
        mkfs.xfs -f -m reflink=0 -r rtdev=/dev/loop1,size=10M /dev/loop0
        mount -o rtdev=/dev/loop1 /dev/loop0 /mnt
        xfs_growfs  /mnt
      
      ... causes the following call trace to be printed on the console,
      
      XFS: Assertion failed: (bip->bli_flags & XFS_BLI_STALE) || (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF), file: fs/xfs/xfs_buf_item.c, line: 331
      Call Trace:
       xfs_buf_item_format+0x632/0x680
       ? kmem_alloc_large+0x29/0x90
       ? kmem_alloc+0x70/0x120
       ? xfs_log_commit_cil+0x132/0x940
       xfs_log_commit_cil+0x26f/0x940
       ? xfs_buf_item_init+0x1ad/0x240
       ? xfs_growfs_rt_alloc+0x1fc/0x280
       __xfs_trans_commit+0xac/0x370
       xfs_growfs_rt_alloc+0x1fc/0x280
       xfs_growfs_rt+0x1a0/0x5e0
       xfs_file_ioctl+0x3fd/0xc70
       ? selinux_file_ioctl+0x174/0x220
       ksys_ioctl+0x87/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x3e/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurs because the buffer being formatted has the value of
      XFS_BLFT_UNKNOWN_BUF assigned to the 'type' subfield of
      bip->bli_formats->blf_flags.
      
      This commit fixes the issue by assigning one of XFS_BLFT_RTSUMMARY_BUF
      and XFS_BLFT_RTBITMAP_BUF to the 'type' subfield of
      bip->bli_formats->blf_flags before committing the corresponding
      transaction.
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      72cc9513
  8. 16 Sep, 2020 2 commits
  9. 26 Jan, 2020 1 commit
  10. 24 Oct, 2019 1 commit
    • Brian Foster's avatar
      xfs: don't set bmapi total block req where minleft is · da781e64
      Brian Foster authored
      
      
      xfs_bmapi_write() takes a total block requirement parameter that is
      passed down to the block allocation code and is used to specify the
      total block requirement of the associated transaction. This is used
      to try and select an AG that can not only satisfy the requested
      extent allocation, but can also accommodate subsequent allocations
      that might be required to complete the transaction. For example,
      additional bmbt block allocations may be required on insertion of
      the resulting extent to an inode data fork.
      
      While it's important for callers to calculate and reserve such extra
      blocks in the transaction, it is not necessary to pass the total
      value to xfs_bmapi_write() in all cases. The latter automatically
      sets minleft to ensure that sufficient free blocks remain after the
      allocation attempt to expand the format of the associated inode
      (i.e., such as extent to btree conversion, btree splits, etc).
      Therefore, any callers that pass a total block requirement of the
      bmap mapping length plus worst case bmbt expansion essentially
      specify the additional reservation requirement twice. These callers
      can pass a total of zero to rely on the bmapi minleft policy.
      
      Beyond being superfluous, the primary motivation for this change is
      that the total reservation logic in the bmbt code is dubious in
      scenarios where minlen < maxlen and a maxlen extent cannot be
      allocated (which is more common for data extent allocations where
      contiguity is not required). The total value is based on maxlen in
      the xfs_bmapi_write() caller. If the bmbt code falls back to an
      allocation between minlen and maxlen, that allocation will not
      succeed until total is reset to minlen, which essentially throws
      away any additional reservation included in total by the caller. In
      addition, the total value is not reset until after alignment is
      dropped, which means that such callers drop alignment far too
      aggressively than necessary.
      
      Update all callers of xfs_bmapi_write() that pass a total block
      value of the mapping length plus bmbt reservation to instead pass
      zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
      requirement. This trades off slightly less conservative AG selection
      for the ability to preserve alignment in more scenarios.
      xfs_bmapi_write() callers that incorporate unrelated or additional
      reservations in total beyond what is already included in minleft
      must continue to use the former.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      da781e64
  11. 26 Aug, 2019 1 commit
  12. 29 Jun, 2019 1 commit
  13. 22 Dec, 2018 1 commit
  14. 13 Dec, 2018 1 commit
  15. 12 Dec, 2018 1 commit
    • Omar Sandoval's avatar
      xfs: cache minimum realtime summary level · 355e3532
      Omar Sandoval authored
      
      
      The realtime summary is a two-dimensional array on disk, effectively:
      
      u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]
      
      rsum[log][bbno] is the number of extents of size 2**log which start in
      bitmap block bbno.
      
      xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
      rsum[log][bbno] != 0 for any log level. However, the summary array is
      stored in row-major order (i.e., like an array in C), so all of these
      entries are not adjacent, but rather spread across the entire summary
      file. In the worst case (a full bitmap block), xfs_rtany_summary() has
      to check every level.
      
      This means that on a moderately-used realtime device, an allocation will
      waste a lot of time finding, reading, and releasing buffers for the
      realtime summary. In particular, one of our storage services (which runs
      on servers with 8 very slow CPUs and 15 8 TB XFS realtime filesystems)
      spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
      xfs_trans_brelse() called from xfs_rtany_summary().
      
      One solution would be to also store the summary with the dimensions
      swapped. However, this would require a disk format change to a very old
      component of XFS.
      
      Instead, we can cache the minimum size which contains any extents. We do
      so lazily; rather than guaranteeing that the cache contains the precise
      minimum, it always contains a loose lower bound which we tighten when we
      read or update a summary block. This only uses a few kilobytes of memory
      and is already serialized via the realtime bitmap and summary inode
      locks, so the cost is minimal. With this change, the same workload only
      spends 0.2% of its CPU cycles in the realtime allocator.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      355e3532
  16. 26 Jul, 2018 2 commits
  17. 12 Jul, 2018 6 commits
  18. 08 Jun, 2018 1 commit
  19. 06 Jun, 2018 1 commit
    • Dave Chinner's avatar
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner authored
      
      
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  20. 01 Sep, 2017 1 commit
  21. 19 Jun, 2017 1 commit
    • Darrick J. Wong's avatar
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong authored
      
      
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      c8ce540d
  22. 18 Feb, 2017 1 commit
  23. 03 Aug, 2016 3 commits
  24. 05 Apr, 2016 1 commit
    • Christoph Hellwig's avatar
      xfs: better xfs_trans_alloc interface · 253f4911
      Christoph Hellwig authored
      
      
      Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
      that returns a transaction with all the required log and block reservations,
      and which allows passing transaction flags directly to avoid the cumbersome
      _xfs_trans_alloc interface.
      
      While we're at it we also get rid of the transaction type argument that has
      been superflous since we stopped supporting the non-CIL logging mode.  The
      guts of it will be removed in another patch.
      
      [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      253f4911
  25. 09 Feb, 2016 1 commit
  26. 11 Jan, 2016 1 commit
    • Eric Sandeen's avatar
      xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Eric Sandeen authored
      
      
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      but checking whether (*tpp != tp).
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      f6106efa
  27. 19 Aug, 2015 1 commit