1. 28 Aug, 2019 2 commits
  2. 27 Aug, 2019 1 commit
  3. 22 Aug, 2019 2 commits
  4. 20 Aug, 2019 2 commits
  5. 13 Aug, 2019 1 commit
  6. 12 Aug, 2019 1 commit
    • osd: copy systemd-device-to-id.sh on all osd nodes before running it · 81906344
      Guillaume Abrioux authored
      Otherwise it will fail when running the rolling_update.yml playbook, because
      of the `serial: 1` usage.
      The task which copies the script runs only against the current node being
      played, whereas the task which runs the script runs against all nodes in a
      loop, so it ends up with the typical error:
      2019-08-08 17:47:05,115 p=14905 u=ubuntu |  failed: [magna023 -> magna030] (item=magna030) => {
          "changed": true,
          "delta": "0:00:00.004339",
          "end": "2019-08-08 17:46:59.059670",
          "invocation": {
              "module_args": {
                  "_raw_params": "/usr/bin/env bash /tmp/systemd-device-to-id.sh",
                  "_uses_shell": false,
                  "argv": null,
                  "chdir": null,
                  "creates": null,
                  "executable": null,
                  "removes": null,
                  "stdin": null,
                  "warn": true
              }
          },
          "item": "magna030",
          "msg": "non-zero return code",
          "rc": 127,
          "start": "2019-08-08 17:46:59.055331",
          "stderr": "bash: /tmp/systemd-device-to-id.sh: No such file or directory",
          "stderr_lines": [
              "bash: /tmp/systemd-device-to-id.sh: No such file or directory"
          ],
          "stdout": "",
          "stdout_lines": []
      }
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1739209
      Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
  7. 08 Aug, 2019 1 commit
  8. 07 Aug, 2019 1 commit
  9. 06 Aug, 2019 1 commit
    • shrink-osd: Stop ceph-disk container based on ID · 343eec7a
      Dimitri Savineau authored
      Since bedc0ab6 we now manage ceph-osd systemd unit scripts based on the
      OSD ID instead of the device name, but this was not done in the shrink-osd
      playbook (ceph-disk version).
      To keep backward compatibility with deployments that have not yet done the
      transition to the OSD ID, we should stop the unit scripts for both the
      device and the ID.
      This commit adds the ulimit nofile container option to get better
      performance on ceph-disk commands.
      It also fixes an issue when an OSD ID is a prefix of other OSD IDs
      (e.g. osd.1 also matching osd.12):
      $ ceph-disk list | grep osd.1
       /dev/sdb1 ceph data, prepared, cluster ceph, osd.1, block /dev/sdb2
       /dev/sdg1 ceph data, prepared, cluster ceph, osd.12, block /dev/sdg2
      Finally, it removes the shrunk OSD's directory.
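      The prefix collision above can be reproduced with plain grep; keeping the delimiter that follows the ID (a sketch, not the playbook's actual code) restricts the match to the exact OSD:

      ```shell
      list='/dev/sdb1 ceph data, prepared, cluster ceph, osd.1, block /dev/sdb2
      /dev/sdg1 ceph data, prepared, cluster ceph, osd.12, block /dev/sdg2'

      # Unanchored pattern matches both osd.1 and osd.12:
      printf '%s\n' "$list" | grep -c 'osd\.1'    # -> 2
      # Including the trailing comma matches only osd.1:
      printf '%s\n' "$list" | grep -c 'osd\.1,'   # -> 1
      ```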
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
  10. 01 Aug, 2019 1 commit
  11. 31 Jul, 2019 1 commit
    • ceph-osd: check container engine rc for pools · 4dffcfb4
      Dimitri Savineau authored
      When creating OpenStack pools, we only check whether the return code from
      the pool list command is non-zero (i.e. the pool doesn't exist, in which
      case the return code is 2). That's why the next condition for the pool
      creation is rc != 0.
      But in a containerized deployment the return code can differ when the
      container engine command itself fails (e.g. the container isn't running):
      it can be either 1 (docker) or 125 (podman), so we should fail at this
      point and not in the next tasks.
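      A minimal sketch of the intended guard (the task, variable, and command names are illustrative, not the playbook's actual code): treat rc 2 as "pool missing" and any other non-zero rc as a container engine failure:

      ```yaml
      - name: check if openstack pools already exist
        command: "{{ container_exec_cmd | default('') }} ceph osd pool get {{ item.name }} size"
        register: pool_exists
        changed_when: false
        # rc 0: pool exists; rc 2: pool doesn't exist (created by a later task).
        # rc 1 (docker) or 125 (podman): the container engine itself failed,
        # so fail here instead of in the following tasks.
        failed_when: pool_exists.rc not in [0, 2]
        loop: "{{ openstack_pools }}"
      ```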
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit d549fffd)
  12. 30 Jul, 2019 1 commit
    • tests: Update ooo-collocation scenario · bf8bd4c0
      Dimitri Savineau authored
      The ooo-collocation scenario was still using an old container image and
      doesn't match the requirements of the latest stable-3.2 code. We need to
      use at least the container image v3.2.5.
      Also update the OSD tests to reflect the changes introduced by commit
      bedc0ab6, because the OSD systemd unit script no longer uses the device
      name.
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
  13. 26 Jul, 2019 2 commits
    • Remove NBSP characters · 5463d730
      Dimitri Savineau authored
      Some NBSP characters are still present in the YAML files.
      This also adds a check for them in the Travis CI.
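      A check of this kind can be sketched with GNU grep (the exact Travis command may differ); it looks for the UTF-8 encoding of U+00A0, the byte sequence 0xC2 0xA0:

      ```shell
      # Fail when any YAML file contains a no-break space (UTF-8 0xC2 0xA0).
      if grep -rPl '\xc2\xa0' --include='*.yml' .; then
        echo "NBSP characters found" >&2
        exit 1
      fi
      ```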
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit 07c6695d)
    • ceph-osd: use OSD id with systemd ceph-disk · bedc0ab6
      Dimitri Savineau authored
      When using containerized deployment we have to create the systemd
      service unit based on a template.
      The current implementation with ceph-disk uses the device name as the
      parameter for the systemd service and for the container name too:
      $ systemctl start ceph-osd@sdb
      $ docker ps --filter 'name=ceph-osd-*'
      CONTAINER ID IMAGE                        NAMES
      065530d0a27f ceph/daemon:latest-luminous  ceph-osd-strg0-sdb
      This is the only scenario (compared to non containerized or
      ceph-volume based deployment) that isn't using the OSD id.
      $ systemctl start ceph-osd@0
      $ docker ps --filter 'name=ceph-osd-*'
      CONTAINER ID IMAGE                        NAMES
      d34552ec157e ceph/daemon:latest-luminous  ceph-osd-0
      Also, if the device mapping doesn't persist across a system reboot
      (e.g. sdb might be remapped to sde), then the OSD service won't come back
      after the reboot.
      This patch makes it possible to use the OSD ID with the ceph-osd systemd
      service, but it requires activating the OSD manually with ceph-disk first
      so that the ID is assigned to that OSD.
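      The mapping from a ceph-disk listing line to the OSD ID can be sketched like this (the sed parsing is illustrative, not the role's actual code):

      ```shell
      # Given a `ceph-disk list` data line, pull out the numeric OSD ID so the
      # ID-based unit (ceph-osd@<id>) can be started instead of a device-named one.
      line='/dev/sdb1 ceph data, prepared, cluster ceph, osd.12, block /dev/sdb2'
      osd_id=$(printf '%s\n' "$line" | sed -n 's/.*osd\.\([0-9][0-9]*\),.*/\1/p')
      echo "$osd_id"   # -> 12
      # systemctl start "ceph-osd@${osd_id}"
      ```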
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670734
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
  14. 12 Jul, 2019 1 commit
  15. 10 Jul, 2019 2 commits
    • Install nfs-ganesha stable v2.7 · 9097f984
      Ramana Raja authored
      nfs-ganesha v2.5 and v2.6 have hit EOL. Install nfs-ganesha v2.7
      stable, which is currently being maintained.
      Signed-off-by: Ramana Raja <rraja@redhat.com>
      (cherry picked from commit dfff89ce)
    • validate: improve message printed in check_devices.yml · 1716eea5
      Guillaume Abrioux authored
      The message prints the whole content of the registered variable in the
      playbook; this is not needed and makes the message unclear and hard to
      read:
      "msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023
      (cherry picked from commit e6dc3ebd)
      Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
  16. 09 Jul, 2019 2 commits
  17. 08 Jul, 2019 1 commit
    • ceph-handler: Fix rgw socket in restart script · 94cdef27
      Dimitri Savineau authored
      If the SOCKET variable isn't defined in the script then the test
      command won't fail, because the return code is 0:
      $ test -S
      $ echo $?
      0
      There are multiple issues in that script:
        - The default SOCKET value isn't defined.
        - The wget parameters need updating because the command is run in a
      loop; we now use the same options as curl.
        - The check_rest function doesn't test the radosgw at all due to
      a wrong test command (a test against a string, which always returns 0),
      and it needs to use the DOCKER_EXEC variable in order to execute the
      command:
      $ test 'wget'
      $ echo $?
      0
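      The root cause is a shell subtlety: `test` with a single argument only checks that the argument is a non-empty string, so both broken forms exit 0:

      ```shell
      test -S;     echo $?   # -> 0  ("-S" alone is just a non-empty string)
      test 'wget'; echo $?   # -> 0  (any non-empty string is "true")
      # With an operand, -S actually checks for a socket file:
      test -S /nonexistent; echo $?   # -> 1
      ```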
      Resolves: #3926
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit c90f605b)
  18. 07 Jul, 2019 1 commit
    • ceph-handler: Fix radosgw_address default value · 9cc5d1e9
      Dimitri Savineau authored
      The rgw restart script sets the RGW_IP variable depending on the ansible
      variables:
        - radosgw_address
        - radosgw_address_block
        - radosgw_interface
      Those variables have default values defined in the ceph-defaults role:
      radosgw_interface: interface
      radosgw_address_block: subnet
      But in the rgw restart script we always use the radosgw_address value
      instead of radosgw_interface when the latter is defined, because we
      aren't testing the right default value.
      As a consequence, the RGW_IP variable is set to the radosgw_address
      default even if the IP address associated with the radosgw_interface
      variable is set correctly. This causes the check_rest function to fail.
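      The intended precedence can be sketched as follows; the function name and the placeholder defaults ("address", "interface") are assumptions for illustration, not the script's actual code:

      ```shell
      # Pick RGW_IP from the first variable changed from its placeholder default.
      pick_rgw_ip() {
        if [ -n "${RADOSGW_ADDRESS}" ] && [ "${RADOSGW_ADDRESS}" != "address" ]; then
          echo "${RADOSGW_ADDRESS}"
        elif [ -n "${RADOSGW_INTERFACE}" ] && [ "${RADOSGW_INTERFACE}" != "interface" ]; then
          # Resolve the interface's first IPv4 address (iproute2 assumed).
          ip -4 -o addr show "${RADOSGW_INTERFACE}" | awk '{print $4}' | cut -d/ -f1 | head -1
        fi
      }
      ```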
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
  19. 25 Jun, 2019 1 commit
  20. 24 Jun, 2019 5 commits
  21. 21 Jun, 2019 2 commits
    • ceph-handler: Fix OSD restart script · 2b492e3d
      Dimitri Savineau authored
      There are two big issues with the current OSD restart script.
      1/ We try to test if the ceph osd daemon socket exists, but we use a
      wildcard for the socket name: /var/run/ceph/*.asok.
      This fails because we usually have multiple ceph osd sockets (or
      other collocated ceph daemons) present in the /var/run/ceph directory.
      Currently the test fails with:
      bash: line xxx: [: too many arguments
      But it doesn't stop the script execution.
      Instead we can specify the full ceph osd socket name, because we
      already know the OSD id.
      2/ The container filter pattern is wrong and can match multiple
      containers, causing the script to fail.
      We use the filter with two different patterns: one with the device
      name (sda, sdb, ...) and the other with the OSD id (ceph-osd-0,
      ceph-osd-15, ...).
      In both cases we can match more than needed.
      $ docker container ls
      CONTAINER ID IMAGE              NAMES
      958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
      589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
      46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
      877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
      $ docker container ls -q -f "name=sda"
      $ docker container ls
      CONTAINER ID IMAGE              NAMES
      2db399b3ee85 ceph-daemon:latest ceph-osd-5
      099dc13f08f1 ceph-daemon:latest ceph-osd-13
      5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
      d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
      $ docker container ls -q -f "name=ceph-osd-1"
      Adding an extra '$' character at the end of the pattern solves the issue.
      Finally, remove the get_container_osd_id function because it's not
      used in the script at all.
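      The name filter behaves like an unanchored match, so the effect of the trailing '$' can be reproduced with grep on the container names above (a sketch, not the actual script):

      ```shell
      names='ceph-osd-5
      ceph-osd-13
      ceph-osd-17
      ceph-osd-1'
      # Unanchored: "ceph-osd-1" also matches ceph-osd-13 and ceph-osd-17.
      printf '%s\n' "$names" | grep -c 'ceph-osd-1'    # -> 3
      # Anchored at the end of the name: exactly one match.
      printf '%s\n' "$names" | grep -c 'ceph-osd-1$'   # -> 1
      ```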
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit 45d46541)
    • ceph-volume: Set max open files limit on container · f4212b20
      Dimitri Savineau authored
      The ceph-volume lvm list command takes ages to complete when there are
      a lot of LV devices on a containerized deployment.
      For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
      devices.
      Adding a max open files limit to the container engine cli when
      executing the ceph-volume command improves the execution time a lot
      (~30s).
      This was impacting OSD creation with ceph-volume (both filestore
      and bluestore) when using multiple LV devices.
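      Container engines expose this via the --ulimit flag; the image name and limit values below are illustrative, not the role's actual command line. The effect on the soft limit can also be checked from a plain shell:

      ```shell
      # Illustrative container invocation with a lowered open-files limit:
      #   docker run --rm --ulimit nofile=1024:4096 ceph/daemon ceph-volume lvm list
      # The same soft-limit change, demonstrated locally:
      bash -c 'ulimit -Sn 1024; ulimit -Sn'   # prints 1024
      ```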
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit b9875348)
  22. 19 Jun, 2019 1 commit
  23. 18 Jun, 2019 2 commits
  24. 17 Jun, 2019 2 commits
    • remove ceph-agent role and references · 81de8a81
      Dimitri Savineau authored
      The ceph-agent role was used only for RHCS 2 (jewel), so it's not
      useful anymore.
      The current code will fail on the CentOS distribution because the rhscon
      package is only available on Red Hat with the RHCS 2 repository, and
      that Ceph release is supported on the stable-3.0 branch.
      Resolves: #4020
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit 7503098c)
    • tests: Update ansible ssh_args variable · ed9b594b
      Dimitri Savineau authored
      Because we're using vagrant, an ssh config file is created for
      each node with options like user, host, port, identity, etc.
      But via tox we override ANSIBLE_SSH_ARGS to use this file, which
      removes the default value set in ansible.cfg.
      Also add PreferredAuthentications=publickey, because CentOS/RHEL
      servers are configured with GSSAPIAuthentication enabled for the ssh
      server, forcing the client to make a PTR DNS query.
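      The override can be sketched as an environment setting in the tox environment (the exact value and file path in tox.ini may differ):

      ```shell
      # Point Ansible at the vagrant-generated ssh config and force publickey
      # auth, avoiding the GSSAPI-triggered PTR DNS lookup on CentOS/RHEL.
      export ANSIBLE_SSH_ARGS="-F vagrant_ssh_config -o PreferredAuthentications=publickey"
      echo "$ANSIBLE_SSH_ARGS"
      ```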
      Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
      (cherry picked from commit 34f9d511)
  25. 13 Jun, 2019 1 commit
  26. 12 Jun, 2019 1 commit
  27. 10 Jun, 2019 1 commit