- 10 Sep, 2019 3 commits
Dimitri Savineau authored
In containerized deployments, the restart OSD handler couldn't be triggered in most Ansible executions. This is due to the combination of run_once and a condition comparing the inventory hostname with the last host of the osd group (via the last filter). Because run_once is applied first, Ansible picks one node in the osd group to execute the restart task; if that node isn't the last one in the osd group, the task is skipped. The task is therefore more likely to be skipped than executed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 5b1c1565)
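A minimal bash model of the guard described above, for illustration only. It assumes the handler condition was roughly `inventory_hostname == groups[osd_group_name] | last`; `handler_fires` is a hypothetical name, and the host that run_once picks is passed in explicitly rather than chosen by Ansible.

```bash
#!/usr/bin/env bash
# Hypothetical model of the old handler guard:
#   run_once: true  +  when: inventory_hostname == groups[osd_group_name] | last
# $1 is the host that run_once selected; the remaining arguments are the osd group.
handler_fires() {
  local picked=$1
  shift
  local last=${@: -1}          # last host of the osd group
  [ "$picked" = "$last" ]      # the restart runs only if both are the same host
}

osds=(osd0 osd1 osd2)
for picked in "${osds[@]}"; do
  if handler_fires "$picked" "${osds[@]}"; then
    echo "run_once on $picked: restart"
  else
    echo "run_once on $picked: skipped"
  fi
done
```

With N OSD hosts, only one of the N possible run_once picks satisfies the condition, which is why dropping the last-host check makes the handler fire reliably.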
Dimitri Savineau authored
The ceph-rbd-mirror role allows copying the admin keyring via the copy_admin_key variable, but no task in that role actually does the job.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 1f505628)
Dimitri Savineau authored
The admin keyring isn't present by default on the rbd mirror nodes, so the rbd commands related to the mirroring configuration will fail. Instead, we can use the rbd mirror client keyring.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit a3d36df0)
- 09 Sep, 2019 2 commits
Giulio Fidente authored
Ganesha cannot be operated active/active; in deployments where it is managed by Pacemaker, the container name can differ from the default. This change uses "ceph_nfs_service_suffix" where it was previously missing, to ensure tasks work with customized names. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005
Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit d2a2bd7c)
Dimitri Savineau authored
The rbd mirror configuration was only available for non-containerized deployments and was also incomplete. We now enable mirroring on the pool and add the remote peer in both scenarios. The default mirroring mode is set to 'pool' but can be configured via the ceph_rbd_mirror_mode variable. This commit also fixes an issue with the rbd mirror command when the Ceph cluster name isn't the default value (ceph), caused by a missing --cluster parameter. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 7e5e2174)
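A dry-run sketch of the two mirroring steps, assuming the standard `rbd mirror pool enable` and `rbd mirror pool peer add` subcommands. It also shows the two fixes from these commits: an explicit `--cluster` for non-default cluster names and authenticating with the rbd mirror client instead of the admin keyring. The helper name, cluster names, pool and client names are all hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the rbd commands instead of running them.
#   $1 cluster name, $2 pool, $3 mirroring mode (pool|image),
#   $4 local client used for auth, $5 remote peer (client@remote-cluster)
rbd_mirror_cmds() {
  local cluster=$1 pool=$2 mode=$3 client=$4 peer=$5
  # --cluster is always passed so a cluster not named "ceph" works too;
  # --name selects the rbd mirror client keyring rather than client.admin.
  echo "rbd --cluster $cluster --name $client mirror pool enable $pool $mode"
  echo "rbd --cluster $cluster --name $client mirror pool peer add $pool $peer"
}

rbd_mirror_cmds site-a rbd pool client.rbd-mirror.node1 client.rbd-mirror-peer@site-b
```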
- 30 Aug, 2019 2 commits
Dimitri Savineau authored
5b29144b changed the mgr node to a dedicated node instead of the first monitor node, but didn't update the switch-to-containers inventory, which caused this playbook to fail. Also update the ubuntu inventory to have the same configuration.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Dimitri Savineau authored
There is no reason not to apply firewall rules on the host when using a containerized deployment. The TripleO environments already manage the Ceph firewall rules outside ceph-ansible and set the configure_firewall variable to false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 771f25b1)
- 28 Aug, 2019 2 commits
Dimitri Savineau authored
Like the OpenStack keyrings, we can use the rbd profile for the client keyrings (for both mon and osd caps).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 49aa05b9)
Dimitri Savineau authored
This reverts commit 2d955757. "osd blacklist" isn't an OSD cap; it should be used with mon caps, and the correct cap for it is 'allow command "osd blacklist"'. The current change breaks the OpenStack and client keyrings. Since the rbd profile is already used, we already rely on it for the ability to blacklist dead clients. Resolves: #4385
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 717af834)
- 27 Aug, 2019 1 commit
Dimitri Savineau authored
On containerized deployments, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate), which perform badly without the ulimit option. This option was added to all previous ceph-volume commands but not to the ceph-osd container startup. Also update the hard limit value to 4096 to reflect the default bare-metal value. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1744390
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 9a4ac46d)
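A dry-run sketch of where the open-files limit goes on the container engine command line. Both docker and podman accept `--ulimit nofile=<soft>:<hard>`; the 4096 hard limit comes from this commit, while the 1024 soft limit, the image reference and the helper name are assumptions. The volume mounts a real ceph-volume container needs are omitted.

```bash
#!/usr/bin/env bash
# Container engine binary: docker or podman (both accept --ulimit).
CONTAINER_BINARY=${CONTAINER_BINARY:-docker}
# soft:hard open-files limit; 1024 (soft) is an assumption, 4096 (hard) is from this commit.
ULIMIT_NOFILE="1024:4096"
# Placeholder image reference.
CEPH_IMAGE="ceph/daemon:latest"

# Hypothetical dry-run helper: prints the command instead of executing it.
# Volume mounts (/dev, /etc/ceph, /var/lib/ceph, ...) are left out of this sketch.
ceph_volume_cmd() {
  echo "$CONTAINER_BINARY run --rm --privileged --net=host" \
       "--ulimit nofile=$ULIMIT_NOFILE" \
       "--entrypoint=ceph-volume $CEPH_IMAGE $*"
}

ceph_volume_cmd lvm list
```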
- 22 Aug, 2019 2 commits
Guillaume Abrioux authored
This can't be backported from master since there were too many modifications in the meantime. When not all mgr daemons are ready, the following error can sometimes show up:
```
stderr: 'Error ENOENT: all mgr daemons do not support module ''status'', pass --force to force enablement'
```
This commit adds a check so that all mgr daemons are available when we try to enable modules.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
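The check boils down to "poll until every mgr is up, then enable modules". Below is a generic retry sketch in bash. The readiness check is a hypothetical stand-in that simulates one more mgr coming up on each poll; a real check would query the cluster (for example by parsing `ceph mgr dump` output), and the exact condition the playbook uses is not reproduced here.

```bash
#!/usr/bin/env bash
# Generic retry helper: run "$@" until it succeeds or the attempts run out.
#   $1 = number of attempts, $2 = delay in seconds between attempts
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Hypothetical check: succeeds once the expected number of mgr daemons is up.
EXPECTED_MGRS=3
up_mgrs=0
all_mgrs_up() {
  up_mgrs=$((up_mgrs + 1))   # simulate one more mgr coming up on each poll
  [ "$up_mgrs" -ge "$EXPECTED_MGRS" ]
}

if retry 5 0 all_mgrs_up; then
  echo "all mgr daemons available, enabling modules"
fi
```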
Guillaume Abrioux authored
ceph-volume will complain if GPT headers are found on devices. This commit checks whether a GPT header is present on the devices passed in the `devices` variable and fails early. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 487d7016)
- 20 Aug, 2019 2 commits
Guillaume Abrioux authored
In `stable-4.0`, the group name `iscsi-gws` will go away and some rgw systemd service names will disappear as well (`ceph-rgw@<hostname>.service`, `ceph-radosgw@<hostname>.service`, `ceph-radosgw@radosgw.<hostname>.service`, `ceph-radosgw@radosgw.gateway.service`).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux authored
We shouldn't validate these two variables when `osd_auto_discovery` is set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 243edfbc)
- 13 Aug, 2019 1 commit
Guillaume Abrioux authored
We must use the IDs instead of device names in the tasks executed in `post_tasks` for the OSD rolling update; otherwise, old systemd units end up enabled. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1739209
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- 12 Aug, 2019 1 commit
Guillaume Abrioux authored
Otherwise it will fail when running the rolling_update.yml playbook because of the `serial: 1` usage. The task which copies the script runs only against the node currently being played, whereas the task which runs the script runs against all nodes in a loop, so it ends up with the typical error:
```
2019-08-08 17:47:05,115 p=14905 u=ubuntu | failed: [magna023 -> magna030] (item=magna030) => {
    "changed": true,
    "cmd": [
        "/usr/bin/env",
        "bash",
        "/tmp/systemd-device-to-id.sh"
    ],
    "delta": "0:00:00.004339",
    "end": "2019-08-08 17:46:59.059670",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/bin/env bash /tmp/systemd-device-to-id.sh",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "magna030",
    "msg": "non-zero return code",
    "rc": 127,
    "start": "2019-08-08 17:46:59.055331",
    "stderr": "bash: /tmp/systemd-device-to-id.sh: No such file or directory",
    "stderr_lines": [
        "bash: /tmp/systemd-device-to-id.sh: No such file or directory"
    ],
    "stdout": "",
    "stdout_lines": []
}
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1739209
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- 08 Aug, 2019 1 commit
Guillaume Abrioux authored
Deploy the mgr on a dedicated node. Without this, the update job fails on the stable-4.0 branch because of a mismatch between the two inventories.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- 07 Aug, 2019 1 commit
Guillaume Abrioux authored
This commit adds the `osd blacklist` cap to all OSP client keyrings. Fixes: #2296
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 2d955757)
- 06 Aug, 2019 1 commit
Dimitri Savineau authored
Since bedc0ab6, we manage the ceph-osd systemd unit scripts based on the OSD ID instead of the device name, but this was not reflected in the shrink-osd playbook (ceph-disk version). To keep backward compatibility with deployments that haven't yet transitioned to OSD IDs, we should stop the unit scripts for both the device and the ID. This commit adds the ulimit nofile container option to get better performance on ceph-disk commands. It also fixes an issue where the OSD ID matched multiple OSD IDs sharing the same first digit:
```
$ ceph-disk list | grep osd.1
/dev/sdb1 ceph data, prepared, cluster ceph, osd.1, block /dev/sdb2
/dev/sdg1 ceph data, prepared, cluster ceph, osd.12, block /dev/sdg2
```
Finally, remove the shrunk OSD directory.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
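A small bash reproduction of the prefix-match problem, using the sample `ceph-disk list` lines from this commit. It assumes the OSD name is always followed by a comma in that output, as in the sample; this is one way to make the match exact, not necessarily the fix used in the playbook.

```bash
#!/usr/bin/env bash
# Sample `ceph-disk list` output, copied from the commit message.
listing='/dev/sdb1 ceph data, prepared, cluster ceph, osd.1, block /dev/sdb2
/dev/sdg1 ceph data, prepared, cluster ceph, osd.12, block /dev/sdg2'

# Naive pattern: also matches osd.12 (and "." is a regex wildcard here).
printf '%s\n' "$listing" | grep 'osd.1'

# Fixed-string pattern with the trailing comma: osd.1 no longer matches osd.12.
osd_id=1
printf '%s\n' "$listing" | grep -F "osd.${osd_id}," | awk '{print $1}'   # /dev/sdb1
```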
- 01 Aug, 2019 1 commit
Dimitri Savineau authored
Allow configuring the rgw beast frontend in addition to civetweb (the default value). Add the rgw_thread_pool_size variable with 512 as the default value, and keep backward compatibility with the num_threads option when using civetweb. Update radosgw_civetweb_num_threads to reflect the rgw_thread_pool_size change. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733406
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit d17b1b48)
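A hypothetical renderer for the resulting ceph.conf lines, to show how the two frontends differ. It assumes civetweb is configured with `port=IP:PORT num_threads=N` and beast with `endpoint=IP:PORT`, with the thread count carried by `rgw thread pool size`; the function name and argument order are illustrative, not the role's template.

```bash
#!/usr/bin/env bash
# Hypothetical helper rendering the rgw frontend settings of ceph.conf.
#   $1 frontend (civetweb|beast), $2 IP, $3 port, $4 thread pool size (default 512)
render_rgw_frontends() {
  local frontend=$1 ip=$2 port=$3 threads=${4:-512}
  case $frontend in
    civetweb)
      # civetweb keeps num_threads for backward compatibility.
      echo "rgw frontends = civetweb port=${ip}:${port} num_threads=${threads}" ;;
    beast)
      echo "rgw frontends = beast endpoint=${ip}:${port}" ;;
    *)
      echo "unsupported frontend: $frontend" >&2
      return 1 ;;
  esac
  echo "rgw thread pool size = ${threads}"
}

render_rgw_frontends civetweb 192.168.100.11 8080
render_rgw_frontends beast 192.168.100.11 8080
```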
- 31 Jul, 2019 1 commit
Dimitri Savineau authored
When creating OpenStack pools, we only check whether the return code from the pool list command isn't 0 (i.e., the pool doesn't exist). In that case, the return code is 2, which is why the next condition for the pool creation is rc != 0. But in containerized deployments, the return code could be different if the container engine command fails (for example, when the container isn't running). In that case, the return code could be either 1 (docker) or 125 (podman), so we should fail at this point rather than in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit d549fffd)
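The decision described above can be summarized as a three-way classification of the exit status. A minimal bash sketch, assuming only the codes named in the commit (0, 2, and engine failures such as 1 or 125); `pool_check_action` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical classifier for the exit status of the "does this pool exist?" check.
#   0    -> the pool exists, nothing to do
#   2    -> the pool doesn't exist, so it should be created
#   else -> e.g. 1 (docker) or 125 (podman): the container engine command itself
#           failed, so stop here instead of failing in later tasks
pool_check_action() {
  case $1 in
    0) echo "exists" ;;
    2) echo "create" ;;
    *) echo "fail" ;;
  esac
}

for rc in 0 2 1 125; do
  echo "rc=$rc -> $(pool_check_action "$rc")"
done
```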
- 30 Jul, 2019 1 commit
Dimitri Savineau authored
The ooo-collocation scenario was still using an old container image that doesn't meet the requirements of the latest stable-3.2 code. We need to use at least container image v3.2.5. Also update the OSD tests to reflect the changes introduced by commit bedc0ab6, since the OSD systemd unit scripts no longer use the device name.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
- 26 Jul, 2019 2 commits
Dimitri Savineau authored
Some NBSP (non-breaking space) characters are still present in the YAML files. Add a test in Travis CI.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 07c6695d)
Dimitri Savineau authored
When using a containerized deployment, we have to create the systemd service unit from a template. The current implementation with ceph-disk uses the device name as the parameter to the systemd service and for the container name too:
```
$ systemctl start ceph-osd@sdb
$ docker ps --filter 'name=ceph-osd-*'
CONTAINER ID IMAGE NAMES
065530d0a27f ceph/daemon:latest-luminous ceph-osd-strg0-sdb
```
This is the only scenario (compared to non-containerized or ceph-volume based deployments) that isn't using the OSD ID, as the other scenarios do:
```
$ systemctl start ceph-osd@0
$ docker ps --filter 'name=ceph-osd-*'
CONTAINER ID IMAGE NAMES
d34552ec157e ceph/daemon:latest-luminous ceph-osd-0
```
Also, if the device mapping doesn't persist across reboots (i.e., sdb might be remapped to sde), the OSD service won't come back after the reboot. This patch allows using the OSD ID with the ceph-osd systemd service, but requires activating the OSD manually with ceph-disk first in order to assign the ID to that OSD. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670734
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
- 12 Jul, 2019 1 commit
Dimitri Savineau authored
Both the ntp and chrony daemons use a variable for the service name because it can differ depending on the GNU/Linux distribution. This was updated in 9d88d319 for chrony, but only for the start part, not for the handler. This commit fixes this for both ntp and chrony.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 0ae01931)
- 10 Jul, 2019 2 commits
Ramana Raja authored
nfs-ganesha v2.5 and v2.6 have hit EOL. Install nfs-ganesha v2.7 stable, which is currently maintained.
Signed-off-by: Ramana Raja <rraja@redhat.com> (cherry picked from commit dfff89ce)
Guillaume Abrioux authored
The message prints the whole content of the registered variable in the playbook; this is not needed and makes the message unclear and unreadable:
```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023 (cherry picked from commit e6dc3ebd)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- 09 Jul, 2019 2 commits
Guillaume Abrioux authored
When shrinking an OSD, its corresponding 'prepare container' should be removed; otherwise, this leftover prevents redeploying a new OSD.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux authored
Removing the GPT header on devices will ease the ceph-disk to ceph-volume migration when using the shrink-osd + add-osd playbooks. ceph-disk requires a GPT header, whereas ceph-volume will complain if a GPT header is present. That won't break ceph-disk (re)deployment, since we check for and add the GPT header if needed when deploying ceph-disk OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613735
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- 08 Jul, 2019 1 commit
Dimitri Savineau authored
If the SOCKET variable isn't defined in the script, the test command won't fail because the return code is 0:
```
$ test -S
$ echo $?
0
```
There are multiple issues in that script:
- The default SOCKET value isn't defined.
- The wget parameters need updating because the command loops. We now use the same options as curl.
- The check_rest function doesn't test the radosgw at all due to a wrong test command (a test against a string) and always returns 0. It needs to use the DOCKER_EXEC variable in order to execute the command.
```
$ test 'wget http://192.168.100.11:8080'
$ echo $?
0
```
Resolves: #3926
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit c90f605b)
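Both `test` pitfalls above can be reproduced in plain bash. An unquoted empty variable disappears, leaving the one-argument form `test -S`, which only checks that the string "-S" is non-empty; quoting keeps the empty operand so the socket check really runs. A command written as a string is never executed by `test`.

```bash
#!/usr/bin/env bash
unset SOCKET

# Unquoted and empty: this becomes `test -S`, a one-argument test that succeeds.
test -S $SOCKET; echo "unquoted: $?"     # unquoted: 0

# Quoted: `test -S ""` really checks for a socket and fails.
test -S "$SOCKET"; echo "quoted: $?"     # quoted: 1

# A command stored in a string is not executed; a non-empty string is simply true.
test 'wget http://192.168.100.11:8080'; echo "string: $?"   # string: 0
```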
- 07 Jul, 2019 1 commit
Dimitri Savineau authored
The rgw restart script sets the RGW_IP variable depending on these Ansible variables:
- radosgw_address
- radosgw_address_block
- radosgw_interface
Those variables have default values defined in the ceph-defaults role:
```
radosgw_interface: interface
radosgw_address: 0.0.0.0
radosgw_address_block: subnet
```
But the rgw restart script always uses the radosgw_address value instead of radosgw_interface when the latter is defined, because we aren't testing against the right default value. As a consequence, the RGW_IP variable is set to 0.0.0.0 even if the IP address associated with the radosgw_interface variable is set correctly. This causes the check_rest function to fail.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
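A bash sketch of the selection bug, leaving radosgw_address_block out for brevity. The defaults come from the commit message. The wrong placeholder `address` in the buggy variant is hypothetical: the commit only says the script wasn't testing the right default value. The interface IP is passed in directly instead of being read from Ansible facts.

```bash
#!/usr/bin/env bash
# Defaults from the ceph-defaults role, as quoted in the commit message.
DEFAULT_ADDRESS="0.0.0.0"
DEFAULT_INTERFACE="interface"

# $1 = radosgw_address, $2 = radosgw_interface, $3 = IP configured on that interface.
# Buggy variant: compares against a wrong placeholder ("address" is hypothetical),
# so the address branch always wins and RGW_IP ends up as 0.0.0.0.
pick_rgw_ip_buggy() {
  if [ "$1" != "address" ]; then echo "$1"
  elif [ "$2" != "$DEFAULT_INTERFACE" ]; then echo "$3"
  fi
}

# Fixed variant: compare each variable with its real default value.
pick_rgw_ip() {
  if [ "$1" != "$DEFAULT_ADDRESS" ]; then echo "$1"
  elif [ "$2" != "$DEFAULT_INTERFACE" ]; then echo "$3"
  else echo "$DEFAULT_ADDRESS"
  fi
}

# radosgw_address left at its default, radosgw_interface set to eth1.
pick_rgw_ip_buggy 0.0.0.0 eth1 192.168.100.11   # 0.0.0.0
pick_rgw_ip       0.0.0.0 eth1 192.168.100.11   # 192.168.100.11
```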
- 25 Jun, 2019 1 commit
Gabriel Ramirez authored
Alphabetized the ceph_repository_uca keys due to validation errors when using the UCA/queens repository on Ubuntu 16.04:
```
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SchemaError: -> ceph_stable_repo_uca schema item is not alphabetically ordered
```
Closes: #4154
Signed-off-by: Gabriel Ramirez <gabrielramirez1109@gmail.com> (cherry picked from commit 82262c6e)
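A hypothetical pre-check for the same class of error: verify that a list of keys is already in alphabetical order before the schema validator sees it. Byte-wise ordering (`LC_ALL=C`) is an assumption about how the validator compares keys, and the key names in the usage lines are placeholders, not the real ceph_repository_uca keys.

```bash
#!/usr/bin/env bash
# Hypothetical check: are the given keys listed in alphabetical order?
# `sort -c` exits non-zero if its input is not already sorted.
keys_sorted() {
  printf '%s\n' "$@" | LC_ALL=C sort -c 2>/dev/null
}

# Placeholder key names.
keys_sorted key repo && echo "ordered"
keys_sorted repo key || echo "not alphabetically ordered"
```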
- 24 Jun, 2019 5 commits
Dimitri Savineau authored
789cef76 introduced a regression in the ganesha configuration file generation: the new config_template module version broke it. But the ganesha.conf file isn't an ini file and doesn't really need the config_template module. Instead, we can use the classic template module. Resolves: #4045
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 616c4846)
Guillaume Abrioux authored
This first tries to unmount any cephfs/nfs-ganesha mount point on client nodes, then unmap any mapped rbd devices, and finally remove the Ceph kernel modules. If this fails, some resources are still busy and should be cleaned up manually before continuing to purge the cluster. This is done early in the playbook so the cluster stays untouched until everything is ready for that operation; otherwise, if you try to redeploy a cluster, it could get confused by leftovers from the previous deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 20e48528)
Guillaume Abrioux authored
ceph-facts should be run before ceph-validate, since ceph-validate references facts that are set in the ceph-facts role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux authored
This was removed because of broken repositories which made the CI fail. That reason no longer applies, so add it back.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Dimitri Savineau authored
Same behaviour as ceph-volume (b9875348): the ceph-disk command runs faster when using ulimit nofile with the container CLI.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
- 21 Jun, 2019 2 commits
Dimitri Savineau authored
There are two big issues with the current OSD restart script.
1/ We try to test whether the ceph osd daemon socket exists, but we use a wildcard for the socket name: /var/run/ceph/*.asok. This fails because there are usually multiple ceph osd sockets (or sockets of other collocated ceph daemons) in the /var/run/ceph directory. Currently the test fails with:
```
bash: line xxx: [: too many arguments
```
But it doesn't stop the script execution. Instead, we can specify the full ceph osd socket name because we already know the OSD id.
2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns: one with the device name (sda, sdb, ..) and the other with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed.
```
$ docker container ls
CONTAINER ID IMAGE NAMES
958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
$ docker container ls -q -f "name=sda"
958121a7cc7d
46c7240d71f3
877985ec3aca
$ docker container ls
CONTAINER ID IMAGE NAMES
2db399b3ee85 ceph-daemon:latest ceph-osd-5
099dc13f08f1 ceph-daemon:latest ceph-osd-13
5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
$ docker container ls -q -f "name=ceph-osd-1"
099dc13f08f1
5d0c2fe8f121
d6c7b89db1d1
```
Adding an extra '$' character at the end of the pattern solves the problem.
Finally, remove the get_container_osd_id function because it's not used in the script at all.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit 45d46541)
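Both problems can be reproduced without Ceph or docker. The sketch below uses a scratch directory instead of /var/run/ceph and `grep -E` as a stand-in for docker's name filter, which (as the '$' fix shows) is matched as a regular expression. The file and container names are sample values.

```bash
#!/usr/bin/env bash
# Reproduce both problems in a scratch directory rather than /var/run/ceph.
dir=$(mktemp -d)
touch "$dir/ceph-osd.0.asok" "$dir/ceph-osd.1.asok" "$dir/ceph-osd.2.asok"

# 1/ The wildcard expands to several paths, so `[` gets too many operands and
#    reports a syntax error (exit status 2) instead of true/false; the script goes on.
[ -S "$dir"/*.asok ] 2>/dev/null
echo "wildcard test status: $?"

# Building the exact socket name from the known OSD id yields a single operand.
# (-e is used because these scratch files are not real sockets.)
osd_id=1
[ -e "$dir/ceph-osd.${osd_id}.asok" ] && echo "found ceph-osd.${osd_id}.asok"
rm -rf "$dir"

# 2/ An unanchored name pattern matches more containers than intended.
names=$'ceph-osd-5\nceph-osd-13\nceph-osd-17\nceph-osd-1'
grep -cE 'ceph-osd-1'  <<<"$names"    # 3 (ceph-osd-13, ceph-osd-17, ceph-osd-1)
grep -cE 'ceph-osd-1$' <<<"$names"    # 1 (ceph-osd-1 only)
```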
Dimitri Savineau authored
The ceph-volume lvm list command takes ages to complete when there are a lot of LV devices in a containerized deployment. For instance, with 25 OSDs on a node, it takes 3 min 44 s to list the OSDs. Adding the max open files limit to the container engine CLI when executing the ceph-volume command seems to improve the execution time a lot (~30 s). This was impacting OSD creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit b9875348)
- 19 Jun, 2019 1 commit
Guillaume Abrioux authored
Otherwise, content in /run/udev is mislabeled and prevents some services like NetworkManager from starting.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 80875adb)