Ansible module to restart ec2 instances #6905
Conversation
contains documentation on what this module is going to do and some basic initialization, all the functionalities will be implemented in later commits
…so that they don't have to tossed around in every helper
| - Manages the running state of EC2 instances given an explicit list of | ||
| instance IDs. Supports four commands - describe, start, stop, stop_and_start, | ||
| and is idempotent (no API call is made if the instance is already in the requested state). | ||
| - Designed to run with delegate_to localhost. AWS credentials and the target region are picked up from the standard boto3 credential chain; in the commcare-cloud workflow the AWS_PROFILE and AWS_REGION environment variables are exported automatically before ansible runs. |
There was a problem hiding this comment.
I'm just not too familiar with this. Do you mind elaborating on the "delete_to localhost" workflow?
There was a problem hiding this comment.
It is delegate_to localhost. This is the command that we want to run locally on the host from where the command was run instead of running it on the target machine.
For this case the AWS credentials will be on our local system so we want to run the module from our local systems.
| bad = [i for i in instance_ids if not INSTANCE_ID_RE.match(i)] | ||
| if bad: |
There was a problem hiding this comment.
nit
| bad = [i for i in instance_ids if not INSTANCE_ID_RE.match(i)] | |
| if bad: | |
| bad_ids = [i for i in instance_ids if not INSTANCE_ID_RE.match(i)] | |
| if bad_ids: |
| except ImportError: | ||
| raise RuntimeError( | ||
| "boto3 is required by ec2_instance_state but is not installed." | ||
| ) |
There was a problem hiding this comment.
Why not call module.fail_json directly here, similar to what _get_region does, and then that way main doesn't need to wrap the call to _get_ec2_client in a try/except.
There was a problem hiding this comment.
Thats a good point. Will update it.
| """Per-run context shared by the flow helpers. | ||
|
|
||
| Bundles the EC2 client, the AnsibleModule, so these don't have to be | ||
| passed as arguments to every helper. | ||
| """ |
There was a problem hiding this comment.
| """Per-run context shared by the flow helpers. | |
| Bundles the EC2 client, the AnsibleModule, so these don't have to be | |
| passed as arguments to every helper. | |
| """ | |
| """ | |
| Bundles the EC2 client and Ansible module for convenience | |
| """ |
There was a problem hiding this comment.
| ) | ||
| return boto3.client('ec2', region_name=region) | ||
|
|
||
| class _Ctx: |
There was a problem hiding this comment.
Not my favorite name, but need to continue reviewing to offer suggestions.
There was a problem hiding this comment.
All of the functions that accept this type (like _do_start(ctx, ...)) look like instance methods. What do you think of renaming this to something like StopStarter and moving those functions to methods?
class StopStarter:
def __init__(self, client, module):
self.client = client
self.module = module
def describe(self, instance_ids):
...
def start(self, instance_ids, wait):
...
def stop(self, instance_ids, wait):
...
...Feel free to pick a different name, that was just the first thing that come to mind.
There was a problem hiding this comment.
Thanks @millerdev. Used EC2InstanceManager class name 616ad06
This looks much better.
| def _wait_for(ctx, waiter_name, wait_instances): | ||
| if not wait_instances or ctx.module.check_mode: | ||
| return | ||
| waiter = ctx.client.get_waiter(waiter_name) | ||
| try: | ||
| waiter.wait(InstanceIds=[i.instance_id for i in wait_instances]) | ||
| except Exception as e: # noqa: BLE001 - surface any waiter failure as module failure | ||
| ctx.module.fail_json( | ||
| msg=f"Waiter {waiter_name!r} failed for {_labels(wait_instances)}: {e}") | ||
| return |
There was a problem hiding this comment.
nit: wait_instances -> instances
There was a problem hiding this comment.
| return ', '.join(i.label for i in instances) | ||
|
|
||
|
|
||
| def _check_not_terminated(ctx, instances, action): |
There was a problem hiding this comment.
nit: my preference would be to not pass action into this method since when I first saw _check_not_terminated(ctx, instances, InstanceCommand.START), my first thought was "why would the action impact a check for terminated instances?". I guess the alternatives are generalize the error message or leave it up to the caller to call module.fail_json.
There was a problem hiding this comment.
While moving stuff to EC2InstanceManager, this was addressed.
616ad06
| ctx.module.fail_json(msg=f"StopInstances failed for {labels}: {e}") | ||
| return | ||
|
|
||
| wait_for_stopped = list(targets) + list(already_stopping) |
There was a problem hiding this comment.
targets and already_stopping are added together on line 336 as well. Could put this variable up with where targets and already_stopping are already defined. I also don't think these need to be wrapped in list(...) right?
| # Honor the user's `wait` choice for the final running wait. | ||
| start_payload = _do_start(ctx, instance_ids, wait=wait) | ||
|
|
||
| # Combine: previous_state = state before the stop; current_state = after start. |
| # unchanged = ids that were no-ops in BOTH phases. | ||
| # Highly unlikely to happen in practice, but we sort to make the result deterministic. |
There was a problem hiding this comment.
How would this be possible?
There was a problem hiding this comment.
This would only happen when the instance state would be terminated but given the fact we are checking for it already so it should never happen. But I kept it for to keep the output same.
… on _do_start and _do_stop to build payload and the output uses the output from _do_stop
https://dimagi.atlassian.net/browse/SAAS-19382
A redo of #6858, the changes were so much that it made no sense continuing on the existing branch.
This is part of the groundwork required to automate the machine restarts.
I have tested the command locally on staging
web13machine and the results were as expected, following are the output for each command -Environments Affected
All