moto/moto
Brian Pandola f7467164e4
Fix Race Condition in batch:SubmitJob (#3480)
* Extract Duplicate Code into Helper Method

DRY up the tests and replace the arbitrary `sleep()` calls with a more
explicit check before progressing.
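
An explicit check of this kind can be sketched as a small polling helper. This is only an illustration, not the helper added by the commit: `wait_for_status`, its parameters, and the `describe_status` callable are hypothetical names. In a real test, `describe_status` would wrap a `describe_jobs` call and return the job's status.

```python
import time


def wait_for_status(describe_status, wanted, timeout=30.0, interval=0.1):
    """Poll describe_status() until it returns a status in `wanted`.

    describe_status is a hypothetical zero-argument callable that returns
    the current job status as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = describe_status()
        if status in wanted:
            return status
        if time.monotonic() >= deadline:
            # Fail loudly instead of letting later assertions run too early.
            raise TimeoutError(
                "status is {!r}, expected one of {!r}".format(status, wanted)
            )
        time.sleep(interval)
```

Unlike a fixed `sleep()`, the helper returns as soon as the condition holds and raises a clear error if it never does.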

* Improve Testing of batch:TerminateJob

The test now confirms that the job was terminated by sandwiching a `sleep`
command between two `echo` commands.  In addition to the original checks
of the terminated job status/reason, the test now asserts that only the
first echo command succeeded, confirming that the job was indeed terminated
while in progress.
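
The sandwich idea might look roughly like the sketch below. The exact command, the sleep duration, and the `terminated_mid_run` helper are assumptions made for illustration; the test in the commit may be written differently.

```python
# Hypothetical container command: a sleep sandwiched between two echoes.
# The job is expected to be terminated while the sleep is still running.
COMMAND = ["sh", "-c", "echo start; sleep 30; echo stop"]


def terminated_mid_run(log_messages):
    """Return True if only the first echo made it into the logs."""
    return "start" in log_messages and "stop" not in log_messages
```

If "stop" shows up in the logs, the job ran to completion and was not terminated in progress; if "start" is missing, the job never began running.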

* Fix Race Condition in batch:SubmitJob

The `test_submit_job` in `test_batch.py` kicks off a job, calls `describe_jobs`
in a loop until the job status returned is SUCCEEDED, and then asserts against
the logged events.

The backend code that runs the submitted job does so in a separate thread. If
the job was successful, the job status was being set to SUCCEEDED *before* the
event logs had been written to the logging backend.

As a result, it was possible for the primary thread running the test to detect
that the job was successful immediately after the secondary thread had updated
the job status but before the secondary thread had written the logs to the
logging backend.  Under the right conditions, this could cause the subsequent
logging assertions in the primary thread to fail.
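
The interleaving described above, and the ordering that avoids it, can be shown with a toy model. `FakeJob` is a hypothetical stand-in, not moto's job class. The point is only the order of the two writes in `run`.

```python
import threading
import time


class FakeJob:
    """Toy stand-in for a batch job (hypothetical, not moto's class)."""

    def __init__(self):
        self.status = "RUNNING"
        self.logs = []

    def run(self):
        # Write the logs first and publish the terminal status last, so a
        # reader that observes SUCCEEDED is guaranteed to see the logs too.
        # Swapping these two lines reintroduces the race.
        self.logs.append("hello")
        self.status = "SUCCEEDED"


job = FakeJob()
worker = threading.Thread(target=job.run)
worker.start()

# The primary thread polls the status, much as the test polls describe_jobs.
while job.status != "SUCCEEDED":
    time.sleep(0.001)
observed = list(job.logs)  # Safe: the logs were written before the status.
worker.join()
```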

Additionally, the code that collected the logs from the container relied on
a "dodgy hack" built from `time.sleep()` and a modulo-based conditional. It was
ultimately non-deterministic and could drop or duplicate log messages in
certain scenarios.

In order to address these issues, this commit does the following:

* Carefully re-orders any code that sets a job status or timestamp
  to avoid any obvious race conditions.
* Removes the "dodgy hack" in favor of a much more straightforward
  (and less error-prone) method of collecting logs from the container.
* Removes arbitrary and unnecessary calls to `time.sleep()`.
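
One straightforward way to collect logs, sketched here under assumptions, is to read the container's full output once after it has exited and parse it in a single pass. The commit's actual replacement code is not shown on this page. The sketch assumes each line has the shape `<RFC 3339 timestamp> <message>`, which is what Docker emits when logs are requested with timestamps enabled. `parse_container_logs` is a hypothetical name.

```python
def parse_container_logs(raw):
    """Split raw timestamped log output into (timestamp, message) pairs.

    `raw` is the complete log output as bytes, read once after the
    container has finished, so no line can be missed or read twice.
    """
    events = []
    for line in raw.decode("utf-8").splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        timestamp, _, message = line.partition(" ")
        events.append((timestamp, message))
    return events
```

Because the output is read exactly once, there is no polling window in which a message could be dropped or duplicated.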

Before applying any changes, the flaky test was failing about 12% of the
time.  Putting a `sleep()` call between setting the `job_status` to SUCCEEDED
and collecting the logs resulted in a 100% failure rate.  Simply moving
the code that sets the job status to SUCCEEDED to the end of the code block
dropped the failure rate to ~2%.  Finally, removing the log collection
hack allowed the test suite to run ~1000 times without a single failure.
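
Failure rates like these can be estimated by running the test repeatedly and counting failures. The sketch below shows one way to do that. It is not how the figures above were produced, and `failure_rate` is a hypothetical helper.

```python
def failure_rate(test, runs):
    """Run `test` `runs` times and return the fraction of failing runs.

    `test` is any zero-argument callable that raises AssertionError when
    it fails.
    """
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures / runs
```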

Taken in aggregate, these changes make the batch backend more deterministic
and should put the nail in the coffin of this flaky test.

Closes #3475
2020-11-18 10:49:25 +00:00
acm Tech Debt - Remove duplicate AWSError classes 2020-11-05 11:20:18 +00:00
apigateway List dependencies for services - add integration test to verify 2020-09-13 16:08:23 +01:00
applicationautoscaling Linting 2020-11-11 15:55:37 +00:00
athena Implemented Athena create_named_query, get_named_query (#1524) (#3065) 2020-06-11 17:27:29 +01:00
autoscaling Linting 2020-11-11 15:55:37 +00:00
awslambda fixed issue in update_configuration for lambda when setting VPC config property (#3479) 2020-11-18 08:45:31 +00:00
batch Fix Race Condition in batch:SubmitJob (#3480) 2020-11-18 10:49:25 +00:00
cloudformation Linting 2020-11-11 15:55:37 +00:00
cloudwatch Fix: Adding alarm arn to describe alarms response (#3409) 2020-11-02 08:56:18 +00:00
codecommit List dependencies for services - add integration test to verify 2020-09-13 16:08:23 +01:00
codepipeline List dependencies for services - add integration test to verify 2020-09-13 16:08:23 +01:00
cognitoidentity #2800 - CognitoIdentity - Fix format of Identity ID 2020-04-04 14:09:38 +01:00
cognitoidp added cognito idp function admin_set_user_password to the code (#3328) 2020-09-21 18:40:07 +01:00
config Linting 2020-11-11 15:55:37 +00:00
core Fix failures with latest responses library (0.12.1) (#3466) 2020-11-16 07:20:33 +00:00
datapipeline Iam cloudformation update, singificant cloudformation refactoring (#3218) 2020-08-27 10:11:47 +01:00
datasync Add missing regions to all services 2019-12-26 17:12:22 +01:00
dynamodb Decentralize cloudformation naming responsibilities (#3201) 2020-08-01 15:23:36 +01:00
dynamodb2 Add support for empty strings in non-key dynamo attributes (#3467) 2020-11-17 09:12:39 +00:00
dynamodbstreams Prevent JSON dumps error when dealing with complex types 2020-04-06 17:21:26 +10:00
ec2 Improve ec2:DescribeSubnets filtering (#3457) 2020-11-16 08:17:36 +00:00
ec2instanceconnect Fix deprecation warnings due to invalid escape sequences. (#3273) 2020-09-10 09:20:26 +01:00
ecr ecr: Fix "imageDigest" value in ecr.list_images() response (#3436) 2020-11-05 14:10:23 +00:00
ecs Fix missing properties when ecs:TaskDefinition created via CloudFormation (#3378) 2020-10-12 20:53:30 +01:00
elasticbeanstalk ElasticBeanstalk - Fix tests in Python2 and ServerMode 2020-03-30 16:28:36 +01:00
elb Decentralize cloudformation naming responsibilities (#3201) 2020-08-01 15:23:36 +01:00
elbv2 Iam cloudformation update, singificant cloudformation refactoring (#3218) 2020-08-27 10:11:47 +01:00
emr Added support for EMR Security Configurations and Kerberos Attributes. (#3456) 2020-11-17 10:54:34 +00:00
events EventBridge: put_rule and list_rules should store and retrieve EventBusName property (#3472) 2020-11-17 15:36:17 +00:00
forecast Refactor Forecast to also use shared AWSError class 2020-11-06 16:34:09 +00:00
glacier Fixed linter errors 2019-12-26 21:03:49 +01:00
glue change code style to pass black --check 2020-04-21 22:34:05 +02:00
iam Fix: Return Tags in iam:CreateUserResponse 2020-11-09 14:59:06 -08:00
instance_metadata Run black on moto & test directories. 2019-10-31 10:36:05 -07:00
iot iot:DeleteThingGroup should return success even for non-existent groups (#3367) 2020-10-09 15:57:00 +01:00
iotdata Back to Black 2020-11-10 14:12:38 +01:00
kinesis Add kinesisvideo (#3271) 2020-09-02 08:51:51 +01:00
kinesisvideo Add kinesisvideo archived media (#3280) 2020-09-04 12:14:48 +01:00
kinesisvideoarchivedmedia Add kinesisvideo archived media (#3280) 2020-09-04 12:14:48 +01:00
kms List dependencies for services - add integration test to verify 2020-09-13 16:08:23 +01:00
logs Fix: nextToken value in logs:DescribeLogGroups response (#3398) 2020-10-21 09:47:09 +01:00
managedblockchain Fix deprecation warnings due to invalid escape sequences. (#3273) 2020-09-10 09:20:26 +01:00
opsworks Fix the online status in OpsWorks 2020-05-07 10:57:27 +03:00
organizations added organizations detach_policy response, model, and tests, issue #… (#3278) 2020-09-25 16:55:29 +01:00
packages Back to Black 2020-11-10 14:12:38 +01:00
polly Fixed linter errors 2019-12-26 21:03:49 +01:00
ram RAM - implement CRUD endpoints (#3158) 2020-07-21 14:15:13 +01:00
rds Fix deprecation warnings due to invalid escape sequences. (#3273) 2020-09-10 09:20:26 +01:00
rds2 Fix: TagList missing in rds:DescribeDBInstance response (#3459) 2020-11-16 09:30:53 +00:00
redshift Decentralize cloudformation naming responsibilities (#3201) 2020-08-01 15:23:36 +01:00
resourcegroups Fix resource groups tests (#3204) 2020-07-31 07:18:52 +01:00
resourcegroupstaggingapi Add ec2.vpc resource support to Tagging API (#3375) 2020-10-10 19:05:21 +01:00
route53 Fix XML encoding in Route53 JInja2 Templates #3469 (#3473) 2020-11-18 07:23:49 +00:00
s3 Linting 2020-11-11 15:55:37 +00:00
s3bucket_path Run black on moto & test directories. 2019-10-31 10:36:05 -07:00
sagemaker Linting 2020-11-11 15:55:37 +00:00
secretsmanager Fix: describe/list attribute discrepancy in Secrets Manager (#3432) 2020-11-03 14:18:56 +00:00
ses SES: Fix sending email when use verify_email_address (#3242) 2020-08-25 13:51:58 +01:00
sns Fix: SNS Delete subscriptions on topic deletion (#3410) 2020-10-29 08:52:02 +00:00
sqs Fix SQS md5 attribute hashing. (#3403) 2020-10-27 12:13:47 +00:00
ssm Add ssm:SendCommand support for instance tag Targets 2020-11-08 00:06:35 -08:00
stepfunctions Merge pull request #3439 from bblommers/techdebt-remove-duplicate-awserrors 2020-11-09 19:59:02 -06:00
sts Add AssumeRoleWithSAML response to responses.py. 2020-04-16 11:47:30 -07:00
swf Add SWF domain and type undeprecation 2020-03-05 23:37:17 +10:00
templates Add about page. 2017-03-12 19:58:40 -04:00
transcribe Transcribe Medical Support (#3299) 2020-09-30 13:18:26 +01:00
utilities Rename DockerUtilities to differentiate from docker-dependency 2020-11-09 16:31:18 +00:00
xray Tech Debt - Remove duplicate AWSError classes 2020-11-05 11:20:18 +00:00
__init__.py Adds some basic endpoints for Amazon Forecast (#3434) 2020-11-06 08:23:47 +00:00
backends.py Adds some basic endpoints for Amazon Forecast (#3434) 2020-11-06 08:23:47 +00:00
compat.py Fix linter errors. 2019-12-17 21:35:52 +05:30
server.py Enable CORS from everywhere using flask-cors. (#3316) 2020-09-19 10:07:17 +01:00
settings.py Run black on moto & test directories. 2019-10-31 10:36:05 -07:00