Payload limit configuration and validation #1288
Conversation
Force-pushed from bd14d85 to 04a1c6d
cretz left a comment:
Looks great; this is much cleaner IMO than before, though I still think server limits should be wholly unrelated to SDK ones.
```python
namespace_info = await self._bridge_worker.validate()
payload_error_limits = (
    _PayloadErrorLimits(
        memo_upload_error_limit=namespace_info.limits.memo_size_limit_error,
        payload_upload_error_limit=namespace_info.limits.blob_size_limit_error,
    )
    if namespace_info.HasField("limits")
    else None
)
```
Just want to make the general comment that I still think it's confusing for users that we have different error limits/approaches for clients and workers for the same payloads. We're OK with payloads being server-validated sometimes and SDK-validated other times in an inconsistent manner.
I still think it would have been worth exploring the alternative of divorcing SDK-side limits from server ones, which would only have a negative effect for self-hosted users; we are already breaking compatibility even for cloud users with proxies.
```python
memo_upload_warning_limit: int = 2 * 1024
"""The limit (in bytes) at which a memo size warning is logged."""

payload_upload_error_disabled: bool = False
```
Hrmm, reviewing how this works, I was going to first suggest that this be called `worker_payload_upload_error_disabled` (because it has nothing to do with general payload upload), but now I'm wondering if this specific error option should be a worker option, not a data converter one. And if set, just ignore the namespace info that comes back.
I also think you only need one option for disabling the error check; not sure it should be split between memos and not.
I'm also a bit concerned about the term "upload", as I think it will have a different meaning with external payload stuff; maybe "create" or "encode"?
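A rough sketch of that first suggestion (the option name and its placement on `Worker` are hypothetical, not part of this PR's API):

```python
from temporalio.client import Client
from temporalio.worker import Worker

async def run_worker(client: Client) -> None:
    # Assumes a MyWorkflow workflow and my_activity activity defined elsewhere.
    worker = Worker(
        client,
        task_queue="my-task-queue",
        workflows=[MyWorkflow],
        activities=[my_activity],
        # Hypothetical option suggested in the review, not in the SDK today:
        # when set, the limits returned in the namespace info would be ignored.
        worker_payload_upload_error_disabled=True,
    )
    await worker.run()
```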
| """The limit (in bytes) at which a memo size warning is logged.""" | ||
|
|
||
| payload_upload_error_disabled: bool = False | ||
| """Field indiciating that the payload size checks should be disabled in the SDK. |
| """Field indiciating that the payload size checks should be disabled in the SDK. | |
| """Field indicating that the payload size checks should be disabled in the SDK. |
| "[TMPRL1103] Attempted to upload payloads with size that exceeded the warning limit.", | ||
| ) | ||
|
|
||
| def _validate_limits( |
Not sure this has value as an instance method
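A sketch of that alternative, assuming roughly the body shown in the surrounding hunks (abbreviated, not the PR's final shape):

```python
import warnings
from typing import Sequence

import temporalio.api.common.v1

def _validate_limits(
    payloads: Sequence[temporalio.api.common.v1.Payload],
    *,
    warning_limit: int,
    warning_message: str,
) -> None:
    # No instance state is touched, so this can live at module level
    # (or be a @staticmethod) instead of being an instance method.
    total_size = sum(payload.ByteSize() for payload in payloads)
    if warning_limit > 0 and total_size > warning_limit:
        warnings.warn(warning_message)
```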
```python
    warning_limit: int,
    warning_message: str,
):
    total_size = sum(payload.ByteSize() for payload in payloads)
```
Technically if the server was comparing sizes of the Payloads message wrapper it is 2-3 bytes larger, but that is probably negligible, though still technically backwards incompatible if this sum is like 3.99999 MB heh.
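A quick illustration of that discrepancy, using the SDK's protobuf types (the exact byte counts depend on element sizes):

```python
from temporalio.api.common.v1 import Payload, Payloads

elements = [Payload(data=b"x" * 100), Payload(data=b"y" * 200)]
wrapper = Payloads(payloads=elements)

# Each element embedded in the wrapper costs an extra field tag byte plus a
# varint length prefix, so the serialized wrapper is slightly larger than
# the sum of the elements' own ByteSize() values.
print(sum(p.ByteSize() for p in elements))  # sum of the elements alone
print(wrapper.ByteSize())                   # a few bytes more
```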
```python
if warning_limit > 0 and total_size > warning_limit:
    # TODO: Use a context aware logger to log extra information about workflow/activity/etc
    warnings.warn(
```
I think we should consider a custom Warning category class here
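For instance, a minimal sketch of such a category (the class name is hypothetical):

```python
import warnings

class PayloadSizeWarning(UserWarning):
    """Hypothetical category for payload-size warnings."""

warnings.warn(
    "[TMPRL1103] Attempted to upload payloads with size that exceeded the warning limit.",
    category=PayloadSizeWarning,
)

# Users could then silence or escalate it selectively, e.g.:
# warnings.filterwarnings("error", category=PayloadSizeWarning)
```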
```python
total_size = sum(payload.ByteSize() for payload in payloads)

if error_limit > 0 and total_size > error_limit:
    raise temporalio.exceptions.PayloadSizeError(
```
I think we should prepend a 💥 emoji to this PR title and clarify in the description that it is technically a backwards-incompatible change to eagerly error client-side on payload failures (even though users can turn it off). It's always good to be safe when pointing out these incompatibilities in our release notes.
Where an error occurs can have a significant effect on existing logic, e.g. finally clauses that may never have triggered before (though we recognize the unrecoverability of server-side validation of this before).
It should also be noted that some users have proxies (which will proxy the namespace capabilities) that already do external storage or encryption today; they are going to start seeing errors they didn't see otherwise, which can wreak havoc on workflows that were working just fine. We need to really get the message out that those users should disable this error handling.
There is of course the alternative, which is to stop using server information that only helps with half the payloads anyway.
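To make the control-flow point concrete, here is a hypothetical workflow where the eager check changes behavior (`my_activity` and the payload size are placeholders):

```python
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def my_activity(data: bytes) -> None:
    ...

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self) -> None:
        big_payload = b"x" * (10 * 1024 * 1024)  # assume this exceeds the error limit
        try:
            await workflow.execute_activity(
                my_activity,
                big_payload,
                start_to_close_timeout=timedelta(seconds=30),
            )
        finally:
            # With eager SDK-side validation, PayloadSizeError is raised in
            # workflow code and this block runs; with server-side validation
            # the workflow task simply failed, so this block may never have
            # executed before.
            ...
```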
What was changed
Update the SDK to check the size of payload collections and issue warnings and errors when the size exceeds limits:
This is done by:
- Adding a `PayloadLimitsConfig` class for configuring the warning and error limits. The error limits are not defined by default, whereas the warning limit is set to 512 KiB.
- Adding a `payload_limits` property of type `PayloadLimitsConfig` to `DataConverter`.
- Using `DataConverter` for encoding and decoding payloads in all forms. The encoding methods call `_validate_payload_limits` before returning.
- Using `DataConverter` instead of using `payload_codec` directly.
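As a rough usage sketch (the import location of `PayloadLimitsConfig` and the exact field set are assumptions based on the hunks above, not the final API):

```python
import dataclasses

# Assumed import location for the new config class described above:
from temporalio.converter import DataConverter, PayloadLimitsConfig

converter = dataclasses.replace(
    DataConverter.default,
    payload_limits=PayloadLimitsConfig(
        memo_upload_warning_limit=2 * 1024,   # warn above 2 KiB for memos
        payload_upload_error_disabled=False,  # keep SDK-side error checks on
    ),
)
```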
Examples
Log output when an activity attempts to return a result that exceeds the error limit:
Log output when a workflow attempts to provide an activity input that exceeds the error limit:
Note that the above example is missing the extra context that the activity-result failure example has. This is due to the logging infrastructure available where these errors are raised; it can be fixed separately with some log refactoring (see deferred items).
Log output when a workflow attempts to provide activity input that is above the warning threshold but below the error limit:
Same note about the missing extra context.
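The deferred log refactoring could plausibly lean on the SDK's existing contextual loggers, which already attach execution details; a minimal sketch:

```python
from temporalio import activity

@activity.defn
async def my_activity() -> str:
    # activity.logger automatically appends activity and workflow context
    # (workflow ID, activity type, attempt, etc.) to each record, which is
    # the kind of extra detail the deferred warning refactor is after.
    activity.logger.warning("payload size is approaching the warning limit")
    return "done"
```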
Deferred
Work that has been deferred to later PRs (unless requested to pull back in):
- Use a new `WorkflowTaskFailedCause` to indicate the specific cause of the failure scenario. Pending "Add a new workflow failure cause for oversized payloads" (api#697), integration into sdk-core, and sdk-core into sdk-python.
- Use a context-aware logger in `_validate_payload_limits` to get rich information about the execution context when issuing a warning.
- Refactor `_WorkflowWorker::_handle_activation` to get rich information about the execution context when issuing a warning on exceeding the payload error limit.
- Get payload error limits from the server to be used as defaults for workers.

Why?
Users need to know when payload sizes are approaching or have exceeded size limits. This will help prevent workflow outages and inform users to adjust their workflows to make use of alternate storage methods or to break down their payloads more granularly.
Checklist
Closes "Warn if the SDK tried to send a payload above a specific size" #1284
Closes "SDK should fail workflow task if payloads size is known to be too large" #1285
How was this tested: Unit tests
Any docs updates needed? Yes