feat: compute SHA256 ETag in cache backend writers #329
Add a reusable etagWriter that computes a SHA256 hash of written content and sets a quoted hex-encoded ETag header. Each backend (disk, memory, S3) hashes data in Write() and sets the ETag on Close(). S3 uses a post-upload CopyObject metadata-replace since headers are serialised before content is fully written.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6930f87239
        ReplaceMetadata: true,
        Expires:         expiresAt,
    }
    if _, err := s.client.CopyObject(ctx, dst, src); err != nil {
Avoid CopyObject for large S3 cache entries
When the S3 backend stores an object larger than 5 GB, this post-upload CopyObject call has to copy the object to itself just to add the computed ETag. AWS S3's single CopyObject API rejects objects over 5 GB (multipart copy is required), while the PutObject path above can upload large streaming/multipart objects, so Close now returns an error after the object has already been uploaded and leaves the cache entry in S3 without the computed headers. Set the ETag before upload or use a multipart-copy/update path for large objects.
    }
    userMetadata["Headers"] = string(headersJSON)

    src := minio.CopySrcOptions{Bucket: s.config.Bucket, Object: objectName}
Pin the metadata copy to this upload
When two writers for the same key close concurrently, this source points at whatever object is current at objectName, not necessarily the object this writer just uploaded. If another PutObject wins the key between this writer's upload and this copy-to-self, this copy can preserve the other writer's bytes while replacing its metadata with this writer's SHA256 ETag/TTL, so cached hits can return an ETag that does not match the body. Capture the uploaded version/ETag and use it as a source precondition, or avoid the second copy.