🐛 Serve current snapshot when fetching by timestamp after latest op #709
At the moment, `fetchSnapshotByTimestamp` only fetches the current snapshot directly when the requested timestamp is `null`. For any other timestamp, it rebuilds the snapshot from the milestone snapshot plus ops.

This means that if the requested timestamp is after the document's latest op (i.e. after the current snapshot's `mtime`), and the ops have since been deleted or TTLed away, the fetch fails with a "Missing ops" error, even though the current snapshot is intact and is exactly the snapshot we should be serving.

This change always fetches the current snapshot first, and serves it directly when the requested timestamp is after its `mtime`. As well as fixing the missing-ops case, this avoids replaying ops whenever the timestamp is newer than the current version, at the cost of one extra snapshot lookup in the cases that still need ops.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
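To make the control flow concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not ShareDB's implementation: the `store` object, its `milestone`/`current`/`ops` fields, the numeric `delta` ops and the `fetchSnapshotByTimestampSketch` name are all hypothetical stand-ins for the real DB adapter and OT types, and op timestamps are assumed to increase with version. Only the `shouldGetLatestSnapshot` condition is taken from this change's diff.

```javascript
// HYPOTHETICAL data model for illustration, not ShareDB's DB adapter API:
//   store.milestone: { v, data }               starting point for op replay
//   store.current:   { v, data, m: { mtime } } latest snapshot with metadata
//   store.ops:       [{ v, ts, delta }]        op at version v adds `delta` to numeric data
function fetchSnapshotByTimestampSketch(store, timestamp, callback) {
  // New behaviour: always look at the current snapshot first.
  var current = store.current;
  var mtime = current.m && current.m.mtime;
  var shouldGetLatestSnapshot = timestamp === null || (mtime != null && timestamp > mtime);
  if (shouldGetLatestSnapshot) {
    // Serve the current snapshot directly; no ops are read, so TTLed ops don't matter.
    // (This sketch returns a copy with `m: null` instead of mutating the stored snapshot.)
    return callback(null, { v: current.v, data: current.data, m: null });
  }

  // Unchanged path: replay ops on top of the milestone until we pass the timestamp.
  var opsByVersion = {};
  store.ops.forEach(function (op) { opsByVersion[op.v] = op; });
  var snapshot = { v: store.milestone.v, data: store.milestone.data, m: null };
  for (var v = snapshot.v; v < current.v; v++) {
    var op = opsByVersion[v];
    // A gap means the op was deleted or TTLed, so the replay cannot continue.
    if (!op) return callback(new Error('Missing ops'));
    // Ops are assumed to be in timestamp order, so stop at the first one after the requested time.
    if (op.ts > timestamp) break;
    snapshot.data += op.delta;
    snapshot.v = v + 1;
  }
  callback(null, snapshot);
}

// Usage: every op has been TTLed away, but the current snapshot is intact.
var store = {
  milestone: { v: 0, data: 0 },
  current: { v: 3, data: 6, m: { mtime: 300 } },
  ops: []
};
fetchSnapshotByTimestampSketch(store, 400, function (err, snapshot) {
  console.log(err, snapshot); // null { v: 3, data: 6, m: null }
});
fetchSnapshotByTimestampSketch(store, 150, function (err) {
  console.log(err.message); // Missing ops
});
```

With the ops gone, a timestamp after `mtime` is now served from the current snapshot, while an older timestamp still fails, since there is genuinely nothing left to replay.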
Build will be fixed in share/sharedb-mongo#171
```js
var mtime = currentSnapshot.m && currentSnapshot.m.mtime;
var shouldGetLatestSnapshot = timestamp === null || (mtime != null && timestamp > mtime);
if (shouldGetLatestSnapshot) {
  // Strip the metadata that we only fetched in order to compare the mtime,
  // so that the returned snapshot is consistent with the op-replayed path.
  currentSnapshot.m = null;
  return callback(null, currentSnapshot);
}
```
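The predicate in this diff has two subtleties worth spelling out. Because it uses strict `>`, a timestamp exactly equal to `mtime` still takes the op-replay path. And the `mtime != null` guard matters: if the snapshot comes back with `m: null`, then `mtime` evaluates to `null`, and JavaScript coerces `null` to `0` in a relational comparison, so `timestamp > null` would be true for any positive timestamp. The helper names below are hypothetical; the body of `shouldGetLatestSnapshot` mirrors the diff, and the unguarded variant exists only for comparison.

```javascript
// HYPOTHETICAL helper isolating the predicate from the diff.
function shouldGetLatestSnapshot(timestamp, currentSnapshot) {
  var mtime = currentSnapshot.m && currentSnapshot.m.mtime;
  return timestamp === null || (mtime != null && timestamp > mtime);
}

// The same check without the `mtime != null` guard, for comparison only.
function shouldGetLatestSnapshotUnguarded(timestamp, currentSnapshot) {
  var mtime = currentSnapshot.m && currentSnapshot.m.mtime;
  return timestamp === null || timestamp > mtime;
}

console.log(shouldGetLatestSnapshot(null, { m: null }));           // true: null always means "latest"
console.log(shouldGetLatestSnapshot(500, { m: { mtime: 300 } }));  // true: after the latest op
console.log(shouldGetLatestSnapshot(300, { m: { mtime: 300 } }));  // false: equal timestamps still replay ops
console.log(shouldGetLatestSnapshot(500, { m: null }));            // false: no mtime, fall back to replay
console.log(shouldGetLatestSnapshotUnguarded(500, { m: null }));   // true: `500 > null` coerces null to 0
```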
Could this be a performance hit? It looks like we now do an extra round trip (getSnapshot for the current version) on every timestamp request, including the cases where we still end up replaying ops.
I guess it depends on what the expected access pattern is. My intuition (numbers pulled from a hat 🎩) is that ~90% of timestamp requests are for an older point in time, where we'll want to fetch and replay the older ops anyway. For those, we now pay for the current-snapshot fetch on top of the op replay, with no benefit.
That said, I don't have a better alternative for the case this is solving (rebuilding when older ops have been TTLed), so this might just be the necessary trade-off. Mostly flagging it to check the assumption: do we have a sense of how often the requested timestamp is actually after the current snapshot's `mtime`?
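The trade-off can be put as a rough cost model. Purely as assumptions for illustration: let $p$ be the fraction of non-null timestamp requests that are after the current snapshot's `mtime`, and let $R$ be the number of database round trips the op-replay path costs (milestone snapshot plus ops; the real number depends on the adapter). Counting only round trips per request:

```latex
\begin{aligned}
E_{\text{old}} &= R \\
E_{\text{new}} &= 1 + (1 - p)\,R \\
E_{\text{new}} - E_{\text{old}} &= 1 - pR
\end{aligned}
```

So the new path is cheaper on average only when $p > 1/R$. With an assumed $R = 2$ and the 🎩 estimate of $p = 0.1$, that is $1 + 0.9 \times 2 = 2.8$ round trips versus 2. The model ignores the CPU time spent replaying ops, which the direct path also skips, so it understates the benefit in the requests that are served directly.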
Yes, this is the trade-off we're making. I think it's basically impossible to get numbers on this, since we have no idea how other consumers are using it.
My feeling is that because this is a historic snapshot fetch:
- You probably don't care too much about speed (fetching arbitrary numbers of ops and replaying them is already quite a slow path)
- The current-snapshot fetch should be pretty well optimized, and we already do it on every op submission
If we wanted to be super conservative about this change, I guess I could hide it behind an opt-in flag that would leave existing performance untouched but let users fetch snapshots in projects where ops are TTLed. We've done that in the past with sharedb-mongo and strict op linking. The downside of this approach is that ShareDB won't work quite as smoothly out of the box, and consumers will have to rummage through the documentation to find this flag, which doesn't feel like a great developer experience to me.
We discussed this over a call and we'll go ahead and release without a flag: this is a bugfix, and performance may get better or worse depending on the use case.
If you're a consumer and this change impacts you badly, please raise an issue with your use case and we can add a flag for this (or improve it in some other way).