[DSC] Add a Strings table to the shared cache triage view#8277
Conversation
The table displays strings from every image in the shared cache. Double-clicking on a string loads its corresponding image or region and navigates to it. The entire shared cache is scanned asynchronously on the worker pool to discover strings when the Strings tab is first selected. Table filtering and sorting is performed on background threads due to the sheer amount of data this table contains (over 15 millions of rows). Data is discarded shortly after the strings table is hidden.
This gives it asynchronous loading, filtering, and sorting, along with lazy loading and releasing the table data after the view is hidden.
fuzyll
left a comment
There was a problem hiding this comment.
Lots of questions and a few potential bugs. Ironically, the two things I am most concerned about aren't even in these code changes and are just things I saw when scrolling around and reading. 🙃
Good work regardless, thanks for taking this on. 🙂
| size_t comparisons = 0; | ||
| try | ||
| { | ||
| std::sort(display.begin() + chunk.first, display.begin() + chunk.second, guarded(comparisons)); |
There was a problem hiding this comment.
Would it be possible to have non-deterministic ordering here? If I'm following this correctly, results will arrive in the order in which workers complete. Since there's no tie-break on address/length/something else, wouldn't this have ties ordered by worker completion?
(Even if I'm right, I don't know how often this would occur, whether this would be noticeable, etc. Just an observation.)
| // leaves the shared result empty, which commitSortJob never reads. | ||
| auto future = QtConcurrent::run( | ||
| [this, filterGeneration, sortGeneration, comparator, result, display = m_display]() mutable { | ||
| if (auto sorted = parallelSort(std::move(display), m_rows, comparator, filterGeneration, |
There was a problem hiding this comment.
Because parallelSort uses blockingMap, if we're thread-starved, I think this could wind up in a state where this run would occupy one thread and then not be able to start the inner thread for parallelSort? Does Qt handle this situation for us? I genuinely don't know how QtConcurrent works and https://doc.qt.io/qt-6/qtconcurrentmap.html wasn't super helpful when trying to figure this out.
| const uint64_t filterGeneration = m_filterGeneration.load(); | ||
| PredicateFactory factory = m_filterFactory; | ||
|
|
||
| constexpr size_t chunkSize = 1024 * 1024; |
There was a problem hiding this comment.
Should this be defined somewhere more visible/configurable?
| raw_length: int | ||
| text: str | ||
| region_start: int | ||
| image_start: int |
There was a problem hiding this comment.
In sharedcache.cpp, you state that "A zeroed imageStart means the region is not associated with an image". Should this detail be surfaced to Python as well in some capacity (even if just a doc comment)?
| case Utf16String: | ||
| { | ||
| char* converted = BNUnicodeUTF16ToUTF8(data, std::min(ref.length, maxLength * 2)); | ||
| std::string text(converted); |
There was a problem hiding this comment.
If we fail to convert here, we'd try to construct a std::string from a nullptr. This seems like it might only happen if we fail to allocate the string? Which, is probably not likely to happen, but we still might want to check.
This caused me to feel like a fraud because it turns out I have no idea what actually happens if you try and construct a string from nullptr. Fortunately, it appears C++ also had no idea what should happen, but C++23 appears to be fixing this.
| if (!region.has_value()) | ||
| return; | ||
|
|
||
| QPointer<QFutureWatcher<bool>> watcher = new QFutureWatcher<bool>(this); |
There was a problem hiding this comment.
Do we leak this watcher here?
| } | ||
|
|
||
| dialog->exec(); | ||
| promptToLoadImage(image->name, string.address, string.address); |
There was a problem hiding this comment.
As far as I can tell, this works just fine. But, I was surprised it wasn't a bug. I think this could be clearer/more correct if we passed *string.imageStart or image->headerAddress since we're loading the image.
| { | ||
| case Qt::DisplayRole: | ||
| { | ||
| const auto& string = stringAt(index.row()); |
There was a problem hiding this comment.
Is checking .isValid() enough to prevent issues here? https://doc.qt.io/qt-6/qmodelindex.html#isValid suggests this just means it's non-negative, not that it actually fits within the correct upper bound. I don't know that this could ever crash, but it might be possible to return wrong results if the model is ever stale?
| { | ||
| auto symbol = symbolAt(index.row()); | ||
| auto symbolType = GetSymbolTypeAsString(symbol.type); | ||
| const auto& symbol = symbolAt(index.row()); |
There was a problem hiding this comment.
Same comment here regarding .isValid.
| QString ImageNameLookup::widestImageName() const | ||
| { | ||
| QString widest; | ||
| for (const auto& [_, name] : m_state->imageNames) |
There was a problem hiding this comment.
The image column can display region names for non-image rows, but we only consider image names for sizing. In practice, this is probably fine..?
The table displays strings from every image in the shared cache. Double-clicking on a string loads its corresponding image or region and navigates to it.
The entire shared cache is scanned asynchronously on the worker pool to discover strings when the Strings tab is first selected. Table filtering and sorting is performed on background threads due to the sheer amount of data this table contains (over 15 millions of rows). Data is discarded shortly after the strings table is hidden.
This change introduces a
BackgroundSortFilterRowshelper class that handles sorting and filtering of flat UI models on background threads, and aTriageTablePanelthat encapsulates a lot of the logic that is common across DSC's different panes. These are both used by the new strings table, and I updated the DSC's symbols table to use them as well.