Avoid relying on a hashtable iteration order. #3960

chandlerc · 2024-05-20T01:17:04Z

The toolchain was iterating a map of package name to import list in a couple of places to do things beyond counting or other order-invariant operations. The result was that the specific hashtable iteration order influenced the order of SemIR generated (and other behaviors like diagnostic emission I suspect, but the source location sorting probably hid this). Switching to a hashtable implementation with any order seeding that changes run-to-run immediately shows the SemIR order fluctuating without this.

This PR fixes that by instead accumulating the package imports data in a vector and using a map to vector indices. This is also slightly more efficient, although that seems unlikely to be an important factor here.

There was only one insertion point so I've just hand coded the management of the indices and map, but happy to take a different approach or use an abstraction here if desired.

One awkward aspect of this is that some of the loops need access to the package name as well. I've just added storage for that as these seem unlikely to be huge arrays of 10s of 1000s of imported packages, so the double storage of the identifier ID seems likely OK. But again, happy to take a different approach here if desired. It also seems like it might be possible to work out the identifier from the node, but I kept the patch more direct for simplicity.

I've also not used the LLVM MapVector abstraction of this pattern. This was mostly to avoid adding another layer of abstractions to our data structures, and because this is the first time we've hit this really. My experience is also that it is reasonably often that there is a more efficient way to orient the vector and map than what is automatically provided. But that may just be my experience.

The toolchain was iterating a map of package name to import list in a couple of places to do things beyond counting or other order-invariant operations. The result was that the specific hashtable iteration order influenced the order of SemIR generated (and other behaviors like diagnostic emission I suspect, but the source location sorting probably hid this). Switching to a hashtable implementation with any order seeding that changes run-to-run immediately shows the SemIR order fluctuating without this. This PR fixes that by instead accumulating the package imports data in a vector and using a map to vector indices. This is also slightly more efficient, although that seems unlikely to be an important factor here. There was only one insertion point so I've just hand coded the management of the indices and map, but happy to take a different approach or use an abstraction here if desired. One awkward aspect of this is that some of the loops need access to the package name as well. I've just added storage for that as these seem unlikely to be huge arrays of 10s of 1000s of imported packages, so the double storage of the identifier ID seems likely OK. But again, happy to take a different approach here if desired. It also seems like it might be possible to work out the identifier from the node, but I kept the patch more direct for simplicity. I've also not used the LLVM `MapVector` abstraction of this pattern. This was mostly to avoid adding another layer of abstractions to our data structures, and because this is the first time we've hit this really. My experience is also that it is reasonably often that there is a more efficient way to orient the vector and map than what is automatically provided. But that may just be my experience.

jonmeow

Sorry, thanks!

The toolchain was iterating a map of package name to import list in a couple of places to do things beyond counting or other order-invariant operations. The result was that the specific hashtable iteration order influenced the order of SemIR generated (and other behaviors like diagnostic emission I suspect, but the source location sorting probably hid this). Switching to a hashtable implementation with any order seeding that changes run-to-run immediately shows the SemIR order fluctuating without this. This PR fixes that by instead accumulating the package imports data in a vector and using a map to vector indices. This is also slightly more efficient, although that seems unlikely to be an important factor here. There was only one insertion point so I've just hand coded the management of the indices and map, but happy to take a different approach or use an abstraction here if desired. One awkward aspect of this is that some of the loops need access to the package name as well. I've just added storage for that as these seem unlikely to be huge arrays of 10s of 1000s of imported packages, so the double storage of the identifier ID seems likely OK. But again, happy to take a different approach here if desired. It also seems like it might be possible to work out the identifier from the node, but I kept the patch more direct for simplicity. I've also not used the LLVM `MapVector` abstraction of this pattern. This was mostly to avoid adding another layer of abstractions to our data structures, and because this is the first time we've hit this really. My experience is also that it is reasonably often that there is a more efficient way to orient the vector and map than what is automatically provided. But that may just be my experience.

github-actions bot requested a review from jonmeow May 20, 2024 01:17

github-actions bot added the toolchain label May 20, 2024

jonmeow approved these changes May 20, 2024

View reviewed changes

jonmeow added this pull request to the merge queue May 20, 2024

Merged via the queue into carbon-language:trunk with commit bc370a7 May 20, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid relying on a hashtable iteration order. #3960

Avoid relying on a hashtable iteration order. #3960

chandlerc commented May 20, 2024

jonmeow left a comment

Avoid relying on a hashtable iteration order. #3960

Avoid relying on a hashtable iteration order. #3960

Conversation

chandlerc commented May 20, 2024

jonmeow left a comment

Choose a reason for hiding this comment