Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid relying on a hashtable iteration order. #3960

Merged
merged 1 commit into from
May 20, 2024

Conversation

chandlerc
Copy link
Contributor

The toolchain was iterating a map of package name to import list in a couple of places to do things beyond counting or other order-invariant operations. The result was that the specific hashtable iteration order influenced the order of SemIR generated (and other behaviors like diagnostic emission I suspect, but the source location sorting probably hid this). Switching to a hashtable implementation with any order seeding that changes run-to-run immediately shows the SemIR order fluctuating without this.

This PR fixes that by instead accumulating the package imports data in a vector and using a map to vector indices. This is also slightly more efficient, although that seems unlikely to be an important factor here.

There was only one insertion point so I've just hand coded the management of the indices and map, but happy to take a different approach or use an abstraction here if desired.

One awkward aspect of this is that some of the loops need access to the package name as well. I've just added storage for that as these seem unlikely to be huge arrays of 10s of 1000s of imported packages, so the double storage of the identifier ID seems likely OK. But again, happy to take a different approach here if desired. It also seems like it might be possible to work out the identifier from the node, but I kept the patch more direct for simplicity.

I've also not used the LLVM MapVector abstraction of this pattern. This was mostly to avoid adding another layer of abstractions to our data structures, and because this is the first time we've hit this really. My experience is also that it is reasonably often that there is a more efficient way to orient the vector and map than what is automatically provided. But that may just be my experience.

The toolchain was iterating a map of package name to import list in
a couple of places to do things beyond counting or other order-invariant
operations. The result was that the specific hashtable iteration order
influenced the order of SemIR generated (and other behaviors like
diagnostic emission I suspect, but the source location sorting probably
hid this). Switching to a hashtable implementation with any order
seeding that changes run-to-run immediately shows the SemIR order
fluctuating without this.

This PR fixes that by instead accumulating the package imports data in
a vector and using a map to vector indices. This is also slightly more
efficient, although that seems unlikely to be an important factor here.

There was only one insertion point so I've just hand coded the
management of the indices and map, but happy to take a different
approach or use an abstraction here if desired.

One awkward aspect of this is that some of the loops need access to the
package name as well. I've just added storage for that as these seem
unlikely to be huge arrays of 10s of 1000s of imported packages, so the
double storage of the identifier ID seems likely OK. But again, happy to
take a different approach here if desired. It also seems like it might
be possible to work out the identifier from the node, but I kept the
patch more direct for simplicity.

I've also not used the LLVM `MapVector` abstraction of this pattern.
This was mostly to avoid adding another layer of abstractions to our
data structures, and because this is the first time we've hit this
really. My experience is also that it is reasonably often that there is
a more efficient way to orient the vector and map than what is
automatically provided. But that may just be my experience.
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, thanks!

@jonmeow jonmeow added this pull request to the merge queue May 20, 2024
Merged via the queue into carbon-language:trunk with commit bc370a7 May 20, 2024
9 checks passed
CJ-Johnson pushed a commit to CJ-Johnson/carbon-lang that referenced this pull request May 23, 2024
The toolchain was iterating a map of package name to import list in a
couple of places to do things beyond counting or other order-invariant
operations. The result was that the specific hashtable iteration order
influenced the order of SemIR generated (and other behaviors like
diagnostic emission I suspect, but the source location sorting probably
hid this). Switching to a hashtable implementation with any order
seeding that changes run-to-run immediately shows the SemIR order
fluctuating without this.

This PR fixes that by instead accumulating the package imports data in a
vector and using a map to vector indices. This is also slightly more
efficient, although that seems unlikely to be an important factor here.

There was only one insertion point so I've just hand coded the
management of the indices and map, but happy to take a different
approach or use an abstraction here if desired.

One awkward aspect of this is that some of the loops need access to the
package name as well. I've just added storage for that as these seem
unlikely to be huge arrays of 10s of 1000s of imported packages, so the
double storage of the identifier ID seems likely OK. But again, happy to
take a different approach here if desired. It also seems like it might
be possible to work out the identifier from the node, but I kept the
patch more direct for simplicity.

I've also not used the LLVM `MapVector` abstraction of this pattern.
This was mostly to avoid adding another layer of abstractions to our
data structures, and because this is the first time we've hit this
really. My experience is also that it is reasonably often that there is
a more efficient way to orient the vector and map than what is
automatically provided. But that may just be my experience.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants