Add feature to drop last n keyframes for delta_timestamps #129

Closed · wants to merge 3 commits

Conversation

@alexander-soare (Collaborator) commented on May 3, 2024

What does this PR do?

Fixes a discrepancy between data loading in the original diffusion policy work and LeRobot.

The functionality is summarized in diffusion.yaml. I will copy it here verbatim for reference:

  # For each training batch we want (consider n_obs_steps=2, horizon=16):
  # t           | -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14
  # action      |  a,  a,  a,  a,  a,  a,  a,  a,  a,  a,  a,   a,  a,  a,  a,  a
  # observation |  o,  o,   ,   ,   ,   ,   ,   ,   ,   ,   ,    ,   ,   ,   ,
  # Note that at rollout we only use some of the actions (consider n_action_steps=8):
  # action used |   ,  a,  a,  a,  a,  a,  a,  a,  a,   ,   ,    ,   ,   ,   ,
  delta_timestamps:
    observation.image: "[i / ${fps} for i in range(1 - ${policy.n_obs_steps}, 1)]"
    observation.state: "[i / ${fps} for i in range(1 - ${policy.n_obs_steps}, 1)]"
    action: "[i / ${fps} for i in range(1 - ${policy.n_obs_steps}, 1 - ${policy.n_obs_steps} + ${policy.horizon})]"

  # The original implementation doesn't sample keyframes for the last 7 steps. This is because, as described
  # above, the last 7 actions from the diffusion model are not used.
  n_end_keyframes_dropped: ${policy.horizon} - ${policy.n_action_steps} - ${policy.n_obs_steps} + 1
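
To make the numbers concrete, here is a small standalone sketch of what these expressions evaluate to. This is illustrative only: fps=10 is an assumed value, while horizon, n_obs_steps, and n_action_steps are the example values from the comment above.

    # Illustrative only: evaluates the delta_timestamps expressions above,
    # assuming fps=10 and the example values n_obs_steps=2, horizon=16, n_action_steps=8.
    fps = 10
    n_obs_steps = 2
    horizon = 16
    n_action_steps = 8

    observation_deltas = [i / fps for i in range(1 - n_obs_steps, 1)]
    # -> [-0.1, 0.0]: the current frame plus one frame of history.

    action_deltas = [i / fps for i in range(1 - n_obs_steps, 1 - n_obs_steps + horizon)]
    # -> [-0.1, 0.0, 0.1, ..., 1.4]: 16 action timestamps starting one frame in the past.

    n_end_keyframes_dropped = horizon - n_action_steps - n_obs_steps + 1
    # -> 7: the "last 7 steps" referred to in the comment above.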

How was it tested?

This change improves eval metrics slightly (blue = new).

[image: evaluation metric plots comparing the two runs (blue = new)]

I also ran a whole epoch of data loading with LeRobot and the original diffusion policy code side by side, checking batch equality with inter-process communication.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.



@alexander-soare force-pushed the alexander-soare/add_drop_last_keyframes branch from 5ac046b to b699a2f on May 5, 2024 17:50
@Cadene added the ✨ Enhancement (new feature or request) and 🧠 Policies (something policies-related) labels on May 27, 2024
The review comments below are attached to this hunk from the diff (the proposed index-dropping logic in LeRobotDataset):

    # We drop those indices pertaining to the last n frames of each episode.
    self.index = []
    for from_ix, to_ix in zip(*self.episode_data_index.values(), strict=True):
        self.index.extend(list(range(from_ix, to_ix - n_end_keyframes_dropped)))
@radekosmulski (Contributor) commented:

This will leave self.hf_dataset["index"] and self.hf_dataset["episode_index"] in a state that is inconsistent with self.index and self.episode_data_index.

One problem is that we are duplicating information that already exists on the hf_dataset, namely hf_dataset["index"], so that we can mutate it. So instead of LeRobotDataset being a lightweight wrapper around an hf_dataset, we now introduce new intermediate state (self.index) that maintainers and users will have to know about and keep in mind.

Another problem: what should self.episode_data_index now reflect? It holds info for the data in hf_dataset but not for self.index, which can be confusing.

And the dependencies on this newly introduced state cascade down: num_samples no longer depends directly on self.hf_dataset, and so on.

So the intentions here are very good -- we want to make life better for users (so that they can train better models), and in some sense this implementation adds the new functionality in as straightforward a way as possible. But the side effect is that the complexity of LeRobotDataset grows quite significantly and it is no longer clear what depends on what. This can lead to subtle bugs and will make extending the functionality much harder down the road. It will also make it harder for newcomers to the codebase to understand how things work -- for instance, what is the difference between self.hf_dataset["index"] and self.index, and under what conditions do they differ? I know the answers to these questions, but only because I have spent a good bit of time with this code; for new folks it could be quite challenging.

One potential way of tackling this might be to attack it via sampling -- in the same way as we have drop_last on the pytorch DataLoader. We could have a custom sampler we could pass to the dataloader that would drop the last n examples from each episode based on the episode_data_index.

The advantage here is that we would not be adding complexity to LeRobotDataset. It would remain truer to its original intention: not a way to modify data, but a wrapper that adds to a HuggingFace dataset in order to cater to the needs of training policies for controlling robots. The dependency would be one way -- LeRobotDataset would depend on the hf_dataset it gets passed, but not the other way around. For all intents and purposes, the data contained in the hf_dataset would remain immutable: we would neither modify it nor create state standing in for aspects of it. Our users would always know where the single source of truth is.

The full functionality would also be localized to a Sampler. If users want to avail themselves of this functionality they would need to know that the Sampler exists (maybe a good place to introduce it would be as an example in the docstring for DiffusionPolicy), but we would not be adding extra complexity to LeRobotDataset -- arguably we would be fighting complexity in the codebase by reducing coupling and doing something like dependency injection at the level of the dataloader. A rough sketch of what such a sampler could look like follows below.
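
Purely as a sketch (the class name here is made up, and I'm assuming episode_data_index holds per-episode from/to index boundaries as in the hunk above), such a sampler could look something like this:

    import torch
    from torch.utils.data import Sampler


    class EpisodeAwareSampler(Sampler):
        """Sketch: sample dataset indices while skipping the last `n_end_keyframes_dropped`
        frames of every episode, leaving the dataset itself untouched."""

        def __init__(self, episode_data_index: dict, n_end_keyframes_dropped: int = 0, shuffle: bool = True):
            # episode_data_index is assumed to map "from"/"to" to per-episode boundary tensors.
            self.indices = []
            for from_ix, to_ix in zip(*episode_data_index.values(), strict=True):
                self.indices.extend(range(int(from_ix), int(to_ix) - n_end_keyframes_dropped))
            self.shuffle = shuffle

        def __iter__(self):
            if self.shuffle:
                yield from (self.indices[i] for i in torch.randperm(len(self.indices)).tolist())
            else:
                yield from iter(self.indices)

        def __len__(self):
            return len(self.indices)

    # Usage sketch: the dataset stays as-is, only the dataloader changes.
    # sampler = EpisodeAwareSampler(dataset.episode_data_index, n_end_keyframes_dropped=7)
    # dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=sampler)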

Sorry, these are just some loose thoughts, though I understand that this problem is not easy to solve. But maybe some of this might be helpful.

I'll keep thinking about this, and if anything comes to mind I will comment again in this PR 🙂

@radekosmulski (Contributor) commented:

Just for reference, here is a SubsetRandomSampler from pytorch.

It might be nice to inherit from it and instantiate it with episode_data_index and n_end_keyframes_dropped. A minimal implementation might be a utils function that returns an index from a LeRobotDataset with the last n_end_keyframes_dropped frames of each episode excluded, as sketched below.
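
Something along these lines, for example (again just a rough sketch; the function name is made up):

    from torch.utils.data import DataLoader, SubsetRandomSampler


    def make_index_without_end_keyframes(episode_data_index: dict, n_end_keyframes_dropped: int) -> list[int]:
        """Sketch of a utils function: return all dataset indices except the last
        `n_end_keyframes_dropped` frames of each episode."""
        indices = []
        for from_ix, to_ix in zip(*episode_data_index.values(), strict=True):
            indices.extend(range(int(from_ix), int(to_ix) - n_end_keyframes_dropped))
        return indices

    # indices = make_index_without_end_keyframes(dataset.episode_data_index, n_end_keyframes_dropped=7)
    # dataloader = DataLoader(dataset, batch_size=64, sampler=SubsetRandomSampler(indices))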

Anyhow, sorry 🙂 Not sure if any of this is helpful, I might be completely off the mark with the intention behind this PR.

Either way, it is very good to know of this difference vs the original training on pusht, so thank you for pointing me to this PR, @alexander-soare! 🙌

@alexander-soare (Collaborator, Author) commented:

@radekosmulski I find your arguments very convincing :) Would you be interested in having a crack at the sampler approach by any chance? cc @Cadene who had this on his TODO list at some point but it got shoved back by other prios.

@radekosmulski (Contributor) commented:

sure thing @alexander-soare, will give it a go! 🙂

@alexander-soare marked this pull request as draft on May 30, 2024 08:09