[Question] Solving Pick-Cube from Pixels Only #667
Comments
Sorry, we have not tuned SAC at the moment, only PPO with some proprioception data + one RGB camera. There is some example code with state-based SAC; a simple vision-based one will come eventually. TD-MPC2 is already integrated and supports learning from pixels, does need much tuning. If there's a lot of value in testing algorithms with visual-only inputs we can try and help set it up in the future; we have some DM Control environments benchmarked with PPO with an option to use visual-only inputs. |
I see, thanks for letting me know! I think having some baselines of end-to-end pixel to action policies would be useful. I am currently using SAC for my project but may also try out other algos in the future. |
Is GPU parallelization important in your case? Or are you working more on, e.g., sample efficiency? I can have some members of the team look into tuning an RGB/RGBD SAC version. |
It's not important, but if it makes policy convergence faster I'm for GPU parallelization. Sample efficiency is not an issue atm. I appreciate you all looking into this! |
Hey! Just wanted to check in and see if this is in the pipeline and if so, if you guys have an expected release date on it. Thanks! |
Currently working on it! Fixing up the SAC state and RGBD implementations now. Will provide a baseline for PickCube and maybe a few other tasks. |
Ok @SumeetBatra new baseline uploaded. I only checked that it works for PushCube and PickCube from pixels. The suggested script to run
was tested and converged after about 1-1.5 hours on a 4090. The SAC code can run faster if I add torch compile/cudagraphs support and some shared memory optimization for observation storage, but that will be done in the future. [Video: 31.mp4] The tiny 64x64 image in each corner is what the policy sees. The policy also sees any relevant state information (like the goal position for the cube and agent joint positions). See the SAC baseline readme: https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac/README.md I'm sure the other tasks work fine with just the same hyperparameters as PickCube training if trained long enough and an appropriate controller is used. |
@StoneT2000 Thank you so much!! I'll take a look and follow up if I have any questions. |
@StoneT2000 I had a chance to look over the sac_rgbd baseline and it looks like state information is in there by default. Is it possible to solve the task without having any proprioceptive state information, just rgb(d) observations only? EDIT: For extra context, what I'm trying to avoid is needing a perception pipeline to estimate low dimensional state information when working on real hardware. If any state information is present, ideally it should come from somewhere else like inverse kinematics and not a noisy / brittle perception system. Now that I think about it, joint angles can come from IK, so maybe this solution avoids the need for a perception pipeline? I haven't worked with these systems before, so let me know if I'm misunderstanding something. |
Hi @SumeetBatra So generally when it comes to sim2real / real2sim or testing if something might work in the real world at all, the state data that is accessible and quite accurate is
By default we also give qvel values, but these require estimation and are harder to align between sim and real, so I would definitely just remove them (you don't usually need them to solve tasks, though they might help with sample efficiency at times). If you plan to do sim2real you will need to make modifications to the environments for transfer regardless. Unless stated otherwise, envs in ManiSkill are by default designed more for algorithm benchmarking. Also, learning from images only is quite difficult, although maybe not impossible assuming the goal information is somewhere in the images (for PickCube it is not, but for peg insertion side or StackCube it is in the perceived image data). It is best to always include the necessary goal information of the env, as well as qpos values and tcp poses if possible; otherwise learning is slower. |
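A minimal sketch of removing qvel from a state observation, assuming the observation is a (possibly nested) dictionary. The key layout below is a hypothetical example for illustration, not ManiSkill's actual observation structure, and `drop_keys` is not a ManiSkill function.

```python
from typing import Any, Dict, Tuple


def drop_keys(obs: Dict[str, Any], unwanted: Tuple[str, ...] = ("qvel",)) -> Dict[str, Any]:
    """Return a copy of a nested observation dict without the unwanted keys."""
    filtered = {}
    for key, value in obs.items():
        if key in unwanted:
            continue  # skip signals that are hard to align between sim and real
        # Recurse into sub-dictionaries, keep leaf values untouched.
        filtered[key] = drop_keys(value, unwanted) if isinstance(value, dict) else value
    return filtered


# Hypothetical observation layout (assumption, for illustration only).
obs = {
    "agent": {"qpos": [0.0, 0.1], "qvel": [0.0, 0.0]},
    "extra": {"goal_pos": [0.1, 0.0, 0.2], "tcp_pose": [0, 0, 0, 1, 0, 0, 0]},
}
filtered = drop_keys(obs)
```

The goal information, qpos, and tcp pose entries are kept, and the original dictionary is left unmodified, so the same filter can be applied at every step.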
This is really helpful, thanks! What kind of modifications are needed to facilitate sim2real transfer? I'm guessing DR in the form of state observation noise and maybe some physics randomization at a minimum? Anything else I'm missing? And is there some existing pipeline for facilitating sim2real transfer in the repo? FYI I'm not concerned with the sim2real perception gap atm, mostly with sim2real physics gap and unmodeled dynamics. |
Hard to say, our lab is still finishing up some basic reproducible sim2real experiments that we will have relatively ready to share in a month or two I think. It is led by @Xander-Hinrichsen at the moment, he can comment a bit more on his own real experiences. At minimum
Then you can easily train an RGB-based policy in sim and do direct deployment in the real world, mostly for simpler tasks with reaching/pushing/pulling type behaviors. Picking a cube is still kind of hard without more advanced tricks; @Xander-Hinrichsen and I are investigating how to make this as simple as possible without resorting to collecting real-world demonstrations or combining RL with imitation learning. |
@Xander-Hinrichsen Wonder if you could comment on what you found works and if you have a pipeline you can share! |
Yes, as Stone has commented, I plan to have my pipeline posted in about a month for the Koch v1.1 arm, and possibly the SO-100 arm if time permits. Both are "affordable" robot arms from lerobot, though the pipeline is built so it can be extended to arbitrary robot arms in the future. The process is intended to reflect ManiSkill's fidelity, so only simple randomizations are used, and little to nothing is done about the sim2real physics gap and unmodeled dynamics:
PickCube specific (for task example):
With these simple randomizations, I've had fairly successful policies zero shot for simple tasks like pickcube, and grab cube, using RGB and qpos observations only, and using the greenscreening overlay Stone mentioned earlier. Randomizations I have tried but found unnecessary thus far:
|
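The greenscreening overlay mentioned above is not spelled out in this thread. A minimal sketch of one way such an overlay could work, assuming a per-pixel boolean mask that marks which simulated pixels (e.g. robot and cube) to keep. The function name, the mask source, and the image sizes are illustrative assumptions, not ManiSkill API and not necessarily the pipeline described here.

```python
import numpy as np


def greenscreen_overlay(sim_rgb: np.ndarray, background_rgb: np.ndarray, keep_mask: np.ndarray) -> np.ndarray:
    """Keep simulated pixels where keep_mask is True, use the background elsewhere."""
    # keep_mask has shape (H, W); add a channel axis so it broadcasts over RGB.
    return np.where(keep_mask[..., None], sim_rgb, background_rgb)


H, W = 64, 64
sim_rgb = np.full((H, W, 3), 200, dtype=np.uint8)      # stand-in for a rendered frame
background_rgb = np.zeros((H, W, 3), dtype=np.uint8)   # stand-in for a real-world image
keep_mask = np.zeros((H, W), dtype=bool)
keep_mask[16:48, 16:48] = True                         # pretend the robot/cube occupy the center
composite = greenscreen_overlay(sim_rgb, background_rgb, keep_mask)
```

Swapping `background_rgb` for a different image at each episode would give a simple form of background randomization.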
Ok, this is good to know, thanks! I'm guessing you're also using pd_joint_target_delta_pos controller as Stone mentioned? |
Yes, and target_qpos along with the qpos are used within the observations as well, but not qvel |
Alright thanks, I'll give this a try as well |
@Xander-Hinrichsen quick question. How did you go about randomizing the cube sizes? I instantiate the cube object first and then, for each actor object, try to modify its rigid body and render body params (specifically half size), but this throws an error:
Is there another way I should be randomizing the cube half_size parameter for each env? EDIT: Could you also let me know what real-world image dataset you use for background randomizations? Thanks! |
A good example of per-scene randomization is in the _load_scene function of the peginsertionside task. |
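As a rough sketch of the per-scene idea: sample the random parameters first, then build one object per parallel environment with its own parameters, instead of modifying an object after it has been created. The code below only produces the per-environment specifications; the helper names, the dictionary fields, and the size range are hypothetical, and no ManiSkill calls are used.

```python
import random
from typing import Dict, List


def sample_half_sizes(num_envs: int, low: float = 0.015, high: float = 0.025, seed: int = 0) -> List[float]:
    """Sample one cube half size per parallel environment (range is an assumption)."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    return [rng.uniform(low, high) for _ in range(num_envs)]


def build_cube_specs(num_envs: int, seed: int = 0) -> List[Dict]:
    """Create one cube specification per environment, each tied to a single sub-scene."""
    specs = []
    for env_idx, half_size in enumerate(sample_half_sizes(num_envs, seed=seed)):
        specs.append({"name": f"cube-{env_idx}", "half_size": half_size, "scene_idx": env_idx})
    return specs


specs = build_cube_specs(num_envs=4, seed=0)
```

Each spec would then be handed to whatever actor-building code the task uses at scene load time, so every environment gets a cube of its own size from the start.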
Ok. Do you have a sim2real training script you could share, by any chance? It might be easier for me to see what's going on that way.
Hey! I wanted to see if you guys had any reference code / hyperparameters for SAC solving any of the tabletop tasks using RGB(D) data only and no proprioceptive state information. Thanks!