DINOv2 with registers backbone, transformer decoder, and classifier-free-guidance FiLM layers: training script
☘️ Shoot an email to sebastian@mbodi.ai if you'd like to tackle this issue and I'll help as often as I can. Can provide A100 access once script is ready.
Starter Code
Example Doing Identical task but with MaxViT
Resources
Highly-Recommended Guide to Follow
Transformer Head Code
DinoV2 Source Code
Text Guidance with Film
RT1: Robotics Transformers paper
Tokenize Actions (x, y, z, roll, pitch, yaw, grasp)
Transform pattern: (b f a) -> (b f a bins), bins=255, where f is frames and a is the action dimension
This is simple classification, not sequence-to-sequence modeling
Apply MinMax Scaler
Apply kbins
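A minimal sketch of the tokenization steps above, assuming NumPy and uniform-width bins (the issue only says "kbins", so the binning strategy is an assumption). The helper names `fit_min_max`, `tokenize_actions`, `one_hot`, and `detokenize_actions` are hypothetical and not taken from the starter code.

```python
import numpy as np

NUM_BINS = 255  # bins=255, from the transform pattern above


def fit_min_max(actions):
    """Per-dimension min and max over all batches and frames."""
    flat = actions.reshape(-1, actions.shape[-1])
    return flat.min(axis=0), flat.max(axis=0)


def tokenize_actions(actions, low, high, num_bins=NUM_BINS):
    """(b, f, a) continuous actions -> (b, f, a) integer bin ids."""
    span = np.where(high > low, high - low, 1.0)  # guard against zero range
    scaled = np.clip((actions - low) / span, 0.0, 1.0)  # min-max scale to [0, 1]
    # Uniform-width bins (assumption); the top edge is folded into the last bin.
    return np.minimum((scaled * num_bins).astype(np.int64), num_bins - 1)


def one_hot(tokens, num_bins=NUM_BINS):
    """(b, f, a) bin ids -> (b, f, a, bins) classification targets."""
    return np.eye(num_bins, dtype=np.float32)[tokens]


def detokenize_actions(tokens, low, high, num_bins=NUM_BINS):
    """Map bin ids back to bin-center values in the original action range."""
    centers = (tokens.astype(np.float64) + 0.5) / num_bins
    return centers * (high - low) + low


rng = np.random.default_rng(0)
# b=2, f=6, a=7 for (x, y, z, roll, pitch, yaw, grasp); random stand-in data
actions = rng.uniform(-1.0, 1.0, size=(2, 6, 7))
low, high = fit_min_max(actions)
tokens = tokenize_actions(actions, low, high)
targets = one_hot(tokens)
recovered = detokenize_actions(tokens, low, high)
```

Each action dimension becomes an independent 255-way classification target per frame, which is why no sequence-to-sequence decoding is needed. The round trip through `detokenize_actions` loses at most half a bin width per dimension.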
Apply FiLM layers from classifier-free-guidance
Inference pattern: (b f c h w), str -> (b f a bins)
Example Doing Identical task but with MaxViT
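A rough sketch of what a FiLM layer does to the backbone tokens, assuming NumPy for brevity; in the actual training script this would more likely be a PyTorch module, and the classifier-free-guidance package has its own implementation. The class below, its `cond_dim` and `feature_dim` parameters, and the zero initialization are illustrative assumptions, not the referenced library's API.

```python
import numpy as np


class FiLM:
    """Feature-wise linear modulation: scale and shift tokens from a conditioning vector."""

    def __init__(self, cond_dim, feature_dim):
        # Hypothetical parameters. Zero init makes the layer start as an
        # identity, so pretrained backbone features are untouched at step 0.
        self.w = np.zeros((cond_dim, 2 * feature_dim))
        self.b = np.zeros(2 * feature_dim)

    def __call__(self, x, cond):
        # x: (b, n, d) visual tokens; cond: (b, cond_dim) text embedding
        gamma, beta = np.split(cond @ self.w + self.b, 2, axis=-1)
        # Broadcast the per-sample scale and shift over the token axis.
        return x * (1.0 + gamma[:, None, :]) + beta[:, None, :]


rng = np.random.default_rng(0)
visual_tokens = rng.normal(size=(2, 5, 8))  # b=2, n=5 tokens, d=8 features
text_emb = rng.normal(size=(2, 4))          # b=2, cond_dim=4

film = FiLM(cond_dim=4, feature_dim=8)
identity_out = film(visual_tokens, text_emb)  # unchanged while weights are zero

film.w = rng.normal(scale=0.1, size=film.w.shape)  # stand-in for trained weights
modulated = film(visual_tokens, text_emb)
```

The text embedding only changes the scale and shift of each feature channel, so the token shape is preserved and the layer can be dropped between backbone blocks without altering the (b f a bins) output head.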
Details
Use the following losses:
Follow-On Work