#sudo R1: Teaching Robots to Act, Starting from Simulation Alone
Embodied AI has learned to think, and is beginning to act — but not yet reliably. Large language models plan multi-step tasks, parse complex instructions, and reason about the physical world. But manipulation remains fragile: no existing system can reliably grasp unfamiliar objects across the full diversity of real-world conditions. Until that changes, the vast economic promise of physical AI — in manufacturing, logistics, agriculture, and eldercare — stays locked behind a capability that no amount of high-level intelligence can substitute for.
We introduce #sudo R1, a fully integrated robot system with self-developed hardware and software, powered by a manipulation-centric foundation model focused on object picking — the gateway primitive of physical manipulation. Nearly every useful manipulation task begins with a pick, so we operate on a simple belief: if picking doesn’t work reliably across the long tail of real-world objects, then many downstream tasks remain out of reach.
Our key results:
Overall Success Rate: 100.00%
True production-grade performance remains ahead. Achieving any one of generalizability, agility, robustness, or spatial intelligence in isolation is already hard; achieving them simultaneously in a single policy is a fundamentally different challenge — and that is what #sudo R1 is built to pursue.
What #sudo R1 Delivers — and Why It’s Harder Than It Looks
Zero-Shot Generalization Across Diverse Objects with Near-Perfect Robustness
Real environments — warehouses, kitchens, factory floors — present an effectively open-ended distribution of objects that no model has seen before. #sudo R1 successfully picks diverse objects never encountered during training, spanning rigid and deformable, opaque and transparent, matte and reflective surfaces — including glass, soft fabric, reflective metal, and irregularly shaped items. A single model handles all of them, with no fine-tuning and no per-object adaptation.
Reliable deployment requires consistent performance regardless of visual conditions. We evaluated #sudo R1 under controlled lighting variations and used a TV screen behind the workspace to simulate a wide range of dynamic backgrounds. Across these conditions, pick success rates remained near-identical, with no environment-specific calibration or fine-tuning. This robustness stems from the massive visual randomization applied during simulation training: by exposing the policy to diverse lighting and background distributions in simulation, it learns to rely on the geometric and physical cues relevant to grasping rather than overfitting to any particular visual context.
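The randomization described above can be sketched as a per-episode sampler of visual parameters. This is an illustrative sketch only — the parameter names, ranges, and structure are assumptions for exposition, not #sudo R1's actual simulation configuration:

```python
import random

def sample_visual_randomization(rng: random.Random) -> dict:
    """Sample one episode's visual conditions (hypothetical ranges for illustration)."""
    return {
        "light_intensity": rng.uniform(0.2, 3.0),        # dim through harsh lighting
        "light_direction": [rng.uniform(-1, 1) for _ in range(3)],
        "background_texture_id": rng.randrange(10_000),  # drawn from a large texture pool
        "camera_jitter_deg": rng.uniform(0.0, 2.0),      # small extrinsic perturbation
    }

# Each training episode sees a freshly sampled visual world, so the policy
# cannot overfit to any one lighting or background context.
rng = random.Random(0)
episode_conditions = [sample_visual_randomization(rng) for _ in range(3)]
```

Because the grasp-relevant geometry stays fixed while appearance varies freely, the policy is pushed toward geometric and physical cues rather than visual shortcuts.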
True Closed-Loop Agility
#sudo R1 has a fully closed-loop policy in which every control step is conditioned on the robot’s latest observation. Compared with many existing VLA models, this design enables more frequent feedback during execution. Many prior approaches employ action chunking, where the model processes an observation, predicts a sequence of future actions, and executes that sequence before taking the next observation. For example, a system that nominally runs at 20 Hz with 20-step chunks effectively observes the environment only once per second during execution.
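The arithmetic behind the chunking example above is simple enough to state directly — with action chunking, a new observation is taken only once per chunk:

```python
def effective_observation_hz(control_hz: float, chunk_len: int) -> float:
    """Effective observation rate of an action-chunking executor:
    one fresh observation per chunk of `chunk_len` actions."""
    return control_hz / chunk_len

# The example from the text: 20 Hz control with 20-step chunks.
assert effective_observation_hz(20.0, 20) == 1.0   # one observation per second
# A fully closed-loop policy is the chunk_len == 1 case.
assert effective_observation_hz(20.0, 1) == 20.0   # observation at full control rate
```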
#sudo R1 makes every step observation-conditioned, adapting its control rate between 15 and 25 Hz as the situation demands. This is what enables behaviors that many action-chunking architectures struggle with: tracking a target object as it moves, recovering from a perturbation mid-grasp, and adapting the movement trajectory when the scene changes during execution. It is also what makes the system fast enough to operate at production-relevant speeds.
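The closed-loop structure described above can be sketched as a control loop in which every step reads the freshest observation before acting. The function names and interfaces here (`policy`, `get_observation`, `apply_action`) are hypothetical stand-ins, not #sudo R1's real APIs:

```python
import time

def closed_loop_run(policy, get_observation, apply_action, steps: int = 100) -> int:
    """Fully observation-conditioned control loop (illustrative sketch).

    `policy`, `get_observation`, and `apply_action` are hypothetical
    stand-ins: the policy maps the latest observation to an action and
    also chooses the next step's control rate.
    """
    observations_taken = 0
    for _ in range(steps):
        obs = get_observation()       # freshest camera frame + proprioception
        action, hz = policy(obs)      # policy picks a rate, e.g. in [15, 25] Hz
        apply_action(action)
        observations_taken += 1
        time.sleep(1.0 / hz)          # pace the next step at the chosen rate
    return observations_taken
```

The contrast with chunking is that a chunked executor would call `get_observation` once, then apply a whole sequence of actions blind; here every applied action was conditioned on an observation taken immediately before it.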
Spatial Intelligence
#sudo R1 adapts its trajectory when surrounding objects or structures constrain the feasible approach — navigating around obstacles, avoiding collisions, and exploiting available free space. This is not a separate collision-avoidance module layered on top; it is an integrated capability of the learned policy.
Why Simulation Is the Answer That Existing Systems Miss
The field has made significant progress on generalization, dexterity, robustness, and high-frequency control individually. Achieving them simultaneously is the open problem — and the binding constraint is data. Relying exclusively on real-world collection is too slow, too expensive, and too narrow: it cannot cover the full distribution of object variation, systematically construct adversarial conditions, or generate dense, obstacle-rich scenes at the scale that all four axes demand together. Simulation removes that constraint by scaling along all dimensions at once.
#sudo R1 is trained entirely on simulation data — no real-world demonstrations, no teleoperation, no manual labeling. The result is a system whose capability improves by generating more data, not just by scaling human labor — a fundamentally different scaling curve.
This is hard to do. The robotics community has long recognized simulation’s potential, but genuine zero-real-data transfer for contact-rich manipulation, at the reliability we report, requires closing every gap in the sim-to-real chain — physics fidelity, contact modeling, domain randomization, sensor simulation — simultaneously. These capabilities took years of dedicated engineering to build. We consider this a core differentiator, both in the performance it enables today and in the compounding cost and iteration speed advantages it creates over time.
Training Across the Full Distribution
#sudo R1 demonstrates that simulation alone — with no real-world demonstrations — can produce manipulation policies that approach production-grade reliability across generalizability, agility, robustness, and spatial intelligence simultaneously. We chose picking as the proving ground because it is the gateway primitive.
Picking is only the beginning. We are extending #sudo R1 to a growing set of skills — leveraging the same simulation-first paradigm that makes it possible to train across the full distribution of variation at scale.