DAIMON Robotics Unveils Daimon-Infinity: A Giant Leap in Robotic Touch
DAIMON Robotics, a Hong Kong-based startup, has released Daimon-Infinity, the world's largest omni-modal robotic dataset for physical AI. The initiative pairs high-resolution tactile sensing with data from over 80 real-world scenarios, aiming to give robot hands a true sense of touch. Backed by partners including Google DeepMind and Northwestern University, the company is open-sourcing 10,000 hours of data to accelerate embodied AI. Below, we explore the technology, vision, and implications behind this breakthrough.
What is the Daimon-Infinity dataset and why is it significant?
The Daimon-Infinity dataset is an unprecedented collection of robotic manipulation data that combines vision, language, action, and high-resolution tactile feedback. It encompasses million-hour-scale multimodal data drawn from 80+ real-world scenarios and 2,000+ human-demonstrated skills, ranging from folding laundry at home to assembly-line manufacturing. Its significance lies in providing a comprehensive foundation for training physical AI systems on delicate tasks that require a sense of touch, something previous datasets lacked. By open-sourcing 10,000 hours of this data, DAIMON enables researchers worldwide to develop robots that interact with objects more dexterously and safely, bridging the gap between simulation and real-world application.
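To make the dataset's structure concrete, the sketch below shows how one time-aligned vision-tactile-language-action sample might be represented. The field names, array shapes, and the `ManipulationSample` class are illustrative assumptions for this article, not the dataset's published schema.

```python
# A minimal sketch of what one Daimon-Infinity sample might look like.
# All field names and shapes are assumptions made for illustration; the
# released dataset defines its own schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class ManipulationSample:
    rgb: np.ndarray        # camera frames, e.g. (T, H, W, 3) uint8
    tactile: np.ndarray    # fingertip tactile images, e.g. (T, fingers, h, w)
    instruction: str       # natural-language task description
    actions: np.ndarray    # end-effector commands, e.g. (T, action_dim)
    scenario: str          # one of the 80+ real-world scenarios
    skill: str             # one of the 2,000+ demonstrated skills

def iterate_episode(samples: list[ManipulationSample]):
    """Yield aligned (vision, touch, language, action) tuples per timestep."""
    for s in samples:
        for t in range(len(s.actions)):
            yield s.rgb[t], s.tactile[t], s.instruction, s.actions[t]
```

Keeping the four modalities time-aligned per step is what would let a single recorded episode supervise visual and tactile policy learning at once.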

Who are the key partners and contributors to the Daimon-Infinity project?
The dataset initiative is a collaborative effort involving leading institutions and enterprises. Key partners include Google DeepMind, Northwestern University, and the National University of Singapore, which contribute expertise in AI, robotics, and tactile sensing. The collaboration extends across China and internationally, leveraging a distributed, out-of-lab collection network capable of generating millions of hours of data annually. This breadth of partners ensures the dataset captures a wide variety of manipulation tasks and environments, making it robust for training general-purpose robotic systems.
What is DAIMON Robotics' core technology in tactile sensing?
DAIMON's foundational technology is a monochromatic, vision-based tactile sensor that fits within a fingertip-sized module. Despite its small size, it packs over 110,000 effective sensing units (taxels), providing ultra-high-resolution touch feedback. This sensor is designed to capture fine-grained contact information such as pressure, texture, and slippage. The company combines this hardware with its distributed data collection network to generate vast amounts of tactile data. By open-sourcing part of the Daimon-Infinity dataset, they aim to make high-quality tactile data accessible to the broader robotics community, fostering innovation in dexterous manipulation.
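The article does not disclose DAIMON's internal algorithms, but a generic vision-based tactile pipeline gives a feel for how a monochrome taxel image can be turned into contact and slip signals. Everything below, including the hypothetical 330x340 sensor resolution (about 112,000 taxels) and the numeric thresholds, is an illustrative assumption.

```python
# Illustrative processing of a monochrome tactile image stream. This is a
# generic vision-based tactile pipeline, not DAIMON's proprietary algorithm;
# the resolution, thresholds, and slip heuristic are assumptions.
import numpy as np

def contact_map(frame: np.ndarray, reference: np.ndarray,
                threshold: float = 8.0) -> np.ndarray:
    """Per-taxel contact mask: deviation of the current frame from a
    no-contact reference image."""
    return np.abs(frame.astype(np.float32) - reference) > threshold

def slip_score(prev: np.ndarray, curr: np.ndarray) -> float:
    """Crude slip indicator: mean absolute change between consecutive
    tactile frames."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    return float(diff.mean())

# Demo with a hypothetical 330x340 monochrome sensor (~112,000 taxels).
reference = np.zeros((330, 340), dtype=np.uint8)
frame = reference.copy()
frame[100:140, 120:180] = 50          # simulated contact patch
print(contact_map(frame, reference).sum(), "taxels in contact")
```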
Who is Prof. Michael Yu Wang and what is his vision for robotic touch?
Prof. Michael Yu Wang is the co-founder and chief scientist of DAIMON Robotics. He earned his PhD at Carnegie Mellon University studying manipulation under Matt Mason, then founded the Robotics Institute at the Hong Kong University of Science and Technology. An IEEE Fellow and former Editor-in-Chief of IEEE Transactions on Automation Science and Engineering, he brings four decades of experience. His vision is to solve the problem of robot "insensitivity" in manipulation. He argues that current Vision-Language-Action (VLA) models ignore tactile feedback, limiting robots' ability to handle complex tasks. To change this, he pioneered the Vision-Tactile-Language-Action (VTLA) architecture, elevating touch to a modality as important as sight and language.

How does the VTLA architecture differ from traditional VLA models?
Traditional Vision-Language-Action (VLA) models rely primarily on visual and linguistic inputs to guide robotic actions. While effective for many tasks, they lack tactile feedback, which makes robots clumsy when handling deformable or fragile objects. Prof. Wang's Vision-Tactile-Language-Action (VTLA) architecture adds tactile sensing as a core modality on par with vision and language, allowing robots to process touch signals such as pressure, texture, and slip in real time alongside visual input and verbal commands. The result is a more nuanced understanding of physical interaction, enabling robots to perform delicate operations like folding laundry or assembling small parts without crushing or dropping items.
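As a rough illustration of what elevating touch to a core modality can mean architecturally, the PyTorch sketch below fuses vision, tactile, and language tokens in one transformer before decoding an action. It is a generic VTLA-style policy under assumed feature sizes and layer counts, not DAIMON's published model.

```python
# Generic sketch of a Vision-Tactile-Language-Action policy. Feature
# dimensions, depth, and pooling are assumptions for illustration only.
import torch
import torch.nn as nn

class VTLAPolicy(nn.Module):
    def __init__(self, d_model=256, action_dim=7):
        super().__init__()
        self.vision_proj = nn.Linear(512, d_model)    # assumed vision feature size
        self.tactile_proj = nn.Linear(128, d_model)   # assumed tactile feature size
        self.language_proj = nn.Linear(768, d_model)  # assumed text embedding size
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, vision, tactile, language):
        # Each modality becomes a token sequence; touch enters the fusion
        # transformer on equal footing with sight and language.
        tokens = torch.cat([
            self.vision_proj(vision),
            self.tactile_proj(tactile),
            self.language_proj(language),
        ], dim=1)
        fused = self.fusion(tokens)
        return self.action_head(fused.mean(dim=1))  # pooled -> action command

policy = VTLAPolicy()
act = policy(torch.randn(1, 16, 512),   # 16 vision tokens
             torch.randn(1, 16, 128),   # 16 tactile tokens
             torch.randn(1, 8, 768))    # 8 language tokens
print(act.shape)  # torch.Size([1, 7])
```

The design point is the concatenation step: tactile tokens are fused jointly with vision and language rather than being bolted on as a late correction signal.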
What real-world applications does DAIMON foresee for touch-enabled robots?
DAIMON Robotics envisions touch-enabled robots making early inroads in service industries across China, such as hotels and convenience stores. For example, robots could fold towels, handle food packaging, or restock shelves with delicate items. High-resolution tactile feedback lets them adapt to varying object shapes and textures without object-specific programming. As the Daimon-Infinity dataset trains models on diverse scenarios, these robots will become more reliable in unstructured environments. Ultimately, the technology aims to close the gap between current robotic capabilities and the dexterous manipulation needed for domestic chores, healthcare assistance, and precision manufacturing.
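For a sense of how tactile feedback changes behavior at the control level, here is a minimal, hypothetical grip-stabilization loop: the robot tightens its grasp only until slip stops. The `gripper` and `tactile_sensor` interfaces and all numeric gains are invented for illustration and do not correspond to any DAIMON API.

```python
# Hypothetical slip-aware grasp loop: raise grip force only while slip is
# sensed, so fragile objects are neither dropped nor crushed.
def grip_until_stable(gripper, tactile_sensor,
                      initial_force=0.5, step=0.1,
                      slip_threshold=0.02, max_force=5.0):
    force = initial_force
    gripper.set_force(force)
    while force < max_force:
        if tactile_sensor.slip_score() < slip_threshold:
            return force              # stable grasp at minimal force
        force = min(force + step, max_force)
        gripper.set_force(force)
    return force

# Stub hardware interfaces so the loop can be exercised standalone.
class _StubGripper:
    def set_force(self, f): self.force = f

class _StubSensor:
    def __init__(self): self.calls = 0
    def slip_score(self):
        self.calls += 1
        return 0.1 if self.calls < 3 else 0.0  # slips twice, then stabilizes

print(grip_until_stable(_StubGripper(), _StubSensor()))  # stabilized force (~0.7)
```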
Why did DAIMON decide to open-source part of the dataset now?
DAIMON chose to open-source 10,000 hours of the Daimon-Infinity dataset to accelerate the real-world deployment of embodied AI. While the company continues its own product development, such as advanced tactile sensors, it recognizes that progress in robotics requires community-wide collaboration. By providing a large, high-fidelity tactile dataset, it lowers the barrier for researchers and startups to train and test their own models. The move also positions DAIMON as a leader in tactile data standards, potentially attracting more partners to refine the VTLA architecture. The strategy balances proprietary hardware advancement with shared data resources to drive the entire field forward.