Robots that need to operate in physical environments they were not explicitly programmed for require training data that reflects the full complexity of human movement in those environments. Just as training a large language model requires billions of text documents, training a robot to fold a shirt or navigate a cluttered room requires synchronized recordings of human bodies performing those tasks.
These recordings must be captured from multiple sensor modalities, precisely annotated, and collected at volumes that consumer video libraries or open-source motion datasets cannot supply. Generating synthetic substitutes that generalize to real physical environments remains an unsolved problem at production scale.
The industry response is now visible. Human motion data collection is becoming a structured sector, with instrumented facilities, specialized dataset companies, and regional labor pipelines under active development across Asia.
The market for human motion data has taken shape without disclosure requirements, standardized participant contracts, or established terms for how the residual value of collected data is distributed across the supply chain.
Key Takeaways
- Robotics companies require real-world human motion data at volumes that existing open-source datasets and synthetic generation methods cannot supply.
- China has built instrumented data factories targeting millions of human motion recordings annually, using teleoperation and direct physical demonstration as primary collection methods.
- A Singapore-based physical AI company, Ropedia, has active operational roles posted in Greater Kuala Lumpur, placing Malaysia in the public record as a data collection node.
- Standard consent agreements for motion capture do not address residual rights if data is relicensed, resold, or embedded in multiple commercial applications.
- Malaysia’s 2025 data protection amendments classify biometric data as sensitive personal data, but no sector-specific guidance for physical AI data collection has been issued.
- In the current market structure, the capture operator controls the dataset, sets licensing terms, and determines downstream use; participants have no position in that chain after the initial session.
China’s Data Factory Sector
China’s robotics sector has moved to treat motion data collection as shared production infrastructure. The Shanghai National and Local Co-built Humanoid Robotics Innovation Center, as reported by People’s Daily, plans to build data infrastructure serving multiple robot platforms simultaneously.
The center targets 20,000 to 30,000 new data entries per day during initial phases and aims for more than 10 million real-machine data entries within a year. The data is intended to let robots replicate movements that human trainers demonstrate directly.
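The reported daily range and the one-year goal can be checked against each other with simple arithmetic. The sketch below is illustrative only: the three figures come from the reporting above, while the assumption that collection runs every calendar day at a constant rate, and the function names, are ours rather than anything the center has stated.

```python
# Figures as reported for the Shanghai center.
DAILY_LOW = 20_000         # lower end of the daily entry target
DAILY_HIGH = 30_000        # upper end of the daily entry target
YEAR_TARGET = 10_000_000   # "more than 10 million" entries within a year

# Assumption (not from the source): collection runs all 365 days
# at a constant daily rate.
DAYS_PER_YEAR = 365


def annual_entries(per_day: int, days: int = DAYS_PER_YEAR) -> int:
    """Total entries collected at a constant daily rate."""
    return per_day * days


def days_to_target(per_day: int, target: int = YEAR_TARGET) -> int:
    """Whole days needed to reach the target (ceiling division)."""
    return -(-target // per_day)


def required_daily_rate(target: int = YEAR_TARGET,
                        days: int = DAYS_PER_YEAR) -> int:
    """Smallest whole daily rate that reaches the target in the given days."""
    return -(-target // days)


low_total = annual_entries(DAILY_LOW)     # 7,300,000
high_total = annual_entries(DAILY_HIGH)   # 10,950,000
needed_rate = required_daily_rate()       # 27,398
```

Under those assumptions, the lower end of the range yields 7.3 million entries in a year and the upper end about 10.95 million, so the 10 million goal is reachable only near the top of the stated range, at roughly 27,400 entries per day. Because the daily figures are described as initial-phase targets, a later increase in throughput would change this picture.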
In April 2026, People’s Daily reported that humanoid robot training facilities had appeared across Beijing, Shanghai, and Shandong. One Beijing location houses 100 robots in active training and can log at least 12,000 collection tasks daily.
Operators guide robots through movements using teleoperation, where a human controller directs robot actions in real time while sensor systems record the motion data. Direct physical demonstration, with a human trainer performing a task while capture systems record the motion, is used alongside teleoperation at multiple facilities.
Chinese industry media established the commercial framing before the larger public-sector centers were announced. Coverage by GeekPark described Noitom, a motion capture company, as positioned to benefit from rising demand for human demonstration data.
Noitom’s executives characterized the scaling of human demonstration data collection as capable of producing significant commercial returns, framing the data factory as a product category.
A March 2026 site visit by Beijing News described a Beijing facility in operational terms: approximately 5,000 square meters, with six replicated environment types including home, office, retail, and industrial settings. Human workers enter those environments, perform assigned tasks, and generate the capture data that constitutes the facility’s output.
The Chinese-language label attached to these operations is 数据工厂 (data factory). Industry publications apply the term to facilities that operate with defined throughput targets, structured collection tasks, and commercial output objectives.
This classification reflects how the sector characterizes human motion data collection: as a production process with its own infrastructure requirements, labor organization, and revenue model.
Geographic Arbitrage and the ASEAN Operational Node
The labor economics of human motion data collection follow a pattern established in earlier phases of AI development. Digital data annotation, content moderation, and image labeling work migrated to lower-cost labor markets in South and Southeast Asia during the 2010s.
That migration created a distributed labor infrastructure for AI systems; physical data collection is following the same pattern.
The Economic Times reported in 2026 that India has emerged as a physical-data back office for global AI and robotics companies. Workers there perform household and manufacturing tasks under capture conditions for downstream model training.
Malaysia appears in the public record as another operational location in this geography. The clearest available public evidence concerns Ropedia, a company headquartered in Singapore that describes its mission as capturing and structuring human experience at scale for physical intelligence applications.
The company positions itself as infrastructure for physical AI, operating between human activity in the physical world and the model training pipelines that consume that data. Its flagship public dataset, documented on Hugging Face as Xperience-10M, claims ten million human experience records spanning first-person video, full-body motion capture, hand motion capture, inertial sensor data, depth imaging, audio, and language annotation. The documentation references controlled access tiers and notes the availability of separate commercial datasets alongside the public research release.
Active job listings associated with Ropedia have placed operational roles in Greater Kuala Lumpur, referencing raw footage labeling, local storage infrastructure, field team coordination, and handover of recorded media. A listing for an IT Asset Inventory Executive cited the company's HOMIE and Xperience-10M projects by name. The listings place Ropedia's collection pipeline on the ground in Malaysia, with the supply chain extending from field capture through structured dataset products available under tiered commercial access.
Consent, Residual Value, and Structural Asymmetry
The market structure that has emerged from this collection activity concentrates control at the dataset level. The capture operator holds the data, sets the licensing terms, and determines which downstream buyers receive access and under what conditions. The participant, who generates the raw material through physical labor, has no position in any of those subsequent transactions.
Participation consent, obtained before a session begins, typically authorizes the operator to use, process, and train models on the recorded material. What happens to that material after initial training, whether it is incorporated into commercially licensed datasets or transferred to third parties, falls outside the scope of most standard consent agreements.
A human demonstrator paid a per-session fee generates data whose commercial value may increase substantially over time. If that data enters a licensed dataset sold to multiple robotics companies, or is embedded in model weights deployed across commercial applications, the gap between the original compensation and the total value derived grows with each subsequent use. No contractual instrument in the current market addresses that gap.
Malaysia's Personal Data Protection Act amendments, which came into force in 2025 according to Baker McKenzie, represent the most concrete statutory step in the region, classifying biometric data as sensitive personal data and tightening cross-border transfer requirements.
Whether motion capture streams, gait signatures, or egocentric video fall within that definition depends on identifiability assessments specific to each operation, and those determinations have not been settled by regulatory guidance or case law. The amendments establish a framework; enforcement practice for physical AI data collection remains to be developed.
The cross-border dimension compounds this uncertainty. A dataset assembled in Kuala Lumpur may be hosted in Singapore, licensed to developers globally, and embedded in model weights redistributed across commercial applications, leaving participants with no legal relationship to any entity beyond the immediate collection operator. The reach of Malaysia's amended rules into that downstream chain has not been tested.
The structural conditions for a large-scale human motion data market are now in place. Demand is explicit from robotics companies, supply-side infrastructure is operational in China and extending into lower-cost labor markets, and the commercial layer connecting them carries no disclosure requirements specific to physical AI data collection.
Workers in instrumented facilities who generate this data through their physical labor have, in any jurisdiction where collection is currently underway, no binding instrument that defines what they are owed for it.
Sources
- People’s Daily Online. "China’s first heterogenous humanoid robot training facility set to put into use." People’s Daily Online, 2025.
- People’s Daily Online. "China’s humanoid robot training centers multiply as sector gains momentum." People’s Daily Online, 2026.
- GeekPark. "用动作捕捉技术建立人形机器人的「数据工厂」." GeekPark, 2024.
- Beijing News. "探访|人形机器人智能化的核心密码,藏在这座数据工厂里." Beijing News (Xin Jing Bao), 2026.
- Ropedia. "Ropedia — Physical Intelligence Data Infrastructure." Ropedia, 2026.
- Ropedia AI. "ropedia-ai/xperience-10m." Hugging Face, 2026.
- LinkedIn. "IT Operation Specialist at Ropedia — Greater Kuala Lumpur." LinkedIn, 2026.
- LinkedIn. "IT Asset Inventory Executive at Ropedia — Greater Kuala Lumpur." LinkedIn, 2026.
- The Economic Times. "Smile, your chores are going viral in a global robotics lab." The Economic Times, 2026.
- Baker McKenzie. "Malaysia: Personal Data Protection (Amendment) Act 2024 to Come into Force." InsightPlus — Baker McKenzie, 2025.
