
Researchers at the Massachusetts Institute of Technology (MIT) have unveiled a “speech-to-reality” system that lets a robotic arm build physical objects from spoken prompts, MIT News reports. Say “I want a simple stool,” and within minutes the system delivers a stool assembled from modular parts.
The pipeline works in several stages. First, speech recognition turns the user’s request into text. Next, a 3D generative AI model builds a digital mesh of the requested object. That mesh is then broken down, or voxelized, into discrete, buildable components. A geometry-processing step adjusts the design to meet real-world constraints (e.g., structural support, part count, and how components connect) before automated path planning directs the robotic arm to assemble the object.
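The voxelization and planning stages lend themselves to a short illustration. The Python sketch below is a minimal, hypothetical rendering of those steps, not the team’s actual implementation: the occupancy-grid voxelizer, the crude column-support rule standing in for the geometry-processing step, and the bottom-up placement order are all simplifying assumptions made here for clarity.

```python
import numpy as np

def voxelize(points: np.ndarray, pitch: float) -> np.ndarray:
    """Map mesh-surface sample points onto a boolean occupancy grid.

    Each occupied cell stands for one modular part the arm can place.
    `points` is an (N, 3) array; `pitch` is the part size in meters.
    """
    mins = points.min(axis=0)
    idx = np.floor((points - mins) / pitch).astype(int)
    grid = np.zeros(idx.max(axis=0) + 1, dtype=bool)
    grid[tuple(idx.T)] = True  # mark every cell that contains a sample
    return grid

def enforce_vertical_support(grid: np.ndarray) -> np.ndarray:
    """Crude stand-in for the geometry-processing step: keep a cell only
    if an unbroken column of cells sits beneath it, so every part rests
    on something during assembly."""
    supported = grid.copy()
    for z in range(1, grid.shape[2]):  # axis 2 is "up"
        supported[:, :, z] &= supported[:, :, z - 1]
    return supported

def plan_assembly(grid: np.ndarray):
    """Yield part placements layer by layer from the ground up, so the
    arm never places a part before its support exists."""
    for z in range(grid.shape[2]):
        for x, y in zip(*np.nonzero(grid[:, :, z])):
            yield (int(x), int(y), int(z))

if __name__ == "__main__":
    # Toy stand-in for the generative mesh: random surface samples
    # filling a 30 cm cube, voxelized into 5 cm modular parts.
    rng = np.random.default_rng(0)
    points = rng.uniform(0.0, 0.3, size=(2000, 3))
    grid = enforce_vertical_support(voxelize(points, pitch=0.05))
    placements = list(plan_assembly(grid))
    print(f"{len(placements)} parts; first placement at {placements[0]}")
```

In the real system, this planning stage would also have to account for the arm’s reach and for how the modular parts interconnect; the column-support rule above is only the simplest proxy for structural feasibility.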
So far, the system has produced furniture, including stools, chairs, shelves, and even a small table, as well as decorative items such as a dog-shaped statue. The whole process can finish in around five minutes, far faster than typical 3D-printing or traditional fabrication workflows.
The implications are far-reaching. By combining natural language, AI design, and robotic fabrication, the project lowers the barrier to production, making it possible for someone with no CAD training or fabrication experience to turn ideas into real-world objects. As one of the project leads put it, this work bridges humans, AI, and robots to “co-create the world around us.”
Presented at the ACM Symposium on Computational Fabrication (SCF ’25), the research lays early groundwork for a future in which everyday objects could be manufactured on voice command, reshaping the intersection of design, production, and accessibility.