
From Words to Things: MIT’s Robot Builds Furniture on Demand

Dec 10, 2025

Speech-to-Reality marries generative AI and robotics to turn spoken requests into real objects in minutes.
A robotic arm builds a lattice-like stool after hearing the prompt “I want a simple stool,” demonstrating how the system translates speech into real-time fabrication (source: Alexander Kyaw and the researchers).

 

Researchers at the Massachusetts Institute of Technology (MIT) have unveiled a “speech-to-reality” system that lets a robotic arm create physical objects from spoken prompts, MIT News reports. Say “I want a simple stool,” and within minutes the system delivers a stool assembled from modular parts.

The pipeline works in several stages. First, speech recognition turns the user’s request into text. Next, a 3D generative AI builds a digital mesh of the requested object. That mesh is then broken down, or voxelized, into discrete, buildable components. A geometry processing step adjusts the design to meet real-world constraints (e.g., structural support, number of parts, connectability) before automated path planning directs the robotic arm to assemble the object.
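The staged pipeline described above can be sketched in code. Everything below is a hypothetical illustration: the function names, the toy point cloud standing in for a generated mesh, and the simple grid-snapping voxelizer are all assumptions, not the researchers' actual implementation, which relies on real speech recognition, 3D generative models, and robot path planning.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Voxel:
    # One discrete, buildable modular component on an integer grid.
    x: int
    y: int
    z: int

def speech_to_text(audio: str) -> str:
    # Stand-in for a speech recognizer: here the "audio" is already text.
    return audio.strip().lower()

def generate_mesh(prompt: str):
    # Stand-in for a 3D generative model: a toy point cloud shaped
    # roughly like a stool (a flat seat above four leg points).
    seat = [(x * 0.1, y * 0.1, 0.4) for x in range(4) for y in range(4)]
    legs = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0),
            (0.0, 0.3, 0.0), (0.3, 0.3, 0.0)]
    return seat + legs

def voxelize(points, cell=0.1):
    # Snap each mesh point to a discrete grid cell, deduplicating so
    # each occupied cell maps to exactly one modular part.
    cells = {(round(px / cell), round(py / cell), round(pz / cell))
             for px, py, pz in points}
    return [Voxel(*c) for c in sorted(cells)]

def apply_constraints(voxels, max_parts=100):
    # Toy geometry-processing pass: enforce a part budget, standing in
    # for the real checks on part count, support, and connectability.
    return voxels[:max_parts]

def plan_assembly(voxels):
    # Toy path planner: order parts bottom-up so each placed component
    # rests on something already built.
    return sorted(voxels, key=lambda v: (v.z, v.x, v.y))

def speech_to_reality(audio: str):
    prompt = speech_to_text(audio)
    mesh = generate_mesh(prompt)
    parts = apply_constraints(voxelize(mesh))
    return plan_assembly(parts)

plan = speech_to_reality("I want a simple stool")
print(f"{len(plan)} parts, assembled bottom-up from z={plan[0].z}")
```

The stage boundaries mirror the article's description: each step consumes the previous step's output, so any stage (say, the generative model) can be swapped out without touching the rest of the chain.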

So far, the system has produced furniture such as stools, chairs, shelves, and even a small table, as well as decorative items like a dog-shaped statue. The whole process can finish in around five minutes, far faster than typical 3D printing or traditional fabrication workflows.

The implications are far-reaching. By combining natural language, AI design, and robotic fabrication, the project lowers the barrier to production, making it possible for someone with no CAD training or fabrication experience to turn ideas into real-world objects. As one of the project leads put it, this work bridges humans, AI, and robots to “co-create the world around us.”

Presented at the ACM Symposium on Computational Fabrication (SCF ’25), the research lays early groundwork for a future where everyday manufacturing could happen at the push of a voice command, reshaping the intersection of design, production, and accessibility.