
A team at the Massachusetts Institute of Technology (MIT) has developed an AI system that interacts with full CAD software environments, clicking, dragging, and selecting tools, to convert a 2D sketch into a fully defined 3D object, MIT News reports.
At the heart of the work is a large dataset named VideoCAD. It contains over 41,000 annotated videos of designers building CAD models: every mouse click, drag, tool selection, zoom, and keystroke is recorded. This sets the stage for an agent that operates software rather than just a model that generates geometry. The team translates high-level commands (“extrude this sketch region”) into low-level UI actions (move cursor to (x,y), click tool, drag region) so the agent can mimic human interaction with software.
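To make the high-level-to-low-level translation concrete, here is a minimal sketch of what expanding one command into replayable UI actions could look like. All names, coordinates, and the action vocabulary are illustrative assumptions, not taken from VideoCAD or the MIT implementation:

```python
from dataclasses import dataclass

# Hypothetical action record; the real system's action space may differ.
@dataclass
class UIAction:
    kind: str          # "move", "click", or "drag"
    x: int = 0
    y: int = 0
    x2: int = 0        # drag endpoint (only used when kind == "drag")
    y2: int = 0

def translate_extrude(region_xy, tool_xy, pull_px=40):
    """Expand one high-level command ('extrude this sketch region')
    into the low-level UI actions an agent would replay."""
    rx, ry = region_xy
    tx, ty = tool_xy
    return [
        UIAction("move", x=tx, y=ty),    # move cursor to the Extrude tool
        UIAction("click", x=tx, y=ty),   # select the tool
        UIAction("move", x=rx, y=ry),    # move cursor to the sketch region
        UIAction("click", x=rx, y=ry),   # pick the region
        # pull upward by pull_px pixels to perform the extrusion
        UIAction("drag", x=rx, y=ry, x2=rx, y2=ry - pull_px),
    ]

actions = translate_extrude(region_xy=(400, 300), tool_xy=(120, 50))
print([a.kind for a in actions])  # → ['move', 'click', 'move', 'click', 'drag']
```

The point of the sketch is only the decomposition itself: one design-intent command fans out into an ordered sequence of primitive mouse events that any screen-level executor could inject.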
Once trained, the agent accepts a 2D sketch of a part or object and drives the CAD software itself, opening menus, selecting tools, extruding, filleting, and assembling, to create the 3D model. The current demonstrations span simple shapes to more complex bracket- and house-like forms.
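The workflow above amounts to a perception-action loop: observe the CAD viewport, predict the next UI event, execute it, repeat. A minimal sketch of such a loop, with stand-in `policy`, `screen`, and `executor` objects that are assumptions of this illustration rather than the MIT team's actual interfaces:

```python
# Illustrative observe-act loop for a UI-driving agent. The policy, screen
# capture, and event executor are hypothetical stand-ins, not real APIs.

def run_agent(policy, screen, executor, max_steps=200):
    """Feed screenshots to a learned policy and replay its predicted
    low-level actions until it signals the model is complete."""
    for _ in range(max_steps):
        frame = screen.capture()          # current CAD viewport as pixels
        action = policy.predict(frame)    # e.g. ("click", 120, 50) or ("done",)
        if action[0] == "done":
            return True                   # policy judges the model finished
        executor.perform(action)          # inject the mouse/keyboard event
    return False                          # step budget exhausted
```

Framing the agent this way explains why the VideoCAD recordings matter: the policy is trained on exactly the frame-to-event pairs that human designers produce while working.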
This tool could reduce the time and expertise needed to learn CAD, automate routine modeling workflows, and open up 3D design to non-specialist users without full CAD training. The researchers frame the agent as a “CAD co-pilot” rather than a replacement, suggesting it augments productivity and lowers the barrier to entry for design.
In the broader context of engineering tooling, this development signals a shift: design tools can not only generate geometry but also control and navigate the software environment itself. For articles about generative design, digital manufacturing, CAD automation, or AI in engineering workflows, the MIT work offers a concrete example of an agent that understands both design intent and software mechanics.