Building a system for robust 3D scene understanding and natural language interaction within complex environments.
We are building a system for robust 3D scene understanding and natural language interaction within complex environments. Our goal is to develop a foundation model that tokenizes complex 3D scenes, performs tasks such as instance detection, and reasons over complex language queries about the state of the environment, supporting natural language interaction with the scene together with spatial grounding of its content.
Our goal is to make 3D scene understanding as accessible and powerful as its 2D counterpart.
Our projects build upon each other over time to expand the system's capabilities.
Dataset of 65K 3D objects in Gaussian Splatting format. A Gaussian-MAE architecture and a self-supervised training strategy for encoding the collected 3D objects.
Dataset of 7K indoor 3D scenes in Gaussian Splatting format. A method for extracting 3D semantic-language pseudo-labels by leveraging 2D foundation models. A pre-trained scene-level encoder. State-of-the-art (SotA) open-vocabulary segmentation performance.
Dataset scaled to 49K 3DGS scenes spanning indoor and outdoor environments. A comprehensive language-3DGS evaluation benchmark, run directly in 3D, over 1060 scenes.
Extended pseudo-label pre-training to multiple 2D foundation-model teachers. Pre-trained a 3DGS and a point-cloud (PC) scene-level encoder. Demonstrated SotA results after fine-tuning on the main 3D downstream applications.
Key capabilities of our 3D systems.
Our system takes a 3D scene (3DGS or PC) as input and, in a single neural network forward pass, outputs a feature for each 3D primitive.
Our system provides real-time, open-vocabulary 3D content search by leveraging the initially extracted 3D features.
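Once per-primitive features live in a language-aligned embedding space, open-vocabulary search reduces to a similarity ranking against a text embedding. The sketch below illustrates this idea only; the function names and random stand-in features are hypothetical, not the system's actual API, and a real pipeline would use encoder outputs and a CLIP-style text embedding in a shared space.

```python
# Minimal sketch, assuming per-primitive features and the text query
# embedding share one embedding space (names and data are hypothetical).
import numpy as np

def normalize(x, axis=-1):
    # L2-normalize so the dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def open_vocab_search(primitive_feats, text_feat, top_k=5):
    """Rank 3D primitives (Gaussians or points) by similarity to a text query."""
    sims = normalize(primitive_feats) @ normalize(text_feat)
    order = np.argsort(-sims)[:top_k]          # indices of best matches
    return order, sims[order]

# Stand-in data: 1000 primitives with 512-d features, one query embedding.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))
query = rng.normal(size=512)
idx, scores = open_vocab_search(feats, query, top_k=3)
```

Because ranking is a single normalized matrix-vector product over the pre-computed features, new text queries can be answered in real time without re-running the scene encoder.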
Multi-institutional research group.