✨We use embodied visual tracking tasks to benchmark embodied vision agents in complex dynamic environments.

Introduction:

Embodied visual tracking is a challenging task: an agent must follow a moving target through a 3D environment, using first-person vision and continuous interaction with its surroundings. We use the UnrealZoo platform to benchmark agents under realistic, changing conditions and to assess their robustness, adaptability, and generalization across diverse scenes.

UnrealZoo provides a large collection of photorealistic, richly varied environments that serve as a playground for embodied AI agents. The diversity of settings and agents lets researchers push embodied AI further by testing in scenarios that go beyond simple indoor navigation.

UnrealZoo Key Features for Embodied Visual Tracking

Evaluate Tracking Agents in UnrealZoo

Below are examples of the tracking agents' first-person views in four kinds of scenes:

📌Interior Scenes:

Agents deal with frequent low-level obstacles and narrow pathways, testing their occlusion handling and precise movements.

UndergroungParking (8D)

Bunker

LV_Soul_Cave

Storage House (8D)

📌Palaces:

Agents face multi-level structures and intricate designs, where mastering spatial awareness is essential.

Brass Gardens

Eastern Gardens

Western Garden (2D)

Modular ScienceFiSeason (2D)

📌Wilds: