Large Language Model (LLM)-powered robotic agent

Current research on embodied AI often focuses on end-to-end Vision-Language-Action (VLA) models [1], which take vision and language inputs and generate robot actions directly. However, these end-to-end models require time-consuming training and tend to perform poorly on long-sequence robotic tasks.

On the other hand, the robotics community has developed many mature software packages. Motivated by the recent success of AI agents [2], this project will consider how to build an LLM-centered robotic agent that can use existing tools (such as the nav2 package and segmentation models) to handle more general and long-sequence robotics tasks.

The goal of this project is to explore how to combine LLMs, the ROS system, and existing tools to facilitate robotics tasks. More specifically, the project includes the following points:

  • Interface with existing robotics libraries, including the resources and tools the robot can access.
  • Set up a simulation environment and construct benchmarks to test the developed framework.
  • (Potentially) Explore the use of imitation learning (or Retrieval-Augmented Generation techniques) and reinforcement learning for training the robotic agent.
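
As a rough illustration of the first point, the sketch below shows one possible way to expose robot capabilities as named tools that an LLM-produced plan can call. It is only a sketch under stated assumptions, not the project's design: the registry, the tool names `navigate_to` and `segment_object`, the JSON plan format, and `stub_planner` (a fixed stand-in for a real LLM call) are all hypothetical, and the tool bodies are placeholders that do not talk to ROS, nav2, or any segmentation model.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool registry: maps a tool name to a plain Python callable.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Register a function under a name the planner can refer to."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@tool("navigate_to")
def navigate_to(x: float, y: float) -> str:
    # Placeholder: a real version might forward the goal to a navigation stack.
    return f"arrived at ({x:.1f}, {y:.1f})"


@tool("segment_object")
def segment_object(label: str) -> str:
    # Placeholder: a real version might run a segmentation model on an image.
    return f"found 1 instance of '{label}'"


def stub_planner(task: str) -> str:
    """Stand-in for an LLM call (assumption): returns a fixed JSON plan."""
    return json.dumps([
        {"tool": "navigate_to", "args": {"x": 1.0, "y": 2.0}},
        {"tool": "segment_object", "args": {"label": "cup"}},
    ])


def run_agent(task: str, planner: Callable[[str], str] = stub_planner) -> List[str]:
    """Ask the planner for a plan, then execute each step via the registry."""
    plan = json.loads(planner(task))
    results = []
    for step in plan:
        fn = TOOLS.get(step["tool"])
        if fn is None:
            # Unknown tool names are reported instead of raising an error.
            results.append(f"unknown tool: {step['tool']}")
            continue
        results.append(fn(**step["args"]))
    return results


if __name__ == "__main__":
    for line in run_agent("find the cup in the kitchen"):
        print(line)
```

Keeping the planner behind a plain function argument means the same loop could be exercised in simulation with a stub and later with an actual LLM, which may help when constructing benchmarks.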

Deliverables expected at the end of the thesis project include an open-source framework, a technical report, and a runnable demo.

The applicant should have basic knowledge of and experience with programming in Python/C++, and be familiar with bash and with working on Linux.

[1] https://octo-models.github.io/

[2] https://github.com/browser-use/browser-use

Advertisement details

Advertiser: Örebro universitet

Apply by:

Ad category: Degree project, internship, thesis

Area of interest: Data and IT

Contact person: Shuo Sun (PhD student) shuo.sun@oru.se