Additional Reflections on Harnessing Generative AI for Human-Robot Interaction

Frame 8

A fundamental problem for robotics lies in the complexity of building and using robots, as well as the difficulty of proving ROI to businesses and investors. These key challenges can potentially be addressed by leveraging Generative AI for human-robot interaction to simplify workflows and make operations more intuitive.

The key problems for robotics span mobility, dexterity, autonomy, inflexibility, safety, logistics, cost, workforce training, societal readiness, and ethics. These concerns won’t disappear magically overnight, but they can be alleviated gradually as robotics engineering evolves toward better robots, showcasing how AI can be applied in a practical way.

Being a roboticist is an exhilarating career path for many of us because we are always trying to solve cross-domain challenges with cutting-edge technologies:

  • Making robots behave effectively and reliably, with safety as a bottom line.
  • Aiming at high machine efficiency and autonomy.
  • Leveraging human-in-the-loop advantages.

Robotics engineers excel at problem-solving, but the complexity of robot applications can be daunting for non-experts. No matter how advanced a robot’s functionality might be, it would have little impact on our daily lives if it were hard for ordinary users to operate. Human-robot interaction has therefore emerged as an important problem of real practical value.

Here is a list of typical problems we’ve observed from the field in many robot system adoption and deployment processes:

  • Average users often find deep tech intimidating
  • There’s a steep learning curve for deploying and using robots for non-SMEs
  • The current workflow may need significant changes to accommodate the robot’s deployment
  • Business owners need to see ROI and make sense of the technology to invest
  • They must also navigate disruptive tech migrations, adding to the adoption, deployment, and maintenance costs

Complexity and ROI are the two major factors in robot system adoption. Regardless of a robot’s autonomy level, smooth and natural human-robot interaction enhances the robot’s usability, reduces complexity at the user (and operator) level, improves day-to-day productivity, and can enable positive ROI as adoption accelerates.

Elisha Terada, Fresh's Technical Innovation Director, having a conversation with Jinny, demonstrating the power of Generative AI for human-robot interaction.

The Solution: Creating Human-Robot Harmony via Generative AI

Let’s first look at complexity, which adversely affects ROI in robot system operations.

Robotics engineering is a multi-disciplinary process. Its complexity has increased quickly in recent years due to fast iterations of design methodologies, hardware components, software tooling, AI/ML integrations, and system operations. In most cases, robotics engineers use structured data to achieve engineering efficacy and ensure interoperability among components and systems. Structured data protocols serve as a standard language that lets robotics engineers simplify tasks, keep things in order, and keep up with the field’s fast technology iterations.

It is hard for average users to work with structured data protocols, because acquiring the relevant domain knowledge and field experience takes years of training and practice. Human users shouldn’t have to carry that burden when interacting with robots: humans are good at natural, intuitive, semantic communication through language, without rigid structure.

A gap therefore exists between the unstructured data humans produce and the structured data robots consume. Hiding the robot-side complexity from human users would make robots far more useful to them.
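To make the gap concrete, here is a minimal illustration of the two worlds. The field names and values are our own hypothetical examples, not any particular robot middleware’s schema:

```python
# A hypothetical structured drive command: the kind of rigid, typed
# payload robot middleware typically expects (field names illustrative).
structured_command = {
    "action": "drive",
    "linear_velocity_m_s": 0.2,
    "angular_velocity_rad_s": 0.0,
    "duration_s": 3.0,
}

# The same intent, expressed the way an ordinary user actually talks.
unstructured_request = "Hey, can you roll forward a little bit?"
```

Translating the second form into the first is exactly the job a roboticist does today, and exactly what we want to automate.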

We need a solution that addresses robot usability and eventually brings more meaningful integration between humans and robots, achieving Human-Robot Harmony.

We turn our eyes to Generative AI (Gen-AI) for the solution. Generative AI refers to deep-learning models that can take raw data and “learn” to generate statistically probable outputs when prompted, for instance, high-quality text, images, and other content based on the data they were trained on. [What is Generative AI?]

ChatGPT, OpenAI’s chatbot service backed by a Gen-AI Large Language Model, offers convenient assistance in extracting structured parameters from unstructured conversation so they can fit robotics engineering conventions. This lets us bring unstructured data into an intuitive human-robot interaction workflow.
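One way to sketch this extraction step: instruct the model to answer only in a fixed JSON shape, then validate its reply before anything reaches the robot. The prompt wording, command schema, and helper function below are our own illustrations, not part of OpenAI’s API:

```python
import json

# Hypothetical system prompt asking the model to reply with JSON only.
SYSTEM_PROMPT = (
    "You control a small mobile robot. Reply ONLY with JSON of the form "
    '{"action": "drive"|"turn"|"stop", "value": <number>, "unit": <string>}.'
)

def parse_robot_command(llm_reply: str) -> dict:
    """Validate the model's JSON reply into a structured command dict."""
    command = json.loads(llm_reply)
    if command.get("action") not in {"drive", "turn", "stop"}:
        raise ValueError(f"unsupported action: {command.get('action')!r}")
    return command

# Example: a plausible model reply to "move forward half a meter".
reply = '{"action": "drive", "value": 0.5, "unit": "meters"}'
command = parse_robot_command(reply)
```

The validation layer matters: the LLM bridges unstructured and structured data, but the robot side should only ever see commands that survived a schema check.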

The Jinny Experiment: A time-boxed hackathon to showcase the potential of Generative AI for robotics 

In a time-boxed hackathon in mid-2023, we built a proof-of-concept human-robot interaction workflow using a mobile robot with a ChatGPT-embedded, speech-enabled web app as its agent to handle the unstructured and structured data exchanges.

The technical stack in our Jinny experiment involved several key components to achieve an end-to-end workflow and showcase the value of Generative AI for human-robot interaction:

  • An mBot2 to showcase the major robotic modules in our experiment
  • HTTP REST APIs to connect with the mBot2 microcontroller’s APIs
  • ChatGPT to generate API calls based on user input
  • Speechly for speech-to-text interactions
  • ElevenLabs’ generative audio tools for synthesized voices
  • User experience (UX) designs to enhance user interactions

The core of the hackathon project was the creation of a semantic-level human-robot interaction pipeline, which not only served as the bridge between users and robots but also established an accessible platform for continuously developing human-robot interactions.
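The dispatch step of such a pipeline — turning a validated structured command into an HTTP call for the robot’s REST API — can be sketched as follows. The endpoint paths, base URL, and payload fields are hypothetical illustrations, not the actual mBot2 bridge API:

```python
# Sketch: map a structured command onto an HTTP request for the robot.
# Routes and payload shape are assumptions for illustration only.

ROUTES = {
    "drive": "/api/v1/drive",
    "turn": "/api/v1/turn",
    "stop": "/api/v1/stop",
}

def build_robot_request(command: dict, base_url: str = "http://robot.local"):
    """Return (url, json_payload) for the robot's hypothetical REST API."""
    path = ROUTES[command["action"]]
    payload = {k: v for k, v in command.items() if k != "action"}
    return base_url + path, payload

url, payload = build_robot_request(
    {"action": "drive", "value": 0.5, "unit": "meters"}
)
# A real client would now send it, e.g. requests.post(url, json=payload)
```

Keeping this mapping in one thin layer is what lets the rest of the pipeline — speech, LLM, UX — stay ignorant of any particular robot’s API.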

The hackathon wasn’t just a technical endeavor. The goal was to bring a multi-disciplinary innovation team together to lead the charge in making robots more accessible to non-experts: the end users who will be most affected by their widespread adoption.

A presentation by Fresh’s Technical Innovation Director Elisha Terada at ROSCon 2023.

What’s next? Advancing robotics through Generative AI

Generative AI is expected to significantly impact the field of robotics, transforming how robots are designed, developed, and deployed. Human-robot interaction is one critical area to leverage and further advance Gen-AI’s power.

With the rapidly evolving Gen-AI landscape and ongoing foundation model development, we’re extending our unstructured semantic handling beyond speech to vision, as well as to managing long temporal task sequences across multiple users and robots, enabling more natural and intuitive communication between humans and robots.

We have exciting plans and visions for what’s next.

In the near term, we will be using Generative AI for human-robot interaction to advance capabilities and grow pipeline sophistication:

  • Focusing on usability with long-term contextual interactions for complex robot tasks
  • Porting what we’ve learned to other robots with more sophisticated capabilities
  • Handling multiple robots simultaneously using multiple LLM-agent implementations

Our mid-term focus will be building a streamlined development workflow that is service-ready:

  • Integrating more Gen-AI tools, such as the emerging vision foundation models, to handle not only voice and text but also images and videos
  • Proposing interoperable unstructured data handling guidelines for development use
  • Developing a set of robot-OS agnostic HTTP APIs to reduce the development complexity for UI/UX developers
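A robot-OS agnostic API can be pictured as a thin adapter layer: one stable surface for UI/UX developers, with per-robot translation underneath. The class and method names below are a hypothetical sketch of that pattern, not a published API:

```python
from abc import ABC, abstractmethod

class RobotAdapter(ABC):
    """Hypothetical adapter: one API surface, many robot stacks beneath."""

    @abstractmethod
    def drive(self, distance_m: float) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

class MBot2Adapter(RobotAdapter):
    # Would translate calls into mBot2 microcontroller API requests.
    def drive(self, distance_m: float) -> None:
        self.last = ("drive", distance_m)

    def stop(self) -> None:
        self.last = ("stop",)

class RosAdapter(RobotAdapter):
    # Would publish velocity messages on a ROS topic instead.
    def drive(self, distance_m: float) -> None:
        self.last = ("drive", distance_m)

    def stop(self) -> None:
        self.last = ("stop",)

def handle_request(adapter: RobotAdapter, action: str, **kwargs) -> None:
    """The HTTP layer stays identical regardless of the robot OS underneath."""
    getattr(adapter, action)(**kwargs)

mbot = MBot2Adapter()
handle_request(mbot, "drive", distance_m=0.5)
```

UI/UX developers would code against `RobotAdapter` alone; swapping the robot underneath becomes a deployment detail rather than a rewrite.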

Our long-term goal is to achieve sustainable growth within the ecosystem:

  • Commercialization by offering the services to our existing and potential clients
  • Building an ecosystem through industrial partnerships and customer engagements
  • Soliciting feedback from businesses to advance the vision

Let’s join forces to keep exploring this track and enable broader robot adoption!


Steve Yin

Principal Software Engineer

After finishing his Ph.D. in ECE at the University of Illinois at Urbana-Champaign, Steve started his career in advanced electrical and biomedical engineering R&D for clinical applications. He later became a tech entrepreneur in new product development and commercialization.

Steve’s specialties include R&D in signal and image processing, computer vision, and AI/ML, along with data analytics, program management, and continuous improvement. His most recent work is in the healthcare and education industries.

When he is not working, he loves traveling, reading, and outdoor activities like jogging and hiking.