The Language of Self-Driving Cars

Here at RESONIKS, just like our AI, we value constant improvement and the pursuit of perfection. To help steer us towards excellence, we have developed our own chronicle of AI-related technological advancements, Thursday Soundbytes.

Today's world is filled with constant questions concerning the overlap of artificial intelligence and humanism, and whether the two are mutually exclusive. At RESONIKS, we believe that humans can utilise AI to help us in ways we never thought possible. Thursday Soundbytes is meant to educate not only ourselves but also the community around us, and to share positive news and developments in AI.

Tired of cab-driver small talk? With models such as RT-1 teaching machines to act on plain-language instructions, it looks like we may one day never need to communicate with our drivers again!

A new robotics model called RT-1 (Robotics Transformer 1) has been introduced, and it shows great promise for real-world robotic control. The model is based on the Transformer architecture, which is also used in natural language processing. RT-1 can take instructions in natural language and perform actions in the real world. In the paper introducing the model, the authors tested RT-1 on a variety of tasks, including picking up objects, opening drawers, and following instructions, and found that it outperformed other models on many of them. RT-1 can also be improved by incorporating data from different sources, such as simulations or other robots.

How Does RT-1 Work?

RT-1 is a multi-task model that tokenizes both the robot's inputs and its output actions. In other words, it breaks down the robot's inputs (such as camera images and task instructions) into smaller pieces called tokens, and then generates outputs (such as motor commands), also expressed as tokens, that correspond to those inputs.
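To make the "outputs" half of that idea more concrete, here is a minimal Python sketch of how a continuous motor command could be turned into a discrete token and back again. It is only an illustration: the function names, the value range of -1.0 to 1.0, and the choice of 256 bins are our own assumptions and are not taken from the text above.

```python
NUM_BINS = 256  # assumption: number of discrete bins per action dimension


def to_token(value: float, low: float, high: float, num_bins: int = NUM_BINS) -> int:
    """Map a continuous value in [low, high] to an integer bin index."""
    clipped = min(max(value, low), high)       # keep the value inside the range
    fraction = (clipped - low) / (high - low)  # position within the range, 0.0 to 1.0
    return min(int(fraction * num_bins), num_bins - 1)


def from_token(token: int, low: float, high: float, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to the centre of its bin."""
    width = (high - low) / num_bins
    return low + (token + 0.5) * width
```

For example, under these assumptions `to_token(0.0, -1.0, 1.0)` lands in bin 128, and converting that token back with `from_token` gives a value within half a bin width of the original command. The round trip loses a little precision, which is the price of treating motor commands like words in a vocabulary.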

RT-1 functions by taking a short sequence of images and a natural language task description as input. It then generates a corresponding action for the robot to perform at each time step. This process is accomplished through several key architectural components:

Visual Feature Extraction: First, RT-1 processes the images and text. It utilizes an ImageNet-pretrained convolutional neural network (EfficientNet) that has been conditioned on a pretrained instruction embedding using FiLM layers. This step extracts visual features directly relevant to the task.
Tokenization: The system employs a Token Learner module to compute a compact set of tokens from the extracted visual features.
Transformer Processing: A Transformer attends to these tokens, ultimately generating discretized action tokens.
Action Breakdown: Actions comprise seven dimensions for arm movement (x, y, z, roll, pitch, yaw, gripper opening), three dimensions for base movement (x, y, yaw), and a discrete dimension enabling switching between three modes: arm control, base control, and episode termination.
Closed-Loop Control: RT-1 operates in a closed loop, issuing actions at 3 Hz until the model generates a “terminate” action or a pre-set number of time steps is reached.
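The action layout and the closed-loop behaviour described above can be sketched in a few lines of Python. This is a hedged illustration rather than RT-1's actual code: `Mode`, `Action`, `run_episode`, and the `policy`, `get_image`, and `send_action` callables are hypothetical names standing in for the model and the robot, and the default image history length of 6 and step limit of 100 are assumptions.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Mode(Enum):
    ARM = 0        # control the arm
    BASE = 1       # control the base
    TERMINATE = 2  # end the episode


@dataclass
class Action:
    mode: Mode
    arm: Sequence[float]   # 7 values: x, y, z, roll, pitch, yaw, gripper opening
    base: Sequence[float]  # 3 values: x, y, yaw


def run_episode(policy, get_image, send_action, instruction,
                max_steps=100, control_hz=3.0, history_len=6, sleep=time.sleep):
    """Run one closed-loop episode and return the number of actions sent."""
    images = []
    for step in range(max_steps):
        images.append(get_image())        # observe the world again at every step
        images = images[-history_len:]    # keep only a short sequence of images
        action = policy(images, instruction)
        if action.mode is Mode.TERMINATE:
            return step                   # the model decided the task is finished
        send_action(action)
        sleep(1.0 / control_hz)           # pace the loop at roughly 3 Hz
    return max_steps                      # pre-set step limit reached
```

The point of the loop is that the model looks at fresh camera images before every action, so it can correct itself as the scene changes instead of committing to a fixed plan up front.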

RT-1 and Large Language Models

RT-1 is similar to large language models (LLMs) in several ways. Both can learn from large amounts of data and generalize that knowledge to new tasks. However, there are also some important differences between the two. LLMs are typically trained on text data, while RT-1 is trained on robotic data. As a result, LLMs are better at understanding and generating language, while RT-1 is better at controlling robots.

The Future of Self-Driving Cars

The prospect of self-driving cars opens up a lot of possibilities, as well as a lot of risk. Large countries such as the United States and the United Kingdom have nevertheless passed legislation to permit autonomous vehicles, driven by benefits such as reduced traffic congestion, increased accessibility for those with disabilities, and improved safety. Using LLMs, which can translate a vehicle's surroundings into natural language, in conjunction with RT-1, which can translate natural language into robotic tasks, could prove to be the perfect union of algorithms to drive our cars.

To read more about LLMs and RT-1 in autonomous vehicles, visit: https://kargarisaac.medium.com

To continue following our Thursday Soundbytes and other business updates, visit: https://www.linkedin.com/company/resoniks

And lastly, if you have something you think we should write about, do not hesitate to reach out to us with the latest tips! We can be reached through the above LinkedIn profile, or through info@resoniks.com.