Generative AI has become the talk of the town in robotics. The technology promises to change how we interact with machines and how they respond to the world, with visions ranging from natural-language commands to new approaches to design. On a recent trip to Nvidia's South Bay office, I sat down with Deepu Talla, the company's vice president and general manager of Embedded & Edge Computing, to get his take on the trend and where Nvidia fits into the generative AI story.
Talla was quick to point to the tangible results generative AI has already delivered. "Look at how it aids in tasks like drafting emails," he remarked. "Though it might not hit the bull's-eye every time, it gives me a solid starting point, covering a good 70% of the work. It's these clear markers of progress and efficiency that highlight its transformative potential." He candidly admitted, however, that the technology isn't flawless; he wouldn't rely on it entirely for tasks like reading and summarizing documents. What stands out is the marked improvement over previous systems. And as it turned out, Nvidia had major updates in this very domain just around the corner.
Fast forward a few weeks, and Nvidia was in the spotlight with its announcement at ROSCon. It wasn't an isolated piece of news: it arrived alongside developments like the Nvidia Isaac ROS 2.0 and Nvidia Isaac Sim 2023 platforms, systems geared to harness the power of generative AI. The significance? The move could play a pivotal role in speeding adoption among robotics hobbyists and professionals alike. To gauge its reach, consider that more than 1.2 million developers have worked with Nvidia AI and Jetson platforms, a list that includes prominent names like AWS, Cisco and John Deere.
Perhaps the most compelling element is the Generative AI Playground for Jetson. Nvidia describes it as a space offering "optimized tools and tutorials" where developers can tap into open-source large language models. They can also explore diffusion models for generating visually rich images, vision language models (VLMs) and vision transformers (ViTs) that bridge vision AI with natural language understanding, offering a more holistic comprehension of scenes.
These models are meant to bridge the knowledge gap for robotic systems, letting them navigate and make decisions in unfamiliar scenarios, a crucial capability given that simulation has its limits. Even in structured settings like warehouses and factory floors, countless variables are at play. The aim is twofold: to let these systems adapt in real time and to offer a more intuitive, language-based interface.