[NYT] Aided by A.I. Language Models, Google’s Robots Are Getting Smart

A one-armed robot stood in front of a table. On the table sat three plastic figurines: a lion, a whale and a dinosaur.

An engineer gave the robot an instruction: “Pick up the extinct animal.”

The robot whirred for a moment, then its arm extended and its claw opened and descended. It grabbed the dinosaur.

Until very recently, this demonstration, which I witnessed during a podcast interview at Google’s robotics division in Mountain View, Calif., last week, would have been impossible. Robots weren’t able to reliably manipulate objects they had never seen before, and they certainly weren’t capable of making the logical leap from “extinct animal” to “plastic dinosaur.”

But a quiet revolution is underway in robotics, one that piggybacks on recent advances in so-called large language models — the same type of artificial intelligence system that powers ChatGPT, Bard and other chatbots.

Google has recently begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains. The secretive project has made the robots far smarter and given them new powers of understanding and problem-solving.

I got a glimpse of that progress during a private demonstration of Google’s latest robotics model, called RT-2. The model, which is being unveiled on Friday, amounts to a first step toward what Google executives described as a major leap in the way robots are built and programmed.

“We’ve had to reconsider our entire research program as a result of this change,” said Vincent Vanhoucke, Google DeepMind’s head of robotics. “A lot of the things that we were working on before have been entirely invalidated.”

Robots still fall short of human-level dexterity and fail at some basic tasks, but Google’s use of A.I. language models to give robots new skills of reasoning and improvisation represents a promising breakthrough, said Ken Goldberg, a robotics professor at the University of California, Berkeley.

“What’s very impressive is how it links semantics with robots,” he said. “That’s very exciting for robotics.”

To understand the magnitude of this, it helps to know a little about how robots have conventionally been built.

For years, the way engineers at Google and other companies trained robots to do a mechanical task — flipping a burger, for example — was by programming them with a specific list of instructions. (Lower the spatula 6.5 inches, slide it forward until it encounters resistance, raise it 4.2 inches, rotate it 180 degrees, and so on.) Robots would then practice the task again and again, with engineers tweaking the instructions each time until they got it right.
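As a rough illustration of that kind of hand-written routine, here is a minimal Python sketch. The `Step` class, the command names and the `describe` helper are hypothetical, invented for this example. Only the distances and the angle come from the burger example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One hard-coded instruction in a scripted routine (hypothetical format)."""
    command: str   # hypothetical command name
    amount: float  # distance or angle; 0.0 when the step has no magnitude
    unit: str      # "in", "deg", or "" when not applicable


# Hypothetical routine mirroring the burger example in the text.
FLIP_BURGER: List[Step] = [
    Step("lower_spatula", 6.5, "in"),
    Step("slide_forward_until_resistance", 0.0, ""),
    Step("raise_spatula", 4.2, "in"),
    Step("rotate_spatula", 180.0, "deg"),
]


def describe(routine: List[Step]) -> List[str]:
    """Render a routine as numbered, human-readable lines."""
    lines = []
    for index, step in enumerate(routine, start=1):
        detail = f" {step.amount:g} {step.unit}" if step.unit else ""
        lines.append(f"{index}. {step.command}{detail}")
    return lines


for line in describe(FLIP_BURGER):
    print(line)
```

Every number in the list is tied to one task and one object, so a different task would need its own list written and tuned by hand.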

This approach worked for certain, limited uses. But training robots this way is slow and labor-intensive. It requires collecting lots of data from real-world tests. And if you wanted to teach a robot to do something new — to flip a pancake instead of a burger, say — you usually had to reprogram it from scratch.

Partly because of these limitations, hardware robots have improved less quickly than their software-based siblings. OpenAI, the maker of ChatGPT, disbanded its robotics team in 2021, citing slow progress and a lack of high-quality training data. In 2017, Google’s parent company, Alphabet, sold Boston Dynamics, a robotics company it had acquired, to the Japanese tech conglomerate SoftBank. (Boston Dynamics is now owned by Hyundai and seems to exist mainly to produce viral videos of humanoid robots performing terrifying feats of agility.)

In recent years, researchers at Google had an idea. What if, instead of being programmed for specific tasks one by one, robots could use an A.I. language model — one that had been trained on vast swaths of internet text — to learn new skills for themselves?

“We started playing with these language models around two years ago, and then we realized that they have a lot of knowledge in them,” said Karol Hausman, a Google research scientist. “So we started connecting them to robots.”

Google’s first attempt to join language models and physical robots was a research project called PaLM-SayCan, which was revealed last year. It drew some attention, but its usefulness was limited. The robots lacked the ability to interpret images — a crucial skill, if you want them to be able to navigate the world. They could write out step-by-step instructions for different tasks, but they couldn’t turn those steps into actions.

Google’s new robotics model, RT-2, can do just that. It’s what the company calls a “vision-language-action” model, or an A.I. system that has the ability not just to see and analyze the world around it, but to tell a robot how to move.

It does so by translating the robot’s movements into a series of numbers — a process called tokenizing — and incorporating those tokens into the same training data as the language model. Eventually, just as ChatGPT or Bard learns to guess what words should come next in a poem or a history essay, RT-2 can learn to guess how a robot’s arm should move to pick up a ball or throw an empty soda can into the recycling bin.
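To make the idea of turning movements into numbers more concrete, here is a minimal Python sketch of one way it could be done: each continuous movement value is clipped to a range, assigned to one of a fixed number of bins, and written out as text. The function names, the value range of -1 to 1 and the 256-bin count are assumptions chosen for illustration. The article does not describe Google's actual encoding.

```python
from typing import List

NUM_BINS = 256  # assumption: illustrative bin count, not stated in the article


def tokenize_action(values: List[float], low: float = -1.0, high: float = 1.0,
                    num_bins: int = NUM_BINS) -> List[int]:
    """Map each continuous movement value to an integer bin index (a token)."""
    tokens = []
    for value in values:
        clipped = min(max(value, low), high)
        fraction = (clipped - low) / (high - low)
        # The top edge of the range falls into the last bin.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def detokenize_action(tokens: List[int], low: float = -1.0, high: float = 1.0,
                      num_bins: int = NUM_BINS) -> List[float]:
    """Map bin indices back to the centre value of each bin."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]


def to_text(tokens: List[int]) -> str:
    """Write the tokens as plain text, the form a language model reads."""
    return " ".join(str(token) for token in tokens)


# Hypothetical movement: three normalized displacement values.
movement = [-1.0, 0.0, 1.0]
tokens = tokenize_action(movement)
print(to_text(tokens))            # 0 128 255
print(detokenize_action(tokens))  # approximate values recovered from the bins
```

Once movements are written as short strings of integers, they can sit in the same training data as ordinary words, which is what lets a single model predict either the next word or the next movement.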

“In other words, this model can learn to speak robot,” Mr. Hausman said.

In an hourlong demonstration, which took place in a Google office kitchen littered with objects from a dollar store, my podcast co-host and I saw RT-2 perform a number of impressive tasks. One was successfully following complex instructions like “move the Volkswagen to the German flag,” which RT-2 did by finding and snagging a model VW Bus and setting it down on a miniature German flag several feet away.

It also proved capable of following instructions in languages other than English, and even making abstract connections between related concepts. Once, when I wanted RT-2 to pick up a soccer ball, I instructed it to “pick up Lionel Messi.” RT-2 got it right on the first try.

The robot wasn’t perfect. It incorrectly identified the flavor of a can of LaCroix placed on the table in front of it. (The can was lemon; RT-2 guessed orange.) Another time, when it was asked what kind of fruit was on a table, the robot simply answered, “White.” (It was a banana.) A Google spokeswoman said the robot had used a cached answer to a previous tester’s question because its Wi-Fi had briefly gone out.

Google has no immediate plans to sell RT-2 robots or release them more widely, but its researchers believe these new language-equipped machines will eventually be useful for more than just parlor tricks. Robots with built-in language models could be put into warehouses, used in medicine or even deployed as household assistants — folding laundry, unloading the dishwasher, picking up around the house, they said.

“This really opens up using robots in environments where people are,” Mr. Vanhoucke said. “In office environments, in home environments, in all the places where there are a lot of physical tasks that need to be done.”

Of course, moving objects around in the messy, chaotic physical world is harder than doing it in a controlled lab. And given that A.I. language models frequently make mistakes or invent nonsensical answers — which researchers call hallucination or confabulation — using them as the brains of robots could introduce new risks.

But Mr. Goldberg, the Berkeley robotics professor, said those risks were still remote.

“We’re not talking about letting these things run loose,” he said. “In these lab environments, they’re just trying to push some objects around on a table.”

Google, for its part, said RT-2 was equipped with plenty of safety features. In addition to a big red button on the back of every robot — which stops the robot in its tracks when pressed — the system uses sensors to avoid bumping into people or objects.

The A.I. software built into RT-2 has its own safeguards, which it can use to prevent the robot from doing anything harmful. One benign example: Google’s robots can be trained not to pick up containers with water in them, because water can damage their hardware if it spills.

If you’re the kind of person who worries about A.I. going rogue — and Hollywood has given us plenty of reasons to fear that scenario, from the original “Terminator” to last year’s “M3gan” — the idea of making robots that can reason, plan and improvise on the fly probably strikes you as a terrible idea.

But at Google, it’s the kind of idea researchers are celebrating. After years in the wilderness, hardware robots are back — and they have their chatbot brains to thank.
