Google Keynote (Google I/O '22)


Hi, everyone, and welcome. Actually, let’s make that welcome back! It’s great to return to Shoreline Amphitheatre after three years away. To the thousands of developers, partners and Googlers here with us, it’s great to see all of you. And to the millions more joining us around the world — we’re so happy you’re here, too.
Last year, we shared how new breakthroughs in some of the most technically challenging areas of computer science are making Google products more helpful in the moments that matter. All this work is in service of our timeless mission: to organize the world's information and make it universally accessible and useful.
I'm excited to show you how we’re driving that mission forward in two key ways: by deepening our understanding of information so that we can turn it into knowledge; and advancing the state of computing, so that knowledge is easier to access, no matter who or where you are.
Today, you'll see how progress on these two parts of our mission ensures Google products are built to help. I’ll start with a few quick examples. Throughout the pandemic, Google has focused on delivering accurate information to help people stay healthy. Over the last year, people used Google Search and Maps to find where they could get a COVID vaccine nearly two billion times.
We’ve also expanded our flood forecasting technology to help people stay safe in the face of natural disasters. During last year’s monsoon season, our flood alerts notified more than 23 million people in India and Bangladesh. And we estimate this supported the timely evacuation of hundreds of thousands of people.
In Ukraine, we worked with the government to rapidly deploy air raid alerts. To date, we’ve delivered hundreds of millions of alerts to help people get to safety. In March I was in Poland, where millions of Ukrainians have sought refuge. Warsaw’s population has increased by nearly 20% as families host refugees in their homes, and schools welcome thousands of new students. Nearly every Google employee I spoke with there was hosting someone.
Adding 24 more languages to Google Translate
In countries around the world, Google Translate has been a crucial tool for newcomers and residents trying to communicate with one another. We’re proud of how it’s helping Ukrainians find a bit of hope and connection until they are able to return home again.
Real-time translation is a testament to how knowledge and computing come together to make people's lives better. More people are using Google Translate than ever before, but we still have work to do to make it universally accessible. There’s a long tail of languages that are underrepresented on the web today, and translating them is a hard technical problem. That’s because translation models are usually trained with bilingual text — for example, the same phrase in both English and Spanish. However, there's not enough publicly available bilingual text for every language.
So with advances in machine learning, we’ve developed a monolingual approach where the model learns to translate a new language without ever seeing a direct translation of it. By collaborating with native speakers and institutions, we found these translations were of sufficient quality to be useful, and we'll continue to improve them.
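As a rough illustration of how a model can learn to translate from monolingual text, here is a minimal Python sketch of back-translation, one widely used technique for this setting; the keynote does not spell out the exact recipe Google used, and the model functions below are hypothetical placeholders.
```python
# Minimal sketch of back-translation, a common way to learn translation from
# monolingual text alone. This is an illustration only, not Google's recipe;
# both model-facing functions are hypothetical placeholders.
from typing import List, Tuple

def reverse_translate(sentences: List[str]) -> List[str]:
    """Placeholder: the current model translating new-language text into English."""
    return [f"[en] {s}" for s in sentences]

def train_on_pairs(pairs: List[Tuple[str, str]]) -> None:
    """Placeholder: one training update of the English -> new-language model."""
    print(f"training on {len(pairs)} synthetic pair(s)")

def back_translation_round(monolingual_new_lang: List[str]) -> None:
    # 1. Translate real sentences in the new language into (noisy) English.
    synthetic_english = reverse_translate(monolingual_new_lang)
    # 2. Train on (synthetic English -> real new-language) pairs: the target side
    #    is genuine text, so the model learns to generate the new language.
    train_on_pairs(list(zip(synthetic_english, monolingual_new_lang)))

back_translation_round(["a sentence in an underrepresented language"])
```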
Today, I’m excited to announce that we’re adding 24 new languages to Google Translate, including the first indigenous languages of the Americas. Together, these languages are spoken by more than 300 million people. Breakthroughs like this are powering a radical shift in how we access knowledge and use computers.
Taking Google Maps to the next level
So much of what’s knowable about our world goes beyond language — it’s in the physical and geospatial information all around us. For more than 15 years, Google Maps has worked to create rich and useful representations of this information to help us navigate. Advances in AI are taking this work to the next level, whether it’s expanding our coverage to remote areas, or reimagining how to explore the world in more intuitive ways.
Around the world, we’ve mapped around 1.6 billion buildings and over 60 million kilometers of roads to date. Some remote and rural areas have previously been difficult to map, due to scarcity of high-quality imagery and distinct building types and terrain. To address this, we’re using computer vision and neural networks to detect buildings at scale from satellite images. As a result, we have increased the number of buildings on Google Maps in Africa by 5X since July 2020, from 60 million to nearly 300 million.
We’ve also doubled the number of buildings mapped in India and Indonesia this year. Globally, over 20% of the buildings on Google Maps have been detected using these new techniques. We’ve gone a step further, and made the dataset of buildings in Africa publicly available. International organizations like the United Nations and the World Bank are already using it to better understand population density, and to provide support and emergency assistance.
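To make the detection approach mentioned above concrete, here is a small sketch of how building detection from satellite imagery could work in principle: tile the imagery, run a segmentation model over each tile, and count connected components of the predicted mask. The `segment` function is a hypothetical placeholder, not Google's actual model.
```python
# Illustrative sketch only, not Google's pipeline: tile a satellite image, run a
# (hypothetical) building-segmentation model on each tile, and count buildings
# as connected components of the predicted mask.
import numpy as np
from scipy import ndimage

def count_buildings(image: np.ndarray, segment, tile: int = 512) -> int:
    """`segment` maps an image patch to per-pixel building probabilities."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = image[y:y + tile, x:x + tile]
            mask[y:y + tile, x:x + tile] = segment(patch) > 0.5
    _, num_buildings = ndimage.label(mask)  # label connected building regions
    return num_buildings

# Dummy stand-in for a trained neural network so the sketch runs end to end
# (it predicts "no buildings" everywhere).
dummy_segment = lambda patch: np.zeros(patch.shape[:2])
print(count_buildings(np.zeros((1024, 1024, 3)), dummy_segment))
```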
We’re also bringing new capabilities into Maps. Using advances in 3D mapping and machine learning, we’re fusing billions of aerial and street level images to create a new, high-fidelity representation of a place. These breakthrough technologies are coming together to power a new experience in Maps called immersive view: it allows you to explore a place like never before.
Let’s go to London and take a look. Say you’re planning to visit Westminster with your family. You can get into this immersive view straight from Maps on your phone, and you can pan around the sights… here’s Westminster Abbey. If you’re thinking of heading to Big Ben, you can check if there's traffic, how busy it is, and even see the weather forecast. And if you’re looking to grab a bite during your visit, you can check out restaurants nearby and get a glimpse inside.
What's amazing is that isn't a drone flying in the restaurant — we use neural rendering to create the experience from images alone. And Google Cloud Immersive Stream allows this experience to run on just about any smartphone. This feature will start rolling out in Google Maps for select cities globally later this year.
Another big improvement to Maps is eco-friendly routing. Launched last year, it shows you the most fuel-efficient route, giving you the choice to save money on gas and reduce carbon emissions. Eco-friendly routes have already rolled out in the U.S. and Canada — and people have used them to travel approximately 86 billion miles, helping save an estimated half million metric tons of carbon emissions, the equivalent of taking 100,000 cars off the road.
I’m happy to share that we’re expanding this feature to more places, including Europe later this year. In this Berlin example, you could reduce your fuel consumption by 18% taking a route that’s just three minutes slower. These small decisions have a big impact at scale. With the expansion into Europe and beyond, we estimate carbon emission savings will double by the end of the year.
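As an illustration of the underlying trade-off, here is a toy sketch that prefers the most fuel-efficient route when it costs only a few extra minutes; this is not Google Maps' actual routing algorithm, and the route data and time tolerance are made up for the example.
```python
# Toy sketch of the eco-routing trade-off, not Google Maps' algorithm: among
# routes within a small time tolerance of the fastest one, pick the route that
# uses the least fuel. Routes and tolerance below are invented for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class Route:
    name: str
    minutes: float
    liters_fuel: float

def pick_eco_route(routes: List[Route], max_extra_minutes: float = 5.0) -> Route:
    fastest = min(routes, key=lambda r: r.minutes)
    acceptable = [r for r in routes if r.minutes - fastest.minutes <= max_extra_minutes]
    return min(acceptable, key=lambda r: r.liters_fuel)

# Roughly the Berlin example from the talk: ~18% less fuel for 3 extra minutes.
routes = [Route("fastest", 30.0, 2.2), Route("eco", 33.0, 1.8)]
print(pick_eco_route(routes).name)  # -> "eco"
```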
And we’ve added a similar feature to Google Flights. When you search for flights between two cities, we also show you carbon emission estimates alongside other information like price and schedule, making it easy to choose a greener option. These eco-friendly features in Maps and Flights are part of our goal to empower 1 billion people to make more sustainable choices through our products, and we’re excited about the progress here.
New YouTube features to help people easily access video content
Beyond Maps, video is becoming an even more fundamental part of how we share information, communicate, and learn. Often when you come to YouTube, you are looking for a specific moment in a video and we want to help you get there faster.
Last year we launched auto-generated chapters to make it easier to jump to the part you’re most interested in.
This is also great for creators because it saves them time making chapters. We’re now applying multimodal technology from DeepMind. It simultaneously uses text, audio and video to auto-generate chapters with greater accuracy and speed. With this, we now have a goal to 10X the number of videos with auto-generated chapters, from eight million today, to 80 million over the next year.
Often the fastest way to get a sense of a video’s content is to read its transcript, so we’re also using speech recognition models to transcribe videos. Video transcripts are now available to all Android and iOS users.
Next up, we’re bringing auto-translated captions on YouTube to mobile, which means viewers can now auto-translate video captions in 16 languages, and creators can grow their global audience. We’ll also be expanding auto-translated captions to Ukrainian YouTube content next month, as part of our larger effort to increase access to accurate information about the war.
Helping people be more efficient with Google Workspace
Just as we’re using AI to improve features in YouTube, we’re building it into our Workspace products to help people be more efficient. Whether you work for a small business or a large institution, chances are you spend a lot of time reading documents. Maybe you’ve felt that wave of panic when you realize you have a 25-page document to read ahead of a meeting that starts in five minutes.
At Google, whenever I get a long document or email, I look for a TL;DR at the top — TL;DR is short for “Too Long, Didn’t Read.” And it got us thinking, wouldn’t life be better if more things had a TL;DR?
That’s why we’ve introduced automated summarization for Google Docs. Using one of our machine learning models for text summarization, Google Docs will automatically parse the words and pull out the main points.
This marks a big leap forward for natural language processing. Summarization requires understanding of long passages, information compression and language generation, which used to be outside of the capabilities of even the best machine learning models.
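To illustrate the idea, here is a small sketch using an open-source summarization model from the Hugging Face transformers library; this is not the model Google Docs uses, just a way to see abstractive summarization in action.
```python
# Toy illustration of abstractive summarization with an open-source model;
# not the model behind Google Docs' summaries.
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default summarization model

document = (
    "Google Docs will automatically parse a long document and pull out the main "
    "points, so readers get a TL;DR without reading all 25 pages. Doing this well "
    "requires understanding long passages, compressing the information, and "
    "generating fluent language."
)
print(summarizer(document, max_length=40, min_length=10)[0]["summary_text"])
```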
And Docs is only the beginning. We’re launching summarization for other products in Workspace. It will come to Google Chat in the next few months, providing a helpful digest of chat conversations, so you can jump right into a group chat or look back at the key highlights.
And we’re working to bring transcription and summarization to Google Meet as well so you can catch up on some important meetings you missed.
Visual improvements in Google Meet
Of course there are many moments where you really want to be in a virtual room with someone. And that’s why we continue to improve audio and video quality, inspired by Project Starline. We introduced Project Starline at I/O last year. And we’ve been testing it across Google offices to get feedback and improve the technology for the future. And in the process, we’ve learned some things that we can apply right now to Google Meet.
Starline inspired machine learning-powered image processing that automatically improves your image quality in Google Meet. And it works on all types of devices, so you look your best wherever you are.
We’re also bringing studio quality virtual lighting to Meet. You can adjust the light position and brightness, so you’ll still be visible in a dark room or sitting in front of a window. We’re testing this feature to ensure everyone looks like their true selves, continuing the work we’ve done with Real Tone on Pixel phones and the Monk Scale.
These are just some of the ways AI is improving our products: making them more helpful and more accessible, and delivering innovative new features for everyone.
Making knowledge accessible through computing
We’ve talked about how we’re advancing access to knowledge as part of our mission: from better language translation to improved Search experiences across images and video, to richer explorations of the world using Maps.
Now we’re going to focus on how we make that knowledge even more accessible through computing. The journey we’ve been on with computing is an exciting one. Every shift, from desktop to the web to mobile to wearables and ambient computing, has made knowledge more useful in our daily lives.
As helpful as our devices are, we’ve had to work pretty hard to adapt to them. I’ve always thought computers should be adapting to people, not the other way around. We continue to push ourselves to make progress here.
Here’s how we’re making computing more natural and intuitive with the Google Assistant.
We're continually working to advance our conversational capabilities. Conversation and natural language processing are powerful ways to make computers more accessible to everyone. And large language models are key to this.
Last year, we introduced LaMDA, our generative language model for dialogue applications that can converse on any topic. Today, we are excited to announce LaMDA 2, our most advanced conversational AI yet.
We are at the beginning of a journey to make models like these useful to people, and we feel a deep responsibility to get it right. To make progress, we need people to experience the technology and provide feedback. We opened LaMDA up to thousands of Googlers, who enjoyed testing it and seeing its capabilities. This yielded significant quality improvements, and led to a reduction in inaccurate or offensive responses.
That’s why we’ve made AI Test Kitchen. It’s a new way to explore AI features with a broader audience. Inside the AI Test Kitchen, there are a few different experiences. Each is meant to give you a sense of what it might be like to have LaMDA in your hands and use it for things you care about.
The first is called “Imagine it.” This demo tests if the model can take a creative idea you give it, and generate imaginative and relevant descriptions. These are not products; they are quick sketches that allow us to explore what LaMDA can do with you. The user interfaces are very simple.
Say you’re writing a story and need some inspirational ideas. Maybe one of your characters is exploring the deep ocean. You can ask what that might feel like. Here LaMDA describes a scene in the Mariana Trench. It even generates follow-up questions on the fly. You can ask LaMDA to imagine what kinds of creatures might live there. Remember, we didn’t hand-program the model for specific topics like submarines or bioluminescence. It synthesized these concepts from its training data. That’s why you can ask about almost any topic: Saturn’s rings or even being on a planet made of ice cream.
Staying on topic is a challenge for language models. Say you’re building a learning experience — you want it to be open-ended enough to allow people to explore where curiosity takes them, but stay safely on topic. Our second demo tests how LaMDA does with that.
In this demo, we’ve primed the model to focus on the topic of dogs. It starts by generating a question to spark conversation: “Have you ever wondered why dogs love to play fetch so much?” And if you ask a follow-up question, you get an answer with some relevant details: it’s interesting that it thinks it might have something to do with the sense of smell and treasure hunting.
You can take the conversation anywhere you want. Maybe you’re curious about how smell works and you want to dive deeper. You’ll get a unique response for that too. No matter what you ask, it will try to keep the conversation on the topic of dogs. If I start asking about cricket, which I probably would, the model brings the topic back to dogs in a fun way.
This challenge of staying on-topic is a tricky one, and it’s an important area of research for building useful applications with language models.
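One simple way to think about this kind of priming is to prepend a topic instruction to the conversation before each reply. The sketch below shows that pattern with a placeholder model function; it is not LaMDA's actual interface.
```python
# Sketch of "priming" a dialogue model to stay on one topic. The model call is a
# hypothetical placeholder, not LaMDA's real API; the pattern shown is simply
# prepending a topic instruction to the conversation before generating a reply.
from typing import List, Tuple

PRIMING = (
    "You are a friendly guide who only talks about dogs. If the user changes "
    "the subject, gently steer the conversation back to dogs."
)

def language_model(prompt: str) -> str:
    """Placeholder for a large language model; returns a canned reply here."""
    return "That's interesting, but speaking of dogs..."

def chat(history: List[Tuple[str, str]], user_message: str):
    history = history + [("user", user_message)]
    prompt = PRIMING + "\n" + "\n".join(f"{role}: {text}" for role, text in history)
    reply = language_model(prompt)
    return history + [("model", reply)], reply

history, reply = chat([], "Tell me about cricket instead.")
print(reply)
```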
These experiences show the potential of language models to one day help us with things like planning, learning about the world, and more.
Of course, there are significant challenges to solve before these models can truly be useful. While we have improved safety, the model might still generate inaccurate, inappropriate, or offensive responses. That’s why we are inviting feedback in the app, so people can help report problems.
We will be doing all of this work in accordance with our AI Principles. Our process will be iterative, opening up access over the coming months, and carefully assessing feedback with a broad range of stakeholders — from AI researchers and social scientists to human rights experts. We’ll incorporate this feedback into future versions of LaMDA, and share our findings as we go. Over time, we intend to continue adding other emerging areas of AI into AI Test Kitchen. You can learn more at g.co/AITestKitchen.
Advancing AI language models
LaMDA 2 has incredible conversational capabilities. To explore other aspects of natural language processing and AI, we recently announced a new model. It’s called Pathways Language Model, or PaLM for short. It’s our largest model to date, with 540 billion parameters.
PaLM demonstrates breakthrough performance on many natural language processing tasks, such as generating code from text, answering a math word problem, or even explaining a joke.
It achieves this through greater scale. And when we combine that scale with a new technique called chain-of-thought prompting, the results are promising. Chain-of-thought prompting allows us to describe multi-step problems as a series of intermediate steps.
Let’s take an example of a math word problem that requires reasoning. Normally, the way you use a model is to prompt it with an example question and answer, and then start asking your own questions. In this case: how many hours are in the month of May? As you can see, the model didn’t quite get it right.
In chain-of-thought prompting, we give the model a question-answer pair, but this time with an explanation of how the answer was derived. Kind of like when your teacher gives you a step-by-step example to help you understand how to solve a problem. Now, if we ask the model again — how many hours are in the month of May — or other related questions, it actually answers correctly and even shows its work.
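Here is a minimal sketch contrasting standard prompting with chain-of-thought prompting, using the hours-in-May example from the talk; the model call is a placeholder, since PaLM's API is not part of the keynote.
```python
# Standard prompting: the exemplar shows only the final answer.
STANDARD_PROMPT = """\
Q: How many days are in a normal February?
A: 28.

Q: How many hours are in the month of May?
A:"""

# Chain-of-thought prompting: the exemplar includes the intermediate reasoning,
# not just the answer, which nudges the model to work through its own steps.
CHAIN_OF_THOUGHT_PROMPT = """\
Q: How many days are in a normal February?
A: February normally has 28 days. The answer is 28.

Q: How many hours are in the month of May?
A:"""

def language_model(prompt: str) -> str:
    """Placeholder: a real large language model would complete the prompt."""
    return "May has 31 days. 31 days x 24 hours per day = 744 hours. The answer is 744."

print(language_model(CHAIN_OF_THOUGHT_PROMPT))
```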
