[Interview Series] Tiago Ramalho, Lead Research Scientist: Machine Learning Research and the Challenge of Artificial General Intelligence
In the second installment of our interview series with Tiago Ramalho, Lead Research Scientist at Cogent Labs, we ask him about AI and machine learning research. He explains his own research work, while also touching on the overall research landscape and areas where exciting progress is being made.
- Topic 2: Machine Learning Research and the Path to Artificial General Intelligence
- Q: What research are you working on?
- Q: For each thread, could you explain what specific work you are doing?
- Q: What about meta-learning?
- Q: And lastly uncertainty prediction?
- Q: Do you see these different threads converging?
- Q: What are some other research areas that you are interested in but are not working on yourself?
- Q: What do you think about the current state of AI and machine learning research?
Topic 2: Machine Learning Research and the Path to Artificial General Intelligence
Q: What research are you working on?
TR: I am working on methods to improve the generality of existing AI methods while at the same time making them more robust. To achieve this goal, I have been focusing on three specific topics: grounded language, meta-learning and uncertainty prediction.
Q: For each thread, could you explain what specific work you are doing?
TR: Sure. With grounded language, the broad motivation is to build models that understand language in a way more similar to how humans do. Current language models are trained by looking at millions of sentences and trying to predict a missing word. In this way they can’t learn anything more than the statistics of the language (i.e. after seeing a specific word, which words are more likely to follow). By contrast, a human can relate a sentence to a range of different experiences (visual, auditory, abstract concepts, etc.) which are not contained in the language itself.
To achieve this goal, we need to train AIs which can understand how things relate to each other. For example, if I see a bottle on a table I can predict what would happen if the bottle is moved to the left, to the right, or off the edge of the table. I can do this by relating multiple sensory streams such as visual, auditory and tactile information as well as the actions I take in this environment.
As a very early step towards building models with these richer representations of the world, we have simplified the problem down to two sensorial modalities, language and vision. We set up a task where the model is asked to draw a picture of a scene based on a language description, and vice versa. In a recent paper we showed that we can train a model to build a perspective-invariant representation of a scene purely from a language description. This means that the model can read about a scene and then imagine what it would look like from any perspective.
Q: What about meta-learning?
TR: Most deep neural networks are trained to perform a task by looking at a dataset of known input-output pairs. For example, to classify cats and dogs you would collect a dataset with pictures of cats and associate the label ‘cat’ with those pictures, and do the same for ‘dog’. After training, the AI model should be able to distinguish cats from dogs with high accuracy. However, in the real world there might be inputs we did not add to our training set. For example, our AI might be shown a picture of an elephant, in which case it would not know what to do.
A meta-learning model is an extension of classic supervised training techniques that can cope with new information appearing after training. The key idea is to endow the model with a memory, so that it can store previously seen experiences. Then, instead of memorizing the labels in the training set, we can train the network to associate the data it is seeing with whatever is stored in its memory, which can be extended as new data becomes available. In the example above, this would mean we could add the ‘elephant’ label to the model’s memory when we encounter an elephant picture. By continuously integrating new information, the AI model can be more efficient and robust when deployed in the real world.
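The memory idea can be illustrated with a minimal Python sketch. It is not the model described in the interview: a plain nearest-neighbour lookup over hand-written two-dimensional feature vectors stands in for a trained network's embeddings, and the names `MemoryClassifier`, `write` and `predict` are hypothetical.

```python
class MemoryClassifier:
    """Toy nearest-neighbour classifier whose memory can grow after training."""

    def __init__(self):
        self.memory = []  # list of (feature_vector, label) pairs

    def write(self, features, label):
        """Store a new labelled example; no retraining is involved."""
        self.memory.append((tuple(features), label))

    def predict(self, features):
        """Return the label of the closest stored example."""
        if not self.memory:
            raise ValueError("memory is empty")

        def squared_distance(item):
            stored, _ = item
            return sum((a - b) ** 2 for a, b in zip(stored, features))

        _, label = min(self.memory, key=squared_distance)
        return label


clf = MemoryClassifier()
clf.write([0.9, 0.1], "cat")
clf.write([0.1, 0.9], "dog")

# An elephant-like input is forced into one of the known labels.
before = clf.predict([5.0, 4.0])

# Extending the memory is enough to handle the new class.
clf.write([5.0, 4.0], "elephant")
after = clf.predict([4.8, 4.1])
```

In this sketch `before` can only be one of the two known labels, while `after` resolves to the newly stored one.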
There are still many challenges in implementing meta-learning models. In a recent paper, we developed a new technique to store data in a meta-learning model as efficiently as possible. Ideally, you want the model to store as few items as necessary for its prediction accuracy to be high. To achieve this, the model needs to recognize when it sees a surprising input so that only the most informative input-output pairs are stored. We used information theory to determine a good measure of surprise and showed that we can improve memory use for this class of model.
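A rough sketch of surprise-gated storage follows. The interview does not say which measure the paper uses, so this example assumes a standard information-theoretic one: the negative log-probability the current memory assigns to the true label. The functions `label_probabilities`, `surprise` and `maybe_store`, the softmax over distances, and the threshold value are all hypothetical choices made for illustration.

```python
import math


def label_probabilities(memory, features, temperature=1.0):
    """Softmax over negative squared distances, summed per label."""
    if not memory:
        return {}
    scores = []
    for stored, label in memory:
        sq = sum((a - b) ** 2 for a, b in zip(stored, features))
        scores.append((-sq / temperature, label))
    top = max(score for score, _ in scores)  # subtract the max for stability
    weights = {}
    for score, label in scores:
        weights[label] = weights.get(label, 0.0) + math.exp(score - top)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def surprise(memory, features, label, floor=1e-6):
    """Negative log-probability of the true label under the current memory."""
    p = label_probabilities(memory, features).get(label, 0.0)
    return -math.log(max(p, floor))


def maybe_store(memory, features, label, threshold=0.5):
    """Write the pair to memory only if it is surprising enough."""
    if surprise(memory, features, label) > threshold:
        memory.append((tuple(features), label))
        return True
    return False


memory = []
stored_flags = [
    maybe_store(memory, [0.9, 0.1], "cat"),       # unseen label: stored
    maybe_store(memory, [0.1, 0.9], "dog"),       # unseen label: stored
    maybe_store(memory, [0.85, 0.15], "cat"),     # already well predicted: skipped
    maybe_store(memory, [5.0, 4.0], "elephant"),  # unseen label: stored
]
```

The second cat example is skipped because the memory already predicts it with low surprise, so the memory holds three items instead of four.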
Q: And lastly uncertainty prediction?
TR: Consider the example I gave before. An AI trained only on dogs and cats will only know about those labels. So the first time it sees an elephant, it will incorrectly label it as either a ‘dog’ or ‘cat’. A human would most likely answer ‘I don’t know’ if they see something significantly different to what they have learned before. So the question here is, how can the model know when it doesn’t know something?
This question also relates to the meta-learning topic we discussed before. Usually in the meta-learning scenario we assume the labels for new information are given. But in the real world the AI might encounter unlabeled examples in the wild. If it can determine that those don’t match what it’s seen before, it can ask for human guidance on only a limited subset of the data, which could make these algorithms more practical in a variety of scenarios.
We can also ask how certain the model needs to be before saying ‘I don’t know’ or risking a prediction. This balance can be rigorously defined using a utility function. This concept has been studied extensively in economics and neuroscience but it has not found its way to deep learning research yet. So I am excited about that direction for now.
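As a minimal illustration of that balance, the sketch below compares the expected utility of committing to the most likely label with the utility of abstaining. The function `decide` and the utility values are assumptions chosen for the example, not something specified in the interview, and the class probabilities are taken as given.

```python
def decide(probabilities, u_correct=1.0, u_wrong=-4.0, u_abstain=0.0):
    """Return the most likely label, or None to signal 'I don't know'.

    probabilities: dict mapping label -> predicted probability.
    The utilities are illustrative: a wrong answer costs four times
    what a correct answer gains, and abstaining is neutral.
    """
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    expected_utility = p * u_correct + (1.0 - p) * u_wrong
    return label if expected_utility > u_abstain else None


confident = decide({"cat": 0.95, "dog": 0.05})  # 0.95 - 0.20 = 0.75, so predict
unsure = decide({"cat": 0.55, "dog": 0.45})     # 0.55 - 1.80 = -1.25, so abstain
```

With these utilities the model only answers when its top probability exceeds 0.8; making mistakes more costly raises that bar, and making them cheaper lowers it.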
Q: Do you see these different threads converging?
TR: Absolutely. A meta-learning system with uncertainty prediction can be deployed in real world situations with more confidence in its ability to not make catastrophic predictions. Furthermore, it can tag any outliers for humans to classify, which allows us to improve the dataset in a more efficient manner.
Under the current paradigm, humans create a dataset, train the model, and then deploy it. After that, we no longer have any influence over the model. What I hope to achieve is a training phase followed by a feedback loop between the model and humans, which allows its performance to improve continuously.
Q: What are some other research areas that you are interested in but are not working on yourself?
TR: One area that I am following closely is curiosity and exploration. There are a lot of connections to the topics I am working on. Let’s say you have a model that knows when it does not know something, and it needs to get more information. In this scenario, a human might ask someone else for information, or alternatively perform experiments to test certain hypotheses. For example, when we see a new object we might pick it up to determine whether it is heavy or light, hot or cold, etc.
That is the basic idea behind exploration systems. We want not only to teach the model to know when it does not know something, but also to choose actions that will give it the right information. Recent reinforcement learning agents have learned to play complex video games by playing the same game over and over again for the equivalent of thousands of years. While they can reach superhuman performance on some games, a human can learn to play the same game in only a few minutes.
In the beginning, these agents start off by taking completely random actions. When they reach a state that gives them a reward, they increase the probability of the sequence of actions that led them to that state. But most of the time they end up in a state that they have seen before. When humans play a new game we act quite differently: when faced with a novel situation we take specific actions that give us maximum information about the environment.
The people doing work on curiosity and exploration are trying to develop machine learning models that emulate this behavior. To do so, the agents need to be able to estimate their uncertainty, figure out what actions will maximally reduce their uncertainty to build good models of the world, and integrate all multisensorial information into a meta-learning system that can build a consistent picture of the world.
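One very crude way to picture uncertainty-driven action selection is a count-based rule: in each state, prefer the action that has been tried least often, on the assumption that its outcome is the most uncertain. This is only a toy proxy and not a method attributed to anyone in the interview; `choose_action` and `visit_counts` are hypothetical names.

```python
import random
from collections import Counter


def choose_action(state, actions, visit_counts, rng=random):
    """Pick the least-tried action in this state; break ties at random."""
    fewest = min(visit_counts[(state, action)] for action in actions)
    candidates = [a for a in actions if visit_counts[(state, a)] == fewest]
    return rng.choice(candidates)


visit_counts = Counter()  # missing (state, action) pairs count as zero
visit_counts[("room", "left")] = 3

action = choose_action("room", ["left", "right"], visit_counts)
visit_counts[("room", action)] += 1  # record the attempt
```

A purely random agent would keep revisiting "left" half of the time, whereas this rule steers towards the untried action first.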
Q: What do you think about the current state of AI and machine learning research?
TR: In 2012 the publication of AlexNet triggered a revolution in the field of AI. This paper showed that very large neural networks, coupled with enough computational power and memory (which had only recently become available), can solve real-world problems with minimal feature engineering. In the years that followed, researchers applied neural networks to almost all computational problems involving high-dimensional, noisy data, with varying degrees of success. Now we are coming to a period where all the low-hanging fruit has been picked and the field is stabilizing a bit. The best practices have been established and we generally have an idea as to whether a neural network approach is going to work for a specific problem or not.
To reach a system closer to human reasoning we are probably going to need more breakthroughs (as opposed to just throwing more computation at the problem). To use the brain as a metaphor, current deep neural network approaches work like the areas in our brain that process sensory streams. They can take very high-dimensional and noisy natural signals such as images or sound and compress them into a small description that contains the signal relevant for whatever task they are solving. Humans, however, do much more than that. We can apply inductive reasoning to the representations of the world and combine memory with abstract reasoning to predict what’s going to happen next in the world and plan actions accordingly.
Current deep neural networks still cannot do this end-to-end. Some systems which can perform some of these tasks require a lot of extra engineering work (human knowledge) baked in to achieve these goals. If some breakthrough allows us to train a system end-to-end to solve these problems, then we could see the next revolution of AI.
It is hard to predict when that could happen. And after such a breakthrough you need engineering and computational power to be able to scale it to large systems. So there is a danger that AI will be controlled by large corporations. With that in mind, I am excited to play a role in bringing the knowledge I acquired at DeepMind to a smaller company like Cogent Labs, and in democratizing the use of AI throughout the world.