The Kaleidoscope hypothesis is the idea that the world in general, and any domain in particular, follows the same structure: it appears on the surface to be extremely rich, complex, and infinitely novel with every passing moment, but in reality it is made from the repetition and composition of just a few atoms of meaning.
For instance, for a few months after the original release of ChatGPT, if you asked it "what's heavier, 10 kilos of steel or one kilo of feathers?" it would answer that they weigh the same. It would answer that because the trick question "what's heavier, one kilo of steel or one kilo of feathers?" is found all over the Internet, and the answer to that one is, of course, that they weigh the same.
Skill and benchmarks are not the primary lens through which you should look at these systems, so let's zoom out by a lot. Historically there have been two currents of thought about how to define the goals of AI. First, there's the Minsky-style view, which echoes the current big-tech view that AGI would be a system that can perform most economically valuable tasks. Minsky said AI is the science of making machines capable of performing tasks that would require intelligence if done by humans.
What matters in AI is generalization, not skill. Forget about skill, forget about benchmarks. That's really the reason why using human exams to evaluate AI models is a terrible idea: exams were not designed with generalization in mind, or rather they were designed with generalization assumptions that are appropriate for human beings but not for machines.
To make progress we need a feedback signal, and to get there we need a clear understanding of what generalization means. Generalization is the relationship between the information you have, the priors you're born with and the experience you've acquired over the course of your lifetime, and your operational area over the space of potential future situations you might encounter as an agent. Those situations are going to feature uncertainty, they're going to feature novelty, they're not going to be like the past. Generalization is basically the efficiency with which you operationalize past information in order to deal with the future.
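As a rough shorthand for that idea (my own paraphrase of what was just said, not a formal definition from any paper), you could write it as:

```latex
\text{generalization efficiency} \;\approx\;
  \frac{\text{skill demonstrated across novel future situations}}
       {\text{priors} + \text{experience}}
```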
Abstraction is the engine through which you produce generalization, so let's take a look at abstraction in general and then at abstraction in LLMs. To understand abstraction you have to start by looking around: zoom out, look at the universe. An interesting observation about the universe is that it's made of many different things that are all similar, all analogous to each other. One human is similar to other humans because they have a shared origin; electromagnetism is analogous to hydrodynamics, which is also analogous to gravity, and so on. Everything is similar to everything else. We are surrounded by isomorphisms; I call this the Kaleidoscope hypothesis.
There are two key categories of analogies from which arise two categories of abstraction: there's value-centric abstraction and there's program-centric abstraction. They're pretty similar to each other, they mirror each other: they're both about comparing things and then merging individual instances into common abstractions by erasing certain details about the instances that don't matter. You take a bunch of things, you compare them to each other, you erase the stuff that doesn't matter, and what you're left with is an abstraction.
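To make that concrete, here is a minimal sketch in Python of the two styles. The function names and the toy data are my own, purely for illustration: value-centric abstraction merges instances by comparing them in a continuous space, program-centric abstraction merges them by keeping only the discrete structure they share.

```python
import numpy as np

# --- Type 1: value-centric abstraction ---
# Compare instances as points in a continuous space and merge them by
# erasing the details that don't matter, e.g. by averaging into a prototype.
def value_centric_prototype(instances: np.ndarray) -> np.ndarray:
    """Merge a set of embeddings into a single prototype vector."""
    return instances.mean(axis=0)

# --- Type 2: program-centric abstraction ---
# Compare instances as discrete structures (here, lists of tokens) and keep
# only the exact structure they share, erasing the parts that differ.
def program_centric_common_structure(a: list, b: list) -> list:
    """Keep the tokens the two structures agree on; mask the rest."""
    return [x if x == y else "_" for x, y in zip(a, b)]

# Example: two "programs" that differ only in a constant.
print(program_centric_common_structure(
    ["move", "2", "left"], ["move", "5", "left"]))   # ['move', '_', 'left']
```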
Transformers are actually great at type one, at value-centric abstraction. They do everything that type one is effective for, like perception, intuition, pattern recognition, so in that sense Transformers represent a major breakthrough in AI. But they're not a good fit for type two abstraction, and that is where all the limitations we listed came from. This is why you cannot add numbers, or why you cannot infer from "a is b" that "b is a" as well, even with a Transformer trained on all the data on the Internet. So how do you go forward from here? How do you get to type two? How do you solve problems like ARC, or any reasoning or planning problem?
We know that machine learning is good at type one thinking, but we need to merge it with the type two thinking provided by program synthesis to take the next step.
You can embed discrete objects and their relationships into a geometric manifold where you can compare things via a continuous distance function and use that to make fast but approximate inferences about relationships.
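A minimal sketch of what that looks like, with made-up toy embeddings standing in for learned ones:

```python
# Embed discrete objects as vectors, then use a continuous distance
# (cosine similarity here) to make fast but approximate judgements
# about how related two objects are.
import numpy as np

# Hypothetical toy embeddings; in practice these would be learned.
embeddings = {
    "steel":    np.array([0.9, 0.1, 0.3]),
    "iron":     np.array([0.85, 0.15, 0.35]),
    "feathers": np.array([0.1, 0.9, 0.2]),
}

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Fast, approximate relational inference: "steel" is judged closer to "iron"
# than to "feathers", without any explicit symbolic reasoning step.
print(cosine_similarity(embeddings["steel"], embeddings["iron"]))
print(cosine_similarity(embeddings["steel"], embeddings["feathers"]))
```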
There are two exciting research areas to combine deep learning and program synthesis: leveraging discrete programs that incorporate deep learning components and using deep learning models to inform discrete search and improve its efficiency.
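Here is a rough sketch of the second direction, a learned model guiding discrete program search. Everything in it is invented for illustration: a tiny DSL of integer functions, brute-force enumeration over compositions, and a scoring stub standing in for what would really be a trained neural network ranking candidate programs.

```python
# Deep-learning-guided discrete search, sketched with a stand-in scorer.
from itertools import product

# A tiny DSL of primitive operations on integers.
DSL = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
}

def run(program, x):
    """Apply a sequence of DSL operations to an input."""
    for op in program:
        x = DSL[op](x)
    return x

def neural_guidance_stub(program, examples):
    """Stand-in for a learned model: score how promising a candidate looks.
    Here we just use negative error on the examples; a real guide would be
    a network predicting which partial programs are worth expanding."""
    return -sum(abs(run(program, x) - y) for x, y in examples)

def guided_search(examples, max_len=3, beam=5):
    """Enumerate programs up to max_len ops, keeping only the top `beam`
    candidates at each length, as ranked by the guidance model."""
    candidates = [()]
    for _ in range(max_len):
        expansions = [c + (op,) for c, op in product(candidates, DSL)]
        expansions.sort(key=lambda p: neural_guidance_stub(p, examples),
                        reverse=True)
        candidates = expansions[:beam]
        for p in candidates:
            if all(run(p, x) == y for x, y in examples):
                return p
    return None

# Find a program consistent with f(x) = 2x + 2, e.g. ("inc", "dbl").
print(guided_search([(1, 4), (2, 6), (3, 8)]))
```

The guidance model prunes the combinatorial space so the discrete search only expands the candidates that look promising, which is the efficiency gain the combination is after.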