Challenging the Limits of Information Extraction from Data

Takashi Isozaki

We aim to better understand our world, and even the universe, by making use of data. To do so, it is crucial to extract accurate and useful information from data. The question then arises: do current science and technology provide sufficient means for this purpose? In my view, they do not yet, and there remains ample room for research. My work is devoted to this purpose.

Finding Causation

A major theme of my research is inferring causality from observational data. The statistical study of causality is widely acknowledged to be academically challenging, and a considerable gap remains between these studies and their application to the wide variety of data encountered in practice. I have therefore been developing methods and algorithms that are both accurate and highly versatile.
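To give a flavor of what inferring causal direction from purely observational data can look like in the simplest two-variable case, here is a minimal sketch in Python. It uses a textbook idea from the linear non-Gaussian (LiNGAM family) literature, namely that the regression residual is independent of the predictor only in the true causal direction. This is a generic illustration under those assumptions, not the method implemented in CALC.

import numpy as np

# Illustration only: infer the causal direction between two variables
# from observational data under linear non-Gaussian assumptions
# (LiNGAM-style reasoning). Not the CALC method.

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(-1.0, 1.0, n)               # non-Gaussian cause
y = 2.0 * x + rng.uniform(-0.5, 0.5, n)     # y is generated from x

def residual(target, predictor):
    """Residual of least-squares regression of target on predictor."""
    t = target - target.mean()
    p = predictor - predictor.mean()
    return t - (t @ p) / (p @ p) * p

def dependence(a, b):
    """Crude dependence score: correlation between the squares of the
    standardized variables (near zero when a and b are independent)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.corrcoef(a**2, b**2)[0, 1])

# In the true direction x -> y, the residual of y on x is independent
# of x; in the reverse direction, residual dependence remains.
print("x -> y score:", dependence(x, residual(y, x)))   # near zero
print("y -> x score:", dependence(y, residual(x, y)))   # clearly larger

The direction with the smaller score is selected as causal; real methods replace the crude score above with principled independence tests and handle many variables at once.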

The outcome of this research was developed into a software tool for causal analysis, which has been applied to real data since 2014. The initiative has since expanded within the Sony Group as the CALC project, in cooperation with a headquarters division, and is now used across a wide range of Sony's data analyses, including the electronics, entertainment, finance, and other service sectors in Japan, the United States, and Europe. Together with many cooperating members and supporters, we are working to transform Sony's data analysis operations. Building on these results, we also license the software to external parties and conduct commissioned analyses in collaboration with outside companies, and the system is used in various fields across the manufacturing and service industries.

At the heart of this research lies the aspiration to deepen our understanding of the world and the universe by enhancing our ability to discern causal relationships in data. The need for methods that identify causation from data more accurately is a driving force behind our work. What makes this field interesting from the standpoint of basic science is that it extends beyond statistics alone: it intersects with the philosophy of science and with physics, and we pursue investigations in those areas as well. Our research on inference methods, algorithms, and the use of causal information addresses both vast unexplored challenges and significant issues that emerge in practical application. Within the project, researchers work together to confront the limits of estimation and inference from data. We are also exploring applications in other academic disciplines and advancing collaborative research with several universities.

Exploring the Fundamental Principles of Estimation

Another topic I am working on, though a very fundamental one, concerns the concepts underlying the estimation of probabilities and statistical models from data. Probability estimation is well integrated into our daily lives: in everyday terms it is a ratio, or, in slightly more specialized language, a relative frequency. The calculation of probability estimates as relative frequencies is underpinned by the principle of maximum likelihood, which prescribes choosing the estimate that maximizes the quantity known in statistics as the likelihood. This principle forms the foundation of frequentist statistical thinking. However, it does not hold for every sample size and is particularly unreliable when samples are few. In other words, the principle of maximum likelihood can be regarded as a kind of approximation.
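To make this concrete, a standard textbook derivation (not specific to this research) shows how maximizing the likelihood recovers the relative frequency. For k successes observed in n independent binary trials:

L(p) = \binom{n}{k} \, p^{k} (1-p)^{n-k}, \qquad
\frac{d}{dp} \log L(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0
\quad \Longrightarrow \quad \hat{p} = \frac{k}{n}

With n = 3, for instance, the estimate can only take the values 0, 1/3, 2/3, or 1, and a rare event that happens not to occur in the sample is assigned probability exactly zero. This illustrates why the relative frequency becomes unreliable at small sample sizes.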

With this in mind, I began investigating the possibility of a more universal principle and was inspired by a concept from thermodynamics in physics: what might be called the principle of minimum free energy. This idea, first proposed as an embryonic study in 2007, has since been refined and applied to methods for inferring causality. Applying this principle to statistical science is promising because it allows a unified treatment of everything from the estimation of probabilities to model selection. Moreover, if the concept is applicable, it could bridge the physical world and the extraction of information from finite data, suggesting a profound connection to the philosophy of science. Interestingly, a similar (yet distinct) principle was proposed in neuroscience around the same time, and related ideas have been employed in techniques for large-scale language models in AI, underscoring the growing value of further exploration in this area.
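For intuition only, here is a minimal sketch in Python of what a free-energy-style estimator could look like. It assumes a Helmholtz-type objective F = U - T*H, where U is the cross-entropy of the observed frequencies under the candidate distribution, H is the candidate's Shannon entropy, and T is a temperature parameter; these functional forms and the example values are my assumptions for illustration, not the formulation developed in this research.

import numpy as np
from scipy.optimize import minimize

def free_energy_estimate(counts, temperature):
    """Estimate a discrete distribution by minimizing F = U - T*H.
    U: cross-entropy of the observed frequencies under candidate q.
    H: Shannon entropy of q.
    NOTE: illustrative assumption, not the CALC formulation."""
    counts = np.asarray(counts, dtype=float)
    rel_freq = counts / counts.sum()   # the maximum-likelihood estimate

    def F(theta):
        # Softmax parameterization keeps q a valid probability vector.
        q = np.exp(theta - theta.max())
        q /= q.sum()
        logq = np.log(q + 1e-12)       # small epsilon for numerical safety
        U = -np.sum(rel_freq * logq)   # energy: cross-entropy term
        H = -np.sum(q * logq)          # entropy of the candidate
        return U - temperature * H

    res = minimize(F, np.zeros(len(counts)))
    q = np.exp(res.x - res.x.max())
    return q / q.sum()

counts = [3, 0, 1]   # four observations over three categories
# Higher temperature pulls the estimate toward the uniform distribution.
print(free_energy_estimate(counts, temperature=0.5))
# As T -> 0 the estimate returns to the relative frequency [0.75, 0, 0.25].
print(free_energy_estimate(counts, temperature=1e-6))

At high temperature the entropy term dominates and the estimate avoids assigning probability zero to unobserved outcomes, while in the limit T -> 0 the objective reduces to the likelihood term alone. This mirrors the idea above that maximum likelihood is a limiting approximation.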
