
Machine Reading the Web - Beyond Named Entity Recognition and Relation Extraction - ECMLPKDD2015 Tutorial

  • The Web is inundated with information in many different formats, including semi-structured and unstructured data. Machine Reading is a research area that aims to build systems that can read natural-language text, extract knowledge from it, and store that knowledge in knowledge bases. Machine Reading systems are thus developed to produce language-understanding technology that can process text automatically in an affordable amount of time. This tutorial explores the idea of automatically reading the Web using Machine Reading techniques. Four of the most successful Machine Reading approaches intended to read the Web (namely the KnowItAll, YAGO, NELL and DBpedia systems) will be presented and discussed. The principles, subtleties and current results of each approach will be addressed. Online resources from each approach will be explored, and future directions for each system will be pointed out. YAGO, KnowItAll, NELL and DBpedia are not the only research efforts focusing on reading the Web. They were selected for this tutorial because they represent four different and highly relevant approaches to the problem, not because they are the only relevant ones. Although the tutorial focuses mainly on these four systems, other independent contributions to the Read the Web idea will also be mentioned as related work.
  • Slides for the tutorial are available at this link.
  • Estevam R. Hruschka Jr. received his B.Sc. and M.Sc. degrees in Computer Science from the State University of Londrina, Brazil, and the University of Brasilia, Brazil, in 1994 and 1997, respectively, and his Ph.D. degree in Computational Systems from the Federal University of Rio de Janeiro, Brazil, in 2003. He has been a "young research fellow" at FAPESP (Sao Paulo state research agency, Brazil) and is currently a "research fellow" at CNPq (Brazilian research agency). He is also an associate professor at the Federal University of Sao Carlos (UFSCar), Brazil. Professor Estevam was a visiting professor (2008-2010) at Carnegie Mellon University and has been working with Professor Tom Mitchell and Professor William Cohen (in the Carnegie Mellon Read the Web project group - http://rtw.ml.cmu.edu/rtw/people) since the beginning of the ReadTheWeb project in 2008, helping to build the Never-Ending Language Learning (NELL) system, which started running in January 2010. His main research interests are never-ending learning, machine learning, probabilistic graphical models and natural language understanding. He has worked with many international research teams, collaborating with research groups at both universities and companies.