TAC 2020 Workshop
Speaker: Dan Bikel
Title: Entities: What are they, where are they, and how should we model them?
It can be hard to define a term like "entity", but we won't let that stop us from trying in this talk. We will also talk about several entity-related lines of research going on at Google, including:
- automatic (or semi-automatic) KG induction
- entity linking without alias tables
- truly multilingual entity linking
- incorporating entity knowledge into other natural language understanding models
- new approaches to entity discovery
Daniel M. Bikel (firstname.lastname@example.org) is a Research Scientist at Google Research. He graduated with honors from Harvard in 1993 with a degree in Classics–Ancient Greek and Latin. From 1994 to 1997, he worked at BBN Technologies on several natural language processing (NLP) problems, including development of the first high-accuracy, machine learning–based name-finder. He received M.S. and Ph.D. degrees in computer science from the University of Pennsylvania, in 2000 and 2004, respectively, discovering new properties of statistical parsing algorithms. From 2004 through 2010, he was a Research Staff Member at IBM Research, working on a wide variety of NLP problems, including parsing, semantic role labeling, information extraction, machine translation and question answering. From 2010 to May 2015, Dr. Bikel worked on NLP and speech processing research at Google. As a researcher there, he built dynamic adaptation of language models for automatic YouTube captioning, built a generic framework for reranking, and was a founding member of the team that built the semantic parser that lay behind Google Now, which has since become the Google Assistant. From 2015 to 2017, Dr. Bikel was Principal NLP Scientist & Senior Manager at LinkedIn. In that role, he built and led a team of NLP researchers and engineers, building text processing methods and models for use throughout the company, including deep learning approaches specific to LinkedIn’s problem areas. In September of 2017, Dr. Bikel re-joined Google, continuing to work on natural language processing problems, leading a research team focused on using NLP and machine learning to understand specialized domains, and co-leading research on entities. He has published numerous peer-reviewed papers in the leading NLP conference proceedings and journals, and has built software tools that have seen widespread use in the NLP community.
He co-edited the book “Multilingual Natural Language Processing Applications: From Theory to Practice”, published by IBM Press/Pearson in 2012.
Speaker: Claire Cardie
Title: Information Extraction Through the Years: How Did We Get Here?
In this talk, I'll examine the state of the NLP subfield of information extraction from its inception almost 30 years ago to its current realization in neural network models. Which aspects of the original formulation of the task are more or less solved? In what ways are current state-of-the-art methods still falling short? What's up next for information extraction?
Claire Cardie is the John C. Ford Professor of Engineering in the Departments of Computer Science and Information Science at Cornell University. She has worked since the early 1990s on the application of machine learning methods to problems in Natural Language Processing, on topics ranging from information extraction, noun phrase coreference resolution, text summarization and question answering to the automatic analysis of opinions, argumentation, and deception in text. She has served on the executive committees of the ACL and AAAI and twice as secretary of NAACL. She has been Program Chair for ACL/COLING, EMNLP and CoNLL, and General Chair for ACL in 2018. Cardie was named a Fellow of the ACL in 2015 and a Fellow of the Association for Computing Machinery (ACM) in 2019. At Cornell, she led the development of the university's academic programs in Information Science and was the founding Chair of its Information Science Department.
Speaker: Luna Dong
Title: Ceres: Harvesting Knowledge from the Semi-structured Web
Knowledge graphs have been used to support a wide range of applications and enhance search and QA for Google, Amazon Alexa, etc. However, we often miss long-tail knowledge, including unpopular entities, unpopular relations, and unpopular verticals. In this talk we describe our efforts in harvesting knowledge from semi-structured websites, which are often populated according to templates using vast volumes of data stored in underlying databases. We describe our AutoCeres ClosedIE system, which improves the accuracy of fully automatic knowledge extraction on semi-structured data from the 60%+ of the state of the art to 90%+. We also describe OpenCeres, the first ever OpenIE system on semi-structured data, which is able to identify new relations not readily included in existing ontologies. In addition, we describe our other efforts in ontology alignment, entity linkage, graph mining, and QA, which allow us to best leverage the knowledge we extract for search and QA.
Xin Luna Dong is a Senior Principal Scientist at Amazon, leading the effort to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and has led the Knowledge-based Trust project, which was called the “Google Truth Machine” by the Washington Post. She co-authored the book “Big Data Integration”, was named an ACM Distinguished Member, and received the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion”. She serves on the VLDB Endowment and the PVLDB advisory committee, and is a PC co-chair for WSDM'2022, VLDB'2021, and the KDD'2020 ADS Invited Talk Series.
Speaker: Dan Roth
Title: Natural Language Understanding with Incidental Supervision
In order to access natural language information at the content level, and to support knowledge extraction, natural language understanding, and natural language communication with computers, there is a need to move toward understanding natural language text at an appropriate level of abstraction, beyond the word level; that is, to understand semantics. Machine Learning and Inference methods have become ubiquitous in our attempt to do so. However, learning models that support robust natural language understanding decisions is challenging, partly because generating supervision signals for them does not scale, and because relying on task-specific supervised learning datasets misleads us into thinking that we have solved the problem.
I will describe some of our research on moving beyond the “standard” supervised learning paradigm by relying on incidental supervision signals, providing examples from information and event extraction, semantic typing, and text classification. My focus will be on identifying and using incidental supervision signals in pursuing a range of semantic tasks, along with thinking about some of the theoretical issues and key challenges that could allow us to make progress in these directions.
Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, and a Fellow of the AAAS, the ACM, AAAI, and the ACL.
In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.”
Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning-based tools for natural language applications that are widely used. Until February 2017 Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). He has also served as Program Chair for ACL, AAAI, and CoNLL.
Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was sold to Reveal in 2020.
Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.
Speaker: ChengXiang Zhai
Title: Natural Language Processing Meets Information Retrieval: The Past, Present, and Future
Information Retrieval (IR) is one of the most influential applications of Natural Language Processing (NLP) techniques. Logically, better NLP techniques enable better IR systems. However, although much progress has been made in NLP, today's commercial search engines still use only minimal NLP techniques, and research in IR has also been somewhat disconnected from research in NLP. In this talk, I will provide a historical review of research on applying NLP to IR and analyze the reasons for the limited impact of NLP on IR so far. I will present a new vision called TextScope, in which we would integrate research in both IR and NLP naturally to build an intelligent interactive system that extends a search engine to support decision making. I will discuss the major challenges in realizing the vision of TextScope and identify multiple promising directions for future research in NLP and IR.
ChengXiang Zhai (http://czhai.cs.illinois.edu/) is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC), where he is also affiliated with the Carl R. Woese Institute for Genomic Biology, the School of Information Sciences, and the Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research aims to develop intelligent information systems that help people manage and make use of large amounts of text data, with a focus on developing general models, algorithms, and tools that can be applied to all kinds of applications, including Web search and mining, biomedical research and health, and intelligent education systems. He has developed many effective models and algorithms for information retrieval based on statistical language modeling that are used in many search engine systems. He has also developed many innovative algorithms for text mining, including contextualized topic analysis, opinion integration and analysis, and joint analysis of text and non-text data. He offers two MOOCs on Coursera, on text retrieval and text mining respectively, and has published a textbook on Text Data Management and Analysis. He has served as an Associate Editor for major journals in multiple areas, including information retrieval (ACM TOIS, IPM), data mining (ACM TKDD), and medical informatics (BMC MIDM), as Program Co-Chair of NAACL HLT 2007, SIGIR 2009, and WWW 2015, and as Conference Co-Chair of CIKM 2016, WSDM 2018, and IEEE BigData 2020. He is an ACM Fellow and a member of the ACM SIGIR Academy. He has received numerous awards, including the ACM SIGIR Test of Time Paper Award (three times), the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, an IBM Faculty Award, an HP Innovation Research Award, and the UIUC Campus Award for Excellence in Graduate Student Mentoring.