Speaker: Professor Neil R. Smalheiser
Time: Wednesday, April 26, 2:00 PM
Venue: Conference Room 205, School of Information Management
Topic: "From Topic-Centered IR to IR based on Person, Main Finding or Meaning"
Abstract: I will discuss the issues involved in modeling and predicting which individuals wrote which articles indexed in PubMed, a curated collection of biomedical literature. There are roughly 26 million articles and 9 million different authors, so this is a difficult text mining problem. It is made more difficult by the fact that there is no comprehensive list of known individuals that covers PubMed, nor any gold standard for supervised training data. To solve this problem, we had to employ a pairwise disambiguation model (that is, we examine TWO articles that share a given name and predict whether the same person wrote both articles). We also used many features and careful feature engineering, including implicit features and estimates of the prior probability of a match for each last name. Despite the complexity of the model, it works extremely well. (Neural embedding approaches tend to assume that feature selection and feature engineering will be done automatically by the neural networks, but they are unlikely to discover the implicit features.) I will outline how a similar modeling approach might be used to identify a research article's main finding, or the meaning of an ambiguous word seen in text.
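About the speaker:
Neil R. Smalheiser is a professor in the Department of Psychiatry at the University of Illinois at Chicago, USA. He has worked in neuroscience research for more than 30 years. His recent research covers small RNA genomics, text mining, and bioinformatics, and he has proposed a number of theoretical models and developed a variety of software applications. His research spans a wide range of fields and approaches different datasets, methods, and scientific questions from new perspectives. He has published more than 160 academic papers in leading international journals, which have been cited more than 7,000 times and carry a cumulative impact factor of 704.74, and his work has attracted wide attention from fellow scholars. His main research interests are neuroscience, RNA biology, medical informatics, and text mining.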