Abstract: Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
Keywords: information system education, computer science education, problem-based learning, natural language processing, NLP, big data text analytics, machine learning, deep learning.