Text mining and natural language processing (NLP) methods in learning analytics (LA) use computational linguistic techniques to generate insight into student writing, learning dialogues, teacher talk, and other forms of educational text. Start with the overviews of how text mining is used in learning analytics, then try one of the tutorials for conducting text mining in either R or Python.
Thank you to Dr. Jelena Jovanovic for providing some of the resources in this section.
1. Get Oriented
Handbook of Learning Analytics Chapters
Three chapters in the 2017 Handbook of Learning Analytics provide complementary perspectives on the use of text mining in learning analytics.
The Content Analytics (CA) chapter by Vitomir Kovanović and colleagues describes a set of methods to evaluate, index, filter, recommend, and visualize digital learning content. Considering both educator-created learning materials and student-created learning products, they provide an overview of key text mining methods useful for analysis.
Read the HLA Content Analytics chapter [16 pages]
The Natural Language Processing (NLP) chapter by Danielle McNamara and colleagues introduces a set of NLP tools commonly used in educational contexts and the linguistic features they index, with specific examples of their application.
Read the HLA NLP chapter [12 pages]
The Discourse Analytics (DA) chapter by Carolyn Rosé and colleagues summarizes different representations of text and outlines popular unsupervised and supervised analytic methods for DA as well as their strengths and limitations.
Read the HLA Discourse Analytics chapter [10 pages]
2. Dive Deeper
Text Mining for Learning Content Analysis
This 2019 Learning Analytics Summer Institute (LASI) workshop by Jelena Jovanovic serves as an introduction to text mining through the lens of LA. It covers the main steps of the text mining flow, including text preprocessing, text transformation, feature selection, data mining, and interpretation/evaluation and provides concrete examples.
View the Text Mining for Learning Content Analysis presentation [68 slides]
Access the Text Mining for Learning Content Analysis workshop materials
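The five steps of the text mining flow named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not material from the workshop: the sample posts, the stopword list, the document-frequency threshold, and the choice of cosine similarity as the "data mining" step are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical sample data: three short student forum posts (invented).
posts = [
    "The regression model predicts student grades.",
    "Students discuss the regression assignment in the forum.",
    "Forum posts about grades and feedback.",
]

# Tiny illustrative stopword list; real projects use a much larger one.
STOPWORDS = {"the", "in", "and", "about", "a", "of"}


def preprocess(text):
    """Step 1: lowercase, tokenize on letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Step 1: text preprocessing.
tokenized = [preprocess(p) for p in posts]

# Step 2: text transformation into bag-of-words term counts.
bags = [Counter(tokens) for tokens in tokenized]

# Step 3: feature selection. Keep terms found in at least two documents
# (the threshold of 2 is an arbitrary choice for this example).
doc_freq = Counter(term for bag in bags for term in bag)
vocabulary = sorted(t for t, df in doc_freq.items() if df >= 2)

# Step 4: data mining. Here, pairwise similarity of document vectors.
vectors = [[bag[term] for term in vocabulary] for bag in bags]

# Step 5: interpretation/evaluation. Inspect which posts resemble each other.
print("Selected features:", vocabulary)
for i in range(len(posts)):
    for j in range(i + 1, len(posts)):
        print(f"post {i} vs post {j}: {cosine(vectors[i], vectors[j]):.2f}")
```

Note that "student" and "students" are counted as different terms here because the sketch does no stemming or lemmatization; adding such a normalization step to the preprocessing function would change which features survive selection.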
AcaWriter: A Learning Analytics Tool
This 2020 paper by Simon Knight, Antonette Shibani and colleagues introduces the open-source AcaWriter tool, which uses text mining techniques to provide feedback to students on their use of rhetorical moves in their academic writing. The paper describes the tool’s theoretical background and technical implementation and discusses three examples of its use.
Read the AcaWriter: A Learning Analytics Tool paper [46 pages]
3. Tools & Tutorials
Text Mining in R
Introduction to Text Analytics with R
This 2019 tutorial by Data Science Dojo provides introductory coverage of analyzing text using R. It focuses on the creation of predictive classification models based on textual features and uses practical examples to walk through all phases of the process: pre-processing textual data, extracting textual features, developing and training classification models using these features, and evaluating the accuracy of the trained models. All the code and the datasets used in the examples are made available in the description of each video.
Access the Introduction to Text Analytics with R tutorial [12 parts]
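To give a feel for the phases described above (pre-processing, feature extraction, model training, and accuracy evaluation), here is a small standard-library Python sketch. It is not the tutorial's code, which is written in R: the labeled messages are invented, and the multinomial Naive Bayes classifier with add-one smoothing is simply one convenient model chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled messages, invented for this example.
train = [
    ("win a free prize now", "spam"),
    ("free cash offer claim now", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at the lecture today", "ham"),
]
test = [
    ("claim your free prize", "spam"),
    ("lunch after the lecture", "ham"),
]


def tokenize(text):
    """Pre-processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(examples):
    """Feature extraction and training: count words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Train on the labeled examples, then evaluate accuracy on held-out ones.
model = train_nb(train)
predictions = [predict(model, text) for text, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

With only six messages the accuracy figure means very little; the point is the shape of the workflow, in which the model is fitted on one set of texts and judged on texts it has not seen.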
Text as Data
This free course by Chris Bail covers a wide range of ways to obtain and analyze textual data. It offers a detailed introduction and tutorials, with sample data and R code, for commonly used techniques in collecting, processing, and analyzing text-based data. Topics covered include mining data from application programming interfaces (APIs), screen-scraping, basic text analysis, dictionary-based text analysis, topic modeling, text networks, and word embeddings. Basic familiarity with R is required.
Access the Text as Data course [8 parts]
Text Mining with R – A Tidy Approach
This 2020 book by Julia Silge and David Robinson is a great introduction for those who are familiar with R but new to text mining. It introduces the tidy text format as a way to structure text data and offers methods to work with text in tidy format using the tidytext package and other tidy tools in R. Compelling examples of real text mining problems are provided to demonstrate the application of these methods.
Access the Text Mining with R book [freely available online]
Access the Text Mining with R GitHub repository
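The tidy text format mentioned above stores text as a table with one token per row. The book works in R with the tidytext package; the Python sketch below only mirrors the idea using the standard library, and the two sample documents are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical documents keyed by an identifier (invented sample text).
documents = {
    "essay_1": "Learning analytics supports learning.",
    "essay_2": "Text mining supports analytics.",
}

# Tidy format: one row per token, each row a (document, word) pair.
tidy_rows = [
    (doc_id, word)
    for doc_id, text in documents.items()
    for word in re.findall(r"[a-z']+", text.lower())
]

# With one token per row, counting words per document is a plain group-by.
word_counts = Counter(tidy_rows)
for (doc_id, word), n in sorted(word_counts.items()):
    print(doc_id, word, n)
```

Because every row has the same shape, the same table can be filtered, joined against a stopword or sentiment lexicon, or aggregated without any text-specific data structures.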
Text Mining in Python
Natural Language Processing in Python Tutorial
This 2018 introductory level tutorial by Alice Zhao covers text preprocessing techniques, machine learning techniques and Python libraries for Natural Language Processing. It walks you through all of the steps of a text analysis project using an example in Jupyter Notebook. Some familiarity with programming in Python is required.
Watch the NLP in Python tutorial [2 hours]
Access the NLP in Python tutorial materials
Modern NLP in Python Tutorial
This 2016 tutorial by Patrick Harrison offers an introduction to preparing, modeling, visualizing, and analyzing textual data in Python through examples run on a dataset of user reviews published by the business review service Yelp. It assumes a working knowledge of Python but does not require any prior knowledge of text analysis.
Watch the Modern NLP in Python video tutorial [1.5 hours]
Access the Modern NLP in Python tutorial materials
Introducing Text Analytics and Natural Language Processing with Python
This free edX course is a practical and scientific introduction to NLP with Python. It takes you through the overall text analytics process (from data collection and preprocessing to evaluation of the results) while also explaining the underlying science of NLP and how artificial intelligence views language differently from humans. It also discusses the limitations of NLP and provides ethical guidelines for applying it to real-world problems. The course materials are useful both to those who are new to Python and to those who have Python programming experience.
Enroll in the edX Introducing Text Analytics and NLP with Python course