June 5, 2021
Virtual on Zoom
Because of the ongoing pandemic, the 2021 Annual Meeting of ISDSA was held online on June 5, 2021. Eight speakers shared their research during the meeting in an hour-long address.
To view the recording of the meeting, go to https://meeting.isdsa.org/2021/talks.php
There was also a pre-conference workshop on statistical power analysis by Prof. Johnny Zhang from the University of Notre Dame.
To view the recording of the workshop, go to https://meeting.isdsa.org/2021/workshop.php
The speakers are:
Dr. Mike Cheung is a Professor at the Department of Psychology of the National University of Singapore. His research interests are quantitative methods, especially in the topics of meta-analysis, structural equation modeling, and multilevel modeling. His current research focus is integrating meta-analysis into the structural equation modeling framework.
Talk title: Integrating meta-analysis within the structural equation modeling framework
Abstract: Structural equation modeling (SEM) and meta-analysis are two powerful statistical methods in the educational, social, behavioral, and medical sciences. Researchers usually treat them as two unrelated topics in the literature. This presentation gives an overview of how many meta-analytic models, such as univariate, multivariate, and three-level meta-analyses, can be integrated under the SEM framework.
Kevin Grimm, Ph.D., is a Professor in the Department of Psychology at Arizona State University. He directs the Health and Developmental Research Methods Laboratory at Arizona State University. Grimm's current research focuses on data integration, the specification of growth models for binary and ordinal outcomes, longitudinal measurement invariance, and the development and application of data mining techniques for psychological science.
Talk title: A Multiple Imputation Approach for Handling Missing Data in Decision Trees
Abstract: Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and value that lead to the best prediction when the data are partitioned based on the variable and splitting value. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data can be problematic in DTs because the algorithm cannot place an observation with a missing value on the chosen splitting variable. Moreover, missing data can alter the selection process because of the algorithm's inability to place such observations. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate splits) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. A modified multiple imputation approach is proposed for handling missing data in DTs, and we compare this approach with listwise deletion, delete if selected, majority rule, surrogate splits, and single imputation via Monte Carlo simulation. The proposed multiple imputation approach and surrogate splits showed superior performance with respect to prediction accuracy, variable selection, and tree size. The proposed multiple imputation approach performed best in the severe MAR conditions (e.g., strong associations among predictors, multiple predictors of missing values, small sample sizes, etc.), whereas surrogate splits performed best in MCAR or mild MAR conditions (e.g., weak associations among predictors, etc.).
Dr. Qiwei (Britt) He is a Research Scientist for the National and International Assessment at Educational Testing Service (ETS), where she helps oversee research projects in international large-scale assessments such as PISA and PIAAC. Besides applying advanced techniques such as text mining and IRT in assessments, she has broadened her research to focus on developing new methods for analyzing big data, such as process data in log files, to understand individuals' behavior during learning and testing.
Talk title: Leveraging Process Data in Large-Scale Assessments with Sequence Mining
Abstract: The increasing availability of data in computer-based learning and assessment environments brings a great opportunity to use big data to gain a deeper understanding of people’s behavioral patterns and cognitive processes. These new data sources, in particular finer-grained process data, often take complex and multidimensional forms that need to be analyzed with an integration of data-driven analytic approaches in addition to classical psychometric models. This talk presents recent explorations in process data analysis with sequence and text mining techniques and illustrates how to leverage process data in international large-scale assessments (e.g., PISA and PIAAC) to assist in understanding how respondents interact with the items administered, thus supporting test construction, improving the validity of conclusions, and facilitating cross-national comparisons.
Dr. David Hunter is a Professor of Statistics at The Pennsylvania State University. Dr. Hunter has published widely on statistical models for networks and is a co-creator of the statnet suite of packages for network analysis in R. He co-proposed the MM algorithms and has written extensively on this and other EM-like algorithms. He has also extended the theory and computational practice of unsupervised clustering using nonparametric finite mixture models.
Talk title: Modeling Homophily in ERGMs for Bipartite Networks
Abstract: Bipartite networks, in which there are two disjoint sets of nodes and the only edges allowed are those that connect one set with the other, represent an important tool for modeling processes such as affiliations, collaborations, and co-location. Frequently, we would like to model the propensity of similar nodes to form links among themselves, a property referred to as homophily. This talk discusses homophily models in the context of exponential-family random graph models (ERGMs). Ordinarily these models are straightforward, but in a bipartite network they become complicated due to the prohibition of direct ties between nodes of the same type. We discuss a novel method for modeling homophily in this framework and illustrate its use.
Dr. Nilam Ram is a Professor in the Departments of Communication and Psychology at Stanford University. Nilam’s research grows out of a history of studying change. His current projects include examinations of age-related change in children’s self- and emotion-regulation; patterns in minute-to-minute and day-to-day progression of adolescents’ and adults’ emotions; and change in contextual influences on well-being during old age. He is developing a variety of study paradigms that use recent developments in data science and the intensive data streams arriving from social media, mobile sensors, and smartphones to study change at multiple time scales.
Talk title: Screenomics: A New Venue for Discovering the Dynamics of Digital Life through Mining and Modeling of “Big Data” As “Small Data”
Abstract: We have recently developed and forwarded a new approach for capturing, visualizing, and analyzing the unique record of an individual’s everyday digital experiences – screenomics. In our quest to derive knowledge from and understand screenomes – ordered sequences of hundreds of thousands of smartphone and laptop screenshots obtained every five seconds for up to one year – the data have become a playground for learning about computational machinery used to process images and text, machine learning algorithms, human-labeling of taxonomies, qualitative inquiry, and the tension between N = 1 and N = many approaches. Using a selection of empirical examples, we illustrate how engagement with these new data is reshaping what we know about behavioral change in a wide variety of domains and how we study the person-context transactions that drive individuals’ digital lives.
Doug Steinley, Ph.D., is a Professor at the University of Missouri. His research focuses on multivariate statistical methodology, with a primary interest in cluster analysis and social network analysis. His research in cluster analysis focuses on both traditional cluster analytic procedures (e.g., k-means cluster analysis) and more modern techniques (e.g., mixture modeling). Because the general partitioning problem can be formulated in graph-theoretic terms, his research also involves combinatorics and social network analysis.
Talk title: Advanced Methods in Clustering and Scaling: Applications to Political Typologies and Media
Abstract: Advanced methods for K-means clustering are presented. A hypothesis testing approach for choosing the number of clusters is presented. Additionally, work for imposing an order-constrained clustering on a unidimensional factor is shown. Both methods are applied to develop political typology and highlight media usage among a random sample of voters. A new method for determining subsets of variables that define a cluster is presented as a future direction to explore.
Dr. Matthew Wilkens is an Associate Professor of information science at Cornell University. He uses quantitative and computational methods to study large-scale developments in literary and cultural history. His work has focused in particular on literary text mining, geolocation extraction, genre detection, and the cross-pollination of critical and social-scientific methods. He is the director of the Textual Geographies project, a co-investigator of the Text Mining the Novel project, a founding editorial board member of the Journal of Cultural Analytics, and the author of Revolution: The Event in Postwar Fiction.
Talk title: Data Science in the Humanities
Abstract: Humanities disciplines like literature, history, and philosophy aren't the first things people think of when their minds turn to data science. But these fields stand to benefit from data-driven methods, and the challenging problems that humanists explore could be of great interest to researchers in data science and computational social science. This talk presents recent examples of large-scale, data-intensive humanities work, covering problems such as literary genre detection, spatial evolution in books and newspapers, and the use of generative language models to assess textual novelty. It also offers some advice for data scientists, drawn from the notoriously complex questions that the humanities seek to answer.
Dr. Jerry Wu is currently a Professor at National Yang Ming Chiao Tung University, Taiwan. He is a quantitative methodologist specializing in Multilevel Structural Equation Modeling (MSEM) with cross-sectional and longitudinal data. His research interests focus on students' online reading behavior and performance as well as factors that motivate or hinder students' selective attention during online learning.
Talk title: Learning Analytics within the technology-enhanced environment
Abstract: In the age of data deluge, people’s digital traces, such as log files, discourse, and interaction data, bring unparalleled potential to examine their learning from different facets. Growing interest has been given to the development and use of advanced learning technology and social media to support Learning Analytics. In this talk, Dr. Jiun-Yu Wu will introduce a series of Human-centered Learning Analytics studies using data mining, supervised machine learning, and social network analysis techniques. These studies will show how learning analytics can be applied to monitor students’ learning progress, explore their interactions among peers and artifacts, and identify students at risk of failure within the Personal Learning Environment (PLE) premised on social media. The analytical findings will be discussed in line with the theoretical support and pedagogical design to build an effective learning environment for facilitating students’ learning in the post-pandemic era.