Article

An Experimental Platform for Real-Time Students Engagement Measurements from Video in STEM Classrooms

1 Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292, USA
2 College of Education and Human Development, University of Louisville, Louisville, KY 40292, USA
3 Department of Psychology, College of Charleston, Charleston, SC 29424, USA
* Author to whom correspondence should be addressed.
Sensors 2023, 23(3), 1614; https://doi.org/10.3390/s23031614
Submission received: 1 January 2023 / Revised: 19 January 2023 / Accepted: 21 January 2023 / Published: 2 February 2023
(This article belongs to the Special Issue Artificial Intelligence in Medical Imaging and Visual Sensing)

Abstract
The ability to measure students’ engagement in an educational setting may facilitate timely intervention in both the learning and the teaching process in a variety of classroom settings. In this paper, a real-time automatic student engagement measure is proposed by investigating two of the main components of engagement: behavioral engagement and emotional engagement. A biometric sensor network (BSN) consisting of web cameras, a wall-mounted camera, and a high-performance computing machine was designed to capture students’ head poses, eye gaze, body movements, and facial emotions. These low-level features are used to train an AI-based model to estimate behavioral and emotional engagement in the class environment. A set of experiments was conducted to compare the proposed technology with state-of-the-art frameworks. The proposed framework shows better accuracy in estimating both behavioral and emotional engagement. In addition, it offers superior flexibility to work in any educational environment. Further, this approach allows a quantitative comparison of teaching methods.

1. Introduction

The human face is an important tool for nonverbal social communication, and facial expression analysis is therefore an active research topic for behavioral scientists. It has broad impact on applications such as pain assessment [1], diagnosis and treatment of autistic children [2] and detection of their emotional patterns [3], detection of distracted drivers [4], measurement of students’ engagement [5], and human–computer interaction [6]; in this work, we focus on student engagement. Facial expression analysis has also attracted significant attention in the medical image processing community. Early efforts to study facial expressions date back to 1862, when G. Duchenne electrically stimulated facial muscles and concluded that movements of the muscles around the mouth, nose, and eyes constitute facial expressions [7]. To express an internal emotional state, a person moves a set of facial muscles; see Figure 1.
Despite the urgent demand for graduates from science, technology, engineering, and mathematics (STEM) disciplines, large numbers of U.S. university students drop out of engineering majors [8]. Nearly one-half of students fail to complete an engineering program at the University of Louisville, which is consistent with national retention rates at large, public institutions [9,10]. This number is even higher for at-risk women, racial and ethnic minorities, and first-generation college students [11,12]. The greatest dropout from engineering occurs after the first year, following standard gateway mathematics courses such as calculus [13,14]. Dropout from the engineering major is strongly associated with performance in first-year mathematics courses [10,13]. Part of the difficulty, not limited to engineering, is the transition from secondary to college education in mathematics. Students often retain and apply only a surface-level knowledge of mathematics [15]. In addition, socio-psychological factors, such as perceptions of social belonging, motivation, and test anxiety, predict first-year retention [13,16,17,18].
Figure 1. Human facial muscles that are responsible for different expressions. This illustration was designed using images generated by ARTNATOMIA [19].
The ability to measure students’ engagement in an educational setting may improve their retention and academic success, for example by revealing which students are disinterested or which segments of a lesson cause difficulties. The main goal of the proposed work is to provide instructors with a tool that helps them estimate, in real time, both the average class engagement level and individual students’ engagement levels while they lecture. Such a system could help instructors take action to improve students’ engagement. Additionally, it could be used by the instructor to tailor the presentation of material in class, identify course material that engages or disengages students, and identify students who are engaged or disengaged and at risk of failure.
Currently, feedback on student performance relies almost exclusively on graded assignments, with in-class behavioral observation by the instructor a distant second. In-class observation of engagement by the instructor is problematic because he/she is primarily occupied with delivering the learning material. Moreover, adaptive learning environments allow free-form seating, and the instructor may not have direct eye contact with the students. Even in a traditional classroom seating arrangement, an instructor cannot observe a large number of students while lecturing. It is therefore practically impossible for the instructor to watch all students all the time while recording these observations student by student and correlating them with the associated material and delivery method. Moreover, these types of feedback are tied to the in-class environment; in an e-learning environment, the instructor may have no feedback at all with which to sense student engagement. Performance on assignments can also be ambiguous. Some students can be deeply engaged yet struggling, whereas other students can be only minimally engaged; both groups end up with poor performance. Other students may manage good performance while lacking a deeper understanding of the material, e.g., merely memorizing for an exam without engaging in the learning process.
The education research community has developed various taxonomies describing student engagement. After analyzing many studies, Fredricks et al. [20] organized engagement into three categories. Behavioral engagement includes external behaviors that reflect internal attention and focus. It can be operationalized by body movement, hand gestures, and eye movement. Emotional engagement is broadly defined as how students feel about their learning, learning environment, teachers, and classmates. Operationalization of emotional engagement includes expressing interest, enjoyment, and excitement, all of which can be captured by facial expressions. Cognitive engagement is the extent to which the student is mentally processing the information, making connections with prior learning, and actively seeking to make sense of the key instructional ideas. The two former engagement categories can be easily sensed and measured. Cognitive engagement is generally less well-defined than the other two modes of engagement and is more difficult to externally operationalize due to its internalized nature [21,22]. The three components (shown in Figure 2) that comprise student engagement are behavior, emotion, and cognition [20]. These components work together to fully encompass the student engagement construct, and each component has been found to contribute to positive academic outcomes (e.g., [20,23]).
One of the significant obstacles to assessing the effect of engagement on student learning is the difficulty of obtaining a reliable measurement of engagement. Using biometric sensors (such as cameras, microphones, heart-rate wristband sensors, and EEG devices) is a more dynamic and objective approach to sensing. This work focuses primarily on measuring the emotional and behavioral components of engagement and on designing a biometric sensor network and technologies for modeling and validating engagement in various class setups.

Literature Review

The literature uses a variety of terms for instructional approaches (e.g., active vs. passive; hands-on vs. passive; minds-on vs. shallow receptive mental modes) that could potentially lead to variation in cognitive engagement. Table 1 provides a summary of the psychological constructs [24,25,26,27,28] for the three types of engagement, which could be used to devise a computational counterpart to automate engagement monitoring.
Despite the advances in machine recognition of human emotion, there have been only a small number of studies of facial expressions related to learning-centered cognitive-affective states. Computer vision methodology can unobtrusively estimate a student’s engagement from facial cues; e.g., [5,29,30,31,32,33,34,35,36]. Such studies apply one or more of the following paradigms: observation and annotation of affective behaviors, investigation of facial action units involved in learning-centered affect, and application of automated methods to detect affective states.
Kapoor and Picard [29] used a camera equipped with IR LEDs to track pupils and to extract other facial features: head nodding, head shaking, eye blinking, eye and eyebrow shapes, and mouth activities. Additionally, a sensing chair was used to extract information about posture, and the action that the subject was performing on the computer was recorded. A mixture of Gaussian processes then combined all the information and predicted the current affective state. Eight children (8–11 years) were enrolled in their study and asked to solve puzzles on a computer. For 20 min, the screen activity, side view, and frontal view were recorded. From the collected videos, 136 clips were extracted (up to 8 s long). Teachers were asked to observe and record the affective state at eight samples per second. The affective states under consideration were high, medium, and low interest, boredom, and “taking a break”. The recognition rates of an interest vs. disinterest SVM classifier (for 65 interest samples and 71 disinterest samples) were 69.84% (using upper-face information) and 57.06% (using lower-face information). They achieved an 86.55% recognition rate by combining all the information, not only the facial features, using a mixture of Gaussian processes.
To detect the emotions that accompany deep-level learning, McDaniel et al. [30] investigated facial features. The affective states under consideration were boredom, confusion, delight, flow, frustration, and surprise. In their study, 28 undergraduate students interacted with AutoTutor. Participants first completed a pretest; videos of their faces were then captured while they interacted with the AutoTutor system for 32 min; finally, they completed a posttest. The affective states were annotated by the learner, a peer, and two trained judges, with the ground truth taken from the trained judges (interjudge reliability, Cohen’s kappa of 0.49). The data were then sampled into 212 emotion video clips (3–4 s) with affective states of boredom, confusion, delight, frustration, and neutral. Finally, two trained coders coded participants’ facial expressions using Ekman’s Facial Action Coding System. They computed correlations to determine the extent to which each of the AUs was diagnostic of the affective states of boredom, confusion, delight, frustration, and neutral. Their analyses indicated that specific AUs could distinguish confusion, delight, and frustration from neutral, but boredom was indistinguishable from neutral.
In order to study learning-centered affect, Grafsgaard et al. [32] used an automated facial expression recognition tool to analyze videos of computer-mediated human tutoring. They collected a dataset of 67 undergraduate students who learned material from an introductory engineering course using the JavaTutor software. Participants took six 45-min sessions. Each session started with a pretest, followed by the teaching session and post-session surveys, and ended with a posttest. During the teaching session, database logs, webcam facial video, skin conductance, and Kinect depth video were collected. Two trained coders annotated the data by coding participants’ facial expressions using Ekman’s Facial Action Coding System. They recorded the five most frequently occurring AUs (1, 2, 4, 7, and 14), and the authors used the CERT toolbox [37] to extract these five AUs automatically. Additionally, they computed the normalized learning gain from the posttest and pretest scores. They reported the following conclusions: outer brow raise (AU2) was negatively correlated with learning gain; brow lowering (AU4) was positively correlated with frustration; and mouth dimpling (AU14) was positively correlated with both frustration and learning gain. Additionally, facial actions during the first five minutes were significantly predictive of frustration and learning at the end of the tutoring session.
Whitehill et al. [33] introduced an approach for the automatic recognition of engagement from students’ facial expressions. They reported that human observers agree reliably when discriminating low and high degrees of engagement (Cohen’s κ = 0.96); this reliability decreased to κ = 0.56 for four distinct levels of engagement. They also found that static expressions, rather than dynamic expressions, contain the bulk of the information used by observers, meaning that engagement labels of 10-s video clips can be reliably predicted from the average labels of their constituent frames (Pearson r = 0.85). They collected a dataset of 34 undergraduate students who trained using cognitive skill training software. Each session started with an explanatory video (3 min), followed by a pretest (3 min), a training session (35 min), and finally a posttest. The participant’s face was recorded during the training. To annotate the data, the video frames were coded by seven labelers using the following engagement scale: 1, not engaged; 2, nominally engaged; 3, engaged in the task; 4, very engaged; and X, unclear frame. Then, 24,285 frames were selected such that the difference between any two labelers did not exceed one and no labeler assigned X to the frame. The “ground truth” label of a frame was the integer average of all labels. Gabor features were extracted from the detected face to generate a 40 × 48 × 48 feature vector. Then, four binary SVM classifiers were used to detect the four levels of engagement, and a multinomial logistic regressor combined their outputs. They reported that automated engagement detectors perform with accuracy comparable to that of humans.
Li and Hung [35] reported improved student engagement estimation from the fusion of facial expressions and body features. Fusion of more disparate data can also enhance engagement measures, such as fusing video facial expressions with wristband heart-rate data [38] or posture with electrodermal activity data [39]. The use of context was explored by Dhamija and Boult [40] in the area of online trauma recovery, and they and others have found significant evidence [33,38,41,42] that facial-expression-based estimation of engagement is nearly universal. Additional work by Dhamija and Boult [36] explored the influence of mood awareness on engagement classification, where mood is the prevailing emotional state independent of the current task, e.g., classroom learning. Emotion affects the domain in which facial expressions and other biometrics are collected, and understanding how emotion affects engagement serves to fine-tune the use of these biometrics.
Ahuja et al. introduced a framework (EduSense) to sense a set of engagement-related features [34]. They extracted facial landmarks and used them to derive facial features such as head pose and smile detection. They also performed body segmentation and body-key-point extraction, which they used to extract features such as hand-raise detection and sit vs. stand detection. Furthermore, they performed speech detection to find the ratio between instructor speech time and student speech time.
Table 2 lists various features used in the recent literature. Table 3 compares various frameworks and experiments that measure student engagement in terms of learning context, sensors, affective states, dataset, and annotation. Our team previously developed a framework for measuring student engagement level using facial information in an e-learning environment [5,43,44,45,46]. The main goal of this paper is to propose a framework to measure student engagement level in the in-class environment. Estimating engagement level in the in-class environment is more complicated: rather than a single target of interest (the laptop screen), as in e-learning, there are multiple targets of interest. The student may look at the instructor, the whiteboard, the projector, his/her laptop screen, or even one of his/her peers. Therefore, the framework should track where each student’s gaze falls and then relate the students’ gazes together to estimate their behavioral engagement.

2. Proposed Behavioral Engagement Module

Behavioral engagement consists of the actions that students take to gain access to the curriculum. These actions include self-directive behaviors outside of class, such as doing homework and studying; related behaviors while observing lectures, such as shifting in the seat, hand movements, body movements, or other sub/conscious movements; and, finally, cooperative participation in in-class activities [72,73].
Head pose and eye gaze are the main metrics with which to measure a student’s behavioral engagement. By estimating the student’s point of gaze, it can be determined whether he/she is engaged with the lecture. If the student is looking at his/her laptop or lecture notes, the whiteboard, the projector screen, or the lecturer, he/she is probably highly behaviorally engaged; if the student is looking at other things, he/she is probably not engaged. In the proposed system, distracted and uninterested students are identified by a low behavioral engagement level regardless of the reason for this distraction. For a regular class setting, with the assumption that students are in good health, this distraction is related to class content. A student’s illness, on the other hand, can be detected by measuring the student’s vital signs using a wristband; a student’s fatigue can be identified using his or her emotions; and other abnormalities, such as eye problems and neck movement problems, can be identified by the instructor at the beginning of the class. None of these types of disengagement should be included in the evaluation of class content.
In the proposed framework, two sources of video streams were used. The first source was a wall-mounted camera that captured the whole class, and the second source was the dedicated webcam in front of each student. The proposed pipeline is shown in Figure 3. The first step in the framework is tracking key facial points and using them to extract the head pose [74]. The framework takes advantage of a convolutional experts constrained local model (CE-CLM), which uses a 3D representation of facial landmarks and projects them onto the image using orthographic camera projection. This allows the framework to estimate the head pose accurately once the landmarks are detected. The resulting head pose is represented in six degrees of freedom (DOF): three degrees of head rotation (yaw, pitch, and roll) and three degrees of translation (X, Y, and Z). Eye gaze tracking is the process of measuring either the point of gaze or the motion of an eye relative to the head. The eye gaze can be represented as the vector from the center of the 3D eyeball to the pupil. To estimate the eye gaze with this approach, the eyelids, iris, and pupil are detected using the method in [75]. The detected pupil and eye locations are used to compute the eye gaze vector for each eye: a ray from the camera origin through the center of the pupil in the image plane is drawn, and its intersection with the eyeball sphere is calculated to obtain the 3D pupil location in world coordinates.
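To make the gaze geometry concrete, the following is a minimal sketch (not the authors’ implementation) of back-projecting a detected pupil pixel into a camera ray and intersecting it with the eyeball sphere to recover the 3D pupil location and the gaze vector. The camera intrinsics (fx, fy, cx, cy), the eyeball center, and the 12 mm eyeball radius are assumed inputs provided by the upstream landmark/eye model.

```python
import numpy as np

def ray_sphere_intersection(origin, direction, center, radius):
    """Return the nearest intersection of the ray origin + t*direction with a sphere, or None."""
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    b = 2.0 * np.dot(d, oc)
    c = np.dot(oc, oc) - radius ** 2
    disc = b ** 2 - 4.0 * c
    if disc < 0:
        return None                          # the ray misses the eyeball sphere
    t = (-b - np.sqrt(disc)) / 2.0           # nearest hit along the ray
    return origin + t * d

def gaze_vector(pupil_px, fx, fy, cx, cy, eyeball_center, eyeball_radius=0.012):
    """
    Back-project the detected pupil pixel through the camera intrinsics, intersect the
    resulting ray with the eyeball sphere to get the 3D pupil location, and return the
    unit gaze vector from the eyeball center to the pupil (camera coordinates, meters).
    """
    ray = np.array([(pupil_px[0] - cx) / fx, (pupil_px[1] - cy) / fy, 1.0])
    center = np.asarray(eyeball_center, dtype=float)
    pupil_3d = ray_sphere_intersection(np.zeros(3), ray, center, eyeball_radius)
    if pupil_3d is None:
        return None
    g = pupil_3d - center
    return g / np.linalg.norm(g)
```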
The wall-mounted camera provides the head pose only, as the faces size is too small to get accurate eye gaze from it, and the students’ cameras provide us with both head poses and eye gazes. Each camera provides the output in its world coordinates. Therefore, the second step is to align all the camera’s coordinates to get all students’ head poses and eye gazes in a common world-coordinate system. Given a well known class setup, the target planes could be found through one-time pre-calibration for the class. The intersections of the students’ head pose/eye gaze rays and the target planes are calculated. To eliminate noise, the feature was combined within a window of time of size T. Then, the mean point of gaze could be found on each plane in addition to the standard deviation for each window of time. The plane of interest in each window of time is the one with the least standard deviation of the students’ gaze. For each student, the student’s pose/gaze index could be calculated as the deviation of student’s gaze points from the mean gaze point in each window if time. This index is used to classify the average behavioral engagement within a window of time.
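The plane-intersection and windowing logic described above could look like the sketch below. It assumes the per-student rays are already aligned to the common class coordinate system and the target planes come from the one-time calibration; the data structures and function names are illustrative, not the authors’ code.

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Intersect a head-pose/eye-gaze ray with a calibrated target plane; return the hit point or None."""
    denom = np.dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None                                   # ray parallel to the plane
    t = np.dot(plane_point - origin, plane_normal) / denom
    return origin + t * direction if t > 0 else None

def pose_gaze_index(window_rays, target_planes):
    """
    window_rays: {student_id: [(origin, direction), ...]} over one time window T.
    target_planes: {name: (plane_point, plane_normal)} from the one-time class calibration.
    Returns the plane of interest (smallest spread of gaze points) and each student's
    pose/gaze index, i.e., the mean deviation from the window's mean gaze point.
    """
    hits = {name: {} for name in target_planes}
    for sid, rays in window_rays.items():
        for o, d in rays:
            for name, (p0, n) in target_planes.items():
                pt = ray_plane_intersection(np.asarray(o, float), np.asarray(d, float),
                                            np.asarray(p0, float), np.asarray(n, float))
                if pt is not None:
                    hits[name].setdefault(sid, []).append(pt)

    # Plane of interest: least standard deviation of the students' gaze points in this window.
    spreads = {}
    for name, per_student in hits.items():
        pts = [p for plist in per_student.values() for p in plist]
        spreads[name] = np.std(np.vstack(pts), axis=0).sum() if pts else np.inf
    plane = min(spreads, key=spreads.get)

    all_pts = np.vstack([np.vstack(p) for p in hits[plane].values()])
    mean_gaze = all_pts.mean(axis=0)
    # Per-student index: deviation of the student's gaze points from the window's mean gaze point.
    index = {sid: float(np.linalg.norm(np.vstack(pts) - mean_gaze, axis=1).mean())
             for sid, pts in hits[plane].items()}
    return plane, index
```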

3. Proposed Emotional Engagement Detection Module

Emotional engagement is broadly defined as how students feel about their learning [76], learning environment [77], and instructors and classmates. Emotions include happiness or excitement about learning, boredom or disinterest in the material, and frustration due to a struggle to understand [78]. In this section, a novel framework for the automatic measurement of students’ emotional engagement level in an in-class environment is proposed. The proposed framework captures video of each user with a regular webcam and tracks his/her face throughout the video frames. Different features are extracted from the user’s face, e.g., facial landmark points and facial action units, as shown in Figure 4.
It is logical to assume that a low measure of attentiveness indicated by the behavioral engagement component will not be enhanced by the emotional engagement classifier. Therefore, the application of the emotional engagement classifier is predicated on evidence of behavioral engagement in the overall engagement estimation. To measure emotional engagement, the proposed module uses the faces extracted in the previous steps to locate 68 facial feature points using our approach presented in [79]. This approach’s performance depends on a well-trained model. The current model was trained on the multiview 300 Faces In-the-Wild database [80], which contains faces with variations in pose, illumination, and expression; therefore, the model performs well across poses. Such a model allows the framework to estimate students’ engagement even if their faces are not front-facing, which lets students sit freely in their seats without restrictions. Next, the module uses our proposed method for action-unit detection under pose variation [81]. It uses the detected facial points to extract the 22 most significant patches to be used for action-unit detection, as discussed in our work in [81]. This AU detection technique exploits both the sparse nature of the dominant AU regions and the semantic relationships among AUs. To handle pose variations, the algorithm defines patches around facial landmarks instead of using a uniform grid, which suffers from displacement and occlusion problems; see Figure 5. A deep region-based neural network architecture is then used in a multi-label setting to learn both the required features and the semantic relationships of AUs, and a weighted loss function is used to overcome the imbalance problem in multi-label learning. The extracted facial action units are then used to estimate the affective states (boredom, confusion, delight, frustration, and neutral) via the correlation-based method of McDaniel et al. [30], which determines the extent to which each AU is diagnostic of each affective state. These features are fed to a support vector machine (SVM) to classify the students’ emotional engagement into two categories: emotionally engaged or emotionally non-engaged.
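As an illustration of the final classification step only, the scikit-learn sketch below trains a binary SVM on per-window AU-derived feature vectors. The feature dimensionality (22 values here), the RBF kernel, and the class weighting are assumptions for the example, since the paper does not specify these choices; the synthetic data is a placeholder.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_emotional_engagement_classifier(X, y):
    """X: one row per time window of AU-derived features; y: 1 = emotionally engaged, 0 = not."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
    clf.fit(X, y)
    return clf

# Usage with synthetic placeholder features (22 AU-derived values per window):
rng = np.random.default_rng(0)
X_demo = rng.random((40, 22))
y_demo = rng.integers(0, 2, 40)
model = train_emotional_engagement_classifier(X_demo, y_demo)
print(model.predict(X_demo[:5]))   # binary engaged / non-engaged predictions
```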

4. Experiments

4.1. Hardware Setup

Using students’ own webcams and machines to run the proposed client module raises many issues, especially given the huge variety of hardware and software that students have. The camera quality cannot be guaranteed, and multiple versions of the software are needed to ensure that it runs on each operating system. Additionally, a student may fold his/her laptop and use it to take notes, which makes it impossible to capture the student’s face. Therefore, a special hardware unit was designed and installed in the classroom to be used as our client module to capture students’ faces. This module is composed of a Raspberry Pi connected to a webcam and a touch display; see Figure 6. The Raspberry Pi runs a program that connects to the server, captures the video stream, applies the introduced pipelines to extract the feature vector, and sends that vector to the server. The program allows the students to adjust the webcam to ensure that the video has a good perspective of the face. This module is also used in the data collection phase: it captures a video stream of the student’s face during the lecture, processes it in real time to obtain the required metrics, and sends the features to a server/high-performance computing machine. The Raspberry Pi uses a TLS-encrypted connection to ensure students’ data security and privacy.
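A minimal sketch of the client loop on the student unit is shown below: extract a per-frame feature vector and stream it to the server over TLS. The host name, port, certificate file, and JSON message format are illustrative assumptions, not the deployed protocol.

```python
import json
import socket
import ssl
import time

SERVER_HOST = "engagement-server.local"    # hypothetical server address
SERVER_PORT = 9443                         # hypothetical port

def stream_features(extract_features, station_id):
    """Run on the student unit: extract a per-frame feature vector (head pose, eye gaze,
    action units) and push it to the server over a TLS-encrypted connection."""
    context = ssl.create_default_context(cafile="classroom_ca.pem")   # assumed CA certificate
    with socket.create_connection((SERVER_HOST, SERVER_PORT)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=SERVER_HOST) as tls:
            while True:
                features = extract_features()          # provided by the vision pipeline
                msg = {"station": station_id, "t": time.time(), "features": features}
                tls.sendall((json.dumps(msg) + "\n").encode("utf-8"))
```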
Our server is a high-performance computing machine that collects all the streams/features and classifies the engagement level in real time. The setup also includes 4K wall-mounted cameras that capture a stream of students’ faces to obtain their head poses. Additionally, the configuration provides high-bandwidth network equipment for both wired and wireless connections; see Figure 7. The server also provides the instructor with a web-based dashboard that shows the average class engagement level or individual students’ levels; see Figure 8. The instructor can monitor the dashboard on a separate screen without obstructing the dynamics of the class. The dashboard gives the instructor the average class engagement in real time; thus, regardless of class size, the dashboard remains compact and simple to read. In addition, a more detailed individual analysis can be produced offline after the class, if needed.
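The server-side aggregation behind the dashboard can be as simple as the sketch below, which keeps the latest per-student engagement estimate and reports the real-time class average while retaining per-student histories for offline analysis; the class and method names are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

class EngagementDashboard:
    """Keeps the latest per-student engagement estimates and reports the
    real-time class average shown on the instructor dashboard."""

    def __init__(self):
        self.latest = {}                      # station_id -> last engagement score in [0, 1]
        self.history = defaultdict(list)      # station_id -> per-window scores for offline analysis

    def update(self, station_id, score):
        self.latest[station_id] = score
        self.history[station_id].append(score)

    def class_average(self):
        return mean(self.latest.values()) if self.latest else None

    def individual_levels(self):
        return dict(self.latest)
```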

4.2. Data Collection

The hardware described in the previous section was used to capture subjects’ facial videos while they attended four lectures. The collected dataset consists of 10 students in 300-level STEM classes. Each lecture is 75 min in length and was divided into 2-min windows, which resulted in 1360 samples. These data were annotated by education experts. Three engagement levels were defined using a set of tokens, which are summarized in Figure 9. A sample of the annotation during a lecture is shown in Figure 10. At the beginning of the lecture, most students were engaged. In the middle of the lecture, students’ engagement dropped somewhat as some students partially disengaged. Later, at the end of the lecture, engagement dropped by about half, as some students mostly disengaged. These results provide strong evidence for common observable behaviors and/or emotions that reflect student engagement.

4.3. Evaluation

A high-performance computing machine can run the proposed framework at a frame rate of 10–15 fps, depending on the number of students in the class. The Raspberry Pi units are able to run the proposed framework and process the video stream to extract the individual students’ features (head pose, eye gaze, action units) at a rate of 2–3 frames per second; within a 2-min time window, this yields about 240 processed feature vectors. The collected dataset was used to train support vector machine (SVM) classifiers to classify the engagement components (behavioral and emotional engagement). The leave-one-out cross-validation technique was used for evaluation. The agreement ratios for the disengaged and engaged classes were 83% and 88%, respectively, for behavioral engagement, and 73% and 90%, respectively, for emotional engagement. Figure 11 shows the confusion matrices of the proposed behavioral and emotional engagement classification.
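A possible form of this evaluation, sketched with scikit-learn under the assumption that one feature-vector window is held out per fold (the paper does not state whether windows or students are left out), is:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_engagement_classifier(X, y):
    """Leave-one-out evaluation of a binary (disengaged=0 / engaged=1) SVM classifier.
    Returns the confusion matrix and per-class agreement ratios (per-class recall)."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
    y_pred = cross_val_predict(clf, X, y, cv=LeaveOneOut())
    cm = confusion_matrix(y, y_pred, labels=[0, 1])
    agreement = cm.diagonal() / cm.sum(axis=1)     # fraction of each class predicted correctly
    return cm, {"disengaged": agreement[0], "engaged": agreement[1]}
```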

5. Conclusions and Future Work

In this paper, a novel framework for automatically measuring the student’s behavioral and emotional engagement levels in the class environment was proposed. This framework provides instructors with real-time estimation for both the average class engagement level and the engagement level of each individual, which will help the instructor make decisions and plans for the lectures, especially in large-scale classes or in settings in which the instructor cannot have direct eye contact with the students.
More features should be captured to enhance the behavioral and emotional engagement modules. The streams from the 4K cameras will be used to capture students’ bodies and extract their body poses and body actions; these actions will help the behavioral module classify the students’ behavioral engagement. Additionally, to enhance the emotional engagement module, we will consider adding features such as heart rate variability (HRV) and galvanic skin response (GSR).
A large-scale dataset should be collected, covering more students attending multiple courses over an entire semester. This will help in training and evaluating both the behavioral and emotional engagement measurement modules. It will also allow the emotional engagement measurement module to become more sophisticated by classifying chunks of video (time windows) rather than individual frames.
Additionally, this work did not address the estimation of the third component of engagement, cognitive engagement. Measuring this component is far more complicated, and sensors such as electroencephalogram (EEG) headsets are very intrusive. A study relating the measured behavioral and emotional engagement levels to this third component will be performed.
We tested the system in multiple classes in which the lecture duration was 75 min. The system keeps track of students’ engagement over time; thus, the time index helps an instructor analyze the engagement to evaluate the course content while ignoring the beginning and end of the lecture. As future work, we plan to test the system in different classes with different durations and settings. Moreover, in this framework, data were collected in STEM classes for junior students at U.S. universities. The system has a modular architecture: the more modules that are added, the better the results. Thus, once the concept is proven and the technology is adopted, these modules can be simplified and a smaller number of sensors can be used. The technology can then be delivered at low cost, which makes it available in other places.

Author Contributions

Methodology, I.A., A.M.A. and N.H.; Software, I.A. and A.M.A.; Investigation, All; Writing—original draft, I.A.; Writing—review and editing, I.A., A.M.A. and A.F.; Supervision, A.F.; Project administration, A.F.; Funding acquisition, A.F. All authors have read and agreed to the published version of the manuscript.

Funding

NSF grant award 1900456.

Institutional Review Board Statement

This research study was conducted retrospectively under University of Louisville IRB #19.0513, approved on 14 September 2021.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are unavailable due to student privacy.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xu, X.; de Sa, V.R. Exploring multidimensional measurements for pain evaluation using facial action units. In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, 16–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 786–792. [Google Scholar]
  2. Tan, Z.; Zhou, A.; Hei, X.; Gao, Y.; Zhang, C. Towards Automatic Engagement Recognition of Autistic Children in a Machine Learning Approach. In Proceedings of the 2019 IEEE International Conference on Engineering, Technology and Education (TALE), Yogyakarta, Indonesia, 10–13 December 2019; pp. 1–8. [Google Scholar] [CrossRef]
  3. Manfredonia, J.; Bangerter, A.; Manyakov, N.V.; Ness, S.; Lewin, D.; Skalkin, A.; Boice, M.; Goodwin, M.S.; Dawson, G.; Hendren, R.; et al. Automatic recognition of posed facial expression of emotion in individuals with autism spectrum disorder. J. Autism Dev. Disord. 2019, 49, 279–293. [Google Scholar] [CrossRef]
  4. Badgujar, P.; Selmokar, P. Driver gaze tracking and eyes off the road detection. Mater. Today Proc. 2022, 72, 1863–1868. [Google Scholar] [CrossRef]
  5. Alkabbany, I.; Ali, A.; Farag, A.; Bennett, I.; Ghanoum, M.; Farag, A. Measuring Student Engagement Level Using Facial Information. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3337–3341. [Google Scholar]
  6. Palaniswamy, S. A Robust Pose & Illumination Invariant Emotion Recognition from Facial Images using Deep Learning for Human–Machine Interface. In Proceedings of the 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), Bengaluru, India, 20–21 December 2019; IEEE: Piscataway, NJ, USA, 2019; Volume 4, pp. 1–6. [Google Scholar]
  7. Duchenne, G.B. Mécanisme de la physionomie humaine ou analyse électro-physiologique de l’expression des passions; Librairie, J.-B., Ed.; Ve Jules Renouard, libraire: Paris, France, 1876. [Google Scholar]
  8. Geisinger, B.N.; Raman, D.R. Why they leave: Understanding student attrition from engineering majors. Int. J. Eng. Educ. 2013, 29, 914. [Google Scholar]
  9. Zhang, G.; Anderson, T.J.; Ohland, M.W.; Thorndyke, B.R. Identifying factors influencing engineering student graduation: A longitudinal and cross-institutional study. J. Eng. Educ. 2004, 93, 313–320. [Google Scholar] [CrossRef]
  10. Patrick, A.D.; Prybutok, A. Predicting persistence in engineering through an engineering identity scale. Int. J. Eng. Educ. 2018, 34, 351–363. [Google Scholar]
  11. Good, C.; Rattan, A.; Dweck, C.S. Why do women opt out? Sense of belonging and women’s representation in mathematics. J. Personal. Soc. Psychol. 2012, 102, 700. [Google Scholar] [CrossRef]
  12. Whitcomb, K.M.; Singh, C. Underrepresented minority students receive lower grades and have higher rates of attrition across STEM disciplines: A sign of inequity? Int. J. Sci. Educ. 2021, 43, 1054–1089. [Google Scholar] [CrossRef]
  13. Hieb, J.L.; Lyle, K.B.; Ralston, P.A.; Chariker, J. Predicting performance in a first engineering calculus course: Implications for interventions. Int. J. Math. Educ. Sci. Technol. 2015, 46, 40–55. [Google Scholar] [CrossRef]
  14. Bego, C.R.; Barrow, I.Y.; Ralston, P.A. Identifying bottlenecks in undergraduate engineering mathematics: Calculus I through differential equations. In Proceedings of the 2017 ASEE Annual Conference & Exposition, Columbus, OH, USA, 25–28 June 2017. [Google Scholar]
  15. Kajander, A.; Lovric, M. Transition from secondary to tertiary mathematics: McMaster University experience. Int. J. Math. Educ. Sci. Technol. 2005, 36, 149–160. [Google Scholar] [CrossRef]
  16. Bellinger, D.B.; DeCaro, M.S.; Ralston, P.A. Mindfulness, anxiety, and high-stakes mathematics performance in the laboratory and classroom. Conscious. Cogn. 2015, 37, 123–132. [Google Scholar] [CrossRef]
  17. Walton, G.M.; Logel, C.; Peach, J.M.; Spencer, S.J.; Zanna, M.P. Two brief interventions to mitigate a “chilly climate” transform women’s experience, relationships, and achievement in engineering. J. Educ. Psychol. 2015, 107, 468. [Google Scholar] [CrossRef] [Green Version]
  18. Weaver, J.P.; DeCaro, M.S.; Hieb, J.L.; Ralston, P.A. Social Belonging and First-Year Engineering Mathematics: A Collaborative Learning Intervention. In Proceedings of the 2016 ASEE Annual Conference & Exposition, New Orleans, Louisiana, 26 June 2016. [Google Scholar]
  19. ARTNATOMYA (Anatomical bases of facial expression learning tool) Copyright 2006–2023 Victoria Contreras Flores. SPAIN. Available online: www.artnatomia.net; www.artnatomy.com (accessed on 2 September 2017).
  20. Fredricks, J.A.; Blumenfeld, P.C.; Paris, A.H. School engagement: Potential of the concept, state of the evidence. Rev. Educ. Res. 2004, 74, 59–109. [Google Scholar] [CrossRef]
  21. Greene, B.A. Measuring cognitive engagement with self-report scales: Reflections from over 20 years of research. Educ. Psychol. 2015, 50, 14–30. [Google Scholar] [CrossRef]
  22. Sinatra, G.M.; Heddy, B.C.; Lombardi, D. The challenges of defining and measuring student engagement in science. Educ. Psychol. 2015, 50, 1–13. [Google Scholar] [CrossRef]
  23. Sinclair, M.; Christenson, S.; Lehr, C.; Anderson, A. Facilitating Student Engagement: Lessons Learned from Check & Connect Longitudinal Studies. Calif. Sch. Psychol. 2014, 8, 29–41. [Google Scholar] [CrossRef]
  24. Craik, F.I.; Lockhart, R.S. Levels of processing: A framework for memory research. J. Verbal Learn. Verbal Behav. 1972, 11, 671–684. [Google Scholar] [CrossRef]
  25. Marton, F.; Säljö, R. On qualitative differences in learning: I—Outcome and process. Br. J. Educ. Psychol. 1976, 46, 4–11. [Google Scholar] [CrossRef]
  26. Posner, M.I.; Petersen, S.E. The attention system of the human brain. Annu. Rev. Neurosci. 1990, 13, 25–42. [Google Scholar] [CrossRef]
  27. Watson, D.; Wiese, D.; Vaidya, J.; Tellegen, A. The two general activation systems of affect: Structural findings, evolutionary considerations, and psychobiological evidence. J. Personal. Soc. Psychol. 1999, 76, 820. [Google Scholar] [CrossRef]
  28. Petersen, S.E.; Posner, M.I. The attention system of the human brain: 20 years after. Annu. Rev. Neurosci. 2012, 35, 73. [Google Scholar] [CrossRef]
  29. Kapoor, A.; Picard, R.W. Multimodal Affect Recognition in Learning Environments. In Proceedings of the 13th Annual ACM International Conference on Multimedia, Singapore, 6–11 November 2005; pp. 677–682. [Google Scholar]
  30. McDaniel, B.; D’Mello, S.; King, B.; Chipman, P.; Tapp, K.; Graesser, A. Facial features for affective state detection in learning environments. In Proceedings of the Annual Meeting of the Cognitive Science Society, Nashville, TN, USA, 1–4 August 2007; Volume 29. [Google Scholar]
  31. D’Mello, S.K.; Graesser, A. Multimodal semi-automated affect detection from conversational cues, gross body language, and facial features. User Model. User-Adapt. Interact. 2010, 20, 147–187. [Google Scholar] [CrossRef]
  32. Grafsgaard, J.F.; Wiggins, J.B.; Boyer, K.E.; Wiebe, E.N.; Lester, J.C. Automatically Recognizing Facial Indicators of Frustration: A Learning-centric Analysis. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, Geneva, Switzerland, 2–5 September 2013. [Google Scholar]
  33. Whitehill, J.; Serpell, Z.; Lin, Y.; Foster, A.; Movellan, J.R. The Faces of Engagement: Automatic Recognition of Student Engagement from Facial Expressions. IEEE Trans. Affect. Comput. 2014, 5, 86–98. [Google Scholar] [CrossRef]
  34. Ahuja, K.; Kim, D.; Xhakaj, F.; Varga, V.; Xie, A.; Zhang, S.; Townsend, J.E.; Harrison, C.; Ogan, A.; Agarwal, Y. EduSense: Practical Classroom Sensing at Scale. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 1–26. [Google Scholar] [CrossRef]
  35. Li, Y.Y.; Hung, Y.P. Feature fusion of face and body for engagement intensity detection. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3312–3316. [Google Scholar]
  36. Dhamija, S.; Boult, T.E. Automated mood-aware engagement prediction. In Proceedings of the 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA, 23–26 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8. [Google Scholar]
  37. Littlewort, G.; Whitehill, J.; Wu, T.; Fasel, I.; Frank, M.; Movellan, J.; Bartlett, M. The computer expression recognition toolbox (CERT). In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–25 March 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 298–305. [Google Scholar]
  38. Monkaresi, H.; Bosch, N.; Calvo, R.A.; D’Mello, S.K. Automated detection of engagement using video-based estimation of facial expressions and heart rate. IEEE Trans. Affect. Comput. 2016, 8, 15–28. [Google Scholar] [CrossRef]
  39. Henderson, N.L.; Rowe, J.P.; Mott, B.W.; Lester, J.C. Sensor-based Data Fusion for Multimodal Affect Detection in Game-based Learning Environments. In Proceedings of the EDM (Workshops), Montréal, QC, Canada, 2–5 July 2019; pp. 44–50. [Google Scholar]
  40. Dhamija, S.; Boult, T.E. Exploring contextual engagement for trauma recovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 19–29. [Google Scholar]
  41. Macea, D.D.; Gajos, K.; Calil, Y.A.D.; Fregni, F. The efficacy of Web-based cognitive behavioral interventions for chronic pain: A systematic review and meta-analysis. J. Pain 2010, 11, 917–929. [Google Scholar] [CrossRef]
  42. Marks, S.U.; Gersten, R. Engagement and disengagement between special and general educators: An application of Miles and Huberman’s cross-case analysis. Learn. Disabil. Q. 1998, 21, 34–56. [Google Scholar] [CrossRef]
  43. Booth, B.M.; Ali, A.M.; Narayanan, S.S.; Bennett, I.; Farag, A.A. Toward active and unobtrusive engagement assessment of distance learners. In Proceedings of the 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA, 23–26 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 470–476. [Google Scholar]
  44. Foreman, J.C.; Farag, A.; Ali, A.; Alkabbany, I.; DeCaro, M.S.; Tretter, T. Towards a multi-dimensional biometric approach to real-time measurement of student engagement in the STEM classroom. In Proceedings of the 2020 ASEE Virtual Annual Conference, Virtual conference, 22–26 June 2020. [Google Scholar]
  45. Farag, A.A.; Ali, A.; Alkabbany, I.; Foreman, J.C.; Tretter, T.; DeCaro, M.S.; Hindy, N.C. Toward a Quantitative Engagement Monitor for STEM Education. In Proceedings of the 2021 ASEE Virtual Annual Conference, Virtual conference, 26–29 July 2021. [Google Scholar]
  46. Alkabbany, I.M.A.M.M. Biometric Features Modeling to Measure Students Engagement. Ph.D. Thesis, University of Louisville, Louisville, KY, USA, 2021. [Google Scholar]
  47. O’Malley, K.J.; Moran, B.J.; Haidet, P.; Seidel, C.L.; Schneider, V.; Morgan, R.O.; Kelly, P.A.; Richards, B. Validation of an observation instrument for measuring student engagement in health professions settings. Eval. Health Prof. 2003, 26, 86–103. [Google Scholar] [CrossRef]
  48. Lane, E.S.; Harris, S.E. A new tool for measuring student behavioral engagement in large university classes. J. Coll. Sci. Teach. 2015, 44, 83–91. [Google Scholar] [CrossRef]
  49. Chang, C.; Zhang, C.; Chen, L.; Liu, Y. An ensemble model using face and body tracking for engagement detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, USA, 16–20 October 2018; pp. 616–622. [Google Scholar]
  50. Kleinsmith, A.; De Silva, P.R.; Bianchi-Berthouze, N. Cross-cultural differences in recognizing affect from body posture. Interact. Comput. 2006, 18, 1371–1389. [Google Scholar] [CrossRef]
  51. Meeren, H.K.; van Heijnsbergen, C.C.; de Gelder, B. Rapid perceptual integration of facial expression and emotional body language. Proc. Natl. Acad. Sci. USA 2005, 102, 16518–16523. [Google Scholar] [CrossRef]
  52. Schindler, K.; Van Gool, L.; De Gelder, B. Recognizing emotions expressed by body pose: A biologically inspired neural model. Neural Netw. 2008, 21, 1238–1246. [Google Scholar] [CrossRef] [PubMed]
  53. Mota, S.; Picard, R.W. Automated posture analysis for detecting learner’s interest level. In Proceedings of the 2003 Conference on Computer Vision and Pattern Recognition Workshop, Madison, WI, USA, 16–22 June 2003; IEEE: Piscataway, NJ, USA, 2003; Volume 5, p. 49. [Google Scholar]
  54. Arroyo, I.; Cooper, D.G.; Burleson, W.; Woolf, B.P.; Muldner, K.; Christopherson, R. Emotion sensors go to school. In Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED), Brighton, UK, 6–10 July 2009; IOS Press: Amsterdam, The Netherlands, 2009; pp. 17–24. [Google Scholar]
  55. Goldin-Meadow, S. How gesture works to change our minds. Trends Neurosci. Educ. 2014, 3, 4–6. [Google Scholar] [CrossRef] [PubMed]
  56. Goldin-Meadow, S. Taking a hands-on approach to learning. Policy Insights Behav. Brain Sci. 2018, 5, 163–170. [Google Scholar] [CrossRef]
  57. Pease, B.; Pease, A. The Definitive Book of Body Language: The Hidden Meaning Behind People’s Gestures and Expressions; Bantam: New York, NY, USA, 2008. [Google Scholar]
  58. Rocca, K.A. Student participation in the college classroom: An extended multidisciplinary literature review. Commun. Educ. 2010, 59, 185–213. [Google Scholar] [CrossRef]
  59. Cook, S.W.; Goldin-Meadow, S. The role of gesture in learning: Do children use their hands to change their minds? J. Cogn. Dev. 2006, 7, 211–232. [Google Scholar] [CrossRef]
  60. Raca, M.; Dillenbourg, P. System for assessing classroom attention. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, Leuven, Belgium, 8–13 April 2013; pp. 265–269. [Google Scholar]
  61. Stiefelhagen, R. Tracking focus of attention in meetings. In Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces, Pittsburgh, PA, USA, 16 October 2002; IEEE: Piscataway, NJ, USA, 2002; pp. 273–280. [Google Scholar]
  62. Zaletelj, J.; Košir, A. Predicting students’ attention in the classroom from Kinect facial and body features. EURASIP J. Image Video Process. 2017, 2017, 80. [Google Scholar] [CrossRef]
  63. Raca, M. Camera-Based Estimation of Student’s Attention in Class; Technical report; École polytechnique fédérale de Lausann: Lausanne, Switzerland, 2015. [Google Scholar]
  64. Li, B.; Li, H.; Zhang, R. Adaptive random network coding for multicasting hard-deadline-constrained prioritized data. IEEE Trans. Veh. Technol. 2015, 65, 8739–8744. [Google Scholar] [CrossRef]
  65. Bosch, N.; Chen, Y.; D’Mello, S. It’s written on your face: Detecting affective states from facial expressions while learning computer programming. In Proceedings of the International Conference on Intelligent Tutoring Systems, Honolulu, HI, USA, 5–9 June 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 39–44. [Google Scholar]
  66. Kapoor, A.; Burleson, W.; Picard, R.W. Automatic prediction of frustration. Int. J. Hum.-Comput. Stud. 2007, 65, 724–736. [Google Scholar] [CrossRef]
  67. Bosch, N.; D’mello, S.K.; Ocumpaugh, J.; Baker, R.S.; Shute, V. Using Video to Automatically Detect Learner Affect in Computer-Enabled Classrooms. ACM Trans. Interact. Intell. Syst. 2016, 6. [Google Scholar] [CrossRef]
  68. Chi, M. Active-Constructive-Interactive: A Conceptual Framework for Differentiating Learning Activities. Top. Cogn. Sci. 2009, 1, 73–105. [Google Scholar] [CrossRef]
  69. Saneiro, M.; Santos, O.C.; Salmeron-Majadas, S.; Boticario, J.G. Towards Emotion Detection in Educational Scenarios from Facial Expressions and Body Movements through Multimodal Approaches. Sci. World J. 2014, 2014, 484873. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  70. Cano, D.R. The Effect of Engagement on at Risk Student Achievement: A Correlational Investigation; Dallas Baptist University: Dallas, TX, USA, 2015. [Google Scholar]
  71. Ahuja, K.; Shah, D.; Pareddy, S.; Xhakaj, F.; Ogan, A.; Agarwal, Y.; Harrison, C. Classroom Digital Twins with Instrumentation-Free Gaze Tracking. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; Association for Computing Machinery: New York, NY, USA, 2021. [Google Scholar] [CrossRef]
  72. Hao, N.; Xue, H.; Yuan, H.; Wang, Q.; Runco, M. Enhancing creativity: Proper body posture meets proper emotion. Acta Psychol. 2017, 173, 32–40. [Google Scholar] [CrossRef] [PubMed]
  73. Andolfi, V.; di nuzzo, C.; Antonietti, A. Opening the Mind through the Body: The Effects of Posture on Creative Processes. Think. Ski. Creat. 2017, 24, 20–28. [Google Scholar] [CrossRef]
  74. Baltrusaitis, T.; Zadeh, A.; Lim, Y.C.; Morency, L.P. Openface 2.0: Facial behavior analysis toolkit. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 59–66. [Google Scholar]
  75. Wood, E.; Baltrusaitis, T.; Zhang, X.; Sugano, Y.; Robinson, P.; Bulling, A. Rendering of eyes for eye-shape registration and gaze estimation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3756–3764. [Google Scholar]
  76. Skinner, E.; Belmont, M. Motivation in the Classroom: Reciprocal Effects of Teacher Behavior and Student Engagement Across the School Year. J. Educ. Psychol. 1993, 85, 571–581. [Google Scholar] [CrossRef]
  77. Voelkl, K.E. Identification with school. Am. J. Educ. 1997, 105, 294–318. [Google Scholar] [CrossRef]
  78. Appleton, J.; Christenson, S.; Kim, D.; Reschly, A. Measuring Cognitive and Psychological Engagement: Validation of the Student Engagement Instrument. J. Sch. Psychol. 2006, 44, 427–445. [Google Scholar] [CrossRef]
  79. Mostafa, E.; Ali, A.A.; Shalaby, A.; Farag, A. A Facial Features Detector Integrating Holistic Facial Information and Part-Based Model. In Proceedings of the CVPR-Workshops, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  80. Sagonas, C.; Antonakos, E.; Tzimiropoulos, G.; Zafeiriou, S.; Pantic, M. 300 Faces In-The-Wild Challenge: Database and results. Image Vis. Comput. 2016, 47, 3–18. [Google Scholar] [CrossRef] [Green Version]
  81. Ali, A.M.; Alkabbany, I.; Farag, A.; Bennett, I.; Farag, A. Facial Action Units Detection Under Pose Variations Using Deep Regions Learning. In Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA, 23–26 October 2017. [Google Scholar]
Figure 2. A conceptual framework linking on-task/off-task behavioral, positive/negative emotions, and deep/shallow cognitive engagement.
Figure 3. The proposed behavioral engagement framework.
Figure 4. The proposed emotional engagement framework.
Figure 5. In the presence of pose, the uniform grid (a) suffers from a lack of correspondences (red and blue rectangles) due to displacement and occlusion. To minimize this lack of correspondence, facial landmarks are used to define a sparse set of patches (b).
Figure 6. The student hardware module.
Figure 7. The proposed biometric sensor network.
Figure 8. Instructor dashboard summarizes class students’ engagement in a clear and simplified way.
Figure 9. Dictionary of Tokens.
Figure 10. Engagement level of 10 students during a lecture.
Figure 11. The confusion matrix of the proposed engagement classification.
Table 1. Psychological Constructs for the Three Types of Engagement.
Type of Engagement | Cognitive (C) | Behavioral (B) | Emotional (E)
Psychological Construct | Levels of processing [24,25] | Targets of attention [26,28] | Affective context [27]
Engaged State | Deep processing | On-task attention | Positive affect
Disengaged State | Shallow processing | Off-task attention | Negative affect
External Operationalization | Not directly observable | Eye gaze, head pose, etc. | Facial Action Coding System
Table 2. Features of interest along with motivating literature.
Feature | Engagement Component | Motivation
Body gestures and postures | Behavioral and Emotional
  • There are common observable behaviors and/or emotions, which reflect student engagement [47,48].
  • Student engagement can be estimated by the fusion of facial expressions and body features [35,49].
  • Fusion of facial expressions with wristband heart-rate data and/or posture with electrodermal activity data shows dramatic improvement in student engagement detection [38,39].
  • Body movement, posture, and gesture indicate the affective states [50,51,52,53,54].
  • Gestures are associated with learning by indexing moments of cognitive instability and reflecting thoughts not yet found in speech [55,56].
Hand movement | Behavioral and Emotional
  • Hand-over-face gestures are a subset of emotional body language [57].
  • The frequency and quantity of hand raises is a good indicator of student participation [58].
  • Students who spontaneously gesture as they work through new ideas tend to remember them longer than those who do not move their hands [59].
Head movement and eye gaze | Behavioral
  • Head orientation has been shown to be a proxy for gaze attention [60,61,62,63], e.g., toward the instructor, educational materials, and other classroom foci.
  • Attention is a pre-requisite for learning [64].
Facial Action Units (FACS) | Emotional
  • Facial information reveals students’ affective state, such as engagement [33] and frustration [32,65,66], and off-task behaviors [67].
  • Student engagement can be estimated by the fusion of facial expressions and body features [35,49].
Mood | Emotional
  • Mood awareness influences engagement classification [36].
Heart rate | Emotional
  • There is significant evidence [33,38,41,42] that facial-expression-based estimation of engagement is nearly universal.
Graded homework and weekly exams | Cognitive
  • Cognitive engagement is challenging to directly monitor [68], especially during class sessions.
Table 3. Comparison of motivating literature frameworks settings.
Research Group | Learning Context | Information Source | Affective States | Annotators | Dataset
MIT Media Lab, 2005 [29] | A person solving puzzles on a PC | Camera (head nod and head shake, eye blinks, mouth activities); chair (posture features); OS (screen activity) | High, medium, and low interest; boredom; and “taking a break” | Teachers | 8 children
U. of Memphis, 2007 [30] | An individual using the AutoTutor system on a PC | Camera (manual Facial Action Coding), AUs 1, 2, 4, 7, and 14 | Boredom, confusion, frustration, delight, and neutral | Self, peer, and 2 judges | 28 undergraduate students
U. Massachusetts Amherst and Arizona State (Emotion Sensors), 2009 [54] | An individual using a multimedia adaptive tutoring system for geometry on a PC | Camera (facial expression); chair (posture features); mouse (increasing amounts of pressure); wrist (skin conductance) | Confident, frustrated, excited, and interested | Pretest, posttest, and a survey | 38 high school and 29 female undergraduate students
NC State, 2013 [32] | An individual using JavaTutor on a PC | Kinect (5 most frequently occurring AUs: 1, 2, 4, 7, 14) | Frustration and learning gain | Two FACS coders | 67 undergraduate students
UCSD MP Lab and Dept. of Psychology, VSU, 2014 [33] | An individual using cognitive skills training software | Camera (facial image) | Not engaged, nominally engaged, engaged, and very engaged | 7 labelers | 34 undergraduate students
Artificial Intelligence Department, UNED, Spain, 2014 [69] | An individual solving math problems on a PC | Camera (facial image); Kinect (video and processed information); OS (participant’s activities); sensors (physiological signals); solutions, questionnaires/observer report | Excited/unexcited, concentrated/distracted, worried/unworried, interested/uninterested | Self-report, psychoeducational expert | 75 undergraduates
Learning Research Center, Pittsburgh, Psychology and Education, 2015 [70] | Finnish high school students, 9th to 11th grade, in a regular classroom | Student- and teacher-report surveys; students’ perceived value, importance, and level of enjoyment of school; scales of students’ levels of stress, etc. | Emotional engagement, school burnout, and depression symptoms | GPA/teachers’ report | 362 students
Notre Dame, Florida State, and Columbia, 2016 [67] | An individual solving physics problems on a PC | Camera (nineteen AUs, head pose, and gross body movement) | Off-task, on-task, delight, boredom, concentration, confusion, frustration | Baker-Rodrigo Observation Method | 137 8th and 9th grade students
CVIP Lab, University of Louisville (EITL) [5] | E-learning for undergraduate students | Student webcams (33 AUs, head pose, eye gaze) | No face, not engaged, look engaged, engaged, and very engaged | Researchers | 13 students
Carnegie Mellon University (EduSense), 2019 [34] | Undergraduates, regular classroom setting | 12 tripod-mounted cameras, front (students) and back (teacher) | Raw features: body pose, head pose, smile, mouth open, hand raise, sit vs. stand, student vs. teacher speech (time ratio) | - | 25 students for training and 687 for evaluation
Carnegie Mellon University (Classroom Digital Twins), 2021 [71] | Undergraduates, regular classroom setting | 2 wall-mounted cameras, front (students) and back (teacher) | Raw features: head pose, gaze point | Controlled experiment (marker) | 8 participants
CVIP Lab, University of Louisville (our proposed framework) | STEM classes for undergraduates, regular classroom setting | Wall-mounted camera (head pose); student processing unit (head pose, eye gaze, and facial AUs) | Behavioral and emotional engagement | Education experts | 10 students