# MOOC Video Personalized Classification Based on Cluster Analysis and Process Mining


## Abstract


## 1. Introduction

- an approach is proposed to implement MOOC video personalized classification in terms of difficulty and importance for students at different knowledge levels;
- the business process modeling idea is introduced into the modeling of MOOC video learning behaviors, and the process mining technique is used to mine the video watching behaviors of students;
- an approach of measuring difficulty and importance of MOOC videos based on a process model is proposed, by which the difficulty and importance of MOOC videos for students at different knowledge levels can be obtained automatically.

## 2. Related Works

## 3. MOOC Video Personalized Classification Framework

#### 3.1. Student Clustering

#### 3.2. Video Learning Behavior Process Model Mining

**Definition 1.** A video learning behavior process (VLBP) model is a tuple VLBP = (V, E), where V = {v_{1}, v_{2}, …, v_{n}} is a set of n MOOC videos and E = {e_{1}, e_{2}, …, e_{m}} is a set of order relationships between MOOC videos, in which e_{i} is the sequential relationship between two different videos in V.

In the example VLBP model, v_{1} exists in a loop structure, indicating that students watch v_{1} repeatedly; v_{2} and v_{3} exist in a loop structure, which means students watch v_{2} and v_{3} in order repeatedly; and v_{5}, v_{6}, and v_{7} exist in a skip structure, which means students can skip v_{6} and watch v_{7} directly after watching v_{5}.
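The loop and skip structures described above can be read directly off the edge set E. The following is a minimal sketch, assuming the model is stored as a list of (source, target) pairs; the function names `self_loops` and `skippable` and the example edge list are hypothetical, and edges not mentioned in the text (such as v_{3} to v_{4}) are assumptions added only to connect the example.

```python
def self_loops(edges):
    """Return videos that have an edge to themselves (watched repeatedly)."""
    return sorted({a for a, b in edges if a == b})


def skippable(edges):
    """Return videos b with edges a->b, b->c and a->c for distinct a, b, c.

    This only covers a skip branch made of a single video; longer
    branches would need a path search instead of a triple check.
    """
    edge_set = set(edges)
    found = set()
    for a, b in edge_set:
        for b2, c in edge_set:
            if b2 == b and len({a, b, c}) == 3 and (a, c) in edge_set:
                found.add(b)
    return sorted(found)


# Hypothetical encoding of the example described in the text.
EXAMPLE_EDGES = [
    ("v1", "v1"),                # v1 is watched repeatedly
    ("v1", "v2"),
    ("v2", "v3"), ("v3", "v2"),  # v2 and v3 are watched in order repeatedly
    ("v3", "v4"), ("v4", "v5"),  # assumed connecting edges
    ("v5", "v6"), ("v6", "v7"),
    ("v5", "v7"),                # v6 can be skipped
]
```

With this encoding, `self_loops(EXAMPLE_EDGES)` picks out v1 and `skippable(EXAMPLE_EDGES)` picks out v6.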

#### 3.3. MOOC Video Personalized Classification

## 4. MOOC Video Personalized Classification

#### 4.1. Student Clustering

**Algorithm 1.** Student clustering based on question answering vectors by K-means clustering.

**Input:** Students set S = {s_{1}, s_{2}, …, s_{n}}; students’ question answering vector set SV = {sv_{1}, sv_{2}, …, sv_{n}}, where sv_{i} represents the question answering vector of the i-th student; the number of clusters K; max iteration times MT_{1}; max times of cluster centers unchanging MT_{2}.

**Output:** K clusters: C_{1}, C_{2}, …, C_{K}.

1. init: C_{1} = C_{2} = … = C_{K} = {}, current iteration times CT_{1} = 0, current times of cluster centers unchanging CT_{2} = 0
2. CV = {cv_{1}, cv_{2}, …, cv_{K}} = Random(SV, K) //randomly select the question answering vectors of K students as the K initial cluster centers
3. while (CT_{1} < MT_{1} and CT_{2} < MT_{2}) //stop iterating when CT_{1} reaches MT_{1} or CT_{2} reaches MT_{2}
4. for each sv_{i}∈SV and cv_{j}∈CV: //traverse question answering vectors and cluster centers
5. if (sv_{i} nearest to cv_{j}) then: //find the nearest cluster center for each student
6. add s_{i} to C_{j} //add student s_{i} to the nearest cluster
7. end if
8. end for
9. for each cv_{i}∈CV, each s_{j}∈C_{i}: //traverse cluster centers and the students that belong to each cluster
10. cv_{i} = avg(sv_{j}) //take the average of the question answering vectors of all students in cluster i as the new center of cluster i
11. end for
12. CT_{1} = CT_{1} + 1 //add 1 to the current iteration times
13. if (unchanged(CV)) then: //if all cluster centers are unchanged
14. CT_{2} = CT_{2} + 1 //add 1 to the current times of cluster centers unchanging
15. end if
16. end while
17. output C_{1}, C_{2}, …, C_{K}

Each question answering vector has the form sv = (data_{1}, data_{2}, …, data_{n}), where n is the number of clustering features and data_{i} represents the i-th clustering feature of the question answering data. Algorithm 1 first randomly selects K objects as the initial cluster centers (line 2). Then, it calculates the distance between each object and each cluster center and assigns each object to the cluster center closest to it (lines 4–8). Next, the algorithm recalculates all cluster centers based on the existing objects in each cluster (lines 9–11) and updates the iteration conditions (lines 12–15). The process repeats until the termination conditions are met (lines 3–16). Finally, the algorithm outputs the clusters (line 17).
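The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the function name `kmeans_students`, the use of squared Euclidean distance, the fixed random seed, and the choice to rebuild the clusters at the start of every iteration and to keep the old center of an empty cluster are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_students(vectors, k, max_iter=100, max_unchanged=1, seed=0):
    """Cluster students by their question answering vectors.

    vectors: list of numeric lists, one per student (SV)
    k: number of clusters (K)
    max_iter: MT1, max_unchanged: MT2
    Returns (clusters, centers); each cluster is a list of student indices.
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    clusters = [[] for _ in range(k)]
    iterations, unchanged = 0, 0
    while iterations < max_iter and unchanged < max_unchanged:
        # Assumption: clusters are rebuilt from scratch in each iteration.
        clusters = [[] for _ in range(k)]
        for idx, vec in enumerate(vectors):
            nearest = min(range(k),
                          key=lambda j: squared_distance(vec, centers[j]))
            clusters[nearest].append(idx)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(vectors[0])
                new_centers.append(
                    [sum(vectors[i][d] for i in members) / len(members)
                     for d in range(dim)])
            else:
                new_centers.append(centers[j])  # keep center of empty cluster
        iterations += 1
        if new_centers == centers:
            unchanged += 1
        centers = new_centers
    return clusters, centers
```

For example, calling `kmeans_students([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` separates the two low-scoring vectors from the two high-scoring ones.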

#### 4.2. VLBP Model Mining

**Definition 2.** A learning sequence is LS = <v_{1}, v_{2}, v_{3}, …, v_{n}>, where n is a natural number and v_{i} denotes the i-th video in the sequence.

**Algorithm 2.** VLBP model mining based on heuristic mining.

**Input:** Learning sequence set LSS = {LS_{1}, LS_{2}, …, LS_{n}}; the threshold of the number of following directly Tf; the threshold of dependency Td.

**Output:** VLBP = (V, E).

1. init: the number of following directly matrix F[][] = 0, dependency matrix D[][], all videos set V_{all} = {}, target videos set V = {}, order relations set E = {}
2. for each LS∈LSS and each v_{i}∈LS: //traverse videos belonging to LSS
3. if (v_{i} not belong to V_{all}) then:
4. add v_{i} to V_{all} //record videos that appear in LSS
5. end if
6. end for
7. for each LS∈LSS and each v_{i}, v_{i+1}∈LS: //traverse neighboring videos in each LS in LSS
8. F[v_{i}][v_{i+1}] = F[v_{i}][v_{i+1}] + 1 //count the times that v_{i+1} follows v_{i} directly
9. end for
10. for each v_{i}, v_{j}∈V_{all}: //traverse every two videos
11. if (v_{i} == v_{j}) then:
12. D[v_{i}][v_{j}] = F[v_{i}][v_{j}]/(F[v_{i}][v_{j}] + 1) //calculate the dependency between v_{i} and itself
13. end if
14. if (v_{i} != v_{j}) then:
15. D[v_{i}][v_{j}] = (F[v_{i}][v_{j}] − F[v_{j}][v_{i}])/(F[v_{i}][v_{j}] + F[v_{j}][v_{i}] + 1) //calculate the dependency between v_{i} and v_{j}
16. end if
17. if (F[v_{i}][v_{j}] >= Tf) and (D[v_{i}][v_{j}] >= Td) then: //the number of following directly and the dependency between the videos are both greater than or equal to their thresholds
18. if (v_{i}, v_{j} not belong to V) then:
19. add v_{i}, v_{j} to V //record videos that meet the conditions
20. end if
21. add (v_{i}, v_{j}) to E //record the order relationship between the videos
22. end if
23. end for
24. output (V, E)

Algorithm 2 first records the videos that appear in the learning sequences (lines 2–6) and counts the directly following relationships between neighboring videos v_{i} and v_{i+1} in the learning sequence (lines 7–9). Next, it calculates the dependency between every two videos v_{i} and v_{j} (lines 11–16). Then, if the number of directly following relationships and the dependency of v_{i} and v_{j} are greater than or equal to Tf and Td, respectively, v_{i} and v_{j} are added into the video set of the VLBP model and the sequence relationship of v_{i} and v_{j} is added into the relationship set of the VLBP model (lines 17–22).
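The counting and thresholding steps of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mine_vlbp` and the representation of learning sequences as lists of video names are assumptions, while the two dependency formulas follow lines 12 and 15 of Algorithm 2.

```python
from collections import Counter


def mine_vlbp(sequences, tf, td):
    """Mine a VLBP model (V, E) from learning sequences.

    sequences: list of lists of video names (LSS)
    tf: threshold on the number of directly-following occurrences (Tf)
    td: threshold on the dependency value (Td)
    Returns (videos, edges) with edges as (source, target) pairs.
    """
    follows = Counter()  # F: how often b directly follows a
    all_videos = []      # V_all, kept in order of first appearance
    for seq in sequences:
        for v in seq:
            if v not in all_videos:
                all_videos.append(v)
        for a, b in zip(seq, seq[1:]):
            follows[(a, b)] += 1

    videos, edges = [], []
    for a in all_videos:
        for b in all_videos:
            f_ab, f_ba = follows[(a, b)], follows[(b, a)]
            if a == b:
                dependency = f_ab / (f_ab + 1)
            else:
                dependency = (f_ab - f_ba) / (f_ab + f_ba + 1)
            if f_ab >= tf and dependency >= td:
                for v in (a, b):
                    if v not in videos:
                        videos.append(v)
                edges.append((a, b))
    return videos, edges
```

As a usage example, with the sequences `[["v1", "v2", "v3"], ["v1", "v2", "v3"], ["v1", "v3"]]`, Tf = 2 and Td = 0.5, the pair (v1, v3) is dropped because it occurs only once, while (v1, v2) and (v2, v3) each occur twice with dependency 2/3 and are kept.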

#### 4.3. Video Classification Based on VLBP

#### 4.3.1. VLBP Structures for Video Difficulty and Importance Measure

In the self-loop structure, a single video, such as v_{1}, is watched repeatedly. From the perspective of importance, the structure represents the repeated watching behavior of students, which indicates that the content of the video is more important. From the perspective of difficulty, the concentrated learning behavior on a single video also indicates that the content of the video is more difficult.

In the short-loop structure, a small number of videos, such as v_{1}, v_{2}, and v_{3}, are watched in order repeatedly. From the perspective of importance, the correlation between the videos in the structure is strong; that is, these videos are commonly watched together. Watching the videos in the structure repeatedly indicates that the content of these videos is more important. From the perspective of difficulty, the continuous learning behavior on a small number of correlated videos also indicates that the content of these videos is more difficult. As a result, these videos need to be watched many times.

In the long-loop structure, the loop spans a larger set of videos, such as v_{1}–v_{6}. From the perspective of importance, this structure indicates that students review a set of videos that they have watched. This structure usually appears a long time after students first watched these videos. Reviewing videos shows that these videos are important. Through data analysis, we find that videos in a long-loop structure generally belong to multiple chapters of the course, and there may not be correlations among these videos. From the perspective of difficulty, students do not review uncorrelated videos because the videos are difficult; in practice, it is more likely that they have forgotten the content after a long interval. Therefore, this structure does not mean that the videos in it are difficult.

In the skip structure example, v_{1} is the start node, v_{2} and v_{3} form the skip branch, and v_{4} is the end node. In this example, v_{4} can be directly watched after watching v_{1} while ignoring v_{2} and v_{3}. It can be seen that the videos in the skip branch are less important.

#### 4.3.2. Video Importance and Difficulty Measure based on VLBP

We denote that v_{i} is the i-th video, D_{i} is the difficulty of v_{i}, and I_{i} is the importance of v_{i}. EX_{i} indicates whether v_{i} appears in a VLBP model. Specifically, EX_{i} = 1 indicates that v_{i} appears in the VLBP model and EX_{i} = 0 indicates that v_{i} does not appear in the VLBP model. In addition, EX_{i}(x) indicates whether v_{i} exists in the x structure. Specifically, EX_{i}(x) = 1 indicates that v_{i} exists in the x structure and EX_{i}(x) = 0 indicates that v_{i} does not exist in the x structure (if x is a skip structure, x represents the skip branch in the skip structure). We give the calculation methods of D_{i} and I_{i} as Equations (1) and (2) to quantify the difficulty and importance of a video.

The VLBP model in Figure 8 involves the videos v_{1}–v_{13}. All the structures, together with the difficulty and importance of videos calculated by Equations (1) and (2), are shown in Table 1. In Table 1, different videos with the same structure name indicate that these videos exist in the same structure (note that videos with the same skip structure indicate that these videos belong to the same skip branch). For example, v_{1} is in the self-loop, v_{2} and v_{3} exist in the same short-loop, and v_{6} and v_{7} exist in the same short-loop. In addition, v_{4}, v_{5}, v_{6}, v_{7}, v_{8}, and v_{9} exist in the same long-loop and v_{11} can be skipped. The only video that does not appear in the VLBP model is v_{13}.
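Equations (1) and (2) are not reproduced in this text, so the following sketch does not claim to be the authors' formulas. It shows one scoring rule, stated here as an assumption, that reproduces every row of Table 1: difficulty starts at 1 for a video in the model and gains 1 if the video is in a self-loop or short-loop; importance starts at 1, gains 1 for each loop structure the video is in, and loses 1 if the video is in a skip branch. The function name `score_video` is hypothetical, and Table 1 contains no video that is in both a self-loop and a short-loop, so that case is untested.

```python
def score_video(appears, self_loop=False, short_loop=False,
                long_loop=False, skip=False):
    """Return (difficulty, importance) under the assumed rule, or None.

    appears: EX_i, whether the video is in the VLBP model
    self_loop, short_loop, long_loop, skip: EX_i(x) for each structure x
    """
    if not appears:
        return None  # Table 1 leaves both values blank in this case
    # Assumed rule: only self-loops and short-loops signal difficulty.
    difficulty = 1 + int(self_loop or short_loop)
    # Assumed rule: every loop adds importance, a skip branch removes it.
    importance = (1 + int(self_loop) + int(short_loop)
                  + int(long_loop) - int(skip))
    return difficulty, importance
```

For instance, v_{6} is in both a short-loop and a long-loop, and `score_video(True, short_loop=True, long_loop=True)` gives difficulty 2 and importance 3, matching its row in Table 1.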

#### 4.3.3. MOOC Video Classification

First, the videos with the difficulty value 1 are v_{4}, v_{5}, v_{8}, v_{9}, v_{10}, v_{11}, and v_{12} and the videos with the difficulty value 2 are v_{1}, v_{2}, v_{3}, v_{6}, and v_{7}. Second, the video with the importance value 0 is v_{11}; the videos with the importance value 1 are v_{10} and v_{12}; the videos with the importance value 2 are v_{1}, v_{2}, v_{3}, v_{4}, v_{5}, v_{8}, and v_{9}; and the videos with the importance value 3 are v_{6} and v_{7}. The final classification results are shown in Table 2.
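The grouping step amounts to bucketing videos by their difficulty value and, separately, by their importance value. A minimal sketch, assuming the scores are held in a dictionary (the names `TABLE_1_SCORES` and `classify_videos` are hypothetical; the values are copied from Table 1):

```python
from collections import defaultdict

# (difficulty, importance) per video, copied from Table 1.
# v13 is omitted because it does not appear in the VLBP model.
TABLE_1_SCORES = {
    "v1": (2, 2), "v2": (2, 2), "v3": (2, 2), "v4": (1, 2),
    "v5": (1, 2), "v6": (2, 3), "v7": (2, 3), "v8": (1, 2),
    "v9": (1, 2), "v10": (1, 1), "v11": (1, 0), "v12": (1, 1),
}


def classify_videos(scores):
    """Group video names by difficulty value and by importance value."""
    by_difficulty = defaultdict(list)
    by_importance = defaultdict(list)
    for video, (difficulty, importance) in scores.items():
        by_difficulty[difficulty].append(video)
        by_importance[importance].append(video)
    return dict(by_difficulty), dict(by_importance)
```

Calling `classify_videos(TABLE_1_SCORES)` yields the same groups as listed in Table 2.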

## 5. Experiment and Evaluation

#### 5.1. Dataset

#### 5.2. Experimental Procedures

#### 5.2.1. Student Clustering

#### 5.2.2. VLBP Model Mining and Video Classification

The videos are denoted as v_{1}–v_{40}.

#### 5.3. Experiment Analysis and Verification

#### 5.3.1. Difficulty and Importance Analysis of Videos

#### 5.3.2. Effectiveness of Video Personalized Classification

Students evaluate the difficulty and importance of the videos (v_{1}–v_{40}). We compare the evaluation results of students with the classification results obtained by our approach.

We measure the average accuracy of video difficulty (denoted as DP_{x}) and the average accuracy of video importance (denoted as IP_{x}) in the student cluster x in EG.

Given the number of students in cluster x, we denote by ED_{xi} the difficulty evaluation of cluster x for the i-th video in EG and by ID_{xij} the difficulty value given by the j-th student in cluster x for the i-th video in IG. Thus, DP_{x} can be measured by Equation (3). Similarly, we denote by EI_{xi} the importance evaluation of cluster x for the i-th video in EG and by II_{xij} the importance value given by the j-th student in cluster x for the i-th video in IG. Thus, IP_{x} can be measured by Equation (4).

We use DC_{xij} to judge whether the difficulty of the i-th video in cluster x evaluated by EG and that given by the j-th student are consistent, and we use DN_{xi} to count the DC_{xij} whose value is 1. Next, we use DA_{xi} to get the accuracy of the difficulty evaluation of cluster x for the i-th video in EG. Finally, we get the average accuracy of the difficulty evaluation of cluster x for all videos in EG by Equation (3). In the same way, we can get the average accuracy of the importance evaluation of cluster x for all videos in EG by Equation (4). In the experiment, students are divided into three groups according to the clusters, and the average accuracy of the difficulty and importance evaluation of each group of students for all videos is calculated (note that the videos evaluated by a student do not include videos that the student has not watched). The calculation result is shown in Figure 11.
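The accuracy computation described above can be sketched as follows. Equations (3) and (4) are not reproduced in this text, so the details are assumptions: the per-video accuracy is taken as the share of students who rated that video and agree with the value produced by the approach, and videos with no ratings are left out of the average. The function name `average_accuracy` and the dictionary layout are hypothetical. The same function serves for difficulty and importance.

```python
def average_accuracy(model_values, student_values):
    """Average per-video agreement between the approach and students.

    model_values: {video: value assigned by the approach for cluster x}
    student_values: {video: list of values given by students in cluster x
                     who watched that video}
    """
    accuracies = []
    for video, expected in model_values.items():
        ratings = student_values.get(video, [])
        if not ratings:
            continue  # nobody in the cluster watched and rated this video
        # Count of consistent ratings (the role of DN in the text).
        consistent = sum(1 for rating in ratings if rating == expected)
        # Per-video accuracy (the role of DA in the text).
        accuracies.append(consistent / len(ratings))
    if not accuracies:
        return 0.0
    return sum(accuracies) / len(accuracies)
```

As a small worked case, if the approach assigns difficulty 2 to one video and 1 to another, and students rate them `[2, 2, 1, 2]` and `[1, 2]`, the per-video accuracies are 0.75 and 0.5 and the average is 0.625.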

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Dataset Used in the Experiments


**Figure 10.** (**a**) The number of videos with different difficulty in three student clusters and (**b**) the number of videos with different importance in three student clusters.

**Table 1.** Model structures, video difficulty, and importance in Figure 8.

Video Name | Whether It Appears | Self-Loop | Short-Loop | Long-Loop | Skip | Difficulty | Importance |
---|---|---|---|---|---|---|---|
v_{1} | 1 | Self-Loop1 | | | | 2 | 2 |
v_{2} | 1 | | Short-Loop1 | | | 2 | 2 |
v_{3} | 1 | | Short-Loop1 | | | 2 | 2 |
v_{4} | 1 | | | Long-Loop1 | | 1 | 2 |
v_{5} | 1 | | | Long-Loop1 | | 1 | 2 |
v_{6} | 1 | | Short-Loop2 | Long-Loop1 | | 2 | 3 |
v_{7} | 1 | | Short-Loop2 | Long-Loop1 | | 2 | 3 |
v_{8} | 1 | | | Long-Loop1 | | 1 | 2 |
v_{9} | 1 | | | Long-Loop1 | | 1 | 2 |
v_{10} | 1 | | | | | 1 | 1 |
v_{11} | 1 | | | | Skip1 | 1 | 0 |
v_{12} | 1 | | | | | 1 | 1 |
v_{13} | 0 | | | | | | |

**Table 2.** MOOC video classification results in Figure 8.

Classify by | Classification 1 | Classification 2 | Classification 3 | Classification 4 |
---|---|---|---|---|
Difficulty | D = 1: v_{4}, v_{5}, v_{8}, v_{9}, v_{10}, v_{11}, v_{12} | D = 2: v_{1}, v_{2}, v_{3}, v_{6}, v_{7} | | |
Importance | I = 0: v_{11} | I = 1: v_{10}, v_{12} | I = 2: v_{1}, v_{2}, v_{3}, v_{4}, v_{5}, v_{8}, v_{9} | I = 3: v_{6}, v_{7} |

Cluster | Number of Students | Correct Number of Answered Questions | Correct Rate of Answered Questions | Knowledge Level |
---|---|---|---|---|
1 | 12 | 16.0833 | 0.5092 | High |
2 | 56 | 11.3214 | 0.4818 | Middle |
3 | 26 | 8.8077 | 0.3058 | Low |
4 | 2 | 2.5 | 0.875 | Poor |
Overall mean | 96 | 11.0521 | 0.4457 | |

**Table 4.** Model structure and the difficulty and importance values for the students at “high” knowledge level.

Video Name | Whether It Appears | Self-Loop | Short-Loop | Long-Loop | Skip | D | I |
---|---|---|---|---|---|---|---|
1.1 JDK installation | 1 | | Short-Loop1 | | | 2 | 2 |
1.2 Path configuration | 1 | | Short-Loop1 | | | 2 | 2 |
1.3 JAVA_HOME environment variable configuration | 1 | | Short-Loop1 | | | 2 | 2 |
1.4 classpath environment variable configuration | 1 | | | | | 1 | 1 |

Classify by | Classification 1 | Classification 2 | Classification 3 |
---|---|---|---|
Difficulty | D = 1: v_{4}, v_{5}, v_{6}, v_{7}, v_{8}, v_{9}, v_{10}, v_{11}, v_{12}, v_{13}, v_{14}, v_{15}, v_{17}, v_{18}, v_{19}, v_{20}, v_{21}, v_{22}, v_{23}, v_{24}, v_{25}, v_{26}, v_{27}, v_{28}, v_{29}, v_{30}, v_{31}, v_{32}, v_{33}, v_{34}, v_{39}, v_{40} | D = 2: v_{1}, v_{2}, v_{3}, v_{16}, v_{35}, v_{36}, v_{37}, v_{38} | |
Importance | I = 1: v_{4}, v_{17}, v_{19}, v_{20}, v_{21}, v_{22}, v_{23}, v_{24}, v_{25} | I = 2: v_{1}, v_{2}, v_{3}, v_{5}, v_{6}, v_{7}, v_{8}, v_{9}, v_{10}, v_{11}, v_{12}, v_{13}, v_{14}, v_{15}, v_{16}, v_{18}, v_{26}, v_{27}, v_{28}, v_{29}, v_{30}, v_{31}, v_{32}, v_{33}, v_{34}, v_{39}, v_{40} | I = 3: v_{35}, v_{36}, v_{37}, v_{38} |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhang, F.; Liu, D.; Liu, C.
MOOC Video Personalized Classification Based on Cluster Analysis and Process Mining. *Sustainability* **2020**, *12*, 3066.
https://doi.org/10.3390/su12073066
