3.3.2. The Construction of the Algorithm
Time series association rules mining with up-to-date patterns:
Input: A log database D with n transactions stored in order of transaction time with equal time intervals; each transaction includes the transaction ID, transaction time, and items. The time T, the minimum support threshold $min\_sup$, the minimum UDP threshold $min\_UDP$, and the minimum confidence threshold $min\_conf$ are also included.
Output: Rules mined from the time-series.
Step 1: Scan the database D to generate the candidate 1-itemsets ${C}_{1}$, and record the count value and the $Timelist\left(i\right)$ of each item i in the log database.
Step 2: Complete the following substeps for the items in ${C}_{1}$:
Substep 2.1: Calculate the Support of the $i$-th item in ${C}_{1}$.
Substep 2.2: If the Support of the item is greater than or equal to $min\_sup$, put the item in $Template\_{L}_{1}$. Otherwise, put the item in ${S}_{1}$.
Step 3: For the items i in ${S}_{1}$, complete the following substeps.
Substep 3.1: Set $First\_ID\left(i\right)$ to the first transaction ID in the $Timelist\left(i\right)$ of item i, and verify whether item i satisfies Formula (10). If it does, item i is retained in ${S}_{1}$ and is later put in $Template\_{L}_{1}$. Otherwise, proceed to Substep 3.2.
Substep 3.2: Set $First\_ID\left(i\right)$ to the next transaction ID in the $Timelist\left(i\right)$ of item i, decrease the count of item i by one, and check Formula (10) again. Repeat this substep until the item satisfies Formula (10) or $count\left(i\right)$ is equal to zero. If $count\left(i\right)$ reaches zero and the item or itemset still cannot satisfy Formula (10), it is deleted from ${S}_{1}$.
Step 4: Check whether the UDP value of the item or itemset is greater than or equal to $min\_UDP$. If so, save the item or itemset; otherwise, delete it.
Step 5: Combine the set ${S}_{1}$ and the set $Template\_{L}_{1}$ to form ${L}_{1}$. Set r = 1, where r tracks the number of items in the itemsets currently being processed.
Step 6: Generate the candidate set ${C}_{r+1}$ from ${L}_{r}$ in a manner similar to the Apriori algorithm, taking the order of items into account as mentioned above.
Step 7: Generate the frequent $(r+1)$-patterns $\left({L}_{r+1}\right)$ from ${C}_{r+1}$ in a manner similar to Steps 2 and 3.
Step 8: If ${L}_{r+1}$ is null, proceed to the next step. Otherwise, return to Steps 5 and 6.
Step 9: Calculate the Confidence and Lift of the itemsets in ${L}_{r}$ $(r\ge 2)$ with Formulas (9) and (4). If the Confidence of an itemset is greater than $min\_conf$, generate the rules in a manner similar to the Apriori algorithm. Otherwise, delete the itemset from ${L}_{r}$.
Step 10: Output the association rules mined from the log database.
Note that in the above algorithm, the transactions in the log database must form a time series with equal intervals.
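Steps 1 and 2 above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `scan_and_partition`, the `(transaction ID, items)` tuple layout, and the toy database are hypothetical, Support is assumed to be count divided by n, and the threshold comparison is assumed to be non-strict.

```python
from collections import defaultdict

def scan_and_partition(db, min_sup):
    """Steps 1-2 (sketch): count items, record Timelists, split by support."""
    n = len(db)
    count = defaultdict(int)
    timelist = defaultdict(list)
    # Step 1: one pass over the database, in transaction-time order.
    for tid, items in db:
        for item in set(items):
            count[item] += 1
            timelist[item].append(tid)
    template_l1, s1 = [], []
    # Step 2: Support is assumed here to be count / n.
    for item in sorted(count):
        if count[item] / n >= min_sup:
            template_l1.append(item)  # frequent over the whole database
        else:
            s1.append(item)           # candidate for the up-to-date check
    return dict(count), dict(timelist), template_l1, s1

# Hypothetical toy log: (transaction ID, items), equally spaced in time.
toy_db = [
    (1, ["x", "y"]),
    (2, ["x"]),
    (3, ["x", "z"]),
    (4, ["y", "z"]),
]
count, timelist, template_l1, s1 = scan_and_partition(toy_db, min_sup=0.75)
```

Items left in `s1` are not discarded at this point; they are the ones that Step 3 re-examines over a shorter, more recent window.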
3.3.4. An Example
In this section, an example is given to illustrate the proposed TSARM-UDP algorithm.
Table 1 shows the log database used in the example. The database contains 10 transactions and six items, denoted a to f.
Input: T = 3, $min\_Tsup$ = 0.5, $min\_UDP$ = 0.1, $min\_Tconf$ = 0.4, log database D.
Output: Rules mined from D.
Step 1: Scan the database, and find the $count\left(i\right)$ and the $Timelist\left(i\right)$ of each item i in D. Take item a as an example. It appears in Transactions 4, 5, and 8. Thus, $count\left(a\right)$ is three, and $Timelist\left(a\right)$ is {4, 5, 8}. The result of Step 1 is shown in Table 2.
Step 2: Calculate the TSupport of each item in Table 2 using Formula (8). Using item b as an example, the count of b is five. Thus, according to Formula (8), the TSupport of b is 0.5. The $min\_Tsup$ given above is 0.5, so b is placed in $Template\_{L}_{1}$. The TSupport of item c is 0.3. This value is less than $min\_Tsup$, so c is placed in ${S}_{1}$. The TSupport calculation results are shown in Table 3, namely $Template\_{L}_{1}=\{b,d\}$ and ${S}_{1}=\{a,c,e,f\}$.
Step 3: For the items in ${S}_{1}$, the following steps are performed. Items a and c are used as examples. For item a, $Timelist\left(a\right)=\{4,5,8\}$, so $First\_ID\left(a\right)=4$. In addition, $n=10$, $count\left(a\right)=3$, and $min\_Tsup=0.5$. Substitute these parameters into Formula (10). The left side of the inequality is $10-4+1=7$, and the right side is $3/0.5=6$. The inequality is not satisfied, so the algorithm jumps to Substep 3.2: $count\left(a\right)=3-1=2$, and $First\_ID\left(a\right)=5$. Substituting the updated parameters gives $10-5+1=6$ on the left side and $2/0.5=4$ on the right side, so the inequality is still not satisfied. Repeat Substep 3.2: $count\left(a\right)=1$, and $First\_ID\left(a\right)=8$. The left side is now $10-8+1=3$ and the right side is $1/0.5=2$, so the inequality is again not satisfied. Repeat Substep 3.2: $count\left(a\right)=0$. Thus, item a is deleted from ${S}_{1}$.
For item c, $Timelist\left(c\right)=\{7,8,9\}$, so $First\_ID\left(c\right)=7$, $count\left(c\right)=3$, $n=10$, and $min\_Tsup=0.5$. The same procedure as for item a is applied to item c. The left side of the inequality is six, and the right side of the inequality is also six. Thus, the inequality is satisfied, and c remains in ${S}_{1}$.
After each item in ${S}_{1}$ has been checked, the items that do not satisfy the inequality are deleted. The items that remain in ${S}_{1}$ are $\{c,e,f\}$.
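The sliding check of Step 3 can be sketched in Python as follows. Formula (10) is not reproduced in this section, so the inequality used here, $n-First\_ID\left(i\right)+1\le count\left(i\right)/min\_Tsup$, is an assumption reconstructed from the numbers worked out for item a; the function name `check_up_to_date` is hypothetical.

```python
def check_up_to_date(timelist, n, min_tsup):
    """Return the First_ID at which the item passes the assumed
    Formula (10), or None if its count reaches zero first."""
    count = len(timelist)
    for first_id in timelist:
        # Assumed form of Formula (10): n - First_ID + 1 <= count / min_Tsup.
        if n - first_id + 1 <= count / min_tsup:
            return first_id
        # Substep 3.2: drop the oldest occurrence and try the next ID.
        count -= 1
    return None

# Values taken from the example: n = 10, min_Tsup = 0.5.
result_a = check_up_to_date([4, 5, 8], n=10, min_tsup=0.5)  # item a
result_c = check_up_to_date([7, 8, 9], n=10, min_tsup=0.5)  # item c
```

Under this assumed inequality, `result_a` is `None` (item a is deleted) and `result_c` is a valid `First_ID` (item c is retained), which agrees with the outcome stated for both items.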
Step 4: Evaluate each item in ${S}_{1}$ against the threshold $min\_UDP=0.1$, and delete the items that fall below it. Update ${S}_{1}$.
Step 5: Combine set ${S}_{1}$ and set $Template\_{L}_{1}$ to form ${L}_{1}=\{b,c,d,e,f\}$. Set $r=1$.
Step 6: Generate the candidate set ${C}_{2}$ from ${L}_{1}$ through the method mentioned above, taking the order of items into account. ${C}_{2}$ is shown in Table 4.
Step 7: Generate the frequent two-patterns ${L}_{2}$ in a way similar to Steps 2 and 3. $Template\_{L}_{2}$ is null, and ${S}_{2}=\{(d\to c),(d\to e),(d\to f)\}$. Thus, ${L}_{2}=\{(d\to c),(d\to e),(d\to f)\}$.
Step 8: We can generate ${C}_{3}$ from ${L}_{2}$ according to the method mentioned earlier. We obtain ${C}_{3}=\{(d\to c,e),(d\to c,f),(d\to e,f)\}$, but no itemset in ${C}_{3}$ satisfies the $min\_Tsup$ threshold and Formula (9). Thus, ${L}_{3}$ is null, and the algorithm proceeds to Step 9.
Step 9: In this step, we calculate the TConfidence of the itemsets in ${L}_{2}$ by Formula (9). Taking the itemset $(d\to c)$ as an example: $F(d,c,T)=3$, and $F\left(d\right)=7$. According to Formula (9), the TConfidence of the itemset $(d\to c)$ is equal to $3/7$. Then, we calculate the Lift of the itemset, $Lift(d\to c)=10/7$, which is greater than one. Thus, $Rule(d\to c)$ is valid. The TConfidence and Lift of each itemset are given in Table 5.
As shown in Table 5, two itemsets satisfy the $min\_Tconf$ and Lift requirements. The rule generation method is similar to the Apriori algorithm, but needs to consider the order of items and the other steps. The generated rules are given below:
$Rule\left\{1\right\}=d\stackrel{T}{\to}c$, with TConfidence = 3/7, Lift = 10/7
$Rule\left\{2\right\}=d\stackrel{T}{\to}f$, with TConfidence = 4/7, Lift = 10/7
Step 10: Output the rules.
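The TConfidence and Lift values of Rule 1 can be re-derived with a short Python sketch. Formulas (9) and (4) are not reproduced in this section, so two assumptions are made: TConfidence is taken as $F(d,c,T)/F(d)$, as in the computation above, and Lift is taken as TConfidence divided by the TSupport of the consequent, which is the usual definition of lift and reproduces the stated value of 10/7. The function names are hypothetical.

```python
from fractions import Fraction

def tconfidence(f_xy_t, f_x):
    """Assumed Formula (9): F(X, Y, T) / F(X)."""
    return Fraction(f_xy_t, f_x)

def lift(conf, tsupport_consequent):
    """Assumed Formula (4): confidence / support of the consequent."""
    return conf / tsupport_consequent

# Values from the example: F(d, c, T) = 3, F(d) = 7, TSupport(c) = 0.3.
conf_dc = tconfidence(3, 7)
lift_dc = lift(conf_dc, Fraction(3, 10))

# min_Tconf = 0.4 from the example input; Lift must exceed one.
is_valid = conf_dc > Fraction(2, 5) and lift_dc > 1
```

Exact fractions are used instead of floats so that the results can be compared directly with the values 3/7 and 10/7 given in the text.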