Article

An Improved Order-Preserving Pattern Matching Algorithm Using Fingerprints

Youngjoon Kim, Youngho Kim and Jeong Seop Sim
Department of Computer Engineering, Inha University, Incheon 22212, Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(12), 1954; https://doi.org/10.3390/math10121954
Submission received: 16 March 2022 / Revised: 29 May 2022 / Accepted: 2 June 2022 / Published: 7 June 2022
(This article belongs to the Special Issue Analysis of One-Dimensional Regularities)

Abstract

Two strings of the same length are order isomorphic if their relative orders are the same. The order-preserving pattern matching problem is to find all substrings of text $T$ that are order isomorphic to pattern $P$ when $T$ ($|T| = n$) and $P$ ($|P| = m$) are given. An $O(mn + nq \log q + q!)$-time algorithm using $O(m + q!)$ space has previously been proposed for this problem; it utilizes fingerprints of q-grams based on the factorial number system together with the bad character heuristic. In this paper, we propose an $O(mn + 2^q)$-time algorithm using $O(m + 2^q)$ space that instead utilizes fingerprints of q-grams converted to binary numbers. A comparative experiment using three types of time series data demonstrates that the proposed algorithm is faster than existing algorithms because it reduces the number of order isomorphism tests.

1. Introduction

Two strings of the same length over an integer alphabet $\Sigma$ are order isomorphic if their relative orders are the same. For example, strings $x = (10, 5, 7)$ and $y = (53, 23, 47)$ are order isomorphic because both have the relative order $(3, 1, 2)$. The order-preserving pattern matching (OPPM) problem is to find all substrings of text $T$ that are order isomorphic to pattern $P$ when $T$ ($|T| = n$) and $P$ ($|P| = m$) over $\Sigma$ are given. Order-preserving pattern matching can be used to analyze time series data such as stock indices, climate data, melodies, and so on [1].
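As a minimal illustration of the example above, the following Python sketch ranks the characters of each string and compares the ranks. The function names `relative_order` and `order_isomorphic` are hypothetical, and the sketch assumes that all values within a string are distinct.

```python
def relative_order(x):
    """Return the 1-based rank of every element of x (values assumed distinct)."""
    by_value = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for rank, index in enumerate(by_value, start=1):
        ranks[index] = rank
    return tuple(ranks)


def order_isomorphic(x, y):
    """Equal-length strings are order isomorphic if their ranks agree."""
    return len(x) == len(y) and relative_order(x) == relative_order(y)


print(relative_order((10, 5, 7)))                   # (3, 1, 2)
print(relative_order((53, 23, 47)))                 # (3, 1, 2)
print(order_isomorphic((10, 5, 7), (53, 23, 47)))   # True
```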
Various algorithms for solving the OPPM problem have been proposed. An algorithm proposed in [1,2] solves the OPPM problem in $O(n + \mathrm{sort}(m))$ time using the failure function of the Knuth–Morris–Pratt (KMP) algorithm [3]. An algorithm proposed in [4] solves the problem in $O(mn + nq \log q + q!)$ time using fingerprints for q-grams, which consist of $q$ consecutive characters, based on the factorial number system [5,6]. An algorithm presented in [7] is executed in sublinear time on average using binary encoding. An algorithm proposed in [8] uses a skip-search approach [9] and the Intel streaming SIMD extensions (SSE) instruction sets [10]. An algorithm using packed string matching [11,12], the SSE, and advanced vector extensions (AVX) instruction sets [13,14] was proposed in [15]. OPPM on a tree and a directed acyclic graph, instead of a simple string, was investigated in [16]. In [17], the OPPM problem was solved using a filtering method with minimum (or maximum) values. By generating order-preserving suffix trees in $O(n \log n)$ time, an algorithm presented in [18] searches for $P$ in $O(m + occ)$ time, where $occ$ is the number of substrings of $T$ that are order isomorphic to $P$.
Our study makes the following contributions:
  • We improve the time and space complexity required to compute the fingerprint. In [4], the fingerprint of a q-gram based on the factorial number system was computed in $O(q \log q)$ time using $O(q!)$ space. The OPPM algorithm proposed in this paper converts the q-gram to a binary number and computes the corresponding fingerprint in $O(q)$ time using $O(2^q)$ space.
  • We propose a fast algorithm by reducing the number of order isomorphism tests. Algorithms using fingerprints quickly find candidate locations where a pattern may occur, and they test whether order isomorphism actually occurs at those locations. The algorithm proposed in this paper improves the actual execution time by reducing the number of order isomorphism tests using fingerprints for two q-grams.
  • We compare the actual execution times of algorithms through various implementations. The execution times are measured by varying the sizes of q-grams for three types of real time series data. The results of implementations are analyzed under various experimental conditions.
The rest of this paper is organized as follows. In Section 2, we define the terms and review previous work. In Section 3, we discuss our new order-preserving pattern matching algorithm. In Section 4, we experimentally compare the execution times of the algorithms presented in [4,7] with those of the algorithm proposed in this study. Finally, we conclude the paper in Section 5.

2. Preliminaries

The set of strings of length $m$ over integer alphabet $\Sigma$ is denoted as $\Sigma^m$. The length of string $x$ is denoted as $|x|$, the $i$th character of $x$ as $x[i]$ ($0 \le i < |x|$), and the substring $x[i]x[i+1] \cdots x[j]$ of $x$ from $i$ to $j$ as $x[i..j]$ ($0 \le i \le j < |x|$). If $i = 0$, $x[i..j]$ is called a prefix of $x$; if $j = |x| - 1$, it is called a suffix of $x$.
If $x[i] \le x[j] \Leftrightarrow y[i] \le y[j]$ ($0 \le i, j < |x|$) for two strings $x$ and $y$ of the same length, then $x$ and $y$ are order isomorphic, which is denoted as $x \approx y$ [2]. The prefix representation of string $x$ uses prefix table $\mu_x$, which is defined as follows [1]:
$$\mu_x[i] = |\{\, j : x[j] \le x[i] \text{ for } 0 \le j < i \,\}|.$$
That is, $\mu_x[i]$ is the number of characters smaller than or equal to $x[i]$ in $x[0..i-1]$. Prefix table $\mu_x$ can be computed in $O(|x| \log |x|)$ time using an order-statistic tree. If $x \approx y$, then $\mu_x = \mu_y$ [1]. The nearest neighbor representation of $x$ uses location tables $LMax_x$ and $LMin_x$, which are defined as follows [1,2]:
$$LMax_x[i] = j \quad \text{if } x[j] = \max\{\, x[k] : x[k] \le x[i] \text{ for } 0 \le k < i \,\}, \text{ and}$$
$$LMin_x[i] = j \quad \text{if } x[j] = \min\{\, x[k] : x[k] \ge x[i] \text{ for } 0 \le k < i \,\}.$$
That is, $LMax_x[i]$ is the location $j$ of the largest character among the characters that are smaller than or equal to $x[i]$ in $x[0..i-1]$, and $LMin_x[i]$ is the location $j$ of the smallest character among the characters that are larger than or equal to $x[i]$ in $x[0..i-1]$. If two or more such $j$'s satisfy this condition, the largest $j$ among them is defined as $LMax_x[i]$ (or $LMin_x[i]$); if there is no such $j$, they are defined as $-1$. $LMax_x$ and $LMin_x$ can be computed in $O(|x| \log |x|)$ time using order-statistic trees and can be used to determine whether $x$ and $y$ are order isomorphic or not in $O(|x|)$ time [1,2]. Table 1 shows prefix table $\mu_x$ and location tables $LMax_x$ and $LMin_x$ for string $x = (5, 11, 18, 7, 3, 9)$.
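The definitions above can be sketched in Python as follows. This is only an illustration: it uses quadratic scans rather than the order-statistic trees mentioned in the text, the function names are hypothetical, and the tie handling in `isomorphic_by_tables` (equal characters must stay equal) is an assumption of this sketch based on the definition of order isomorphism.

```python
def prefix_table(x):
    """mu_x[i]: number of earlier characters that are <= x[i]."""
    return [sum(1 for j in range(i) if x[j] <= x[i]) for i in range(len(x))]


def location_tables(x):
    """Return (LMax_x, LMin_x); -1 marks 'no such position'."""
    lmax, lmin = [], []
    for i in range(len(x)):
        best_max = best_min = -1
        for k in range(i):
            # '>=' and '<=' keep the largest index among equal candidates.
            if x[k] <= x[i] and (best_max == -1 or x[k] >= x[best_max]):
                best_max = k
            if x[k] >= x[i] and (best_min == -1 or x[k] <= x[best_min]):
                best_min = k
        lmax.append(best_max)
        lmin.append(best_min)
    return lmax, lmin


def isomorphic_by_tables(x, lmax, lmin, y):
    """Linear-time order isomorphism test of y against x using x's tables."""
    for i in range(len(x)):
        lo, hi = lmax[i], lmin[i]
        if lo != -1:
            if x[lo] == x[i]:
                if y[lo] != y[i]:
                    return False
            elif not y[lo] < y[i]:
                return False
        if hi != -1:
            if x[hi] == x[i]:
                if y[hi] != y[i]:
                    return False
            elif not y[hi] > y[i]:
                return False
    return True


x = (5, 11, 18, 7, 3, 9)
lmax, lmin = location_tables(x)
print(prefix_table(x))  # [0, 1, 2, 1, 0, 3]
print(lmax)             # [-1, 0, 1, 0, -1, 3]
print(lmin)             # [-1, -1, -1, 1, 0, 1]
print(isomorphic_by_tables(x, lmax, lmin, (50, 110, 180, 70, 30, 90)))  # True
```

The printed tables agree with the values listed in Table 1 for the same string.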
The order-preserving pattern matching problem is formally defined as follows.
Problem 1.
Order-preserving pattern matching problem.
Input: text $T$ ($\in \Sigma^n$) and pattern $P$ ($\in \Sigma^m$).
Output: every position $i$ ($m - 1 \le i < n$) of $T$ where $T[i-m+1..i] \approx P$.
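For reference, Problem 1 can be solved naively by testing every window of $T$. The sketch below is such a baseline, not one of the algorithms discussed in this paper; `dense_ranks` and `naive_oppm` are hypothetical names, and equal characters receive equal ranks so that ties are handled consistently with the definition of order isomorphism.

```python
def dense_ranks(x):
    """Map each character to its rank among the distinct values of x."""
    rank_of = {value: rank for rank, value in enumerate(sorted(set(x)))}
    return [rank_of[value] for value in x]


def naive_oppm(text, pattern):
    """Report every end position i with text[i-m+1..i] order isomorphic to pattern."""
    m = len(pattern)
    target = dense_ranks(pattern)
    return [i for i in range(m - 1, len(text))
            if dense_ranks(text[i - m + 1:i + 1]) == target]


print(naive_oppm([10, 20, 15, 30, 25, 40], [1, 3, 2]))  # [2, 4]
```

Each reported index is the last position of a matching window, following the output convention of Problem 1.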
In [4], to apply the bad character heuristic of the Horspool algorithm [19] to OPPM, the notions of a q-gram and a fingerprint based on the factorial number system were used. A q-gram consists of $q$ ($1 \le q < m$) consecutive characters, and fingerprint $f(x)$ for q-gram $x$ converts $x$ into one integer as follows [4]:
$$f(x) = \sum_{k=0}^{q-1} \mu_x[k] \cdot k!.$$
For example, when $q = 3$, prefix table $\mu_x$ of q-gram $x = (11, 83, 32)$ is $(0, 1, 1)$, and $f(x) = (0 \times 0!) + (1 \times 1!) + (1 \times 2!) = 3$. The algorithm in [4] consists of two phases, a preprocessing phase and a search phase. In the preprocessing phase, the shift table and location tables for $P$ are computed. First, all elements of shift table $D$ are initialized to the maximum moving distance $m - q + 1$, and then, $D$ is computed using the following equations:
$$t = \max\{\, i : \mu_{P[i-q+1..i]} = \mu_x,\ q - 1 \le i < m - 1 \,\},$$
$$D[f(x)] = \min(m - q + 1,\ m - t - 1).$$
In the search phase, OPPM is performed using the bad character heuristic and the tables. In the worst case, the algorithm proposed in [4] runs in $O(mn + nq \log q + q!)$ time using $O(m + q!)$ space.
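The factorial-number-system fingerprint $f(x)$ can be illustrated with a short Python sketch. The names `prefix_table` and `factorial_fingerprint` are hypothetical, and the prefix table is computed by a simple quadratic scan, which is an assumption made for brevity rather than the $O(q \log q)$ method of [4].

```python
from itertools import permutations
from math import factorial


def prefix_table(x):
    """mu_x[i]: number of earlier characters that are <= x[i]."""
    return [sum(1 for j in range(i) if x[j] <= x[i]) for i in range(len(x))]


def factorial_fingerprint(x):
    """f(x) = sum over k of mu_x[k] * k!."""
    mu = prefix_table(x)
    return sum(mu[k] * factorial(k) for k in range(len(x)))


print(factorial_fingerprint((11, 83, 32)))  # 3

# The 3! relative orders of three distinct characters receive distinct codes.
codes = sorted(factorial_fingerprint(p) for p in permutations((1, 2, 3)))
print(codes)  # [0, 1, 2, 3, 4, 5]
```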

3. New Order-Preserving Pattern Matching Algorithm

Our new OPPM algorithm runs faster and uses less space than the algorithm in [4]. Like the algorithm in [4], our algorithm consists of two phases. The main differences are as follows. First, our algorithm uses a different fingerprint. It converts q-grams into binary strings and computes the fingerprints for the converted binary strings. Second, our algorithm uses two fingerprints of q-grams to reduce the number of order isomorphism tests. In the preprocessing phase, our algorithm converts pattern $P$ into binary string $P'$ using the method from [7]. It also computes the shift tables for two q-grams and the location tables for $P$. In the search phase, it finds all substrings of $T$ that are order isomorphic to $P$ using the fingerprints for the q-grams of the binary strings and the bad character heuristic.

3.1. Preprocessing Phase

For string $x$ over $\Sigma$, binary string $x'$ ($|x'| = |x| - 1$) is defined as follows [7]:
$$x'[i] = \begin{cases} 1 & \text{if } x[i] < x[i+1], \\ 0 & \text{otherwise}. \end{cases}$$
Fingerprint $g(w)$ of q-gram $w$ for binary string $x'$ is defined as follows:
$$g(w) = \sum_{k=0}^{q-1} w[k] \cdot 2^{q-k-1}.$$
For example, when $x = (21, 69, 93, 77)$, binary string $x'$ converted from $x$ is $(1, 1, 0)$. When $q = 3$, $w = (1, 1, 0)$ and $g(w) = 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6$.
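The binary conversion and the fingerprint $g(w)$ can be sketched in a few lines of Python. The function names `binarize` and `binary_fingerprint` are hypothetical; the fingerprint simply reads the q-gram as a binary number with $w[0]$ as the most significant bit, which matches the weights $2^{q-k-1}$ in the definition.

```python
def binarize(x):
    """x'[i] = 1 if x[i] < x[i+1], else 0; the result has len(x) - 1 bits."""
    return [1 if x[i] < x[i + 1] else 0 for i in range(len(x) - 1)]


def binary_fingerprint(w):
    """g(w): interpret the bits of w as a binary number, w[0] first."""
    value = 0
    for bit in w:
        value = (value << 1) | bit
    return value


bits = binarize((21, 69, 93, 77))
print(bits)                      # [1, 1, 0]
print(binary_fingerprint(bits))  # 6
```

Because each step is one shift and one OR, a fingerprint of a q-gram costs $O(q)$ time, and a table indexed by it needs $2^q$ entries.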
In the preprocessing phase, we compute binary string $P'$ and location tables $LMax_P$ and $LMin_P$ for $P$. $P'$ can be computed in $O(m)$ time using $O(m)$ space by scanning $P$. Location tables $LMax_P$ and $LMin_P$ for $P$ can be computed in $O(m \log m)$ time using $O(m)$ space, as explained above. We also compute shift tables $D_1$ and $D_2$ for $P'$. For binary string $x'$, we call $x'[|x'|-q..|x'|-1]$ and $x'[|x'|-2q..|x'|-q-1]$, respectively, the primary q-gram and the secondary q-gram of $x'$. For example, when $q = 3$, as shown in Figure 1, the primary q-gram and the secondary q-gram of $P'$ are $P'[5..7]$ and $P'[2..4]$, respectively. First, all the elements of $D_1$ and $D_2$ are initialized to $m - q$ and $m - 2q$, respectively, which are the maximum distances that the pattern can move via the two q-grams. Then, $D_1$ and $D_2$ are computed using the following equations for $P'$:
$$a_w = \max\{\, i : P'[i-q+1..i] = w,\ q - 1 \le i < m - 2 \,\},$$
$$D_1[g(w)] = \min(m - q,\ m - a_w - 1),$$
$$b_w = \max\{\, i : P'[i-q+1..i] = w,\ q - 1 \le i < m - q - 2 \,\},$$
$$D_2[g(w)] = \min(m - 2q,\ m - b_w - 1).$$
Note that $a_w$ and $b_w$ are the last positions of the substrings of $P'$ that match q-gram $w$ in $P'[0..m-3]$ and $P'[0..m-q-3]$, respectively. $D_1[g(w)]$ and $D_2[g(w)]$ store the distances that the pattern can move via the primary q-gram and the secondary q-gram, respectively.
Shift tables $D_1$ and $D_2$ can be computed in $O(2^q + m)$ time using $O(2^q)$ space. Therefore, the preprocessing phase runs in $O(2^q + m \log m)$ time using $O(2^q + m)$ space.
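The construction of the two shift tables can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, and each shift is derived directly from the alignment argument (the distance from the last index of the rightmost earlier occurrence of a q-gram to the last index of the primary or secondary q-gram in $P'$). This indexing is the sketch's own and should be checked against the closed forms above before reuse.

```python
def binarize(x):
    """x'[i] = 1 if x[i] < x[i+1], else 0."""
    return [1 if x[i] < x[i + 1] else 0 for i in range(len(x) - 1)]


def binary_fingerprint(w):
    """Read the bits of w as a binary number, w[0] first."""
    value = 0
    for bit in w:
        value = (value << 1) | bit
    return value


def shift_table(pb, q, end, max_shift):
    """Shift per fingerprint for the q-gram of pb that ends at index `end`.

    Entries default to max_shift; a q-gram that also occurs in pb ending at
    i < end gets the distance end - i. Scanning i upward keeps the rightmost
    occurrence, i.e. the smallest (safe) shift.
    """
    table = [max_shift] * (1 << q)
    for i in range(q - 1, end):
        g = binary_fingerprint(pb[i - q + 1:i + 1])
        table[g] = min(max_shift, end - i)
    return table


P = (1, 3, 2, 5, 4)          # m = 5
q = 2                        # requires 2q <= m - 1
m = len(P)
Pb = binarize(P)             # [1, 0, 1, 0]
D1 = shift_table(Pb, q, end=m - 2, max_shift=m - q)
D2 = shift_table(Pb, q, end=m - q - 2, max_shift=m - 2 * q)
print(D1)  # [3, 1, 2, 3]
print(D2)  # [1, 1, 1, 1]
```

Both tables have $2^q$ entries and are filled by one scan of $P'$, in line with the $O(2^q + m)$ bound stated above (ignoring the $O(q)$ cost of recomputing each fingerprint, which the sketch does not optimize).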

3.2. Search Phase

We denote the fingerprint of the primary q-gram of $P'$ as $p_1$ and the fingerprint of the secondary q-gram of $P'$ as $p_2$. That is, $p_1 = g(P'[m-q-1..m-2])$ and $p_2 = g(P'[m-2q-1..m-q-2])$. Furthermore, we denote the fingerprint of the primary q-gram of $T'[i-m+1..i-1]$ as $t_1$ and the fingerprint of the secondary q-gram of $T'[i-m+1..i-1]$ as $t_2$. That is, $t_1 = g(T'[i-q..i-1])$ and $t_2 = g(T'[i-2q..i-q-1])$. Algorithm 1 shows the pseudocode of our algorithm.
The search phase consists of at most $n - m + 1$ steps. In each step $i$ ($m - 1 \le i < n$), we check whether $T[i-m+1..i]$ and $P$ are order isomorphic. First, we check whether fingerprints $p_1$ and $t_1$ are the same (line 9 of Algorithm 1). If $p_1 \ne t_1$, we shift $P$ forward by $D_1[t_1]$, that is, we increase $i$ by $D_1[t_1]$ (line 18 of Algorithm 1). If $p_1 = t_1$, we compare $p_2$ and $t_2$ (line 11 of Algorithm 1). If $p_2$ and $t_2$ are also the same, we test whether $P$ and $T[i-m+1..i]$ are order isomorphic using $LMax_P$ and $LMin_P$ in $O(m)$ time. If $T[i-m+1..i] \approx P$, we report $i$ as an occurrence. Meanwhile, if $T[i-m+1..i] \approx P$, then by the definition of order isomorphism, $p_1 = t_1$ and $p_2 = t_2$. Therefore, if $p_1 \ne t_1$ or $p_2 \ne t_2$, then $T[i-m+1..i] \not\approx P$; hence, we can shift $P$ forward by $\max(D_1[t_1], D_2[t_2])$, regardless of whether $p_2$ and $t_2$ are the same or not (line 16 of Algorithm 1). The search phase runs in $O(mn)$ time in the worst case because it might test order isomorphism in every step. Thus, the proposed algorithm solves the OPPM problem in $O(2^q + mn)$ time using $O(2^q + m)$ space in total.
Algorithm 1: OPPM algorithm using fingerprints
 1: Input: A text $T$ of length $n$ and a pattern $P$ of length $m$.
 2: Output: All positions of the substrings of $T$ that are order isomorphic to $P$.
 3: Compute $P'$, $D_1$, $D_2$, $LMax_P$, and $LMin_P$
 4: $p_1 \leftarrow g(P'[m-q-1..m-2])$
 5: $p_2 \leftarrow g(P'[m-2q-1..m-q-2])$
 6: $i \leftarrow m - 1$
 7: while $i < n$ do
 8:     $t_1 \leftarrow g(T'[i-q..i-1])$
 9:     if $p_1 = t_1$ then
10:         $t_2 \leftarrow g(T'[i-2q..i-q-1])$
11:         if $p_2 = t_2$ then
12:             if $T[i-m+1..i] \approx P$ then
13:                 print $i$
14:             end if
15:         end if
16:         $i \leftarrow i + \max(D_1[t_1], D_2[t_2])$
17:     else
18:         $i \leftarrow i + D_1[t_1]$
19:     end if
20: end while
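The control flow of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration and not the authors' C++ implementation: all function names are hypothetical, $T'$ is precomputed in full for simplicity, the order isomorphism test uses a dense-rank comparison as a stand-in for the $LMax_P$/$LMin_P$ test, and the shift tables are derived from the alignment of $P'$ against $T'$ (an indexing assumption of this sketch). The sketch requires $2q \le m - 1$ so that both q-grams fit inside $P'$.

```python
def binarize(x):
    """x'[i] = 1 if x[i] < x[i+1], else 0."""
    return [1 if x[i] < x[i + 1] else 0 for i in range(len(x) - 1)]


def binary_fingerprint(w):
    """Read the bits of w as a binary number, w[0] first."""
    value = 0
    for bit in w:
        value = (value << 1) | bit
    return value


def dense_ranks(x):
    """Rank of each character among the distinct values of x (ties share a rank)."""
    rank_of = {value: rank for rank, value in enumerate(sorted(set(x)))}
    return [rank_of[value] for value in x]


def shift_table(pb, q, end, max_shift):
    """Safe shift per fingerprint for the q-gram of pb ending at index `end`."""
    table = [max_shift] * (1 << q)
    for i in range(q - 1, end):          # rightmost occurrence wins
        g = binary_fingerprint(pb[i - q + 1:i + 1])
        table[g] = min(max_shift, end - i)
    return table


def oppm_search(T, P, q):
    """Return every end position i with T[i-m+1..i] order isomorphic to P."""
    n, m = len(T), len(P)
    if not (q >= 1 and 2 * q <= m - 1):
        raise ValueError("this sketch needs 1 <= q and 2q <= m - 1")
    Pb, Tb = binarize(P), binarize(T)
    D1 = shift_table(Pb, q, m - 2, m - q)
    D2 = shift_table(Pb, q, m - q - 2, m - 2 * q)
    p1 = binary_fingerprint(Pb[m - q - 1:m - 1])          # primary q-gram
    p2 = binary_fingerprint(Pb[m - 2 * q - 1:m - q - 1])  # secondary q-gram
    p_ranks = dense_ranks(P)
    occurrences = []
    i = m - 1
    while i < n:
        t1 = binary_fingerprint(Tb[i - q:i])
        if p1 == t1:
            t2 = binary_fingerprint(Tb[i - 2 * q:i - q])
            if p2 == t2 and dense_ranks(T[i - m + 1:i + 1]) == p_ranks:
                occurrences.append(i)
            i += max(D1[t1], D2[t2])
        else:
            i += D1[t1]
    return occurrences


print(oppm_search([10, 20, 15, 30, 25, 40, 35], [1, 3, 2, 5, 4], q=2))  # [4, 6]
```

The full order isomorphism test runs only when both fingerprints agree, which is the mechanism the text credits for reducing the number of tests.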

4. Experimental Results

The experimental environment was as follows. The operating system was Windows 10 (64-bit); the CPU was an Intel Core i7-6700 (3.4 GHz); the RAM was 32 GB; the development tool was Visual Studio 2015; the development language was C++. We used three types of time series data in the experiment: a power consumption index, particulate matter (PM2.5) levels, and the Dow Jones Index. The power consumption index consisted of measurement data on the average voltage per minute of a household in Sceaux, France, from 00:00 on 16 December 2006 to 22:00 on 2 December 2008 [20]. The PM2.5 levels were from data recorded in Beijing at one-hour intervals from 00:00 on 2 January 2010 to 22:00 on 9 October 2014 [21]. The Dow Jones Index data were the daily closing prices of the Dow Jones Industrial Average from 2 May 1885 to 12 April 2019 [22]. The lengths $n$ of text $T$ for the power consumption index, the PM2.5 levels, and the Dow Jones Index were $10^6$, 40,000, and 36,000, respectively. Patterns $P$ were generated by extracting strings of lengths 7, 11, and 15 at random positions of $T$. For brevity, the power consumption index data are hereinafter referred to as VOLT, the particulate matter level data as PM2.5, and the Dow Jones Index data as DJIA. The algorithm proposed in [4] is referred to as OHq, and the algorithm based on SBNDM4 [23] and proposed in [7] is referred to as S4OPM. The algorithm proposed in this work was implemented in two versions. The first version was implemented as described in the previous section and is referred to as OHESq. The second version was implemented using only the fingerprint of the primary q-gram and is referred to as OHEq.
Table 2 compares the execution times of the algorithms, each given as the sum over 1000 patterns, and shows the total number of occurrences of all patterns in the text. In Table 2, the execution times of the fastest algorithms among the algorithms using q-grams for each $m$ and $q$ are in bold, and the execution times of the fastest algorithms regardless of $q$ for each $m$ are marked with an asterisk ($\ast$). With VOLT, OHESq executed up to approximately 1.98 times faster than OHq ($m = 15$, $q = 6$). With PM2.5, OHESq executed up to approximately 1.97 times faster than OHq or OHEq ($m = 15$, $q = 6$). With DJIA, OHESq executed up to approximately 1.88 times faster than OHq or OHEq ($m = 15$, $q = 6$). In all cases, OHESq executed at least 1.11 times faster than OHq, at least 1.19 times faster than OHEq, and at least 2.42 times faster than S4OPM.
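The maximum speedups quoted above can be reproduced from the Table 2 entries for $m = 15$ and $q = 6$. The short Python sketch below divides the OHq time by the OHESq time for each data set; the dictionary layout and variable names are hypothetical, while the numbers are taken from Table 2.

```python
# (OHq time, OHESq time) in seconds for m = 15, q = 6, taken from Table 2.
times = {
    "VOLT": (1.355, 0.686),
    "PM2.5": (0.059, 0.030),
    "DJIA": (0.049, 0.026),
}

speedups = {name: round(ohq / ohesq, 2) for name, (ohq, ohesq) in times.items()}
print(speedups)  # {'VOLT': 1.98, 'PM2.5': 1.97, 'DJIA': 1.88}
```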
Table 3 shows the average number of order isomorphism tests for each m and q of OHq, OHEq, and OHESq using the bad character heuristic. When comparing OHq and OHEq, OHq tested for order isomorphism fewer times than OHEq in all cases. This is because the fingerprints used in [4] based on the factorial number system have a smaller probability that two fingerprints are identical compared to the fingerprints used in this paper. Meanwhile, OHESq tested for order isomorphism fewer times than OHEq in all cases and fewer times than OHq in most cases. We show the execution times of the preprocessing phases and search phases of OHq, OHEq, and OHESq for 1000 patterns in Table A1, Table A2 and Table A3.

5. Conclusions

This study improved the time and space complexity of the previous work on the OPPM problem by utilizing fingerprints of q-grams converted to binary numbers. Experiments on three types of time series data showed that our algorithm is faster than the previous work because it reduces the number of order isomorphism tests. We believe the execution times of OPPM algorithms are highly related to the characteristics of the data, such as permutation entropy. Therefore, classifying the criteria of data characteristics and identifying the data according to those criteria can be an important research task in the future.

Author Contributions

Y.K. (Youngjoon Kim) and Y.K. (Youngho Kim) designed and analyzed the algorithm. Y.K. (Youngjoon Kim) implemented and experimented with the algorithms and wrote the draft of the paper. Y.K. (Youngho Kim) and J.S.S. reviewed and revised the paper. J.S.S. analyzed the algorithm, provided algorithmic support, and was the project manager. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Korean Government (MSIP) (NRF-2014M3C9A3064706), by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean Government (MSIT) (2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)), and by INHA UNIVERSITY Research Grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare they have no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
OPPM: order-preserving pattern matching
KMP: Knuth–Morris–Pratt
SSE: streaming SIMD extensions
AVX: advanced vector extensions
VOLT: power consumption data
PM2.5: particulate matter data
DJIA: Dow Jones Index data

Appendix A

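Table A1. Execution times of 1000 patterns for VOLT data. Execution times are in seconds.

| Data | m | Algorithm | Phase | q = 3 | q = 4 | q = 5 | q = 6 |
|---|---|---|---|---|---|---|---|
| VOLT | 7 | OHq | Prep. | 0 | 0 | 0.002 | 0 |
| | | | Search | 2.756 | 2.338 | 3.690 | 6.835 |
| | | | Total | 2.756 | 2.338 | 3.692 | 6.835 |
| | | OHEq | Prep. | 0.001 | 0 | 0 | 0 |
| | | | Search | 3.051 | 2.886 | 3.778 | 6.822 |
| | | | Total | 3.052 | 2.886 | 3.778 | 6.822 |
| | | OHESq | Prep. | 0.001 | · | · | · |
| | | | Search | 2.097 | · | · | · |
| | | | Total | 2.098 | · | · | · |
| | 11 | OHq | Prep. | 0 | 0 | 0 | 0.003 |
| | | | Search | 2.129 | 1.301 | 1.612 | 2.229 |
| | | | Total | 2.129 | 1.301 | 1.612 | 2.232 |
| | | OHEq | Prep. | 0.001 | 0.005 | 0 | 0.003 |
| | | | Search | 1.961 | 1.421 | 1.356 | 1.448 |
| | | | Total | 1.962 | 1.426 | 1.356 | 1.451 |
| | | OHESq | Prep. | 0 | 0 | 0.002 | · |
| | | | Search | 1.366 | 1.046 | 1.028 | · |
| | | | Total | 1.366 | 1.046 | 1.030 | · |
| | 15 | OHq | Prep. | 0.001 | 0.001 | 0.001 | 0.003 |
| | | | Search | 1.940 | 0.995 | 1.057 | 1.352 |
| | | | Total | 1.941 | 0.996 | 1.058 | 1.355 |
| | | OHEq | Prep. | 0.003 | 0.001 | 0.003 | 0.002 |
| | | | Search | 2.077 | 1.175 | 1.000 | 0.921 |
| | | | Total | 2.080 | 1.176 | 1.003 | 0.923 |
| | | OHESq | Prep. | 0.002 | 0.002 | 0.001 | 0.004 |
| | | | Search | 1.182 | 0.789 | 0.670 | 0.682 |
| | | | Total | 1.184 | 0.791 | 0.671 | 0.686 |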
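Table A2. Execution times of 1000 patterns for PM2.5 data. Execution times are in seconds.

| Data | m | Algorithm | Phase | q = 3 | q = 4 | q = 5 | q = 6 |
|---|---|---|---|---|---|---|---|
| PM2.5 | 7 | OHq | Prep. | 0 | 0.002 | 0.001 | 0 |
| | | | Search | 0.119 | 0.107 | 0.159 | 0.280 |
| | | | Total | 0.119 | 0.109 | 0.160 | 0.280 |
| | | OHEq | Prep. | 0 | 0 | 0.001 | 0 |
| | | | Search | 0.128 | 0.121 | 0.152 | 0.281 |
| | | | Total | 0.128 | 0.121 | 0.153 | 0.281 |
| | | OHESq | Prep. | 0.001 | · | · | · |
| | | | Search | 0.088 | · | · | · |
| | | | Total | 0.089 | · | · | · |
| | 11 | OHq | Prep. | 0.001 | 0.001 | 0 | 0 |
| | | | Search | 0.092 | 0.062 | 0.073 | 0.096 |
| | | | Total | 0.093 | 0.063 | 0.073 | 0.096 |
| | | OHEq | Prep. | 0.001 | 0.001 | 0.004 | 0 |
| | | | Search | 0.080 | 0.062 | 0.054 | 0.060 |
| | | | Total | 0.081 | 0.063 | 0.058 | 0.060 |
| | | OHESq | Prep. | 0.004 | 0.001 | 0.001 | · |
| | | | Search | 0.053 | 0.047 | 0.042 | · |
| | | | Total | 0.057 | 0.048 | 0.043 | · |
| | 15 | OHq | Prep. | 0 | 0.001 | 0.001 | 0 |
| | | | Search | 0.084 | 0.047 | 0.048 | 0.059 |
| | | | Total | 0.084 | 0.048 | 0.049 | 0.059 |
| | | OHEq | Prep. | 0.001 | 0.001 | 0 | 0.001 |
| | | | Search | 0.069 | 0.044 | 0.039 | 0.036 |
| | | | Total | 0.070 | 0.045 | 0.039 | 0.037 |
| | | OHESq | Prep. | 0.001 | 0.004 | 0 | 0 |
| | | | Search | 0.046 | 0.030 | 0.028 | 0.030 |
| | | | Total | 0.047 | 0.034 | 0.028 | 0.030 |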
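Table A3. Execution times of 1000 patterns for DJIA data. Execution times are in seconds.

| Data | m | Algorithm | Phase | q = 3 | q = 4 | q = 5 | q = 6 |
|---|---|---|---|---|---|---|---|
| DJIA | 7 | OHq | Prep. | 0.001 | 0 | 0 | 0.001 |
| | | | Search | 0.101 | 0.090 | 0.136 | 0.245 |
| | | | Total | 0.102 | 0.090 | 0.136 | 0.246 |
| | | OHEq | Prep. | 0 | 0.001 | 0 | 0 |
| | | | Search | 0.112 | 0.106 | 0.141 | 0.254 |
| | | | Total | 0.112 | 0.107 | 0.141 | 0.254 |
| | | OHESq | Prep. | 0.001 | · | · | · |
| | | | Search | 0.080 | · | · | · |
| | | | Total | 0.081 | · | · | · |
| | 11 | OHq | Prep. | 0.003 | 0.001 | 0.001 | 0.002 |
| | | | Search | 0.075 | 0.049 | 0.060 | 0.083 |
| | | | Total | 0.078 | 0.050 | 0.061 | 0.085 |
| | | OHEq | Prep. | 0.001 | 0.001 | 0.001 | 0 |
| | | | Search | 0.072 | 0.052 | 0.048 | 0.055 |
| | | | Total | 0.073 | 0.053 | 0.049 | 0.055 |
| | | OHESq | Prep. | 0.00 | 0.002 | 0 | · |
| | | | Search | 0.050 | 0.040 | 0.038 | · |
| | | | Total | 0.051 | 0.042 | 0.038 | · |
| | 15 | OHq | Prep. | 0.001 | 0.001 | 0 | 0.001 |
| | | | Search | 0.071 | 0.037 | 0.040 | 0.048 |
| | | | Total | 0.072 | 0.038 | 0.040 | 0.049 |
| | | OHEq | Prep. | 0.001 | 0.002 | 0 | 0.001 |
| | | | Search | 0.062 | 0.038 | 0.033 | 0.030 |
| | | | Total | 0.063 | 0.040 | 0.033 | 0.031 |
| | | OHESq | Prep. | 0.001 | 0.001 | 0 | 0 |
| | | | Search | 0.042 | 0.029 | 0.026 | 0.026 |
| | | | Total | 0.043 | 0.030 | 0.026 | 0.026 |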

References

  1. Kim, J.; Eades, P.; Fleischer, R.; Hong, S.; Iliopoulos, C.S.; Park, K.; Puglisi, S.J.; Tokuyama, T. Order-preserving matching. Theor. Comput. Sci. 2014, 525, 68–79.
  2. Kubica, M.; Kulczynski, T.; Radoszewski, J.; Rytter, W.; Walen, T. A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett. 2013, 113, 430–433.
  3. Knuth, D.E.; Morris, J.H., Jr.; Pratt, V.R. Fast Pattern Matching in Strings. SIAM J. Comput. 1977, 6, 323–350.
  4. Cho, S.; Na, J.C.; Park, K.; Sim, J.S. A fast algorithm for order-preserving pattern matching. Inf. Process. Lett. 2015, 115, 397–402.
  5. Knuth, D.E. The Art of Computer Programming, Volume II: Seminumerical Algorithms, 3rd ed.; Addison-Wesley: Boston, MA, USA, 1998.
  6. Mares, M.; Straka, M. Linear-Time Ranking of Permutations. In Proceedings of the Algorithms - ESA 2007, 15th Annual European Symposium, Eilat, Israel, 8–10 October 2007; Lecture Notes in Computer Science; Arge, L., Hoffmann, M., Welzl, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4698, pp. 187–193.
  7. Chhabra, T.; Tarhio, J. A filtration method for order-preserving matching. Inf. Process. Lett. 2016, 116, 71–74.
  8. Cantone, D.; Faro, S.; Külekci, M.O. An Efficient Skip-Search Approach to the Order-Preserving Pattern Matching Problem. In Proceedings of the Prague Stringology Conference (PSC) 2015, Prague, Czech Republic, 24–26 August 2015; pp. 22–35.
  9. Charras, C.; Lecroq, T.; Pehoushek, J.D. A very fast string matching algorithm for small alphabets and long patterns. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching, Piscataway, NJ, USA, 20–22 July 1998; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–64.
  10. Intel. Intel® 64 and IA-32 Architectures Optimization Reference Manual; Order Number: 248966; Intel: Santa Clara, CA, USA, 2011; Volume 25.
  11. Faro, S.; Külekci, M.O. Fast Packed String Matching for Short Patterns. In Proceedings of the 15th Meeting on Algorithm Engineering and Experiments, ALENEX 2013, New Orleans, LA, USA, 7 January 2013; Sanders, P., Zeh, N., Eds.; SIAM: Philadelphia, PA, USA, 2013; pp. 113–121.
  12. Faro, S.; Külekci, M.O. Fast and flexible packed string matching. J. Discret. Algorithms 2014, 28, 61–72.
  13. Jeong, H.; Kim, S.; Lee, W.; Myung, S. Performance of SSE and AVX Instruction Sets. CoRR 2012.
  14. Intel. Intel Architecture Instruction Set Extensions Programming Reference; Intel Corp.: Santa Clara, CA, USA, 2004.
  15. Chhabra, T.; Faro, S.; Külekci, M.O.; Tarhio, J. Engineering order-preserving pattern matching with SIMD parallelism. Softw. Pract. Exp. 2017, 47, 731–739.
  16. Nakamura, T.; Inenaga, S.; Bannai, H.; Takeda, M. Order Preserving Pattern Matching on Trees and DAGs. In Proceedings of the String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Palermo, Italy, 26–29 September 2017; Lecture Notes in Computer Science; Fici, G., Sciortino, M., Venturini, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10508, pp. 271–277.
  17. Na, J.C.; Lee, I. A Simple Heuristic for Order-Preserving Matching. IEICE Trans. Inf. Syst. 2019, 102-D, 502–504.
  18. Crochemore, M.; Iliopoulos, C.S.; Kociumaka, T.; Kubica, M.; Langiu, A.; Pissis, S.P.; Radoszewski, J.; Rytter, W.; Walen, T. Order-preserving indexing. Theor. Comput. Sci. 2016, 638, 122–135.
  19. Horspool, R.N. Practical Fast Searching in Strings. Softw. Pract. Exp. 1980, 10, 501–506.
  20. Hebrail, G.; Berard, A. Individual Household Electric Power Consumption Data Set. 2012. Available online: https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption (accessed on 17 December 2021).
  21. Chen, S.X. Beijing PM2.5 Data Set. 2017. Available online: https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data (accessed on 17 December 2021).
  22. Williamson, S. Daily Closing Values of the DJA in the United States, 1885 to Present, Measuring Worth. 2021. Available online: https://www.measuringworth.com/datasets/DJA/index.php (accessed on 17 December 2021).
  23. Navarro, G.; Raffinot, M. Flexible Pattern Matching in Strings: Practical Online Search Algorithms for Texts and Biological Sequences; Cambridge University Press: Cambridge, UK, 2002.
Figure 1. Order-preserving pattern matching using fingerprints of the primary q-gram and the secondary q-gram when $q = 3$.
Table 1. Prefix table $\mu_x$ and location tables $LMax_x$ and $LMin_x$ for $x = (5, 11, 18, 7, 3, 9)$.

| $i$ | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| $x[i]$ | 5 | 11 | 18 | 7 | 3 | 9 |
| $\mu_x[i]$ | 0 | 1 | 2 | 1 | 0 | 3 |
| $LMax_x[i]$ | −1 | 0 | 1 | 0 | −1 | 3 |
| $LMin_x[i]$ | −1 | −1 | −1 | 1 | 0 | 1 |
Table 2. Comparison of the execution times and the number of occurrences for VOLT, PM2.5, and DJIA data (sums for 1000 patterns). Bold indicates the execution times of the fastest algorithms for each m and q, and the data marked with ∗ indicate the execution times of the fastest algorithms regardless of q for each m. Execution times are in seconds; S4OPM has no q parameter, so a single time is listed for each m.

| Data | m | Algorithm | q = 3 | q = 4 | q = 5 | q = 6 | Number of occurrences |
|---|---|---|---|---|---|---|---|
| VOLT | 7 | OHq | 2.756 | 2.338 | 3.692 | 6.835 | 992,773 |
| | | OHEq | 3.052 | 2.886 | 3.778 | 6.822 | |
| | | OHESq | 2.098 | · | · | · | |
| | | S4OPM | 5.258 | | | | |
| | 11 | OHq | 2.129 | 1.301 | 1.612 | 2.232 | 2765 |
| | | OHEq | 1.962 | 1.426 | 1.356 | 1.451 | |
| | | OHESq | 1.366 | 1.046 | 1.030 | · | |
| | | S4OPM | 2.996 | | | | |
| | 15 | OHq | 1.941 | 0.996 | 1.058 | 1.355 | 1001 |
| | | OHEq | 2.080 | 1.176 | 1.003 | 0.923 | |
| | | OHESq | 1.184 | 0.791 | 0.671 | 0.686 | |
| | | S4OPM | 2.513 | | | | |
| PM2.5 | 7 | OHq | 0.119 | 0.109 | 0.160 | 0.280 | 117,682 |
| | | OHEq | 0.128 | 0.121 | 0.153 | 0.281 | |
| | | OHESq | 0.089 | · | · | · | |
| | | S4OPM | 0.218 | | | | |
| | 11 | OHq | 0.093 | 0.063 | 0.073 | 0.096 | 3613 |
| | | OHEq | 0.081 | 0.063 | 0.058 | 0.060 | |
| | | OHESq | 0.057 | 0.048 | 0.043 | · | |
| | | S4OPM | 0.122 | | | | |
| | 15 | OHq | 0.084 | 0.048 | 0.049 | 0.059 | 1020 |
| | | OHEq | 0.070 | 0.045 | 0.039 | 0.037 | |
| | | OHESq | 0.047 | 0.034 | 0.028 | 0.030 | |
| | | S4OPM | 0.087 | | | | |
| DJIA | 7 | OHq | 0.102 | 0.090 | 0.136 | 0.246 | 69,054 |
| | | OHEq | 0.112 | 0.107 | 0.141 | 0.254 | |
| | | OHESq | 0.081 | · | · | · | |
| | | S4OPM | 0.196 | | | | |
| | 11 | OHq | 0.078 | 0.050 | 0.061 | 0.085 | 1381 |
| | | OHEq | 0.073 | 0.053 | 0.049 | 0.055 | |
| | | OHESq | 0.051 | 0.042 | 0.038 | · | |
| | | S4OPM | 0.112 | | | | |
| | 15 | OHq | 0.072 | 0.038 | 0.040 | 0.049 | 1002 |
| | | OHEq | 0.063 | 0.040 | 0.033 | 0.031 | |
| | | OHESq | 0.043 | 0.030 | 0.026 | 0.026 | |
| | | S4OPM | 0.079 | | | | |
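Table 3. Comparison of the average numbers of order isomorphism tests for VOLT, PM2.5, and DJIA data.

| Data | m | Algorithm | q = 3 | q = 4 | q = 5 | q = 6 |
|---|---|---|---|---|---|---|
| VOLT | 7 | OHq | 68,460 | 23,404 | 8186 | 3005 |
| | | OHEq | 56,872 | 36,608 | 23,605 | 15,861 |
| | | OHESq | 15,861 | · | · | · |
| | 11 | OHq | 51,311 | 21,718 | 3676 | 1037 |
| | | OHEq | 36,729 | 18,356 | 10,381 | 6257 |
| | | OHESq | 10,388 | 3393 | 1037 | · |
| | 15 | OHq | 47,110 | 9603 | 2440 | 662 |
| | | OHEq | 31,112 | 13,148 | 6591 | 3587 |
| | | OHESq | 36,065 | 8527 | 2336 | 725 |
| PM2.5 | 7 | OHq | 3384 | 1523 | 711 | 367 |
| | | OHEq | 2601 | 1638 | 1076 | 747 |
| | | OHESq | 747 | · | · | · |
| | 11 | OHq | 2587 | 914 | 379 | 164 |
| | | OHEq | 1737 | 902 | 511 | 306 |
| | | OHESq | 464 | 162 | 55 | · |
| | 15 | OHq | 2244 | 658 | 263 | 114 |
| | | OHEq | 1442 | 663 | 342 | 189 |
| | | OHESq | 1480 | 376 | 115 | 39 |
| DJIA | 7 | OHq | 2492 | 930 | 366 | 156 |
| | | OHEq | 2092 | 1319 | 855 | 577 |
| | | OHESq | 577 | · | · | · |
| | 11 | OHq | 1870 | 518 | 171 | 60 |
| | | OHEq | 1328 | 664 | 373 | 221 |
| | | OHESq | 369 | 121 | 37 | · |
| | 15 | OHq | 1742 | 406 | 124 | 40 |
| | | OHEq | 1146 | 487 | 243 | 132 |
| | | OHESq | 1282 | 305 | 85 | 27 |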
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Kim, Y.; Kim, Y.; Sim, J.S. An Improved Order-Preserving Pattern Matching Algorithm Using Fingerprints. Mathematics 2022, 10, 1954. https://doi.org/10.3390/math10121954
