
Dataset Information


NetNCSP: Nonoverlapping closed sequential pattern mining.


ABSTRACT: Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the repetition of a pattern within a sequence. To address this, SPM with gap constraints was proposed, which avoids finding too many useless patterns. Nonoverlapping SPM, a branch of gap-constraint SPM, requires that no two occurrences of a pattern use the same sequence letter at the same pattern position. Nonoverlapping SPM strikes a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP), based on the Nettree structure. NetNCSP has two key steps: support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies: inheriting, predicting, and determining. These pruning strategies find redundant patterns effectively because they can predict the frequency and closeness of patterns before the candidate patterns are generated. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furthermore, in biological experiments NetNCSP mines the closed patterns in the SARS-CoV-2 and SARS viruses. The results show that the two viruses have similar pattern composition but different combinations.
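The nonoverlapping condition in the abstract can be illustrated with a small sketch. This is not the NetNCSP algorithm and does not build a Nettree; it is a simplified greedy leftmost search with backtracking, so the number of occurrences it returns should be read as a lower bound on the nonoverlapping support rather than the paper's exact value. The function name `nonoverlapping_occurrences`, its parameters `min_gap` and `max_gap`, and the 0-based positions are all assumptions made for this illustration.

```python
def nonoverlapping_occurrences(seq, pattern, min_gap, max_gap):
    """Greedily collect pairwise nonoverlapping occurrences of `pattern` in `seq`.

    Hypothetical helper for illustration only (not the Nettree-based method).
    A gap of g between consecutive matched positions p and q means q - p - 1 == g,
    and every gap must satisfy min_gap <= g <= max_gap.
    """
    m = len(pattern)
    # used[j] holds the sequence positions already matched to pattern[j].
    # Two occurrences overlap only if they share a position at the SAME j.
    used = [set() for _ in range(m)]
    found = []

    def extend(j, prev, occ):
        # Try to match pattern[j:] given that pattern[j-1] matched at `prev`.
        if j == m:
            return True
        if j == 0:
            candidates = range(len(seq))
        else:
            lo = prev + min_gap + 1
            hi = min(prev + max_gap + 1, len(seq) - 1)
            candidates = range(lo, hi + 1)
        for pos in candidates:
            if seq[pos] == pattern[j] and pos not in used[j]:
                occ.append(pos)
                if extend(j + 1, pos, occ):
                    return True
                occ.pop()  # backtrack and try the next candidate position
        return False

    while True:
        occ = []
        if not extend(0, -1, occ):
            break
        for j, pos in enumerate(occ):
            used[j].add(pos)
        found.append(tuple(occ))
    return found


# In "AAA", sequence position 1 serves as the second letter of one occurrence
# and as the first letter of the next, which the nonoverlapping condition allows.
print(nonoverlapping_occurrences("AAA", "AA", 0, 1))
```

Tracking used positions per pattern index, rather than globally, is what distinguishes the nonoverlapping condition from a stricter rule in which each sequence letter may be used only once.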

SUBMITTER: Wu Y 

PROVIDER: S-EPMC7118609 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature


Publications

NetNCSP: Nonoverlapping closed sequential pattern mining.

Wu Youxi, Zhu Changrui, Li Yan, Guo Lei, Wu Xindong

Knowledge-Based Systems, 2020-03-31



Similar Datasets

| S-EPMC8743106 | biostudies-literature
| S-EPMC3333188 | biostudies-literature
| S-EPMC6410268 | biostudies-literature
| S-EPMC1892096 | biostudies-literature
| S-EPMC6042480 | biostudies-literature
| S-EPMC8645335 | biostudies-literature
| S-EPMC4979166 | biostudies-literature
| S-EPMC7814421 | biostudies-literature
| S-EPMC8172301 | biostudies-literature
| S-EPMC9847378 | biostudies-literature