Potential Applications using Class Frequency Distribution of Maximal Repeats extracted from Tagged Sequential data
Speaker: 王經篤 / Department of Computer Science and Information Engineering, Asia University. Time: 14:10~14:50. Venue: 3F, Conference Room 1.
Abstract:
This talk presents a novel approach that uses the MapReduce (Map&Reduce) computation model to speed up the extraction of maximal repeats from tagged sequences while computing the class frequency distribution of those repeats at the same time. A US patent application based on this approach has been filed: "Wang, Ching-Tu. Method for Extracting Maximal Repeat Patterns and Computing Frequency Distribution Tables. Patent Application Serial Number 15/208,994. 13 July 2016." Several potential applications may benefit from this approach:

(1) Trend analysis in text mining. Texts can be tagged with timestamps, and the frequency distribution of each pattern can then be observed over equally spaced time intervals to help predict trends. Observing the frequency distributions (histories) of significant patterns plays an important role for trend analysts.

(2) Biomarker identification in bioinformatics. Identifying the relationship between "genotype" and "phenotype" is a major challenge for biologists and domain experts. This approach may provide valuable clues for identifying biomarkers (genotypes) by comparing the class frequency distributions of maximal repeat patterns extracted from genomic sequences tagged with distinct classes, where the classes are assigned by biologists or domain experts according to the presence of features (phenotypes). Patterns that appear in only one class, if any exist, are attractive candidates for further experiments.

(3) Production line analysis. Factories can collect the sensed values or the materials used at each production step as traceability records for their products. Once products are found defective by the quality control department, or fail after being sold and used for some time, companies can respond quickly by inspecting suspect patterns that occur mostly in the traceability records of failed products but seldom in those of normal products.

(4) Web event (log) analysis in internet security, and behavior analysis. Time series of web events can be tagged with internet addresses (IPs), and regularities in some hacker actions may then be detected. Similarly, sequences of actions can be tagged with types (classes) assigned by observers or researchers, and distinctive habits (actions) that occur only in specific classes may then be identified.
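To make the idea of a class frequency distribution concrete, here is a minimal brute-force sketch in Python. It is not the patented method and does not run on a real MapReduce cluster; it only mimics the map and reduce steps in memory. It assumes that a repeat is "maximal" when it occurs at least twice and every one-symbol extension to the left or right occurs strictly fewer times. The function names, the `max_len` and `min_count` parameters, and the sample "fail"/"ok" data are all hypothetical.

```python
from collections import Counter, defaultdict

def map_phase(tag, seq, max_len):
    """Emit (substring, tag) for every occurrence of every substring up to max_len."""
    for i in range(len(seq)):
        for j in range(i + 1, min(i + max_len, len(seq)) + 1):
            yield seq[i:j], tag

def reduce_phase(pairs):
    """Aggregate emitted pairs into substring -> Counter(tag -> occurrences)."""
    table = defaultdict(Counter)
    for sub, tag in pairs:
        table[sub][tag] += 1
    return table

def maximal_repeats(tagged_seqs, max_len=8, min_count=2):
    """Return {pattern: {tag: count}} for maximal repeats (assumed definition above)."""
    # Count one symbol beyond max_len so extensions of the longest candidates can be checked.
    pairs = (p for tag, seq in tagged_seqs for p in map_phase(tag, seq, max_len + 1))
    table = reduce_phase(pairs)
    total = {s: sum(c.values()) for s, c in table.items()}
    alphabet = {ch for _, seq in tagged_seqs for ch in seq}
    result = {}
    for s, n in total.items():
        if n < min_count or len(s) > max_len:
            continue
        extensions = [a + s for a in alphabet] + [s + a for a in alphabet]
        # Maximal: no left or right extension occurs as often as the pattern itself.
        if all(total.get(e, 0) < n for e in extensions):
            result[s] = dict(table[s])
    return result

# Hypothetical traceability records tagged by outcome class.
records = [("fail", "xabcy"), ("fail", "zabcw"), ("ok", "qabr")]
repeats = maximal_repeats(records)
# Patterns whose occurrences all fall in a single class.
unique_to_one_class = [s for s, dist in repeats.items() if len(dist) == 1]
print(repeats)              # {'ab': {'fail': 2, 'ok': 1}, 'abc': {'fail': 2}}
print(unique_to_one_class)  # ['abc']
```

In this toy data, "abc" appears only in records tagged "fail", which is the kind of class-specific pattern the applications above would single out for further inspection. Enumerating all substrings is only practical for short sequences; a scalable implementation would need a different strategy.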
Speaker bio:
http://dns2.asia.edu.tw/~jdwang/