Potential Applications using Class Frequency Distribution of Maximal Repeats extracted from Tagged Sequential data

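Speaker: Ching-Tu Wang (王經篤), Department of Computer Science and Information Engineering, Asia University
Venue: 3F, Conference Room 1
Title: Potential Applications using Class Frequency Distribution of Maximal Repeats extracted from Tagged Sequential data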


This talk presents a novel approach, built on the state-of-the-art Map&Reduce
computation model, that speeds up the extraction of maximal repeats from tagged
sequences while simultaneously computing the class frequency distribution of
these repeats. A U.S. patent application based on this approach has been filed: "Wang, Ching-Tu.
Method for Extracting Maximal Repeat Patterns and Computing Frequency Distribution Tables.
Patent Application Serial Number 15/208,994. 13 July 2016."
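
To make the two quantities concrete, the following is a minimal single-machine sketch. It is not the Map&Reduce method of the talk or the patent application. It assumes that a maximal repeat is a substring occurring at least twice whose occurrence count drops whenever it is extended by one symbol on either side, and that the class frequency distribution counts occurrences per tag. The function name and parameters are hypothetical, and the brute-force enumeration is only practical for short sequences.

```python
from collections import Counter, defaultdict

def maximal_repeat_class_frequencies(tagged_sequences, min_count=2):
    """Naive illustration: tagged_sequences is a list of (tag, string) pairs."""
    total = Counter()                 # substring -> occurrences over all sequences
    per_class = defaultdict(Counter)  # substring -> tag -> occurrences
    alphabet = set()
    for tag, seq in tagged_sequences:
        alphabet.update(seq)
        # Enumerate every substring (quadratic; fine for a toy example only).
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                sub = seq[i:j]
                total[sub] += 1
                per_class[sub][tag] += 1

    result = {}
    for sub, count in total.items():
        if count < min_count:
            continue
        # A repeat is NOT maximal if some one-symbol extension occurs equally often.
        extendable = any(
            total.get(c + sub, 0) == count or total.get(sub + c, 0) == count
            for c in alphabet
        )
        if not extendable:
            result[sub] = dict(per_class[sub])
    return result
```

For the toy input `[("A", "abcabc"), ("B", "xabcx")]`, the pattern "abc" is reported with two occurrences in class A and one in class B, while "ab" is not reported because extending it to "abc" loses no occurrences.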

Several potential applications can benefit from this approach:

(1) This approach provides valuable clues for trend analysis in text mining:
texts are tagged with their timestamps, and the frequency distribution of each
pattern is then observed over equally spaced time intervals to predict trends.
Observing the frequency distributions (histories) of significant patterns
plays an important role for trend analysts.
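
As a rough illustration of the trend-analysis idea, the sketch below counts how often one pattern occurs in each of several equally spaced time intervals. The function names, the numeric timestamps, and the choice to count overlapping occurrences are assumptions made for this example, not details from the talk.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def pattern_trend(timestamped_texts, pattern, start, interval, num_bins):
    """Return per-interval occurrence counts of pattern.

    timestamped_texts: list of (timestamp, text) pairs with numeric timestamps.
    Bin k covers [start + k * interval, start + (k + 1) * interval).
    """
    counts = [0] * num_bins
    for ts, text in timestamped_texts:
        idx = int((ts - start) // interval)
        if 0 <= idx < num_bins:  # ignore texts outside the observed window
            counts[idx] += count_overlapping(text, pattern)
    return counts
```

A list such as `[3, 1, 1]` returned for three consecutive intervals would then be the "history" of the pattern that an analyst inspects for a rising or falling trend.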

(2) Identifying the relationship between "genotype" and "phenotype" is a major
challenge in bioinformatics for biologists and domain experts.
This approach may provide valuable clues for identifying biomarkers (genotypes)
by comparing the class frequency distributions of maximal repeat patterns
extracted from genomic sequences tagged with distinct classes,
where the classes are assigned by biologists or domain experts
according to the presence of features (phenotypes).
Patterns that appear in only one class, if any exist, are therefore attractive
candidates for further experiments by biologists.
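
Selecting the patterns that appear in only one class can be sketched as a simple filter over a class frequency distribution. The input layout (pattern mapped to per-class counts), the function name, and the class labels in the usage note are hypothetical choices for illustration.

```python
def class_specific_patterns(distribution):
    """Return {pattern: tag} for patterns with a nonzero count in exactly one class.

    distribution: dict mapping pattern -> {tag: occurrence count}.
    """
    unique = {}
    for pattern, counts in distribution.items():
        present = [tag for tag, n in counts.items() if n > 0]
        if len(present) == 1:
            unique[pattern] = present[0]
    return unique
```

For example, given counts for two made-up classes "resistant" and "sensitive", a pattern seen four times in "resistant" and never in "sensitive" would be returned as a candidate, while a pattern seen in both classes would be dropped.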

(3) Production line analysis: manufacturers record the sensor values and the
materials used at each production step as each product's traceability record.
When products are found defective by the quality control department,
or fail after being sold and used for some time,
a company can respond rapidly by inspecting suspect patterns
that occur frequently in the traceability records of failed products but seldom
in those of normal products.
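
One simple way to surface such suspect patterns is to compare the fraction of failed product records containing a pattern with the fraction of normal records containing it. The sketch below is only an assumption about how that comparison might be done; the class labels "failed" and "normal", the threshold `min_failed_rate`, and the function name are all hypothetical.

```python
def rank_suspect_patterns(pattern_counts, n_failed, n_normal, min_failed_rate=0.5):
    """Rank patterns that are common in failed records and rarer in normal ones.

    pattern_counts: dict mapping pattern -> {"failed": k_f, "normal": k_n},
                    where each k is the number of records containing the pattern.
    n_failed, n_normal: total number of failed and normal records.
    """
    if n_failed <= 0 or n_normal <= 0:
        raise ValueError("both classes need at least one record")
    suspects = []
    for pattern, counts in pattern_counts.items():
        failed_rate = counts.get("failed", 0) / n_failed
        normal_rate = counts.get("normal", 0) / n_normal
        if failed_rate >= min_failed_rate and failed_rate > normal_rate:
            suspects.append((pattern, failed_rate, normal_rate))
    # Largest gap between failed and normal rates first.
    suspects.sort(key=lambda s: s[1] - s[2], reverse=True)
    return suspects
```

A pattern found in 8 of 10 failed records but only 2 of 100 normal records would rank near the top, whereas a pattern found in nearly every record of both classes would be filtered out.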

(4) This approach also offers new directions for two further kinds of analysis.
In web event (log) analysis for internet security,
time series of events are collected and tagged with internet addresses (IPs),
which may reveal regularities in hackers' actions.
In behavior analysis, sequences of actions are tagged with types (classes)
assigned by observers (or researchers),
which may reveal distinctive habits (actions) that occur only in specific classes.