Improved LINE-1 Detection through Pattern Matching by Increasing Probe Length

Biology (Basel). 2024 Apr 2;13(4):236. doi: 10.3390/biology13040236.

Abstract

Long Interspersed Element-1 (LINE-1 or L1) is an autonomous transposable element that accounts for 17% of the human genome. Strong correlations between abnormal L1 expression and diseases, particularly cancer, have been documented by numerous studies. L1PD (LINE-1 Pattern Detection) had been previously created to detect L1s by using a fixed pre-determined set of 50-mer probes and a pattern-matching algorithm. L1PD uses a novel seed-and-pattern-match strategy as opposed to the well-known seed-and-extend strategy employed by other tools. This study discusses an improved version of L1PD that shows how increasing the size of the k-mer probes from 50 to 75 or to 100 yields better results, as evidenced by experiments showing higher precision and recall when compared to the 50-mers. The probe-generation process was updated and the corresponding software is now shared so that users may generate probes for other reference genomes (with certain limitations). Additionally, L1PD was applied to other non-human genomes, such as dogs, horses, and cows, to further validate the pattern-matching strategy. The improved version of L1PD proves to be an efficient and promising approach for L1 detection.

Keywords: LINE-1; ORF; k-mer; pattern matching; probe.