fMLC: fast multi-level clustering and visualization of large molecular datasets

D Vu; S Georgievska; S Szoke; A Kuzniar; V Robert

doi:10.1093/bioinformatics/btx810

Bioinformatics. 2018 May 1;34(9):1577-1579. doi: 10.1093/bioinformatics/btx810.

Authors

D Vu¹, S Georgievska², S Szoke¹, A Kuzniar², V Robert¹

Affiliations

¹ Bioinformatics group, Westerdijk Fungal Biodiversity Institute, 3584CT Utrecht, The Netherlands.
² Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands.

PMID: 29253070
DOI: 10.1093/bioinformatics/btx810

Abstract

Motivation: Despite successful applications of data clustering and visualization techniques in molecular sequence identification, current technologies still do not scale to large biological datasets.

Results: We address this problem with a new multi-threaded tool, fMLC, developed primarily to cluster DNA sequences and supplemented with an interactive web-based visualization component, DiVE. fMLC made it possible to compare, cluster and visualize 350K fungal ITS sequences at the species level. Comparing and clustering the dataset took less than two hours, twelve times faster than previously reported.

Availability and implementation: https://github.com/FastMLC/fMLC (doi: 10.5281/zenodo.926820).

Contact: d.vu@westerdijkinstitute.nl or v.robert@westerdijkinstitute.nl.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis*
Software
Time Factors