Eureka Unveils Plant-LncPipe: New Tool for Plant lncRNA Identification

Nanjing Agricultural University The Academy of Science

Long non-coding RNAs (lncRNAs) are ubiquitous transcripts with crucial regulatory roles in diverse biological processes, including chromatin remodeling, post-transcriptional regulation, and epigenetic modification. Although accumulating evidence has begun to elucidate how plant lncRNAs modulate growth, root development, and seed dormancy, identifying them accurately remains challenging because plant-specific methods are lacking. Currently, the mainstream tools for plant lncRNA identification were largely developed using human or animal datasets, so their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated.

Recently, a research article titled "Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification" by the group led by Jian-Feng Mao of Beijing Forestry University and Umeå University was published online in Horticulture Research. The study collected high-quality RNA-sequencing data from a wide range of plants and used these plant-specific data to retrain the models of three mainstream lncRNA prediction tools: CPAT, LncFinder, and PLEK. The retrained models were then compared against other popular lncRNA prediction tools, including CPC2, CNCI, RNAplonc, and LncADeep. The results showed that retraining significantly improved prediction performance for plant lncRNAs. Two retrained models, LncFinder-plant and CPAT-plant, outperformed the others on multiple evaluation metrics, making them the most suitable tools for plant lncRNA identification.
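The release does not name the evaluation metrics used. As a hedged illustration only, the sketch below computes standard binary-classification metrics that are often reported when comparing coding-potential tools (sensitivity, specificity, precision, accuracy, F1 and the Matthews correlation coefficient), treating lncRNA as the positive class. The class and function names (`Confusion`, `confusion`, `metrics`) are hypothetical and are not taken from any of the tools mentioned.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int  # lncRNAs correctly called non-coding
    fp: int  # mRNAs wrongly called non-coding
    tn: int  # mRNAs correctly called coding
    fn: int  # lncRNAs wrongly called coding


def confusion(labels: Sequence[bool], predictions: Sequence[bool]) -> Confusion:
    """Count outcomes; True means lncRNA (positive class), False means mRNA."""
    pairs = list(zip(labels, predictions))
    return Confusion(
        tp=sum(1 for y, p in pairs if y and p),
        fp=sum(1 for y, p in pairs if not y and p),
        tn=sum(1 for y, p in pairs if not y and not p),
        fn=sum(1 for y, p in pairs if y and not p),
    )


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall on lncRNAs
    specificity = _ratio(c.tn, c.tn + c.fp)  # recall on mRNAs
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Toy example: three true lncRNAs followed by three true mRNAs.
labels = [True, True, True, False, False, False]
predictions = [True, True, False, False, False, True]
print(metrics(confusion(labels, predictions)))
```

Reporting several metrics matters because accuracy alone can look good on an unbalanced test set; MCC and F1 penalize a model that simply favors the majority class.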

The researchers also developed a computational pipeline, named Plant-LncPipe, for identifying and analyzing plant lncRNAs. The pipeline integrates the two top-performing identification models, CPAT-plant and LncFinder-plant, and covers the full computational workflow: raw data preprocessing, transcript assembly, lncRNA identification, lncRNA classification, and analysis of lncRNA origins. It can be applied to a wide range of plant species. Plant-LncPipe is publicly available and can be downloaded from https://github.com/xuechantian/Plant-LncRNA-pipline.
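How Plant-LncPipe combines its two models is not described in this release. The sketch below assumes one plausible rule for the identification step: keep assembled transcripts of at least 200 nt (a common lncRNA length cutoff) that both CPAT-plant and LncFinder-plant call non-coding. The function names, the transcript IDs and the intersection rule are all illustrative assumptions; the repository is the authority on the pipeline's actual behavior.

```python
from typing import Dict, Set

MIN_LENGTH_NT = 200  # common lncRNA length cutoff (assumption for this sketch)


def candidate_transcripts(lengths: Dict[str, int], min_length: int = MIN_LENGTH_NT) -> Set[str]:
    """Return IDs of assembled transcripts long enough to be lncRNA candidates."""
    return {tid for tid, n in lengths.items() if n >= min_length}


def consensus_lncrnas(
    candidates: Set[str], cpat_noncoding: Set[str], lncfinder_noncoding: Set[str]
) -> Set[str]:
    """Keep candidates that BOTH models call non-coding (assumed intersection rule)."""
    return candidates & cpat_noncoding & lncfinder_noncoding


# Hypothetical transcript lengths (nt) and per-model non-coding calls.
lengths = {"TX1": 850, "TX2": 150, "TX3": 1200, "TX4": 430}
cpat_nc = {"TX1", "TX2", "TX4"}
lncfinder_nc = {"TX1", "TX2", "TX3"}

lncrnas = consensus_lncrnas(candidate_transcripts(lengths), cpat_nc, lncfinder_nc)
print(sorted(lncrnas))  # ['TX1']
```

Requiring agreement between two independently trained models trades some sensitivity for fewer false positives; a union rule would do the opposite.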

The study demonstrates that retraining lncRNA prediction models on high-quality plant transcriptomic data allows them to capture plant lncRNA features more accurately, significantly improving prediction precision and reliability. It also underscores the importance of species-specific retraining for model accuracy: retraining established models preserves the methodology already built into them while improving their applicability and accuracy.
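To make "retraining" concrete: CPAT scores transcripts with a logistic regression over sequence features (ORF size, ORF coverage, Fickett score and hexamer usage bias). Retraining keeps that feature design and refits the weights on labeled transcripts from the target species. The simplified sketch below uses only the first two features, scans only the forward strand, and trains on toy sequences; it is not CPAT's code, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length (nt) of the longest ATG...stop ORF in the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def features(seq: str) -> List[float]:
    """Two CPAT-style features: ORF size (kb) and ORF coverage (ORF length / transcript length)."""
    orf = longest_orf(seq)
    coverage = orf / len(seq) if seq else 0.0
    return [orf / 1000.0, coverage]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Coding probability; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def train(seqs, labels, lr=0.5, epochs=2000):
    """Refit logistic-regression weights by batch gradient descent; label 1 = coding."""
    X = [features(s) for s in seqs]
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(X, labels):
            err = score(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Toy training set: long ORFs (coding) vs. short ORFs inside long transcripts (non-coding).
coding = ["GGATG" + "GCA" * k + "TAAGG" for k in (100, 150, 200, 250)]
noncoding = ["ATG" + "GCA" * 5 + "TAA" + "GC" * k for k in (200, 300, 400, 500)]
weights = train(coding + noncoding, [1] * len(coding) + [0] * len(noncoding))

print(score(weights, features("GGATG" + "GCA" * 180 + "TAAGG")))         # coding-like
print(score(weights, features("ATG" + "GCA" * 5 + "TAA" + "GC" * 350)))  # lncRNA-like
```

Swapping the toy sequences for curated plant mRNAs and lncRNAs changes only the learned weights, not the feature extraction, which is the sense in which retraining reuses an existing model's methodology.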

Ph.D. student Xue-Chan Tian and master's student Zhao-Yang Chen from Beijing Forestry University were the co-first authors. Ph.D. students Shuai Nie, Tian-Le Shi, Xue-Mei Yan, Yu-Tao Bao, Zhi-Chao Li, and Kai-Hua Jia, master's student Hai-Yao Ma, and postdoctoral researcher Wei Zhao participated in and assisted with the research. This research was supported by the National Key R&D Program of China (2022YFD2200103) and the National Natural Science Foundation of China (32171816).
