Evaluation of quality-related parameters in raw NGS data and implementing tools to obtain them

Publication year: 1396 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS07_156

Indexing date: 29 Farvardin 1397

Abstract:

Because Next Generation Sequencing (NGS) is less accurate than Sanger sequencing, the data are likely to be misinterpreted if no primary quality control is performed. Quality control (QC) of raw data is considered an important initial step for overcoming instrumental artifacts. The main problem, however, is that there is neither a specific guideline nor a set of gold-standard parameters. This study aims to re-introduce the parameters related to QC and suggests a combination of existing tools for efficient quality checking. The suggested pre-processing parameters to focus on, namely Quality Score, Read Complexity, and Duplicate Reads, were extracted for the data. The first parameter to investigate, the quality score, is a measure of the uncertainty of a base call and depended on instrumental variables. Because the length and arrangement of bases vary from read to read, it is necessary to inspect base composition visually before further decisions such as adapter trimming or defining a quality score cut-off. Another important parameter was read complexity, which could cause mistaken alignment. The third parameter was duplicate reads. Removing duplicate reads, which are believed to result from experimental errors, may cause loss of unique biological information. The efficiency of the sequencer, the number of high-quality bases, primer/adapter contamination, and the N base count were also helpful for decision making. The tools used to obtain these factors and implement an effective pipeline were a suggested combination of the PPR Plot program [1], FaQCs [2], AfterQC [3], and NGS QC Toolkit [4]. QC tools generate a great deal of information that can help in deciding on the properties of the secondary step of NGS analysis. Using our implemented combination of the aforementioned tools, data-specific features such as Quality Score, Read Complexity, and Duplicate Reads could be quantified to simplify quality control for an expert.
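
To make the parameters named in the abstract concrete, here is a minimal Python sketch of how a per-read quality score, a complexity measure, a duplicate count, and an N base count might be computed. It is an illustration only. It is not the authors' pipeline, and it does not use PPR Plot, FaQCs, AfterQC, or NGS QC Toolkit. The following are assumptions made for the example and do not come from the paper: Phred+33 quality encoding, Shannon entropy of base composition as the complexity proxy, exact sequence identity as the definition of a duplicate, the function names, and the toy reads.

```python
import math
from collections import Counter


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def sequence_entropy(seq):
    """Shannon entropy (bits) of base composition; an assumed complexity proxy."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def summarize(reads):
    """Summarize non-empty reads given as (sequence, quality_line) tuples."""
    # Assumption: a duplicate is any repeat of an identical sequence.
    seq_counts = Counter(seq for seq, _ in reads)
    duplicates = sum(c - 1 for c in seq_counts.values())
    per_read = []
    for seq, qual in reads:
        scores = phred_scores(qual)
        per_read.append({
            "mean_quality": sum(scores) / len(scores),
            "entropy": sequence_entropy(seq),
            "n_count": seq.count("N"),
        })
    return per_read, duplicates


# Hypothetical toy reads, not data from the study.
reads = [
    ("ACGTACGT", "IIIIIIII"),
    ("AAAAAAAA", "IIII####"),
    ("ACGTACGT", "IIIIIIII"),
    ("ACGNNCGT", "II##IIII"),
]
per_read, duplicates = summarize(reads)
```

In this toy input the homopolymer read has zero entropy, which is the kind of low-complexity read that may align ambiguously, and the repeated read is counted once as a duplicate. Whether such duplicates should be removed is a separate decision, since the abstract notes that removal may discard unique biological information.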

Authors

H Mohammadi

Student Research Committee, Department of Bioinformatics, School of Advanced Medical Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran

M Sehhati

Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran

A Vaez

Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran