Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning




To help disseminate HPC among academia and SMEs, weekly seminars have been held online since December 2020. The talks present HPC use cases ranging from modern computing tools and techniques, such as parallel programming, GPU programming, big data, AI, and machine learning, to applied subjects such as computational fluid dynamics and materials modelling, and they have helped raise awareness of the field.

For information about the project, visit our EuroCC web page.

  • Seminars will be held online via Zoom.
  • Please register for the Zoom meeting with your real name and surname.

NOTE: Please pay attention to the following points during the online seminar event:

  • Please use the Q&A section for your questions and the chat section for your comments.
    • 2:00 PM to 3:00 PM (1h)
      Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning

      Accurate and efficient detection of copy number variants (CNVs) is of critical importance due to their significant association with complex genetic diseases. Although algorithms that use whole genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate, as WES data is cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the non-contiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. In this talk, I will present a novel deep learning model, DECoNT, which uses matched WES and WGS data and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves performance independently of (i) sequencing technology, (ii) exome capture kit, and (iii) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.

      Speaker: Dr A. Ercüment Çiçek (Bilkent University)