Large-scale investigation of short read archives for disruption of transcription termination and circular splicing
Final Report Abstract
Short read archives such as the NCBI SRA provide vast amounts of public RNA-seq data for diverse cell types and conditions that can be used to investigate scientific questions beyond the objectives for which the data were originally generated. In this project, we investigated methods for exploiting these public data to identify novel conditions that induce disruption of transcription termination (DoTT) or particular circular RNAs (circRNAs). This was motivated by the discovery of (i) widespread DoTT affecting the majority of genes in lytic herpes simplex virus 1 (HSV-1) infection and diverse cell stresses and (ii) increases in circRNA levels and induction of a novel circRNA of the nuclear non-coding RNA NEAT1 during HSV-1 infection. NEAT1 is one of the most highly expressed lincRNAs, is essential for the structure of paraspeckles (nuclear bodies located in the interchromatin space), and has to remain unspliced for paraspeckle assembly. We evaluated several methods for rapidly searching large sequencing data sets for evidence of DoTT or expression of certain circRNAs: (i) succinct data structures for short read data, such as Sequence Bloom Trees (SBTs), that allow fast searching for the presence of particular sequences, (ii) fast approximate transcript quantification approaches, such as kallisto, and (iii) querying recount2/3, which provide easy access to large amounts of re-mapped RNA-seq data from the SRA. Our results showed that (i) quantitative approaches are necessary for both read-through and circRNA detection, (ii) quantification of read-through for large numbers of samples can be performed quickly either with kallisto or using recount2/3, but (iii) specificity in distinguishing samples with DoTT from those without DoTT is low.
This low specificity is due to (i) the poor quality of metadata annotation in the SRA, which results in samples generated with other sequencing methods being classified as RNA-seq, and (ii) the high variability in read-through depending on the condition and the type of RNA sequenced. Searching for novel splicing events in recount2/3 proved to be more successful, in particular for detecting conditions with NEAT1 splicing. As the induced NEAT1 circRNA was linked to novel linear splicing, other conditions with NEAT1 circular splicing could be detected by searching the linear splice junctions provided by recount3. In this way, we discovered that CDK7 inhibition as well as CDK7 and MED1 knockdown induce both DoTT and NEAT1 splicing. This highlighted a possible link between read-through and NEAT1 splicing and a potential role for CDK7 in HSV-1 inhibition. In addition, we showed that degradation of linear host transcripts by viral proteins leads to an enrichment of circRNAs in both HSV-1 and influenza A virus infection. In contrast, the novel NEAT1 circRNA is induced by increased de novo biogenesis occurring cotranscriptionally. In summary, this project showed that novel biological insights can be obtained by exploring the wealth of public sequencing data already available.
Publications
- Watchdog 2.0: New developments for reusability, reproducibility, and workflow execution. GigaScience, 9(6).
  Kluge, Michael; Friedl, Marie-Sophie; Menzel, Amrei L. & Friedel, Caroline C.
- CDK11 regulates pre-mRNA splicing by phosphorylation of SF3B1. Nature, 609(7928), 829-834.
  Hluchý, Milan; Gajdušková, Pavla; Ruiz de los Mozos, Igor; Rájecký, Michal; Kluge, Michael; Berger, Benedict-Tilman; Slabá, Zuzana; Potěšil, David; Weiß, Elena; Ule, Jernej; Zdráhal, Zbyněk; Knapp, Stefan; Paruch, Kamil; Friedel, Caroline C. & Blazek, Dalibor
- HSV-1 and influenza infection induce linear and circular splicing of the long NEAT1 isoform. PLOS ONE, 17(10), e0276467.
  Friedl, Marie-Sophie; Djakovic, Lara; Kluge, Michael; Hennig, Thomas; Whisnant, Adam W.; Backes, Simone; Dölken, Lars & Friedel, Caroline C.
