Changes in transcriptional regulation are thought to be one of the key drivers of carcinogenesis. Although next-generation sequencing has revolutionized transcriptome analysis, short-read sequencing data are limited in their ability to resolve full-length transcripts. We developed a multi-sample long-read transcriptome assembly pipeline, MuSTA, and showed through simulation that it can construct a transcriptome from the transcripts present in the target samples, enabling accurate evaluation of transcriptional regulation. RNA extracted from 22 clinical breast cancer specimens was subjected to Iso-Seq full-length transcriptome sequencing on the Sequel system (PacBio). Applying MuSTA to the long-read sequencing data, we obtained a full-length transcriptome for the entire cohort. By comparing isoform presence and expression between the estrogen receptor-positive and triple-negative subtypes, we identified a comprehensive set of subtype-specific isoforms and differentially used isoforms, including both known and unannotated isoforms. We also found that the exon-intron structure of fusion transcripts depends on the features of the genomic regions involved, and that three-piece fusion transcripts were transcribed from complex structural variations.
Learning Objectives:
1. To understand somatic structural variations in cancer genomes
2. To understand the technical difficulties of transcript analysis with long-read sequencers
3. To learn how transcriptome analysis using a long-read sequencer can help us understand cancer biology