ScBUC-seq – Novel strategy for single cell long-read sequencing with nanopore technology
Long-read sequencing for single cell research has the potential to give unprecedented insight into human diseases, with the possibility of being used as a diagnostic tool.
A range of studies have been published combining single cell approaches with long-read sequencing using either PacBio or Oxford Nanopore technologies. These promising methods afford unique advantages such as the detection of RNA splicing events, fusion transcripts and single nucleotide polymorphisms. However, they also have disadvantages, ranging from the low number of cells that can be sequenced because of limited read depth, to low base-calling accuracy and high cost.
Recently, Martin Philpott et al. published a pre-print on bioRxiv titled “Highly accurate barcode and UMI error correction using dual nucleotide dimer blocks allows direct single-cell nanopore transcriptome sequencing”, addressing many of these issues.
Here is a short summary of their work:
- The team developed a method for single-cell Barcode UMI Correction sequencing (scBUC-seq)
- The method enables nanopore-based single cell transcriptome long-read sequencing without the need for additional short-read sequencing for error correction
- ScBUC-seq uses mRNA capture oligos with barcodes and UMIs that are made of homodimeric nucleotides
- Barcodes were synthesised through a split and pool process using dual nucleoside phosphoramidite building blocks
- Subsequent data analysis involves a new two-pass assignment method for barcode identification. This ensures that:
– Correct barcodes are identified
– Barcode errors are corrected and assigned to unaffected cells
- The authors showed successful error correction on species mixing data
- They could resolve isoform usage between three different myeloma cell lines
- Finally, they detected a fusion transcript from Ewing’s sarcoma
For a more detailed description of the work, keep reading.
High base-calling error rates are a known issue with long-read methods using nanopore sequencing. For bulk RNA sequencing methods, the advantages gained through nanopore long-read sequencing often outweigh the disadvantages. However, these error rates become a major hurdle in single cell workflows, which require high base accuracy to detect barcodes and UMIs. Often, this is overcome by combining nanopore technology with Illumina sequencing and using the short-read libraries for error correction. Unfortunately, this approach is not just labour intensive, it also results in high experimental cost.
The solution developed by Philpott and colleagues, scBUC-seq, removes the need for additional short-read sequencing by redesigning the barcode and UMI structure used for single cell RNA sequencing (scRNA-Seq). This method uses homodimeric nucleotides, which are incorporated into the barcodes and UMIs by classic split-pool synthesis. This generates sequences in which every base is repeated as a pair, so a read in which the two bases of a pair disagree can be flagged as containing an error.
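To make the dimer-block idea concrete, here is a minimal sketch in Python. It is an illustration under simple assumptions, not the authors' code: the function names `encode_dimer`, `is_intact` and `collapse` are hypothetical, and the example sequences are made up.

```python
def encode_dimer(seq: str) -> str:
    """Write every base twice, e.g. 'ACG' -> 'AACCGG' (hypothetical helper)."""
    return "".join(base * 2 for base in seq)


def is_intact(read: str) -> bool:
    """Return True if the read splits cleanly into pairs of identical bases."""
    if len(read) % 2 != 0:
        # An odd length means a base was inserted or deleted somewhere.
        return False
    return all(read[i] == read[i + 1] for i in range(0, len(read), 2))


def collapse(read: str) -> str:
    """Recover the underlying sequence from an intact dimer read."""
    return read[::2]


barcode = encode_dimer("ACGT")          # 'AACCGGTT'
print(is_intact(barcode))               # intact pairs
print(is_intact("AACGGGTT"))            # substitution breaks the 'CC' pair
print(collapse(barcode))                # back to 'ACGT'
```

A single substitution, insertion or deletion breaks at least one pair, which is what makes a damaged barcode or UMI easy to spot without a short-read reference.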
We redeveloped the beads for single-cell. We error-correct barcode and UMIs by using blocks of dimeric nucleotides during synthesis. This allows us to pin point the barcodes with errors. We then use this to error correct both the barcode AND the UMI. pic.twitter.com/hhmcp1jbDU
— Adam Cribbs (@AdamCribbs) January 20, 2021
The authors combined this oligo structure with a novel computational approach for barcode identification using a two-pass assignment method. In the first pass, correct oligos are identified by finding barcodes that contain complete nucleotide pairs along their full length. In the second pass, these correct barcodes are used to correct the remaining barcodes, so that erroneous barcodes can be assigned to the cells identified by error-free barcodes. Furthermore, the UMI correction method from UMI-tools was modified to incorporate the sequencing error rate into the deduplication strategy. In a simulation, this method recovered 96% of barcodes and de-duplicated efficiently, even with base-calling error rates of over 10%.
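The two-pass logic can be sketched roughly as follows. This is a simplified illustration, not the published pipeline: the use of edit distance, the `max_distance` cut-off and the rule of discarding ambiguous matches are assumptions made for the example, and all names are hypothetical.

```python
def is_intact(barcode: str) -> bool:
    """True if the barcode consists only of pairs of identical bases."""
    return len(barcode) % 2 == 0 and all(
        barcode[i] == barcode[i + 1] for i in range(0, len(barcode), 2)
    )


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def two_pass_assign(barcodes, max_distance=2):
    """Map each observed barcode to a trusted barcode, or None if unresolved."""
    # Pass 1: barcodes with intact dimer pairs form the trusted reference set.
    trusted = {bc for bc in barcodes if is_intact(bc)}
    assigned = {}
    for bc in set(barcodes):
        if bc in trusted:
            assigned[bc] = bc
            continue
        # Pass 2: match a damaged barcode to its closest trusted barcode.
        scored = sorted((edit_distance(bc, ref), ref) for ref in trusted)
        unique_best = len(scored) == 1 or (len(scored) > 1 and scored[1][0] > scored[0][0])
        if scored and scored[0][0] <= max_distance and unique_best:
            assigned[bc] = scored[0][1]
        else:
            assigned[bc] = None  # too far away or ambiguous (assumed policy)
    return assigned


reads = ["AACCGGTT", "TTGGCCAA", "AACCGGT", "AACGGGTT", "ACACACAC"]
for observed, corrected in sorted(two_pass_assign(reads).items()):
    print(observed, "->", corrected)
```

In this toy input, the barcode with a deleted base and the barcode with a substituted base are both rescued to `AACCGGTT`, while `ACACACAC` is too far from any trusted barcode and stays unassigned.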
The authors tested scBUC-seq in three different experiments:
1. Species mixing experiment with human HEK293T and mouse 3T3
2. Resolving of myeloma cell lines NCI-H929, JJN3 and DF15
3. Detection of fusion transcripts in an Ewing’s sarcoma cell line
In the first instance, the authors performed an scRNA-Seq run on the Nadia Instrument with an equal mixture of human HEK293T and mouse 3T3 cells. Over the course of the run, cells were encapsulated with the homodimeric-nucleotide barcoded beads described above. This was followed by cDNA amplification, sequencing library preparation and sequencing with either Illumina or Oxford Nanopore technologies to assess scBUC-seq’s performance on both platforms. Using the methodology described above, the authors were able to recover an extra 8% and 54% of barcodes for Illumina and Oxford Nanopore sequencing, respectively.
In a second experiment, the researchers applied scBUC-seq to a 1:1:1 mixture of the myeloma cell lines NCI-H929, JJN3 and DF15 to show how efficiently they could resolve these cell lines at the gene and transcript level. 500- and 1,200-cell libraries were prepared on the Nadia Instrument and sequenced on a MinION and a PromethION sequencer, respectively. In accordance with the literature, significant differential expression of both immunoglobulin kappa and lambda, as well as differences in light-chain isoform usage, were observed between the myeloma cell lines.
In the last experiment, the aim was to apply scBUC-seq to detect genomic translocation events. As a model the authors chose STA-ET-1, an Ewing’s sarcoma cell line which is known to express the fusion protein EWS-FLI. This fusion protein regulates the expression of several target genes that maintain stem cell phenotypes and promote cell proliferation, survival and drug resistance. Species mixing data was used as a control to determine the rate of false positive fusion events. By employing a threshold of 5 UMIs per fusion event, all falsely detected human-mouse artefacts were removed. Using the same threshold, the authors were also able to detect the EWS-FLI fusion transcript in 17% of cells, as well as a novel fusion product between FLI1 and the long non-coding RNA AL596087.2.
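As a rough illustration of the UMI threshold, the snippet below keeps only fusion events supported by enough distinct UMIs. It is a sketch under stated assumptions: counting distinct UMIs per cell and gene pair, and treating the threshold as "at least 5", are guesses about the details, and the records and the function name `filter_fusions` are invented for the example.

```python
from collections import defaultdict


def filter_fusions(records, min_umis=5):
    """Keep fusion events supported by at least `min_umis` distinct UMIs.

    `records` is an iterable of (cell, gene_a, gene_b, umi) tuples
    (a hypothetical input format chosen for this example).
    """
    support = defaultdict(set)
    for cell, gene_a, gene_b, umi in records:
        # A set ignores duplicate reads that carry the same UMI.
        support[(cell, gene_a, gene_b)].add(umi)
    return {event: len(umis) for event, umis in support.items() if len(umis) >= min_umis}


# Made-up records: one well-supported fusion and one weak human-mouse artefact.
records = [("cell1", "EWS", "FLI", f"umi{i}") for i in range(5)]
records.append(("cell1", "EWS", "FLI", "umi0"))  # duplicate UMI, not counted twice
records += [("cell2", "human_gene", "mouse_gene", "umiA"),
            ("cell2", "human_gene", "mouse_gene", "umiB")]

print(filter_fusions(records))
```

Here the artefact in `cell2` has only two supporting UMIs and is dropped, while the fusion in `cell1` passes with five.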
In conclusion, Philpott and colleagues showed that their method, scBUC-seq, enables reliable and cost-effective long-read sequencing for single cell samples processed on the Nadia Instrument. This novel approach could enable the detection of many biologically relevant features, such as RNA splicing events, single nucleotide polymorphisms and imprinting, at a single cell level. You can find the full paper here.