Series | GSE71315 |
Title | Single cell analysis of long non-coding RNAs in the developing human neocortex |
---|---|
Year | 2016 |
Country | USA |
Article | Lim DA, Diaz AA, Kriegstein AR, Weissman JS, He D, Attenello FJ, Horlbeck MA, Lui JH, Pollen AA, Nowakowski TJ, Liu SJ. Single-cell analysis of long non-coding RNAs in the developing human neocortex. Genome Biology. 2016 Apr 14 |
PMID | 27081004 |
Bio Project | BioProject: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA290888 |
Sra | SRA: http://www.ncbi.nlm.nih.gov/sra?term=SRP061549 |
Overall Design | 16 bulk tissue samples from GW13-23; 226 single cells from GW19.5-23.5 |
Summary | Long non-coding RNAs (lncRNAs) comprise a diverse class of transcripts that can regulate molecular and cellular processes in brain development and disease. LncRNAs exhibit cell type- and tissue-specific expression, but little is known about the expression and function of lncRNAs in the developing human brain. Here, we deeply profiled lncRNAs from polyadenylated and total RNA obtained from human neocortex at different stages of development and integrated this resource to analyze the transcriptomes of single cells. While lncRNAs were generally detected at low levels in whole tissues, single-cell transcriptomics revealed that many lncRNAs are abundantly expressed in individual cells and are cell type-specific. Furthermore, we used CRISPRi to show that LOC646329, a lncRNA enriched in radial glia but detected at low abundance in tissues, regulates cell proliferation. The discrete and abundant expression of lncRNAs among individual cells has important implications for both their biological function and utility for distinguishing neural cell types. |
Experimental Protocol | TRIzol followed by DNase I treatment; QIAGEN RNeasy columns; Illumina TruSeq Stranded mRNA. TRIzol followed by DNase I treatment; QIAGEN RNeasy columns; Illumina TruSeq Stranded Total RNA with Ribo-Zero Gold. Single-cell capture and cell lysis using Fluidigm C1; RT and whole transcriptome amplification on the Fluidigm C1 IFC; library indexed by Illumina Nextera DNA Sample Prep Kit |
Data processing | Bulk RNA-seq: Strand-specific reads were aligned to the human reference genome, Ensembl GRCh37/hg19 release 75, using TopHat v2.0.10 with the flags (--library-type fr-firststrand --microexon-search). De novo transcriptome assembly was performed separately on rRNA depletion total RNA-seq alignments, and on polyA selection RNA-seq alignments, using Cufflinks v2.2.1 with the flags (-M ensembl_75_mtRNA_rRNA.gtf -b genome.fa -u --library-type fr-firststrand --max-multiread-fraction 0.25 --3-overhang-tolerance 2000). Transcriptome assemblies at all developmental stages and replicates were merged, separately for rRNA depletion total RNA-seq and polyA selection RNA-seq, with the Ensembl 75/GENCODE 19 reference transcriptome, using Cuffmerge. To identify transcripts novel compared to Ensembl, we utilized Cuffcompare class codes and extracted those assembled transcripts classified as: i – novel intronic, u – novel intergenic, x – novel antisense. All novel transcripts under 200 nt in length were removed. For the remaining transcripts, we determined minimal read coverage thresholds based on whether Cufflinks classified previously annotated transcripts as having "full_read_support". By analyzing the true positive rate vs. false positive rate of classifying known genes as having "full_read_support" at various coverage thresholds, we determined the minimum coverage to be 1.4 for polyA and 1.67 for total RNA-seq (at FDR = 0.05). Starting with just the polyA RNA-seq data, transcripts with read coverage above 1.4 in both biological replicates of at least one developmental stage were included in the reference and considered to be expressed in the neocortex. Due to limited availability of early fetal tissue, the GW14.5 sample was treated as the biological duplicate of the GW13 sample.
Novel transcripts that were predicted to have protein coding capability by one or more of the following methods were classified as transcripts of uncertain coding potential (TUCP): CPAT, threshold = 0.364; CPC, threshold = 0; Pfam. For comparing to the Pfam database, the longest potential open reading frame (ORF) of each novel transcript was obtained, and any putative ORF that had a significant match for a protein domain annotated in Pfam A or Pfam B resulted in the parent transcript being classified as a TUCP. All remaining novel lncRNAs and TUCPs were then named according to recently proposed nomenclature standards, for instance LINC-[nearest mRNA] for intergenic lncRNAs and [nearest mRNA]-AS for antisense lncRNAs, and were then merged to the Ensembl 75 reference transcriptome, resulting in the polyA Full reference transcriptome. The polyA Stringent reference transcriptome was produced by removing all novel single-exon lncRNAs and TUCPs. Known lncRNAs from Ensembl were obtained by identifying transcripts with one of the following biotype classifications: "3prime_overlapping_ncrna", "antisense", "lincRNA", "processed_transcript", "sense_intronic", and "sense_overlapping". The same pipeline, with the coverage threshold of 1.67, was performed for reads derived from the total RNA-seq. Gene-level fragment counts for each polyA and total RNA sample were quantified using featureCounts v1.4.6, using the flags: -p -s 2 -B -C -t exon -g gene_id. Count tables were normalized to TPM (Transcripts per Million) for internal comparisons and visualizations of bulk RNA-seq. To identify differentially expressed genes, we used DESeq2 on gene-level fragment counts derived from the polyA samples and polyA Full reference transcriptome.
Pairwise negative binomial significance tests were performed between developmental stages using biological duplicates, and the union of genes that were significant at FDR < 0.01 was classified as differentially expressed. Single-cell RNA-seq: Paired-end 100 reads from single-cell cDNA libraries were quality trimmed using Trim Galore with the flags: -q 20 --nextera --length 20. Trimmed reads were aligned to the human reference genome, Ensembl GRCh37/hg19 release 75, augmented with the 92 ERCC Spike-In Control sequences, using TopHat v2.0.10 with the flags: --transcriptome-index=polya_stringent_reference.gtf --prefilter-multihits. The polyA Stringent reference transcriptome, derived from whole tissue RNA-seq as described above, was used as a transcriptome guide. Gene-level fragment counts were quantified using featureCounts v1.4.6 with the flags: -p -B -C. Counts were normalized by transcriptome size factors according to DESeq. 50 additional single-cell libraries, deposited under SRP041736, were also included. |
Platform | GPL16791 |
Public On | Public on Mar 28 2016 |
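
The Data processing entry states that bulk count tables were normalized to TPM. The following is a minimal sketch of that conversion, not the authors' script. It assumes a per-gene length in base pairs is available (the record does not say how lengths were derived), and the function name, gene labels, and numbers are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert fragment counts to TPM.

    counts     -- dict of gene -> fragment count (e.g. from a featureCounts table)
    lengths_bp -- dict of gene -> gene length in base pairs (assumed input)
    """
    # Step 1: length-normalize each gene (fragments per base pair).
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    # Step 2: rescale so the values of one sample sum to one million.
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical toy sample: geneA and geneB have equal per-base coverage.
example_counts = {"geneA": 100, "geneB": 300, "geneC": 0}
example_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}
tpm = counts_to_tpm(example_counts, example_lengths)
print(tpm)
```

Because TPM values sum to one million within each sample, they suit the within-dataset comparisons and visualizations mentioned above, while the differential expression step works on the raw fragment counts.
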
Gene Symbol | Ensembl ID | FDR |
---|---|---|
BRCC3 | ENSG00000185515 | 6.63060963658729e-10 |
GORASP2 | ENSG00000115806 | 8.73334980808892e-10 |
AK4 | ENSG00000162433 | 2.76682530477686e-09 |
PRDX2 | ENSG00000167815 | 1.0951214581781e-08 |
MAST3 | ENSG00000099308 | 1.77103707895552e-08 |
RCBTB2 | ENSG00000136161 | 3.87890943222414e-08 |
FANCI | ENSG00000140525 | 9.23783846190054e-08 |
WDR54 | ENSG00000005448 | 9.23783846190054e-08 |
ZNF304 | ENSG00000131845 | 1.05797346057124e-07 |
RPL10 | ENSG00000147403 | 1.05797346057124e-07 |
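
The Data processing entry describes pairwise tests between developmental stages, with the union of genes significant at FDR < 0.01 classified as differentially expressed. The following is a small sketch of that union step only, assuming the adjusted p-values of each comparison are already available as a mapping. The function name, comparison labels, gene names, and values are hypothetical and are not taken from the table above.

```python
def union_of_significant(pairwise_results, fdr_cutoff=0.01):
    """Return the set of genes with FDR below the cutoff in any comparison.

    pairwise_results -- dict of comparison label -> dict of gene -> FDR
                        (None stands in for an NA adjusted p-value)
    """
    significant = set()
    for gene_to_fdr in pairwise_results.values():
        for gene, fdr in gene_to_fdr.items():
            # Skip missing values; require strictly less than the cutoff.
            if fdr is not None and fdr < fdr_cutoff:
                significant.add(gene)
    return significant


# Hypothetical results for two pairwise stage comparisons.
pairwise = {
    "stage1_vs_stage2": {"geneA": 0.001, "geneB": 0.2, "geneC": None},
    "stage2_vs_stage3": {"geneA": 0.5, "geneB": 0.009, "geneC": 0.01},
}
de_genes = union_of_significant(pairwise)
print(sorted(de_genes))
```

A gene needs to pass the cutoff in only one comparison to be included, so the resulting set grows with the number of stage pairs tested.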