Deep4mC: Computational prediction of 4mC sites in DNA sequences


  Deep4mC prediction:




Deep4mC implements six models, one for each of the following species: A. thaliana, C. elegans, D. melanogaster, E. coli, G. pickeringii and G. subterraneus. For species with a small number of samples, we extended our deep learning framework with a bootstrapping method. Our evaluation indicated that Deep4mC achieves high accuracy and robust performance, with average area under the ROC curve (AUC) values above 0.9 in multiple rounds of cross-validation across the different species.




Each prediction score is the predicted probability that the input site is a true 4mC site, and it ranges from 0 to 1. A score greater than 0.5 indicates a candidate 4mC modification site, whereas a score less than 0.5 suggests that the site is not a 4mC modification site. Users can set a threshold on this score to filter the results shown by the web server. When the threshold is set to "all", all prediction results are displayed. When the threshold is set to a value between 0.5 and 0.9, only results with a score greater than the selected value are displayed.
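The score interpretation and threshold filter described above can be sketched locally, for example to post-process downloaded results. This is a minimal illustration, not the server's code: the record layout `(sequence id, position, score)`, the function names `label` and `filter_predictions`, and the example scores are all hypothetical. The page does not say how a score of exactly 0.5 is classified; the sketch treats it as negative.

```python
from typing import Iterable, List, Tuple, Union

# Hypothetical record layout: (sequence id, 1-based position, predicted probability).
Prediction = Tuple[str, int, float]


def label(score: float) -> str:
    """Interpret a score as described above: > 0.5 is a candidate 4mC site."""
    # Assumption: a score of exactly 0.5 is treated as non-4mC (not specified by the server).
    return "candidate 4mC" if score > 0.5 else "non-4mC"


def filter_predictions(preds: Iterable[Prediction],
                       threshold: Union[str, float] = "all") -> List[Prediction]:
    """Mimic the web server's threshold filter (hypothetical sketch)."""
    preds = list(preds)
    if threshold == "all":
        return preds  # "all" displays every prediction
    t = float(threshold)
    if not 0.5 <= t <= 0.9:
        raise ValueError("threshold must be 'all' or a value between 0.5 and 0.9")
    # Keep only predictions whose score is strictly greater than the selected value.
    return [p for p in preds if p[2] > t]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    results = [("S1", 21, 0.93), ("S2", 21, 0.41), ("S3", 21, 0.72)]
    for seq_id, pos, score in filter_predictions(results, 0.7):
        print(seq_id, pos, score, label(score))
```

With a threshold of 0.7, the made-up S2 record (score 0.41) is dropped and the other two are kept; with "all", all three are returned unchanged.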




The Sequence length setting has two options: 41-bp and Other-length. With the 41-bp option, the web server returns the probability for the central position, 21, of every input sequence, which gives a quick prediction. With the Other-length option, the web server returns the probability for every position where the nucleotide is a cytosine (C).
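Which positions receive a score under each option can be sketched as follows. This only reproduces the position selection described above; it does not compute scores, and it does not model how the server handles cytosines near the ends of an Other-length sequence, which is not described here. The function name `positions_to_score` and the option strings are hypothetical.

```python
from typing import List

WINDOW = 41                 # sequence length expected by the 41-bp option
CENTER = (WINDOW + 1) // 2  # 1-based central position of a 41-bp sequence, i.e. 21


def positions_to_score(seq: str, option: str) -> List[int]:
    """Return the 1-based positions that are reported for a Sequence length option."""
    seq = seq.upper()
    if option == "41-bp":
        if len(seq) != WINDOW:
            raise ValueError(f"41-bp option expects {WINDOW} nt, got {len(seq)}")
        return [CENTER]  # only the central position is reported
    if option == "Other-length":
        # Every cytosine is a candidate 4mC site.
        return [i + 1 for i, base in enumerate(seq) if base == "C"]
    raise ValueError(f"unknown option: {option!r}")
```

For example, `positions_to_score("ACCGTC", "Other-length")` returns `[2, 3, 6]`, while any 41-nt sequence under `"41-bp"` yields `[21]`.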



  Sample FASTA sequences

>S1
CGCAACCCGATCTTAAAAGCCGTAAGAATTGTATCCTTGTT
>S2
AGGATGTGGCGGGGAATTGCCGTGATCGATGAATGCTACCT
>S3
TTGAATACATCAGTGTAGCGCGCGTGCGGCCCAGAACATCT
>S4
CTTTGAGAAGCAAGAAGAAGCTTCGTTATTTTTTTGGAGTC
>S5
TAGGACGTGGTTTAACTGTTCGAGTTCATATATTTGCAGAC
>S6
TTCATGCATAACTTCTATACCAAAGTTAGCACGGTTAATAA
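Before submitting with the 41-bp option, it can help to check that every record is 41 nt long and carries a cytosine at the central position 21, as the samples above do. The minimal FASTA reader below is a hypothetical helper, not part of Deep4mC; for real workloads a dedicated parser such as Biopython's `SeqIO` is a sturdier choice.

```python
from typing import Dict

SAMPLE = """>S1
CGCAACCCGATCTTAAAAGCCGTAAGAATTGTATCCTTGTT
>S2
AGGATGTGGCGGGGAATTGCCGTGATCGATGAATGCTACCT
>S3
TTGAATACATCAGTGTAGCGCGCGTGCGGCCCAGAACATCT
>S4
CTTTGAGAAGCAAGAAGAAGCTTCGTTATTTTTTTGGAGTC
>S5
TAGGACGTGGTTTAACTGTTCGAGTTCATATATTTGCAGAC
>S6
TTCATGCATAACTTCTATACCAAAGTTAGCACGGTTAATAA
"""


def read_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader: '>' starts a record; sequence lines may wrap."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()  # use the whole header as the record name
            records[name] = ""
        elif name is not None:  # sequence lines before the first header are ignored
            records[name] += line.upper()
    return records


def check_41bp(records: Dict[str, str]) -> Dict[str, bool]:
    """True if a record is 41 nt long with a cytosine at position 21 (index 20)."""
    return {name: len(seq) == 41 and seq[20] == "C" for name, seq in records.items()}
```

All six sample records pass this check, so `check_41bp(read_fasta(SAMPLE))` maps each of S1 to S6 to `True`.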