» ZFN target site algorithm for identifying sites compatible with the Lawson-Wolfe modular assembly system
This protocol entails two separate PCR steps: first, each of the 3 individual fingers is amplified separately in the correct backbone finger position for the particular ZFN; second, an overlapping PCR step joins the 3 fingers into a single contiguous fragment encoding the zinc finger protein (ZFP). This 3-finger ZFP is then cloned in frame with a FokI nuclease variant in a pCS2-backbone plasmid.
» ZFN target site algorithm for identifying sites for selection using the Bacterial one hybrid system
This algorithm will also aid you in the design of libraries for the target sites using a combination of design and selection.
» A bioconductor package with minimalist design for plotting elegant track layers
(Collaborated with Dr. Wang)
Visualize mapped reads along with annotation as track layers for NGS datasets such as ChIP-seq, RNA-seq, miRNA-seq and DNA-seq.
» A bioconductor package for analysis of high-throughput sequencing data processed by restriction enzyme digestion.
(Collaborated with Dr. Fazzio)
The package includes functions to build a restriction enzyme cut site (RECS) map, distribute mapped sequences on the map with five different approaches, find enriched/depleted RECSs for a sample, and identify differentially enriched/depleted RECSs between samples.
Chen PB, Zhu LJ, Hainer SJ, McCannell KN, Fazzio TG. Unbiased chromatin accessibility profiling by REDseq uncovers unique features of nucleosome variants in vivo.
BMC Genomics. 2014 Dec 15;15:1104. doi:10.1186/1471-2164-15-1104. PubMed PMID: 25494698; PubMed Central PMCID:PMC4378318.
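The first two steps described above (building a cut site map, then distributing mapped reads onto it) can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's R interface: the function names `find_cut_sites` and `assign_reads_to_nearest` are hypothetical, and nearest-site assignment is just one simple strategy, not necessarily one of the package's five approaches.

```python
import re
from bisect import bisect_left


def find_cut_sites(sequence, recognition_site, cut_offset):
    """Return 0-based cut positions for every occurrence of the recognition site.

    cut_offset is the distance from the start of the recognition site to the cut
    (for example 1 for EcoRI, which cuts G^AATTC).
    """
    # A lookahead pattern also reports overlapping occurrences.
    pattern = "(?=" + re.escape(recognition_site.upper()) + ")"
    return [m.start() + cut_offset for m in re.finditer(pattern, sequence.upper())]


def assign_reads_to_nearest(cut_sites, read_positions, max_distance):
    """Count reads per cut site, assigning each read to its nearest site.

    Reads farther than max_distance from every site are left unassigned.
    """
    sites = sorted(cut_sites)
    counts = {site: 0 for site in sites}
    for pos in read_positions:
        i = bisect_left(sites, pos)
        candidates = sites[max(i - 1, 0): i + 1]  # neighbours on either side
        if not candidates:
            continue
        nearest = min(candidates, key=lambda s: abs(s - pos))
        if abs(nearest - pos) <= max_distance:
            counts[nearest] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "AAGAATTCTTGAATTCAA"  # toy sequence, for illustration only
    sites = find_cut_sites(toy_sequence, "GAATTC", cut_offset=1)
    print(sites)
    print(assign_reads_to_nearest(sites, [3, 4, 10, 17], max_distance=2))
```

Per-site read counts of this kind are the input one would then test for enrichment or depletion within and between samples.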
» A bioconductor package to plot stacked logos for single or multiple DNA, RNA and amino acid sequences
(Collaborated with Dr. Brodsky and Dr. Wolfe)
This bioconductor package is designed for graphical representation of multiple motifs. It draws amino acid sequences as easily as DNA/RNA sequences. It lets users select the font type and symbol colors. It is part of a pipeline to identify candidate binding sites for known transcription factors via sequence matching.
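For readers unfamiliar with sequence logos, the height of each stack is conventionally the information content of that motif position. A minimal Python sketch of that calculation follows; it assumes equal-length aligned sites and a uniform background, applies no small-sample correction, and uses the hypothetical function names `information_content` and `letter_heights`. It is not the package's own code.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet="ACGT"):
    """Per-column information content in bits (uniform background assumed)."""
    max_bits = math.log2(len(alphabet))
    result = []
    for col in range(len(aligned_sites[0])):
        counts = Counter(site[col] for site in aligned_sites)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        result.append(max_bits - entropy)
    return result


def letter_heights(aligned_sites, alphabet="ACGT"):
    """Height of each letter per column: frequency times column information."""
    bits = information_content(aligned_sites, alphabet)
    heights = []
    for col, column_bits in enumerate(bits):
        counts = Counter(site[col] for site in aligned_sites)
        total = sum(counts.values())
        heights.append({base: counts[base] / total * column_bits for base in counts})
    return heights


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGA", "ACCA", "ACTT"]  # toy alignment, not real data
    print(information_content(toy_sites))
    print(letter_heights(toy_sites))
```

For an amino acid logo the same calculation applies with a 20-letter alphabet, which raises the maximum stack height from 2 bits to about 4.32 bits.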
» Identification of Novel alternative PolyAdenylation Sites (PAS)
(Collaborated with Dr. Green)
Alternative polyadenylation (APA) is an important post-transcriptional regulation mechanism that occurs in most human genes. InPAS facilitates the discovery of novel APA sites from RNA-seq data. It leverages the cleanUpdTSeq package to fine-tune the identified APA sites.
» Build Regulatory Network from ChIP-chip/ChIP-seq and Expression Data
(Collaborated with Dr. Heidi)
GeneNetworkBuilder (GNB) is a web application for discovering the transcription regulatory network of a given transcription factor (TF) in species such as Caenorhabditis elegans and Homo sapiens, using ChIP-chip/ChIP-seq data combined with gene expression profiles from either RNA-seq or expression microarray experiments. An R/Bioconductor package is also available. Given a list of potential target genes of one TF from ChIP-chip or ChIP-seq, together with the gene expression results, GeneNetworkBuilder generates a regulatory network for the TF.
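The core idea of combining binding and expression evidence can be illustrated with a short Python sketch. It covers only the simplest case, direct targets that are both bound by the TF and differentially expressed, and is not the GeneNetworkBuilder implementation. The function name `build_direct_edges`, the thresholds, and the gene names are all hypothetical.

```python
def build_direct_edges(tf, bound_genes, expression,
                       min_abs_log2fc=1.0, max_pvalue=0.05):
    """Return (tf, gene, direction) edges for bound, differentially expressed genes.

    expression maps gene -> (log2 fold change, p-value); the thresholds are
    illustrative assumptions, not values taken from the package.
    """
    bound = set(bound_genes)
    edges = []
    for gene, (log2fc, pvalue) in sorted(expression.items()):
        is_changed = abs(log2fc) >= min_abs_log2fc and pvalue <= max_pvalue
        if gene in bound and is_changed:
            edges.append((tf, gene, "up" if log2fc > 0 else "down"))
    return edges


if __name__ == "__main__":
    # Toy inputs with placeholder names, for illustration only.
    toy_bound = ["geneA", "geneB", "geneC"]
    toy_expression = {
        "geneA": (2.1, 0.001),
        "geneB": (-1.5, 0.01),
        "geneC": (0.2, 0.6),
        "geneD": (3.0, 0.001),
    }
    for edge in build_direct_edges("TF1", toy_bound, toy_expression):
        print(edge)
```

Here `geneC` is dropped because its expression does not change, and `geneD` is dropped because it is not bound.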
» A database of Drosophila transcription factor binding specificities
(Collaborated with Dr. Brodsky and Dr. Wolfe)
FlyFactorSurvey is a database of DNA binding specificities for Drosophila transcription factors (TFs) primarily determined using the bacterial one-hybrid system. The database provides community access to over 400 recognition motifs and position weight matrices for over 200 TFs, including many unpublished motifs. Search tools and flat file downloads are provided to retrieve binding site information (as sequences, matrices and sequence logos) for individual TFs, groups of TFs or for all TFs with characterized binding specificities. Linked analysis tools allow users to identify motifs within our database that share similarity to a query matrix or to view the distribution of occurrences of an individual motif throughout the Drosophila genome. Together, this database and its associated tools provide computational and experimental biologists with resources to predict interactions between Drosophila TFs and target cis-regulatory sequences.
Zhu LJ, Christensen RG, Kazemian M, Hull CJ, Enuameh MS, Basciotta MD, Brasefield JA, Zhu C, Asriyan Y, Lapointe DS, Sinha S, Wolfe SA and Brodsky MH. (2010) FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res., 39(Database issue): D111-D117.
» A bioconductor package to find and visualize significantly enriched or depleted amino acid motif or amino acid group patterns in proteome dataset
(Collaborated with Dr. Acharya)
The dagLogo package implements iceLogo in R to visualize differential amino acid sequence patterns, and to test and visualize significant amino acid group patterns by classifying the amino acids into groups according to charge, chemistry, hydrophobicity, etc.
» A bioconductor package for the design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems
(Collaborated with Dr. Brodsky)
CRISPR-Cas systems can cleave specific target sequences depending on the sequence of a CRISPR-derived guide RNA (gRNA) and the source of the Cas9 protein. CRISPRseek is a highly flexible, open source software package to identify gRNAs that target a given input sequence while minimizing off-target cleavage at other sites within any selected genome. The package identifies potential gRNAs that target a sequence of interest for CRISPR-Cas9 systems from different bacterial species and generates a cleavage score for potential off-target sequences using published or user-supplied weight matrices with position-specific mismatch penalty scores. Identified gRNAs may be further filtered to include only those that occur in paired orientations for increased specificity and/or those that overlap restriction enzyme sites. For applications where gRNAs must discriminate between two related sequences, CRISPRseek can rank gRNAs based on the difference between predicted cleavage scores in each input sequence. CRISPRseek also performs a genome-wide search for off-targets: it scores and ranks them, fetches their flanking sequences and indicates whether the target and off-targets are located in exon regions. Potential guide RNAs are annotated with the total score of the top 5 and top N off-targets, detailed top N mismatch sites, restriction enzyme cut sites, and paired guide RNAs. With the package GeneRfold installed, the minimum free energy and bracket notation of the secondary structure of the gRNA and gRNA backbone constant region are also included in the summary file. This package leverages the Biostrings and BSgenome packages.
Lihua Julie Zhu (2015). Overview of guide RNA design tools for CRISPR-Cas9 genome editing technology. Frontiers in Biology, Volume 10, Issue 4, pp 289-296
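To make the idea of position-specific mismatch penalties concrete, here is a minimal Python sketch. It is a deliberate simplification and not the CRISPRseek scoring formula: each mismatch multiplies the score by one minus the penalty at that position. The function names and the penalty vector `TOY_PENALTIES` are hypothetical, not published weights.

```python
def mismatch_positions(guide, site):
    """Return the 0-based positions at which guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper())) if g != s]


def offtarget_score(guide, site, penalties):
    """Simplified cleavage score in [0, 100]; 100 means a perfect match.

    penalties[i] is the fraction of activity lost for a mismatch at position i.
    """
    score = 100.0
    for i in mismatch_positions(guide, site):
        score *= 1.0 - penalties[i]
    return score


def rank_offtargets(guide, sites, penalties):
    """Sort candidate sites from most to least likely to be cleaved."""
    scored = [(offtarget_score(guide, s, penalties), s) for s in sites]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical weights: PAM-distal mismatches (first 10 positions) are fully
# tolerated, PAM-proximal mismatches halve the score. Not published values.
TOY_PENALTIES = [0.0] * 10 + [0.5] * 10

if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # toy 20-nt guide
    candidates = [
        "TCGTACGTACGTACGTACGT",  # one PAM-distal mismatch
        "ACGTACGTACGTACGTACGA",  # one PAM-proximal mismatch
        "ACGTACGTACGTACGTACCA",  # two PAM-proximal mismatches
    ]
    for score, site in rank_offtargets(guide, candidates, TOY_PENALTIES):
        print(site, score)
```

Swapping in a different penalty vector is the sketch's analogue of supplying a user-defined weight matrix.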
» A bioconductor package to classify putative polyA sites as true or false/internally oligodT primed
(Collaborated with Dr. Lawson)
3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. By analyzing sequence features flanking 3′ ends derived from oligo-dT-based sequencing, we developed a naïve Bayes classifier to classify them as true or false/internally primed. The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites. This allows users to separate true, biologically relevant polyA sites from false, oligo-dT primed polyA sites. The algorithm was published as the cleanUpdTSeq package in bioconductor.
Sheppard, S., Lawson ND* and Zhu LJ*. (2013) [* denotes co-corresponding authors] Accurate identification of polyadenylation sites from 3' end deep sequencing using a naïve Bayes classifier. Bioinformatics 2013
» A bioconductor package for annotating peaks identified in ChIP-seq, ChIP-chip or any high-throughput experiments
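The naïve Bayes approach can be illustrated with a toy Python classifier. It treats the nucleotide at each position downstream of a candidate site as an independent categorical feature with Laplace smoothing. This is a sketch of the general technique only: the class name `PositionalNaiveBayes`, the feature choice, and the training sequences are assumptions made up for illustration, and the features used by cleanUpdTSeq are not reproduced here.

```python
import math


class PositionalNaiveBayes:
    """Naive Bayes over per-position nucleotides with Laplace smoothing."""

    def __init__(self, alphabet="ACGT", pseudocount=1.0):
        self.alphabet = alphabet
        self.pseudocount = pseudocount

    def fit(self, sequences, labels):
        self.classes = sorted(set(labels))
        length = len(sequences[0])
        self.class_counts = {c: 0 for c in self.classes}
        self.counts = {
            c: [{b: 0 for b in self.alphabet} for _ in range(length)]
            for c in self.classes
        }
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            for i, base in enumerate(seq):
                self.counts[label][i][base] += 1
        return self

    def log_posteriors(self, seq):
        """Unnormalised log posterior of each class for one sequence."""
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            n = self.class_counts[c]
            score = math.log(n / total)  # class prior
            denom = n + self.pseudocount * len(self.alphabet)
            for i, base in enumerate(seq):
                score += math.log((self.counts[c][i][base] + self.pseudocount) / denom)
            scores[c] = score
        return scores

    def predict(self, seq):
        scores = self.log_posteriors(seq)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up training data: internally primed sites tend to be followed by
    # A-rich genomic sequence, so the toy "internally_primed" class is A-rich.
    train = ["GTCTGA", "CTGTCT", "TGACTG", "AAAAAA", "AAAGAA", "AAAAGA"]
    labels = ["true"] * 3 + ["internally_primed"] * 3
    model = PositionalNaiveBayes().fit(train, labels)
    print(model.predict("AACAAA"))
    print(model.predict("GTCTGT"))
```

A real classifier would be trained on many labelled sites and richer sequence features, but the decision rule (pick the class with the highest posterior) is the same.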
(Collaborated with Dr. Lawson and Dr. Green)
ChIPpeakAnno is a Bioconductor package within the statistical programming environment R that facilitates batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiment resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table or a pie chart, or plotted in histogram form, i.e., as the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or between binding sites of transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. For flexibility, the package also allows users to pass their own annotation data, such as different ChIP experiments or datasets from the literature, or to use existing annotation packages such as GenomicFeatures and BSgenome. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
Zhu LJ*, Gazin C, Lawson ND, Pages H, Lin SM, Lapointe DS and Green MR. (2010) [* denotes corresponding author] ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010, 11:237.
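The basic annotation step, finding the nearest gene for each peak and the distance to it, can be sketched in Python as follows. This is an illustration of the idea rather than the ChIPpeakAnno interface: the function name `annotate_nearest_tss` is hypothetical, distances are measured from the peak midpoint to the transcription start site (TSS), and strand is ignored for brevity.

```python
from bisect import bisect_left


def annotate_nearest_tss(peaks, tss_by_gene):
    """Annotate each peak with its nearest TSS on the same chromosome.

    peaks: list of (chrom, start, end)
    tss_by_gene: dict mapping gene -> (chrom, tss_position)
    Returns a list of (chrom, start, end, gene, distance), where distance is
    peak midpoint minus TSS position; gene and distance are None when the
    chromosome has no annotated TSS.
    """
    by_chrom = {}
    for gene, (chrom, pos) in tss_by_gene.items():
        by_chrom.setdefault(chrom, []).append((pos, gene))
    for sites in by_chrom.values():
        sites.sort()

    annotated = []
    for chrom, start, end in peaks:
        sites = by_chrom.get(chrom, [])
        if not sites:
            annotated.append((chrom, start, end, None, None))
            continue
        mid = (start + end) // 2
        i = bisect_left(sites, (mid, ""))
        candidates = sites[max(i - 1, 0): i + 1]  # TSS on either side of the peak
        pos, gene = min(candidates, key=lambda s: abs(s[0] - mid))
        annotated.append((chrom, start, end, gene, mid - pos))
    return annotated


if __name__ == "__main__":
    # Toy coordinates and placeholder gene names, for illustration only.
    toy_tss = {"geneA": ("chr1", 1000), "geneB": ("chr1", 5000), "geneC": ("chr2", 300)}
    toy_peaks = [("chr1", 1100, 1300), ("chr1", 4000, 4400),
                 ("chr2", 100, 200), ("chr3", 1, 10)]
    for row in annotate_nearest_tss(toy_peaks, toy_tss):
        print(row)
```

The list of distances returned this way is what one would histogram to see how peaks are distributed around their nearest genes.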