Latest update: 2022-09-17
This webpage shows additional results of NeuronMotif.
The code of NeuronMotif is available on GitHub:
https://github.com/wzthu/NeuronMotif
If you have any questions about using NeuronMotif or downloading data, please feel free to contact:
Zheng Wei, weiz(at)tsinghua.edu.cn
Department of Automation, Tsinghua University
Cite: Wei, Zheng, et al. “NeuronMotif: Deciphering transcriptional cis-regulatory codes from deep neural networks.” bioRxiv (2021). doi:10.1101/2021.02.10.430606
NeuronMotif is an algorithm that converts the model weights of a well-trained Convolutional Neural Network (CNN) into a motif grammar, consisting of a motif dictionary and motif syntax (Figure I). NeuronMotif does not depend on any known positive sequence samples or other prior information; users only need to provide the architecture and the weights of the CNN.
NeuronMotif Algorithm
Input: convolutional neural network
Output: cis-regulatory grammar (glossary and syntax)
Figure I. The goal of NeuronMotif
In this work, we use two datasets, from DeepSEA [1] and Basset [2], and train four models:
Trained by DeepSEA dataset:
DeepSEA
DD-10
Trained by Basset dataset:
Basset
BD-10
Here, we show some examples of the motifs decoupled from these models. See the next section for details.
Download the PPMs of the CRMs and their visualization results for all convolutional neurons, or the footprinting results of the layer-10 neurons in the BD-10 and DD-10 models:
Model | Dataset | PPM | Visualized result | Footprinting result |
---|---|---|---|---|
DeepSEA | DeepSEA | link | link | NA |
DD-10 | DeepSEA | link | link | link |
Basset | Basset | link | link | NA |
BD-10 | Basset | link | link | link |
Files of the form xxx.tar.gz can be decompressed with tar -xzvf xxx.tar.gz
The following subsections give brief information about each of the four models, together with their NeuronMotif visualization results.
This dataset is obtained from the work:
Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.
Input is a \(1000\times4\) one-hot encoding of a DNA sequence.
Output is a 919-dimensional Boolean label vector marking whether the DNA sequence overlaps with each of 919 types of ChIP-seq (TF and histone mark) and DNase-seq peaks from different cell types.
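For reference, one-hot encoding a DNA sequence can be sketched as follows (a minimal illustration, not the code used to build the dataset):

```python
import numpy as np

# Map A/C/G/T to the four channels of the one-hot code
BASE_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix; N and other
    ambiguous bases become all-zero rows."""
    code = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASE_INDEX.get(base)
        if j is not None:
            code[i, j] = 1.0
    return code

x = one_hot('ACGTN')
print(x.shape)        # (5, 4)
print(x.sum(axis=1))  # [1. 1. 1. 1. 0.]
```

A full \(1000\times4\) input is just this encoding applied to a 1000-bp sequence.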
The model structure is:
Convolutional layer \(kernel\_320 \times size\_8\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_480 \times size\_8\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_960 \times size\_8\)
from tensorflow.keras.layers import (Input, Conv1D, Activation, MaxPooling1D,
                                     Dropout, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.activations import relu

input_bp = 1000          # input DNA sequence length (bp)
conv_kernel_size = 8
pool_kernel_size = 4
batch_size = 16

maxnorm = MaxNorm(max_value=0.9, axis=0)  # kernel weight constraint
l1l2 = l1_l2(l1=0, l2=1e-6)               # L2 weight regularizer

def crelu(x, alpha=0.0, max_value=None, threshold=1e-6):
    # ReLU with a small activation threshold
    return relu(x, alpha=alpha, max_value=max_value, threshold=threshold)

seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(320, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seqInput)
seq = Activation(crelu)(seq)
seq = MaxPooling1D(pool_size=pool_kernel_size, strides=pool_kernel_size)(seq)
seq = Dropout(0.2)(seq)
seq = Conv1D(480, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = MaxPooling1D(pool_size=pool_kernel_size, strides=pool_kernel_size)(seq)
seq = Dropout(0.2)(seq)
seq = Conv1D(960, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = Dropout(0.5)(seq)
seq = Flatten()(seq)
seq = Dense(925, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = Dense(919, kernel_regularizer=l1l2, kernel_constraint=maxnorm,
            activity_regularizer=l1_l2(l1=1e-8, l2=0))(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
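As a sanity check on the architecture above, the sequence-length bookkeeping through the 'valid' convolutions and pooling layers can be traced with a few lines of arithmetic (a sketch independent of Keras):

```python
def conv_len(n, k):
    # output length of a 'valid' 1D convolution with kernel size k
    return n - k + 1

def pool_len(n, k):
    # output length of max-pooling with pool_size == strides == k
    return (n - k) // k + 1

n = 1000             # input length in bp
n = conv_len(n, 8)   # conv 320x8 -> 993
n = pool_len(n, 4)   # pool 4     -> 248
n = conv_len(n, 8)   # conv 480x8 -> 241
n = pool_len(n, 4)   # pool 4     -> 60
n = conv_len(n, 8)   # conv 960x8 -> 53
print(n, n * 960)    # 53 positions x 960 kernels = 50880 flattened units
```

The 50880 flattened units then feed the 925-unit dense layer.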
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 320 | up to 1 PWM/neuron | link |
2 | 1 | 480 | up to 4 PWMs/neuron | link |
3 | 1 | 960 | up to 16 PWMs/neuron | link |
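The "up to N PWMs/neuron" bounds above follow from the max-pooling layers: each pooling layer of size \(k\) multiplies by \(k\) the number of alternative motif alignments a deeper neuron can mix, so the bound for a layer is the product of the pooling sizes below it. A quick sketch of that bookkeeping (our reading of the table, not code from the NeuronMotif repository):

```python
import math

# For each convolutional layer of the DeepSEA model, the pooling
# sizes applied before that layer
pools_before = {1: [], 2: [4], 3: [4, 4]}

# math.prod([]) == 1, so layer 1 gets a bound of one PWM per neuron
bounds = {layer: math.prod(sizes) for layer, sizes in pools_before.items()}
print(bounds)  # {1: 1, 2: 4, 3: 16} -- matches the table above
```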
The model structure is (two convolutional layers per max-pooling stage, as in the code below):
Convolutional layers \(kernel\_128 \times size\_7\), \(kernel\_128 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_160 \times size\_3\), \(kernel\_160 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_256 \times size\_3\), \(kernel\_320 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_512 \times size\_3\), \(kernel\_640 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_1024 \times size\_3\), \(kernel\_1280 \times size\_3\)
Max-pooling layer \(size\_2\)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 1000
seqInput = Input(shape=(input_bp, 4), name='seqInput')
# stage 1
seq = Conv1D(128, 7)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 2
seq = Conv1D(160, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(160, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 3
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(320, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 4
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(640, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 5
seq = Conv1D(1024, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(1280, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# classification head
seq = Flatten()(seq)
seq = Dense(925)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.2)(seq)
seq = Dense(919)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 64 | up to 1 PWM/neuron | link |
2 | 1 | 80 | up to 1 PWM/neuron | link |
3 | 1 | 128 | up to 2 PWMs/neuron | link |
4 | 1 | 160 | up to 2 PWMs/neuron | link |
5 | 1 | 256 | up to 4 PWMs/neuron | link |
6 | 1 | 320 | up to 4 PWMs/neuron | link |
7 | 1 | 512 | up to 8 PWMs/neuron | link |
8 | 1 | 640 | up to 8 PWMs/neuron | link |
9 | 1 | 1024 | up to 16 PWMs/neuron | link |
10 | 2 | 1280 | up to 256 PWMs/neuron | link |
This dataset is obtained from the work:
Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.
Input is a \(600\times4\) one-hot encoding of a DNA sequence.
Output is a 164-dimensional Boolean label vector marking whether the DNA sequence overlaps with each of 164 types of DNase-seq peaks from different cell types.
The model structure is:
Convolutional layer \(kernel\_300 \times size\_19\)
Max-pooling layer \(size\_3\)
Convolutional layer \(kernel\_200 \times size\_11\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_200 \times size\_7\)
Max-pooling layer \(size\_4\)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 600
seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(300, 19)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=3)(seq)
seq = Conv1D(200, 11)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=4)(seq)
seq = Conv1D(200, 7)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=4)(seq)
seq = Flatten()(seq)
seq = Dense(1000)(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.3)(seq)
seq = Dense(1000)(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.3)(seq)
seq = Dense(164)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 300 | up to 1 PWM/neuron | link |
2 | 1 | 200 | up to 3 PWMs/neuron | link |
3 | 1 | 200 | up to 12 PWMs/neuron | link |
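Each convolutional neuron sees a fixed-length window of input DNA, so the DNA span a single CRM can cover is bounded by the neuron's receptive field, which follows from the kernel and pooling sizes. A minimal sketch for the Basset layers above (our own bookkeeping, not code from the NeuronMotif repository):

```python
def receptive_field(layers):
    """layers: list of ('conv', k) or ('pool', k), with stride == k for pools.
    Returns the receptive field size (in bp) after each conv layer."""
    rf, jump, out = 1, 1, []
    for kind, k in layers:
        rf += (k - 1) * jump       # widen by (k-1) input-space steps
        if kind == 'pool':
            jump *= k              # pooling stride multiplies the step size
        else:
            out.append(rf)
    return out

basset = [('conv', 19), ('pool', 3), ('conv', 11), ('pool', 4), ('conv', 7)]
print(receptive_field(basset))  # [19, 51, 132] bp for layers 1-3
```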
The model structure is (two convolutional layers per max-pooling stage, as in the code below):
Convolutional layers \(kernel\_64 \times size\_7\), \(kernel\_64 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_128 \times size\_3\), \(kernel\_128 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_256 \times size\_3\), \(kernel\_256 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_384 \times size\_3\), \(kernel\_384 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_512 \times size\_3\), \(kernel\_512 \times size\_3\)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 600
seqInput = Input(shape=(input_bp, 4), name='seqInput')
# stage 1
seq = Conv1D(64, 7)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(64, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 2
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 3
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 4
seq = Conv1D(384, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(384, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 5
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
# classification head
seq = Flatten()(seq)
seq = Dense(1024)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.2)(seq)
seq = Dense(164)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 64 | up to 1 PWM/neuron | link |
2 | 1 | 64 | up to 1 PWM/neuron | link |
3 | 1 | 128 | up to 2 PWMs/neuron | link |
4 | 1 | 128 | up to 2 PWMs/neuron | link |
5 | 1 | 256 | up to 4 PWMs/neuron | link |
6 | 1 | 256 | up to 4 PWMs/neuron | link |
7 | 1 | 384 | up to 8 PWMs/neuron | link |
8 | 1 | 384 | up to 8 PWMs/neuron | link |
9 | 1 | 512 | up to 16 PWMs/neuron | link |
10 | 2 | 512 | up to 256 PWMs/neuron | link |
[1] Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.
[2] Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.
[3] Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. “Learning important features through propagating activation differences.” International Conference on Machine Learning. PMLR, 2017.
[4] Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. “Deep inside convolutional networks: Visualising image classification models and saliency maps.” arXiv preprint arXiv:1312.6034 (2013).