Latest update: 2022-09-17
This webpage shows additional results of NeuronMotif.
The code of NeuronMotif is available on GitHub:
https://github.com/wzthu/NeuronMotif
If you have any questions about using NeuronMotif or downloading data, please feel free to contact:
Zheng Wei, weiz(at)tsinghua.edu.cn
Department of Automation, Tsinghua University
Cite: Wei, Zheng, et al. “NeuronMotif: Deciphering transcriptional cis-regulatory codes from deep neural networks.” bioRxiv (2021). doi:10.1101/2021.02.10.430606
NeuronMotif is an algorithm that converts the model weights of a well-trained Convolutional Neural Network (CNN) into a motif grammar, consisting of a motif dictionary and motif syntax (Figure I). NeuronMotif does not depend on any known positive sequence samples or other prior information; users only need to provide the architecture and the weights of the CNN.
NeuronMotif Algorithm
Input: convolutional neural network
Output: cis-regulatory grammar (glossary and syntax)
Figure I. The goal of NeuronMotif
In this work, we use two datasets, from DeepSEA [1] and Basset [2], and train four models:
Trained by DeepSEA dataset:
DeepSEA
DD-10
Trained by Basset dataset:
Basset
BD-10
Here, we show some examples of the motifs decoupled from these models. See the next section for details.
Download the PPMs of the CRMs and their visualization results for all convolutional neurons, or the footprinting results of the layer-10 neurons in the BD-10 and DD-10 models:
Model | Dataset | PPM | Visualized result | Footprinting result |
---|---|---|---|---|
DeepSEA | DeepSEA | link | link | NA |
DD-10 | DeepSEA | link | link | link |
Basset | Basset | link | link | NA |
BD-10 | Basset | link | link | link |
Files of the form xxx.tar.gz can be decompressed with tar -xzvf xxx.tar.gz
The following subsections give brief information about each of the four models, together with their NeuronMotif visualization results.
This dataset is obtained from the work:
Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.
Input is a \(1000\times4\) one-hot encoding of a DNA sequence.
Output is a 919-dimensional Boolean label vector marking whether the DNA sequence overlaps with each of 919 types of ChIP-seq (TF and histone mark) and DNase-seq peaks from different cell types.
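For reference, one-hot encoding a DNA sequence can be sketched as follows (a minimal illustration, not the code used to build the dataset):

```python
import numpy as np

# Map A/C/G/T to the four channels of the one-hot code
BASE_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix; N and other
    ambiguous bases become all-zero rows."""
    code = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASE_INDEX.get(base)
        if j is not None:
            code[i, j] = 1.0
    return code

x = one_hot('ACGTN')
print(x.shape)        # (5, 4)
print(x.sum(axis=1))  # [1. 1. 1. 1. 0.]
```

A full \(1000\times4\) input is just this encoding applied to a 1000-bp sequence.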
The model structure is:
Convolutional layer \(kernel\_320 \times size\_8\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_480 \times size\_8\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_960 \times size\_8\)
from tensorflow.keras.layers import (Input, Conv1D, Activation, MaxPooling1D,
                                     Dropout, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.activations import relu

input_bp = 1000          # input DNA sequence length (bp)
conv_kernel_size = 8
pool_kernel_size = 4
batch_size = 16

maxnorm = MaxNorm(max_value=0.9, axis=0)  # kernel weight constraint
l1l2 = l1_l2(l1=0, l2=1e-6)               # L2 weight regularizer

def crelu(x, alpha=0.0, max_value=None, threshold=1e-6):
    # ReLU with a small activation threshold
    return relu(x, alpha=alpha, max_value=max_value, threshold=threshold)

seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(320, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seqInput)
seq = Activation(crelu)(seq)
seq = MaxPooling1D(pool_size=pool_kernel_size, strides=pool_kernel_size)(seq)
seq = Dropout(0.2)(seq)
seq = Conv1D(480, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = MaxPooling1D(pool_size=pool_kernel_size, strides=pool_kernel_size)(seq)
seq = Dropout(0.2)(seq)
seq = Conv1D(960, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = Dropout(0.5)(seq)
seq = Flatten()(seq)
seq = Dense(925, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = Dense(919, kernel_regularizer=l1l2, kernel_constraint=maxnorm,
            activity_regularizer=l1_l2(l1=1e-8, l2=0))(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
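As a sanity check on the architecture above, the sequence-length bookkeeping through the 'valid' convolutions and pooling layers can be traced with a few lines of arithmetic (a sketch independent of Keras):

```python
def conv_len(n, k):
    # output length of a 'valid' 1D convolution with kernel size k
    return n - k + 1

def pool_len(n, k):
    # output length of max-pooling with pool_size == strides == k
    return (n - k) // k + 1

n = 1000             # input length in bp
n = conv_len(n, 8)   # conv 320x8 -> 993
n = pool_len(n, 4)   # pool 4     -> 248
n = conv_len(n, 8)   # conv 480x8 -> 241
n = pool_len(n, 4)   # pool 4     -> 60
n = conv_len(n, 8)   # conv 960x8 -> 53
print(n, n * 960)    # 53 positions x 960 kernels = 50880 flattened units
```

The 50880 flattened units then feed the 925-unit dense layer.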
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 320 | up to 1 PWM/neuron | link |
2 | 1 | 480 | up to 4 PWMs/neuron | link |
3 | 1 | 960 | up to 16 PWMs/neuron | link |
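The "up to N PWMs/neuron" bounds above follow from the max-pooling layers: each pooling layer of size \(k\) multiplies by \(k\) the number of alternative motif alignments a deeper neuron can mix, so the bound for a layer is the product of the pooling sizes below it. A quick sketch of that bookkeeping (our reading of the table, not code from the NeuronMotif repository):

```python
import math

# For each convolutional layer of the DeepSEA model, the pooling
# sizes applied before that layer
pools_before = {1: [], 2: [4], 3: [4, 4]}

# math.prod([]) == 1, so layer 1 gets a bound of one PWM per neuron
bounds = {layer: math.prod(sizes) for layer, sizes in pools_before.items()}
print(bounds)  # {1: 1, 2: 4, 3: 16} -- matches the table above
```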
The model structure is (two convolutional layers per max-pooling stage, as in the code below):
Convolutional layers \(kernel\_128 \times size\_7\), \(kernel\_128 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_160 \times size\_3\), \(kernel\_160 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_256 \times size\_3\), \(kernel\_320 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_512 \times size\_3\), \(kernel\_640 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_1024 \times size\_3\), \(kernel\_1280 \times size\_3\)
Max-pooling layer \(size\_2\)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 1000
seqInput = Input(shape=(input_bp, 4), name='seqInput')
# stage 1
seq = Conv1D(128, 7)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 2
seq = Conv1D(160, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(160, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 3
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(320, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 4
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(640, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 5
seq = Conv1D(1024, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(1280, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# classification head
seq = Flatten()(seq)
seq = Dense(925)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.2)(seq)
seq = Dense(919)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 64 | up to 1 PWM/neuron | link |
2 | 1 | 80 | up to 1 PWM/neuron | link |
3 | 1 | 128 | up to 2 PWMs/neuron | link |
4 | 1 | 160 | up to 2 PWMs/neuron | link |
5 | 1 | 256 | up to 4 PWMs/neuron | link |
6 | 1 | 320 | up to 4 PWMs/neuron | link |
7 | 1 | 512 | up to 8 PWMs/neuron | link |
8 | 1 | 640 | up to 8 PWMs/neuron | link |
9 | 1 | 1024 | up to 16 PWMs/neuron | link |
10 | 2 | 1280 | up to 256 PWMs/neuron | link |
This dataset is obtained from the work:
Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.
Input is a \(600\times4\) one-hot encoding of a DNA sequence.
Output is a 164-dimensional Boolean label vector marking whether the DNA sequence overlaps with each of 164 types of DNase-seq peaks from different cell types.
The model structure is:
Convolutional layer \(kernel\_300 \times size\_19\)
Max-pooling layer \(size\_3\)
Convolutional layer \(kernel\_200 \times size\_11\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_200 \times size\_7\)
Max-pooling layer \(size\_4\)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 600
seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(300, 19)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=3)(seq)
seq = Conv1D(200, 11)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=4)(seq)
seq = Conv1D(200, 7)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=4)(seq)
seq = Flatten()(seq)
seq = Dense(1000)(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.3)(seq)
seq = Dense(1000)(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.3)(seq)
seq = Dense(164)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 300 | up to 1 PWM/neuron | link |
2 | 1 | 200 | up to 3 PWMs/neuron | link |
3 | 1 | 200 | up to 12 PWMs/neuron | link |
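Each convolutional neuron sees a fixed-length window of input DNA, so the DNA span a single CRM can cover is bounded by the neuron's receptive field, which follows from the kernel and pooling sizes. A minimal sketch for the Basset layers above (our own bookkeeping, not code from the NeuronMotif repository):

```python
def receptive_field(layers):
    """layers: list of ('conv', k) or ('pool', k), with stride == k for pools.
    Returns the receptive field size (in bp) after each conv layer."""
    rf, jump, out = 1, 1, []
    for kind, k in layers:
        rf += (k - 1) * jump       # widen by (k-1) input-space steps
        if kind == 'pool':
            jump *= k              # pooling stride multiplies the step size
        else:
            out.append(rf)
    return out

basset = [('conv', 19), ('pool', 3), ('conv', 11), ('pool', 4), ('conv', 7)]
print(receptive_field(basset))  # [19, 51, 132] bp for layers 1-3
```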
The model structure is (two convolutional layers per max-pooling stage, as in the code below):
Convolutional layers \(kernel\_64 \times size\_7\), \(kernel\_64 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_128 \times size\_3\), \(kernel\_128 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_256 \times size\_3\), \(kernel\_256 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_384 \times size\_3\), \(kernel\_384 \times size\_3\)
Max-pooling layer \(size\_2\)
Convolutional layers \(kernel\_512 \times size\_3\), \(kernel\_512 \times size\_3\)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 600
seqInput = Input(shape=(input_bp, 4), name='seqInput')
# stage 1
seq = Conv1D(64, 7)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(64, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 2
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 3
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 4
seq = Conv1D(384, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(384, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
# stage 5
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
# classification head
seq = Flatten()(seq)
seq = Dense(1024)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.2)(seq)
seq = Dense(164)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN.
Layer | Decouples | Neurons | CRMs | Links |
---|---|---|---|---|
1 | 1 | 64 | up to 1 PWM/neuron | link |
2 | 1 | 64 | up to 1 PWM/neuron | link |
3 | 1 | 128 | up to 2 PWMs/neuron | link |
4 | 1 | 128 | up to 2 PWMs/neuron | link |
5 | 1 | 256 | up to 4 PWMs/neuron | link |
6 | 1 | 256 | up to 4 PWMs/neuron | link |
7 | 1 | 384 | up to 8 PWMs/neuron | link |
8 | 1 | 384 | up to 8 PWMs/neuron | link |
9 | 1 | 512 | up to 16 PWMs/neuron | link |
10 | 2 | 512 | up to 256 PWMs/neuron | link |
[1] Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.
[2] Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.
[3] Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. “Learning important features through propagating activation differences.” International Conference on Machine Learning. PMLR, 2017.
[4] Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. “Deep inside convolutional networks: Visualising image classification models and saliency maps.” arXiv preprint arXiv:1312.6034 (2013).