I am trying out a recent arXiv paper called "Factorized CNN", which argues that a spatially separated convolution (depthwise convolution) combined with a channel-wise linear projection (1x1 convolution) can speed up the convolution operation.

This is the conv layer architecture shown in the paper's figure. I found that this architecture can be implemented with tf.nn.depthwise_conv2d plus a 1x1 convolution, or with tf.nn.separable_conv2d.

Below is my implementation:

    import numpy as np
    import tensorflow as tf

    # conv filter for the depthwise convolution
    depthwise_filter = tf.get_variable("depth_conv_w", [3, 3, 64, 1],
        initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9 / 32)))

    # conv filter for the linear channel projection (1x1 convolution)
    pointwise_filter = tf.get_variable("point_conv_w", [1, 1, 64, 64],
        initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 1 / 64)))

    conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))

    # depthwise convolution, channel multiplier 1
    conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter,
        [1, 1, 1, 1], padding='SAME'))
    # ...
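As an aside, here is a minimal sketch of what I believe the equivalent fused call would look like: tf.nn.separable_conv2d performs the depthwise step and the 1x1 projection in a single op, using the same filter variables as above. Note that the fused op applies no nonlinearity between the depthwise and pointwise steps, unlike my pipeline above, which has a ReLU right after the depthwise convolution; the bias_add/ReLU at the end is my own assumption about how the block finishes, not something taken from the paper.

    # minimal sketch, assuming `tensor` has 64 channels as in the snippet above
    sep_tensor = tf.nn.separable_conv2d(tensor, depthwise_filter, pointwise_filter,
        strides=[1, 1, 1, 1], padding='SAME')
    # bias + ReLU at the end of the block (my assumption, not from the paper)
    sep_tensor = tf.nn.relu(tf.nn.bias_add(sep_tensor, conv_b))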