{"id":416,"date":"2023-07-11T03:58:03","date_gmt":"2023-07-11T03:58:03","guid":{"rendered":"https:\/\/themenectar.com\/salient\/mag\/?p=416"},"modified":"2026-03-27T09:16:37","modified_gmt":"2026-03-27T09:16:37","slug":"convolutions-to-cnns-deep-learning","status":"publish","type":"post","link":"https:\/\/curriculo.me\/engineering\/convolutions-to-cnns-deep-learning\/","title":{"rendered":"Convolutions to CNNs: From Array Operations to Deep Learning"},"content":{"rendered":"[vc_row type=&#8221;in_container&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; scene_position=&#8221;center&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; row_position_desktop=&#8221;default&#8221; row_position_tablet=&#8221;inherit&#8221; row_position_phone=&#8221;inherit&#8221; overflow=&#8221;visible&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; flex_gap_desktop=&#8221;10px&#8221; column_element_direction_desktop=&#8221;default&#8221; column_element_spacing=&#8221;default&#8221; desktop_text_alignment=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; background_color_opacity=&#8221;1&#8243; background_hover_color_opacity=&#8221;1&#8243; column_backdrop_filter=&#8221;none&#8221; column_shadow=&#8221;none&#8221; column_border_radius=&#8221;none&#8221; column_link_target=&#8221;_self&#8221; column_position=&#8221;default&#8221; 
gradient_direction=&#8221;left_to_right&#8221; overlay_strength=&#8221;0.3&#8243; width=&#8221;1\/1&#8243; tablet_width_inherit=&#8221;default&#8221; animation_type=&#8221;default&#8221; bg_image_animation=&#8221;none&#8221; border_type=&#8221;simple&#8221; column_border_width=&#8221;none&#8221; column_border_style=&#8221;solid&#8221;][vc_column_text]<p>In application areas such as signal processing, image processing, and computer vision, convolution is often applied as a filter that transforms signals and pixels into more desirable values.<\/p>\n<blockquote><p>In high-performance computing, the convolution pattern is often referred to as stencil computation. Convolution typically involves a significant number of arithmetic operations on each data element. Each output data element can be calculated independently of the others, a desirable trait for parallel computing. On the other hand, there is a substantial level of input data sharing among output data elements, with somewhat challenging boundary conditions. This makes convolution an important use case for sophisticated tiling and input data staging methods.<\/p><\/blockquote>\n<p>Convolution is an array operation where each output data element is a weighted sum of a collection of neighboring input elements. The weights used in the weighted sum calculation are defined by an input mask array, commonly referred to as the convolution kernel. We will refer to these mask arrays as convolution masks. 
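<\/p>
<p>As a concrete illustration, the weighted-sum definition above can be sketched in a few lines of plain Python (the <code>convolve_1d<\/code> helper below is ours for illustration, not a library routine); input elements that fall outside the array boundary are treated as zero:<\/p>

```python
def convolve_1d(N, M):
    '''1D convolution of input array N with mask array M.
    Each output element P[i] is a weighted sum of the neighbors of N[i],
    with the weights given by M; out-of-range neighbors count as 0.'''
    n, center = len(N), len(M) // 2
    P = [0] * n
    for i in range(n):
        total = 0
        for j in range(len(M)):
            k = i + j - center   # index of the neighboring input element
            if 0 <= k < n:       # skip ghost elements outside the boundary
                total += N[k] * M[j]
        P[i] = total
    return P

# A 5-element mask applied to a 7-element input, mirroring the 1D example below.
print(convolve_1d([1, 2, 3, 4, 5, 6, 7], [3, 4, 5, 4, 3]))
```

<p>Note that the output array has the same length as the input; near the boundaries, fewer real elements contribute to the sum.<\/p>
<p>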
The same convolution mask is typically used for all elements of the array.<\/p>\n<p>Shown below is a 1D convolution example in which a 5-element convolution mask array M is applied to a 7-element input array N.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" style=\"max-width: 100%; height: auto;\" src=\"https:\/\/dkiuwylqmmslisywtcql.supabase.co\/storage\/v1\/object\/public\/blog-images\/content\/blog-content-1766254210995-q3g5vr2nw7.png\" alt=\"1D Convolution\" \/><\/figure>\n<p>For image processing and computer vision, input data are typically two-dimensional arrays, with pixels in an x-y space. Image convolutions are therefore 2D convolutions. The mask does not have to be a square array.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" style=\"max-width: 100%; height: auto;\" src=\"https:\/\/dkiuwylqmmslisywtcql.supabase.co\/storage\/v1\/object\/public\/blog-images\/content\/blog-content-1766254556598-oi9jpg7nrb.png\" alt=\"2D Convolution\" \/><\/figure>\n<hr \/>\n<h2>1. From 1D to 2D convolution<\/h2>\n<p>You know 1D convolution:<\/p>\n<ul>\n<li><strong>Input:<\/strong> 1D array (e.g., <code>[x0, x1, x2, ...]<\/code>)<\/li>\n<li><strong>Kernel:<\/strong> small 1D array (e.g., <code>[w0, w1, w2]<\/code>)<\/li>\n<li><strong>Operation:<\/strong> slide the kernel and take weighted sums.<\/li>\n<\/ul>\n<p>For <strong>images<\/strong>, the input is 2D (height \u00d7 width), so:<\/p>\n<ul>\n<li><strong>Input:<\/strong> matrix of pixels, shape <code>H \u00d7 W<\/code><\/li>\n<li><strong>Kernel:<\/strong> small matrix, e.g. 
<code>3 \u00d7 3<\/code> or <code>5 \u00d7 5<\/code><\/li>\n<li><strong>Operation:<\/strong> place the kernel over each <code>3 \u00d7 3<\/code> patch of the image, multiply elementwise, sum \u2192 one output pixel.<\/li>\n<\/ul>\n<p>This is just the same idea you already know, but:<\/p>\n<ul>\n<li>Array \u2192 2D array (matrix).<\/li>\n<li>Neighboring \u201celements\u201d \u2192 neighboring pixels in 2D.<\/li>\n<\/ul>\n<hr \/>\n<h2>2. Multiple channels (e.g., RGB)<\/h2>\n<p>Real images are not just 2D; they often have <strong>channels<\/strong>:<\/p>\n<ul>\n<li>Shape: <code>H \u00d7 W \u00d7 C<\/code> (e.g., <code>C = 3<\/code> for R, G, B).<\/li>\n<li>Then a <strong>kernel<\/strong> also has depth <code>C<\/code>:\n<ul>\n<li>Shape: <code>kH \u00d7 kW \u00d7 C<\/code>.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Convolution step:<\/p>\n<ul>\n<li>For one position, you take a small cube: <code>kH \u00d7 kW \u00d7 C<\/code>.<\/li>\n<li>Multiply elementwise with the kernel <code>kH \u00d7 kW \u00d7 C<\/code>.<\/li>\n<li>Sum all values \u2192 1 number (one output channel value at that location).<\/li>\n<\/ul>\n<p>So conceptually: <strong>3D convolution over space + channels<\/strong>, but still just \u201cweighted sum of neighbors\u201d.<\/p>\n<hr \/>\n<h2>3. 
Filters (kernels) and feature maps<\/h2>\n<p>In CNNs, you don\u2019t use just one kernel; you use <strong>many<\/strong>:<\/p>\n<ul>\n<li>Suppose you use <code>F<\/code> different kernels.<\/li>\n<li>Each kernel has shape <code>kH \u00d7 kW \u00d7 C_in<\/code>.<\/li>\n<li>Each kernel produces <strong>one output channel<\/strong>, called a <strong>feature map<\/strong>.<\/li>\n<\/ul>\n<p>So if the input shape is:<\/p>\n<ul>\n<li><code>H \u00d7 W \u00d7 C_in<\/code>,<\/li>\n<\/ul>\n<p>and you use <code>F<\/code> kernels, the output shape becomes:<\/p>\n<ul>\n<li><code>H_out \u00d7 W_out \u00d7 F<\/code>.<\/li>\n<\/ul>\n<p>Each of those <code>F<\/code> output channels captures a different kind of pattern:<\/p>\n<ul>\n<li>One kernel may learn to detect <strong>edges<\/strong>, another <strong>corners<\/strong>, another <strong>textures<\/strong>, etc.<\/li>\n<\/ul>\n<p>Think:<br \/>\n<strong>\u201cA CNN layer = apply many different convolutions (filters) over the input.\u201d<\/strong><\/p>\n<hr \/>\n<h2>4. Stride and padding<\/h2>\n<p>Two more array-related details:<\/p>\n<ul>\n<li><strong>Stride:<\/strong> how far you move the kernel each step.\n<ul>\n<li>Stride 1: move 1 pixel at a time \u2192 dense output.<\/li>\n<li>Stride 2: move 2 pixels at a time \u2192 roughly half-size output.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Padding:<\/strong> add zeros around the border of the image.\n<ul>\n<li>Without padding, the output gets smaller.<\/li>\n<li>With padding, you can keep the same height\/width.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Mathematically, it\u2019s still just sliding windows and dot products, but with control over:<\/p>\n<ul>\n<li>How far you slide (stride).<\/li>\n<li>Whether you lose border information (padding).<\/li>\n<\/ul>\n<hr \/>\n<h2>5. Nonlinearity: adding activation functions<\/h2>\n<p>So far it\u2019s all <strong>linear<\/strong>: weighted sums. 
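<\/p>
<p>Before adding the nonlinearity, the purely linear machinery described so far (sliding windows, stride, and zero padding) can be sketched in plain Python. The <code>conv2d<\/code> helper below is illustrative only, single channel and dependency-free; note how the output size follows <code>H_out = (H + 2 * padding - kH) \/\/ stride + 1<\/code>:<\/p>

```python
def conv2d(image, kernel, stride=1, padding=0):
    '''Single-channel 2D convolution: slide the kernel over the image,
    multiply elementwise, and sum. Zeros are padded around the border;
    stride controls how far the window moves at each step.'''
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    # Zero-pad the image on all four sides.
    pH, pW = H + 2 * padding, W + 2 * padding
    padded = [[0] * pW for _ in range(pH)]
    for r in range(H):
        for c in range(W):
            padded[r + padding][c + padding] = image[r][c]
    # Output size: (H + 2 * padding - kH) // stride + 1, same for width.
    H_out = (pH - kH) // stride + 1
    W_out = (pW - kW) // stride + 1
    out = [[0] * W_out for _ in range(H_out)]
    for i in range(H_out):
        for j in range(W_out):
            acc = 0
            for a in range(kH):
                for b in range(kW):
                    acc += padded[i * stride + a][j * stride + b] * kernel[a][b]
            out[i][j] = acc
    return out
```

<p>With a <code>3 \u00d7 3<\/code> kernel, <code>padding=1<\/code> keeps the height and width unchanged, while <code>stride=2<\/code> roughly halves them.<\/p>
<p>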
A CNN layer adds <strong>nonlinearity<\/strong>:<\/p>\n<ul>\n<li>After convolution, apply an activation function elementwise<br \/>\n(e.g., <code>ReLU(x) = max(0, x)<\/code>).<\/li>\n<\/ul>\n<p>So a single <strong>conv layer<\/strong> is:<\/p>\n<ol>\n<li>Convolution (linear weighted sums).<\/li>\n<li>Activation (nonlinear).<\/li>\n<\/ol>\n<p>This combination lets CNNs learn complex, non-linear mappings.<\/p>\n<hr \/>\n<h2>6. Stacking layers \u2192 deep feature hierarchies<\/h2>\n<p>A full CNN is just <strong>many layers<\/strong> of:<\/p>\n<ul>\n<li>Convolution \u2192 activation (\u2192 sometimes pooling).<\/li>\n<\/ul>\n<p>Intuition:<\/p>\n<ul>\n<li><strong>Early layers<\/strong>: detect simple things (edges, blobs).<\/li>\n<li><strong>Middle layers<\/strong>: combine them into textures, shapes.<\/li>\n<li><strong>Later layers<\/strong>: combine shapes into object parts and objects.<\/li>\n<\/ul>\n<p>All of this is still built from the same primitive you understand:<br \/>\n<strong>\u201ctake local neighborhoods and compute weighted sums.\u201d<\/strong><\/p>\n<hr \/>\n<h2>7. 
Connection to fully connected (dense) layers<\/h2>\n<p>A <strong>fully connected layer<\/strong> is also just weighted sums:<\/p>\n<ul>\n<li>Input: 1D vector.<\/li>\n<li>Output unit = sum of (weight \u00d7 input) + bias.<\/li>\n<\/ul>\n<p>Difference:<\/p>\n<ul>\n<li>Dense layer: every output unit uses <strong>all<\/strong> input elements.<\/li>\n<li>Conv layer: each output unit uses only a <strong>local neighborhood<\/strong> (sparse connectivity) and <strong>shares weights<\/strong> across positions.<\/li>\n<\/ul>\n<p>So you can see CNNs as:<\/p>\n<ul>\n<li>A clever way to use your convolution operation to get:\n<ul>\n<li>Locality (nearby pixels matter more).<\/li>\n<li>Weight sharing (same kernel across all positions).<\/li>\n<li>Fewer parameters than a dense layer on full images.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<hr \/>\n<h2>Conclusion: From Simple Operation to Powerful Networks<\/h2>\n<p>If you can summarize convolution as<\/p>\n<pre><code>\"At each position, take neighbors and compute a weighted sum with a kernel,\"<\/code><\/pre>\n<p>then you\u2019re already standing on the core idea behind CNNs.<\/p>\n<p>CNNs take that simple building block and:<\/p>\n<ul>\n<li>Extend it from 1D arrays to 2D\/3D tensors.<\/li>\n<li>Apply many different kernels in parallel to produce rich feature maps.<\/li>\n<li>Add nonlinearities to escape pure linearity.<\/li>\n<li>Stack many such layers so that each stage \u201csees\u201d more abstract structure.<\/li>\n<\/ul>\n<p>The magic of CNNs isn\u2019t a new kind of math\u2014it\u2019s the composition of many small, local convolution operations into a deep architecture that can learn powerful visual representations.[\/vc_column_text][\/vc_column][\/vc_row]\n","protected":false},"excerpt":{"rendered":"<p>We will study convolution, which is a popular array operation that is used in various forms in signal processing, digital recording, image processing, video processing, and computer 
vision.<\/p>\n","protected":false},"author":3,"featured_media":862,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,9],"tags":[22,20,18,21],"class_list":{"0":"post-416","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-software","8":"category-tech","9":"tag-automation","10":"tag-cnn","11":"tag-data-science","12":"tag-deep-learning"},"_links":{"self":[{"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/posts\/416","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/comments?post=416"}],"version-history":[{"count":6,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/posts\/416\/revisions"}],"predecessor-version":[{"id":872,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/posts\/416\/revisions\/872"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/media\/862"}],"wp:attachment":[{"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/media?parent=416"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/categories?post=416"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/curriculo.me\/engineering\/wp-json\/wp\/v2\/tags?post=416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}