Introduction to Neural Networks & Perceptrons
Neural Networks are the foundation of modern AI and Deep Learning. They mimic the way the human brain processes information. Let’s break it down step by step.
1. What is a Neural Network?
A Neural Network is a computational model made of neurons (nodes) arranged in layers that learn patterns from data. It consists of:
- Input Layer: Takes in the raw data (features).
- Hidden Layers: Process the data using weights and activation functions.
- Output Layer: Produces the final prediction or classification.
Think of it like a brain, where neurons pass signals and adjust connections based on experience.
2. What is a Perceptron? (Basic Building Block of Neural Networks)
A Perceptron is the simplest form of a neural network. It represents a single neuron that makes a binary decision (e.g., spam or not spam).
Perceptron Structure:
y = f(WX + b)
Where:
- X = Input features
- W = Weights (learnable parameters)
- b = Bias (adjustment for better learning)
- f() = Activation Function (e.g., step function, sigmoid)
- y = Output (prediction)
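For intuition, here is the formula evaluated once with made-up numbers (the inputs, weights, and bias below are illustrative, not learned values):
import numpy as np

# Illustrative values (not learned): two input features, fixed weights and bias
X = np.array([1.0, 0.0])
W = np.array([0.5, -0.3])
b = 0.1

z = np.dot(W, X) + b        # weighted sum: 0.5*1.0 + (-0.3)*0.0 + 0.1 = 0.6
y = 1 if z >= 0 else 0      # step activation -> 1
print(z, y)                 # 0.6 1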
Perceptron Learning Rule:
- Initialize random weights.
- Compute the output using the weighted sum.
- Apply the activation function (step function for binary classification).
- Update weights using errors from the prediction.
- Repeat until convergence.
3. Types of Neural Networks
Single-Layer Perceptron:
- Only 1 layer (input → output).
- Used for linear problems (e.g., AND/OR gates).
Multi-Layer Perceptron (MLP):
- Has hidden layers between input and output.
- Can solve non-linear problems.
Deep Neural Networks (DNNs):
- Many hidden layers.
- Used in Deep Learning for image recognition, NLP, etc.
4. Activation Functions (The Brain of a Neuron)
Activation functions introduce non-linearity, helping the network learn complex patterns. Some common ones:
Activation Function | Formula | Use Case |
---|---|---|
Step Function | 1 if x ≥ 0, else 0 | Simple binary classification |
Sigmoid (logistic) | 1 / (1 + e^(-x)) | Probabilities (e.g., email spam detection) |
ReLU | max(0, x) | Deep learning (avoids vanishing gradients) |
Tanh | (e^x − e^(-x)) / (e^x + e^(-x)) | Values between -1 and 1 |
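All four functions can be evaluated in a couple of lines of NumPy; the sample inputs below are arbitrary and only meant to show the range of each output:
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])   # arbitrary sample inputs

step_out = (x >= 0).astype(int)             # step function: 0 or 1
sig_out = 1 / (1 + np.exp(-x))              # squashes values into (0, 1)
relu_out = np.maximum(0, x)                 # zeroes out negative values
tanh_out = np.tanh(x)                       # squashes values into (-1, 1)

print(step_out, sig_out.round(3), relu_out, tanh_out.round(3), sep="\n")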
5. Hands-on: Implementing a Perceptron in Python
Let’s create a simple Perceptron model using NumPy.
import numpy as np

# Step Activation Function
def step_function(x):
    return 1 if x >= 0 else 0

class Perceptron:
    def __init__(self, learning_rate=0.1, epochs=10):
        self.lr = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.epochs):
            for idx, x_i in enumerate(X):
                linear_output = np.dot(x_i, self.weights) + self.bias
                y_pred = step_function(linear_output)

                # Update rule: w = w + lr * (y - y_pred) * x
                update = self.lr * (y[idx] - y_pred)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X):
        linear_output = np.dot(X, self.weights) + self.bias
        return np.array([step_function(x) for x in linear_output])

# Training Data (AND Logic Gate)
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 0, 0, 1])  # AND gate output

# Train the Perceptron
perceptron = Perceptron()
perceptron.fit(X_train, y_train)

# Make predictions
predictions = perceptron.predict(X_train)
print("Predictions:", predictions)  # Expected: [0 0 0 1]
Output: The perceptron correctly classifies the AND gate logic.
Next Steps:
- Want to implement a Multi-Layer Perceptron (MLP)?
- Need help understanding Backpropagation?
Activation Functions: ReLU, Sigmoid, Softmax
Activation functions are essential in neural networks because they introduce non-linearity and help the model learn complex patterns. Let’s explore ReLU, Sigmoid, and Softmax—three of the most widely used activation functions.
1. Rectified Linear Unit (ReLU)
Formula:
f(x) = max(0, x)
How it works:
- If x > 0, the output is x.
- If x ≤ 0, the output is 0.
Pros:
- Avoids the vanishing gradient problem (compared to Sigmoid/Tanh).
- Computationally efficient (faster training).
- Works well in deep networks.
Cons:
- Dying ReLU problem: Neurons can become inactive if they always output zero.
- Not zero-centered, and the gradient is exactly zero for all negative inputs.
Python Implementation:
import numpy as np
import matplotlib.pyplot as plt

# ReLU function
def relu(x):
    return np.maximum(0, x)

# Generate inputs
x = np.linspace(-10, 10, 100)
y = relu(x)

# Plot
plt.plot(x, y, label="ReLU")
plt.title("ReLU Activation Function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.legend()
plt.grid()
plt.show()
2. Sigmoid (Logistic) Function
Formula:
f(x) = 1 / (1 + e^(-x))
How it works:
- Squashes input between 0 and 1 (good for probabilities).
- If x is large → output is close to 1.
- If x is large and negative → output is close to 0.
Pros:
- Useful for binary classification (e.g., spam detection).
- Smooth and differentiable.
Cons:
- Vanishing gradient problem: Small gradients slow down learning.
- Outputs are not centered at zero, which slows training.
Python Implementation:
# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Generate inputs
y = sigmoid(x)

# Plot
plt.plot(x, y, label="Sigmoid", color="red")
plt.title("Sigmoid Activation Function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.legend()
plt.grid()
plt.show()
3. Softmax (Multi-Class Classification)
Formula:
f(x_i) = e^(x_i) / ∑_j e^(x_j)
How it works:
- Converts raw scores into probabilities that sum to 1.
- The higher the value of x_i, the higher its probability.
- Used in multi-class classification (e.g., image classification).
Pros:
- Outputs are interpretable as probabilities.
- Ensures all values sum to 1 (useful for classification).
Cons:
- Can be sensitive to outliers.
- Computationally expensive for large datasets.
Python Implementation:
# Softmax function
def softmax(x):
    exp_x = np.exp(x - np.max(x))  # Subtract max to prevent overflow
    return exp_x / np.sum(exp_x)

# Example inputs
x_vals = np.array([2.0, 1.0, 0.1])
y = softmax(x_vals)

# Print softmax probabilities
print("Softmax Probabilities:", y)
Output Example:
Softmax Probabilities: [0.65900114 0.24243297 0.09856589]
The highest value gets the highest probability.
4. Activation Function Comparison
Activation Function | Formula | Best For | Pros | Cons |
---|---|---|---|---|
ReLU | max(0, x) | Deep Learning | Fast, avoids vanishing gradients | Dying ReLU problem |
Sigmoid | 1 / (1 + e^(-x)) | Binary Classification | Good for probabilities | Vanishing gradient, slow training |
Softmax | e^(x_i) / ∑_j e^(x_j) | Multi-Class Classification | Converts outputs into probabilities | Sensitive to outliers |
Next Steps:
- Want to implement activation functions in a Neural Network?
- Need help choosing the best activation function for your task?
Backpropagation & Gradient Descent
Backpropagation and Gradient Descent are the core techniques that power modern Neural Networks. Let’s break them down step by step.
1. What is Backpropagation?
Backpropagation (short for “backward propagation of errors”) is the learning algorithm for training neural networks. It updates the weights by propagating the error backward from the output layer to the input layer.
How It Works:
- Forward Pass → Compute predictions using initial weights.
- Compute Loss → Measure how far predictions are from the actual output.
- Backward Pass → Calculate gradients (how much each weight contributes to the error).
- Update Weights → Adjust weights using Gradient Descent to minimize error.
Goal: Reduce the loss function by adjusting weights iteratively.
2. What is Gradient Descent?
Gradient Descent is the optimization algorithm used to update weights during backpropagation. It finds the minimum of the loss function by moving in the direction of the negative gradient.
Formula:
W = W − α · ∂L/∂W
Where:
- W = Weight
- α = Learning Rate (step size)
- ∂L/∂W = Gradient of the Loss Function with respect to W
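As a minimal sketch of this update rule, assume a toy one-parameter loss L(W) = (W − 3)², chosen only so the gradient is easy to write down:
# Minimal gradient descent on L(W) = (W - 3)^2, whose gradient is dL/dW = 2*(W - 3)
W = 0.0            # initial weight
alpha = 0.1        # learning rate
for step in range(50):
    grad = 2 * (W - 3)      # gradient of the loss at the current W
    W = W - alpha * grad    # update rule: W = W - alpha * dL/dW
print(W)  # converges close to 3, the minimum of the loss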
3. Types of Gradient Descent
Type | Description | Pros | Cons |
---|---|---|---|
Batch Gradient Descent | Uses the entire dataset for one update. | Stable updates. | Slow for large datasets. |
Stochastic Gradient Descent (SGD) | Updates weights after each sample. | Faster updates, avoids local minima. | Noisy, less stable. |
Mini-Batch Gradient Descent | Uses a small batch (e.g., 32 samples) for each update. | Balance between speed & stability. | Requires tuning batch size. |
In practice: Mini-Batch Gradient Descent is the most commonly used variant in deep learning.
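The practical difference between the three variants is how much data each weight update sees. Below is a rough sketch of one epoch of mini-batch updates; the gradient_fn callback and the batch size of 32 are illustrative placeholders, not part of any specific library:
import numpy as np

def minibatch_updates(X, y, weights, gradient_fn, lr=0.01, batch_size=32):
    # One epoch of mini-batch gradient descent:
    # shuffle the data, then update the weights once per batch.
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        grad = gradient_fn(X[batch_idx], y[batch_idx], weights)  # gradient on this batch only
        weights = weights - lr * grad
    return weights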
4. Hands-on: Implementing Backpropagation & Gradient Descent
Let’s build a simple Neural Network with 1 hidden layer and implement backpropagation manually.
import numpy as np

# Sigmoid Activation Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid (x here is already the sigmoid output)
def sigmoid_derivative(x):
    return x * (1 - x)

# Training Data (XOR Logic Gate)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])  # XOR Output

# Initialize Weights and Biases
np.random.seed(42)
input_neurons, hidden_neurons, output_neurons = 2, 2, 1

W1 = np.random.uniform(-1, 1, (input_neurons, hidden_neurons))
W2 = np.random.uniform(-1, 1, (hidden_neurons, output_neurons))
b1 = np.zeros((1, hidden_neurons))
b2 = np.zeros((1, output_neurons))

# Hyperparameters
learning_rate = 0.1
epochs = 10000

# Training Loop
for epoch in range(epochs):
    # Forward Propagation
    hidden_input = np.dot(X, W1) + b1
    hidden_output = sigmoid(hidden_input)

    final_input = np.dot(hidden_output, W2) + b2
    final_output = sigmoid(final_input)

    # Compute Error
    error = y - final_output

    # Backpropagation
    d_output = error * sigmoid_derivative(final_output)
    d_hidden = d_output.dot(W2.T) * sigmoid_derivative(hidden_output)

    # Update Weights and Biases
    W2 += hidden_output.T.dot(d_output) * learning_rate
    W1 += X.T.dot(d_hidden) * learning_rate
    b2 += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    b1 += np.sum(d_hidden, axis=0, keepdims=True) * learning_rate

    # Print Loss Every 1000 Epochs
    if epoch % 1000 == 0:
        loss = np.mean(np.square(error))
        print(f"Epoch {epoch}: Loss = {loss:.5f}")

# Final Predictions
print("\nFinal Predictions:")
print(final_output.round())
5. Key Takeaways:
- Backpropagation efficiently updates weights in neural networks.
- Gradient Descent optimizes weights by minimizing the loss function.
- Different types of Gradient Descent (Batch, Mini-Batch, SGD) have trade-offs.
- Backpropagation + Gradient Descent = How Deep Learning Works.
Next Steps:
- Want to see how optimization algorithms like Adam & RMSprop improve gradient descent?
- Need help implementing Neural Networks with TensorFlow/PyTorch?
Convolutional Neural Networks (CNNs)
CNNs are a powerful class of neural networks specifically designed for image processing, object detection, and computer vision tasks. They automatically learn spatial features from images, making them the backbone of modern AI applications like self-driving cars, facial recognition, and medical imaging.
1. Why Use CNNs Instead of Regular Neural Networks?
A fully connected neural network (MLP) treats an image as a flat 1D vector, which:
- Ignores spatial structure (neighboring pixels matter in images).
- Requires far too many parameters (impractical for large images like 1080p), as the quick calculation below shows.
CNNs solve this by preserving spatial relationships and sharing weights, which greatly reduces the number of parameters.
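To see why the parameter count explodes, here is a quick back-of-the-envelope calculation for a single dense layer on a 1080p RGB image (the 1,000-unit hidden layer is just an illustrative choice):
# Rough parameter count for a fully connected layer on a 1080p RGB image
pixels = 1920 * 1080 * 3            # flattened input size = 6,220,800 values
hidden_units = 1000                 # illustrative hidden layer size
weights = pixels * hidden_units     # ~6.2 billion weights for a single layer
print(f"{weights:,}")               # 6,220,800,000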
2. CNN Architecture – The Building Blocks
A CNN consists of three key layers:
- Convolutional Layer – Extracts features (edges, textures, shapes).
- Pooling Layer – Reduces spatial dimensions while keeping important information.
- Fully Connected (FC) Layer – Converts features into final predictions.
Typical CNN Structure:
[INPUT IMAGE] → [CONV] → [RELU] → [POOL] → [CONV] → [RELU] → [POOL] → [FC] → [OUTPUT]
3. CNN Layers Explained
1. Convolutional Layer (Feature Extraction)
- Uses a filter (kernel) to slide over the image and extract features.
- Detects edges, textures, patterns, etc.
- Outputs a feature map (transformed representation of the image).
Mathematical Formula:
Y(i, j) = ∑_m ∑_n W(m, n) · X(i + m, j + n) + b
Where:
- X = Input matrix (image).
- W = Kernel (filter).
- b = Bias term.
Example: Edge detection using a simple filter
import numpy as np
from scipy.signal import convolve2d
# Define a simple 3×3 edge detection kernel (named "kernel" to avoid shadowing Python's built-in filter)
kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])

# Example 5×5 grayscale image
image = np.array([
    [10, 10, 10, 10, 10],
    [10, 50, 50, 50, 10],
    [10, 50, 255, 50, 10],
    [10, 50, 50, 50, 10],
    [10, 10, 10, 10, 10]
])

# Apply convolution
feature_map = convolve2d(image, kernel, mode="same")
print("Feature Map:\n", feature_map)
2. Activation Function (ReLU for Non-Linearity)
After convolution, we apply ReLU (Rectified Linear Unit) to introduce non-linearity.
f(x) = max(0, x)
- Helps detect complex patterns in images.
- Avoids vanishing gradients.
3. Pooling Layer (Downsampling for Efficiency)
Pooling reduces spatial dimensions while keeping important information.
Common Pooling Types:
Pooling Type | How It Works |
---|---|
Max Pooling | Takes the largest value in a region. |
Average Pooling | Takes the average of values in a region. |
Example: 2×2 Max Pooling on a 4×4 feature map
Before Pooling:
1 3 2 1
5 6 8 2
3 1 4 9
7 2 6 5
After 2×2 Max Pooling:
6 8
7 9
Reduces computation and prevents overfitting.
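Here is a minimal NumPy sketch of that 2×2 max pooling step, reshaping the 4×4 feature map into 2×2 blocks and taking the maximum of each block:
import numpy as np

feature_map = np.array([
    [1, 3, 2, 1],
    [5, 6, 8, 2],
    [3, 1, 4, 9],
    [7, 2, 6, 5]
])

# Split the 4×4 map into 2×2 blocks, then take the max of each block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 8]
#  [7 9]]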
4. Fully Connected (FC) Layer (Final Predictions)
- Flattens the feature maps into a 1D vector.
- Passes the vector into a fully connected layer for classification.
- Uses Softmax (multi-class) or Sigmoid (binary) for final output.
5. Output Layer
Produces class probabilities for classification tasks.
4. CNN Example Architecture
A typical CNN for image classification:
INPUT → CONV → ReLU → POOL → CONV → ReLU → POOL → FC → OUTPUT
Example: Recognizing handwritten digits (MNIST dataset).
5. Hands-on: Implementing a CNN in Python (Keras)
Let’s build a CNN to classify digits (0-9) from the MNIST dataset.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0,1]
X_train, X_test = X_train / 255.0, X_test / 255.0

# Reshape data for CNN (adding channel dimension)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# CNN Model
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),  # Conv Layer
    layers.MaxPooling2D((2,2)),  # Pooling Layer
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),  # Flatten feature maps
    layers.Dense(128, activation='relu'),  # Fully connected layer
    layers.Dense(10, activation='softmax')  # Output layer (10 classes)
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
6. Key Takeaways
- CNNs are best for image-related tasks (classification, object detection, etc.).
- Convolution & Pooling layers extract important features.
- ReLU activation introduces non-linearity for better learning.
- Fully Connected layers classify extracted features.
- CNNs power AI applications like facial recognition, medical imaging, and self-driving cars.
Next Steps:
- Want to apply CNNs to real-world images (CIFAR-10, ImageNet)?
- Interested in Transfer Learning (using pre-trained models like ResNet, VGG, MobileNet)?
Recurrent Neural Networks (RNNs) & Long Short-Term Memory (LSTMs)
Recurrent Neural Networks (RNNs) are specialized neural networks designed to process sequential data (e.g., time series, speech, text, videos). They remember past inputs using a feedback loop, making them powerful for handling context and dependencies in data.
1. Why RNNs?
Traditional neural networks (like CNNs or MLPs) struggle with sequential data because:
- They assume all inputs are independent, ignoring previous data points.
- They can’t capture relationships in time-dependent data (e.g., previous words in a sentence).
RNNs solve this by remembering past inputs and using them for future predictions.
2. How RNNs Work
Unlike standard neural networks, an RNN has a loop that allows information to be passed from one step to the next.
RNN Structure:
- Takes an input sequence (e.g., words in a sentence).
- Maintains a hidden state that remembers previous inputs.
- Passes the hidden state to the next time step.
- Uses the final hidden state for prediction.
RNN Formula:
h_t = f(W_x X_t + W_h h_(t-1) + b)
Where:
- X_t = Input at time step t
- h_t = Hidden state at time step t
- W_x, W_h = Weight matrices
- b = Bias
- f = Activation function (usually tanh or ReLU)
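To make the recurrence concrete, here is a single time step computed by hand in NumPy; the dimensions and random weights are purely illustrative:
import numpy as np

np.random.seed(0)
input_size, hidden_size = 3, 4          # illustrative dimensions
W_x = np.random.randn(hidden_size, input_size)
W_h = np.random.randn(hidden_size, hidden_size)
b = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)          # initial hidden state h_0
x_t = np.random.randn(input_size)       # input at time step t

# h_t = tanh(W_x X_t + W_h h_(t-1) + b)
h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b)
print(h_t.shape)  # (4,)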
3. The Problem with RNNs: Vanishing Gradient
RNNs have a major issue: when processing long sequences, they struggle to remember early inputs due to the vanishing gradient problem. This makes it hard for RNNs to learn long-term dependencies.
Solution: Use LSTMs (Long Short-Term Memory) or GRUs (Gated Recurrent Units).
4. Long Short-Term Memory (LSTMs)
LSTMs are an improved version of RNNs designed to remember long-term dependencies using gates that control the flow of information.
LSTM Components:
- Forget Gate → Decides what information to discard.
- Input Gate → Decides what new information to store.
- Cell State → Maintains long-term memory.
- Output Gate → Decides the final output.
LSTM Formulas:
- Forget Gate: f_t = σ(W_f · [h_(t-1), X_t] + b_f)
- Input Gate: i_t = σ(W_i · [h_(t-1), X_t] + b_i)
- Candidate Cell State: C̃_t = tanh(W_C · [h_(t-1), X_t] + b_C)
- Cell State Update: C_t = f_t * C_(t-1) + i_t * C̃_t
- Output Gate: o_t = σ(W_o · [h_(t-1), X_t] + b_o)
- Hidden State Update: h_t = o_t * tanh(C_t)
LSTMs handle long-term dependencies much better than vanilla RNNs.
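As a rough sketch of how these gates combine, here is one LSTM step in plain NumPy (random, untrained weights, purely for illustration; frameworks like Keras handle all of this internally):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(1)
input_size, hidden_size = 3, 4
concat = hidden_size + input_size        # gates act on [h_(t-1), X_t]

# Random illustrative weights for the forget, input, candidate and output gates
W_f, W_i, W_c, W_o = (np.random.randn(hidden_size, concat) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(hidden_size)

h_prev, C_prev = np.zeros(hidden_size), np.zeros(hidden_size)
x_t = np.random.randn(input_size)
z = np.concatenate([h_prev, x_t])        # [h_(t-1), X_t]

f_t = sigmoid(W_f @ z + b_f)             # forget gate
i_t = sigmoid(W_i @ z + b_i)             # input gate
C_tilde = np.tanh(W_c @ z + b_c)         # candidate cell state
C_t = f_t * C_prev + i_t * C_tilde       # cell state update
o_t = sigmoid(W_o @ z + b_o)             # output gate
h_t = o_t * np.tanh(C_t)                 # hidden state update
print(h_t)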
5. Hands-on: Implementing an RNN & LSTM in Python (Keras)
Task: Sentiment Analysis on IMDB Movie Reviews
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, Dense

# Load IMDB dataset
from tensorflow.keras.datasets import imdb

max_features = 10000  # Vocabulary size
maxlen = 200  # Cut sequences after 200 words

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to ensure equal input length
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

# Build an RNN Model
rnn_model = Sequential([
    Embedding(max_features, 128, input_length=maxlen),  # Word Embeddings
    SimpleRNN(64, activation='relu'),  # Simple RNN Layer
    Dense(1, activation='sigmoid')  # Binary Classification Output
])

# Compile & Train
rnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
rnn_model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))
Now, Let’s Use an LSTM Model Instead!
# Build an LSTM Model
lstm_model = Sequential([
    Embedding(max_features, 128, input_length=maxlen),  # Word Embeddings
    LSTM(64, return_sequences=False),  # LSTM Layer
    Dense(1, activation='sigmoid')  # Binary Classification Output
])

# Compile & Train
lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
lstm_model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))
LSTMs generally outperform RNNs on tasks that require long-term memory.
6. Key Takeaways:
- RNNs are great for sequential data but struggle with long sequences.
- LSTMs solve the vanishing gradient problem and handle long-term dependencies.
- Gated Recurrent Units (GRUs) are a simpler alternative to LSTMs.
- Used in speech recognition, chatbots, stock prediction, text generation and more.
Next Steps:
- Want to explore GRUs (Gated Recurrent Units)?
- Need help with Text Generation using LSTMs (e.g., writing like Shakespeare)?
- Interested in Time-Series Forecasting with RNNs?
Hands-on Practice:
1- Build a Neural Network using TensorFlow/Keras
In this exercise, we’ll build a basic neural network using TensorFlow/Keras to classify handwritten digits from the MNIST dataset.
Steps to Build a Neural Network
- Load and preprocess the dataset
- Build a neural network using Keras Sequential API
- Train the model
- Evaluate and make predictions
Implementing a Neural Network in Python (Keras)
Let's get started:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np

# Load MNIST dataset (handwritten digits 0-9)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize pixel values to range [0,1]
X_train, X_test = X_train / 255.0, X_test / 255.0

# Visualize a sample image
plt.imshow(X_train[0], cmap="gray")
plt.title(f"Label: {y_train[0]}")
plt.show()

# Define a simple feedforward neural network
model = Sequential([
    Flatten(input_shape=(28, 28)),  # Flatten 28×28 image into 1D array
    Dense(128, activation='relu'),  # Hidden layer with ReLU
    Dense(10, activation='softmax')  # Output layer with Softmax (10 classes)
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

# Evaluate model on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

# Make predictions
predictions = model.predict(X_test)

# Show prediction for the first test image
predicted_label = np.argmax(predictions[0])
print(f"Predicted Label: {predicted_label}, True Label: {y_test[0]}")

# Visualize the prediction
plt.imshow(X_test[0], cmap="gray")
plt.title(f"Predicted: {predicted_label}, True: {y_test[0]}")
plt.show()
Explanation of the Code
1- Loads and normalizes the MNIST dataset (digits 0-9).
2- Builds a feedforward neural network with:
- Flatten Layer → Converts 2D images into 1D array.
- Dense Layer (128 neurons, ReLU) → Learns patterns.
- Dense Layer (10 neurons, Softmax) → Outputs probabilities for 10 digits.
3- Compiles the model with Adam optimizer and cross-entropy loss.
4- Trains the model for 5 epochs.
5- Evaluates performance and makes predictions.
Key Takeaways:
- Neural networks can classify images effectively.
- ReLU activation helps deep layers learn patterns.
- Softmax activation is used for multi-class classification.
- TensorFlow/Keras makes it easy to build deep learning models.
Next Steps:
- Want to add more hidden layers for better accuracy?
- Interested in CNNs (Convolutional Neural Networks) for image tasks?
- Need help deploying the model into a real-world app?
2- Implement an Image Classifier using CNN
Let’s build a Convolutional Neural Network (CNN) using TensorFlow/Keras to classify images from the CIFAR-10 dataset (which contains 10 classes of images like airplanes, cars, birds, etc.).
Steps to Implement a CNN Classifier
- Load & Preprocess the CIFAR-10 dataset
- Build a CNN model using Keras
- Train & evaluate the model
- Make predictions on new images
Implementing CNN in Python (Keras)
Let's get started:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import numpy as np

# Load CIFAR-10 dataset (10 categories of 32×32 color images)
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values to range [0,1]
X_train, X_test = X_train / 255.0, X_test / 255.0

# Define class names for CIFAR-10 dataset
class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer',
               'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

# Visualize a sample image
plt.imshow(X_train[0])
plt.title(f"Label: {class_names[y_train[0][0]]}")
plt.show()

# Build the CNN Model
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),  # Conv Layer
    layers.MaxPooling2D((2,2)),  # Pooling Layer
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(128, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),  # Flatten feature maps
    layers.Dense(128, activation='relu'),  # Fully Connected Layer
    layers.Dense(10, activation='softmax')  # Output layer (10 classes)
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

# Make predictions
predictions = model.predict(X_test)

# Show prediction for the first test image
predicted_label = np.argmax(predictions[0])
true_label = y_test[0][0]
print(f"Predicted Label: {class_names[predicted_label]}, True Label: {class_names[true_label]}")

# Visualize the prediction
plt.imshow(X_test[0])
plt.title(f"Predicted: {class_names[predicted_label]}, True: {class_names[true_label]}")
plt.show()
Explanation of the Code
1. Loads & normalizes the CIFAR-10 dataset (images are resized to 32×32).
2. Builds a CNN model with:
- 3 Convolutional Layers (with ReLU activation).
- MaxPooling Layers (to reduce dimensionality).
- Fully Connected Dense Layer (to make final predictions).
- Softmax Activation (for multi-class classification).
3. Compiles & trains the model for 10 epochs.
4. Evaluates the model on test images.
5. Makes predictions and visualizes the results.
Key Takeaways:
- CNNs are powerful for image classification.
- Convolutional layers extract important features.
- Pooling layers reduce computational complexity.
- Softmax activation is used for multi-class classification.
- TensorFlow/Keras makes it easy to implement deep learning models.
Next Steps:
- Want to tune hyperparameters for better accuracy?
- Interested in Transfer Learning using pre-trained models like ResNet?
- Need help deploying the model as a web app?