{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "54e1ciQlt6qc" }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "id": "9cct1hM0t6qe" }, "source": [ "# Tutorial 2: How to use PyTorch to train neural networks for classification\n", "\n", "## What is a classification problem?\n", "\n", "A classification problem involves predicting whether something is one thing or another. For example:\n", "\n", "| Problem type | What is it? | Example |\n", "| -------------------------- | ------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |\n", "| Binary classification | Target can be one of two options, e.g. yes or no | Predict whether or not someone has heart disease based on their health parameters. |\n", "| Multi-class classification | Target can be one of more than two options | Decide whether a photo is of food, a person or a dog. |\n", "| Multi-label classification | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosophy). |\n",
"\n", "\n", "In this tutorial, we're going to work through a multi-class classification problem with PyTorch.\n", "\n", "## What is the data format commonly used for classification?\n", "\n", "Generally, the input is **image**, **text**, **audio** or **video** data.\n", "\n", "We can use standard Python packages that load data into a **``numpy array``**, and then convert this array into a **``torch.Tensor``**.\n", "\n", "- For images, packages such as Pillow, OpenCV\n", "- For audio, packages such as scipy and librosa\n", "- For text, either raw Python or Cython based loading, or NLTK and\n", "  SpaCy\n", "\n", "Specifically for vision, there is a package called\n", "**``torchvision``**, which has data loaders for common datasets such as\n", "ImageNet, CIFAR10, MNIST, etc. and data transformers for images, namely\n", "**``torchvision.datasets``** and **``torch.utils.data.DataLoader``**.\n", "\n", "\n", "In this tutorial, we will use the **CIFAR10** dataset.\n", "It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,\n", "‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of\n", "size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.\n", "\n", "\n", "\n", "\n", "## The workflow of training an image classifier using PyTorch\n", "\n", "\n", "1. Load and normalize the CIFAR10 training and test datasets using\n", "   ``torchvision``\n", "2. Define a Convolutional Neural Network\n", "3. Define a loss function and optimizer\n", "4. Train the network on the training data\n", "5. 
Test the network on the test data\n" ] }, { "cell_type": "markdown", "metadata": { "id": "bykSkJRNt6qg" }, "source": [ "### Import the related packages" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "HxF69pNrt6qg" }, "outputs": [], "source": [ "import torch\n", "import torchvision\n", "import torchvision.transforms as transforms" ] }, { "cell_type": "markdown", "metadata": { "id": "b2RLirzAt6qg" }, "source": [ "### 1. Load and normalize CIFAR10\n", "\n", "torchvision.datasets.CIFAR10 [Source Code](https://pytorch.org/vision/stable/_modules/torchvision/datasets/cifar.html#CIFAR10)\n", "\n", "\n", "The outputs of torchvision datasets are PILImage images in the range [0, 1].\n", "We transform them to Tensors normalized to the range [-1, 1].\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "La2lMbkyt6qg" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Files already downloaded and verified\n", "Files already downloaded and verified\n" ] } ], "source": [ "# ToTensor scales pixels to [0, 1]; Normalize then maps each channel to [-1, 1]\n", "transform = transforms.Compose(\n", "    [transforms.ToTensor(),\n", "     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n", "\n", "batch_size = 4\n", "\n", "trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)\n", "\n", "trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)\n", "\n", "testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)\n", "\n", "testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)\n", "\n", "classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')" ] }, { "cell_type": "markdown", "metadata": { "id": "gf_PhJcEt6qg" }, "source": [ "Show some of the training images for verification!\n", "\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "77mukym1t6qh" }, "outputs": [ { "data": { 
"image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "frog horse plane cat \n" ] } ], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "# helper function to show an image\n", "\n", "\n", "def imshow(img):\n", "    img = img / 2 + 0.5  # unnormalize from [-1, 1] back to [0, 1]\n", "    npimg = img.numpy()\n", "    # matplotlib expects (H, W, C); the tensor is (C, H, W)\n", "    plt.imshow(np.transpose(npimg, (1, 2, 0)))\n", "    plt.show()\n", "\n", "\n", "# get some random training images\n", "dataiter = iter(trainloader)\n", "images, labels = next(dataiter)\n", "\n", "# show images\n", "imshow(torchvision.utils.make_grid(images))\n", "# print labels\n", "print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))" ] }, { "cell_type": "markdown", "metadata": { "id": "OK5FQ3ent6qh" }, "source": [ "### 2. Define a Convolutional Neural Network\n", "The network must accept 3-channel images as input (here, tensors of shape 3x32x32).\n", "\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "kzweh3t9t6qh" }, "outputs": [], "source": [ "import torch.nn as nn\n", "import torch.nn.functional as F\n", "\n", "\n", "class Net(nn.Module):\n", "    def __init__(self):\n", "        super().__init__()\n", "        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input channels, 6 output channels, 5x5 kernel\n", "        self.pool = nn.MaxPool2d(2, 2)\n", "        self.conv2 = nn.Conv2d(6, 16, 5)\n", "        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of 5x5 after conv2 + pool\n", "        self.fc2 = nn.Linear(120, 84)\n", "        self.fc3 = nn.Linear(84, 10)  # one output per class\n", "\n", "    def forward(self, x):\n", "        # Uncomment the next two lines to step through the forward pass in a debugger:\n", "        # import pdb\n", "        # pdb.set_trace()\n", "        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 6x28x28 -> 6x14x14\n", "        x = self.pool(F.relu(self.conv2(x)))  # 6x14x14 -> 16x10x10 -> 16x5x5\n", "        x = torch.flatten(x, 1)  # flatten all dimensions except batch\n", "        x = F.relu(self.fc1(x))\n", "        x = F.relu(self.fc2(x))\n", "        x = self.fc3(x)\n", "        return x\n", "\n", "\n", "net = Net()" ] }, { "cell_type": "markdown", "metadata": { "id": "tsvrK52pt6qh" }, "source": [ "### 3. 
Define a loss function and optimizer\n", "\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "v8DzbDyct6qh" }, "outputs": [], "source": [ "import torch.optim as optim\n", "\n", "\n", "# SGD with momentum.\n", "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)\n", "\n", "# You can also use the Adam optimizer.\n", "# optimizer = optim.Adam(net.parameters(), lr=0.001)\n", "\n", "\n", "# Define a learning-rate scheduler\n", "# scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)\n", "\n", "# Update the learning rate (typically once per epoch, after optimizer.step())\n", "# scheduler.step()" ] }, { "cell_type": "markdown", "metadata": { "id": "NZcn4w6ut6qh" }, "source": [ "### 4. Train the network\n", "\n", "Here, we simply loop over our data iterator, feed the inputs to the\n", "network, and optimize.\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "background_save": true }, "id": "HRHToWRft6qi", "outputId": "6d65a525-4052-4d4a-b657-db3f53304b77" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 1] loss: 2.286\n", "[1, 2001] loss: 2.091\n", "[1, 4001] loss: 1.747\n", "[1, 6001] loss: 1.431\n", "[1, 8001] loss: 1.637\n", "[1, 10001] loss: 1.620\n", "[1, 12001] loss: 1.018\n", "[2, 1] loss: 1.035\n", "[2, 2001] loss: 1.051\n", "[2, 4001] loss: 1.663\n", "[2, 6001] loss: 1.609\n", "[2, 8001] loss: 0.938\n", "[2, 10001] loss: 1.332\n", "[2, 12001] loss: 0.877\n", "Finished Training\n" ] } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Cross-entropy loss for classification\n", "criterion = nn.CrossEntropyLoss()\n", "\n", "# loss values logged during training, used for the plot below\n", "losses = []\n", "\n", "for epoch in range(2):  # loop over the dataset multiple times\n", "\n", "    for i, data in enumerate(trainloader, 0):\n", "        # get the inputs; data is a list of [inputs, labels]\n", "        inputs, labels = data\n", "\n", "        # zero the parameter gradients\n", "        optimizer.zero_grad()\n", "\n", "        # forward + backward + optimize\n", "        outputs = net(inputs)\n", "        loss = criterion(outputs, labels)\n",
"        loss.backward()\n", "        optimizer.step()\n", "\n", "        # log the loss of the current mini-batch every 2000 mini-batches\n", "        if i % 2000 == 0:\n", "            print(f'[{epoch + 1}, {i + 1:5d}] loss: {loss.item():.3f}')\n", "            losses.append(loss.item())\n", "\n", "print('Finished Training')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(10, 6))\n", "plt.plot(losses, label='Training Loss')\n", "plt.xlabel('Logging step (every 2000 mini-batches)')\n", "plt.ylabel('Loss')\n", "plt.title('Training Loss Over Iterations')\n", "plt.legend()\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "1BlocxS1t6qi" }, "source": [ "Let's quickly save our trained model:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MWtwOiEYt6qi" }, "outputs": [], "source": [ "PATH = './cifar_net.pth'\n", "# state_dict maps each parameter name to its tensor; printing it is verbose\n", "print(net.state_dict())\n", "torch.save(net.state_dict(), PATH)" ] }, { "cell_type": "markdown", "metadata": { "id": "fqSo4Y-lt6qi" }, "source": [ "### 5. Test the network on the test data\n", "\n", "**We have trained the network for 2 passes over the training dataset.\n", "But we need to check if the network has learnt anything at all.**\n", "\n", "Generally, we use the validation/test datasets to evaluate the trained model.\n", "\n", "Again, let's show some of the test images:\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "g7_gI2zLt6qi" }, "outputs": [], "source": [ "dataiter = iter(testloader)\n", "images, labels = next(dataiter)\n", "\n", "# show images and their ground-truth labels\n", "imshow(torchvision.utils.make_grid(images))\n", "print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))" ] }, { "cell_type": "markdown", "metadata": { "id": "h1GpdiECt6qi" }, "source": [ "Load our saved model:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NAFbPaQ8t6qi" }, "outputs": [], "source": [ "net = Net()\n", "net.load_state_dict(torch.load(PATH))" ] }, { "cell_type": "markdown", "metadata": { "id": "n6PxK6JRt6qi" }, "source": [ "Let us see what the neural network thinks the examples above are:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oYgBTMpPt6qj" }, 
"outputs": [], "source": [ "outputs = net(images)\n", "# print(outputs)" ] }, { "cell_type": "markdown", "metadata": { "id": "yDbZxu-ut6qj" }, "source": [ "The outputs are energies (raw scores, also called logits) for the 10 classes.\n", "The higher the energy for a class, the more the network\n", "thinks that the image is of the particular class.\n", "So, let's get the index of the highest energy:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gK7oJVy4t6qj" }, "outputs": [], "source": [ "_, predicted = torch.max(outputs, 1)\n", "\n", "print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'\n", "                              for j in range(batch_size)))" ] }, { "cell_type": "markdown", "metadata": { "id": "CqF9F4M0t6qj" }, "source": [ "How does the network perform on the whole test dataset?\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ckGFasHit6qj" }, "outputs": [], "source": [ "correct = 0\n", "total = 0\n", "# since we're not training, we don't need to calculate the gradients for our outputs\n", "with torch.no_grad():\n", "    for data in testloader:\n", "        images, labels = data\n", "        # calculate outputs by running images through the network\n", "        outputs = net(images)\n", "        # the class with the highest energy is what we choose as prediction\n", "        _, predicted = torch.max(outputs, 1)\n", "        total += labels.size(0)\n", "        correct += (predicted == labels).sum().item()\n", "\n", "print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')" ] }, { "cell_type": "markdown", "metadata": { "id": "S31zUKPKt6qj" }, "source": [ "How does the network perform on each class of the test dataset?\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BMnamSCZt6qj" }, "outputs": [], "source": [ "# prepare to count predictions for each class\n", "correct_pred = {classname: 0 for classname in classes}\n", "total_pred = {classname: 0 for classname in classes}\n", "\n", "# again no gradients needed\n", "with 
torch.no_grad():\n", "    for data in testloader:\n", "        images, labels = data\n", "        outputs = net(images)\n", "        _, predictions = torch.max(outputs, 1)\n", "        # collect the correct predictions for each class\n", "        for label, prediction in zip(labels, predictions):\n", "            if label == prediction:\n", "                correct_pred[classes[label]] += 1\n", "            total_pred[classes[label]] += 1\n", "\n", "\n", "# print accuracy for each class\n", "for classname, correct_count in correct_pred.items():\n", "    accuracy = 100 * float(correct_count) / total_pred[classname]\n", "    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')" ] }, { "cell_type": "markdown", "metadata": { "id": "v9azFLBVt6qj" }, "source": [ "\n", "\n", "## Training on GPU\n", "This section shows how to transfer a Tensor and the neural\n", "net onto the GPU.\n", "\n", "Let's first define our device as the first visible CUDA device if we have\n", "CUDA available:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hgU0tAYMt6qj" }, "outputs": [], "source": [ "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n", "\n", "# Assuming that we are on a CUDA machine, this should print a CUDA device:\n", "\n", "print(device)" ] }, { "cell_type": "markdown", "metadata": { "id": "Fq9YK2-qt6qj" }, "source": [ "The rest of this section assumes that ``device`` is a CUDA device.\n", "\n", "The following method recursively goes over all modules and converts their\n", "parameters and buffers to CUDA tensors:\n", "\n", "    net.to(device)\n", "\n", "\n", "Remember that you will have to send the inputs and targets at every step\n", "to the GPU too:\n", "\n", "    inputs, labels = data[0].to(device), data[1].to(device)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ONQyo0pNt6qk" }, "outputs": [], "source": [ "del dataiter" ] } ], "metadata": { "accelerator": "GPU", "colab": { "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "dp3", "language": 
"python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }