Neural Networks. Chatbot.


We are going to create a simple chatbot. The first task is to train our bot. We are going to be using some libraries like numpy, nltk (Natural Language Toolkit), which contains many tools for cleaning up the text and preparing it for deep learning algorithms, json, which loads json files directly into Python, and PyTorch.

We will implement a bag_of_words function so we can transform or reduce each training sentence or an input sentence from the user into an array of 0's and 1's against the array of words in the corpus (aka vocabulary). Each position in the list will represent a word from our vocabulary. If the position in the list is a 1 then it means that the word is in our sentence, if it is a 0 then the word is not present in our sentence.
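The idea can be sketched in a few lines of plain Python. This is only an illustration with a made-up vocabulary and sentence; the real function, shown later, also stems each word first.

```python
def bag_of_words(sentence_words, vocabulary):
    """Return a list with 1 where the vocabulary word occurs in the sentence, else 0."""
    present = set(sentence_words)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical vocabulary and tokenized sentence, used only for illustration.
vocabulary = ["are", "bye", "cool", "doing", "good", "hi", "how"]
sentence = ["how", "are", "you", "doing"]

bag = bag_of_words(sentence, vocabulary)
print(bag)  # [1, 0, 0, 1, 0, 0, 1]
```

Note that "you" is silently ignored because it is not part of the vocabulary.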

Similarly to a bag_of_words, we will create an output list with the length of the number of tags we have in our dataset. Each position in the list will represent one distinct tag. Then, we will convert our training data, i.e. input and output, to numpy arrays and use them to train our model.
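A minimal sketch of the two usual ways to encode a tag, assuming a hypothetical list of three tags. The one-hot list matches the description above; the plain index is the form that PyTorch's nn.CrossEntropyLoss accepts as a target.

```python
# Hypothetical tags, sorted so that every tag has a stable position.
tags = sorted({"greetings", "goodbye", "thanks"})


def one_hot(tag, tags):
    """Return a list of len(tags) with a single 1 at the position of the tag."""
    row = [0] * len(tags)
    row[tags.index(tag)] = 1
    return row


def tag_index(tag, tags):
    """Return the position of the tag, i.e. its class index."""
    return tags.index(tag)


print(tags)                           # ['goodbye', 'greetings', 'thanks']
print(one_hot("greetings", tags))     # [0, 1, 0]
print(tag_index("greetings", tags))   # 1
```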

Training Data

Now let us look at the kind of data that we need to provide to our chatbot. We will use data that we write ourselves in a .json file with the following format (the # annotations are explanatory only and must not appear in the actual file, because JSON does not allow comments):

    {"intents": [{
            "tag": "greetings", # The tag of a category.
            "patterns": ["Hi there", "hello", "greetings", "How are you?", "hi", "what's up", "hello"], # Patterns are just examples of what a greeting would look like.
            "responses": ["Hello!", "What can I do for you?"], # The exact responses that the chatbot will respond with.
            "context": [""]
        },
        {
            "tag": "goodbye",
            "patterns": ["bye", "good bye", "see you later"],
            "responses": ["have a nice time, welcome back again", "bye bye"],
            "context": [""]
        }
    ]}

We are basically creating a bunch of messages (patterns) that the user is likely to type in and mapping them to a group of appropriate responses. The tag on each dictionary in the file indicates the group that each message belongs to.

With this data, we will train our neural network to take a sentence (as a bag of words) and classify it as one of the tags in our file. Then, we can pick a random response from that tag's group and display it to the user.
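The last step, picking a random response once a tag has been predicted, could look roughly like this. The helper name pick_response is hypothetical, and the dictionary simply mirrors the intents format shown above.

```python
import random

# A small dictionary in the same shape as the intents file.
intents = {
    "intents": [
        {"tag": "greetings",
         "patterns": ["Hi there", "hello"],
         "responses": ["Hello!", "What can I do for you?"]},
        {"tag": "goodbye",
         "patterns": ["bye", "see you later"],
         "responses": ["have a nice time, welcome back again", "bye bye"]},
    ]
}


def pick_response(tag, intents):
    """Return a random response for the given tag, or None if the tag is unknown."""
    for intent in intents["intents"]:
        if intent["tag"] == tag:
            return random.choice(intent["responses"])
    return None


print(pick_response("goodbye", intents))
```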

Natural Language Preprocessing

Next, we are going to create a Natural Language Preprocessing pipeline to clean up and prepare our data.

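# This code is very much inspired by "Contextual Chatbots with Tensorflow", "Implementation of Contextual Chatbot in PyTorch", and "NLP based Chatbot in PyTorch". The file content (myNLP.py, the module imported later as myNLP) is as follows:
# NLTK is a leading platform for building Python programs to work with human language data.
# Its tokenizer and stop word data must be downloaded once with nltk.download() before running this file.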
import numpy as np
import nltk
import re
import unicodedata

from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
import json

class PreProcessing():
    def __init__(self):
        self.stemmer = PorterStemmer()
        self.all_stopwords = stopwords.words('english')

        with open('/yourPath/intents.json', 'r') as f: # Load our JSON Data.
            intents = json.load(f)
        self.intents = intents # Keep the raw intents so that get_intents() can return them.

        self.vocabulary = [] # The vocabulary of our training data. 
        self.tags = [] # The list of tags of our training data.
        self.xy = [] # The list of pairs (patterns, tags)
        # It loops through each intent in our training data.
        for intent in intents['intents']:
            tag = intent['tag']
            # Add to our list of tags.
            self.tags.append(tag)
            for pattern in intent['patterns']:
                # Tokenize each word in the pattern.
                t_pattern = PreProcessing.tokenize(pattern)
                # Add to our vocabulary.
                self.vocabulary.extend(t_pattern)
                # Add to xy.
                self.xy.append((t_pattern, tag))

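        # Stem and remove stop words (they are common words that are not indexed because they are not meaningful).
        # The NLTK stop word list is lower case, so we compare against the lower-cased token.
        self.vocabulary = [
            self.stem(w) for w in self.vocabulary if w.lower() not in self.all_stopwords]
        # Remove duplicates and sort from vocabulary and tags.
        # Tokens made only of punctuation or digits normalize to an empty string, which we drop.
        self.vocabulary = sorted(set(self.vocabulary) - {''})
        self.tags = sorted(set(self.tags))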

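        # Create training data: bag_of_words, index_tags
        self.X_train = []
        self.Y_train = []
        for (pattern, tag) in self.xy:
            bag = self.bag_of_words(pattern)
            self.X_train.append(bag)
            # The label is the index of the tag in our sorted list of tags.
            self.Y_train.append(self.tags.index(tag))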


    def get_tags(self):
        return self.tags

    def get_vocabulary(self):
        return self.vocabulary

    def get_Xtrain(self):
        return self.X_train

    def get_Ytrain(self):
        return self.Y_train

    def get_intents(self):
        return self.intents

    @staticmethod
    def tokenize(sentence):
        """It chops or splits a sentence into an array of words."""
        return nltk.word_tokenize(sentence)

    # Turn a Unicode string to plain ASCII, thanks to
    @staticmethod
    def unicodeToAscii(s):
        return ''.join(
            c for c in unicodedata.normalize('NFD', s)
            if unicodedata.category(c) != 'Mn'
        )

    @staticmethod
    def normalizeString(word):
        """Lowercase, trim, and remove non-letter characters."""
        # It converts to lower case and removes the leading and trailing whitespace.
        word = PreProcessing.unicodeToAscii(word.lower().strip())
        # It removes numbers
        word = re.sub(r'\d+', '', word)
        # It removes all punctuation except spaces
        word = re.sub(r'[^\w\s]', '', word)
        # It removes all spaces
        word = ''.join(word.split())
        return word

    def stem(self, word):
        """It finds the root of the "normalized" word, e.g., Playing, Plays, Played... all share the same stem, "play"."""
        return self.stemmer.stem(PreProcessing.normalizeString(word))

    def bag_of_words(self, pattern):
        """It returns a bag of words list. It will contain a "1" for each word in our vocabulary that is found in the sentence, 0 otherwise.
        For example:
        sentence = ["how", "are", "you", "doing"]
        words = ["hi", "how", "bye", "are", "good", "doing", "cool"...]
        bag   = [  0 ,    1 ,    0 ,   1 ,    0 ,   1 ,    0,  ...]
        """
        # stem each word
        sentence_words = [self.stem(word) for word in pattern]
        # initialize bag with 0 for each word in our vocabulary
        bag = np.zeros(len(self.vocabulary), dtype=np.float32)

        for idx, w in enumerate(self.vocabulary):
            if w in sentence_words:
                bag[idx] = 1

        return bag

if __name__ == '__main__':
    p = PreProcessing() # This code is just for debugging purposes.
    print(p.tokenize("How are you doing?"))
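To see what the normalization step does to individual tokens without installing NLTK, here is a standalone sketch that repeats the same regular expressions outside the class. The function name normalize and the sample tokens are illustrative only.

```python
import re
import unicodedata


def normalize(word):
    """Lowercase a token and strip accents, digits, punctuation, and whitespace."""
    word = word.lower().strip()
    # Decompose accented characters and drop the combining marks (category 'Mn').
    word = "".join(c for c in unicodedata.normalize("NFD", word)
                   if unicodedata.category(c) != "Mn")
    word = re.sub(r"\d+", "", word)       # remove digits
    word = re.sub(r"[^\w\s]", "", word)   # remove punctuation
    return "".join(word.split())          # remove any remaining whitespace


for raw in ["Café123!", "  What's ", "?"]:
    print(repr(raw), "->", repr(normalize(raw)))
```

A token that consists only of punctuation, such as "?", normalizes to an empty string, which is worth keeping in mind when building the vocabulary.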

Developing a Model

Now that we have preprocessed all of our data, it is time to start creating and training a model.

PyTorch is an open source machine learning framework that accelerates the path from research prototyping to production deployment. To install it, select your preferences on the PyTorch website (package: Conda, Pip, or LibTorch; OS: Windows, macOS, or Linux; compute platform) and run the install command that it generates.

A network may have three types of layers: input layers that take raw input from the training data, hidden layers that take input from previous layers and pass output to other layers, and output layers that make a prediction. All hidden layers typically use the same activation function. We will use Rectified Linear Activation (ReLU) and a standard feed-forward neural network.

Feedforward neural networks are artificial neural networks where information only travels forward in the network (no loops), first through the input nodes, then through the hidden nodes (if present), and finally through the output nodes.
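A forward pass through such a network can be sketched in plain Python. The weights and inputs below are made-up numbers chosen so that the arithmetic is easy to follow; the actual model in this article uses PyTorch layers instead.

```python
def relu(values):
    """Apply max(0, v) to every value."""
    return [max(0.0, v) for v in values]


def linear(inputs, weights, biases):
    """Compute y_j = sum_i(x_i * W[j][i]) + b_j for every output unit j."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[1.0, 2.0]]
b2 = [0.5]

x = [2.0, 1.0]
hidden = relu(linear(x, W1, b1))  # input layer -> hidden layer
output = linear(hidden, W2, b2)   # hidden layer -> output layer (no activation)

print(hidden)  # [1.0, 0.5]
print(output)  # [2.5]
```

Information only moves from x to hidden to output; nothing is fed back, which is what makes the network feedforward.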

# File: model.py
import torch.nn as nn
from torch.utils.data import Dataset

class myNeuralNet(nn.Module):
    """ nn.Module is the base class for all neural network modules. A typical training process starts by defining the neural network. A Neural Network consists of Layers, such as Linear, and activation functions like ReLU."""
    def __init__(self, input_size, hidden_size, num_classes):
        super(myNeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size) # nn.Linear is a module that creates a single-layer feed-forward network with n inputs and m outputs. It applies a linear transformation to the incoming data: y = x*W^T + b.
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()

ReLU is an activation function defined as relu(x) = max(0, x), i.e. relu(x) = { 0 if x <= 0, x if x > 0 }.

After each layer, an activation function needs to be applied. An activation function outputs a small value for small inputs, and a larger value if its inputs exceed a threshold. If the inputs are large enough, the activation function "fires", otherwise it does nothing. In other words, an activation function is like a gate that checks that an incoming value is greater than a critical number. They add non-linearities into neural networks.

    def forward(self, x):
        """You just have to define the forward function, and the backward function (where gradients are computed) is 
        automatically defined for us."""
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        #  In the last layer, we don't need an activation function because later on, in our code, we will use cross-entropy loss, which applies a (log-)softmax to the raw scores for us.
        return out

class myChatDataset(Dataset):
    """ Code for processing data samples can get messy and hard to maintain. We want our dataset code to be decoupled from our model training code for better readability and modularity. A Dataset stores the samples and their corresponding labels. A custom Dataset class must implement three functions: __init__, __len__, and __getitem__."""
    def __init__(self, X_train, Y_train): # It is run once when instantiating the Dataset object.
        self.x_data = X_train
        self.y_data = Y_train

    # It loads and returns a sample from the dataset at the given "index".
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # It returns the number of samples in our dataset.
    def __len__(self):
        return len(self.x_data)

Training and saving the model

"The artificial neural network is like a collection of strings that are ‘tuned’ to training data. Imagine the weight of each string (synapse) connecting a series of tuning pegs (neurons) and an iterative process to achieve proper tuning (training data). In each iteration, there is additional fine-tuning (back-propagation) to adjust to the desired pitch. Eventually, the instrument is tuned and when played (used for prediction) it will harmonize properly (have acceptably low error rates)," Chatbotlife, How Neural Networks Work.

import numpy as np # File:
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from myNLP import PreProcessing
from model import myNeuralNet, myChatDataset
import configparser

Training a model is an iterative process. In each iteration over a mini-batch (a full pass over the training data is called an epoch), the model makes a guess about the output (*1), calculates the error in its guess (the loss, *2), computes the derivatives of the error with respect to its parameters (back-propagation, *3), and updates these parameters (the model weights and biases) according to the gradient of the loss function (*4).

A feedforward neural network can be trained using various methods. The most popular approach combines back-propagation with gradient descent. The back-propagation algorithm computes the derivatives of the loss function with respect to each weight and bias in the network using the chain rule. The gradient descent algorithm then updates the weights and biases based on these derivatives and the learning rate.
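Both ideas can be illustrated on the smallest possible "network": a single weight w and one training pair. This is a sketch with made-up numbers (x = 2, y = 6, learning rate 0.05), not the training code used in this article.

```python
# One training example and one parameter: prediction = w * x.
x, y = 2.0, 6.0
learning_rate = 0.05


def loss(w):
    """Squared error between the prediction w * x and the target y."""
    return (w * x - y) ** 2


def gradient(w):
    """Chain rule: d(loss)/dw = 2 * (w * x - y) * d(w * x)/dw = 2 * (w * x - y) * x."""
    return 2.0 * (w * x - y) * x


w = 0.0
for step in range(50):
    # Gradient descent update: weight = weight - learning_rate * gradient.
    w = w - learning_rate * gradient(w)

print(f"w = {w:.4f}, loss = {loss(w):.6f}")
```

The weight moves towards 3, the value for which w * x equals y and the loss is zero.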

def training(criterion, optimizer, Number_Epochs, train_loader, model):
    for epoch in range(Number_Epochs):
        for (words, labels) in train_loader:
            # CrossEntropyLoss expects the class indices as 64-bit integers.
            labels = labels.to(dtype=torch.long)
            # Get model predictions for the current words (*1).
            predictions = model(words)
            # Compute the loss between actual and predicted values (*2).
            loss = criterion(predictions, labels)
            # We clear calculated gradients because in PyTorch, for every mini-batch during the training phase, we need to explicitly set the gradients to zero before starting to do backpropagation because PyTorch accumulates the gradients on subsequent backward passes.
            optimizer.zero_grad()
            # After computing the loss (how far is the output from being correct), we propagate gradients back into the network's parameters (*3).
            loss.backward()
            # Update parameters, typically weight = weight - learning_rate * gradient (*4).
            optimizer.step()

        if (epoch+1) % 100 == 0:
            print(
                f'Epoch [{epoch+1}/{Number_Epochs}], Loss: {loss.item():.4f}')

    print(f'final loss: {loss.item():.4f}')

def main():
    print("Load & Preprocess Data")
    myPreProcessing = PreProcessing()
    X_train = np.array(myPreProcessing.get_Xtrain())
    Y_train = np.array(myPreProcessing.get_Ytrain())

    # Config parameters. We use the ConfigParser class which implements a basic configuration language which provides a structure similar to what’s found in Microsoft Windows INI files.
    config = configparser.ConfigParser()
    config.read('chat.ini')

Our file chat.ini:
[DEFAULT]
Number_Epochs = 1000
Batch_Size = 8
Learning_Rate = 0.001
Hidden_Size = 8
File_Training = data.pth
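To check how these values are parsed without creating a file, the same text can be fed to configparser through read_string. This is only a sketch; the lower-case variable names are illustrative.

```python
import configparser

# The same settings as in chat.ini, inlined so that the example is self-contained.
INI_TEXT = """
[DEFAULT]
Number_Epochs = 1000
Batch_Size = 8
Learning_Rate = 0.001
Hidden_Size = 8
File_Training = data.pth
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

defaults = config["DEFAULT"]
number_epochs = defaults.getint("Number_Epochs")      # parsed as int
learning_rate = defaults.getfloat("Learning_Rate")    # parsed as float
file_training = defaults["File_Training"]             # values are strings by default

print(number_epochs, learning_rate, file_training)  # 1000 0.001 data.pth
```

All values are stored as strings, which is why the training script converts them with int() and float().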

    Number_Epochs = int(config['DEFAULT']['Number_Epochs']) # We read various parameters such as number of epochs, learning rate, etc.
    Batch_Size = int(config['DEFAULT']['Batch_Size'])
    Learning_Rate = float(config['DEFAULT']['Learning_Rate'])
    Hidden_Size = int(config['DEFAULT']['Hidden_Size'])
    File_Training = config['DEFAULT']['File_Training']

    input_size = len(X_train[0])
    output_size = len(myPreProcessing.get_tags())

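    train_loader = DataLoader(dataset=myChatDataset(X_train, Y_train),
                              batch_size=Batch_Size,
                              shuffle=True,
                              num_workers=0) # num_workers=0 loads the data in the main process; a higher value uses worker processes.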

DataLoader wraps an iterable around the Dataset to enable easy access to the samples. While training a model, we typically want to pass samples in “minibatches” (batch_size=Batch_Size), reshuffle the data at every epoch to reduce model overfitting (shuffle=True), and use Python’s multiprocessing to speed up the process.

Overfitting occurs when a model fits exactly against its training data. When the model memorizes the noise and fits too closely or exactly to our training set, the model becomes “overfitted,” and it is unable to generalize well to new data.

When training, the data is split into small batches, each of which is called a mini-batch. So, within each epoch, we process one subset of the training set (a “mini-batch”) at a time. The batch size is a trade-off between fast model updates (small batches) and accurate model updates (large batches).
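What the DataLoader does with batch_size and shuffle=True can be imitated in plain Python. The generator below is a simplified stand-in, not PyTorch's implementation, and the function name minibatches is hypothetical.

```python
import random


def minibatches(samples, batch_size, rng):
    """Yield the samples in a freshly shuffled order, batch_size items at a time."""
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


samples = list(range(10))   # stand-in for 10 training samples
rng = random.Random(0)      # seeded only to make the run repeatable

for epoch in range(2):
    # Every epoch sees all samples once, in a different order.
    batches = list(minibatches(samples, 4, rng))
    print(f"epoch {epoch}: {batches}")
```

With 10 samples and a batch size of 4, each epoch produces two full batches and a final batch of 2.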

    model = myNeuralNet(input_size, Hidden_Size, output_size)

    # A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target. The input is expected to contain raw, unnormalized scores for each class (aka logits), so we don't need to convert it into probabilities by a softmax function.
    criterion = nn.CrossEntropyLoss()
    # Create our optimizer. We need to give it an iterable containing the parameters (model.parameters() returns the model's parameters, i.e. weights and biases) and we also specify one optimizer-specific option (the learning rate).
    optimizer = torch.optim.Adam(model.parameters(), lr=Learning_Rate)

    training(criterion, optimizer, Number_Epochs, train_loader, model) # It trains our model.

    data = { # Finally, we save our model.
        "model_state": model.state_dict(), # The parameters (i.e. weights and biases) of a model are contained in the model’s parameters (accessed with model.parameters()). A state_dict is a Python dictionary object that maps each layer to its parameter tensor.
        "input_size": input_size,
        "output_size": output_size,
        "vocabulary": myPreProcessing.get_vocabulary(),
        "tags": myPreProcessing.get_tags()
    }
    torch.save(data, File_Training) # Saves our model to a disk file. This function uses Python’s pickle utility for serialization.
    print(f'Training is complete. The file has been saved to {File_Training}')

if __name__ == '__main__':
    main()
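As a rough illustration of what the cross-entropy loss used during training computes from the raw scores (logits), here is a plain-Python version for a single sample. It is a sketch for intuition only; nn.CrossEntropyLoss works on batches of tensors and averages over the batch by default.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - highest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability that the model assigns to the correct class."""
    return -math.log(softmax(logits)[target_index])


logits = [2.0, 0.5, -1.0]  # hypothetical scores for three tags

print([round(p, 3) for p in softmax(logits)])
print(f"loss if tag 0 is correct: {cross_entropy(logits, 0):.4f}")
print(f"loss if tag 2 is correct: {cross_entropy(logits, 2):.4f}")
```

The loss is small when the highest score belongs to the correct tag and large otherwise, which is the signal that back-propagation uses to adjust the weights.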

Using the model. Making predictions.

import random
import torch
from model import myNeuralNet
from myNLP import PreProcessing
import configparser
from myActions import myTime, bitcoin
interacting_user = True # A global variable that indicates that we are still interacting with the user.

def quit():
    global interacting_user
    interacting_user = False
    return "I am sorry to let you go!"

def response(sentence, myPreProcessing, model, tags, intents):
    sentence = myPreProcessing.tokenize(sentence) # We need to tokenize the user's input.
    X = myPreProcessing.bag_of_words(sentence) # Convert it to a bag of words.
    X = X.reshape(1, X.shape[0]) # It returns an array with the same data and number of elements as X, with the specified shape (1, X.shape[0]). The model expects a batch, so we need a matrix with 1 row and X.shape[0] columns rather than a flat array of zeros and ones.
    X = torch.from_numpy(X) # It creates a tensor from a numpy.ndarray.

    output = model(X) # Get model predictions for the current sentence's bag of words. The PyTorch model returns a matrix with 1 row and one column per tag, not a flat vector.
    _, predicted = torch.max(output, dim=1) # dim is the dimension to reduce: with dim=1 we get the maximum of each row, with dim=0 the maximum of each column. It returns a tuple (values, indices), where values holds the maximum value of each row and indices ("predicted") holds the position of each maximum. The following interpreter session illustrates it:

>>> a = torch.randn(4, 4)
>>> a
tensor([[-0.1198, -1.3780, -0.2481, -1.0648],
        [ 0.4371, -1.2758, -0.2922, -0.7690],
        [ 1.0653,  0.5043, -2.1397,  0.3395],
        [-1.0578,  0.9984, -1.2274,  1.7620]])
>>> torch.max(a, 1)
torch.return_types.max(
values=tensor([-0.1198, 0.4371, 1.0653, 1.7620]),
indices=tensor([0, 0, 0, 3]))
>>> torch.max(a, 0)
torch.return_types.max(
values=tensor([ 1.0653, 0.9984, -0.2481, 1.7620]),
indices=tensor([2, 3, 0, 3]))
>>> a = torch.randn(1, 4)
>>> a
tensor([[ 1.2387, -1.0884, -1.8268, -1.5615]])
>>> torch.max(a, 1)
torch.return_types.max(
values=tensor([1.2387]),
indices=tensor([0]))

    tag = tags[predicted.item()] # predicted.item() returns the value of this tensor as a standard Python number, so this is the index that we need to find the "predicted" tag.

    probs = torch.softmax(output, dim=1) # It applies the softmax function to our model prediction to the user's input. The output of the softmax function is a probability distribution. It returns a tensor of the same dimension and shape as the "output", a matrix with 1 row and len(myPreProcessing.get_tags()) columns. 
    prob = probs[0][predicted.item()] # Finally, we get the probability of this predicted tag.
    if prob.item() > 0.75: # If the probability is good enough, we will answer the user with one random response of the predicted tag.
        for intent in intents['intents']:
            if tag == intent["tag"]:
                print(f" >: {random.choice(intent['responses'])}")
    else: # Otherwise, we tell the user that we did not understand the input.
        print(" >: I do not understand...")

def main():
    actions = { # Some input (or keywords) from the user will trigger some methods.
        "quit": quit,
        "bitcoin": bitcoin,
        "time": myTime
    }
    config = configparser.ConfigParser()
    config.read('chat.ini')
    data = torch.load(config['DEFAULT']['File_Training'])
    Hidden_Size = int(config['DEFAULT']['Hidden_Size'])
    input_size = data["input_size"]
    output_size = data["output_size"]
    vocabulary = data['vocabulary']
    tags = data['tags']
    model_state = data["model_state"]

    myPreProcessing = PreProcessing()
    intents = myPreProcessing.get_intents()
    model = myNeuralNet(input_size, Hidden_Size, output_size)
    model.load_state_dict(model_state) # Loads our model's parameter dictionary.
    model.eval() # model.train() sets our model in training mode. model.eval() sets our model in evaluation or inference mode.

    print("Let's talk! (type 'quit' to exit)")
    while interacting_user:
        sentence = input("You: ")
        if sentence in actions: # Some keywords will trigger some functions, such as quit (quit the chat), time, etc.
            print(f">: {actions[sentence]()}")
        else: # Anything else is classified by the model.
            response(sentence, myPreProcessing, model, tags, intents)

if __name__ == '__main__':
    main()

# File: myActions.py
from time import gmtime, strftime
import requests

def myTime():
    return strftime("%Y-%m-%d %H:%M:%S", gmtime())

def bitcoin():
    r = requests.get('')
    return "The current price of Bitcoin is: $" + r.json()['bpi']['USD']['rate']
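The actions dictionary in main() is a small dispatch table: keywords map to functions, and everything else falls through to the model. The pattern can be tried in isolation as follows; the names ACTIONS, handle, current_time, and farewell are hypothetical, and the fallback stands in for the call to response().

```python
from time import gmtime, strftime


def current_time():
    """Return the current UTC time as a formatted string."""
    return strftime("%Y-%m-%d %H:%M:%S", gmtime())


def farewell():
    return "I am sorry to let you go!"


# Keyword -> function with no arguments.
ACTIONS = {"time": current_time, "quit": farewell}


def handle(sentence, fallback):
    """Run the action registered for the keyword, or hand the sentence to the fallback."""
    action = ACTIONS.get(sentence.strip().lower())
    if action is not None:
        return action()
    return fallback(sentence)


print(handle("time", lambda s: "I do not understand..."))
print(handle("How are you?", lambda s: "I do not understand..."))
```

Unlike the loop in main(), this sketch also strips whitespace and ignores case before looking up the keyword, which is a design choice rather than something the original code does.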


Author: Anawim

I am a social activist. I have two Bachelor's degrees, Maths and Computer & Software Engineering. I also have a Ph.D. in Psychology. I have written nine published books, four scientific articles, and five scientific presentations. I simply want to contribute to making a difference where it counts, so that we make the world a better, more sustainable, prosperous, and fairer place. I am always willing to give free talks and lectures about the social problems that exist in our world today.
