What Is Deep Learning?

What is Deep Learning? Well, put in the simplest of terms, it’s a subset of Machine Learning and a large part of how Artificial Intelligence is changing the way society works. In this post, we cover Deep Learning thoroughly.

We’ll discuss what it is, how it works, its models, and architecture. It’s going to be a long post, so let’s just get right to it.

So Just What is Deep Learning?

Deep Learning is an important part of Artificial Intelligence. Its purpose is to help machines learn from data on their own, without being explicitly programmed for every task.

Here is how it goes.

Artificial Intelligence > Machine Learning > Deep Learning.

This means that Machine Learning is a subfield of Artificial Intelligence, and Deep Learning is a subfield of Machine Learning.

The brief descriptions below show how the three are related:

  • Artificial Intelligence: Its purpose is to mimic human behavior.
  • Machine Learning: It uses statistical methods to help the machines improve with experience.
  • Deep Learning: It uses neural networks with many layers (hence “deep”) to learn from data. These networks are loosely inspired by the way neurons in the human brain work.

Deep Learning works best when there is an enormous amount of input and output data. As a general rule of thumb, the more data there is, the better Deep Learning performs.

This is different from many other Machine Learning methods. Their performance tends to level off as the amount of data grows, so past a certain point adding more data does little to improve them.

Here is a good analogy for understanding the relationship between Deep Learning and data. In this analogy, the purpose of the person is to become as creative as they can be. We can think of data as knowledge the person can gain and Deep Learning as the brain of the person.

Just as the brain needs information to be creative, deep learning needs data to be effective. The more information a brain has, the more creative it can be. Likewise, the more data deep learning algorithms are given, the more effective they can be.

Now that you understand what deep learning is, let’s talk about how it works.

How Does Deep Learning Work?

Perceptrons are the basic building blocks of neural networks, so they are what makes deep learning work.

To make it easier to understand how Deep Learning works, we’re going to compare perceptrons with biological neurons.

A neuron has dendrites that receive signals from other neurons. The neuron combines these incoming signals. If they are strong enough, it fires and sends a signal along its axon to the next neurons.

In Deep Learning, perceptrons work in a similar way. They receive several inputs (in the form of data) and produce a single binary output. Each input has a weight, denoted w1, w2, w3, and so on. The higher an input’s weight, the more that input counts toward the output.

In a sense, a perceptron makes a decision by weighing the evidence. Let’s make this concept easier to understand by looking at a real-life example.

Suppose you want to go hunting, but you’re not sure because of a couple of factors:

  1. You don’t know whether your hunting guide is reliable.
  2. You are not sure whether it’s worth it to pay a lot of money to the guide.

Let’s label the two points X1 and X2. If your hunting guide is reliable, X1 is 1; if not, X1 is 0. The same goes for the second point: if the guide is worth the cost, X2 is 1, and if not, X2 is 0.

Using a perceptron, we can build a model that makes this decision. Each factor gets a weight W that reflects how much it matters. The perceptron adds up the weighted inputs and compares the total to a threshold of 10.

For example, give reliability a weight of 6 and cost a weight of 4. Only when both X1 and X2 are 1 does the score reach 6 + 4 = 10, so the output is 1 (go hunting). Otherwise the score falls short and the output is 0 (stay home). This single output is the answer the model passes on to the user.
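The hunting decision can be sketched as a tiny Python perceptron. This is a minimal illustration under our own assumptions:

- reliability (X1) is weighted 6 and cost (X2) is weighted 4;
- the perceptron outputs 1 when the weighted sum reaches the threshold of 10.

The function name `perceptron` is hypothetical and does not come from any library.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Assumed weights: reliability (X1) counts 6, worth-the-cost (X2) counts 4.
WEIGHTS = [6, 4]
THRESHOLD = 10

for x1 in (0, 1):
    for x2 in (0, 1):
        decision = perceptron([x1, x2], WEIGHTS, THRESHOLD)
        print(f"reliable={x1} worth_cost={x2} -> go hunting: {decision}")
```

With these numbers, only the case where both factors are favorable clears the threshold. Lowering the threshold to 6 would make a reliable guide alone enough.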

What is the Difference Between a Perceptron and a Neuron?

They are not two completely different things. A perceptron is essentially a simple artificial neuron, and neural networks are built by connecting many of them together. Let me show you what I mean.

Some perceptrons have only one layer, and others have multiple layers. A single-layer perceptron connects its inputs directly to its output, with nothing in between. A multilayer perceptron is called a neural network and has one or more hidden layers.

In a neural network, the hidden layers sit between the input layer and the output layer, and each layer is made up of units called neurons.
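A hidden layer lets a network do things a single perceptron cannot. The sketch below computes XOR: the output is 1 when exactly one input is 1. XOR is a well-known function that no single-layer perceptron can represent.

The weights are hand-picked for illustration, not learned, and the function names are hypothetical.

```python
def neuron(inputs, weights, threshold):
    """A single perceptron-style neuron: fire (1) when the weighted sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def tiny_network(x1, x2):
    # Hidden layer: two neurons that see the raw inputs.
    h1 = neuron([x1, x2], [1, 1], 0.5)   # fires if at least one input is 1
    h2 = neuron([x1, x2], [1, 1], 1.5)   # fires only if both inputs are 1
    # Output layer: one neuron that sees only the hidden layer's outputs.
    return neuron([h1, h2], [1, -1], 0.5)  # "at least one, but not both"

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_network(a, b))
```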

Now that we’ve got an idea of how Deep Learning works, let’s talk about the models of Deep Learning.

Deep Learning: A digital image of Neurons in a Neural Network.

Models of Deep Learning

These are the 3 primary models of Deep Learning:

  1. Unsupervised
  2. Semi-Supervised
  3. Supervised

These models have more to do with how a system learns with Deep Learning as you’ll see below.

1. Unsupervised

In this model, machines learn on their own without the help of a human supervisor. They learn the relationship between elements in a dataset on their own.

Once they’ve found these relationships, they group or organize the data accordingly. No labels are required to do this. To analyze the data, the algorithm looks for hidden structure within it.

The three main algorithms used for this purpose are:

  1. Clustering
  2. Neural Network
  3. Anomaly detection

Although each of these three algorithms serves its own purpose, Clustering is the most commonly used for analyzing the data.

These are four common uses of Clustering algorithms:

  1. Detect abnormal behavior
  2. Analyze and label data
  3. Compress images
  4. Market segmentation
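To make Clustering concrete, here is a minimal k-means sketch in plain Python. k-means is one common clustering algorithm; the article does not name a specific one. It groups unlabeled 2-D points into k clusters without being told any labels.

The data is a made-up toy example, and `kmeans` is our own function, not a library API.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Toy, unlabeled data with two loose groups (e.g. two kinds of customers).
data = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

The algorithm never sees labels. It discovers that the first three points belong together and the last three belong together, which is the idea behind uses like market segmentation.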

Now that you’ve understood how Unsupervised learning works, let’s talk about Semi-Supervised learning.

2. Semi-Supervised Learning

Semi-Supervised Learning sits between Unsupervised and Supervised Learning: the machine performs some tasks on its own and relies on human help for others.

To explain how Semi-Supervised Learning works, we’re going to use the analogy of an 8-year-old child. Semi-Supervised Learning and an 8-year-old work in a similar way (figuratively, of course, not literally).

An 8-year-old does some tasks on their own, such as using a phone, taking a shower, and getting dressed. Similarly, Semi-Supervised machines perform some tasks on their own, using unlabeled data.

For other tasks, such as getting to a school that’s far from home, preparing lunch, and doing homework, an 8-year-old needs their parents’ help. In the same way, a Semi-Supervised machine needs human help for some tasks. The data humans provide in this scenario is called labeled data.

In most Semi-Supervised machines, the majority of the data used is unlabeled. Only a small portion of it is labeled.

Semi-Supervised learning can be divided into two parts:

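  1. Inductive: In this approach, we infer a general mapping from inputs X to labels Y that can also be applied to new, unseen data.
  2. Transductive: In this approach, we infer labels only for the specific unlabeled examples in the given dataset.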
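A simple way to picture the transductive case is self-training with nearest neighbors. We start from a few labeled points and repeatedly give the closest unlabeled point the label of its nearest labeled neighbor.

This is a minimal sketch on 1-D toy data. It is one of many possible semi-supervised techniques, and `self_train` is a hypothetical helper.

```python
def self_train(labeled, unlabeled):
    """Transductive sketch: label each unlabeled 1-D point from its nearest labeled neighbor.

    labeled:   dict mapping a value to its label (the small, human-labeled part)
    unlabeled: list of values with no labels (the large, unlabeled part)
    """
    labeled = dict(labeled)      # copy so the caller's data is not modified
    remaining = list(unlabeled)
    while remaining:
        # For every unlabeled point, find its nearest labeled point...
        candidates = [(u, min(labeled, key=lambda lab: abs(lab - u))) for u in remaining]
        # ...then label the most confident (closest) one first.
        point, nearest = min(candidates, key=lambda pair: abs(pair[0] - pair[1]))
        labeled[point] = labeled[nearest]  # adopt the neighbor's label
        remaining.remove(point)            # it now counts as labeled data
    return labeled

# Two human-labeled examples, five unlabeled ones.
result = self_train({1.0: "small", 10.0: "large"}, [2.0, 3.0, 4.0, 8.5, 7.0])
print(sorted(result.items()))
```

Each newly labeled point can in turn label its neighbors. This is how a small amount of labeled data spreads across a larger unlabeled set.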

3. Supervised Learning

Supervised Learning is the most widely used of the three. In this model, the machine has a supervisor that provides the answers to questions and the solutions to problems.

The supervisor provides this data beforehand, and the machine learns from it so it can give the right answer when it sees new data.

In each situation, there is an input and an output. The input is usually a vector of features, and the output is the answer provided by the supervisor (the label). The algorithm learns a mapping from inputs to labels from this labeled data and then uses that mapping to predict labels for new examples.
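The classic perceptron learning rule is a compact example of supervised learning. Each time the model’s prediction disagrees with the supervisor’s label, it nudges its weights toward the right answer.

Here the "supervisor" labels are the logical AND of two inputs, a toy dataset chosen for illustration. `train_perceptron` and `predict` are hypothetical names.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def train_perceptron(examples, epochs=10, lr=1.0):
    """Learn weights and a bias from labeled (inputs, label) pairs with the perceptron rule."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)  # 0 if correct, +1 or -1 if wrong
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Labeled data from the "supervisor": the answer is 1 only when both inputs are 1.
training_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(training_data)
print([predict(weights, bias, x) for x, _ in training_data])
```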

These are four key areas where Supervised Learning is commonly used:

  1. Language and Speech Recognition
  2. Analyzing sentiments
  3. Detecting spam
  4. Identifying objects and attributes in an image, such as text, font, color, gender, and race.

That’s it! We’ve covered the 3 main models of Deep Learning. Now let’s talk about Deep Learning architectures.


Deep Learning Architectures

Here are five widely used neural network architectures in Deep Learning. Each performs different functions and is suited to different tasks.

Here are the names of these 5 architectures, and we’ve explained each of them in detail below.

  1. Recurrent Neural Networks (RNNs)
  2. Convolutional Neural Networks (CNNs)
  3. Generative Adversarial Networks (GANs)
  4. Long Short-Term Memory Networks (LSTM)
  5. Restricted Boltzmann Machines (RBM)

1. Recurrent Neural Networks

RNNs are designed to recognize patterns in data that comes in a sequence, such as handwriting, text, spoken words, and images.

RNNs have feedback (recurrent) connections that give them an internal memory. Here is how it works. At each step of a sequence, the network’s hidden layer receives the current input together with its own output from the previous step. In other words, the data doesn’t just pass forward; part of it is fed back into the layer. RNNs are trained with a variant of backpropagation called backpropagation through time.

This feedback loop means that information from earlier inputs is still available when later inputs arrive. This helps the system make accurate predictions about sequences.
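The feedback loop can be sketched with a single recurrent unit. Its new hidden state mixes the current input with the previous hidden state.

The weights `w_x`, `w_h` and `b` are arbitrary illustrative values, not trained ones. A real RNN uses vectors and matrices instead of single numbers.

```python
import math

def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return the hidden state after each step."""
    h = 0.0  # the network's memory starts empty
    states = []
    for x in sequence:
        # The new state combines the current input with the previous state (the feedback loop).
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

print(rnn_forward([1, 0, 0]))  # the early 1 is still "remembered" at the last step
print(rnn_forward([0, 0, 0]))  # nothing to remember: the state stays at zero
```

Both sequences end with the same input, yet the final states differ, because the hidden state carries information forward from earlier steps.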

2. Convolutional Neural Networks (CNNs)

CNNs are probably the most commonly used Deep Learning architecture. Their built-in 2D convolutional layers make them one of the ideal options for processing 2D images.

Here is how Convolutional Neural Networks work. While the network trains on a set of images, it learns to extract features from each image by itself, instead of relying on features designed by hand. This is why CNNs are well suited to tasks that involve computer vision.

The learning process behind this feature detection is complicated. In each layer, a CNN slides many learned filters across the image to extract features, and a network may stack dozens or even hundreds of such layers. On top of that, the features it learns become more complex with each successive layer.

Here is a real-life example. Suppose a CNN is being used to classify an image of a newborn baby. The CNN analyzes the picture by taking in its pixels as input:

- The first layer detects edges.
- The second combines those edges into simple shapes.
- The third detects features such as the face, body, color, and lighting.

Finally, the last layer produces the final output using everything that the previous layers have extracted.
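What a convolutional layer does to the pixels can be sketched as follows: a small filter slides across the image, producing a feature map that lights up where the filter’s pattern appears. The filter below responds to vertical edges, the kind of feature a first layer detects.

Assumptions:

- The filter is hand-made for illustration, whereas a trained CNN learns its filter values from data.
- `convolve2d` is our own function, with no padding and a stride of 1.
- Like most deep learning libraries, it computes cross-correlation, so the filter is not flipped.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2-D image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x4 grayscale "image": dark on the left, bright on the right, i.e. one vertical edge.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-made vertical-edge filter: responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
for row in convolve2d(image, kernel):
    print(row)
```

The feature map is large only in the column where dark pixels meet bright ones and zero elsewhere. Later layers combine many such maps into more complex features.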

3. Generative Adversarial Networks (GANs)

GANs have the remarkable ability to generate entirely new data on their own. They contain two networks that work against each other: the Generator and the Discriminator.

The Generator Network’s job is to turn random input into realistic-looking data, such as a high-quality, realistic image. The Discriminator Network’s job is to check whether a given sample is real (taken from the training data) or fake (produced by the Generator). When the Discriminator catches a fake, that feedback is used to update the Generator so that it produces more realistic output next time.

The final output of a Generative Adversarial Network depends on the accuracy of the Discriminator. The output can only be ideal if the Discriminator Network is as good at telling real from fake as the technology allows. This is why the Generator Network’s weights are learned based on the Discriminator Network’s loss.

Another important thing to note is that the Discriminator keeps learning too, so it gets better at spotting fakes as training goes on. Therefore, the longer the two networks train together, the better the results are likely to be.

Here is the worst-case scenario for both networks. If the Discriminator Network is bad at telling real from fake, the Generator Network can fool it with poor output and has little reason to improve. If the Discriminator becomes too good too quickly, the Generator gets little useful feedback. This is why the two networks are set against each other: each one’s progress pushes the other to improve.
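The tug-of-war between the two networks is usually expressed through their loss functions. This sketch shows only the loss computation, not the networks or the training loop.

It uses the standard binary cross-entropy losses from the original GAN formulation, with the commonly used "non-saturating" generator loss. The function names are hypothetical. The inputs are the Discriminator’s estimated probability that a sample is real.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the Discriminator.

    d_real: the Discriminator's probability that a real sample is real.
    d_fake: the Discriminator's probability that a generated (fake) sample is real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating Generator loss: low when the Discriminator is fooled."""
    return -math.log(d_fake)

# Early in training: the Discriminator easily spots the fake.
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
# Later: the fake fools the Discriminator half the time.
print(discriminator_loss(0.5, 0.5), generator_loss(0.5))
```

In the early case, the Discriminator’s loss is low and the Generator’s is high. The Generator’s weights are updated to push that loss down, which is exactly what raises the Discriminator’s loss.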

4. Long Short-Term Memory Networks

Although the LSTM network was introduced in 1997, it has only gained popularity in the last decade or so. It has been used in everyday technology such as smartphones and Amazon’s Alexa.

The LSTM builds on the standard recurrent neural network (RNN) architecture, but it adds something that ordinary RNNs lacked: a memory cell.

The memory cell can retain its value for both the short and long term. Here is how the memory cell works.

The memory cell has three gates that control how information flows in and out of it, deciding what gets in and what doesn’t. These are the input gate, the forget gate, and the output gate.

Here is how each of them works:

  • Input gate: The input gate decides which new information goes into the memory and which doesn’t. If it doesn’t see a piece of information as useful, it won’t let it through.
  • Forget gate: The forget gate decides which information to remember and which to forget. In other words, it decides which data stays in the memory and which data gets erased.
  • Output gate: The last gate decides which part of the memory is used in the cell’s output at each step.

Each gate has its own weights. Training algorithms such as backpropagation through time (BPTT) adjust these weights to reduce the network’s output error. This is how the gates learn when to let information in, keep it, or shut it out.
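The three gates can be seen working together in a single-unit LSTM cell written in plain Python. It follows the standard LSTM equations, using single numbers instead of vectors.

The parameter values are hand-picked, not trained by BPTT, and `lstm_step` is a hypothetical helper. The strongly positive forget-gate bias is chosen to show how the cell can hold on to a value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    i = sigmoid(p["i"][0] * x + p["i"][1] * h_prev + p["i"][2])    # input gate: how much new info to write
    f = sigmoid(p["f"][0] * x + p["f"][1] * h_prev + p["f"][2])    # forget gate: how much old memory to keep
    o = sigmoid(p["o"][0] * x + p["o"][1] * h_prev + p["o"][2])    # output gate: how much memory to expose
    g = math.tanh(p["g"][0] * x + p["g"][1] * h_prev + p["g"][2])  # candidate new memory content
    c = f * c_prev + i * g   # memory cell: keep part of the old value, add part of the new one
    h = o * math.tanh(c)     # the cell's output at this step
    return h, c

# Hand-picked (untrained) parameters.
params = {"i": (1.0, 0.0, 0.0), "f": (0.0, 0.0, 5.0), "o": (0.0, 0.0, 5.0), "g": (1.0, 0.0, 0.0)}
h, c = 0.0, 0.0
for x in [2.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x}  cell={c:.3f}  output={h:.3f}")
```

After the first input, the cell value barely decays even though the following inputs are zero. This is the "long-term" part of the memory.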

LSTMs are also used in image and video captioning systems, which describe an image or video in natural language. However, the LSTM doesn’t perform this function on its own. It works together with a CNN.

Here is how the two architectures work as a pair. The CNN processes the image or video and extracts its visual features. The LSTM then takes those features and turns them into a caption in natural language, one word at a time.

5. Restricted Boltzmann Machines

First known as the Harmonium, this architecture was invented by Paul Smolensky in 1986. Like the LSTM, it only became popular in the 21st century.

A Restricted Boltzmann Machine (RBM) has two layers: the input (visible) layer and the hidden layer. Here is how an RBM works.

Every node in the input layer is connected to every node in the hidden layer, but nodes within the same layer are not connected to each other. This restriction is what gives the RBM its name. In a general, unrestricted Boltzmann machine, nodes within the same layer can also be connected, which makes it much harder to train.

When an RBM is trained on a dataset, it takes a stochastic approach to learning the probability distribution of that data. During training, each neuron is switched on or off at random, with a probability that depends on the inputs it receives.

The machine has two kinds of biases: visible biases and hidden biases. The hidden biases are used when activating the hidden layer, and the visible biases are used when reconstructing the input.

An RBM can reconstruct, or generate, inputs from the probability distribution it has learned, which is why it is known as a generative model. The reconstructed input is usually close to the original input, but not exactly the same.
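One reconstruction pass can be sketched in plain Python: visible layer to randomly sampled hidden units, then back to a reconstructed visible layer. The sketch makes the roles of the weights, the randomness, and the two biases concrete.

Assumptions:

- A tiny binary RBM with 3 visible and 2 hidden units.
- Random, untrained weights, so the reconstruction will not match the input well. Training, for example with contrastive divergence, adjusts the weights so that it does.
- `reconstruct` is a hypothetical helper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reconstruct(visible, weights, hidden_bias, visible_bias, rng):
    """One Gibbs step of a binary RBM: visible -> sampled hidden -> reconstructed visible probabilities.

    weights[i][j] connects visible unit i to hidden unit j. There are no
    visible-visible or hidden-hidden connections (the "restricted" part).
    """
    n_visible, n_hidden = len(visible), len(hidden_bias)
    # Each hidden unit switches on at random, with a probability set by its inputs and the hidden bias.
    hidden = []
    for j in range(n_hidden):
        p_on = sigmoid(hidden_bias[j] + sum(visible[i] * weights[i][j] for i in range(n_visible)))
        hidden.append(1 if rng.random() < p_on else 0)
    # On the way back, the visible bias is used to reconstruct the input.
    return [
        sigmoid(visible_bias[i] + sum(hidden[j] * weights[i][j] for j in range(n_hidden)))
        for i in range(n_visible)
    ]

rng = random.Random(42)
W = [[rng.gauss(0, 0.5) for _ in range(2)] for _ in range(3)]  # 3 visible x 2 hidden, untrained
recon = reconstruct([1, 0, 1], W, hidden_bias=[0.0, 0.0], visible_bias=[0.0, 0.0, 0.0], rng=rng)
print([round(p, 3) for p in recon])  # probabilities near, but not equal to, the input
```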


That’s it! We’ve covered the basics you need to understand how Deep Learning works. We hope you’ve found this article helpful; if so, feel free to share it.
