Self Organizing Maps
Deep belief networks (Boltzmann Machine)
<> Artificial neural network
<> Convolution neural network
<> Recurrent neural network
There are some basic requirements for starting in Deep Learning, which are:
<> Machine Learning
<> Python Programming
If the set of weights in the network is put to a zero, then all the neurons at each layer will start producing the same output and the same gradients during backpropagation.
As a result, the network cannot learn at all because there is no source of asymmetry between neurons. That is the reason why we need to add randomness to the weight initialization process.
Deep learning has brought significant changes or revolution in the field of machine learning and data science. The concept of a complex neural network (CNN) is the main center of attention for data scientists. It is widely taken because of its advantages in performing next-level machine learning operations. The advantages of deep learning also include the process of clarifying and simplifying issues based on an algorithm due to its utmost flexible and adaptable nature. It is one of the rare procedures which allow the movement of data in independent pathways. Most of the data scientists are viewing this particular medium as an advanced additive and extended way to the existing process of machine learning and utilizing the same for solving complex day to day issues.
With the use of sequential processing, programmers were up against:
The usage of high processing power
The difficulty of parallel execution
Transfer learning is a scenario where a large model is trained on a dataset with a large amount of data and this model is used on simpler datasets, thereby resulting in extremely efficient and accurate neural networks.
The popular examples of transfer learning are in the case of:
Autoencoder is a simple 3-layer neural network where output units are directly connected back to input units. Typically, the number of hidden units is much less than the number of visible ones. The task of training is to minimize an error or reconstruction, i.e. find the most efficient compact representation for input data.
<> Valid padding: It is used when there is no requirement for padding. The output matrix will have the dimensions (n – f + 1) X (n – f + 1) after convolution.
<> Same padding: Here, padding elements are added all around the output matrix. It will have the same dimensions as the input matrix.
Yes, it is possible to begin with zero initialization. However, it is not recommended to use because setting up the weights to zero initially will cause all of the neurons to produce the same output and the same gradients when performing backpropagation. This means that the network will not have the ability to learn at all due to the absence of asymmetry between each of the neurons.
Encoder: This part of the network compresses the input into a latent space representation. The encoder layer encodes the input image as a compressed representation in a reduced dimension. The compressed image is the distorted version of the original image.
Code: This part of the network represents the compressed input which is fed to the decoder.
Decoder: This layer decodes the encoded image back to the original dimension. The decoded image is a lossy reconstruction of the original image and it is reconstructed from the latent space representation.
An Autoencoder consist of three layers:
Image Coloring: Autoencoders are used for converting any black and white picture into a colored image. Depending on what is in the picture, it is possible to tell what the color should be.
Feature variation: It extracts only the required features of an image and generates the output by removing any noise or unnecessary interruption.
Dimensionality Reduction: The reconstructed image is the same as our input but with reduced dimensions. It helps in providing a similar image with a reduced pixel value.
Denoising Image: The input seen by the autoencoder is not the raw input but a stochastically corrupted version. A denoising autoencoder is thus trained to reconstruct the original input from the noisy version.
An autoencoder neural network is an Unsupervised Machine learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. Autoencoders are used to reduce the size of our inputs into a smaller representation. If anyone needs the original data, they can reconstruct it from the compressed data.
Leaky ReLU, also called LReL, is used to manage a function to allow the passing of small-sized negative values if the input value to the network is less than zero.
Mini-batch gradient descent is popular as:
<> It is more efficient when compared to stochastic gradient descent.
<> Generalization is done by finding the flat minima.
<> It helps avoid the local minima by allowing the approximation of the gradient for the entire dataset.
There are three variants of gradient descent as shown below:
<> Stochastic gradient descent: A single training example is used for the calculation of gradient and for updating parameters.
<> Batch gradient descent: Gradient is calculated for the entire dataset, and parameters are updated at every iteration.
<> Mini-batch gradient descent: Samples are broken down into smaller-sized batches and then worked on as in the case of stochastic gradient descent.
There are a few disadvantages of Deep Learning as mentioned below:
<> Networks in Deep Learning require a huge amount of data to train well.
<> Deep Learning concepts can be complex to implement sometimes.
<> Achieving a high amount of model efficiency is difficult in many cases.
A Restricted Boltzmann Machine, or RBM for short, is an undirected graphical model that is popularly used in Deep Learning today. It is an algorithm that is used to perform:
<> Dimensionality reduction
<> Collaborative filtering
<> Topic modeling
There are four main types of autoencoders:
<> Deep autoencoders
<> Convolutional autoencoders
<> Sparse autoencoders
<> Contractive autoencoders
Autoencoders have a wide variety of usage in the real world. The following are some of the popular ones:
<> Adding color to black–white images
<> Removing noise from images
<> Dimensionality reduction
<> Feature removal and variation
LSTM stands for long short-term memory. It is a type of RNN that is used to sequence a string of data. It consists of feedback chains that give it the ability to perform like a general-purpose computational entity.
Exploding gradients are an issue causing a scenario that clumps up the gradients. This creates a large number of updates of the weights in the model when training.
The working of gradient descent is based on the condition that the updates are small and controlled. Controlling the updates will directly affect the efficiency of the model.
Vanishing gradient is a scenario that occurs when we use RNNs. Since RNNs make use of backpropagation, gradients at every step of the way will tend to get smaller as the network traverses through backward iterations. This equates to the model learning very slowly, thereby, causing efficiency problems in the network.
RNNs stand for recurrent neural networks, which form to be a popular type of artificial neural network. They are used to process sequences of data, text, genomes, handwriting, and more. RNNs make use of backpropagation for the training requirements.
There are four main layers that form a convolutional neural network:
<> Convolution: These are layers consisting of entities called filters that are used as parameters to train the network.
<> ReLu: It is used as the activation function and is always used with the convolution layer.
<> Pooling: Pooling is the concept of shrinking the complex data entities that form after convolution and is primarily used to maintain the size of an image after shrinkage.
<> Connectedness: This is used to ensure that all of the layers in the neural network are fully connected and activation can be computed using the bias easily.
A computation graph is a series of operations that are performed to take inputs and arrange them as nodes in a graph structure. It can be considered as a way of implementing mathematical calculations into a graph. This helps in parallel processing and provides high performance in terms of computational capability.
TensorFlow has numerous advantages, and some of them are as follows:
<> High amount of flexibility and platform independence
<> Trains using CPU and GPU
<> Supports auto differentiation and its features
<> Handles threads and asynchronous computation easily
<> Has a large community
A Boltzmann machine is a type of recurrent neural network that uses binary decisions, alongside biases, to function. These neural networks can be hooked up together to create deep belief networks, which are very sophisticated and used to solve the most complex problems out there.
In Deep Learning, model capacity refers to the capacity of the model to take in a variety of mapping functions. Higher model capacity means a large amount of information can be stored in the network.
Tensors are multidimensional arrays in Deep Learning that are used to represent data. They represent the data with higher dimensions. Due to the high-level nature of the programming languages, the syntax of tensors is easily understood and broadly used.
Dropout is a technique that is used to avoid overfitting a model in Deep Learning. If the dropout value is too low, then it will have minimal effect on learning. If it is too high, then the model can under-learn, thereby, causing lower efficiency.
Hyperparameters can be trained using four components as shown below:
<> Batch size: This is used to denote the size of the input chunk. Batch sizes can be varied and cut into sub-batches based on the requirement.
<> Epochs: An epoch denotes the number of times the training data is visible to the neural network so that it can train. Since the process is iterative, the number of epochs will vary based on the data.
<> Momentum: Momentum is used to understand the next consecutive steps that occur with the current data being executed at hand. It is used to avoid oscillations when training.
<> Learning rate: Learning rate is used as a parameter to denote the time required for the network to update the parameters and learn.
Hyperparameters are variables used to determine the structure of a neural network. They are also used to understand parameters, such as the learning rate and the number of hidden layers, and more, present in the neural network.
Backpropagation is used to minimize the cost function by first seeing how the value changes when weights and biases are tweaked in the neural network. This change is easily calculated by understanding the gradient at every hidden layer. It is called backpropagation as the process begins from the output layer, moving backward to the input layers.
Forward propagation is the scenario where inputs are passed to the hidden layer with weights. In every single hidden layer, the output of the activation function is calculated until the next layer can be processed. It is called forward propagation as the process begins from the input layer and moves toward the final output layer.
Data normalization is a preprocessing step that is used to refit the data into a specific range. This ensures that the network can learn effectively as it has better convergence when performing backpropagation.
There are five main steps that are used to initialize and use the gradient descent algorithm:
<> Initialize biases and weights for the network
<> Send input data through the network (the input layer)
<> Calculate the difference (the error) between expected and predicted values
<> Change values in neurons to minimize the loss function
<> Multiple iterations to determine the best weights for efficient working
Autoencoders are artificial neural networks that learn without any supervision. Here, these networks have the ability to automatically learn by mapping the inputs to the corresponding outputs.
Autoencoders, as the name suggests, consist of two entities:
<> Encoder: Used to fit the input into an internal computation state
<> Decoder: Used to convert the computational state back into the output
The swish function is a self-gated activation function developed by Google. It is now a popular activation function used by many as Google claims that it outperforms all of the other activation functions in terms of computational efficiency.
This question is quite common in a Deep Learning interview. Make sure to answer based on the experience you have with the tools.
However, some of the top Deep Learning frameworks out there today are:
The loss function is used as a measure of accuracy to see if a neural network has learned accurately from the training data or not. This is done by comparing the training dataset to the testing dataset.
The loss function is a primary measure of the performance of the neural network. In Deep Learning, a good performing network will have a low loss function at all times when training.
There are five main steps that determine the learning of a perceptron:
<> Initialize thresholds and weights
<> Provide inputs
<> Calculate outputs
<> Update weights in each step
<> Repeat steps 2 to 4
Fourier transform is an effective package used for analyzing and managing large amounts of data present in a database. It can take in real-time array data and process it quickly. This ensures that high efficiency is maintained and also makes the model more open to processing a variety of signals.
Activation functions are entities in Deep Learning that are used to translate inputs into a usable output parameter. It is a function that decides if a neuron needs activation or not by calculating the weighted sum on it with the bias.
Using an activation function makes the model output to be non-linear. There are many types of activation functions:
Overfitting is a very common issue when working with Deep Learning. It is a scenario where the Deep Learning algorithm vigorously hunts through the data to obtain some valid information.
This makes the Deep Learning model pick up noise rather than useful data, causing very high variance and low bias. This makes the model less accurate, and this is an undesirable effect that can be prevented.
Deep Learning is used in a variety of fields today. The most used ones are as follows:
<> Sentiment Analysis
<> Computer Vision
<> Automatic Text Generation
<> Object Detection
<> Natural Language Processing
<> Image Recognition
Machine Learning is powerful in a way that it is sufficient to solve most of the problems. However, Deep Learning gets an upper hand when it comes to working with data that has a large number of dimensions. With data that is large in size, a Deep Learning model can easily work with it as it is built to handle this.
A perceptron is similar to the actual neuron in the human brain. It receives inputs from various entities and applies functions to these inputs, which transform them to be the output.
A perceptron is mainly used to perform binary classification where it sees an input, computes functions based on the weights of the input, and outputs the required transformation.
Machine Learning forms a subset of Artificial Intelligence, where we use statistics and algorithms to train machines with data, thereby, helping them improve with experience.
Deep Learning is a part of Machine Learning, which involves mimicking the human brain in terms of structures called neurons, thereby, forming neural networks.