how to solve xor problem

Towards Data Science

Aniruddha Karajgi

Nov 4, 2020

How Neural Networks Solve the XOR Problem

And why hidden layers are so important.

The perceptron is a classification algorithm. Specifically, it works as a linear binary classifier. It was invented in the late 1950s by Frank Rosenblatt.

The perceptron basically works as a threshold function — non-negative outputs are put into one class while negative ones are put into the other class.

Though there’s a lot to talk about when it comes to neural networks and their variants, we’ll be discussing a specific problem that highlights the major differences between a single layer perceptron and one that has a few more layers.

Structure and properties.

A perceptron has the following components:

Input Nodes

These nodes contain the input to the network. In any iteration — whether testing or training — these nodes are passed the input from our data.

Weights and Biases

These parameters are what we update when we talk about “training” a model. They are initialized to some random value or set to 0 and updated as the training progresses. The bias is analogous to a weight independent of any input node. Basically, it makes the model more flexible, since you can “move” the activation function around.

The output calculation is straightforward.

This can be expressed like so:

This is often simplified and written as a dot product of the weight and input vectors plus the bias.
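As a rough illustration of that dot-product form, here is a minimal sketch in plain Python. The function name `weighted_sum` and the parameter values are hypothetical, chosen only for this example; the article's own code uses NumPy.

```python
def weighted_sum(weights, inputs, bias):
    """Return w . x + b for a single datapoint."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical parameters, picked only to make the arithmetic easy to follow.
weights = [0.5, -1.0]
bias = 0.25
print(weighted_sum(weights, [1, 1], bias))  # 0.5 - 1.0 + 0.25 = -0.25
```

With NumPy arrays the same quantity would typically be written as `np.dot(w, x) + b`.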

Activation Function

This function allows us to fit the output in a way that makes more sense. For example, in the case of a simple classifier, an output of say -2.5 or 8 doesn’t make much sense with regards to classification. If we use something called a sigmoidal activation function, we can fit that within a range of 0 to 1, which can be interpreted directly as a probability of a datapoint belonging to a particular class.

Though there are many kinds of activation functions, we’ll be using a simple linear activation function for our perceptron. The linear activation function has no effect on its input and outputs it as is.


How does a perceptron assign a class to a datapoint?

We know that a datapoint’s evaluation is expressed by the relation wX + b. We define a threshold (θ) which classifies our data. Generally, this threshold is set to 0 for a perceptron.

So points for which wX + b is greater than or equal to 0 will belong to one class while the rest (wX + b is negative) are classified as belonging to the other class. We can express this as:
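The thresholding rule can be sketched as a small function. This is only an illustration: the name `classify` and the example weights are assumptions, not taken from the article's code.

```python
def classify(weights, inputs, bias, threshold=0.0):
    """Return 1 if w . x + b reaches the threshold, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= threshold else 0


# Hypothetical weights that happen to behave like an OR gate.
print(classify([1, 1], [0, 0], -0.5))  # 0
print(classify([1, 1], [1, 0], -0.5))  # 1
```

Note that a value exactly on the threshold is assigned to Class 1, matching the "greater than or equal to 0" rule in the text.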

Training algorithm

To train our perceptron, we must ensure that we correctly classify all of our training data. Note that this is different from how you would train a neural network, where you wouldn’t try to correctly classify your entire training data. That would lead to something called overfitting in most cases.

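We start the training algorithm by calculating the gradient, or Δw, for each weight. It’s the product of the value of the input node corresponding to that weight and the difference between the actual value and the computed value.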

We get our new weights by simply incrementing our original weights with the computed gradients multiplied by the learning rate.

A simple intuition for how this works: if our perceptron correctly classifies an input data point, actual_value - computed_value would be 0, and there wouldn’t be any change in our weights since the gradient is now 0.
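A single update step might look like the sketch below. It assumes the standard perceptron rule, in which each weight moves by learning_rate * (actual_value - computed_value) * input, and it treats the bias as a weight whose input is always 1. The function names are hypothetical.

```python
def predict(weights, bias, x):
    """Threshold unit: class 1 if w . x + b >= 0, else class 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def update(weights, bias, x, actual_value, learning_rate=0.1):
    """Return new (weights, bias) after one perceptron update on datapoint x."""
    computed_value = predict(weights, bias, x)
    error = actual_value - computed_value  # 0 when the point is already correct
    new_weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    new_bias = bias + learning_rate * error  # bias acts like a weight on input 1
    return new_weights, new_bias


# Correctly classified point: nothing changes.
print(update([1.0, 1.0], -0.5, (1, 0), actual_value=1))
# Misclassified point: weights and bias move against the error.
print(update([0.0, 0.0], 0.0, (1, 1), actual_value=0))
```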

The 2D XOR problem

In the XOR problem, we are trying to train a model to mimic a 2D XOR function.

The XOR function

The function is defined like so:

If we plot it, we get the following chart. This is what we’re trying to classify. The ⊕ (“o-plus”) symbol you see in the legend is conventionally used to represent the XOR boolean operator.

Our algorithm, regardless of how it works, must correctly output the XOR value for each of the 4 points. We’ll be modelling this as a classification problem, so Class 1 would represent an XOR value of 1, while Class 0 would represent a value of 0.

Attempt #1: The Single Layer Perceptron

Let's model the problem using a single layer perceptron.

The data we’ll train our model on is the table we saw for the XOR function.


Apart from the usual visualization (matplotlib and seaborn) and numerical (numpy) libraries, we’ll use cycle from itertools. This is done since our algorithm cycles through our data indefinitely until it manages to correctly classify the entire training data without any mistakes in the middle.

We next create our training data. This data is the same for each kind of logic gate, since they all take in two boolean variables as input.

The training function

Here, we cycle through the data indefinitely, keeping track of how many consecutive datapoints we correctly classified. If we manage to classify everything in one stretch, we terminate our algorithm.

If not, we reset our counter, update our weights and continue the algorithm.
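The loop described above could be sketched as follows. This is not the article's code: it uses plain Python lists instead of NumPy, initializes the weights to 0, and adds a hypothetical `max_steps` cap so that the sketch terminates even on data it cannot separate.

```python
from itertools import cycle


def predict(weights, bias, x):
    """Threshold unit: class 1 if w . x + b >= 0, else class 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def train(data, targets, learning_rate=0.1, max_steps=10_000):
    """Cycle through the data until every point is classified in one stretch."""
    weights = [0.0] * len(data[0])
    bias = 0.0
    correct_counter = 0
    for step, (x, target) in enumerate(cycle(zip(data, targets))):
        if correct_counter == len(data) or step >= max_steps:
            break
        error = target - predict(weights, bias, x)
        if error == 0:
            correct_counter += 1
        else:
            # Misclassified: reset the streak and update the parameters.
            correct_counter = 0
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias, correct_counter == len(data)


DATA = [(0, 0), (0, 1), (1, 0), (1, 1)]
_, _, or_converged = train(DATA, [0, 1, 1, 1])
_, _, xor_converged = train(DATA, [0, 1, 1, 0], max_steps=1_000)
print(or_converged, xor_converged)  # True False
```

Without the cap, the XOR call would loop forever, because no single set of weights classifies all four XOR points correctly.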

To visualize how our model performs, we create a mesh of datapoints, or a grid, and evaluate our model at each point in that grid. Finally, we colour each point based on how our model classifies it. So the Class 0 region would be filled with the colour assigned to points belonging to that class.

The Perceptron class

To bring everything together, we create a simple Perceptron class with the functions we just discussed. We have some instance variables like the training data, the target, the number of input nodes and the learning rate.

Let’s create a perceptron object and train it on the XOR data.

You’ll notice that the training loop never terminates, since a perceptron can only converge on linearly separable data. Linearly separable data basically means that you can separate data with a point in 1D, a line in 2D, a plane in 3D and so on.

A perceptron can only converge on linearly separable data. Therefore, it isn’t capable of imitating the XOR function.

Remember that a perceptron must correctly classify the entire training data in one go. If we keep track of how many points it correctly classified consecutively, we get something like this.

The algorithm only terminates when correct_counter hits 4 — which is the size of the training set — so this will go on indefinitely.

The Need for Non-Linearity

It is clear that a single perceptron will not serve our purpose: the classes aren’t linearly separable. This boils down to the fact that a single linear decision boundary isn’t going to work.

Non-linearity allows for more complex decision boundaries. One potential decision boundary for our XOR data could look like this.

The 2D XOR problem — Attempt #2

We know that imitating the XOR function would require a non-linear decision boundary.

But why do we have to stick with a single decision boundary?

The Intuition

Let’s first break down the XOR function into its AND and OR counterparts.

The XOR function on two boolean variables A and B is defined as:

Let’s add A.~A and B.~B to the equation. Since they both equate to 0, the equation remains valid.

Let’s rearrange the terms so that we can pull out A from the first part and B from the second.

Simplifying it further, we get:

Using De Morgan’s laws for boolean algebra, ~A + ~B = ~(AB), we can replace the second term in the above equation like so:

Let’s replace A and B with x_1 and x_2 respectively since that’s the convention we’re using in our data.

The XOR function can be condensed into two parts: a NAND and an OR . If we can calculate these separately, we can just combine the results, using an AND gate.
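The decomposition can be checked exhaustively over the four input combinations. The sketch below compares the usual sum-of-products form, A.~B + ~A.B, with the (A + B).~(AB) form that the steps above arrive at; the function names are made up for this illustration.

```python
from itertools import product


def xor_sum_of_products(a, b):
    """XOR written as A.~B + ~A.B."""
    return (a and not b) or (not a and b)


def xor_or_nand(a, b):
    """XOR written as (A + B).~(A.B), i.e. an AND of an OR and a NAND."""
    return (a or b) and not (a and b)


for a, b in product([False, True], repeat=2):
    # Both forms must agree with each other and with "inputs differ".
    assert xor_sum_of_products(a, b) == xor_or_nand(a, b) == (a != b)
    print(int(a), int(b), int(xor_or_nand(a, b)))
```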

Let’s call the OR section of the formula part I, and the NAND section part II.

Modelling the OR part

We’ll use the same Perceptron class as before, only that we’ll train it on OR training data.

This converges, since the data for the OR function is linearly separable. If we plot the number of correctly classified consecutive datapoints as we did in our first attempt, we get this plot. It’s clear that around iteration 50, it hits the value 4, meaning that it classified the entire dataset correctly.

correct_counter measures the number of consecutive datapoints correctly classified by our Perceptron

The decision boundary plot looks like this:

Modelling the NAND part

Let’s move on to the second part. We need to model a NAND gate. Just like the OR part, we’ll use the same code, but train the model on the NAND data. So our input data would be:

After training, the following plots show that our model converged on the NAND data and mimics the NAND gate perfectly.

Bringing everything together

Two things are clear from this:

Let’s model this into our network. First, let’s consider our two perceptrons as black boxes.

After adding our input nodes x_1 and x_2, we can finally implement this through a simple function.

Finally, we need an AND gate, which we’ll train just as we have been.

What we now have is a model that mimics the XOR function.

If we were to implement our XOR model, it would look something like this:
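One way to sketch the combined model is shown below. It uses hand-picked weights for the three gates rather than trained ones, so the numbers are assumptions for illustration; a trained Perceptron would generally end up with different values that draw equivalent boundaries.

```python
def perceptron(weights, bias):
    """Build a two-input threshold unit with fixed parameters."""
    def gate(x1, x2):
        return 1 if weights[0] * x1 + weights[1] * x2 + bias >= 0 else 0
    return gate


# Hypothetical hand-picked parameters (not the result of training).
or_gate = perceptron([1, 1], -0.5)
nand_gate = perceptron([-1, -1], 1.5)
and_gate = perceptron([1, 1], -1.5)


def xor(x1, x2):
    """AND of the OR and NAND outputs."""
    return and_gate(or_gate(x1, x2), nand_gate(x1, x2))


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor(x1, x2))
```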

If we plot the decision boundaries from our model, which is basically an AND of our OR and NAND models, we get something like this:

Out of all the 2-input logic gates, the XOR and XNOR gates are the only ones that are not linearly separable.
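This claim can be explored with a small brute-force search over all 16 two-input truth tables. The weight and bias grids below are assumptions chosen to be just large enough to realize every separable gate. Failing to find a line on a finite grid is not a proof by itself, but it agrees with the known result that XOR and XNOR have no separating line at all.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
WEIGHT_GRID = [-1, 0, 1]                  # assumed search grid for w1, w2
BIAS_GRID = [-1.5, -0.5, 0.5, 1.5]        # assumed search grid for b


def separable(truth_table):
    """True if some (w1, w2, b) on the grid reproduces the truth table."""
    for w1, w2, b in product(WEIGHT_GRID, WEIGHT_GRID, BIAS_GRID):
        outputs = [1 if w1 * x1 + w2 * x2 + b >= 0 else 0 for x1, x2 in INPUTS]
        if outputs == list(truth_table):
            return True
    return False


not_separable = [t for t in product([0, 1], repeat=4) if not separable(t)]
print(not_separable)  # [(0, 1, 1, 0), (1, 0, 0, 1)] -> XOR and XNOR
```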

Though our model works, it doesn’t seem like a viable solution to most non-linear classification or regression tasks. It’s really specific to this case, and most problems can’t be split into just simple intermediate problems that can be individually solved and then combined. For something like this:

A potential decision boundary could be something like this:

We need to look for a more general model, which would allow for non-linear decision boundaries, like a curve, as is the case above. Let’s see how an MLP solves this issue.

The Multi-layered Perceptron

The overall components of an MLP like input and output nodes, activation function and weights and biases are the same as those we just discussed in a perceptron.

The biggest difference? An MLP can have hidden layers.

Hidden layers

Hidden layers are those layers with nodes other than the input and output nodes.

An MLP can have one or more hidden layers; for the XOR problem, a single hidden layer is enough.

The hidden layer allows for non-linearity. A node in the hidden layer isn’t too different from an output node: nodes in the previous layers connect to it with their own weights and biases, and an output is computed, generally with an activation function.

Remember the linear activation function we used on the output node of our perceptron model? There are several more complex activation functions. You may have heard of the sigmoid and the tanh functions, which are some of the most popular non-linear activation functions.

Activation functions should be differentiable, so that a network’s parameters can be updated using backpropagation.

Though the output generation process is a direct extension of that of the perceptron, updating weights isn’t so straightforward. Here’s where backpropagation comes into the picture.

Backpropagation is a way to update the weights and biases of a model starting from the output layer all the way to the beginning. The main principle behind it is that each parameter changes in proportion to how much it affects the network’s output. A weight that has barely any effect on the output of the model will show a very small change, while one that has a large negative impact will change drastically to improve the model’s prediction power.

Backpropagation is an algorithm for updating the weights and biases of a model based on their gradients with respect to the error function, starting from the output layer all the way to the first layer.

The method of updating weights directly follows from differentiation and the chain rule.

There’s a lot to cover when talking about backpropagation. It warrants its own article. So if you want to find out more, have a look at this excellent article by Simeon Kostadinov .

Understanding Backpropagation Algorithm

Learn the nuts and bolts of a neural network’s most important ingredient.

Attempt #3: the Multi-layered Perceptron

The architecture.

There are no fixed rules on the number of hidden layers or the number of nodes in each layer of a network. The best performing models are obtained through trial and error.

The architecture of a network refers to its general structure — the number of hidden layers, the number of nodes in each layer and how these nodes are inter-connected.

Let’s go with a single hidden layer with two nodes in it. We’ll be using the sigmoid function in each of our hidden layer nodes and of course, our output node.

The libraries used here like NumPy and pyplot are the same as those used in the Perceptron class.

The training algorithm

The algorithm here is slightly different: we iterate through the training data a fixed number of times — num_epochs to be precise. In each iteration, we do a forward pass, followed by a backward pass where we update the weights and biases as necessary. This is called backpropagation.

The sigmoid activation function

Here, we define a sigmoid function. As discussed, it’s applied to the output of each hidden layer node and the output node. It’s differentiable, so it allows us to comfortably perform backpropagation to improve our model.

Its derivative is also implemented through the _delsigmoid function.
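For reference, a minimal standalone version of the two functions might look like this. The article defines them as methods on its MLP class; the free functions below, and the name `delsigmoid` without the leading underscore, are simplifications for this sketch.

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def delsigmoid(x):
    """Derivative of the sigmoid, written as s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


print(sigmoid(0.0))     # 0.5
print(delsigmoid(0.0))  # 0.25
```

The derivative is largest at 0 and shrinks towards 0 for large positive or negative inputs, which is why saturated sigmoid units learn slowly.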

The forward and backward pass

In the forward pass, we apply the wX + b relation multiple times, applying a sigmoid function after each call.

In the backward pass, implemented as the update_weights function, we calculate the gradients of each of our 6 weights and 3 biases with respect to the error function and update them by the factor learning rate * gradient.

Finally, the classify function works as expected: Since a sigmoid function outputs values between 0 and 1, we simply interpret them as probabilities of belonging to a particular class. Hence, outputs greater than or equal to 0.5 are classified as belonging to Class 1 while those outputs that are less than 0.5 are said to belong to Class 0 .

The MLP class

Let’s bring everything together by creating an MLP class. All the functions we just discussed are placed in it. The plot function is exactly the same as the one in the Perceptron class.

Let’s train our MLP with a learning rate of 0.2 over 5000 epochs.
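The pieces described so far can be put together in a compact sketch. This is not the article's MLP class: it is a plain-Python approximation assuming a 2-2-1 network, sigmoid activations, a squared-error loss, per-sample updates, and a hypothetical seeded random initialization. Whether it reaches a perfect fit within 5000 epochs depends on the initial weights.

```python
import math
import random

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(seed=0):
    """6 weights and 3 biases, drawn uniformly from [-1, 1] (an assumption)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(2)],
        "w_out": [rng.uniform(-1, 1) for _ in range(2)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(params, x):
    """Apply wX + b followed by a sigmoid, once per layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w_hidden"], params["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(params["w_out"], hidden))
                     + params["b_out"])
    return hidden, output


def update_weights(params, x, target, learning_rate):
    """One backward pass for the loss 0.5 * (target - output) ** 2."""
    hidden, output = forward(params, x)
    delta_out = (output - target) * output * (1 - output)
    delta_hidden = [delta_out * params["w_out"][j] * hidden[j] * (1 - hidden[j])
                    for j in range(2)]
    for j in range(2):
        params["w_out"][j] -= learning_rate * delta_out * hidden[j]
        params["b_hidden"][j] -= learning_rate * delta_hidden[j]
        for i in range(2):
            params["w_hidden"][j][i] -= learning_rate * delta_hidden[j] * x[i]
    params["b_out"] -= learning_rate * delta_out


def train(params, learning_rate=0.2, num_epochs=5000):
    """Return the mean squared error recorded after every epoch."""
    losses = []
    for _ in range(num_epochs):
        for x, target in XOR_DATA:
            update_weights(params, x, target, learning_rate)
        losses.append(sum((t - forward(params, x)[1]) ** 2
                          for x, t in XOR_DATA) / len(XOR_DATA))
    return losses


def classify(params, x):
    """Class 1 if the output is at least 0.5, else Class 0."""
    return 1 if forward(params, x)[1] >= 0.5 else 0


params = init_params()
losses = train(params)
print(f"loss after first epoch: {losses[0]:.4f}, after last: {losses[-1]:.4f}")
print([classify(params, x) for x, _ in XOR_DATA])
```

If the printed classes do not match 0, 1, 1, 0, trying a different seed or more epochs is a reasonable first step, since small sigmoid networks can stall on plateaus for this problem.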

If we plot the values of our loss function, we get the following plot after about 5000 iterations, showing that our model has indeed converged.

A clear non-linear decision boundary is created here with our generalized neural network, or MLP.

Note #1: Adding more layers or nodes

Adding more layers or nodes gives increasingly complex decision boundaries. But this could also lead to something called overfitting — where a model achieves very high accuracies on the training data, but fails to generalize.

A good resource is the Tensorflow Neural Net playground, where you can try out different network architectures and view the results.

Tensorflow - Neural Network Playground

It's a technique for building a computer program that learns from data. it is based very loosely on how we think the….

Note #2: Choosing a loss function

The loss function we used in our MLP model is the mean squared error (MSE) loss function. Though this is a very popular loss function, it makes some assumptions about the data (like it being Gaussian) and isn’t always convex when it comes to a classification problem. It was used here to make it easier to understand how a perceptron works, but for classification tasks, there are better alternatives, like binary cross-entropy loss.
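To make the comparison concrete, here is a small sketch of both losses for binary targets. The function names and the clipping constant `eps` are choices made for this example, not part of the article's code.

```python
import math


def mean_squared_error(targets, predictions):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Average negative log-likelihood of the targets under the predictions."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(targets)


# A confidently wrong prediction: the true class is 1, the model says 0.01.
print(mean_squared_error([1], [0.01]))
print(binary_cross_entropy([1], [0.01]))
```

Cross-entropy grows without bound as a prediction approaches the wrong extreme, whereas the squared error for a single point is capped at 1, so confident mistakes are penalized much more heavily.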

How to Choose Loss Functions When Training Deep Learning Neural Networks - Machine Learning Mastery

Deep learning neural networks are trained using the stochastic gradient descent optimization algorithm. as part of the….

You’ll find the entire code from this post here.


The sample code from this post can be found here.

Neural nets used in production or research are never this simple, but they almost always build on the basics outlined here. Hopefully, this post gave you some idea on how to build and train perceptrons and vanilla networks.

Thanks for reading!

Aniruddha Karajgi

Data Scientist at Tekion | Samsung Research | GSoC | CS at BITS Pilani ’21 | | LinkedIn:

DEV Community

Posted on Apr 3, 2020

Demystifying the XOR problem

In my previous post on extreme learning machines, I mentioned that the famous AI pioneers Marvin Minsky and Seymour Papert claimed in their book Perceptrons (1969) that the simple XOR function cannot be solved by a two-layer feedforward neural network, a claim that "drove research away from neural networks in the 1970s, and contributed to the so-called AI winter" [Wikipedia 2013].

Let's explore what this XOR problem is...

The XOR Problem

The XOR, or “exclusive or”, problem is a classic problem in ANN research. It is the problem of using a neural network to predict the outputs of XOR logic gates given two binary inputs. An XOR function should return a true value if the two inputs are not equal and a false value if they are equal. All possible inputs and predicted outputs are shown in figure 1.

XOR is a classification problem and one for which the expected outputs are known in advance. It is therefore appropriate to use a supervised learning approach.

On the surface, XOR appears to be a very simple problem. However, Minsky and Papert (1969) showed that this was a big problem for neural network architectures of the 1960s, known as perceptrons.


Like all ANNs, the perceptron is composed of a network of units, which are analogous to biological neurons. A unit can receive an input from other units. On doing so, it takes the sum of all values received and decides whether it is going to forward a signal on to other units to which it is connected. This is called activation. The activation function uses some means or other to reduce the sum of input values to a 1 or a 0 (or a value very close to a 1 or 0) in order to represent activation or lack thereof. Another form of unit, known as a bias unit, always activates, typically sending a hard-coded 1 to all units to which it is connected.

Perceptrons include a single layer of input units — including one bias unit — and a single output unit (see figure 2). Here a bias unit is depicted by a dashed circle, while other units are shown as blue circles. There are two non-bias input units representing the two binary input values for XOR. Any number of input units can be included.

The perceptron is a type of feed-forward network, which means the process of generating an output — known as forward propagation — flows in one direction from the input layer to the output layer. There are no connections between units in the input layer. Instead, all units in the input layer are connected directly to the output unit.

A simplified explanation of the forward propagation process is that the input values X1 and X2, along with the bias value of 1, are multiplied by their respective weights W0..W2 and passed to the output unit. The output unit takes the sum of those values and employs an activation function, typically the Heaviside step function, to convert the resulting value to a 0 or 1, thus classifying the input values as 0 or 1.

It is the setting of the weight variables that gives the network’s author control over the process of converting input values to an output value. It is the weights that determine where the classification line, the line that separates data points into classification groups, is drawn. If all data points on one side of a classification line are assigned the class of 0, all others are classified as 1.

A limitation of this architecture is that it is only capable of separating data points with a single line. This is unfortunate because the XOR inputs are not linearly separable. This is particularly visible if you plot the XOR input values to a graph. As shown in figure 3, there is no way to separate the 1 and 0 predictions with a single classification line.

Multilayer Perceptrons

The solution to this problem is to expand beyond the single-layer architecture by adding an additional layer of units without any direct access to the outside world, known as a hidden layer. This kind of architecture — shown in Figure 4 — is another feed-forward network known as a multilayer perceptron (MLP).

It is worth noting that an MLP can have any number of units in its input, hidden and output layers. There can also be any number of hidden layers. The architecture used here is designed specifically for the XOR problem.

Similar to the classic perceptron, forward propagation begins with the input values and bias unit from the input layer being multiplied by their respective weights. In this case, however, there is a weight for each combination of input (including the input layer’s bias unit) and hidden unit (excluding the hidden layer’s bias unit). The products of the input layer values and their respective weights are passed as input to the non-bias units in the hidden layer. Each non-bias hidden unit applies an activation function, usually the classic sigmoid function in the case of the XOR problem, to squash the sum of its input values down to a value between 0 and 1 (usually a value very close to either 0 or 1). The outputs of each hidden layer unit, including the bias unit, are then multiplied by another set of respective weights and passed to an output unit. The output unit also passes the sum of its input values through an activation function (again, the sigmoid function is appropriate here) to return an output value between 0 and 1. This is the predicted output.

This architecture, while more complex than that of the classic perceptron network, is capable of achieving non-linear separation. Thus, with the right set of weight values, it can provide the necessary separation to accurately classify the XOR inputs.
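To illustrate what "the right set of weight values" could look like, the sketch below runs forward propagation through a 2-2-1 sigmoid network with hand-picked weights. The values are hypothetical, chosen so that the hidden units behave like OR and NAND and the output unit like AND; they are not taken from the post or from a trained network. Each weight list starts with the weight on the bias unit, which always sends 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical hand-picked weights: [bias weight, weight 1, weight 2].
HIDDEN_WEIGHTS = [
    [-10.0, 20.0, 20.0],   # behaves like OR
    [30.0, -20.0, -20.0],  # behaves like NAND
]
OUTPUT_WEIGHTS = [-30.0, 20.0, 20.0]  # behaves like AND


def unit(weights, inputs):
    """Weighted sum of the bias unit (always 1) and the inputs, then sigmoid."""
    total = weights[0] * 1.0 + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(total)


def predict(x1, x2):
    hidden = [unit(w, (x1, x2)) for w in HIDDEN_WEIGHTS]
    return unit(OUTPUT_WEIGHTS, hidden)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(predict(x1, x2), 4))
```

The outputs are not exactly 0 or 1, only very close to them, which matches the description of the sigmoid above.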

The elephant in the room, of course, is how one might come up with a set of weight values that ensure the network produces the expected output. In practice, trying to find an acceptable set of weights for an MLP network manually would be an incredibly laborious task. In fact, it is NP-complete (Blum and Rivest, 1992). However, it is fortunately possible to learn a good set of weight values automatically through a process known as backpropagation. This was first demonstrated to work well for the XOR problem by Rumelhart et al. (1985).

The backpropagation algorithm begins by comparing the actual value output by the forward propagation process to the expected value and then moves backward through the network, slightly adjusting each of the weights in a direction that reduces the size of the error by a small degree. Both forward and back propagation are re-run thousands of times on each input combination until the network can accurately predict the expected output of the possible inputs using forward propagation.

For the XOR problem, 100% of possible data examples are available to use in the training process. We can therefore expect the trained network to be 100% accurate in its predictions and there is no need to be concerned with issues such as bias and variance in the resulting model.

In this post, we explored the classic ANN XOR problem. The problem itself was described in detail, along with the fact that the inputs for XOR are not linearly separable into their correct classification categories. A non-linear solution — involving an MLP architecture — was explored at a high level, along with the forward propagation algorithm used to generate an output value from the network and the backpropagation algorithm, which is used to train the network.

The next post in this series will feature an implementation of the MLP architecture described here, including all of the components necessary to train the network to act as an XOR logic gate.

Blum, A., & Rivest, R. L. (1992). Training a 3-node neural network is NP-complete. Neural Networks, 5(1), 117–127.

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. The MIT Press, Cambridge, MA.

Rumelhart, D., Hinton, G., & Williams, R. (1985). Learning internal representations by error propagation (No. ICS-8506). California University San Diego, La Jolla, Institute for Cognitive Science.

Top comments (2)


tbrodbeck

Hey Jayesh. Nice post thank you! I would love to read the follow up with the implementation because I have problems of teaching MLP's simple relationships. I could not find that one here yet, so if you could provide me a link I would be more than happy.

XOR problem with neural networks: An explanation for beginners

Among the various logic gates, XOR, also known as “exclusive or”, is the operation on two binary inputs that outputs 1 when the inputs differ and 0 when they are the same. The outputs generated by the XOR logic are not linearly separable. In this article, let us see what the XOR logic is and how to implement it using neural networks.

Table of Contents

What is XOR operating logic?
The linear separability of points
Why can’t perceptrons solve the XOR problem?
How to solve the XOR problem with neural networks

Let us try to understand the XOR operating logic using a truth table.

From the truth table it can be inferred that XOR outputs 1 when the two inputs are in different states and 0 when they are in the same state. The output of the XOR logic is given by the equation shown below.

Output= X.Y’+X’.Y

The XOR gate can be built as a combination of NOT, AND and OR gates, and this type of logic is widely used in cryptography and fault tolerance. The logical diagram of an XOR gate is shown below.

Linear separability of points is the ability to classify the data points with a single separating hyperplane, with no overlap between the classes. Each class should fall entirely on one side of the separating line; such data points are termed linearly separable. For logic gate operations like AND or OR, the outputs are linearly separable.

Linearly separable data points look as shown below.

Here we can see that the pink dots and red triangles in the plot do not overlap, and a straight line easily separates the two classes: the region above the line can be considered one class and the region below it the other.

Need for linear separability in neural networks

A single neuron operates on points in N-dimensional space and draws one separating hyperplane through it. For such a unit to learn the task, the data points have to be linearly separable; otherwise the weight updates never settle and some points are always misclassified. Linear separability also makes the input space easy to interpret: each point falls on either the positive or the negative side of the hyperplane.

Perceptrons are termed “linear classifiers” and can be used only for linearly separable use cases. XOR is one of the logical operations that is not linearly separable: whichever line is drawn, points of different classes end up on the same side of it.

Let us understand why perceptrons cannot be used for XOR logic using the outputs generated by the XOR logic and the corresponding graph for XOR logic as shown below.

In the above figure, we can see that no single straight line separates the red triangles from the pink dots, so the XOR outputs are not linearly separable. This is where multiple neurons arranged as a Multi-Layer Perceptron come in: the hidden layer transforms the inputs into a representation in which the XOR classes become linearly separable. So now let us understand how to solve the XOR problem with neural networks.

The XOR problem can be solved by using a Multi-Layer Perceptron, a neural network architecture with an input layer, a hidden layer, and an output layer. During forward propagation, the inputs are multiplied by the weights of each layer in turn, and with suitable weights the network computes the XOR logic. The neural network architecture to solve the XOR problem is shown below.

With this architecture and suitable weights between each layer, the XOR output can be obtained through forward propagation. The network uses the ReLU activation function, which outputs 0 when the weighted sum at a neuron is negative and passes the sum through unchanged when it is positive. Let us work through the output for the first input state.

Example: For X1=0 and X2=0 we should get an output of 0. Let us solve it.

Solution: Considering X1=0 and X2=0

H1 = RELU(0*1 + 0*1 + 0) = 0


So now we have obtained the values that were propagated from the input layer to the hidden layer. Now let us propagate from the hidden layer to the output layer.


This is how multi-layer neural networks, also known as Multi-Layer Perceptrons (MLP), are used to solve the XOR problem. The architecture provided above can be verified in the same way for all the other input sets, and it yields the right XOR output for each.

So among the various logical operations, XOR is one where the data points cannot be linearly separated by a single neuron or perceptron. Solving the XOR problem therefore requires multiple neurons in the network architecture, with suitable weights and appropriate activation functions.
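The forward pass described above can be sketched in a few lines of Python. The weights and biases below are an assumption: they are one well-known choice that works with ReLU hidden units, not values read from the article's figure, and the function name is hypothetical.

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

# Assumed parameters (illustrative, not taken from the article's figure).
W_hidden = np.array([[1.0, 1.0],    # weights from X1 to H1, H2
                     [1.0, 1.0]])   # weights from X2 to H1, H2
b_hidden = np.array([0.0, -1.0])    # biases of H1, H2
w_out = np.array([1.0, -2.0])       # weights from H1, H2 to the output
b_out = 0.0

def xor_relu(x1, x2):
    x = np.array([x1, x2], dtype=float)
    h = relu(x @ W_hidden + b_hidden)   # input layer -> hidden layer
    return float(h @ w_out + b_out)     # hidden layer -> output layer

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_relu(x1, x2))
```

With these assumed values, the first hidden unit for X1=0 and X2=0 evaluates to ReLU(0×1 + 0×1 + 0) = 0, in line with the worked example.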


Darshan M


© Analytics India Magazine Pvt Ltd & AIM Media House LLC 2023


Deep Learning Nerds Logo

Solving the XOR problem

A typical example for the use of a Neural Network is solving the XOR problem. This blog article explains the XOR problem and how it can be solved by using a Neural Network. We will see that the problem can't be solved by a simple Perceptron. Instead, we need a Neural Network with at least one hidden layer to solve the problem.

Logical XOR

The most popular truth tables are OR and AND. These tasks can be solved by a simple Perceptron. XOR stands for 'exclusive or'. The XOR function outputs true only if the two inputs are different; if the two inputs are identical, it returns false. The following table shows the inputs and outputs of the XOR function.

So, what is special about the XOR function? In contrast to the OR problem and the AND problem, the XOR problem cannot be linearly separated by a single boundary line. Let's have a look at the following images:
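As a quick reference, the XOR truth table can be generated with a short Python sketch. The helper name `xor` is just an illustrative choice.

```python
from itertools import product

def xor(a, b):
    # True (1) only when the two inputs differ
    return int(a != b)

print("x1 x2 | y")
for x1, x2 in product([0, 1], repeat=2):
    print(f" {x1}  {x2} | {xor(x1, x2)}")
```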

XOR boundary line

Defining a Neural Network

The next step is to define a Neural Network that is able to solve the XOR problem. As mentioned above, the XOR function cannot be linearly separated by one boundary line. This is also the reason why the function cannot be solved by a simple Perceptron. We need a more complex model. With the correct choice of functions and weight parameters, a Neural Network with one hidden layer is able to solve the XOR problem. For this, let's define the Neural Network we need.

Solving the XOR problem with a Neural Network

In our model, the activation function is a simple threshold function. If a certain threshold value is exceeded, the function returns output 1, otherwise 0. So, the threshold value indicates from which value a neuron "fires". In addition to the threshold values, we also need to define the weight parameters.

So why did we choose these specific weights and threshold values for the Network? Let's have a look what happens in each layer.

Hidden Unit 1

Hidden Unit 2

Output Unit

Calculations

Let's check if we really get the outputs of the XOR-problem with these formulas.

The first case has the inputs x1 = 0 and x2 = 0 and the output should be y = 0.

The second case has the inputs x1 = 0 and x2 = 1 and the output should be y = 1.

The third case has the inputs x1 = 1 and x2 = 0 and the output should be y = 1.

The fourth case has the inputs x1 = 1 and x2 = 1 and the output should be y = 0.

As you can see, the Neural Network generates the desired outputs.
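The four cases can also be checked with a small Python sketch of a threshold network. The weights and threshold values used here are assumptions chosen for illustration (hidden unit 1 acts like OR, hidden unit 2 like AND), since the values from the original figure are not reproduced in the text.

```python
def step(value, threshold):
    # Threshold activation: the neuron "fires" (1) only if the threshold is exceeded
    return 1 if value > threshold else 0

def xor_threshold(x1, x2):
    # Assumed weights: every input weight is 1; the thresholds are illustrative.
    h1 = step(x1 + x2, 0.5)    # fires if at least one input is 1 (OR)
    h2 = step(x1 + x2, 1.5)    # fires only if both inputs are 1 (AND)
    return step(h1 - h2, 0.5)  # fires if h1 is on and h2 is off

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"x1 = {x1}, x2 = {x2} -> y = {xor_threshold(x1, x2)}")
```

The output unit fires exactly when "at least one input is 1" holds but "both inputs are 1" does not, which is the definition of exclusive or.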

Keep in mind that the XOR function can't be solved by a simple Perceptron. We need a Neural Network with at least one hidden layer. In this article you saw what such a Neural Network could look like. If you want to dive deeper into Deep Learning and Neural Networks, have a look at our Recommendations.

Priyansh Kedia

May 16, 2021

Solving the XOR problem using MLP

In this blog post, we shall cover the basics of what the XOR problem is, and how we can solve it using MLP.

What is XOR?

Exclusive or is a logical operation that outputs true when the inputs differ.

For the XOR gate, the TRUTH table will be as follows

XOR is a classification problem, as its outputs fall into two distinct classes (0 and 1). If we plot the inputs against the outputs for the XOR gate, it would look something like

The graph plots the two inputs corresponding to their output. Visualizing this plot, we can see that it is impossible to separate the different outputs (1 and 0) using a linear equation.

To separate the two outputs using linear equation(s), we would need to draw two separate lines like

The above graph perfectly shows why these outputs cannot be separated using a single linear equation. This was a major problem with the initial perceptrons (single layer approach).

What is the XOR problem?

As we have seen above, it is impossible to separate the XOR outputs using just a single linear equation. This is a major problem because, during training, the machine is expected to form the equation that produces the desired outputs on its own.

For a problem resembling the outputs of XOR, it was impossible for the machine to set up an equation for good outputs. This is what led to the birth of the concept of hidden layers which are extensively used in Artificial Neural Networks.

Let’s call the output Y, so

Y = A1X1 + A2X2 + A3X3 + … + B

Here B is the bias, and A1, A2, A3 are the weights. Each weight controls the strength of a connection, that is, how strongly the corresponding input contributes to the output.

Y can also be called the weighted sum.

The information flow inside a perceptron is of the feed-forward type, meaning that the signal flows in a single direction from the input layer to the output layer. All the input nodes are independent of each other.

The variation in the weight variables controls the process of conversion of the input values to the output values.

The main limitation of a single-layer architecture (perceptrons) is that it separates the data points using a single line. This has a drawback in a problem similar to the XOR problem, as the data points are linearly inseparable.
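This limitation can be illustrated with a small Python sketch that tries many weight and bias combinations for a single perceptron of the form Y = A1X1 + A2X2 + B. The grid of candidate values and the rule "predict 1 when Y is non-negative" are assumptions made for the illustration. A grid search is not a proof, but the result matches a short argument: XOR would need B < 0, A1 + B ≥ 0, A2 + B ≥ 0 and A1 + A2 + B < 0, and adding the two middle conditions gives A1 + A2 + B ≥ -B > 0, a contradiction.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GRID = range(-4, 5)  # assumed candidate integer values for A1, A2 and B

def perceptron(x1, x2, a1, a2, b):
    # Single-layer perceptron: threshold the weighted sum at 0
    return 1 if a1 * x1 + a2 * x2 + b >= 0 else 0

def fits_on_grid(targets):
    # True if some (A1, A2, B) on the grid reproduces all four target outputs
    for a1, a2, b in product(GRID, repeat=3):
        if all(perceptron(x1, x2, a1, a2, b) == t
               for (x1, x2), t in zip(INPUTS, targets)):
            return True
    return False

AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

print("AND separable:", fits_on_grid(AND))
print("OR separable: ", fits_on_grid(OR))
print("XOR separable:", fits_on_grid(XOR))
```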

How is the XOR problem solved?

The solution to the XOR problem lies in multidimensional analysis. Instead of mapping the inputs to the output in a single step, we pass them through several layers of processing, each of which transforms its inputs, to generate the desired outputs.

The inner layers for deeper processing of the inputs are known as hidden layers. The units within a hidden layer are not connected to one another; each layer receives its input only from the layer before it. This architecture is known as Multilayer Perceptron (MLP).

The number of layers in an MLP is not fixed, so it can have any number of hidden layers for processing. In an MLP, weights are defined for each hidden layer, and each layer passes its signal on to the next layer.

Using the MLP approach lets us dive into more than two dimensions, which in turn lets us separate the outputs of XOR using multidimensional equations.
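The idea of moving to more dimensions can be illustrated with a hand-made example in Python. Here a third coordinate X3 = X1 × X2 is added by hand, which is an assumption for illustration only: an MLP would compute its extra dimensions in the hidden layer rather than being given them. In this three-dimensional space a single linear equation separates the XOR outputs.

```python
def lift(x1, x2):
    # Hand-crafted third dimension (an MLP would build its own hidden features)
    return (x1, x2, x1 * x2)

def linear_classifier_3d(x1, x2, x3):
    # One linear equation in three variables: X1 + X2 - 2*X3 - 0.5
    y = 1.0 * x1 + 1.0 * x2 - 2.0 * x3 - 0.5
    return 1 if y >= 0 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, linear_classifier_3d(*lift(x1, x2)))
```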

Each hidden unit applies an activation function to squash its output into a fixed range, such as 0 to 1.

The MLP approach also belongs to the class of feed-forward Artificial Neural Networks, so the signal travels in one direction only. An MLP solves the XOR problem by mapping the data points into more dimensions and thus constructing an n-variable equation that fits the output values.

In this blog, we read about the popular XOR problem and how it is solved by using multi-layered perceptrons. These problems give a sense of understanding of how deep neural networks work to solve complex problems.


