"The answers you get depend upon the questions you ask." - Thomas Kuhn

Some problems are hard because of the immense resources they demand to be solved. These problems, for lack of a better phrase, can be thought of as economically hard: given access to sufficient resources, they can in principle be overcome. Guessing passwords is a great example. With enough compute power, brute-force methods can crack a password wide open.

Communication, on the other hand, is a fundamentally hard problem. Such problems are resource agnostic; their solutions require something more. Thomas Kuhn referred to these as paradigm shifts: fundamentally different ways of approaching a problem that reveal insights previously unseen. Achieving such insights involves asking the right questions, and it was one such question that sparked the Information Age.

Origins

Communication itself is nothing new. From gestures to languages, humans have developed increasingly complex methods of interacting with each other. But what makes communication different from other modes of interaction is the exchange of information. The ability to transfer information from one individual to another has served as the cornerstone of complex communities and societies.

George Boole, through his work on the algebra of sets, showed that notions of truth and falsehood could be mathematically encapsulated to support concrete reasoning, laying the foundation for propositional logic. He was followed by Kurt Gödel, who successfully demonstrated that seemingly abstract qualitative thoughts could in fact be expressed within the bounds of formal systems.

As methods of expressing thought evolved, so did ways of communicating it. Morse code provided the first revolution in communication, allowing people to express information as codified electric signals. While effective, the method had a severe bottleneck when it came to the type of data it could transmit. In particular, the encoding and decoding processes were entirely manual, limiting the scope of information that could be sent through.

That's when Claude Shannon came into the picture. After publishing what has been called the greatest master's thesis of the 20th century at the age of 21, Shannon began thinking about the following question: how can a message selected at one point be reproduced, either exactly or approximately, at another point? His quest to answer it led to the birth of Information Theory and the world we live in today.

Developing the intuition

Shannon's idea was simple. He had to find a way to quantify the notion of information. But how does one go about doing this? How does one assign a value to information? Consider the following two sentences:

$X$: Bell peppers are most commonly found in three colors: red, green, and yellow.

$Y$: My dog is hungry.

Let's denote the information contained in an object $X$ by $I(X)$. One heuristic we could use to quantify information is size: the longer a sentence, the more information it is likely to carry. If we define information as a monotonically increasing function of size, then since $|X| > |Y|$, it follows that $I(X) > I(Y)$.
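
To make the heuristic concrete, here is a minimal sketch in Python. The function `I` is purely hypothetical, introduced for illustration: it measures "information" as nothing more than character count, which is exactly the size-based definition above.

```python
# Hypothetical size-based information measure: length alone decides
# how much "information" a sentence carries.
X = "Bell peppers are most commonly found in three colors: red, green, and yellow."
Y = "My dog is hungry."

def I(sentence: str) -> int:
    """Assumed measure for illustration: information = character count."""
    return len(sentence)

# The longer sentence "wins" under this heuristic.
print(I(X), I(Y), I(X) > I(Y))
```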

But there's a problem with this method, and it shows up in the example itself. Consider the following thought experiment. Pick a random person and ask them to read both of the above statements. Odds are they already know about the different colors of bell peppers. But what are the chances that they already know you have a dog? Isn't $Y$ the more meaningful statement to them, since it improves their knowledge in some capacity?

This concept forms the basis of Shannon's reformulation of the notion of information. Instead of viewing it as a measure of size or some content-specific heuristic, he defined information as the resolution of uncertainty. What's fascinating is that, while revolutionary, this insight is also one of the most natural ways of thinking about information: if it improves your knowledge, it's probably worth more than something that doesn't. This provides a smooth segue into quantifying information by looking at how much uncertainty it resolves.
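
As a preview of how "resolution of uncertainty" becomes a number, here is a small sketch using surprisal (self-information), $-\log_2 p$, measured in bits. The probabilities below are made up purely for illustration; they stand in for a random reader's prior belief that each statement is true.

```python
import math

def surprisal(p: float) -> float:
    """Self-information of an event with prior probability p, in bits: -log2(p).

    The less likely you thought something was, the more uncertainty
    hearing it resolves.
    """
    return -math.log2(p)

# Made-up prior probabilities, purely for illustration:
p_prior_X = 0.95   # most readers already believe bell peppers come in three colors
p_prior_Y = 0.01   # almost no reader expects that your dog is hungry

print(f"I(X) = {surprisal(p_prior_X):.2f} bits")  # about 0.07 bits
print(f"I(Y) = {surprisal(p_prior_Y):.2f} bits")  # about 6.64 bits
```

Under this view, $Y$ carries far more information for that reader than $X$, matching the intuition from the thought experiment above.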