Concepts I learned in 2024
Information
Why is it a logarithm???
Why should two punched cards have twice the information carrying capacity of one?
Does it being a logarithm ever matter? What does it mean for our intuition? The number of bits does not equal the number of things it is possible to store, for example.
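A minimal sketch of why the logarithm gives the punched-card intuition: the number of distinguishable patterns multiplies when you combine two independent cards, but the log of that count adds, so the capacity in bits doubles. It also shows the bits-vs-things distinction: n bits distinguish 2^n things, not n things. (The card size here is made up for illustration.)

```python
import math

# Toy "punched card" with 20 binary hole positions: 2**20 distinct patterns.
states_one_card = 2 ** 20

# Two independent cards: the number of distinguishable joint patterns
# multiplies, but the log of that count adds, so capacity in bits doubles.
states_two_cards = states_one_card * states_one_card
assert math.log2(states_two_cards) == 2 * math.log2(states_one_card)

# The number of bits is the log of the number of storable messages,
# not the number of messages itself: n bits distinguish 2**n messages.
n = 10
print(f"{n} bits distinguish 2**{n} = {2**n} messages")
```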
What is channel capacity?
Mutual information
What is mutual information?
When do linear correlations make sense? When do they not make sense?
First, one must have the notion of information - in this case, the preambles are:
some set of possibilities
a notion of the frequency with which one observes members of that set
Mutual information is supposed to tell you how much information one variable has about another variable - so, for example, if one variable always equals the value of the other, the mutual information should be (1)? If the two variables are randomly related, the mutual information value should be (0). Are there other constraints? Maybe that it’s monotonic - the more one variable helps tell you about the other, the higher the mutual information?

But what does it mean for one variable to tell you about another? Does it mean something about how correlated their outputs are? Do they need to both be the same kind of variable - continuous or discrete? Does mutual information imply that you are guessing two variables are linearly correlated? What does mutual information tell you - if one variable is high, the other one is likely to be high also? What about inverse relationships - it seems like it should predict those as well. Isn’t it the case that for a given function, mapping one to the other, you could know one variable from the other? Maybe it’s how complex that function is? If two variables are equivalent (each is the same as the other at the same value), is that ‘more mutual information’, vs two variables which are perfectly linearly correlated? It seems like both should be fine, but maybe if there is a nonlinear (but perfectly predictive) relationship, the mutual information should be less?

For some transfer function, some set of inputs can perfectly predict some set of outputs. So maybe it’s like - if you were just given a bunch of inputs, how much could you just be like ‘here are the outputs’ vs going and having to learn a really complicated transfer function. Could the complexity of the transfer function (and what does complexity mean - how nonlinear it is?) be related to how (little) mutual information there is between the two variables?

Why is the function having more wiggles bad? Well, how much information do you need to specify for a function? I guess it would be nice to just say ‘just go in this direction’ - if it wiggles a lot you have to keep track of when. But, you could just specify that as the number of degree-1 factors in a polynomial that represents it - or, actually that might just be the zeros, so you’d do the number of coefficients of the polynomial that represents that function. An (ordered) list of numbers - is the information required to encode that the relevant information? How does the information required to specify numbers in a set differ from the information required to specify numbers in a list?

You could say, how much do I need to write down on a piece of paper to specify this function. But I guess - isn’t that a function of the complexity of our tools (for example, this comes back to GR looking beautiful and simple now, but complex then), and also how much the person who is looking at what you’ve written down knows? Is math the way to transfer information, because it is those patterns which seem to have some structure in the universe, to whatever degree we can mean that, independent of our interpretation?

I guess, reading colah’s blogpost on visual information theory, http://colah.github.io/posts/2015-09-Visual-Information/ you could also say mutual information is how much more costly it is to use the other variable’s code for something?
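A small plug-in calculation (a sketch of my own, not something from the post) makes a few of these questions concrete. Mutual information is defined on the joint distribution, so it does not assume a linear relationship at all: an identity mapping, an inverse mapping, and an arbitrary deterministic scramble all give the same value, and that value is the entropy of the variable (here 3 bits), not 1.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    p_xy = Counter(pairs)
    p_x = Counter(x for x, _ in pairs)
    p_y = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
        for (x, y), c in p_xy.items()
    )

xs = list(range(8))  # X uniform over 8 values, so H(X) = 3 bits

identity = [(x, x) for x in xs]                 # y = x
inverse = [(x, 7 - x) for x in xs]              # perfect inverse relationship
perm = [5, 2, 7, 0, 3, 6, 1, 4]
scrambled = [(x, perm[x]) for x in xs]          # deterministic but very non-linear
independent = [(x, y) for x in xs for y in xs]  # every combination: X tells you nothing about Y

for name, pairs in [("identity", identity), ("inverse", inverse),
                    ("scrambled", scrambled), ("independent", independent)]:
    print(name, round(mutual_information(pairs), 3), "bits")
# identity, inverse, and scrambled all give 3.0 bits: any deterministic,
# invertible mapping preserves all of H(X), however wiggly it is.
# independent gives 0.0 bits. And the same number comes out whichever
# variable you start from, which is the symmetry noted below.
```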
Definitions:
How much one variable tells you about the other (this is mutual information, and I now realize it is symmetric, which I am confused by)
How much more costly it is to use the other variable’s code for something (this is actually cross-entropy, and is asymmetric)
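Writing out the standard definitions makes the symmetry and the asymmetry visible:

```latex
% Mutual information: the expression is symmetric in X and Y by inspection.
I(X;Y) = \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
       = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)

% Cross-entropy: the average cost of coding samples from p with a code
% optimized for q; swapping p and q changes the value, hence asymmetric.
H(p, q) = -\sum_{x} p(x)\,\log_2 q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
```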
I still don’t understand what information is
Preambles
Information
Correlation
Correlation is one number, which tells you (I think) how much the dots you’re looking at are explained by a linear relationship - Pearson’s r runs from -1 to 1, with 0 meaning no linear relationship at all.
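This connects back to the “when do linear correlations make sense” question above: correlation only measures the linear part of a relationship, so a perfectly predictive but symmetric nonlinear relationship can have zero correlation even though there is plenty of mutual information. A toy sketch:

```python
import math

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # perfectly determined by x, but not linear in x

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = math.sqrt(sum((y - mb) ** 2 for y in b) / n)
    return cov / (sa * sb)

print(pearson_r(xs, ys))  # prints 0.0: no linear relationship, even though y = x**2
```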
Information Geometry (haven’t learned this yet, just interested)
Information geometry sounds kind of cool! It seems like studying it requires differential geometry. It represents families of distributions as differentiable manifolds.
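From skimming, the central object seems to be the Fisher information metric, which is what gives a parametrized family of distributions p(x; θ) the structure of a Riemannian manifold (and is presumably why differential geometry is the prerequisite):

```latex
% Fisher information metric on a parametrized family p(x;\theta):
% it defines an inner product on the tangent space at each \theta,
% turning the parameter space into a Riemannian manifold.
g_{ij}(\theta) = \mathbb{E}_{x \sim p(\cdot\,;\theta)}\!\left[
    \frac{\partial \log p(x;\theta)}{\partial \theta_i}\,
    \frac{\partial \log p(x;\theta)}{\partial \theta_j}
\right]
```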
Preambles
Differential geometry
Quantum Field Theory
I wish, but here are some books that might be cool to check out!
https://en.wikipedia.org/wiki/Quantum_Field_Theory_in_a_Nutshell
https://www.amazon.com/Quantum-Field-Theory-Gifted-Amateur/dp/019969933X
Special Relativity
Quantum Mechanics
Operator
Decoherence
Entanglement
Linear Algebra
Eigenvalue
Derivative Operator
Fourier Transforms
Additive vs multiplicative noise