Fantasy Versus Reality

The other day, I ended up in a dispute with my partner over investment reporting. We have some money invested, and the reports we receive periodically may or may not tell us what percentage increase (or decrease) has occurred since our initial investment. Our dispute concerned both the availability of such information and its accuracy. She believes the agent should always provide this information, and that it is simple and reliable when presented. I believe the agent ought to offer it, but I recognize the complexity of such information, and so I prefer to figure it out for myself.

It is not my intention in this post to go into the finer details and mathematics of calculating this sort of information. What I would like to focus on is the nature of the information the agent provides. In particular, is such information real (accurate and reliable) or merely fantasy (speculative and largely biased)?

When I was younger, I heard someone say, “80 percent of all statistics are made up.” If you didn’t catch it immediately, this is a joke. The idea is that the statement itself is “made up,” and as such, the statistic it reports is also “made up.” Taken literally, the statement is useless, since it tells us nothing at all. It is merely a joke.

However, there is some truth in this joke. Statistics is the area of mathematics concerned with collecting and analyzing data in order to draw potentially useful conclusions from it. In other words, one takes a large (often very large) pile of information, such as numbers, and works through it looking for what the values have in common and where they differ. One can, for example, find the average of a group of numbers, which gives (very approximately) a sort of midpoint of the data set. Other popular midpoint finders include the median (the middle value once the numbers are sorted) and the mode (the value that occurs most often).

Here is a simple example:

Data Set: 5, 4, 3, 7, 6

Average: 5

In this case, 5 is clearly and easily the midpoint. All the numbers are relatively close in magnitude to 5 (within 2 in the most extreme case). Thus, the average seems to provide something useful in description of the data.
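For readers who want to check the arithmetic, here is a minimal Python sketch that uses the standard library’s `statistics` module to compute the three midpoint finders for this data set. The variable names are only illustrative.

```python
import statistics

# The first data set from the example above
data = [5, 4, 3, 7, 6]

mean = statistics.mean(data)        # sum of the values divided by their count: 25 / 5
median = statistics.median(data)    # middle value once sorted: 3, 4, 5, 6, 7
modes = statistics.multimode(data)  # most frequent value(s); here every value appears once

print(f"mean={mean}, median={median}, modes={modes}")
```

Here the mean and median agree on 5. Because every value appears exactly once, `multimode` (available since Python 3.8) returns all five numbers, so the mode tells us nothing useful about this particular set.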

There are many different methods for finding the midpoint because, depending on the nature of the data set, weird things can happen in the analysis. If the set contains one number that is significantly different from the rest, the average may be pulled far toward that number, producing strange results. Here is another example:

Data Set: 5, 4, 3, 7, 6, 125

Average: 25

In this case, 25 is much less useful as the midpoint. Most of the data hovers around 5, as in the previous example. The single outlier has taken the average and pulled it violently away. The number 25 isn’t very helpful in describing the data anymore, even though it is, technically, the correct average of the data.
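The same standard-library calls make the contrast concrete. In this minimal sketch the mean is dragged far toward the outlier, while the median barely moves. The median is shown here only as one common alternative, not as the “right” tool for every case.

```python
import statistics

# The second data set: the first five values plus a single outlier
data = [5, 4, 3, 7, 6, 125]

mean = statistics.mean(data)      # 150 / 6 = 25, pulled upward by the outlier
median = statistics.median(data)  # even count, so the average of the two middle values (5 and 6)

print(f"mean={mean}, median={median}")
```

The mean comes out at 25 and the median at 5.5, both computed from exactly the same six numbers.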

Again, it is not my intent to dive into extensive mathematical proofs, but I hope these simple examples make my point clear. It doesn’t take much to change the results of a data analysis drastically. All I did above was add a single new number to the data, and the average jumped from 5 to 25.

This also leads to the main problem with statistics that most people don’t think to consider: why did I choose the average as my preferred method of analysis, as opposed to the median, the mode, or something else entirely? As the one performing the analysis, I necessarily have to select my tools and methods. Which tools I decide to use affects the results, as does which part of the data I decide to use.

Selecting which part or parts of the data to use is also a significant factor. In the second example, the value 125 is clearly very unlike the other values and is having a significant effect on my result. I could simply remove it, claiming it is an outlier not representative of the rest of the data, and then proceed with my analysis (which leaves me with exactly the data set, and the average, of the first example). This sort of decision is not uncommon in statistics or science.
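Removing the outlier is itself a choice, and the cutoff for “outlier” is chosen by the analyst. The sketch below uses a hypothetical `drop_above` helper: a cutoff of 100 recovers the first example exactly, while a stricter cutoff of 6.5 quietly discards the 7 as well and shifts the result again. More principled outlier rules (for example, ones based on the spread of the data) exist, but they still rest on a chosen threshold.

```python
import statistics

# The second data set, including the outlier
data = [5, 4, 3, 7, 6, 125]

def drop_above(values, cutoff):
    """Hypothetical helper: keep only the values at or below an analyst-chosen cutoff."""
    return [v for v in values if v <= cutoff]

# Two different, equally arbitrary cutoffs lead to two different "averages"
for cutoff in (100, 6.5):
    kept = drop_above(data, cutoff)
    print(f"cutoff={cutoff}: kept={kept}, mean={statistics.mean(kept)}")
```

With a cutoff of 100 the mean is 5; with a cutoff of 6.5 it is 4.5. Nothing in the data itself tells us which cutoff to prefer.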

In both cases, whether deciding which tools to use or which data to include, I have fulfilled the requirements of statistical analysis. I may be asked to provide good reasons for my choices, but those choices are mine to make. Furthermore, this places the responsibility upon others to question my choices. If no one questions or challenges them, then my results will stand very nicely.

In the argument with my partner, my point was that if the agent provides us with a figure for the interest our investments have accumulated, I would want details about how that number was obtained. Unfortunately, agents do not usually make this very clear. Often, when I have raised this question, I get pages of statistical analysis that are themselves challenging and time-consuming to sort through. I sometimes wonder if they are simply trying to confuse me with a sheer volume of information, in the same way that some people use big words to confuse their listeners. It makes them sound more intelligent than they may actually be.
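A single percentage on a statement can hide choices like these. Here is a minimal sketch with made-up numbers (the amounts and the mid-period deposit are hypothetical, not taken from any real report): two simple ways of computing “percent increase” from the same statement give 21% and 10%. Neither is a complete return calculation; established methods such as time-weighted or money-weighted returns also account for when money moved in and out, which is exactly the kind of detail worth asking the agent about.

```python
# Hypothetical figures for illustration only; not from any real report.
initial_investment = 10_000.00  # amount first invested
later_deposit = 1_000.00        # an extra contribution made partway through
current_balance = 12_100.00     # value shown on the latest statement

# Method 1: compare the current balance to the initial investment only.
naive_pct = (current_balance - initial_investment) / initial_investment * 100

# Method 2: compare against everything that was actually put in.
total_contributed = initial_investment + later_deposit
net_pct = (current_balance - total_contributed) / total_contributed * 100

print(f"naive: {naive_pct:.1f}%  net of contributions: {net_pct:.1f}%")
```

The first method counts our own deposit as if it were growth, which roughly doubles the apparent gain.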

I would also relate this to my anxiety when I see companies “graciously” offering to shop around on my behalf, ensuring that I get the “lowest price” on an item. Why do those companies compare themselves against the particular competitors they choose? It is like a commercial that says its product beats the leading brand, and then the fine print reveals that the “leading brand” is simply the company’s own lesser product. By making crafty choices, the companies rig the game in their own favor. As a crafty consumer, it is up to me to turn the questions back on them and tease out something of the truth.
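The same selection effect fits in a few lines. In this sketch everything is invented (the store names, the prices, and the `is_lowest` helper are hypothetical): whether the “lowest price” claim is true depends only on which competitors are included in the comparison.

```python
# Hypothetical prices for one item; the store names and amounts are invented.
our_price = 50.00
competitor_prices = {"Store A": 45.00, "Store B": 55.00, "Store C": 60.00}

def is_lowest(price, comparison_set):
    """True if price beats or matches every price in the chosen comparison set."""
    return all(price <= other for other in comparison_set.values())

# Comparing against a hand-picked subset supports the claim...
chosen = {name: p for name, p in competitor_prices.items() if name != "Store A"}
print("Lowest among chosen competitors:", is_lowest(our_price, chosen))

# ...while comparing against the full set of competitors does not.
print("Lowest among all competitors:", is_lowest(our_price, competitor_prices))
```

The price never changes; only the comparison set does.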

Which brings me to the point I wanted to raise at the beginning. Statistical analysis is a form of fiction. It looks a lot like the truth, but it always differs from it to some degree. How much it differs depends heavily on the choices made by those performing the analysis. Those choices are not objective; they are subjective, and they form the foundation of the fiction being generated. They are a form of fantasy.

But most fantasy does have some relationship with the real. The centaur is a mythical creature based on the idea of a horse and a man merged together. Horses and men are real things. In the same way, the results of a statistical analysis are a fantasy based on something real as well: the very real data that has been analyzed. This fact can sometimes be difficult to remember.

This too, I think, is the source of many simulacra. Science and statistics both provide innumerable examples of these sorts of fictions, which become the basis of other fictions, and so on. If the original source of these things is forgotten, they simply become symbols of symbols of symbols…

It is certainly unreasonable to expect any person to keep track of every single fact in existence. I have to depend on the amalgamated “facts” that come from science, statistics, and other places. I have not performed the calculations required to predict the weather myself, but I still listen to the weatherperson, and I still plan my day around what they say. In that way, I am adopting a fiction into my list of “facts.” I am accepting a fantasy as part of my reality.

But I try to always remember where my data is coming from. To acknowledge and appreciate that there are likely errors (sometimes significant ones) in my “facts.” To be wary that sometimes those errors have been placed there intentionally by various parties with a vested interest in influencing my choices and decisions. To always be aware that my world is heavily mediated, and that almost everything I know is, in truth, simply a variation of fantasy. As Immanuel Kant argued in his Critique of Pure Reason, I have no direct access to the world as it is in itself.