Sunday, December 27, 2020

Common Errors in Machine Learning due to Poor Statistics Knowledge (By V Granville)

This article explains much of what we currently see in the many fake news stories based on misinterpreted data, badly used statistics and erroneous conclusions: sometimes innocent, often with an agenda.

We keep hearing about "science", but science is messy, data is dirty and statistics are hard to understand. Here are some examples.


Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find some above 0.99. This is best illustrated in my article How to Lie with P-values (which also discusses how to handle and fix it).

This is done on such a large scale that I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or what they hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these “bad stats” end up being featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, and observations and variables are carefully chosen just to make a (wrong) point.
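The spurious-correlation effect is easy to reproduce on a smaller scale. The Python sketch below uses only the standard library; the sizes (300 variables, 10 observations) and the 0.9 threshold are arbitrary choices for illustration, not figures from the article. Every variable is independent Gaussian noise, so any large correlation is pure chance. With 10 observations, |r| > 0.9 happens by chance with a probability of roughly 4 in 10,000, so among the 44,850 pairs one should typically expect a dozen or two such "discoveries".

```python
import itertools
import math
import random


def standardize(values):
    """Center a vector and scale it to unit length.

    After this, the dot product of two standardized vectors equals
    their Pearson correlation coefficient.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def spurious_correlations(n_vars=300, n_obs=10, threshold=0.9, seed=42):
    """Count pairs of pure-noise variables whose |r| exceeds `threshold`.

    Returns (number of pairs, pairs above threshold, largest |r| seen).
    """
    rng = random.Random(seed)
    # Every variable is independent Gaussian noise: no real relationship exists.
    data = [standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_vars)]
    n_pairs = hits = 0
    max_r = 0.0
    for a, b in itertools.combinations(data, 2):
        r = sum(x * y for x, y in zip(a, b))  # Pearson r of the two variables
        n_pairs += 1
        max_r = max(max_r, abs(r))
        if abs(r) > threshold:
            hits += 1
    return n_pairs, hits, max_r


if __name__ == "__main__":
    pairs, hits, max_r = spurious_correlations()
    print(f"{pairs} pairs, {hits} with |r| > 0.9, largest |r| = {max_r:.3f}")
```

Scaling `n_vars` up makes things worse quickly, because the number of pairs grows with the square of the number of variables.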

Trusting data is another big source of errors. What’s the point of building a 99% accurate model if your data is 20% faulty, or worse, if you failed to gather the right kind of data, or the right predictors, to start with? Also, models without sound cross-validation are bound to fail. In fintech, you can do back-testing to check a model, but it is useless: what you need to do is called walk-forward testing, a process in which past data is split into two sets: the older data, used to train the model, and the most recent data, held back to test it. Walk-forward is akin to testing your model on future data that is already in your possession; in machine learning lingo, it is a form of cross-validation. And then, you need to do it right: if the training and test data are too similar, you may end up with overfitting issues.
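A minimal sketch of walk-forward splitting, assuming an expanding training window (a rolling, fixed-size window is another common variant). The function names and the naive "repeat the last value" forecaster are hypothetical placeholders for a real model; the point is that each test window always lies after the data used for training, unlike a shuffled k-fold split.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train, test) index ranges for an expanding-window walk-forward.

    Each test window lies strictly after all of its training data, so the
    model is always evaluated on observations that are "future" to it.
    """
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size  # roll forward: the old test window joins training


def walk_forward_mae(series, initial_train=6, test_size=2):
    """Mean absolute error of a naive 'repeat the last value' forecaster."""
    errors = []
    for train, test in walk_forward_splits(len(series), initial_train, test_size):
        forecast = series[train[-1]]  # hypothetical model: last observed value
        errors.extend(abs(series[t] - forecast) for t in test)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    series = list(range(1, 11))  # toy upward-trending series: 1, 2, ..., 10
    for train, test in walk_forward_splits(len(series), 6, 2):
        print(f"train on {list(train)}, test on {list(test)}")
    print("walk-forward MAE:", walk_forward_mae(series))
```

scikit-learn's `TimeSeriesSplit` implements the same expanding-window idea, if you prefer a library routine over a hand-written splitter.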

Trusting the R-squared is another source of potential problems. It depends on your sample size, so you can’t compare results for two sets of different sizes, and it is sensitive to outliers. Google alternatives to R-squared to find a solution. Also, using the normal distribution as a panacea leads to many problems when dealing with data that has a different tail, or that is not unimodal or not symmetric. Sometimes a simple transformation, such as a logistic map or a logarithmic transform, will fix the issue.
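To illustrate how sensitive R-squared is to outliers, here is a sketch that fits an ordinary least-squares line to ten invented points lying close to y = 2x, then changes a single value to simulate a data-entry error. The data are made up for illustration only.

```python
def r_squared(xs, ys):
    """R-squared of an ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


x = list(range(1, 11))
clean = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]  # roughly y = 2x
dirty = clean.copy()
dirty[4] = 40.0  # a single outlier (e.g. a data-entry error) at x = 5

print(f"R^2 clean: {r_squared(x, clean):.3f}")
print(f"R^2 dirty: {r_squared(x, dirty):.3f}")
```

With these made-up numbers, R-squared drops from about 0.999 to about 0.25 because of one bad point out of ten.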

Even the choice of metric can have huge consequences and lead to different conclusions from the same data set. If your conclusions should be the same regardless of whether you use miles or yards, then choose scale-invariant modeling techniques.
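A sketch of why scale invariance matters, using invented data. Plain Euclidean distance, which underlies methods such as k-nearest neighbors and k-means, is not scale-invariant: switching the distance feature from miles to yards changes which customer counts as "nearest". The Pearson correlation, by contrast, is unchanged by rescaling. The customers and all numbers are hypothetical.

```python
import math

YARDS_PER_MILE = 1760


def pearson(xs, ys):
    """Pearson correlation: unchanged by rescaling either variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def nearest(query, candidates):
    """Name of the candidate closest to `query` in plain Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))


def to_yards(point):
    """Convert the first feature (distance) from miles to yards."""
    return (point[0] * YARDS_PER_MILE, point[1])


# Hypothetical customers described by (distance to store, age).
query_miles = (1.0, 30)
customers_miles = {"A": (1.0, 40), "B": (2.0, 31)}

# The same data with distance expressed in yards instead of miles.
query_yards = to_yards(query_miles)
customers_yards = {name: to_yards(p) for name, p in customers_miles.items()}

print("nearest (miles):", nearest(query_miles, customers_miles))  # B
print("nearest (yards):", nearest(query_yards, customers_yards))  # A

distances = [0.5, 1.2, 2.0, 3.1, 4.4]
spend = [120, 100, 90, 60, 40]
r_miles = pearson(distances, spend)
r_yards = pearson([d * YARDS_PER_MILE for d in distances], spend)
print(f"correlation (miles): {r_miles:.4f}, (yards): {r_yards:.4f}")
```

Standardizing each feature (subtracting its mean and dividing by its standard deviation) before computing distances is one common way to make such methods insensitive to the choice of units.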

Missing data can be handled inappropriately, for instance by replacing it with averages computed on the available observations, even though better imputation techniques exist. But what if that data is missing precisely because it behaves differently from your average? Think about surveys or Amazon reviews. Who writes reviews and who does not? Of course the two categories of people are very different, and what’s more, the vast majority of people never write reviews, so reviews are based on a tiny, skewed sample of the users. The fix here is to blend a few professional reviews with those from regular users, and to score the users correctly to give the reader a better picture. If you fail to do it, soon enough all readers will know that your reviews are not trustworthy, and you might as well remove all reviews from your website, get rid of the data scientists working on the project, save a lot of money and improve your business brand.
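A small simulation of the review problem, assuming (purely for illustration) that users' ratings are uniform from 1 to 5 but that unhappy users, and to a lesser degree delighted ones, are far more likely to write a review than everyone else. The review probabilities are invented. Mean imputation, which fills every missing rating with the average of the observed ones, cannot correct the resulting bias: by construction it leaves the mean exactly where the skewed sample put it.

```python
import random


def simulate_reviews(n_users=10_000, seed=7):
    """Return (true ratings, observed ratings with None where no review was written)."""
    rng = random.Random(seed)
    # Hypothetical review propensities: unhappy users review most, delighted users next.
    p_review = {1: 0.30, 2: 0.02, 3: 0.02, 4: 0.02, 5: 0.10}
    true = [rng.randint(1, 5) for _ in range(n_users)]
    observed = [r if rng.random() < p_review[r] else None for r in true]
    return true, observed


def mean(values):
    return sum(values) / len(values)


true, observed = simulate_reviews()
reviews = [r for r in observed if r is not None]
# Naive mean imputation: every missing rating becomes the average of the available ones.
fill = mean(reviews)
imputed = [fill if r is None else r for r in observed]

print(f"true mean rating:      {mean(true):.2f}")
print(f"mean of reviews only:  {mean(reviews):.2f}  ({len(reviews)} reviews)")
print(f"mean after imputation: {mean(imputed):.2f}")
```

Here the reviews-only average sits well below the true average of about 3, and imputation reproduces it unchanged while also artificially shrinking the spread of the data.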

Much of this is discussed (with fixes) in my recent book Statistics: new foundations, toolbox, and machine learning recipes, available (for free) here.

 

Friday, December 25, 2020

General Myths to avoid in Data Science and Machine Learning

 Simple but clear definitions!


“What is Machine Learning, Data Science or Artificial Intelligence?” is one of the most common questions people ask me. Be it newcomers, recruiters or even people in leadership positions, this question puzzles everyone in its own way.

For beginners, it takes the form of “How do I become a data scientist?” For leaders, it becomes a question of whether it has a significant business impact. And for people in the field, it takes the form of “What should I call myself: a data scientist, a data engineer or a data analyst?”

This post is an attempt to dispel some of the myths and develop a basic understanding of what Data Science is and its different interpretations in the corporate world.

Myth 1: Data Scientist/Engineer/Analyst are one and the same.

This is a warped myth that I have encountered many times in my career, and it does harm to both the employee and the company. It’s like calling a software engineer and a QA engineer the same thing.

To put things in perspective, a Data Scientist is someone who has experience and knowledge in at least 2 of these 3 fields: Statistics, Programming and Machine Learning. The primary expectation of such an employee is to be able to work on a challenging business problem where he/she can use their knowledge to find solutions. Such a person would love to spend a major portion of their work building predictive models and performing statistical experiments to obtain a working solution. It’s a mixture of a research job and a programming job, and the nature and workload differ depending on the size of the company/team.

Data Engineering is a job where a person focuses on building the infrastructure for deploying applications and performing jobs like predictive modeling, updating dashboards with streaming data, running daily jobs to generate reports and maintaining a continuous flow of data. A really good knowledge of SQL is fast becoming a necessity for a good data engineer, followed by knowledge of Spark.

A Data Analyst is a person with more of a bent towards interpreting and analyzing business results than towards creating them. Such a person will prefer to use tools to generate those results and will spend a major portion of their time interpreting them and deriving business value from them. Data Analysts were in the industry long before data scientists came into the picture, and their primary tool of choice has been Excel. In fact, even today, Excel is the most efficient tool for small amounts of data. At present, there are tools like Power BI and Azure which provide the ability to perform analytics on Big Data. The primary focus of this position, however, is accurately communicating day-to-day results as well as the results of the new hypotheses they test. These inputs are critical and form the basis for important decision making in a business.


Myth 2: Deep Learning is Machine Learning or AI

Deep learning has no doubt become a big name nowadays, and all the hype and marketing around it has led people to believe that deep learning is the ultimate solution to every data science/machine learning problem. Nothing could be further from the truth.

Deep learning is no doubt one of the most complex concepts to understand in today’s machine learning, but that is all it is. Deep learning gets its name because the “neural network” used in this framework contains multiple layers and is hence called a “deep” network. What TensorFlow, PyTorch or Keras offer is just a framework to apply this concept easily.

No doubt, learning these frameworks is hard, and the frameworks are efficient as well, but that is not equivalent to gaining expertise in machine learning. Machine learning is a vast field which takes in concepts and algorithms from a number of fields such as statistics, information theory, optimization, information retrieval and neural networks, and it has an abundance of algorithms, each more useful than the others in particular use cases.

Deep learning, for instance, has been extremely effective in computer vision and speech recognition, but it is absolute overkill to use it for sentiment analysis or for a simple prediction problem which can be solved with linear regression.

It is always a wise decision to invest time in exploratory analysis and understanding the scope of a problem before fixing on the algorithm to use for the problem.

This pic explains it the best.


Myth 3: Data Science can be picked up in 3 months.

As much as I wish this were true, it is not the case. To be an efficient data scientist, one needs to know a lot more than just importing libraries such as scikit-learn and TensorFlow and calling their train and predict functions.

It is one of those elusive fields where the results are not deterministic, meaning the same sequence of steps will not always end in the same result. It depends heavily on the quality and the quantity of the data provided, and a lot needs to happen before calling the “train” function.

Sure, you can learn how to call libraries and write the sequence of steps to generate a model, but that model will not always be efficient. To understand things properly, one needs a considerable understanding of how the applied algorithm works and what it depends on. This knowledge is imperative; otherwise, tweaking models or explaining the results to leadership becomes a real pain.

I always remember this answer to “How do I learn coding in a single night?”


This is a small attempt to underline and dispel the prevalent myths in the field of machine learning and data science. I hope it helps.

Thursday, December 24, 2020

2020 Year in Review (By Dave Collum)

Superb year-end review by David Collum, full of keen perspective and plenty of wit. The downloadable PDF of the full article is available here.

Making sense of the craziest year we’ve yet lived through

Imagine, if you will, a man wakes up from a year-long induced coma—a long hauler of a higher order—to a world gone mad. During his slumber, the President of the United States was impeached for colluding with the Russians using a dossier prepared by his political opponents, themselves colluding with the FBI, intelligence agencies, and the Russians. A pandemic that may have emanated from a Chinese virology laboratory swept the globe killing millions and is still on the loose. A controlled demolition of the global economy forced hundreds of millions into unemployment in a matter of weeks. Metropolitan hotels plummeted to 10% occupancy. The 10% of the global economy corresponding to hospitality and tourism had been smashed on the shoals and was foundering. The Federal Reserve has been buying junk corporate bonds in total desperation. A social movement of monumental proportions swept the US and the world, triggering months of rioting and looting while mayors, frozen in the headlights, were unable to fathom an appropriate response. The rise of neo-Marxism on college campuses and beyond had become palpable. The most contentious election in US history pitted the undeniably polarizing and irascible Donald Trump against the DNC A-Team including a 76-year-old showing early signs of dementia paired with a sassy neo-Marxist grifter with an undetectable moral compass. Many have lost faith in the fairness of the election as challenges hit the courts. Peering through the virus-induced brain fog the man sees CNBC playing on the TV with the scrolling chyron stating, “S&P up 12% year to date. Nasdaq soars 36%.” The man has entered The Twilight Zone.


Mindset Shift (Corporate management)

 

Mostly true, but I would go one step further: The two need to be integrated!

So:

Purpose without losing sight of profits.

Network with a minimum of hierarchy.

Empowerment with controls embedded in the system

Experimentation in a well structured (planned) environment

And finally, Transparency with respect for privacy.


DATA QUALITY MANAGEMENT (YouTube Video - 5mn)


 

A very good introduction to Data Quality Management. 

More professional and less of general interest but still worth listening to if your work is remotely concerned with data.


We're Being Told South Africa's "Scary" Mutant COVID Is Even More Dangerous Than The UK's "Super COVID"

To summarize this article: Covid is a nasty bug but not the end of the world. 

As for the controls and especially control of information, they are far more dangerous and won't go away anytime soon, if ever.

Unfortunately, it goes one step further. Over hundreds of millions of years, viruses have learned one thing alone: to survive, which of course they do exceedingly well. It is said that there are already multiple variants of Covid: the UK strain is one, the South African strain another. The vaccine may or may not be effective against these two, but eventually a newer, more lethal Covid virus will emerge. It is just a matter of time. Then we will have a real pandemic, just when people, society and the economy are exhausted from fighting the fictive one...

 

Authored by Michael Snyder via The Economic Collapse blog,

A new mutant strain of COVID-19 that has been dubbed “501.V2” has gotten completely out of control in South Africa, and authorities are telling us that it is an even bigger threat than the “Super COVID” that has been causing so much panic in the United Kingdom.  Of course viruses mutate all the time, and so it isn’t a surprise that COVID-19 has been mutating.  But mutations can become a major issue when they fundamentally alter the way that a virus affects humans, and we are being told that “501.V2” is much more transmissible than previous versions of COVID and that even young people are catching it a lot more easily.  That is potentially a huge concern, because up until now young people have not been hit very hard by the COVID pandemic.

The British press is using the word “scary” to describe this new variant, and at this point it has become the overwhelmingly dominant strain in South Africa…

The new mutant, called 501.V2, was announced in Cape Town last Friday and is believed to be a more extreme variant than Britain’s new Covid strain which has plunged millions into miserable Christmas lockdowns.

Cases in South Africa have soared from fewer than 3,000 a day at the start of December to more than 9,500 per day, with the mutant accounting for up to 90 percent of those new infections.

If this same pattern happens elsewhere as this new mutant strain travels around the globe, then “501.V2” could eventually almost entirely replace all of the older versions of COVID.

Authorities are optimistically telling us that the recent vaccines that have been developed will “likely” work against this new variant, but the truth is that they will not know until testing is done.

And if the vaccines don’t work against “501.V2”, we could be back to square one very rapidly.

For now, countries all over the globe are banning flights from South Africa in a desperate attempt to isolate this new version.  The UK, Germany, Switzerland, Turkey and Israel are among the nations that have banned those flights, but so far the United States is not on that list.

So people that are potentially carrying this new version of COVID continue to enter the U.S. on a daily basis.

For the United Kingdom, this flight ban may have come too late, because two cases of “501.V2” have already been identified on British soil:

Two cases of a new, “more transmissible” COVID-19 variant linked to South Africa have been identified in the UK, the health secretary has said.

Both cases are contacts of people who travelled from South Africa over the last few weeks, Matt Hancock said at a Downing Street news conference.

If the new vaccines are effective against “501.V2”, authorities believe that they already have the long-term answer to this new variant.

But if those vaccines don’t work, this pandemic could be entering a far more deadly new phase.

And of course we are hearing about more problems with these new vaccines on a daily basis. Thousands of adverse reactions have already been reported to the CDC, and more reports continue to pour in as more people get the shots. Here is one example from New York City:

A health care worker in New York City had a serious adverse reaction to a coronavirus vaccine, officials said on Wednesday.

New York City Health Commissioner David Chokshi said during a news conference that the unidentified worker experienced a “significant allergic reaction” to the vaccine. He added that the worker was treated for the reaction, and is in stable condition and recovering.

We should not be surprised that there are major issues with experimental mRNA vaccines that are based on entirely new technology that were rushed into production without proper testing.

And of course there are tens of millions of Americans that will never take any mRNA vaccine that literally “hijacks your cells” under any circumstances.

On the other hand, most of the U.S. population seems to think that these new vaccines will bring this pandemic to an end, but if they don’t work against new mutant versions of the virus that won’t be true at all.

It is so important to take a balanced view of these things.

Unfortunately, when it comes to COVID most people fall into two camps.

The first camp is totally freaked out because they think that COVID is about the worst thing that could ever happen to the United States and they tend to favor extremely draconian measures to prevent the spread of the virus.

But the truth is that the COVID pandemic pales in comparison to other great pandemics throughout human history.  The Black Plague and the Spanish Flu Pandemic each killed at least 50 million people.  As for the COVID pandemic, the global death toll has not reached the 2 million mark even if the official numbers are accurate.  If a pandemic of this nature is freaking people out so much, what is going to happen when a truly killer plague is unleashed in our society?

The second camp either thinks that the pandemic is greatly exaggerated or that the virus doesn’t even exist at all.  Even though hordes of people are catching the virus all around us, many out there continue to deny the reality of this crisis.

I simply do not understand that. So many people that I know around the country have gotten the virus, and that includes quite a few big names. For example, the following is an excerpt from an article in which Daisy Luther shares what her experience with COVID was like:

Days 3-5: Over the next three days, chills and fever were almost constant. My joints and muscles hurt. Getting up to go to the bathroom felt like an expedition up a mountain.  I was tired and winded. I had very little appetite and even less of an inclination to cook food so I existed mostly on peanut butter and crackers and leftover soup. I was absolutely exhausted and so cold that I shivered violently when I got out from under my bed piled high with blankets. I had super-weird dreams. My cough worsened, my head hurt, and my throat was still mildly sore.

I drank lots of water and electrolyte beverages. My thirst remained unquenchable regardless of how much I drank. I took vitamins (C, D3) and took Zinc supplements. These are my regular supplements but I doubled that.

Days 6-9: The line to get a test at the local clinic was long and filled with people who were coughing up a lung. There was no way I’d be able to stand in that line for an hour, as sick as I felt. Besides, I figured if I didn’t have Covid, I’d get it standing in the line so I opted not to be tested.

This part made me think of the worst case of the flu I ever had, except intensified by about four times. It was terrible.

I usually let a fever run its course but by Saturday I felt so awful that I gave in and began treating symptoms. My normal temp is in the 96s and my temperature throughout these days stayed between 101-103. I staggered ibuprofen and acetaminophen, and I also used a mild muscle relaxant and my Ventilyn inhaler. The meds didn’t get rid of my fever but reduced the chills to a tolerable level. I slept almost around the clock, waking up for a couple of hours here and there to check on website stuff. Fortunately, I have a wonderful team who kept things running for us. One day blurred into the next and I considered going to the doctor again, but couldn’t muster the energy. I felt like if I just got a little more sleep I’d be okay.

My cough was getting far worse and now my ribs and abdominal muscles hurt. It was a deep, painful cough that caused me to clutch my chest every single time I inhaled deeply.

So to summarize, yes the COVID pandemic is real, but it is not the end of the world.

More people are going to get sick, and some will suffer intensely, but the vast majority of those that get the virus will survive.

If you want to wear a mask, then wear a mask.

If you don’t want to wear a mask, then don’t wear a mask.

We should be free to make our own choices, and we should also be free to experience the consequences for those choices.

Unfortunately, there are way too many people out there that think that they have the right to censor and control what we say and what we do, and that trend is likely to only get worse as our society continues to spin out of control in the years ahead.

 

Merry Christmas!


Times of change are also times of opportunity...

Merry Christmas!


Tuesday, December 22, 2020

The definition of Herd Immunity according to the WHO. Straight from 1984!

 


The first definition is from Wikipedia. The second is from the WHO.

How is it even possible that doctors at the WHO would write something like this?


Johns Hopkins Newspaper Removes Study Examining COVID Death Rate

This cancelled article confirms what we know and the feedback we get from most countries: no or only a marginal increase in the overall death rate, and nothing significant for older people. 2020 will just end up being an ordinary year...

This does not change the fact that more people than usual have ended up in the emergency wards of local hospitals all over the world, but it confirms that beyond a very small subset of people, for everybody else, Corona is nothing but a bad case of the flu, and that consequently ALL the confinements, lockdowns and other extreme measures have been overreactions by overwhelmed and incompetent governments.

Unfortunately, the economic impact of all this is very likely to strike the economy like a tsunami in 2021. In this respect, the UK, with its early taste of Brexit, may become the canary in the coal mine. Let's see...

 

Authored by Benjamin Zeisloft via Campus Reform,

Johns Hopkins University’s student newspaper, the News-Letter, reported on a university presentation stating that COVID-19 “had no effect on the percentage of deaths of older people” and that the virus “has also not increased the total number of deaths” in comparison to historical data. However, the paper later removed the article, stating that it had been used to support “dangerous inaccuracies” on social media.

Assistant Director for the university’s Applied Economics program Genevieve Briand critically analyzed the net effect of COVID-19 on deaths in the United States based on historical data. Using information from the Centers for Disease Control and Prevention, Briand identified the percentages of total deaths per age category both before and after the pandemic began.

“Surprisingly, the deaths of older people stayed the same before and after COVID-19,” said the News-Letter’s article.

“Since COVID-19 mainly affects the elderly, experts expected an increase in the percentage of deaths in older age groups. However, this increase is not seen from the CDC data. In fact, the percentages of deaths among all age groups remain relatively the same.”

Though deaths in categories like respiratory illnesses and heart disease seasonally rise and fall together in the United States, Briand noticed a strange trend.

“Instead of the expected drastic increase across all causes, there was a significant decrease in deaths due to heart disease,” in addition to “all other causes.” Additionally, “the total decrease in deaths by other causes almost exactly equals the increase in deaths by COVID-19.”

“All of this points to no evidence that COVID-19 created any excess deaths. Total death numbers are not above normal death numbers. We found no evidence to the contrary,” Briand concluded in her presentation. She told the News-Letter that “a decreased number of heart attacks and all the other death causes doesn’t give us a choice but to point to some misclassification.”

The News-Letter removed the article in late November after staff discovered its coverage of the study was "used to support dangerous inaccuracies that minimize the impact of the pandemic.” The paper linked a PDF of the original article with a watermark stating “Retracted by the News-Letter” to a statement explaining the decision.

The News-Letter explained that it made the decision independently and encouraged readers to take the article “in context with the countless other data published by Hopkins, the World Health Organization and the Centers for Disease Control and Prevention (CDC).”

The staff of the News-Letter referred Campus Reform to an op-ed written December 3 by the paper’s editorial board, which said that “the article should not have been deleted in the first place.”

“Instead of temporarily removing it from our website, the News-Letter should have immediately retracted and provided a detailed explanation of the inaccuracies in Briand’s research,” explained the editors.

“We did not intend to silence Briand; instead, we sought to put her claims in conversation with findings from Hopkins, the World Health Organization and the CDC.”

Briand told Campus Reform that the News-Letter’s decision to remove the article was its own, and pointed out that she explained "during the presentation where I found and downloaded the data from, so anyone can easily replicate my analysis.”     

PS: So not only have people stopped dying from "other causes", but they don't even bother to catch the flu in 2020! Go figure!


Influenza reported by public labs:
2018: 45,881
2019: 46,974
2020: 106

Influenza reported by clinical labs:
2018: 183,483
2019: 250,494
2020: 496

Sunday, December 20, 2020

The Covidians - The Purpose driven Life

 


 Another insightful article to understand the disease affecting our society. Most likely, the Covidians are here to stay! 

The Purpose Driven Life

Guest Post by The Zman

An underappreciated aspect of human society is that humans in large communities need a set of common beliefs. Large society in this context means anything larger than the Dunbar number. Once you get to that number, law codes and a way to enforce them are a necessity. One tool to do this is a common set of beliefs held by most of the members. This provides a mechanism for those rules to become a habit of mind, a shared reality we would think of as culture.

A simple example of this is a taboo against an activity. Let’s say a poisonous berry grows in a part of the forest. The people may develop a taboo about eating anything or even entering that part of the forest. Perhaps they evolve a legend about one of their gods cursing the place or prohibiting people from entering it. Once enough people believe this, it becomes part of the law code of society and it is enforced by collective action against those who break the taboo about entering that area.

It is a crude example, but we have many examples in the modern day that are not much different. For example, Americans remain convinced that eating fat will anger the health gods. There is no science behind the Standard American Diet, but most Americans accept it as true. They don’t think about it. It is just part of the shared beliefs they grew up with and live within, so they accept it as true. We accept all sorts of behavioral and environmental nonsense on faith.

Now, it should be noted that the shared beliefs do not have to be accepted by everyone in a society for them to work. Going back to our sacred bit of the forest example, it is not necessary that everyone accept the myth. It just has to be a critical mass that accept it and is willing to act on it. Unless there is an equally compelling reason to resist the myth and the reason is held by a sufficient number of people, a minority belief will become the majority belief in practice, if not spirit.

It is not too hard to find examples of this form of minoritarianism. American history is riddled with causes that were championed by a minority. The proliferating set of taboos related to the Covid beliefs held by the ruling class is a great example. A tiny number of people, relative to the population, deeply believe in these Covid taboos, so they have launched a crusade against Covid. America looks like a weird Muslim country now, because lots of people joined this new belief.

That is another aspect of shared belief. It appears that another prerequisite for large human society is a sense of shared purpose. The questions of why we are here and what the purpose of our lives is are answered within that shared set of beliefs. “My life has meaning because I am part of this great cause of my people” is a highly efficient way of enforcing group behavior. The best rules are those eagerly enforced by a set of true believers whose purpose in life is to enforce compliance.

Move backward through popular politics in America and you see one holy crusade after another driving the political debate. Today it is driven by Covidians. Before that it was driven by white liberals thinking Obama was Jesus. Before that it was driven by the war on terrorism. When the Baby Boomers had kids in school the crusade was to fix the schools so everyone could be educated. Go back further and we had a war on drugs and, of course, the great crusade against the evils of communism.

American history has been one crusade after another. The great battle between good whites and bad whites exists because it fills that need for a purpose. In the absence of some external foe, the good whites keep their crusading skills sharp by going to war with the bad whites over some moral cause. This not only gives purpose to their lives, but it also reinforces those shared beliefs about who they are and why they exist. The reaction from the bad whites serves much the same purpose.

If you observe the Covidians for a bit, you see this need for purpose. The HBD community, for example, rushed to be the early adopters of the new faith, sensing it was a way back into the community of the respectable. It was not a conscious decision on their part. They did not vote on their secret e-mail list to become Covidians. It was the result of a shared desire to be rehabilitated and restored to polite society, along with the general sense that their purpose in life is to inform the rulers on the human sciences.

The same can be said for the women volunteering to be the mask police in every community in America. The natural role of women has been so reduced in status that this artificial role of Covidian den mother has become highly appealing. Mothers have been far less emotional about things like mask wearing and social distancing than the army of unattached, childless females. The reason is they have kids to scratch that maternal itch, so being a Covidian den mother has little appeal.

The fact is, human societies need shared beliefs. One aspect of that shared belief is a shared purpose. Most people, certainly not all, need that space labeled “purpose” to be filled by society. This is especially true of childless females. It is also true for males with no male role to play. Stripped of natural sex roles, this need for purpose is filled with causes that give purpose to life. The people wearing masks in their cars are not sheep following orders. They are believers looking for a god.

BOMBSHELL! Putin Tells NATO Prepare for War as Top General Slain, Turkey INVADES Syria by Ben Norton (Video - 2h24)

This interview with Ben Norton is quite a broad and knowledgeable analysis of the whole world situation right now. Quite long but very info...