Monday, February 3, 2020

Is the Wuhan Corona Virus a bio weapon?



As reported earlier, the feedback we are receiving from China is very murky and difficult to interpret.

What we know so far is that the Wuhan virus, also called nCoV, its scientific name, is very contagious, although its real R0 (R naught) is still difficult to calculate. Most current estimates are between 3 and 4, which would be relatively high compared to other viruses. (See below.)

In reality, these numbers are statistical estimates with little predictive value, since they can vary wildly over time. A contagion may start with a high R0, only to see this number dwindle later as measures are taken to isolate sick persons and contain the outbreak. More important to the global spread of the virus is how it propagates between people. A long-lasting virus like HIV, which propagates through sexual transmission, will see a slow but steady propagation, ending up with a relatively high R0, whereas the flu, with a lower R0, will spread much faster through the air in spite of its low rate of transmission. The most contagious virus known is measles, with a maximum R0 of 18, which can linger in the air for over 2 hours. In between, you find Zika, which is transmitted by mosquitoes, and nCoV, which is transmitted through the air like the flu.

The second important factor is the death rate. As of today, Tuesday, February 4th, the death toll is 425, but this number is highly unreliable. Although the Chinese authorities have made some efforts to become more transparent lately, initially the dead were not reported, and it is still likely that many deaths are, deliberately or not, attributed to other causes such as pneumonia. (In many cases it may just be because there is no diagnostic kit available!)

If we compare the fatality rate of the Corona virus to that of other viruses, it looks relatively low at about 2%. It is still low if you double this number, which may be a good estimate. (But in that case, we can also double the number of cases and end up with a similar 2% fatality rate.)
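The arithmetic behind that parenthesis can be sketched in a few lines. The death toll of 425 is the figure quoted above; the number of cases is not a reported figure but is back-calculated here from the 2% rate, so treat it as a hypothetical placeholder.

```python
def fatality_rate(deaths: float, cases: float) -> float:
    """Deaths divided by confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


reported_deaths = 425                   # death toll quoted in the post
implied_cases = reported_deaths / 0.02  # hypothetical: back-calculated from the ~2% rate

scenarios = {
    "as reported": (reported_deaths, implied_cases),
    "deaths doubled only": (2 * reported_deaths, implied_cases),
    "deaths and cases doubled": (2 * reported_deaths, 2 * implied_cases),
}

for label, (deaths, cases) in scenarios.items():
    print(f"{label}: {fatality_rate(deaths, cases):.1%}")
```

Under-reporting only changes the rate if deaths and cases are under-reported to different degrees, which is exactly what nobody can measure at this stage.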
 
This has been used as an argument against the virus being a bio weapon, since such a weapon would by definition have a high fatality rate. But in reality, the fatality rate is just one factor among many. Another is the share of serious cases necessitating intensive care, which at over 25% is very high.

The most damning accusation to date that the virus may be a weapon came from a team based in India, which asserted that, studying the genome of the virus, they found 4 insertions with similarities to the genome of HIV. In this respect, it is important to note that the research was not peer reviewed and has since been retracted. Other scientists have noted that these codons are not specific to HIV and can be found in many viruses, including other Corona viruses, so this is not proof as such.

This said, it is widely reported that patients with Corona Virus respond well to HIV treatments... Again, not conclusive but certainly an interesting factor to take into consideration.

More interestingly, running the BLAST sequence comparison software on the genome of nCoV shows that 89% of this genome comes from the NG strain of SARS, with some nucleotides from the KY strain which code the S-protein. The S-protein is the part of the virus which binds it to human tissue in the lungs. This is why the disease appears as pneumonia and passes so easily from person to person.
(source)
 
All this remains to be proven.

In the scientific literature, it is obvious that conclusions are closely linked to the genetic models as well as the assumptions you are using and that no consensus has yet been found on the origin of the virus beyond the fact that it is originally, probably, a bat virus which has mutated significantly.

So the original question can now be rephrased:

Knowing that the outbreak happened in Wuhan, the only city with a P4 level laboratory in China, that the virus has characteristics from different strains of the Corona virus which make it especially virulent and contagious, and that the virus seems to be able to propagate during the incubation phase when people show no signs of being sick, a relatively rare characteristic, could this virus be a bio weapon?

Viruses do evolve naturally. This is why, more or less every year, we have a new strain of the flu. They are also known in some instances to exchange part of their genome with other viruses. To answer the question more precisely, we now need to better understand the genome of nCoV. (The fact that the genome has been decoded does not mean that we understand the details of its origin and evolution yet.)

Many teams around the world are working on this subject so the answer should be clear within a month.

Still, although the damning evidence may be circumstantial, it is clear that China spent a month, from mid December to mid January, trying to silence doctors who were warning us about the outbreak. Then, when the scale of the pandemic became clear, the country took extreme measures by isolating a whole city, then a whole province, while reassuring people that everything was under control and not as bad as anecdotal reporting suggested. Sure enough, it was worse!

The suppression of information and obfuscation is in the DNA of the Chinese government. This alone does not prove foul play. Still, taken together, all these factors are difficult to explain as pure coincidence. Statistics allow unlikely events to occur, but beyond a threshold that the Corona virus seems suspiciously close to, a human helping hand becomes the simpler explanation. So although it cannot be said yet that the nCoV virus is a bio weapon, the assertion cannot be dismissed out of hand so easily either. The possibility that Chinese scientists were indeed working on dangerous pathogens with the aim of eventually turning them into weapons, as some like Dr Francis Boyle have accused them of doing, is real, although difficult to prove.

Thursday, January 30, 2020

Corona Virus: No science can be based on lies and incorrect information!



How many sick people are there in China?

How deadly is the Wuhan Corona Virus?

Somehow it looks more and more like nobody can answer these questions precisely. The number of sick people, now close to 10,000, does not square with what little feedback we hear from overwhelmed hospitals in Wuhan, nor with the extreme measures taken by the Chinese authorities. Likewise, the real number of deaths is allegedly much higher than the official figures, because many are attributed to other causes or because bodies are sent directly to crematoriums.

This matters a lot!

In politics, it is almost a duty to obfuscate and often to lie through your teeth. Then lie again until people start believing that 2+2 equals 3. But not in science. If you base your assumptions on the wrong numbers, you end up with the wrong conclusions and recommendations.

Somehow, this seems to be what is happening with the current epidemic.

Is it just a "normal" flu virus, albeit without any vaccine recourse, which would mean that most of what we do is overshooting, or is it something else which needs to be stopped by all means? The answer to this question depends on two factors: the R0 (propensity to infect other people) and the ratio of deaths to cases.

As always with statistics, if the input data is inaccurate, the output or conclusion will be faulty. Again, this may well be the case here. The R0 is definitely high and the death ratio seems to be much higher than announced. But, if that is the case, then what?

A country of 1.4 billion people cannot be completely isolated. And in any case, even if it could, it would already be too late. The extreme measures also seem to be counterproductive. It looks like 4 to 5 million people did actually escape from the city just before it was closed off. Who can blame them? They did the most logical thing: Run before the storm!

We are less than a month into this pandemic and it already looks out of control, with secondary cases in Japan and maybe in Thailand too, as well as all over China.

The only area in the world from which we have heard nothing yet is Africa. This is odd, because there are close to two million Chinese people living there. It is very unlikely that the virus will avoid the continent. Africa, with its hot climate, may not be ideal for a flu-like virus, but conversely the continent could very well become a breeding ground for the virus.

Whatever route the virus takes to invade the geographic realm, it has already invaded the economic one. The risk there is not that airlines or hotels will suffer, but that the whole supply chain will be interrupted and will have to permanently relocate out of China. If this happens, then the recession which was already in the cards will become unavoidable and arrive well before the 2020 American elections in November. 

Then what? As with all good black swans, the consequences of this one are impossible to predict. We are still not sure that this one is for real, but it looks much more menacing than it did a week ago. It is likely that we will know next week if the curve is exponential, or maybe not if the numbers are not reliable. But by then we will be running behind a virus which will jump by leaps and bounds across borders and inflict growing economic pain.

I find it interesting to note that the last big recession started on Wall Street in 1929 (technically it started a little earlier in Vienna, but this is a detail) but that the US was also the first country to come out of it. Could the pattern repeat this time with China?

Tuesday, January 28, 2020

An interesting distinction between AI and Human intelligence




Boeing, long the symbol of America, now the epitome of what's wrong with corporations in the US?


As Boeing is launching its new 777x, it is important to understand that the Boeing "MAX" fiasco is not a bug but a feature of the company and our changed society!

Evolution is a slow process, mostly invisible to our senses, except when it speeds up and becomes catastrophic. We now understand that this is how it really works. The slow adaptation that Darwin discovered still plays a role, but what really defines evolution is sudden events breaking down the status quo and spurring bursts of accelerated transformation to adapt as fast as possible to a changed environment. Most often the result is the elimination of the not so fit, the not so flexible, and sometimes just the plain unlucky.

The laws of evolution apply likewise to our societies and civilizations. They rise with new resources, new technologies and stable climates to later fall with wars and pestilences while rigidities accrued over the years play a role in their demise. And so it goes with corporations and their endless cycle of creation and destruction, just faster, which makes the process easier to understand, and sometimes act upon.

Boeing is to the airline sector what Ford is to the car industry and General Electric to manufacturing: not just the symbol of the rise of corporate America in the 20th century, but the full display of birth, rise to dominance and decay, as financial engineering slowly replaced manufacturing excellence, ending in a firework display of dismaying products while maintaining a sky-high market capitalization.


The 737 "MAX" may have been built by "clowns supervised by monkeys" but it is still a well built, competitive airliner which would not have had any problems if the company had not tried to sell it for what it is not: a 737! For this is where everything went wrong.

When Boeing launched the 737 in 1967, it had only two larger planes in the air, the 707 and the 727. The plane did not just find a niche: it created the short-haul mass air transport market and became Boeing's best-selling plane over the next 40 years, with over 10,000 jets produced, the first plane to reach this milestone.

The plane was followed by the hugely successful super jumbo-jet 747, the 757, the 767, then the equally successful 777 and finally the 787. And then nothing.

In between, the airline industry had changed tremendously. With the birth of Airbus and the arrival of cheap mass transportation, pressure on prices and deliveries had grown beyond the ability of the company to respond. But it is the 2008 financial crisis which put the final nail in the coffin of new projects: by then all "new" planes were just redesigned older models, the 737x (the future MAX), the 747x, which failed, and finally the 777x, just being launched.

The advantages were obvious, faster certification, faster deliveries, no need to train pilots for a new configuration, and cheaper "new" planes...

At least this was true on paper. In reality, the planes were mostly new, with major changes including, for the 737 MAX, the position of the much bigger engines, which necessitated a major software adjustment called MCAS to help the pilot stabilize the plane. This should have been dealt with seriously, including added training for pilots to master this new feature. Cost savings dictated otherwise, and the software was deliberately hidden to avoid expensive simulator hours and accelerate the transition from the old to the new models.

This strategy was successful at first. As sales grew steadily from $68 billion in 2009 to $101 billion in 2018, net income was multiplied by 10, and so was the share price of the company (see below). But this was more a miracle of financial engineering than of airplane manufacturing: the debt of the company also exploded in the meantime, with over $48 billion of share buybacks during the period, and total liabilities, at $136 billion, now finally exceed total assets, at $132 billion.


In retrospect, it is obvious that the company's management found it more efficient to prop up the company's shares by directly buying those shares than by developing new planes. Boeing ended up with planes which were half new, masquerading as older models to lower the cost of acquisition while not fully implementing redesign ideas which would have solved technical issues while costing more to develop.

The result, with liabilities exceeding total assets, the need for another $10 billion of emergency funding just to paper over the current fiasco and no new planes on the drawing board, is that technically Boeing is bankrupt. (The 747x has failed and it looks more and more likely that nobody will accept the 777x as an old 777, which means another 2 or 3 years of certification for a plane which is only an upgrade of the older 777!)

Because Boeing is so important to the American economy, including its military arm, it is very unlikely that the government will let the company go under. Still, putting the giant corporation back on its feet will be complex and probably extremely expensive. The recent failure, this past December, of its Starliner spacecraft to reach the space station only illustrates the challenge ahead.

Software and financial engineering are important aspects of modern management but they are not substitutes for engineering and product development. Boeing employees will have time to think about it while traveling on Airbus planes, which are expected to take a growing share of the market in the coming years.

Saturday, January 25, 2020

The statistics behind the Wuhan corona virus


The Wuhan Corona virus is spreading fast with the Chinese New Year in full swing, but is it time to panic?

In the short term and on the epidemiological front, probably not. But in the longer term and on the economic front, it might well be the straw that breaks the Chinese camel's back.

Let's have a quick look at where we stand today, Sunday, January 26, 2020, and at where statistics are telling us we may be in a few weeks.


The official number of sick people, now close to 2,000 (up from 1,497 earlier this morning), is still relatively low. But this number does not fit with the news coming from China of overwhelmed hospitals, nor with the more or less complete lockdown, as of today, of the city of Wuhan (11 million people) and the severe travel restrictions in the province of Hubei affecting over 56 million people.

The country is facing a "grave situation", Mr Xi told senior officials, according to state television yesterday. With the city of Wuhan building two new hospitals over the coming 10 days specifically dedicated to the pandemic, clearly, the conditions in China are worse than they look.

Chinese authorities have promised to be transparent, but the precedents are not very good. In 2003, the SARS epidemic was not recognized until very late and then mostly the information was suppressed until the virus petered out in early July of that year.

This time is different in many respects, but mostly for the worse.

First, the good news.

The Wuhan Corona virus has a relatively low R0 or R naught coefficient, currently estimated at 2.5. This coefficient is very important. It indicates the number of healthy people a sick one will infect while being contagious. If the coefficient is above "1", the virus spreads. 
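To make the threshold at "1" concrete, here is a minimal sketch of how cases evolve from one generation of infection to the next. It assumes a constant R0, an unlimited pool of healthy people and no control measures, which is of course not how a real outbreak behaves; the values 0.4 and 1.0 are only illustrative, and 2.5 is the estimate quoted above.

```python
def cases_per_generation(r0: float, generations: int, seed_cases: float = 1.0) -> list[float]:
    """New cases in each generation if every sick person infects r0 others."""
    return [seed_cases * r0 ** g for g in range(generations + 1)]


for r0 in (0.4, 1.0, 2.5):
    series = cases_per_generation(r0, generations=5)
    print(f"R0={r0}:", [round(x, 2) for x in series])
```

Below 1 the series shrinks toward zero, at exactly 1 it stays flat, and above 1 it grows geometrically.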

For reference, these are the R0 factors of other diseases. The Wuhan Corona virus compares favorably.


Likewise, this R0 factor is not fixed. The 2003 SARS epidemic started out with an R0 of about 3 but ended at 0.4 when restrictions were enforced.

But the Wuhan Corona virus has other characteristics which make the situation much more critical. There is no antiviral treatment, and the lag between the moment a person becomes infectious and the first symptoms seems to be around a week. This would explain why the Chinese authorities were slow to react, but also why the virus may already be more widely spread than assumed. Some alarmist estimates say that there may already be over 10,000 cases in Wuhan alone. This sounds extreme, but it could be close to the truth and the reason why Chinese authorities are in panic mode.

So where do we go from here?

If you prefer to panic, the best article is from Dr. Eric Feigl-Ding

Dr. Eric Feigl-Ding" I’ll be honest - as an epidemiologist, I’m really deeply worried about this new coronavirus outbreak. 1) the virus has an upward infection trajectory curve much steeper than SARS. 2) it can be transmitted person to person before symptoms appear — I.e. it is silently contagious!"

And he goes on with a long series of tweets which are worth reading since they summarize the worst case scenario. (Below, after the article.)

But all these alarmist tweets are based on a study from a British Doctor which can be summarized with the following chart showing an explosion of cases over the coming weeks:

Nevertheless, these trends are based on assumptions which may prove to be incorrect and which seem to be based mostly on air travel.

To this, Dr. Stephen Goldstein answered that: "It’s one estimate, with a sketchily narrow CI that the authors have already revised down. Other estimates are lower. This is not 1918, you know that, stop trying to scare people and log off please. Thanks" 

This answer is probably correct. This is clearly not 1918. Nevertheless, now that the opportunity to stop the virus during the initial outbreak has been missed, it will clearly be far more difficult and expensive to limit the economic damage in the longer term.

Let's suppose that China does all the right things and that the virus outbreak follows the SARS pattern, with R0 going from 2.4 to 0.4 over the coming 6 months. We will still end up with around 100,000 sick people (which is not a very high number compared to the flu in any given year) and probably 3,000 to 4,000 deaths, which again is a very low number. (Based on the table below.)


But the economic consequences of the disease on the already slowing down Chinese economy may well be far less mild. Here's an example of the complete blockade of the city of Wuhan as of this morning! (Trains, planes and highways are already closed.)



Beyond the human tragedy of a large city without food and transportation, with the banning of tour groups in all of China, the ban on assemblies of over 100 people (during the Chinese New Year!) and the closing down of parks, stores and many other amenities, it is the whole Chinese economy which is grinding to a halt for the New Year with no end in sight!

This in the end may be the real risk of the Wuhan Corona virus. Not that it will become a world pandemic, although there is still a small chance that it will, but that it could be the black swan which brings the next recession, with a global crash of the world economy and consequences far beyond a mere flu epidemic.


Tweets from Dr. Eric Feigl-Ding:

 1/ "HOLY MOTHER OF GOD - the new coronavirus is a 3.8!!! How bad is that reproductive R0 value? It is thermonuclear pandemic level bad - never seen an actual virality coefficient outside of Twitter in my entire career. I’m not exaggerating...

 2/ “We estimate the basic reproduction number of the infection (R_0) to be 3.8 (95% confidence interval, 3.6-4.0), indicating that 72-75% of transmissions must be prevented by control measures for infections to stop increasing...

 3/ ... We estimate that only 5.1% (95%CI, 4.8-5.5) of infections in Wuhan are identified, and by 21 January a total of 11,341 people (prediction interval, 9,217-14,245) had been infected in Wuhan since the start of the year. Should the epidemic continue unabated in Wuhan....

 4/ we predict the epidemic in Wuhan will be substantially larger by 4 February (191,529 infections; prediction interval, 132,751-273,649), infection will be established in other Chinese cities, and importations to other countries will be more frequent. Our model suggests that..

 5/ travel restrictions from and to Wuhan city are unlikely to be effective in halting transmission across China; with a 99% effective reduction in travel, the size of the epidemic outside of Wuhan may only be reduced by 24.9% on 4 February. Our findings are...

 6/ ...critically dependent on the assumptions underpinning our model, and the timing and reporting of confirmed cases, and there is considerable uncertainty associated with the outbreak at this early stage. With these caveats in mind, our work suggests that...

 7/ a basic reproductive number for this 2019-nCoV outbreak is higher compared to other emergent coronaviruses, suggesting that containment or control of this pathogen may be substantially more difficult.”!!!!

9/ ...cannot be stopped by containment alone. A 99% quarantine lockdown containment of Wuhan will not even reduce the epidemic’s spread by even 1/3rd in the next 2 weeks. Thus, I really hate to be the epidemiologist who has to admit this, but we are potentially faced with...

10/ ... possibly an unchecked pandemic that the world has not seen since the 1918 Spanish Influenza. Let’s hope it doesn’t reach that level but we now live in the modern world  with faster than 1918. @WHO and @CDCgov needs to declare public health emergency ASAP!

11/ REFERENCE for the R0 attack rate (reproductive coefficient) of 3.8 and the 99% containment models come from this paper: https://www.medrxiv.org/content/10.1101/2020.01.23.20018549v1 

12/ What is the typical R0 attack rate for the seasonal flu in most years? It’s around an R0=1.28. The 2009 flu pandemic? R0=1.48. The 1918 Spanish Flu? 1.80. This new reproductive value again? R0=3.8. (Flu reference: https://bmcinfectdis.biomedcentral.com/articles/10.1186/1471-2334-14-480 )

 13/ ...and it gets even worse, the Lancet now reports that the coronavirus is contagious even when *no symptoms*: specifically: “crucial to isolate patients... quarantine contacts as early as possible because asymptomatic infection appears possible”!

14/ Let’s pretend the 3.8 estimate is too high (there’s unpublished estimates of 2.5). even if this virus’s R0=2.5, that’s still 2x higher than seasonal flu’s 1.28 (ref above), and higher than 1918 Spanish Flu pandemic of 1.80 that killed millions. So 2.8 is still super bad folks

 15) My response to some people who think I’m trying to stoke fear... I’m a Harvard trained scientist with a doctorate in epidemiology (and the youngest dual doctoral grad from Harvard SPH). Here are my response: https://twitter.com/drericding/status/1220999410877898754
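
A note on the "72-75% of transmissions must be prevented" figure quoted in tweet 2/: it is consistent with the textbook threshold 1 - 1/R0 applied to the two ends of the 3.6-4.0 confidence interval. That the paper used exactly this formula is my assumption, but the numbers match, as this small sketch shows.

```python
def required_reduction(r0: float) -> float:
    """Fraction of transmissions to block so that each case infects at most one other."""
    if r0 <= 1:
        return 0.0  # the outbreak already shrinks on its own
    return 1.0 - 1.0 / r0


for r0 in (3.6, 3.8, 4.0):
    print(f"R0={r0}: {required_reduction(r0):.1%} of transmissions must be prevented")
```

The same function applied to an R0 of 2.5 gives 60%, which shows how sensitive the conclusion is to the estimate fed into it.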

Wednesday, January 15, 2020

Smart data (part 1) - An overview of client data analysis




As Niccolo Machiavelli once said, “There is nothing more difficult, more perilous or more uncertain of success, than to take the lead in introducing a new order of things.”

This is what the data revolution is about. Not adding a few spreadsheets here and there or collecting more data, but rethinking what is available, as well as the data flow within a company, to make it meaningful. It requires us to "think different!" To make this difference more palpable we should call it "smart data", with the understanding that the "smart" which adds value to the data and transforms it into information is not intrinsic to the data: it is knowledge.

Smart data is not AI as it is understood today. It is data which has gone through a process which allows information to be extracted from it. It also requires stepping back from pure statistical analysis to focus on process and context and, in that respect, it incorporates "intelligence".

Usually, most companies generate raw data from their operations. It can be client data such as addresses and names, POS data such as actual sales, or any other type of data. These data are often poorly structured, neither clean nor accurate, and almost always lack context.

This is where generating data must start. Many companies only give cursory attention to their data believing that analysis will generate the insight. This is a mistake! Analysis is only the very last step in a long process and often not the most important one as we will see.

From data mining in the 1990s to artificial intelligence tools nowadays, we have made great progress in our understanding of "data", although most of the great insights came over the years by accident.

What started as quantitative and brute force analysis with data mining gave very few actual results, for the simple reason that information has nothing to do with minerals and that consequently the chance of finding anything of relevance by accident (or by statistical analysis, however advanced) is negligible. This is tantamount to buying a lottery ticket and expecting to be a winner. Obvious correlations are just that: they were known long before statistics confirmed that they were real, and were already included in most companies' DNA as "knowledge", business practice or intangible assets. As for non obvious correlations, they were often little more than that too and usually represented no causation whatsoever. Pure statistical tools are deterministic and therefore not conducive to insight, contrary to most people's opinion.

And this, from the beginning, has been the real challenge for most companies. It is easy, and getting easier year after year, to generate data, but it is extremely hard to find actionable insight in the data and, conversely, common to get swamped by misleading numbers and wishful confirmation of firmly held pre-existing ideas.

From the early garbage-in garbage-out meme to the ability to prove anything and the opposite, data scientists have shown that real science can quickly give birth to voodoo practices after the right number of iterations and complexification. The main reason is that data analysis should head into the opposite direction: It should be kept simple, using as little data as possible but within a smart context which makes the data effective and actionable.   

So, step by step, based on our experience, let's try to see how to build such a context to make sense of the data and actually get insight from what is available, without the complexity which often ends up generating vast amounts of misleading information. The chapters below are only an outline which will be developed further in follow-up posts.

It is also important to note that these techniques only apply to client data and more generally "people" and are not relevant to other types of big data. Finance and markets, biology and weather modeling all use big data and statistical tools which are specific and mostly very different to the ones described below which apply to marketing and client data analysis.



Starting point

The first obstacle is to define precisely your goals.

This was the birth defect of data mining! If you do not know what you are looking for, the chance that you will find something is very low. This sounds obvious but it had to be proved the hard way for everyone to be convinced. The reason is that although goals are usually easy to define in commercial terms, they can be much harder to define in data terms because in the end it requires the ability to link data input to sales output and therefore to understand perfectly the data, the process, the context and the results.

So right from the beginning, it is clear that data can only generate information if it is transformed into what I call "smart data" first.

"Smart Data"

As mentioned earlier, smart data is data which has gone through a process which permits information to be extracted from it. This means understanding the data itself, creating a process and a context, and linking all this to actual results.

Let's look at these points one by one.

Data taxonomy (generation and normalization of data)

The very first step although obvious is often overlooked because it requires from the beginning to understand the whole process:

What data should be collected and for what purpose?

Is the data static and can therefore be transferred in batch (a client list or POS transactions for example) or is it variable and updated permanently (On-line data)? This is important because it will determine the tools which can be used to understand, visualize and analyze the data.  

What is the flexibility of the data, its range and variability?

This is most obvious when you create a graph and everything is crammed at the bottom! Obvious with a graph but not necessarily obvious with other tools, or when you do not yet understand the characteristics of the data or of its variability.
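
A quick way to spot the "crammed at the bottom" problem before drawing anything is to compare the largest value with the median. The sketch below is only a rule of thumb: the function names and the threshold of 20 are my own arbitrary choices, and it assumes non-negative values such as sales amounts.

```python
import statistics


def describe_spread(values):
    """Return min, median, max and the max-to-median ratio of a numeric column."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    median = statistics.median(data)
    ratio = data[-1] / median if median > 0 else float("inf")
    return {"min": data[0], "median": median, "max": data[-1], "max_to_median": ratio}


def looks_crammed(values, threshold=20.0):
    """Heuristic: a few huge values will flatten everything else on a linear graph."""
    return describe_spread(values)["max_to_median"] >= threshold


print(looks_crammed([1, 2, 3, 4, 1000]))     # one outlier dominates the axis
print(looks_crammed([10, 12, 14, 16, 18]))   # evenly spread values
```

When the check fires, a logarithmic axis or a separate treatment of the outliers is usually worth considering.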

Data quality and cleaning (de-duplication and homogenization)

Data cleaning is the cornerstone of data analysis. Without clean data, further analysis is useless. This is something which is now well understood and almost every company is aware of the necessity to have "clean" data... and actually does very little about it!

And that is simply because it is very hard!

In Japan, this problem can easily be understood through the challenge of clients' names and addresses. Names can be written in Chinese characters, Japanese characters (hiragana and katakana) or Roman characters. These can be mixed together, and the addresses can be arranged in ascending or descending order. The result is that two databases of different origin are usually almost impossible to merge, often because they contain large numbers of duplicates which are difficult to eliminate.

To solve this problem, it is necessary to format the data in a uniform way. Easier said than done! One way to do this is to break down the challenge into smaller ones and create as many fields as there are types of entries.

This is slightly easier in English than in Japanese but the challenge is similar.
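
One small piece of this homogenization can be sketched with Python's standard unicodedata module: fold full-width and half-width variants with NFKC normalization, map hiragana to katakana, and use the result as a comparison key for de-duplication. This is only an illustration, not the tool we actually built; the field names "name" and "postcode" are hypothetical, and the sketch does nothing to match a name written in Chinese characters with its kana reading, which is the really hard part.

```python
import unicodedata

HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KANA_OFFSET = 0x60  # distance from a hiragana code point to its katakana twin


def normalize_name(raw: str) -> str:
    """Build a comparison key: fold widths (NFKC), hiragana to katakana, drop spaces, lowercase."""
    text = unicodedata.normalize("NFKC", raw)
    chars = []
    for ch in text:
        code = ord(ch)
        if HIRAGANA_START <= code <= HIRAGANA_END:
            ch = chr(code + KANA_OFFSET)
        if not ch.isspace():
            chars.append(ch.lower())
    return "".join(chars)


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each (normalized name, normalized postcode) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_name(rec["postcode"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


clients = [
    {"name": "やまだ たろう", "postcode": "123-4567"},
    {"name": "ヤマダタロウ", "postcode": "１２３-４５６７"},  # same person, other script and width
    {"name": "すずき はなこ", "postcode": "123-4567"},
]
print(len(deduplicate(clients)))
```

The same idea extends to addresses: one field per component (prefecture, city, block, building), each normalized on its own before any comparison.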

Data maintenance and updating

Another point related to data cleaning is to know and manage the time frame of the database. Older data may or may not be relevant. The same person may appear under 3 different addresses at different points in time with little hint that this is the same individual.
 
To give an actual example, while working with Facebook, initially we succeeded in getting only 40% of address matching. This was too low to be effective. Only after much effort and reaching a little over 60% were we able to start sharing anonymized data with them and actually add value to their data analysis tools.

Data visualization

Data visualization is a first step towards data analysis which often brings more insight than anything else if done correctly.

Putting data on a map, for example, can very simply highlight gaps or complex geographic correlations which may not be obvious on a spreadsheet. Conversely, spreadsheets are more powerful when working with very large amounts of data, which may look random on a map or on a graph. (This is often the case with online data.)

In this respect, tools such as Tableau can be useful to visualize the same data in very different ways and give depth and angles to a database.

Data clustering

Finally, with data clustering, we are leaving the realm of raw data and entering data pre-analysis, as we generate clusters, indexes and proxy data which will help us understand the data and start more advanced analysis.

Since we have created many data clustering tools over the years in Japan, I will describe some of these tools, as well as the insight we gained while building them, in a dedicated post.

What is important to understand is that at this stage, the data is already structured, cleaned and well organized, and therefore much easier to make sense of. However, the most important part of the equation is still missing: context.

Creating context

To some extent, creating context is the most difficult part of data analysis and consequently the most important. Without context, reasoning is often circular and almost anything is possible. What does a 2% growth rate mean without a reference to a market, a goal or past achievements?

Context is necessarily external to the data; otherwise it is self-referential and therefore meaningless. This concept is very important to understand, as it is the reason data mining failed and the reason why the current wave of AI will eventually hit a brick wall too.

For this reason, as for data clustering, we will also soon come back to this subject in detail.
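
The 2% question can be made concrete with a few lines of Python. This is only a toy sketch: the function name `growth_in_context` and the reference figures are invented for illustration. It shows that the same number reads very differently depending on the external reference it is compared to.

```python
def growth_in_context(growth, reference):
    """Gap, in percentage points, between a growth rate and an external reference."""
    return round((growth - reference) * 100, 2)


# The same 2% growth against two hypothetical market references.
print(growth_in_context(0.02, 0.05))   # market grew 5%: we are lagging
print(growth_in_context(0.02, -0.01))  # market shrank 1%: we are ahead
```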

Finally data analysis

This subject, conversely, will not be developed here, simply because there is already a lot of literature about it which covers all the tools available. Correlations, random forests or Bayesian analysis are in any case the very last step of data analysis and, as explained, usually not the most crucial one. (At least for 95% of the companies I have worked with, which have not reached this level of sophistication.)


  

Saturday, January 4, 2020

For Softbank's Son, "Conflating Luck And Talent Is Dangerous" (article)





The longest bull market in history has separated the talented from the losers, or so it seems. But to get the big picture, more time is necessary for patterns and cycles to emerge. The success and recent failure of Softbank is a good metaphor, and its lesson about hubris and arrogance is timeless.


Authored by Scott Galloway via ProfGalloway.com,

Third Base

The Dunning-Kruger effect posits that dumb people are too stupid to know they are dumb. They are not perplexed by difficult situations but overconfident — not knowing what they don’t know. As few people believe they are stupid, or a bad driver, a more relatable component of Dunning-Kruger is incorrectly believing one area of skill translates to another.

I suffer massively from this. I’m smarter than your average bear when it comes to marketing, so I’ve come to believe that makes me an expert on pretty much anything. I don’t know much about physics but constantly reference Galileo despite knowing little besides the fact that he dared challenge the church.

There is evidence of this all over the marketplace. Great P/E guys believe they would make great VCs and vice versa. Hedge fund managers believe two years of above-market returns means they are also great operators. To disabuse anybody of this notion, take them to a Sears. Billionaires running for president, actors starting skincare lines, and tech CEOs founding media firms. Being rich also naturally makes you a great film producer.

Masayoshi Son created $64 billion in shareholder value, mostly through deft acquisitions. Mr. Son can also boast of perhaps the best venture investment in history, $20 million into Alibaba that became $100 billion. That investment is tantamount to Michael Jordan hitting a grand slam on his first at bat wearing a Birmingham Barons hat.

Mr. Son has mistaken luck in venture investing for the ability to responsibly allocate billions based on a gut feeling. The size of SoftBank investments, relative to the diligence, now looks stupid, if not negligent. A writedown on an investment in a dog-walking app may have been avoided had someone in the SoftBank diligence team taken the time to discover they were investing $300 million in … a dog-walking app.

Conflating luck and talent is dangerous. As I get older, I’m struck by how big a part luck played in my life, and how much I mistook it for skill, well into my forties. The Pareto principle shows that even if competence is evenly distributed, 80% of effects stem from 20% of the causes.

Not recognizing your blessings feeds into the dark side of capitalism and meritocracy: the notion that success is a choice, and that those who haven’t achieved success are not unlucky, but unworthy. This leads to regressive policies that further reward the perceived winners and punish the perceived losers based on income level. The most recent example of our belief that poor people are guilty: The US now has the fourth-lowest tax rate in the world, and billionaires have the lowest tax rate of any cohort.

First Base

I constantly humblebrag that I was raised by a single immigrant mother who lived and died a secretary. But truth is I was born on third base. My parents got me to first base before I was born, immigrating to the US. This took courage, desire, and a dose of selfishness. Both left families that needed them. My mom left London when her two youngest siblings were still in an orphanage.

In Europe I’d make much less money being an entrepreneur and challenging institutions. In China I’d likely be in jail. Having one of my companies fail would have bankrupted me in Europe, as the tolerance for risk or failure is scant. I have no idea what would have happened in China. In the US, a tolerance for failure meant a lifestyle my parents couldn’t have imagined crossing the Atlantic on a steamship in 1961.

Second Base

I have some talent and have worked really hard, but mostly my success is due to being born in the right place at the right time, and being a white heterosexual male. Coming of professional age as a white male in the nineties was the greatest economic arbitrage in history. Today’s 54-to-70-year-olds saw the Dow Jones increase an average of 445% from 25-40, their prime working years. For other ages, it doubles at most.

Economic liberalization (globalization, technology, market deregulation) coupled with social norms that clung to the past meant 31% of America (white males) were given license over a lion’s share of the spoils. In nineties San Francisco, I raised over $100 million for my start-ups. I didn’t know a single woman under 40 who raised more than a million. And it seemed normal. Even today, white men hold 65% of elected offices despite being 31% of the population.

Third Base

Rich, fabulous people are the ideal billboards for luxury brands. Our nation’s best universities have adopted the same strategy. Universities are no longer nonprofits, but the highest-gross-margin luxury brands in the world. Another trait of a luxury brand is the illusion of scarcity. Over the last 30 years, the number of applicants to Stanford has tripled, while the size of the freshman class has remained static. Harvard and Stanford have become finishing schools for the global wealthy.

In the class of 2013 in the Ivy League, five of the eight colleges (Dartmouth, Princeton, Yale, Penn, and Brown) had more students from the top 1% of the income scale than the bottom 60%.

Fast and Slow Thinking

According to @thetweetofgod, intelligence looks in the mirror and sees ignorance; ignorance looks in the mirror it sees intelligence. The sectors that have enjoyed the greatest prosperity spread across increasingly few people — technology and finance — have created an unprecedented level of arrogance among people born on third base.

When we feel threatened, we are more prone to see each other as an enemy, rather than someone who has a different opinion. We want to dismiss and fight the whole person, rather than just what they said. From primeval times, our brains have been set up to identify “enemy” or “one of us,” that simple binary distinction. Do I trust them as a person or are they not “one of us.” When we are in our more evolved, slow thinking mode (Daniel Kahneman), we evaluate arguments. When we are in our knee-jerk, threatened fast thinking, we decide the person is our enemy and argue from our amygdala, not our forebrain.

When we are threatened, we are also less empathic. Altruistic behavior decreases in times of greater income inequality. The rich are more generous in times of lesser inequality and less generous when inequality grows more extreme. When the poor need our help more, we are less likely to offer it, because we don’t see the poor as one of us. They become “them.”

Michael Lewis writes, “The problem is caused by the inequality itself: it triggers a chemical reaction in the privileged few. It tilts their brains. It causes them to be less likely to care about anyone but themselves or to experience the moral sentiments needed to be a decent citizen.”

Insider Sources Preparing for BIG Events Happening SOON (here's what they're saying) Video - 51mn

   The world financial markets are about to blow! It is already obvious in the currency markets where almost every currency against the doll...