Friday, June 28, 2019

Boeing's 737 Max Software Outsourced to $9-an-Hour Engineers


https://www.bloomberg.com/news/articles/2019-06-28/boeing-s-737-max-software-outsourced-to-9-an-hour-engineers


It remains the mystery at the heart of Boeing Co.’s 737 Max crisis: how a company renowned for meticulous design made seemingly basic software mistakes leading to a pair of deadly crashes. Longtime Boeing engineers say the effort was complicated by a push to outsource work to lower-paid contractors.
The Max software -- plagued by issues that could keep the planes grounded months longer after U.S. regulators this week revealed a new flaw -- was developed at a time Boeing was laying off experienced engineers and pressing suppliers to cut costs.
Increasingly, the iconic American planemaker and its subcontractors have relied on temporary workers making as little as $9 an hour to develop and test software, often from countries lacking a deep background in aerospace -- notably India.
In offices across from Seattle’s Boeing Field, recent college graduates employed by the Indian software developer HCL Technologies Ltd. occupied several rows of desks, said Mark Rabin, a former Boeing software engineer who worked in a flight-test group that supported the Max.
The coders from HCL were typically designing to specifications set by Boeing. Still, “it was controversial because it was far less efficient than Boeing engineers just writing the code,” Rabin said. Frequently, he recalled, “it took many rounds going back and forth because the code was not done correctly.”

Boeing’s cultivation of Indian companies appeared to pay other dividends. In recent years, it has won several orders for Indian military and commercial aircraft, such as a $22 billion one in January 2017 to supply SpiceJet Ltd. That order included 100 737-Max 8 jets and represented Boeing’s largest order ever from an Indian airline, a coup in a country dominated by Airbus.
Based on resumes posted on social media, HCL engineers helped develop and test the Max’s flight-display software, while employees from another Indian company, Cyient Ltd., handled software for flight-test equipment.

Costly Delay

In one post, an HCL employee summarized his duties with a reference to the now-infamous model, which started flight tests in January 2016: “Provided quick workaround to resolve production issue which resulted in not delaying flight test of 737-Max (delay in each flight test will cost very big amount for Boeing).”
Boeing said the company did not rely on engineers from HCL and Cyient for the Maneuvering Characteristics Augmentation System, which has been linked to the Lion Air crash last October and the Ethiopian Airlines disaster in March. The Chicago-based planemaker also said it didn’t rely on either firm for another software issue disclosed after the crashes: a cockpit warning light that wasn’t working for most buyers.
“Boeing has many decades of experience working with supplier/partners around the world,” a company spokesman said. “Our primary focus is on always ensuring that our products and services are safe, of the highest quality and comply with all applicable regulations.”
In a statement, HCL said it “has a strong and long-standing business relationship with The Boeing Company, and we take pride in the work we do for all our customers. However, HCL does not comment on specific work we do for our customers. HCL is not associated with any ongoing issues with 737 Max.”
Recent simulator tests by the Federal Aviation Administration suggest the software issues on Boeing’s best-selling model run deeper. The company’s shares fell this week after the regulator found a further problem with a computer chip that experienced a lag in emergency response when it was overwhelmed with data.
Engineers who worked on the Max, which Boeing began developing eight years ago to match a rival Airbus SE plane, have complained of pressure from managers to limit changes that might introduce extra time or cost.
“Boeing was doing all kinds of things, everything you can imagine, to reduce cost, including moving work from Puget Sound, because we’d become very expensive here,” said Rick Ludtke, a former Boeing flight controls engineer laid off in 2017. “All that’s very understandable if you think of it from a business perspective. Slowly over time it appears that’s eroded the ability for Puget Sound designers to design.”
Rabin, the former software engineer, recalled one manager saying at an all-hands meeting that Boeing didn’t need senior engineers because its products were mature. “I was shocked that in a room full of a couple hundred mostly senior engineers we were being told that we weren’t needed,” said Rabin, who was laid off in 2015.

The typical jetliner has millions of parts -- and millions of lines of code -- and Boeing has long turned over large portions of the work to suppliers who follow its detailed design blueprints.
Starting with the 787 Dreamliner, launched in 2004, it sought to increase profits by instead providing high-level specifications and then asking suppliers to design more parts themselves. The thinking was “they’re the experts, you see, and they will take care of all of this stuff for us,” said Frank McCormick, a former Boeing flight-controls software engineer who later worked as a consultant to regulators and manufacturers. “This was just nonsense.”
Sales are another reason to send the work overseas. In exchange for an $11 billion order in 2005 from Air India, Boeing promised to invest $1.7 billion in Indian companies. That was a boon for HCL and other software developers from India, such as Cyient, whose engineers were widely used in computer-services industries but not yet prominent in aerospace.
Rockwell Collins, which makes cockpit electronics, had been among the first aerospace companies to source significant work in India in 2000, when HCL began testing software there for the Cedar Rapids, Iowa-based company. By 2010, HCL employed more than 400 people at design, development and verification centers for Rockwell Collins in Chennai and Bangalore.
That same year, Boeing opened what it called a “center of excellence” with HCL in Chennai, saying the companies would partner “to create software critical for flight test.” In 2011, Boeing named Cyient, then known as Infotech, to a list of its “suppliers of the year” for design, stress analysis and software engineering on the 787 and the 747-8 at another center in Hyderabad.
Airbus, the Boeing rival, also relies in part on offshore engineers. In addition to supporting sales, the planemakers say global design teams add efficiency as they work around the clock. But outsourcing has long been a sore point for some Boeing engineers, who, in addition to fearing job losses, say it has led to communication issues and mistakes.

Moscow Mistakes

Boeing has also expanded a design center in Moscow. At a meeting with a chief 787 engineer in 2008, one staffer complained about sending drawings back to a team in Russia 18 times before they understood that the smoke detectors needed to be connected to the electrical system, said Cynthia Cole, a former Boeing engineer who headed the engineers’ union from 2006 to 2010.
“Engineering started becoming a commodity,” said Vance Hilderman, who co-founded a company called TekSci that supplied aerospace contract engineers and began losing work to overseas competitors in the early 2000s.
U.S.-based avionics companies in particular moved aggressively, shifting more than 30% of their software engineering offshore versus 10% for European-based firms in recent years, said Hilderman, an avionics safety consultant with three decades of experience whose recent clients include most of the major Boeing suppliers.
With a strong dollar, a big part of the attraction was price. Engineers in India made around $5 an hour; it's now $9 or $10, compared with $35 to $40 for those in the U.S. on an H-1B visa, he said. But he'd tell clients the cheaper hourly wage equated to more like $80 because of the need for supervision, and he said his firm won back some business to fix mistakes.
HCL, once known as Hindustan Computers, was founded in 1976 by billionaire Shiv Nadar and now has more than $8.6 billion in annual sales. With 18,000 employees in the U.S. and 15,000 in Europe, HCL is a global company and has deep expertise in computing, said Sukamal Banerjee, a vice president. It has won business from Boeing on that basis, not on price, he said: “We came from a strong R&D background.”
Still, for the 787, HCL gave Boeing a remarkable price – free, according to Sam Swaro, an associate vice president who pitched HCL’s services at a San Diego conference sponsored by Avionics International magazine in June. He said the company took no up-front payments on the 787 and only started collecting payments based on sales years later, an “innovative business model” he offered to extend to others in the industry.
The 787 entered service three years late and billions of dollars over budget in 2011, in part because of confusion introduced by the outsourcing strategy. Under Dennis Muilenburg, a longtime Boeing engineer who became chief executive in 2015, the company has said that it planned to bring more work back in-house for its newest planes.

Engineer Backwater

The Max became Boeing’s top seller soon after it was offered in 2011. But for ambitious engineers, it was something of a “backwater,” said Peter Lemme, who designed the 767’s automated flight controls and is now a consultant. The Max was an update of a 50-year-old design, and the changes needed to be limited enough that Boeing could produce the new planes like cookie cutters, with few changes for either the assembly line or airlines. “As an engineer that’s not the greatest job,” he said.
Rockwell Collins, now a unit of United Technologies Corp., won the Max contract for cockpit displays, and it has relied in part on HCL engineers in India, Iowa and the Seattle area. A United Technologies spokeswoman didn’t respond to a request for comment.
Contract engineers from Cyient helped test flight-test equipment. Charles LoveJoy, a former flight-test instrumentation design engineer at the company, said engineers in the U.S. would review drawings done overnight in India each morning at 7:30. "We did have our challenges with the India team," he said. "They met the requirements, per se, but you could do it better."
Multiple investigations – including a Justice Department criminal probe – are trying to unravel how and when critical decisions were made about the Max’s software. During the crashes of Lion Air and Ethiopian Airlines planes that killed 346 people, investigators suspect, the MCAS system pushed the planes into uncontrollable dives because of bad data from a single sensor.
That design violated basic principles of redundancy for generations of Boeing engineers, and the company apparently never tested to see how the software would respond, Lemme said. “It was a stunning fail,” he said. “A lot of people should have thought of this problem – not one person – and asked about it.”
Boeing also has disclosed that it learned soon after Max deliveries began in 2017 that a warning light that might have alerted crews to the issue with the sensor wasn’t installed correctly in the flight-display software. A Boeing statement in May, explaining why the company didn’t inform regulators at the time, said engineers had determined it wasn’t a safety issue.
“Senior company leadership,” the statement added, “was not involved in the review.”


Sunday, June 23, 2019

Generative Adversarial Networks - The Story So Far

https://blog.floydhub.com/gans-story-so-far/

How to code The Transformer in PyTorch

https://blog.floydhub.com/the-transformer-in-pytorch/

Could The Transformer be another nail in the coffin for RNNs?
Doing away with clunky for-loops, the transformer instead finds a way to allow whole sentences to simultaneously enter the network in batches. With this technique, NLP reclaims the advantage of Python’s highly efficient linear algebra libraries. This time-saving can then be spent deploying more layers into the model.
So far, transformers seem to deliver faster convergence and better results. What's not to love?
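For a sense of what that batching looks like in code, here is a minimal PyTorch sketch (my own illustration, not code from the tutorial; the layer sizes are arbitrary). A whole batch of embedded sentences enters the encoder as one tensor, with no per-token for-loop:

    import torch
    import torch.nn as nn

    d_model = 512  # embedding size, an arbitrary choice for this sketch
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
    encoder = nn.TransformerEncoder(layer, num_layers=6)

    # 32 sentences of 20 tokens each, already embedded: (seq_len, batch, d_model).
    src = torch.rand(20, 32, d_model)
    out = encoder(src)    # every position is processed simultaneously
    print(out.shape)      # torch.Size([20, 32, 512])

Because the whole batch is a single tensor operation, the heavy lifting happens inside optimized linear algebra routines rather than in a Python loop over timesteps.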

Wednesday, June 12, 2019

40 data science techniques

https://www.datasciencecentral.com/profiles/blogs/40-techniques-used-by-data-scientists

The 40 data science techniques
  1. Linear Regression 
  2. Logistic Regression 
  3. Jackknife Regression *
  4. Density Estimation 
  5. Confidence Interval 
  6. Test of Hypotheses 
  7. Pattern Recognition 
  8. Clustering - (aka Unsupervised Learning)
  9. Supervised Learning 
  10. Time Series 
  11. Decision Trees 
  12. Random Numbers 
  13. Monte-Carlo Simulation 
  14. Bayesian Statistics 
  15. Naive Bayes 
  16. Principal Component Analysis - (PCA)
  17. Ensembles 
  18. Neural Networks 
  19. Support Vector Machine - (SVM)
  20. Nearest Neighbors - (k-NN)
  21. Feature Selection - (aka Variable Reduction)
  22. Indexation / Cataloguing *
  23. (Geo-) Spatial Modeling 
  24. Recommendation Engine *
  25. Search Engine *
  26. Attribution Modeling *
  27. Collaborative Filtering *
  28. Rule System 
  29. Linkage Analysis 
  30. Association Rules 
  31. Scoring Engine 
  32. Segmentation 
  33. Predictive Modeling 
  34. Graphs 
  35. Deep Learning 
  36. Game Theory 
  37. Imputation 
  38. Survival Analysis 
  39. Arbitrage 
  40. Lift Modeling 
  41. Yield Optimization
  42. Cross-Validation
  43. Model Fitting
  44. Relevancy Algorithm *
  45. Experimental Design

IBM Call for Code 2019 Global Challenge

https://developer.ibm.com/callforcode

Call for Code 2019 Global Challenge
Commit to the Cause. Push for Change.
The Call for Code 2019 Global Challenge is a worldwide developer competition that seeks technology solutions for natural disaster preparedness, response, and recovery.

It is supported by the IBM Code and Response initiative, a multi-year program dedicated to creating and deploying open source technologies to tackle the world's biggest challenges. 


Launching TensorFlow Lite for Microcontrollers

https://petewarden.com/2019/03/07/launching-tensorflow-lite-for-microcontrollers/

I’ve been spending a lot of my time over the last year working on getting machine learning running on microcontrollers, and so it was great to finally start talking about it in public for the first time today at the TensorFlow Developer Summit. Even better, I was able to demonstrate TensorFlow Lite running on a Cortex M4 developer board, handling simple speech keyword recognition. I was nervous, especially with the noise of the auditorium to contend with, but I managed to get the little yellow LED to blink in response to my command! If you’re interested in trying it for yourself, the board is available for $15 from SparkFun with the sample code preloaded. For anyone who didn’t catch it, here are the notes from my talk.
Hi, I’m Pete Warden on the TensorFlow Lite team, and I’m here to talk about a new project we’re pretty excited about. When I first joined Google back in 2014, I learned about a lot of exciting internal work that wasn’t yet public, but one of the most impressive moments was when I was introduced to Raziel, who was on the speech team at that point, and he told me that they used network models that were only thirteen kilobytes in size! I only had experience with image models, and in those days even the smallest model like Inception still took up megabytes.
I was even more amazed when he told me why these models had to be so small. They needed to run them on DSPs and other embedded chips in smartphones so Android could listen out for wake words like “Hey Google” while the main CPU was powered off to save the battery. These microcontrollers often only had tens of kilobytes of RAM and Flash memory, so they simply couldn’t fit anything larger. They also couldn’t rely on cloud connectivity because keeping any radio connection alive continuously would drain the battery in no time at all.
What struck me was that the speech team had a massive amount of experience, and had spent a lot of time experimenting, and even within the tough constraints of these devices, neural networks produced better results than any of the more traditional methods they tried. This left me wondering if they would be useful for other embedded sensor applications, and I wanted to see if we could build support for these platforms into TensorFlow. At the time few people knew about the ground-breaking work that was being done in the speech community, so I was excited to help share it more widely.
Today I’m pleased to announce that we are releasing the first, experimental support for embedded platforms in TensorFlow Lite. To show you what I mean, here’s a demonstration I have in my pocket!
This is a prototype of a development board built by SparkFun, and it has a Cortex M4 processor with 384KB of RAM and 1MB of Flash storage. The processor was built by Ambiq to be extremely low power, drawing less than one milliwatt in many cases so it’s able to run for many days on a small coin battery.
I’m going to take my life in my hands now by trying a live demo, so wish me luck! The goal is that I’m going to say the word “Yes”, and the little yellow LED here will light up. Hopefully we can use this camera contraption to show this to everyone on the screen and in the livestream.
“Yes”. “Yes”. “Yes”.
As you can see, it's still far from perfect, but it's managing to do a decent job of recognizing when I say the word, and not lighting up when there are unrelated conversations.
So why is this useful? First, this is running entirely locally on the embedded chip, with no need to have any internet connectivity, so it’s good to have as part of a voice interface system. The model itself takes up less than 20KB of Flash storage space, the footprint of the TensorFlow Lite code is only another 25KB of Flash, and it only needs 30KB of RAM to operate.
Secondly, the software for this demo is entirely open source. You can grab the code for it and build it yourself. It’s also already been ported to a lot of different embedded chips, and we hope to see it appear on many more over the next few months. You can check out the code yourself at
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro
There’s more documentation here:
https://www.tensorflow.org/lite/guide/microcontroller
If you want to customize the example, you can try this code lab:
https://g.co/codelabs/sparkfunTF
Third, you can train your own model using this tutorial that we provide. It comes with an open dataset of over 100,000 utterances submitted by volunteers, which we’d love your help expanding through the link here:
https://aiyprojects.withgoogle.com/open_speech_recording
The helpful thing about this is that if you have your own words or noises you want to recognize, you should be able to adapt this training approach to your own problem just by supplying new training data.
Fourth, the code is part of TensorFlow Lite: it uses the same APIs, file formats, and conversion tools, so it's well integrated into the whole TensorFlow ecosystem, making it easier to use.
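As a rough sketch of that shared tooling (not the demo's actual build steps; the stand-in model and file name here are invented for illustration), converting a trained Keras model into the flatbuffer the device runs might look like this:

    import tensorflow as tf

    # Stand-in for a small trained network like the speech-commands model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(1960,)),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize to shrink the model
    tflite_model = converter.convert()  # a flatbuffer ready to ship to the device

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)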
So, how can you try this out yourself? If you’re in the audience, I’m pleased to say that when you pick up your box this afternoon you’ll find your very own prototype SparkFun Edge board! Just remove the tab to switch the battery on, and you should find it preloaded with the TensorFlow “yes” example. Just try saying “Yes” to TensorFlow, and you should hopefully get a yellow light! We also include all the cables you need to program it with your own code through the serial port. These are the first 700 boards ever built, so there is a wiring issue that drains the battery more quickly than on the final devices, but you should be able to develop with them in exactly the same way as the production boards.
If you’re watching at home, you can order one of these for $15 from SparkFun. You’ll also find instructions for many other platforms in the documentation, so we’re happy to work with whatever devices you want to build your projects on. We welcome collaboration with developers across the community to unlock all the creativity that I know is out there, and I’m hoping to be spending a lot of my time in the future reviewing pull requests!
Finally, a big thanks to everyone who helped bring this prototype together, including the TensorFlow Lite team, especially Raziel, Rocky, Dan, Tim, and Andy; Alasdair, Nathan, Owen and Jim at SparkFun; Scott, Steve, Arpit, and Andre at Ambiq, and many people at Arm including Rod, Neil and Zach! This is still a very early experiment but I can’t wait to see what people build with this.


Forecasting: Principles and Practice Rob J Hyndman and George Athanasopoulos

Forecasting: Principles and Practice

Rob J Hyndman and George Athanasopoulos
Monash University, Australia

https://otexts.com/fpp2/ 


Preface

Welcome to our online textbook on forecasting.
This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details.
The book is written for three audiences: (1) people finding themselves doing forecasting in business when they may not have had any formal training in the area; (2) undergraduate students studying business; (3) MBA students doing a forecasting elective. We use it ourselves for a third-year subject for students undertaking a Bachelor of Commerce or a Bachelor of Business degree at Monash University, Australia.
For most sections, we only assume that readers are familiar with introductory statistics, and with high-school algebra. There are a couple of sections that also require knowledge of matrices, but these are flagged.
At the end of each chapter we provide a list of “further reading”. In general, these lists comprise suggested textbooks that provide a more advanced or detailed treatment of the subject. Where there is no suitable textbook, we suggest journal articles that provide more information.
We use R throughout the book and we intend students to learn how to forecast with R. R is free and available on almost every operating system. It is a wonderful tool for all statistical analysis, not just for forecasting. See the Using R appendix for instructions on installing and using R.
All R examples in the book assume you have loaded the fpp2 package, available on CRAN, using library(fpp2). This will automatically load several other packages including forecast and ggplot2, as well as all the data used in the book. We have used v2.3 of the fpp2 package and v8.5 of the forecast package in preparing this book. These can be installed from CRAN in the usual way. Earlier versions of the packages will not necessarily give the same results as those shown in this book.
We will use the ggplot2 package for all graphics. If you want to learn how to modify the graphs, or create your own ggplot2 graphics that are different from the examples shown in this book, please either read the ggplot2 book (Wickham, 2016), or do the ggplot2 course on the DataCamp online learning platform.
There is also a DataCamp course based on this book which provides an introduction to some of the ideas in Chapters 2, 3, 7 and 8, plus a brief glimpse at a few of the topics in Chapters 9 and 11.
The book is different from other forecasting textbooks in several ways.
  • It is free and online, making it accessible to a wide audience.
  • It uses R, which is free, open-source, and extremely powerful software.
  • The online version is continuously updated. You don’t have to wait until the next edition for errors to be removed or new methods to be discussed. We will update the book frequently.
  • There are dozens of real data examples taken from our own consulting practice. We have worked with hundreds of businesses and organisations helping them with forecasting issues, and this experience has contributed directly to many of the examples given here, as well as guiding our general philosophy of forecasting.
  • We emphasise graphical methods more than most forecasters. We use graphs to explore the data, analyse the validity of the models fitted and present the forecasting results.

Changes in the second edition

The most important change in edition 2 of the book is that we have restricted our focus to time series forecasting. That is, we no longer consider the problem of cross-sectional prediction. Instead, all forecasting in this book concerns prediction of data at future times using observations collected in the past.
We have also simplified the chapter on exponential smoothing, and added new chapters on dynamic regression forecasting, hierarchical forecasting and practical forecasting issues. We have added new material on combining forecasts, handling complicated seasonality patterns, dealing with hourly, daily and weekly data, forecasting count time series, and we have many new examples. We have also revised all existing chapters to bring them up-to-date with the latest research, and we have carefully gone through every chapter to improve the explanations where possible, to add newer references, to add more exercises, and to make the R code simpler.
Helpful readers of the earlier versions of the book let us know of any typos or errors they had found. These were updated immediately online. No doubt we have introduced some new mistakes, and we will correct them online as soon as they are spotted. Please continue to let us know about such things.

Happy forecasting!
Rob J Hyndman and George Athanasopoulos
April 2018

Tuesday, June 11, 2019

Airbus sobloo Multi-Data Challenge

https://www.copernaicus-masters.com/prize/airbus-sobloo-challenge/


Airbus Defence and Space together with sobloo are looking for solutions that use both Copernicus and Airbus Earth Observation data to deliver new services and/or applications that provide insight and have impact on areas like Natural Resources Consumption, Agriculture, Forestry, Maritime, Defence & Security and Smart Cities.

Tuesday, June 4, 2019

Interview with Will Kurt on his latest book: Bayesian Statistics The Fun Way

https://notamonadtutorial.com/interview-with-will-kurt-on-his-latest-book-bayesian-statistics-the-fun-way-63ce8aee32ed




Like most devs, I have a diverse set of interests: functional programming, operating systems, type systems, distributed systems, and data science. That is why I was excited when I learned that Will Kurt, the author of Get Programming with Haskell, wrote a Bayesian statistics book that is being published by No Starch Press. There aren't many people who write books on such different topics, so I was sure that Will had something interesting to share in this new book. I wasn't disappointed. The book is an excellent introduction, especially for those of us who have a rough time with advanced math but want to advance in the data science field. I recommend reading the book after Think Stats, but before Bayesian Methods for Hackers, Bayesian Analysis with Python, and Doing Bayesian Data Analysis.
If you like the interview I recommend that you also read the interviews we did with Thomas Wiecki and Osvaldo Martin about Bayesian analysis and probabilistic programming.
Finally I wanted to thank two members of my team (Pablo Amoroso and Juan Bono) for helping me with the interview.

Reach me via Twitter at @unbalancedparen if you have any comments or interview requests for This is not a Monad tutorial.
If you have an idea, are looking for a part-time CTO, need a team of devs, or have maintenance work, ping us: LambdaClass.

1. Why a new statistics book?
Nearly all of the many excellent books on Bayesian statistics out now assume you are either familiar with statistics already or have a pretty solid foundation in programming. Because of this, Bayesian statistics is often positioned as an advanced alternative to classical (i.e. frequentist) statistics. So even though Bayesian statistics is gaining a lot of popularity, it's mostly among people who already have a quantitative background.
When someone wants to simply "learn statistics" they usually pick up an introduction based on frequentist statistics, end up half understanding a bunch of tests and rules, and feel very confused by the subject. I wanted to write a book on Bayesian statistics that really anyone could pick up and use to gain real intuitions for how to think statistically and solve real problems using statistics. For me there's no reason why Bayesian statistics can't be a beginner's first introduction to statistics.
I would love it if, one day, when people said "statistics" it implied Bayesian statistics, and frequentist statistics was just an academic niche. To get there we need more books that introduce statistics to a wide audience using Bayesian methods and assume this may be the reader's first exposure to stats. I toyed with the idea of just calling this book "Statistics the Fun Way", but I know I would probably get angry emails from people buying the book for help with stats 101 and getting very confused! Hopefully this book will be a small step toward getting "stats 101" taught from the Bayesian perspective, so that statistics can make sense from the beginning.
2. Who is your intended audience for the book? Could anyone without a math background pick it up?
My goal with Bayesian Statistics the Fun Way was to create a book that basically anyone with a high school math background could pick up and read. Even if you only vaguely remember algebra, the book moves at a pace that should be easy to follow. Bayesian statistics does require just a little calculus and is a lot easier with a bit of code, so I've included two appendices: one covering enough R to work as an advanced calculator, and one giving enough background in the ideas of calculus that you can follow along when the book needs to talk about integrals. But I promise that there is no solving of any calculus problems required.
While I worked hard to limit the mathematical prerequisites for the book, as you read through it you should start picking up on mathematical ways of thinking. If you really understand the math you're using, you can make better use of it. So I don't try to shy away from any of the real math, but rather work up to it slowly so that all the math seems obvious as you develop your understanding. Like many people, I used to believe that math was confusing and difficult to work with. In time I really saw that when math is done right, it should be almost obvious. Confusion in mathematics is usually just the result of moving too quickly, or leaving out important steps in the reasoning.
3. Why should software developers learn probability and statistics?
I really believe everyone should learn some probability and statistics because it really does help in reasoning about the uncertain world of everyday life. For software developers in particular there are a few common places where it's useful to understand statistics. It's pretty likely that at some point in your career in software, you'll need to write code that makes a decision based on some uncertain measurement. Maybe it's measuring the conversion rate on a web page, generating a random reward in a game, assigning users to groups randomly, or even reading information from a noisy sensor. In all these cases really understanding probability will be very helpful. In the software part of my career I've also found that probability can help a lot in troubleshooting bugs that are difficult to reproduce or to trace back to a complex problem. If a bug appears to be caused by insufficient memory, does adding more memory decrease the probability of the bug in a meaningful way? If there are two explanations for a complex bug, which should be investigated first? In all these cases probability can help. And of course with the rise of Machine Learning and Data Science, engineers are more and more likely to be working on software problems that involve working directly with probabilities.
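As a toy sketch of that last kind of reasoning (the priors and likelihoods below are invented purely for illustration), Bayes' rule picks which of two competing bug explanations deserves attention first:

    # Invented numbers: prior plausibility of each cause, and how well
    # each cause explains the failure pattern we observed.
    prior = {"low_memory": 0.3, "race_condition": 0.7}
    likelihood = {"low_memory": 0.9, "race_condition": 0.2}

    evidence = sum(prior[c] * likelihood[c] for c in prior)
    posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}
    print(posterior)  # low_memory ≈ 0.66: investigate that one first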
4. Could you give a brief summary of the difference between the frequentist and bayesian approaches to probability?
Frequentists interpret probability as a statement about how frequently an event should occur in repeated trials. So if we toss a coin twice we should expect to get 1 head because the frequency of heads is 1/2. Bayesians interpret probability as a statement of our knowledge, basically as a continuous version of logic. The probability of getting heads in a coin toss is 0.5 because I don’t believe getting heads is any more likely than getting tails. For coin tosses both schools of thought work pretty well. But when you talk about things like the probability that your favorite football team will win the world cup, talking about degrees of belief makes a lot more sense. This additionally means that Bayesian statistics does not make statements about the world but about our understanding of the world. And since we each understand the world a bit differently, Bayesian statistics allows us to incorporate that difference into our analysis. Bayesian analysis is, in many ways, the science of changing your mind.
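A small worked example of the contrast (my own, not from the book): after 7 heads in 10 flips, the frequentist point estimate is the observed frequency, while a Bayesian starting from a flat Beta(1, 1) prior reports a slightly different posterior mean:

    heads, tails = 7, 3
    freq_estimate = heads / (heads + tails)   # 0.70: the long-run frequency
    a, b = 1 + heads, 1 + tails               # flat prior -> Beta(8, 4) posterior
    posterior_mean = a / (a + b)              # ≈ 0.67: a degree of belief
    print(freq_estimate, posterior_mean)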
5. Why did you choose to focus on the bayesian approach?
There are plenty of really great philosophical reasons to focus on Bayesian statistics but for me there is a very practical reason: everything makes sense. From a small set of relatively intuitive rules you can build out the solutions to any problem you encounter. This gives Bayesian statistics a lot of power and flexibility, and also makes it much easier to learn. I think this is something programmers will really like about Bayesian reasoning. You aren't applying ad hoc tests to a problem, but reasoning about your problem and coming up with a solution that makes sense. Bayesian statistics is really reasoning. You agree to the statistical analysis only when it genuinely makes sense and convinces you, not because some seemingly arbitrary test result achieves some equally arbitrary value. Bayesian statistics also allows us to disagree quantitatively. It's quite common in everyday life that two people will see the same evidence and come to different conclusions. Bayesian statistics allows us to model this disagreement in a formal way so that we can see what evidence it would take to change our beliefs. You shouldn't believe the results of a paper because of a p-value; you should believe them because they truly convince you.
6. How is Bayesian statistics related to machine learning?
One way I've been thinking about the relationship between Bayesian statistics and Machine Learning (especially neural networks) is in terms of how each deals with the fact that calculus can get really, really hard. Machine Learning is essentially understanding and solving really tricky derivatives. You come up with a function and a loss for it, then compute (automatically) the derivative and try to follow it until you get optimal parameters. People often snarkily remark that backpropagation is "just the chain rule", but nearly all the really hard work in deep learning is in applying that successfully.
Bayesian statistics is the other part of calculus: solving really tricky integrals. The Stan developer Michael Betancourt made a great comment that basically all Bayesian analysis is really computing expectations, which means solving integrals. Bayesian analysis leaves you with a posterior distribution, but you can't use a distribution for anything unless you integrate over it to get a concrete answer. Thankfully no one makes snarky comments about integrals, because everyone knows they can be really tricky even in the simplest cases. There's an xkcd comic that makes that point nicely.
So in this strange way the current states of Machine Learning and Bayesian statistics are what happens when you push basic calculus ideas to the limits of what we can compute.
This relationship also outlines the key differences. When you think about derivatives you’re looking for a specific point related to a function. If you know location and time, the derivative is speed and can tell you when you went the fastest. Moving the needle in ML is getting a single metric better than anyone else. Integration is about summarizing an entire process. Again if you know location and time, the integral is distance and tells you how far you’ve traveled. Bayesian statistics is about summarizing all of your knowledge about a problem, but this allows us to not just give single predictions but also say how confident we are in a wide range of predictions. Advancement in Bayesian statistics is about understanding more complex systems of information.
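To make the integration point above concrete (hypothetical data, purely illustrative), here is a tiny grid approximation in which a posterior distribution only becomes a usable number once you integrate over it:

    # Hypothetical data: 2 successes in 10 trials with a flat prior,
    # giving a Beta(3, 9)-shaped posterior over a conversion rate p.
    grid = [i / 1000 for i in range(1, 1000)]
    unnorm = [p**2 * (1 - p)**8 for p in grid]  # unnormalized posterior density
    z = sum(unnorm)
    weights = [u / z for u in unnorm]
    # The concrete answer is an expectation, i.e. an integral over the posterior.
    posterior_mean = sum(p * w for p, w in zip(grid, weights))
    print(posterior_mean)  # ≈ 0.25, the mean of Beta(3, 9)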
7. If your readers wanted to dig deeper into the subject of the book, where would you point them to (books, courses, blog posts, etc)?
The biggest inspiration for this book was E.T. Jaynes' "Probability Theory: The Logic of Science". My secret hope is that "Bayesian Statistics the Fun Way" can be a version of that book accessible to everyone. Jaynes' book is really quite challenging to work through and presents a pretty radical version of Bayesian statistics. Aubrey Clayton has done an amazing service by putting together a series of lectures on the key chapters of this book.
And of course if you liked reading the book you'd probably enjoy my blog. I haven't been posting much recently since I've been writing "Bayesian Statistics the Fun Way" and before that "Get Programming with Haskell", but I've got a ton of posts in my head that I really want to get down on paper soon. Generally the blog, despite the name, is not strictly Bayesian. Typically if I have some statistics/probability topic that I'm thinking about, it will get fleshed out into a blog post.
8. In your experience, what is a concept from probability/statistics that non experts find difficult to understand?
Honestly, the hardest part is interpreting probabilities. People really lost faith in a lot of Bayesian analysts like Nate Silver (and many others) when they predicted an 80% or so chance that Clinton would win the 2016 election and she didn't. People felt like they had been tricked and everyone was wrong, but an 80% chance really isn't that high. If my doctor tells me I have an 80% chance to live, I'm going to be really nervous.
A common approach to this problem is to point to probabilities themselves and say that they are a poor way to express uncertainty. The fix, then, is that you should be using odds or likelihood ratios or some decibel-like system similar to Jaynes's idea of evidence. But after really thinking about probability for a long time I haven't found that there's a universally good way to express uncertainty.
The heart of the problem is that, deep down, we really want to believe that the world is certain. Even among experienced probabilists there's this persistent nagging feeling that maybe if you do the right analysis, learn the right prior, add another layer into your hierarchical model, you can get it right and remove or at least dramatically reduce uncertainty. Part of what draws me to probability is the weird mixture of trying to make sense of the world and the meditation on the fact that even when trying your hardest, the world will surprise you.
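For concreteness (my own sketch of the alternatives mentioned above), the same 80% belief can be expressed as odds or as Jaynes-style decibels of evidence, 10 * log10(odds):

    import math

    p = 0.8
    odds = p / (1 - p)                    # 4.0, i.e. 4:1 in favor
    evidence_db = 10 * math.log10(odds)   # ≈ 6.0 dB on Jaynes's evidence scale
    print(odds, evidence_db)

None of these representations removes the underlying uncertainty; they just move it around.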
9. What are your thoughts on p-values as a measure of statistical significance? Could you give us a brief description of p-hacking?
There are two things wrong with p-values. First of all, p-values are not the way sane people answer questions. Imagine how this conversation would sound at work:
Manager: “Did you fix that bug assigned to you?”
You: “Well I’m pretty sure I didn’t not fix it…”
Manager: “If you fixed it, just mark it fixed.”
You: “Oh no, I really can’t say that I fixed it…”
Manager: “So you want to mark it ‘will not fix’?”
You: “No, no, I’m pretty sure that’s not the case”
p-values confuse people because they are, quite literally, confusing. Bayesian statistics gives you a posterior probability, which is exactly the positive answer to the question being posed that you want. In the previous dialog the Bayesian says "I'm pretty sure it's fixed"; if the manager wants you to be more sure, you collect more data and can then say "I'm basically certain it's fixed".
The second problem is the culture of arbitrarily picking 0.05 as some magic value that has meaning. Related to the previous question about understanding probabilities, a 5% chance of something occurring does not make it very rare. Rolling a 20-sided die and getting a 20 has a 5% chance, and anyone who knows Dungeons and Dragons (D&D) knows that this is far from impossible. Outside of role-playing games, focusing on a die roll is not a great system for separating true from false.
And that brings us to p-hacking. Imagine you're playing D&D with some friends and you roll twenty 20-sided dice all at once. You then point to one that landed on 20 and proclaim "that was the die I meant to roll; the rest are all just test dice." It's still cheating even if you technically did roll a 20. That's essentially what p-hacking is. You keep doing analyses until you find something that is "significant", and then claim that's what you were looking for the entire time.
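A quick simulation of that analogy (illustrative only): roll twenty 20-sided dice and check how often at least one shows a 20. The answer is about 64%, which is exactly why reporting the one "significant" result out of twenty attempts is cheating:

    import random

    trials = 100_000
    hits = sum(
        any(random.randint(1, 20) == 20 for _ in range(20))
        for _ in range(trials)
    )
    print(hits / trials)  # ≈ 0.64 = 1 - (19/20)**20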
10. Any closing recommendations on what book to read next after reading your book?
Now that I've finished writing this book I finally have time to start catching up on other books that I didn't have time to read while writing it! I'm really enjoying Osvaldo Martin's "Bayesian Analysis with Python" (I know Not a Monad Tutorial interviewed him not long ago). It's a great book that approaches Bayesian analysis through PyMC3. I really think the world of probabilistic programming is very exciting and will become more and more an essential part of practical Bayesian statistics. Another book I really want to read is Richard McElreath's "Statistical Rethinking". It has a second edition coming out soon, so I'm slightly hesitant to get a copy before that; but McElreath has put up a bunch of great supporting material on his website, so I might not be able to wait until the 2nd edition. Both of these would be great next steps after "Bayesian Statistics the Fun Way". Another good recommendation is Kruschke's "Doing Bayesian Data Analysis".