You may find the stuff on “opening the black box” interesting - SHAP (their GitHub readme is great) and Zhao and Hastie's paper on causal inference. For tree-based methods (e.g. random forests, gradient-boosting machines), explainability is fairly well solved. For deep learning on images, explainability is also fairly straightforward. Deep learning on tabular data is far less solved (e.g. in the Google readmission paper, the authors had to use a handicapped version of their neural network for explainability, and even then the explanations were not especially useful). Thankfully, tree-based methods often perform as well as or better than deep learning on tabular data.
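To make the “few extra lines” point concrete: SHAP's TreeExplainer is the usual tool for tree models, but as a dependency-light stand-in, here is a sketch using scikit-learn's permutation importance on a random forest. The dataset is synthetic; the idea (fit a model, then rank which features drive its predictions) is the same.

```python
# Hedged sketch: permutation importance as a stand-in for SHAP on a
# tree-based model. Shuffling each feature and measuring the drop in
# score ranks the features the fitted model actually relies on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic tabular data standing in for EMR features.
X, y = make_classification(n_samples=300, n_features=5,
                           n_informative=3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The "few more lines" that explain the model.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```

With SHAP you would instead build `shap.TreeExplainer(model)` and get per-prediction attributions rather than global rankings, but the pipeline shape is identical.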

In my opinion, in all healthcare-facing ML, explainability methods should be baked into the pipeline. It's so easy now, literally a few more lines of code to generate the model that explains the model (incorporating the output in a meaningful way is a little more expensive in terms of code, but if I can figure it out then so can any researcher). The thing we should be building is not a model that is reapplied across systems, but a pipeline for data ingestion, model-building, and explanation that could be rapidly deployed in different systems (e.g. for internal testing and validation). This would still disadvantage poorer systems with less ability to collect data, but maybe not by much - everyone has CBCs, data on discharges and admissions, basic demographic information, etc., and almost everyone uses some kind of EMR.
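A minimal sketch of what “pipeline, not model” could look like: ingestion (imputing the routinely available labs and demographics every system has), model-building (gradient boosting), and explanation (feature importances) wrapped in one reusable function. The column names and synthetic data are hypothetical placeholders for real EMR extracts.

```python
# Hedged sketch: a deployable ingest -> model -> explain pipeline.
# FEATURES and the synthetic data are illustrative assumptions, not a
# real schema.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier

FEATURES = ["wbc", "hemoglobin", "platelets", "age", "prior_admits"]

def build_and_explain(X, y):
    """Fit an imputation + boosting pipeline and return it together
    with per-feature importances, so every model ships with its
    explanation."""
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # EMR data has gaps
        ("model", GradientBoostingClassifier(random_state=0)),
    ])
    pipe.fit(X, y)
    importances = dict(
        zip(FEATURES, pipe.named_steps["model"].feature_importances_))
    return pipe, importances

# Synthetic stand-in for an EMR extract, with missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan
y = (X[:, 0] > 0).astype(int)  # outcome driven mostly by "wbc" here

pipe, importances = build_and_explain(X, y)
print(max(importances, key=importances.get))
```

The same function could be pointed at a different system's extract for internal testing and validation; only the ingestion step would need local adaptation.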

On the anthropomorphic relationships with computers, have you heard about https://en.wikipedia.org/wiki/ELIZA ? It was developed in the 1960s at MIT as a simple rules-based conversation “bot,” and people developed deep personal connections with it, on a level equalling or exceeding the relationships people are currently building with Alexa (I would argue exceeding, as the conversations were purposefully therapeutic and did not avoid sensitive subjects).
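ELIZA's core mechanism is surprisingly small: ordered pattern-matching rules that reflect the user's words back. A toy sketch (not the real ELIZA script, which had ranked keywords and a memory mechanism):

```python
# Hedged sketch of an ELIZA-style responder: ordered (pattern, template)
# rules, first match wins, generic fallback otherwise.
import re

RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."

print(respond("I am worried about my labs"))
# The reflected phrasing is what made people feel heard.
```

That a few regex-like rules produced deep attachment says more about us than about the software.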

Would be interested in your thoughts on why, exactly, Google/Amazon/etc. are needed. We have clinician-scientists, PhD researchers, and collaborations with other universities (e.g. CWRU has a great comp-sci department). We can do NLP, DL, whatever, on our own. Google would have some extra data on searches, all the stuff they can get from cookies, phone data, etc.; Amazon would have similarly rich data. But we run into some of the same issues we've seen trying to use Facebook, Twitter, etc. - there's a huge selection bias in who uses those services, and, even if usage nears 100%, the type of usage will vary between pts by orders of magnitude. This could help increase predictive power, but by how much? That's a whole different ballgame from simply using AWS or Google Cloud to store or even process data, which is also of questionable utility (we have some rockin’ servers over at LRI, and they're buying more).

Sun 17 Nov 2019 10:23:05 AM CST