
Case Study: TensorFlow in Medicine – Retinal Imaging (TensorFlow Dev Summit 2017)



[MUSIC PLAYING] LILY PENG: Hi, everyone. So, I’m Lily. I work on our medical
imaging team in Brain. In a previous life,
I was a doctor, and I’ve been repurposed as
a product manager at Google. [LAUGHTER] One of the projects that we’ve
been working on in our group is using deep learning
for retinal imaging. In particular, we are
looking at a disease called diabetic retinopathy. Other than being a mouthful, it's actually also the fastest-growing cause of blindness in the world, because it's a complication of diabetes. There are 415 million people
in the world with diabetes, and each one of them
is at risk for going blind due to what we call
DR, or Diabetic Retinopathy. The key to preventing
blindness is regular screening. Every guideline worldwide recommends screening about once a year, because the disease is pretty asymptomatic until you get to a point where there's irreversible vision loss. And at that point, it's a
little too late to intervene. This is done by taking a picture
using a specialized camera of the back of the eye
through your pupil, and then a doctor
grades these images. We look for these little
hemorrhages and little spots on the image, and we grade
them on a five-class scale from no disease to
the end stage, which is sort of proliferative DR. In many places in
the world, including in India where our
story originated, there are just simply not
enough doctors to do this task. In India, there is a shortage
of 127,000-some eye doctors, and because of this and
other systemic issues, about half of people actually
suffer vision loss before they’re even diagnosed. For something that’s
completely preventable, this is sort of unacceptable. Here is a picture of
some of the people who are waiting in
line to get screened. Even if you get to a place
where there is screening, there is a long wait. There is long turnaround
time, and so a lot of people end up being lost to care. The other issue is that even when doctors are available, their grading is surprisingly variable. Here in this graph, each color represents a different category of disease, each row is a patient's fundus image, and each column represents
an ophthalmologist. These are US board-certified
ophthalmologists, and we had given them the test set when we were trying to attack this problem, to see what their grades were for each image. And as you can see,
when there’s no disease, there is pretty good agreement. There’s one person
who thinks otherwise, but everyone is the
consensus is there. And then, of course, you
look at the end stage when there’s
proliferative disease, there’s good
agreement there, too. But in between,
there’s actually a lot of variability and
disagreement about where this should actually
fit, even though there are pretty well-known
guidelines, and it’s because human
beings in general just aren’t super great at
being very precise about what we see in that image. And of course, you can see the two highlighted rows there in black. These images got every grade in the book, right? So depending on which doctor you saw, your management would be kind of different. A little bit more about that later.
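[Editor's note: a toy sketch of how you might quantify the grader disagreement on that slide. The numbers and the agreement measure here are illustrative, not from the study.]

```python
import numpy as np

# Each row is one image, each column one ophthalmologist's grade (0-4),
# mirroring the slide's layout of images x graders. Toy data only.
grades = np.array([
    [0, 0, 0, 0, 1],   # "no disease": near-total agreement
    [1, 2, 3, 2, 4],   # the messy middle: grades all over the map
    [4, 4, 4, 4, 4],   # proliferative DR: full agreement
])

def majority_agreement(row):
    # Fraction of graders who match the most common grade for this image.
    _, counts = np.unique(row, return_counts=True)
    return counts.max() / len(row)

for i, row in enumerate(grades):
    print(f"image {i}: spread={row.max() - row.min()}, "
          f"agreement={majority_agreement(row):.0%}")
```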
help was let’s train a model. And so we actually
built a labeling tool. We started off with 130,000 images, and we've gotten many more since then, and we hired an army of ophthalmologists to help us label. And from our 54 ophthalmologists, we got 880,000 diagnoses for these images. And you can see from the previous slide why we did that, because sometimes it took up to seven reads to get something consistent.
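[Editor's note: the talk doesn't spell out the adjudication protocol, so here is a minimal sketch of one plausible scheme consistent with the "up to seven reads" anecdote: collect reads until a strict majority emerges, with a median fallback.]

```python
from collections import Counter
from statistics import median

def consensus(reads):
    """Return the strict-majority grade over the reads so far, else None."""
    grade, count = Counter(reads).most_common(1)[0]
    return grade if count > len(reads) / 2 else None

def aggregate(reads, min_reads=3):
    # Walk through the reads in the order they were taken, stopping at the
    # first strict majority once at least `min_reads` are in.
    for n in range(min_reads, len(reads) + 1):
        grade = consensus(reads[:n])
        if grade is not None:
            return grade
    # No majority even after all reads: fall back to the median grade
    # (a simple stand-in for a real adjudication step).
    return int(median(reads))

print(aggregate([2, 3, 2]))              # majority after three reads -> 2
print(aggregate([1, 2, 3, 2, 4, 2, 2]))  # needed all seven reads -> 2
```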
Then, after we got this data cleaned up and labeled, we used our trusty Inception network that works for a lot of image recognition tasks, from cats to puppies to melanoma, and now DR. And we trained it to make these five-class predictions, but we also asked it to predict housekeeping things that may be important for a clinician to know: whether or not this image is of sufficient quality for grading, whether or not this is a left or right eye (sometimes we get confused), and also the field of view, which is what part of the retina you're actually seeing.
And then we built a front end to this. I'm going to try a demo here. This is literally what I call a toaster. We try to drag and drop something. I don't know actually how to– how do I move the cursor? Oh, there we are. So I'm going to open up a web browser, and hopefully that works. I'm going to drag one of our images over. I can't see it. Oh, there we go. And it's analyzing. It should be
faster, but the demo gods– oh, it’s cooperating. So here we are able to
tell you that there’s proliferative disease here. There is no what
we call DMU, which is a different
type of DR. And we are saying that this
is somewhere between moderate and severe, and
this is indeed something between moderate and severe. I kind of showed you how it
works on a case-by-case basis, but then how does it work
over a lot of images? Well, here we actually
published and shared how we did this work in the
“Journal of the American Medical Association,” and
this is one of the tests or the validation
sets that we use. The model was not trained or
tested on this previously. And for each of 9,963 images, we predicted whether or not it had referable disease. The y-axis is sensitivity. The x-axis is 1 minus specificity. And our algorithm is the two black dots, if you can actually see them. That's our algorithm. And then the little colored dots are US board-certified ophthalmologists. And to the left is good, and you can see that essentially we're very close to most of the ophthalmologists in terms of performance. And in fact, if you look at our F-score and compare the algorithm's F-score to that of the median ophthalmologist, we were sort of in the middle of the pack.
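[Editor's note: a small self-contained sketch, with made-up labels, of the metrics being discussed here: sensitivity on the y-axis, 1 - specificity on the x-axis, and the F-score used for the comparison against the median ophthalmologist.]

```python
def operating_point(y_true, y_pred):
    # Referable DR is a yes/no call, so each image is one binary pair.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)              # a.k.a. recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, 1 - specificity, f_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy ground truth, not the JAMA data
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model calls
print(operating_point(y_true, y_pred))  # -> (0.75, 0.25, 0.75)
```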
One of the reasons we also decided to publish in JAMA was because we believe that engaging the medical community is really important to get these technologies out into the hands of the people who could actually use them. It was actually
quite well-received. You can see some quotes from
real doctors about our work, and so we’re really
excited about that. How did TensorFlow help us? Well, every step of the way, I think it helped us really start with quick prototyping. We had standard architectures and pre-trained models, and we were able to try out different variations of neural networks, and we actually found– I mean, we literally found that Inception– v3 at this point– worked the best. But we could try out things very quickly. And we also pre-trained. So we actually pre-trained on the classic ImageNet, and we found there was a boost in performance there.
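[Editor's note: a hedged illustration of the "try things out quickly" point, using today's tf.keras application zoo; the candidate list is an example, not the team's actual search space.]

```python
import tensorflow as tf

def build(backbone_fn):
    # Any ImageNet-pretrained backbone, plus a fresh five-class DR head.
    base = backbone_fn(include_top=False, weights="imagenet",
                       pooling="avg", input_shape=(299, 299, 3))
    head = tf.keras.layers.Dense(5, activation="softmax")(base.output)
    return tf.keras.Model(base.input, head)

# Swapping a backbone is a one-line change, which is what makes this
# kind of architecture comparison cheap to run.
candidates = {
    "inception_v3": tf.keras.applications.InceptionV3,
    "resnet50": tf.keras.applications.ResNet50,
}
models = {name: build(fn) for name, fn in candidates.items()}
```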
It also helped us to experiment at scale, with GPU support and fast training. This allowed us to run all these different experiments, different sorts of labeling. If we had new labels or different labels, it helped us do that.
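[Editor's note: a minimal sketch of multi-GPU data parallelism with tf.distribute; this API postdates the 2017 talk, so it stands in for whatever the team actually used.]

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per local GPU
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Build and compile inside the scope so variables are mirrored
    # across devices; training then splits each batch over the GPUs.
    model = tf.keras.applications.InceptionV3(weights=None, classes=5)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")
```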
And finally, what I think is really important is that it really allowed our team to reinvest its efforts, so the blocker was no longer in machine learning and training, which used to be hard to do. And if you look at what we did here, we actually applied very straightforward ML techniques. The magic
sauce was actually finding the right
problem, getting the data, getting agreement about
what was in the image, and then we were able to use this package, these tools, to train models that actually performed really, really well. And that also, then,
allows our team to focus on validating the algorithm and figuring out ways to deploy
it into health systems, which in itself is a huge challenge. What's next for us? We trained a model. It works really well. Now we need to actually
clinically validate it. We’ve been working with two
hospitals in India, Aravind and Sankara, and they’re
running clinical trials of the algorithm as we speak. Actually, Aravind’s
finished, and they have found essentially
the same results– that we were slightly
better than the average of their ophthalmologists there. And so what we’re doing is
working with a fellow Alphabet company, Verily, that’s a
life science-focused company, and a hardware
maker called Nikon. You may not have heard
of that little company. But the idea is now
that the algorithm works pretty well,
the bottleneck becomes the hardware, because we
need a specialized camera to take these pictures. So we’re working with the
hardware manufacturers to essentially figure out ways
to deploy lightweight hardware that’s easy to use, et cetera. Taking a step back,
one of the main reasons I got into medicine
was I was an MD-PhD, and so I really was very excited
about bringing breakthrough science from bench to bedside. And there’s a part of you,
when you go through training, and you’re a PhD, and
you’re like, this is never going to happen,
because it’s just not possible to solve
these problems. And with TensorFlow and all
the work that’s been done here, it’s actually
possible to do that. It’s possible to
train algorithms that really can help
physicians deliver care where people need it most. [APPLAUSE] [MUSIC PLAYING]
