Krzysztof Geras – Towards solving breast cancer diagnosis with deep learning (AI PHI Affinity Group)
April 7, 2023 @ 9:00 am - 10:00 am (Free)
Thank you for coming!
The recording for this meeting is available below:
Towards solving breast cancer diagnosis with deep learning
Although deep learning has made stunning progress in the last few years, both in terms of engineering and theory, its real-life applications in medicine remain rather limited. One of the fields that has been anticipated to be revolutionized by deep learning for some time, yet proved to be much harder than many expected, is medical imaging. In this talk I will shed some light on my 7-year long journey in developing deep learning methods for medical imaging, in particular, for breast cancer screening. I will explain how we created a deep learning model that can perform a diagnosis with an accuracy comparable to experienced radiologists. To achieve this goal we needed a lot of perseverance, novel neural network architectures and training methods specific to medical imaging. I will also discuss the limitations of our work and what can likely be achieved in the next few years.
Presented by Krzysztof Geras
Krzysztof is an assistant professor at NYU Grossman School of Medicine and an affiliated faculty member at the NYU Center for Data Science and the Courant Institute of Mathematical Sciences. His main interests are in unsupervised learning with neural networks, model compression, transfer learning, evaluation of machine learning models, and applications of these techniques to medical imaging. He previously completed a postdoc at NYU with Kyunghyun Cho, a PhD at the University of Edinburgh with Charles Sutton, and an MSc as a visiting student at the University of Edinburgh with Amos Storkey. His BSc is from the University of Warsaw. He also completed industrial internships at Microsoft Research (Redmond, working with Rich Caruana and Abdel-rahman Mohamed), Amazon (Berlin, Ralf Herbrich’s group), Microsoft (Bellevue), and J.P. Morgan (London).
Artificial Intelligence in Cancer Research – AI PHI Affinity Group
(First Friday of each month)
This group was formed to discuss current trends and applications of artificial intelligence in cancer research and clinical practice. The group brings together AI researchers in a variety of fields (computer science, engineering, nutrition, epidemiology, radiology, etc.) with clinicians and advocates. Students, trainees, and faculty with any or no background in AI are encouraged to attend. The goal is to foster collaborative interactions to solve problems in cancer that were thought to be unsolvable a decade ago, before the broad use of deep learning and AI in medicine.
I think this is a good time. So thanks, everyone, for coming. We have a very exciting talk today by Krzysztof Geras from NYU about how to solve breast cancer diagnosis with deep learning and all of the challenges and opportunities in developing these models. From the advertisement we’ve seen that you can’t always use out-of-the-box convolutional neural networks; there are a lot of model development challenges and other challenges involved in doing it well. Krzysztof has been tackling this for many years, so we’re excited to hear his insights into what needs to be done to get really good performance with low false positive and false negative rates.
This is a group where we hear from leaders in the field and discuss general trends in AI for health care. Our goal is to bring together AI researchers in a variety of fields. We’re highly interdisciplinary, spanning computer science and engineering, but also epidemiology, radiology, and nutrition, and we bring together scientists, clinicians, and advocates. We want to bring everyone together to hear from the experts and discuss what the next steps of the field should be.
The format is roughly a 30 to 40 minute talk (that’s very rough; it can be longer or shorter) and then an interactive question-and-answer session. If you have any feedback or speaker suggestions, please send them to John and/or me.
We are hosted by the UH Cancer Center, the Hawaii Data Science Institute, and AI PHI.
We’re really excited to host this. We also like to do a paper of the month, and I thought a nice one, which came out in 2022 in Science Translational Medicine from Krzysztof’s lab, is on improving breast cancer diagnostics with deep learning for MRI; I assume we’ll hear about some of that work today. Interestingly, Krzysztof has published both in the top medical journals and in the top CS conferences like ICML, so he truly spans both areas. We’re really excited to hear his insights.
We also have speaker compensation for all of our speakers: a beer stein branded with the AI PHI logo. So Krzysztof, you should be receiving that very soon.
So without further ado, I will let you share your screen, Krzysztof, and thanks so much for coming on.
So actually, funnily enough, you know, the paper that you mentioned… I really like this paper; it is a very well executed study. But I think it’s building upon many of the ideas that we have used in the past. Actually, I think there are other works where we developed more innovative technologies that went slightly unnoticed.
Getting back to why I’m here today: the format of my talk is that I will try to give you some insight into what I went through in the last six, seven years to develop these models.
Obviously, I have some slides, but I’m very happy to answer questions during the slides. I’m very happy to not even talk about everything I have on my slides, but to focus on some elements that are more interesting to you. I’m also very happy to go deeper into certain aspects if you find them interesting.
So maybe to begin with, I would like to argue that deep learning really is the right tool for medical image analysis. I think this is somewhat clear now, but it hasn’t been clear to very many people until relatively recently. It is the right tool for this very basic reason: it is on the right part of this chart. We can think of basically all learning tasks as lying on a 2D plane, where the x axis is the difficulty of the task and the y axis is the amount of available data.
In the bottom left corner we have those learning tasks for which we don’t have very much data, but which are relatively simple. A classic example would be something like iris classification, which everybody has tried in their very first machine learning exercise. Here, with a decision tree, you can solve it to a very high degree of quality. Very often you can’t solve it perfectly, not because you don’t have the right method, but because of noise in the labels. But classic machine learning methods are completely sufficient there.
Then, if you have easy tasks and a lot of data… it’s not interesting, because you can already solve them with a smaller amount of data.
If you have a very difficult task but very little data, then practically you cannot do anything, so it is also not very interesting: whatever you do, you are never going to solve it without some incredibly strong inductive bias coming from modeling or, for example, transfer learning. In those tasks you usually cannot do anything.
And then there is the corner of this chart where you have a lot of data for difficult tasks. This is really where deep learning is very useful, because it has the power to model very complex phenomena in the data while at the same time having very good generalization properties.
This is an argument that should convince you, but experimentally there have also been plenty of successes already in medical imaging with deep learning methods, whether in ophthalmology, pathology, radiology of different kinds, or dermatology. At this point it is pretty well established that deep learning really is the way to go with medical data, and I think nobody would really question this these days. In my talk, I’m going to give you a small snapshot of those different opportunities by describing my experience working with breast cancer data.
So why is breast cancer even important? Primarily because it is a very common cancer, and, going back to my previous slide, there is a lot of data: about 40 million breast cancer screening exams are performed every year in the US alone. A lot of women are diagnosed with cancer, and breast cancer is known to be deadly: about 30,000 women lose their lives to it. So it is a very good task for machine learning, because we have the data to solve it and it’s an important problem. If we make even relatively small progress, there is the potential to impact the lives of very many people.
As you may know, in most places breast cancer screening is implemented by taking four images, because a healthy woman typically has two breasts and there are two views of each breast. The views, the MLO view and the CC view, indicate the angle at which the breast is compressed.
What I’m showing you is called full field digital mammography. There is also a slightly more modern version now, called digital breast tomosynthesis (DBT). I’m going to present these ideas for full field digital mammography, but many of them we later applied to DBT.
So where does this fit in the full breast imaging workflow? As the name suggests, screening happens at the very beginning: a healthy, asymptomatic woman comes in for a regular yearly exam.
The good outcome is if a doctor says this is certainly a healthy patient and there is no need for further imaging. However, if there is a possibility that this woman has cancer, then we don’t go straight to a biopsy; there are some intermediate modalities to exclude the possibility of cancer, namely ultrasound or diagnostic mammography. Depending on the woman, one or the other is preferred; sometimes they get both.
If these imaging modalities still don’t exclude cancer, then typically the next step is MRI. And if the cancer still can’t be excluded, then there is no other way to check whether this person is healthy except pathology. A biopsy consists of putting a small metal tube into somebody’s body, taking out a piece of tissue, and having someone look at it under the microscope to conclude whether this person is healthy or not. Essentially, what we would ideally like to do is take the inputs of screening mammography and obviate the need for these intermediate imaging procedures.
So what we would like to do is take the screening mammography and conclude with high confidence that there is no cancer, or otherwise. If we predict that there is a benign or malignant change, those procedures will still have to be conducted, but we are hoping that in an overwhelming majority of cases we will be able to say this person is healthy and there is no need to do anything. Therefore, we could possibly save a lot of anxiety for the patients and also a lot of money for the hospital system.
But how do we frame it as a learning task? We say that we have these four images as inputs, we feed them into a learning model, in our case a neural network, and then we make a prediction for whether in either of the breasts there is a malignant lesion or a benign lesion. Although we are not really that interested in identifying benign lesions, it is useful to use them as targets too, because it regularizes the learning. This is an instance of what is commonly known as multi-task learning.
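To make that framing concrete, here is a minimal sketch in PyTorch. The backbone, feature dimension, and four-way label layout are illustrative assumptions, not the actual NYU model.

```python
import torch
import torch.nn as nn

# Minimal sketch of the task framing above: four screening views in,
# four binary targets out (benign/malignant for each breast).
# `backbone` and `feat_dim` are assumed placeholders, not the real model.
class ScreeningClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone  # shared encoder mapping one view to (B, feat_dim)
        # logits ordered as [benign_left, benign_right, malignant_left, malignant_right]
        self.head = nn.Linear(4 * feat_dim, 4)

    def forward(self, views):  # views: list of 4 tensors, each (B, 1, H, W)
        feats = [self.backbone(v) for v in views]
        return self.head(torch.cat(feats, dim=1))  # (B, 4) logits

# Keeping benign lesions as auxiliary targets makes this a multi-task
# objective, which regularizes the malignancy prediction.
loss_fn = nn.BCEWithLogitsLoss()
```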
Now, I think we all agree that it’s easy to download PyTorch from the Internet and basically everybody can train neural networks; you don’t need to be a massive expert in machine learning to train neural nets. So why is this even hard? Why do I need to work at a university and give talks about this stuff? Well, because it’s not as easy as it seems.
So first, data. The problem with doing any kind of meaningful research in medical image analysis is that there are relatively few public datasets of high quality. While there are a lot of great datasets of natural images, for example ImageNet, even now the only really public dataset of mammograms is only about 10,000 images. There now exist some semipublic datasets which are bigger, but they require agreements; they’re not just freely available for everybody to download from the Internet.
So to do any of this research, we had to collect data from our own hospital system. In the initial version of the dataset, our original dataset from 2017, we had 1 million images. Now I think we are working with about five times more.
So first, this is a real struggle for researchers who are not connected to a large hospital system. And then there is a more fundamental issue with medical images. I think a lot of the reasons why neural networks for medical images should differ from neural networks for classifying something like ImageNet come from what I’m about to demonstrate.
So if you look at the bottom left corner, you will see something. Could someone tell me what that is? This is part of an exercise.
It’s a panda.
Yes, it’s a panda, indeed… So natural images can be subsampled insanely without losing the information necessary for classification. The subsampled image may actually be a thousand times smaller than the full image, and you can still tell that this is a panda. That’s because in natural images, the prediction of the class is typically made based on features that span large fractions of the image. That is just not the case in most medical images; it is the opposite.
This here is a very small mammogram of a breast. You can’t tell whether this person is healthy or not. Even with the full-scale image, most radiologists would decline to make a prediction; they would say that this is not diagnostic, this is the wrong screen. They need a very particular environment in which they are comfortable making these predictions. And that is because the objects that determine whether a person is healthy or has cancer are often extremely tiny. So we cannot apply the same trick as with natural images; we cannot take this huge image, make it very small, and then just train regular neural nets. We have to accept that these images are huge, along with all the consequences that come with this. We have to adjust our architectures to deal with these massive images and modify them so that the memory of those networks can be managed effectively.
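As a back-of-the-envelope illustration of the scale problem (the resolution below is a typical screening mammogram size, assumed here for illustration, not a figure from the talk):

```python
# A full-resolution mammogram has on the order of 100x more pixels than
# the 224 x 224 crops standard networks expect, so activation memory
# scales accordingly. 2944 x 1920 is a typical mammogram resolution.
mammogram_pixels = 2944 * 1920      # ~5.7M pixels
imagenet_pixels = 224 * 224         # ~50K pixels
print(mammogram_pixels / imagenet_pixels)  # ~113x more pixels per image
```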
In summary, the public datasets are very tiny, and hospitals are really not very keen to share data, even between themselves. In addition, labeling medical data at the pixel level is difficult. It is difficult operationally, because only a small number of people can perform this kind of labeling accurately, and you have to put them in front of a computer for many hours to obtain the datasets.
And it is also difficult objectively, because sometimes, based on the image, it is hard to say where the cancer begins and where it ends; there is some inherent uncertainty in the labels we are collecting. Then, as I illustrated in the previous few slides, medical images have very different properties from natural images, so we have to design different kinds of neural networks for them. Also, the standard neural network architectures that everybody works with on a regular basis just don’t have any direct mechanism to explain their predictions. It is possible, of course, to obtain these kinds of explanations post hoc, but as I’m going to show you in a minute, it’s better to make the explanation part of the prediction.
It is also, in the same way, philosophically difficult to evaluate the impact of machine learning in medicine, because to be very certain that we have actually made an impact, we typically need to run some kind of prospective study that would take years to conduct, and then the data from this prospective study has to be evaluated. It is just much harder than in many other applications of machine learning.
So now I’m going to show you one idea of how we solve some of these problems, just as an illustration. We call this neural network the Globally-Aware Multiple Instance Classifier (GMIC). I’m going to unpack this image for you step by step.
In the first step, we created something we call the global module, which is essentially a neural network similar to a ResNet. However, it is a fairly shallow ResNet, so that we can feasibly store its internal representations in the memory of a GPU. The other modification is the final layers: a typical ResNet has some number of convolutional layers, then global average pooling, and the globally average pooled representation is used to make the classification. It wouldn’t have this sigmoid layer or this one-by-one convolutional layer. What we are doing here is quite different.
We take those convolutional representations and squeeze them down to only two feature maps with a one-by-one convolutional layer, with a sigmoid applied as the nonlinearity. What we obtain are two saliency maps that we can project back onto the original input image, and what we are hoping is that the saliency maps indicate the locations of the objects that indicate the class.
Then, to obtain the class prediction, we do global average pooling, and that is our prediction. But through this trick of creating the saliency maps, which exploits properties of convolutional networks, we are essentially obtaining an explanation as part of the architecture.
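A minimal sketch of the global module as described, assuming a shallow ResNet-style encoder; the real GMIC differs in details such as the pooling applied to the saliency maps.

```python
import torch
import torch.nn as nn

class GlobalModule(nn.Module):
    """Low-capacity network whose saliency maps double as explanations."""
    def __init__(self, backbone: nn.Module, backbone_channels: int):
        super().__init__()
        self.backbone = backbone  # shallow ResNet-style encoder, (B,1,H,W) -> (B,C,h,w)
        # 1x1 convolution squeezes C feature maps into two: benign, malignant
        self.to_saliency = nn.Conv2d(backbone_channels, 2, kernel_size=1)

    def forward(self, x):
        h = self.backbone(x)
        saliency = torch.sigmoid(self.to_saliency(h))  # (B, 2, h, w), values in [0, 1]
        y_global = saliency.mean(dim=(2, 3))           # pool maps -> (B, 2) prediction
        return y_global, saliency                      # prediction + built-in explanation
```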
So what we have is a prediction based on this low-capacity network, plus an explanation in the form of saliency maps that indicate the bits of the image that indicate the class, in our case either benign lesions or malignant lesions. Of course, because this is a relatively low-capacity network, it won’t be correct every time. So we look at these saliency maps, extract a number of patches, and then look at them with a higher-capacity network, a more standard ResNet with a larger number of layers. From each of these patches we get a feature vector, and we can use an attention mechanism to form an attention-weighted representation, and then again create a prediction based on this collection of patches. Finally, we can take this attention-weighted representation of the patches together with the representation from the global network and form a representation that takes both into account.
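And a corresponding sketch of the local module and fusion. Patch cropping is reduced to an input here, and a plain tanh attention stands in for the mechanism in the paper; all names are illustrative.

```python
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    """Higher-capacity network over patches proposed by the saliency maps."""
    def __init__(self, patch_encoder: nn.Module, feat_dim: int, global_dim: int):
        super().__init__()
        self.patch_encoder = patch_encoder  # deeper ResNet on small crops
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.local_head = nn.Linear(feat_dim, 2)
        self.fusion_head = nn.Linear(feat_dim + global_dim, 2)

    def forward(self, patches, global_feat):  # patches: (B, K, 1, h, w)
        B, K = patches.shape[:2]
        f = self.patch_encoder(patches.flatten(0, 1)).view(B, K, -1)
        a = torch.softmax(self.attention(f), dim=1)  # (B, K, 1) attention weights
        z = (a * f).sum(dim=1)                       # attention-weighted representation
        y_local = self.local_head(z)                 # prediction from patches alone
        y_fusion = self.fusion_head(torch.cat([z, global_feat], dim=1))
        return y_local, y_fusion                     # patch-only and combined predictions
```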
I think this is one way of doing it. Obviously there are other designs that would work similarly, but this one works very well.
The big advantage of this design is that it’s incredibly simple to train. Basically, all we do is train these three optimization objectives, without even any weighting, and add a little bit of regularization to the saliency maps. The regularizer is just an L1 loss, so that the saliency maps are sparser: values that are near zero get shrunk to zero. This is how we get these contiguous regions like here.
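The objective then looks roughly like this; the three heads are assumed to output probabilities, and the regularization weight is an illustrative constant, not a value from the talk.

```python
import torch.nn.functional as F

def gmic_style_loss(p_global, p_local, p_fusion, target, saliency, beta=1e-5):
    # Three unweighted classification terms, one per prediction head.
    bce = sum(F.binary_cross_entropy(p, target)
              for p in (p_global, p_local, p_fusion))
    # L1 penalty on the saliency maps: near-zero activations get shrunk
    # to zero, leaving sparse, contiguous salient regions.
    return bce + beta * saliency.abs().mean()
```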
Hey Krzysztof, I had a couple of questions on the architecture.
First of all, why do you have two saliency maps?
Oh, because we make predictions for benign and malignant lesions. So one corresponds to benign lesions, one corresponds to malignant lesions. But we could have just one if we only wanted to identify malignant lesions.
And then my other question was, do you train these models separately? Like can you train each one independently and then plug them together? Or do you train all of them from scratch?
I think you could train one after the other, right? You could probably train this one first, then this one, and then this one. But we just trained them all together because it was the simplest thing to do, and it works very well. If that didn’t work, we would assume it’s a harder optimization problem, and we would probably have tried training them separately and then together. But it just wasn’t necessary.
Okay… And it actually works, right? This is just one example: this is the annotated input, those are the patches that the neural network found, and these are the saliency maps. What you’re going to notice is that the saliency maps indicating benign lesions and malignant lesions are both activated at this location, and that’s for a good reason: very often benign lesions and malignant lesions are simply indistinguishable. It’s a limitation of mammography; you cannot hope to completely overcome this with neural networks. But it correctly identifies the lesion that was later biopsied, even though this annotation was never part of the training. That’s the important part.
Okay. I won’t go very deep into this, but this is just to convince you that this was working very well. What you see is a number between zero and one, where one indicates a perfect prediction. Our GMIC got 0.9 AUC, which in our experiments was actually higher than Faster R-CNN, which was trained with the bounding boxes as well. It is also very computationally efficient.
You can even find cancer in some cases that radiologists consider mammographically occult. Here’s an example of that phenomenon.
So this is an image from a woman who came in for a screening mammogram. The radiologist said that it is somewhat suspicious because of these dense tissues, but they don’t actually know whether there is any cancer there or not. So they ask her to do an ultrasound.
And the ultrasound confirms that this person actually does indeed have cancer.
Our model was capable of indicating the location of that cancer without the ultrasound. We ran it retrospectively and asked the radiologists to confirm with the report that this is indeed the location where the ultrasound image was taken.
I think this is just a very interesting case of how a neural network cannot just learn to mimic a human but, through this mechanism of making the explanation part of the model, can actually even discover something. I think that’s a very interesting direction in general.
To compare against human performance, we ran a reader study with 720 exams and seven attending radiologists, whom we asked for a prediction of the probability of malignancy. We are comparing in terms of predicting the malignant lesions, because predicting the benign lesions isn’t that interesting in this setting.
What we found is in the paper, so I won’t go into great detail, but it concerns the common ways of evaluating neural networks in this context: the ROC curve and the precision-recall curve.
We have been able to show that if we take the predictions of the radiologists and we average them with the predictions of the neural network…
Then all of these radiologists go up in their performance: all of the curves for all of the radiologists improve, and obviously the average is also higher.
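A sketch of this hybrid evaluation, assuming per-exam malignancy scores from the reader study; the function name and arrays are placeholders, not the study code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def hybrid_auc(y_true, radiologist_scores, model_scores, weight=0.5):
    """AUC of a radiologist, the model, and their simple average."""
    hybrid = (weight * np.asarray(radiologist_scores)
              + (1 - weight) * np.asarray(model_scores))
    return (roc_auc_score(y_true, radiologist_scores),
            roc_auc_score(y_true, model_scores),
            roc_auc_score(y_true, hybrid))
```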
You can download this from the Internet; it is available on GitHub, both the weights and the model architecture. You can do whatever you want with it, as long as you stick to the GPL license, which means that you have to release your code under the same license.
Because we realize that it is difficult to compare these models, we have also created a meta-repository of models, where you pick a dataset and pick a model from a collection of models that we found on the Internet plus our own, and it enables very easy comparison between different models. So I also encourage you to look at that.
And of course, like I mentioned before, digital breast tomosynthesis is already a more popular imaging modality these days and is going to dominate FFDM eventually. So we also applied similar ideas to 3D, and it also works very well.
So you can see that as we look at this, the different slices of the DBT show up, and here too a very similar model works very well.
You can also apply the same, or very similar, ideas to breast ultrasound; we also have a paper about this. Here we are creating an attention-weighted representation across the different images that are part of the ultrasound exam, computing importance scores for the different images. We essentially get three outputs: image-level scores, pixel-level scores for each image, and the breast-level prediction.
Experimentally we were also able to show that this is very effective. In terms of the ROC curve, we were able either to beat or to draw with all the radiologists that participated in the reader study. There is one real champion radiologist at NYU who always beats all the other radiologists in all the studies; we were not able to beat her, but she’s really amazing.
It’s the same in terms of precision-recall: there was only one radiologist we were not able to either beat or draw with.
Again, we looked at how the AI compares to the performance of the radiologists individually, and at hybrids between radiologists and AI. As you can see, by averaging the predictions of radiologists with the AI, we were able to improve the performance of all radiologists in terms of AUC. We were able to increase their specificity, decrease their biopsy rates, and increase the positive predictive value.
In summary, we computed specificity and sensitivity for the radiologists, then looked at the AI’s specificity at the radiologists’ sensitivity, and it was higher than the radiologists’. We also looked at the AI’s sensitivity at the radiologists’ specificity, and we were able to improve those numbers as well.
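A sketch of that matched-operating-point comparison: pick the model threshold that reproduces the radiologists' sensitivity and read off the specificity there. Inputs and the function name are placeholders.

```python
import numpy as np

def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Highest threshold that still flags target_sensitivity of the cancers.
    thr = np.quantile(pos, 1.0 - target_sensitivity)
    return np.mean(neg < thr)  # model specificity at that operating point
```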
Now, where do I see this going forward? On a high level, I think we should go beyond the paradigm that AI can only support radiologists in existing tasks. It’s very limiting to think about it this way, primarily because human brains and neural networks don’t necessarily work in the same way, and there is such a great wealth of tasks that are completely implausible for humans while being potentially relatively easy for neural networks. Let me give you some examples of what I mean by this.
This is what radiologists essentially do: they look at the images, let’s say in the year 2021, and they make a prediction for whether this person has cancer. Now, there’s nothing stopping us from asking the neural network whether this person will develop cancer in 2023 or maybe 2026. While it is totally inconceivable for a radiologist to make this prediction accurately, a neural network has a much larger capacity to pick up high-frequency patterns in the data, such that these kinds of predictions are actually plausible. There really are people working in these directions, and these kinds of networks already work reasonably well.
Furthermore, we can very effectively compare the images from the current year to the images from prior years. While radiologists do this themselves to a certain degree, we can do it much more efficiently using neural networks. We can very efficiently discard very many potentially suspicious lesions just because they already exist in the prior images and are stable.
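One hypothetical way such a comparison could be set up (an illustration of the idea, not the speaker's actual method): encode the current and prior exams with the same network and use the similarity of their features as a stability signal.

```python
import torch
import torch.nn.functional as F

def stability_score(encoder, current_exam, prior_exam):
    # High similarity between current and prior features suggests the
    # finding is stable and may be safely deprioritized.
    f_now = encoder(current_exam)   # (B, D) feature vectors
    f_then = encoder(prior_exam)
    return F.cosine_similarity(f_now, f_then, dim=1)
```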
Maybe even more importantly, something I am personally working very hard on right now is making sense of multimodal predictions. It is very limiting to think about one type of imaging at a time when we actually have so many different types of images: the screening mammograms, the diagnostic mammograms, the ultrasound, then the MRI. So why not take them all together, put them into some kind of neural network that can reason about the different elements in these images, and make a prediction jointly based on them? The paper we have on this is not published yet, but I can tell you that this is working very well.
Furthermore, building on the kinds of representations you can build from multimodal data, you can answer a lot more interesting questions. For example, not just whether this person has cancer, but what type of cancer they have, or, if they do have cancer, what their response to some kind of therapy is going to be. There are just so many more things we could do with this data than we are currently doing.
A topic that I also think is very underexplored in machine learning applied to medical imaging is optimizing the workflow of how the images are acquired. What I’m showing you on the screen is a somewhat simplified imaging workflow for a person who is coming in; I extracted this specifically from my colleagues at NYU. There are very many different paths a patient can take, and all of these different paths are based on a mixture of existing guidelines, experience, availability of imaging resources, etc. So the same patient can actually take a very different path depending on some random factors.
This presents a perfect opportunity for AI to optimize things, and not just in breast imaging but in very many different applications of imaging. There is potentially a huge impact in optimizing this, especially as all of these procedures are very expensive and, for some people, certain procedures are more appropriate than others. So there is a huge cost that can be saved if we do this.
We are also currently very limited in the research that has been done on human-AI interaction. In the results I’ve been showing you, we were just averaging the predictions of the AI with the predictions of radiologists, but this is a very cartoonish form of interaction. There are many possible scenarios: we could use the AI as a second reader, or as an aid during reading time, or as some kind of a controller that looks at the fraction of cases whose scores are either very high or very low. If the AI score is very high and the radiologist thought this person is healthy, maybe we should override that prediction; and if the AI thinks some case is completely normal and the radiologist decided to recall, maybe we should override the radiologist’s recommendation.
So there are a lot of questions about how humans and AI interact, and I think this goes beyond machine learning; it is more a question of psychology and human-computer interaction. I personally expect a lot of research in that direction in the coming years, and I think it is fascinating and very underexplored.
A different direction that I am a strong believer in is that we should start discovering knowledge through machine learning, especially in medicine. What do I mean by this? Well, anybody who has ever learned machine learning knows the pattern: when you are presented with a new machine learning problem, you are supposed to read some literature of the domain or talk to a domain expert, and the expertise you acquire yourself or from someone else leads you to design a network appropriate for this particular task. That is obviously not a crazy way of thinking about it. However, these days we are training these neural networks on amounts of data that no human will ever be able to look at. There is no chance that a human radiologist is going to analyze 5 million imaging exams; there is just not enough time in a lifetime to do this.
The direction that I would really like to make work is to train neural networks on these massive datasets and then be able to squeeze knowledge out of the neural networks that would become some kind of scientific knowledge.
So in conclusion: deep learning really is the right tool for medical image analysis; at this point it is without a doubt. A lot of progress has already been made. Neural networks such as GMIC, which I presented, will definitely be used a lot in the future because they are efficient, they can learn from image-level labels, and they produce explanations natively as part of their training and inference.
But there is still a lot of room for exciting work, especially for 3D and multimodal data. AI is actually already superhuman in many medical image interpretation tasks, and that is going to become more and more apparent to everybody in the next few years. I think this is something that people are beginning to realize but have a hard time accepting. If you invite me in five years to give a talk, I’m very sure there is not going to be this kind of conversation of “Can AI detect things in images better than a human?” It is going to be very obviously in favor of the AI.
But it doesn’t mean that our work is done. We still have to understand how to use these models and deploy them in a safe way that benefits the users.
Obviously, this kind of research cannot be done by one person; it is a large team of students and collaborators that leads to the execution of this work. Actually, I’m showing only a subset here; there are plenty more.
So thank you very much and I am happy to answer any questions.
That’s great. Thank you so much, Krzysztof. Let’s open it up… It’s too bad we can’t applaud; on Zoom calls you can’t really applaud, but really, really fantastic talk.
I have a lot of questions, but let’s open it up to the floor to see who has questions for Krzysztof. It’s a great opportunity while we have him.
One of the things that came up as I was thinking about what you said about that super radiologist: those people exist and they’re fantastic, and what always comes to mind when I think about people like that is that their lifetime is limited; they’re not going to be with us forever. We’ve spent a lot of time and effort training that person, and when they’re not with us anymore, that’s gone. Your AI program, once it’s trained, can be copied and multiplied and basically live forever, right?
But it’s become a problem in lots of the areas I work in that you get these great algorithms, but the way you try to implement them really goes against the standard clinical pay structure.
For example, in one field I work in, bone, they have a great algorithm for looking at fracture risk from CT exams. It can run automatically with no interaction, and they come up with a wonderful prediction that everybody likes. But they want to put that into a standard paradigm, which means that you need a clinical indication to even run the algorithm, and every time they run it they want to be paid. And it’s not small, like a dollar; it’s significant, the reimbursement you would get for a normal radiologist reading. And it basically stymies the whole thing. You would often like to take these algorithms and run them opportunistically: for example, deploy your algorithm in a health care center that’s never had AI, do a baseline, and ask how many we missed and what new information we can learn about our entire population. But the cost structures are just not set up to do that.
Have you put any thought into how medicine could go more the way of the original ideal of OpenAI, where you put a lot of money into development, people are well compensated for the time and effort they put into development, but the algorithms are then effectively a service to society instead of a big profit center? What are your thoughts on that?
Yeah, I agree with you; I’ve had similar thoughts about this. Okay, let me give you an answer on several layers. Approximately two years ago… okay, so I know some people at a large foundation that has several billion dollars they want to invest over the next years. I was trying to convince them to allow me to create an institute; I think I proposed $30 million back then. The idea was basically to create, let’s say, models that everybody can access for a variety of different types of data. Of course, I’m not the director of this institute, so that proposal did not go through.
But I think that’s the future of doing this. There is no reason for so many different small companies to be creating competing tools that do either exactly the same or very similar things. At some point there’s either going to be a government push, or enough momentum in charity or the private sector, to create these kinds of models that everybody can build on top of.
I think to a certain degree it is already happening. For example, NVIDIA releases these kinds of models that they call foundation models. Eventually it has to go this way; that’s going to be the natural evolution of many different fields. Someone is going to create a massive model that becomes the foundation for building more specialized models using the representations learned by that massive model.
Yeah. I mean, when I go to the doctor and have an exam, I would like to have every opportunistic algorithm possible run on it, to give me as much information as possible at every visit. And that’s definitely not happening now.
I know. Yeah. But basically I think this is going in the direction that you want, in my opinion.
So I feel like if you had a giant dataset that was publicly available that anybody could train on, that might be just as good, right? I think the data is kind of a bottleneck for a lot of these things. So can you say something about the dataset you have? Is it the biggest mammography dataset in the world?
Do you have any thoughts on federated approaches? And who are the big players in giant datasets?
I think my dataset is probably not the biggest in the world right now. At a certain point, I concluded that my dataset is big enough that I can do machine learning on it, and I didn’t put that much effort into expanding it with data from other hospitals, etc.
I think the dataset created at Emory University, which is sort of semipublic, is actually larger than mine. The way it’s semipublic is that I believe they allow everybody to access 20% of it after some kind of vetting process. But I imagine that if you collaborate with them, maybe they will let you access everything.
For this kind of data, it is harder than for other types of data, because those images are actually huge. One problem is that I’ve been trying to release these kinds of datasets for years, approaching it from different angles, and every time I tried, I found it so hard that I gave up, even though I was actually reasonably persistent. Hospital systems as such are not very keen on releasing these large datasets, because there are lots of risks associated with releasing them. I don’t know how it would be in Europe, where maybe there’s different thinking about it, but in America people think about these datasets as property of the hospital system that maybe they will want to make some money on in the future. It’s kind of very natural, right? You probably have to think about it this way.
But even if it was possible, even if I could just throw these datasets out there today, it is still several hundred terabytes. Where do I even put this? How do people even download it? I don’t know of any good way of doing this. Actually, some people I know, for example, set up a server that they let you access if you have their permission. There is a university in Sweden where, if you fulfill certain criteria, they will let you log into their computer and do experiments on their dataset through the interface they enable you to use. But they don’t let you download anything.
That’s the Karolinska?
Mm hmm. Well, I have another question. Are you familiar with the paper that Karoline Freeman did that tried to compare performance across different AI models in different situations, from pre-selecting high risk cases to being a second reader? The interesting thing she found was that most of the studies really didn’t pass the criteria of being high quality studies. Do you know the paper I’m talking about? It was just two years ago.
I was wondering if you were familiar with that, because it provided a guide for study design for AI, to make sure there were as few biases as possible. In your dataset, do you keep, or even know, things like race and ethnicity? Is that a variable that you have access to for your mammography dataset?
Yeah, we controlled for that. We have done some experiments, I’m not sure if they are in any papers, where we extracted these variables and checked whether performance was consistent across different subgroups, and we found that there weren’t any massive changes between different races. I think what made a bigger difference was the density of the breast, because it’s just obviously harder to see the cancer in very dense breasts. But these models are generally fairly robust, you know?
If they’re trained on robust data, right?
Yeah. If the data is sufficiently large and it’s not subsampled in some uncontrolled way, then these models are typically quite robust to this kind of subgroup selection.
Any other questions?
For your deep learning approach, are you following the LLM type of models, like BERT? And are you looking at generating confidence levels in your decisions? With LLMs you can integrate very large datasets; the problem is that, as a university or a small company, it might not be affordable to start pumping these terabytes into the models. So: are you following LLM-type approaches for making the decision? And when you’re making the decision, are you looking at generating confidence levels, or intervals, or risks, or variances in the decision making? Thank you.
Maybe I’ll answer your second question first, about indicating some kind of uncertainty about the prediction. The short answer is that I’m not doing that right now, but it is something I’m actually working on with a collaborator. Uncertainty can be decomposed into, let’s say, some kind of inherent uncertainty of the data: the model is not confident when it says there’s a 70% chance of cancer rather than 90%, and you cannot do very much about that. But the useful thing you could possibly do is, when the model gives you its prediction, estimate how confident it is about that particular prediction. That’s what I’m working on.
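The speaker doesn't name a method here; as one common illustration, Monte Carlo dropout gives a per-case confidence estimate by sampling several stochastic predictions.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    model.train()  # keep dropout stochastic at test time (also affects batch norm)
    probs = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    # Mean is the prediction; spread across samples is a confidence signal.
    return probs.mean(dim=0), probs.std(dim=0)
```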
In terms of large language models, I’m not sure I fully understand your question, because large language models don’t make predictions on image data. But they are useful, for example, in analyzing pathology reports…
There is a big move into transformers, basically, and a big push on transformers where you’re integrating and fusing multiple kinds of information. That’s basically what I’m getting at.
I see. So yes, I am working in such directions as well. These papers aren’t published yet, so I cannot give you the full conclusions, but what we see is that vision transformers work reasonably well on these kinds of images. You can train them to a similar accuracy as convolutional neural networks, but there isn’t a huge difference.
I think the more interesting application of transformers is to train neural networks to make predictions based on a sequence of exams, for example, or to incorporate some other side information into the classification, maybe integrating patient history or something like that. It’s somewhat awkward to integrate this kind of information into a convolutional neural network; it’s much more natural in the context of transformers.
I have another question; it has to do with quality control. Let’s say your algorithm is implemented and being used in the field, and multiple different systems are seen to be valid using retrospective data. How do you assure, on an individual system, that the algorithm stays in calibration for its accuracy over time, even though you might have changes in detectors, software upgrades, or algorithm changes in the preprocessing of the images? How do you do quality control with AI detection and classification algorithms in the field in a practical way?
I think the short answer is that, I believe… I don’t want to convince you of something untrue, but at least I haven’t seen a practical system that would actually do that with medical AI. However, there is this kind of a big idea which is called… I forgot what… but there is a trend of thinking in this manner.
I don’t think there’s one single universal way of ensuring that these models don’t go astray; I think it’s more a collection of sanity checks. For example, one idea you could apply is to check whether the representations your neural network is computing are not too different from the representations computed on historical data, and do some kind of anomaly detection there. I imagine you could also do a lot of monitoring of the inputs, for example low-level statistics of the inputs, to make sure they actually fit what your network was trained on. But I think whatever you come up with is going to be a collection of tricks rather than some kind of systematic solution. I don’t have an idea for what exactly a new neural network to control the old neural network would look like.
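A minimal sketch of the input-monitoring sanity check mentioned above; the statistic and threshold are illustrative, and a real deployment would need a calibrated alarm rule.

```python
import numpy as np

def drift_alarm(batch_images, train_mean, train_std, n_sigmas=4.0):
    # Compare a low-level statistic of incoming images against the
    # training distribution and flag batches that drift too far.
    batch_mean = np.mean([img.mean() for img in batch_images])
    z = abs(batch_mean - train_mean) / (train_std + 1e-8)
    return z > n_sigmas  # True -> inputs look unlike the training data
```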
I don’t know if you saw the paper at RSNA, but there was a presentation from someone in the U.K. who had implemented an AI algorithm and monitored patients’ recall rates over time. There were 24 different centers, and they were averaging about 10%. Then something happened, one at a time at each of the centers, where the recall rate went up to 30%, and he couldn’t figure out what it was until he did an investigation. It turned out to be a software upgrade that the algorithm was not dealing with very well. So it’s a critical problem, and I think it impedes global implementation to have a stable solution in the field, because every ten years or so systems change technology.
Just one more follow-up question. You mentioned how you applied your algorithms to single slices of the DBT. Did you retrain your full field algorithm for the single slices of the tomosynthesis, or did you find that it worked pretty well without retraining?
Oh, so it’s neither.
It’s actually a slightly different network. It is basically looking at the entire volume.
Okay. So it’s a completely different algorithm. Okay.
I mean, it’s not completely different; it’s another way of using similar ideas. But let’s say this is not just an application: we didn’t just take the model trained on full field digital mammography and apply it to the DBT. It’s a model trained from scratch.
Okay, I understand we’re running out of time, but any other questions?
I had one more question about the ultrasound models you’re working with. We’re very interested in the ultrasound models because there are a lot of Pacific islands that don’t have mammography. And Arianna is a student, her microphone’s down, but she was very interested in knowing whether you have plans to release some of the ultrasound models that you’ve built, since those are very interesting to us.
So… I probably should not say very much about this while I’m being recorded… My honest intention is always to release everything we do, but I’m not in full control of how we release it, because I’m not an owner of these models. It is constantly on my mind to release those other models that we’ve created in the past, because we have some backlog of models that we created and haven’t released…
Okay, so long story short is that if you email me, I think I actually have a way now to share it with you rather than sharing it publicly.
Great. Yeah, there are lots of things we’d like to follow up with you on as well. Peter, do you want to close this out, or do you want me to do it?
Yeah. Thank you so much for your talk. This was very educational and very inspiring, with lots of interesting ideas to talk about in the future: explainable AI, multimodal, how do we fuse the two? We should keep the conversations going. So let’s give another round of applause to Krzysztof, and have a great weekend.