Machine learning of genotype-phenotype associations in colorectal cancer from mutations and tumor images
The immune system plays a critical role in fighting cancer. In colorectal cancer, immune cells infiltrate tumors, influencing tumor growth and patient outcomes. Understanding the spatial distribution of these immune cells can reveal patterns associated with response to different treatments. Segmentation neural networks can automatically detect and classify individual cells in tumor slide images, which makes it possible to study where the cells are located and how their locations relate to the tumor’s mutational profile.
- Literature overview: Review existing segmentation models for detecting different cell types in images.
- Data preparation: Gather and preprocess tumor slide images from the TCGA database.
- Segmentation model selection: Choose a neural network model to identify immune cells in the images.
- Spatial data processing: Extract cell type and spatial location information to build a structured dataset.
- Machine learning analysis: Develop a model to analyze how immune cell distribution correlates with different characteristics of the tumor (tumor mutational burden, neoantigen count, etc.).
The diagnostic slide images from TCGA are quite large and stored in SVS format, which contains high-resolution pathology images.
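SVS files store each slide as a multi-resolution pyramid, so exporting a slide to an ordinary image usually means choosing one of the downsampled levels. The following is a minimal sketch of that idea, assuming the `openslide-python` package (with Pillow) is installed. The helper names `pick_level` and `svs_to_png` and the `max_side` default are illustrative choices, not the settings used for this project.

```python
from pathlib import Path


def pick_level(level_dimensions, max_side):
    """Return the index of the highest-resolution pyramid level that fits.

    level_dimensions is a sequence of (width, height) pairs ordered from the
    full-resolution level 0 downwards, as exposed by OpenSlide.
    """
    for level, (width, height) in enumerate(level_dimensions):
        if max(width, height) <= max_side:
            return level
    # Nothing fits: fall back to the smallest level available.
    return len(level_dimensions) - 1


def svs_to_png(svs_path, out_dir, max_side=4096):
    """Save one downsampled pyramid level of an SVS slide as a PNG file."""
    import openslide  # imported here so pick_level works without the library

    slide = openslide.OpenSlide(str(svs_path))
    try:
        level = pick_level(slide.level_dimensions, max_side)
        size = slide.level_dimensions[level]
        # read_region takes level-0 coordinates and returns an RGBA PIL image.
        image = slide.read_region((0, 0), level, size).convert("RGB")
    finally:
        slide.close()

    out_path = Path(out_dir) / (Path(svs_path).stem + ".png")
    image.save(out_path)
    return out_path
```

If even the smallest level exceeds `max_side`, the sketch still exports that level, so the resulting PNG can be larger than requested.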
To make local analysis practical, I created a subset of tumor slides, converted them to PNG format, and uploaded them to Google Drive for easier access. You can download the subset from the following link:
- Go to the GDC Data Portal.
- Select "Colorectal" as the cancer type. This will create a new cohort.
- Save your cohort by clicking the save button as shown in the screenshots.
- Navigate to the Repository page.
- Ensure that you are working with the Colorectal cohort.
- In the Filters section, locate the "Experimental Strategy" panel.
- Select "Diagnostic Slide".
- Click "Add All Files to Cart" to add the selected images.
- Navigate to your cart by clicking on the cart icon in the top-right corner.
- Since the dataset is large, it is recommended to use the GDC Data Transfer Tool.
- Download the tool from the GDC Data Transfer Tool page.
- Follow the instructions provided by GDC to download your selected images efficiently.
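Once a manifest file has been exported from the cart, the download step can be scripted. The sketch below makes two assumptions: that the manifest is a tab-separated file with a `size` column in bytes (the usual layout is `id`, `filename`, `md5`, `size`, `state`, but check your own file), and that the `gdc-client` binary is on your `PATH`. The function names and file paths are illustrative only.

```python
import csv
import io


def summarize_manifest(manifest_text):
    """Return (file_count, total_bytes) for a tab-separated GDC manifest."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    rows = list(reader)
    # Assumes a "size" column holding the file size in bytes.
    total_bytes = sum(int(row["size"]) for row in rows)
    return len(rows), total_bytes


def gdc_download_command(manifest_path, out_dir):
    """Build the gdc-client call that downloads every file in a manifest."""
    # -m points at the manifest, -d at the download directory.
    return ["gdc-client", "download", "-m", manifest_path, "-d", out_dir]
```

Checking the total size first is useful because diagnostic slides are large. The command list can then be passed to `subprocess.run(gdc_download_command("gdc_manifest.txt", "slides"), check=True)`, where both arguments are placeholder paths.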