From Deadline to Data
Dr. Carlos Moreno is an Associate Professor in the Department of Pathology and Laboratory Medicine and the Department of Biomedical Informatics, and a faculty member of the Winship Cancer Institute.
Dr. Moreno needed to process whole exome sequencing data for about 40 prostate cancer samples with the Genome Analysis Toolkit (GATK) software. His pipeline was running on the on-premises BMI cluster, where he could process only one sample at a time because the cluster's compute nodes are shared among many users and few were available.
To meet a deadline, he asked the AWS at Emory Service Team about speeding up the analysis in the cloud. After a quick meeting to review the requirements and confirm that AWS was a good fit, the team gave Dr. Moreno and his student instructions for using AWS EC2 and S3 and set up an EC2 instance as a template. They also helped modify his script to scale the analysis across multiple EC2 instances. Dr. Moreno was then able to launch about 17 EC2 instances in parallel and finish the analysis in about 48 hours.
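The article does not describe the modified script itself. As a rough illustration of scaling one pipeline across several instances, the sketch below splits a sample list into one batch per instance. The function name `partition_samples` and the sample IDs are hypothetical, and the round-robin split is an assumed scheme, not necessarily the one the team used.

```python
def partition_samples(samples, n_workers):
    """Split samples round-robin into n_workers batches.

    Batch sizes differ by at most one, so no instance is left
    with much more work than another.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [samples[i::n_workers] for i in range(n_workers)]


# Hypothetical sample IDs standing in for the ~40 exome samples.
samples = [f"sample_{i:02d}" for i in range(1, 41)]

# One batch per EC2 instance (about 17 in the article).
batches = partition_samples(samples, 17)

for idx, batch in enumerate(batches):
    print(f"instance {idx:02d}: {', '.join(batch)}")
```

With 40 samples and 17 instances, each instance receives two or three samples, and each batch could then be passed to a copy of the pipeline running on its own instance.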
“I reached out to the AWS team and they were very accommodating and helpful. They worked several hours with me on Zoom (and after hours) to help me get started. I am very appreciative of the service that they provided. I learned a lot and couldn’t have done it without their help,” says Dr. Moreno. He is currently working on an abstract and hopes to include some of the results from this analysis.