Multi-omics through AWS at Emory
Dr. Eliver Ghosn.
“We are using AWS for so many things.”
When Dr. Eliver Ghosn (Asst. Professor, Immunology, SOM) moved to Emory from Stanford University about three years ago, he knew he would need partners in IT who could help his new lab with its enormous computational needs. His solution was to turn to the AWS at Emory team. “They were truly helpful in getting me settled in at Emory,” says Ghosn.
The Ghosn Lab studies the development and function of the mammalian immune system at the single-cell level. Biomedical research is moving so quickly, especially in cellular biology, that the amount of data obtainable from a single cell is now huge. For Ghosn and his colleagues, it became almost impossible to analyze those datasets on local computers. They needed to turn to data science to mine all of the datasets that they generate in the lab.
Ghosn needed AWS to analyze the lab's single-cell multi-omics data. In this context, multi-omics means single-cell RNA sequencing in which they measure many different parameters of each cell and then repeat the measurement across thousands of cells. Multiply that by the 25,000 genes in the genome and you begin to see how many data points they have.
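To make that multiplication concrete, here is a minimal back-of-the-envelope sketch. The 25,000-gene figure comes from the article; the 10,000-cell run size and the 4 bytes per stored value are illustrative assumptions, not numbers from the Ghosn Lab.

```python
def expression_matrix_entries(n_cells: int, n_genes: int = 25_000) -> int:
    """Number of cell-by-gene data points in one run (one value per cell per gene)."""
    return n_cells * n_genes


def dense_size_gb(n_entries: int, bytes_per_value: int = 4) -> float:
    """Size in gigabytes if every data point were stored as a fixed-width number.

    bytes_per_value=4 is an assumption (e.g. a 32-bit count).
    """
    return n_entries * bytes_per_value / 1e9


if __name__ == "__main__":
    # 10,000 cells is a hypothetical run size used only for illustration.
    entries = expression_matrix_entries(10_000)
    print(f"{entries:,} data points")               # 250,000,000 data points
    print(f"{dense_size_gb(entries):.1f} GB dense")  # 1.0 GB dense
```

This counts only the final cell-by-gene table for a single run. It does not model the raw sequencing files, which are what the analysis pipeline actually has to process.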
“We brought the first single-cell platform to Emory, designed by 10x Genomics,” says Ghosn. “We acquired the instrument to run this platform so that we could do all of our work in-house. Initially, I was doing all of the analysis manually. 10x Genomics had a pipeline called Cell Ranger that used a Linux system. But the datasets we would get would be close to a terabyte in size so running them on a local computer was no longer feasible.”
Plus, the computing power they needed kept changing, so they couldn’t simply keep buying computers. They decided to move to AWS.
At first, they kept the Linux system but transferred all of their data to AWS, where they ran Cell Ranger to analyze it, and then transferred the output back to the Linux system. Then the AWS at Emory team wrote a script to automate this process. Now they can run a new analysis each week and keep the massive amounts of data on AWS.
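The article does not reproduce the team's script, so the following is only a rough sketch of what an automated "fetch, analyze, store" loop of this kind might look like. It assumes the AWS CLI and Cell Ranger are installed on the compute instance; the bucket name, the `raw/` and `results/` prefixes, the run ID, and the reference path are all hypothetical placeholders, and the exact Cell Ranger flags required vary by version.

```python
import shlex
import subprocess
from typing import List


def build_pipeline_commands(bucket: str, run_id: str, transcriptome: str) -> List[List[str]]:
    """Return the three commands for one run, in order.

    The S3 layout (raw/<run_id>/ and results/<run_id>/) is a hypothetical
    convention chosen for this sketch.
    """
    fastq_dir = f"fastq/{run_id}"
    return [
        # 1. Copy the run's raw sequencing files from S3 to the compute instance.
        ["aws", "s3", "sync", f"s3://{bucket}/raw/{run_id}/", fastq_dir],
        # 2. Run Cell Ranger; it writes its results under ./<run_id>/outs.
        #    Newer Cell Ranger releases may require additional flags.
        ["cellranger", "count", f"--id={run_id}",
         f"--transcriptome={transcriptome}", f"--fastqs={fastq_dir}"],
        # 3. Store the results back in S3 so the data stays on AWS.
        ["aws", "s3", "sync", f"{run_id}/outs/", f"s3://{bucket}/results/{run_id}/"],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # All names below are placeholders, not the lab's actual resources.
    cmds = build_pipeline_commands("example-lab-bucket", "run-001", "/refs/example-transcriptome")
    run_pipeline(cmds, dry_run=True)
```

Building the commands separately from running them keeps the sketch safe to try: with `dry_run=True` it only prints what would be executed.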
The Ghosn Lab is divided between basic research science and technology development. The lab's students all have AWS user accounts and can control their own data and access. Several bioinformatics students from Georgia Tech also come to the lab to do research and help build visualizations of the data. Through sponsored accounts, the Georgia Tech researchers can use Emory’s AWS environment to build a matrix of the data on which many different pipelines can run analyses.
Laughs Ghosn, “And we are constantly updating our analyses. Our lab is expanding.”
Another element of the AWS at Emory service that has helped Ghosn’s research is that the massive data generated by genomic sequencing can be transferred directly from the sequencer into the lab's AWS account without first being downloaded to local machines. Scripts written by the AWS at Emory team automate the transfer, making it quicker and more reliable.
Additionally, AWS provides backups that keep their data much safer than storing it on a local Emory server.
“Automating this whole system allows us to keep our lab in the top tier and become leaders in the field,” says Ghosn. “This lab was the first to bring the 10x Genomics platform to Emory and after three years we have developed a reputation for generating quality results, both reliably and quickly.”
Regarding his collaboration with the AWS at Emory team, Ghosn says, “I tell our colleagues to go ahead and get an AWS account. It makes life easier and gives you peace of mind that your scientific work is safely kept, uploaded where it needs to go, can be accessed from anywhere, and doesn’t require a computer science background to run it. AWS at Emory has all of this and more.”