https://www.nature.com/articles/d41586-018-03358-3
Researchers have used artificial intelligence (AI) to discover nearly
6,000 previously unknown species of virus. The work, presented on 15
March at a meeting organized by the US Department of Energy (DOE),
illustrates an emerging tool for exploring the enormous, largely unknown
diversity of viruses on Earth.
Although viruses influence
everything from human health to the degradation of trash, they are hard
to study. Scientists cannot grow most viruses in the lab, and attempts
to identify their genetic sequences are often thwarted because their
genomes are tiny and evolve fast.
In recent years, researchers have hunted for unknown viruses by sequencing DNA in samples taken from various environments. To identify the microbes present, researchers search for the genetic signatures of known viruses and bacteria
— just as a word processor’s ‘find’ function highlights words
containing particular letters in a document. But that method often
fails, because virologists cannot search for what they do not know. A
form of AI called machine learning gets around this problem because it can find emergent patterns in mountains of information. Machine-learning algorithms parse data, learn from them and then classify information autonomously.
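To make the limitation of signature searching concrete, here is a minimal sketch in Python. The signature names and sequences are invented for illustration; real pipelines compare reads against large curated reference databases and tolerate mismatches. The point is only that an exact search cannot flag a sequence whose signature is not already on file.

```python
# Hypothetical reference signatures, invented for this example.
KNOWN_SIGNATURES = {
    "virus_A": "ATGCGTAC",
    "bacterium_B": "GGCTTAAC",
}


def find_known(read: str) -> list[str]:
    """Return the names of known signatures that occur verbatim in the read."""
    return [name for name, sig in KNOWN_SIGNATURES.items() if sig in read]


# A read that contains the virus_A signature is identified.
hits = find_known("TTATGCGTACGG")

# A read from an unknown organism matches nothing, so it goes unnoticed.
misses = find_known("TTCCGGAATTCCGG")
```

A learned classifier, by contrast, scores a sequence on statistical features rather than on an exact match, which is why it can flag sequences that nobody has catalogued.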
“Previously,
people had no method to study viruses well,” says Jie Ren, a
computational biologist at the University of Southern California in Los
Angeles. “But now we have tools to find them.”
For the latest
study, Simon Roux, a computational biologist at the DOE Joint Genome
Institute (JGI) in Walnut Creek, California, trained computers to
identify the genetic sequences of viruses from one unusual family,
Inoviridae. These viruses live in bacteria and alter their host’s
behaviour: for instance, they make the bacteria that cause cholera, Vibrio cholerae,
more toxic. But Roux, who presented his work at the meeting in San
Francisco, California, organized by the JGI, estimates that fewer than
100 species had been identified before his research began.
Roux presented a machine-learning algorithm with two sets of data — one containing 805 genomic sequences from known Inoviridae, and
another with about 2,000 sequences from bacteria and other types of
virus — so that the algorithm could find ways of distinguishing between
them.
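The article does not say which algorithm or features Roux used, so the following is only a toy sketch of the general idea of training on two labelled sets. It assumes a deliberately crude feature (the fraction of each base) and a nearest-centroid rule; the sequences and all function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def composition(seq: str) -> list[float]:
    """Fraction of each base in the sequence (a deliberately crude feature)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES) or 1
    return [counts[b] / total for b in BASES]


def centroid(seqs: list[str]) -> list[float]:
    """Average feature vector of one labelled training set."""
    vectors = [composition(s) for s in seqs]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(positives: list[str], negatives: list[str]) -> dict[str, list[float]]:
    """Summarise each labelled set by its centroid."""
    return {"target": centroid(positives), "other": centroid(negatives)}


def classify(model: dict[str, list[float]], seq: str) -> str:
    """Assign the label whose centroid is closest to the sequence's features."""
    feats = composition(seq)

    def squared_distance(center: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, center))

    return min(model, key=lambda label: squared_distance(model[label]))


# Toy training data, invented for illustration: AT-rich versus GC-rich.
toy_positives = ["ATATTAATAT", "TTATAATTAA"]
toy_negatives = ["GCGGCCGCGC", "CCGCGGGCCG"]

model = train(toy_positives, toy_negatives)
label_at = classify(model, "ATTTAAATAT")
label_gc = classify(model, "GGCCGCGGCC")
```

In practice the two training sets would be the known Inoviridae genomes and the bacterial and other viral sequences, and the features and model would be far richer than base composition.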
Next, Roux fed the model massive metagenomic data sets. The
computer recovered more than 10,000 Inoviridae genomes, and clustered
them into groups indicative of different species. The genetic variation
between some of these groups was so wide that Inoviridae is probably
many families, he said.
Viral learning
In a separate
study, Deyvid Amgarten, a bioinformatician at the University of São
Paulo in Brazil, deployed machine learning to find viruses in compost
piles at the city’s zoo. He programmed his algorithm to search for a few
distinguishing features of virus genomes, such as the density of genes
in DNA strands of a given length. After the training, the computer
recovered several genomes that seem to be new, says Amgarten, who
presented his results at the JGI meeting. The final step will be to
learn what proteins those viruses produce, and see whether any of them
speed the rate at which organic matter breaks down. “We want to improve
the efficiencies of composting,” he says.
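Gene density, one of the features mentioned above, is simple to compute once genes have been predicted on a stretch of DNA. The sketch below assumes the gene count comes from a separate gene-prediction step that is not shown; the function name and the numbers are hypothetical and are not taken from Amgarten's work.

```python
def gene_density(gene_count: int, contig_length_bp: int) -> float:
    """Return the number of predicted genes per kilobase of sequence."""
    if contig_length_bp <= 0:
        raise ValueError("contig length must be positive")
    return gene_count * 1000 / contig_length_bp


# Invented example: 12 predicted genes on an 8,000-base-pair contig.
density = gene_density(gene_count=12, contig_length_bp=8000)
```

A value like this would be one input among several to a classifier, not a test for viral origin on its own.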
Amgarten took his cue from a machine-learning tool reported last year, called VirFinder,
from Ren’s team. VirFinder is programmed to notice combinations of DNA
letters, such as AT or CG, in DNA strands. Ren applied the algorithm to
metagenomic samples from faeces of healthy people and those with liver
cirrhosis, a condition caused by diseases ranging from hepatitis to
chronic alcoholism. Once the machine classified groups of viruses in the
samples, the team noticed that particular types were more or less
common in healthy people than in those with cirrhosis, suggesting
that some viruses might play a part in disease.
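Counting combinations of DNA letters is usually done with a sliding window over the sequence. The sketch below is not VirFinder's code; it only illustrates the kind of feature described above, assuming two-letter combinations and a made-up input sequence.

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int = 2) -> dict[str, float]:
    """Return the relative frequency of each length-k substring in seq."""
    seq = seq.upper()
    # Slide a window of width k along the sequence, one letter at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# "ATATCG" yields five windows: AT, TA, AT, TC, CG.
freqs = kmer_frequencies("ATATCG")
```

Frequency profiles of this kind can then be handed to a classifier trained on sequences of known origin.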
Ren’s is a
tantalizing finding: biomedical researchers have long wondered whether
viruses contribute to the symptoms of several elusive conditions, such
as chronic fatigue syndrome (also known as myalgic encephalomyelitis)
and inflammatory bowel disease. Derya Unutmaz, an immunologist at the
Jackson Laboratory for Genomic Medicine in Farmington, Connecticut,
speculates that viruses might trigger a destructive inflammatory
reaction — or they might modify the behaviour of bacteria in a person’s
microbiome, which in turn could destabilize metabolism and the immune
system.
With machine learning, Unutmaz says, researchers might
identify viruses in patients that have remained hidden. Further, because
AI has the ability to find patterns in massive data sets, he says, the
approach might connect data on viruses to bacteria, and then to protein
changes in people with symptoms. Says Unutmaz, “Machine learning could
reveal knowledge we didn’t even think about.”