Q&A: Tiffani Williams, computer scientist, on creating an open source tree of life

July 29, 2013

Tiffani Williams

The Open Tree of Life project draws on years’ worth of segmented scientific research in an effort to create a current, open source version of our knowledge of thousands of plant and animal species. Tiffani Williams, a computer scientist at Texas A&M University who is working on the project, said the Open Tree of Life will eventually be a Wikipedia-like living document for scientists and the community to edit and use for research.

I spoke recently with Williams about the segmented nature of the tree of life, the challenges of the project and how an open tree of life could impact science in schools. Below are excerpts from our interview.

What is the tree of life and why should people care about it?

One way I explain the tree of life is to think about it from the human perspective. A lot of us are interested in understanding our family tree. We want to know about our grandparents and great-great-grandparents and down the line. Part of that is this whole notion of where we fit in the world. Who are we? That’s certainly one aspect of a family tree. But there’s another aspect too. For example, when you go to the doctor, they’ll ask you about your family history. High blood pressure and heart disease [in your family] can be signs that you might be impacted, as well. We as human beings have this notion of appreciating our family history. All the tree of life does is take that to another level. Instead of thinking of a family in terms of your human ancestors, the tree of life is the world’s ancestry, which includes all of the world’s organisms. It’s still thought of as a family tree, but the context is much broader.

When thinking of the tree of life, we want to have a family tree that shows the relationship between all of the world’s organisms. Right now we have about 1.7 million species that we know of, but that’s not all of life. There are some who estimate that there are probably around 10 million organisms. Others predict it’s around 100 million organisms. You’re talking about millions of species. But we don’t know how much life actually exists. Sometimes we think we know everything about the world, especially when we talk about life forms. But we don’t.

Many species are becoming extinct at rapid rates. Big cats — the lion, jaguar, snow leopard — are on the endangered species list at some level. There are others we don’t have a lot of information about. They’re not in museums. We can’t study them as our technology improves. But if they become extinct, we will never have had a chance to really understand how they actually were an important part of our world. A lot of people talk about the tree of life from the medicinal perspective. But for me, it’s this whole notion of understanding who we are and understanding our place in the world.

In your PopTech talk, you said research on the tree of life is very segmented. Some scientists are studying insects and others are studying flowering plants and all of the thousands of other distinctive species. How will the Open Tree of Life project unify them?

If we wanted to connect our two family trees, we would look at our separate trees to find points where they intersect. If we go back far enough in history, are there going to be any ancestors that have something in common? We can use that connection of points to put our two trees together. We need to figure out how to take all the separate trees and merge them into a single one. We have 1.7 million species we know exist. We build this base tree, a foundation tree. Then all the different trees we can get our hands on — trees researchers have published on spiders and beetles and big cats — are layered on top. That will give us a view of where we are with our knowledge of the tree of life.
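The layering idea described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the project's actual method: each tree is assumed to be a simple child-to-parent mapping, the function names `nodes`, `layer` and `lineage` are hypothetical, and the example groupings are illustrative rather than project data.

```python
# Toy sketch: each tree is a dict mapping a node name to its parent's name.
# The function names and example data are illustrative assumptions.

def nodes(tree):
    """Return every name that appears in the tree, as child or parent."""
    return set(tree) | set(tree.values())


def layer(base, study):
    """Layer a study tree on top of a base tree.

    The trees are joined at the names they share. Where both trees give
    a parent for the same node, the study's placement replaces the base
    tree's placement.
    """
    if not nodes(base) & nodes(study):
        raise ValueError("the trees share no names, so there is nowhere to join them")
    merged = dict(base)
    merged.update(study)
    return merged


def lineage(tree, name):
    """Walk from a node up to the root, collecting names along the way."""
    path = [name]
    while path[-1] in tree:
        path.append(tree[path[-1]])
    return path


# A coarse base tree, standing in for the foundation tree.
base = {
    "lion": "Felidae",
    "jaguar": "Felidae",
    "house cat": "Felidae",
    "Felidae": "Carnivora",
}

# A more detailed study of big cats that shares names with the base tree.
study = {
    "lion": "Panthera",
    "jaguar": "Panthera",
    "Panthera": "Felidae",
}

merged = layer(base, study)
print(lineage(merged, "lion"))       # ['lion', 'Panthera', 'Felidae', 'Carnivora']
print(lineage(merged, "house cat"))  # ['house cat', 'Felidae', 'Carnivora']
```

The sketch lets the study simply override the base tree, which only works when the two agree on everything else. It makes no attempt to handle studies that contradict one another.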

This is not the tree everyone will agree on. The tree represents our state of knowledge. It relates to the different analyses we found to date. The tree is a combination of all of the researchers’ efforts to understand parts of the tree of life. Our goal is to understand it more holistically. There are going to be parts that we may not have enough information about. There’s certainly a lot of conflict. We’re showing what is possible now. We want to get the community involved to continue making refinements and improvements of the tree, so that it more accurately reflects the truth.

How long have you been working on this project and what have you done so far?

The project has been around for about a year. One of our goals was to get a tree of life out to the public, but also to the scientific community, within the first year. We’d like more people to help by giving us trees. Many older papers simply describe the tree. There is not necessarily a visual artifact of the tree except the drawing in the paper. Our first release of this tree has been about taking trees already in digital form in different databases and building them into this first tree of life. The next step will look at what’s missing. Are there any essential studies that haven’t been incorporated?

We try to get every paper we can find that talks about a new relationship between some organisms of interest. If we don’t have the digital version of the tree, we contact the author. A lot of our e-mails weren’t answered. At the end of the day, I hope the community will see the value in producing a scientific artifact that is made available so people can use it, especially if you want your work to be part of something larger.

What are the other challenges involved with this project?

The next challenge for me is about how to manage expectations. We’re saying we’re building a tree of life and it’s true. But really we’re building a tree of life based on the information we were able to obtain that’s out there. We’re using our current state of knowledge to put those findings together in a meaningful way. Once the tree is out there, we’ll see how good a job we’ve done in terms of helping people interpret it. That’s a big challenge.

Another challenge is more technical. We have all these different trees and we’re trying to come up with a meaningful way to merge them. It could be that two findings just outright conflict with each other. How do you represent conflict in the tree of life? Handling conflict in the tree of life is something that we are facing now and will be facing for quite some time. People use different data. People study different perspectives. Their results are fine in their environment. But when we put them together in a holistic way, they may conflict. We want to show that, but it’s unclear what the best way to do that is.
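One common way to make "conflict" concrete is to compare the groups each tree proposes. The Python sketch below assumes each study is reduced to a set of groups (each group a set of species names) over the same species. Two groups are treated as compatible when they are disjoint or one contains the other. The function names and the two example studies are hypothetical, invented only to show the check, and do not represent real findings or the project's own approach.

```python
from itertools import product

# Toy sketch: a tree is represented as a set of groups, and each group
# is a frozenset of species names. The studies below are made up.

def compatible(group_a, group_b):
    """Two groups fit in one tree if they are disjoint or nested."""
    return group_a.isdisjoint(group_b) or group_a <= group_b or group_b <= group_a


def conflicts(tree_a, tree_b):
    """Return every pair of groups, one per tree, that cannot coexist."""
    return [(a, b) for a, b in product(tree_a, tree_b) if not compatible(a, b)]


all_three = frozenset({"lion", "jaguar", "snow leopard"})

# Hypothetical study 1 pairs the lion with the jaguar.
study_1 = {frozenset({"lion", "jaguar"}), all_three}

# Hypothetical study 2 pairs the lion with the snow leopard instead.
study_2 = {frozenset({"lion", "snow leopard"}), all_three}

for a, b in conflicts(study_1, study_2):
    print(sorted(a), "conflicts with", sorted(b))
# ['jaguar', 'lion'] conflicts with ['lion', 'snow leopard']
```

A check like this only finds the disagreements. Deciding how to display them in a single tree is the open question described above.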

What do you hope this looks like one day? What’s the long-term goal?

This project is community-enabled. We’re the people getting it started, but the goal is longevity. How much is the community able to embrace this project? How can we entice the community to embrace this effort of building a tree, even though the initial efforts were just a few people? We’d like it to be thought of in terms of Wikipedia. I don’t use Wikipedia all the time. But there have been a few times when they disable the site and you realize, ‘I depend on Wikipedia.’ That would be great for the tree of life in the sense that people who do phylogenetic studies can’t imagine working without access to it.

What’s next for you and this work?

I’m a big believer in the importance of education. I think about how having a tree of life could inform how we teach science in our schools. Does the tree of life allow us to produce some curricular changes? Does it allow us to have children with a better connection with the world because they can use the tree of life? Those are the kinds of things I’m interested in for the future.