Wednesday, December 23, 2009

Paradigm 4

If there's one thing I hate, it's the monotony of generating data. I'm sequencing 48 samples right now, and let me tell you, that is not a barrel of laughs. I am all too aware that what I am doing could probably be done faster and better by a halfway decently designed machine, and consequently I just sit there daydreaming of a world in which I have that machine. This leads to pipetting errors, which leads to more frustration, which, as you can imagine, leads to more daydreaming. It's a cycle of violence on the microliter scale.

Which is all just an intro to explaining why I loved this article in the New York Times on the data deluge. Sequencing machines (among other things) are generating so much data that it's actually the analysis that becomes the limiting step. Eventually there will be so much data output that there will be little need for pipetters and a great need for analyzers. That's right, the limiting reagent is actually human brain hours.

Mind you, not just any human brain hours will do. To understand this data we need people well versed in computer science. Computers, after all, are the only things capable of reading off the billions of bases involved in anything approaching a reasonable time. A human reading of just a single human genome at one base per second (no breaks, no sleeping) would take over 95 years to complete.
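For the skeptical, the arithmetic is easy to check. A quick sketch in Python (assuming a round three billion bases; the exact figure doesn't change the moral):

    # How long to read a whole genome aloud, one base per second,
    # around the clock? Assumes ~3 billion bases.
    bases = 3_000_000_000
    seconds_per_year = 60 * 60 * 24 * 365
    print(bases / seconds_per_year)  # ~95.1 years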

Our limiting-reagent brain also needs to be versed in statistics, to account for the fact that comparisons are made on the genomic scale, and possibly between large populations. Signal is well hidden by noise, and the noise isn't even necessarily as random as we would like it to be. After all, this is a living, breathing genome we're talking about, not the static string of A's, C's, G's and T's we often imagine it to be.
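To put a toy number on the multiple-testing problem: test twenty thousand genes at the usual p < 0.05 and noise alone hands you a stack of 'discoveries.' A back-of-envelope sketch (all numbers illustrative):

    # Why genome-scale comparisons need statistical care: at the usual
    # threshold, pure noise "discovers" a thousand genes.
    tests, alpha = 20_000, 0.05
    print(f"expected false positives: {tests * alpha:.0f}")        # 1000
    print(f"Bonferroni-corrected threshold: {alpha / tests:.1e}")  # 2.5e-06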

Which brings us to the third necessity. The analytical brain that we need also, ideally, should have a strong understanding of molecular biology and the biology of any disease in question. Cells are complicated. Ridiculously complicated. But we do know a pretty enormous amount about how they operate. A thorough understanding of this prior knowledge helps us ask more pointed questions of the data in hand.

Have I overdetermined the system yet? Probably. In the end, deciphering this data is going to take a lot of collaboration. I've seen a lot of attempts at all-in-one prepackaged analysis engines for sequencing data. None of them, so far, looks very impressive. Moreover, understanding the output of such packages is its own special challenge, since their inner workings are often closed source or poorly documented. It's hard to trust or interpret results you didn't generate yourself.

So will this data flood answer the big questions of our age? Are we going to find the cure for cancer? Perhaps at least the cure for a cancer? If we do it will be not only because of our ability to design and execute good experiments, but also our creativity in sifting the results.

Friday, December 11, 2009

The Panopticon, Part II: Control

Having just opened a machine of revolutionary scientific power, you convene your graduate students to discuss the possibilities.  

Yes, I know this is ridiculous; what PI respects and trusts his grad students enough to share this kind of information? He'd go straight to his fellow PIs, right? Just roll with it, people.

Sitting down with your students, you first carefully explain the circumstances of finding the machine and the mysterious booklet with its unbelievable claims (see Part I). Your students listen attentively, leaning forward in their chairs with increasing eagerness. They've seen the strange device stowed away in the corner of the lab, and they know this is not one of your endless hypotheticals. You end your explanation quickly and ask:

"So, what should we do with the machine? We have 25 uses and we'd better use them well!"

Abe: "Well clearly we should start analyzing people with Our Favorite Disease! We can look at 25 of them, which should give us a good sense of what's going on in OFD."


Ben: "What's going on in OFD? Do we even know what we're looking for?"

Abe: "Well first of all we can see if parasite X is present in OFD. I mean that theory has been kicking around for a long time now."

Charles: "But what if there are no parasites, or only a few parasites? Do you want to waste all of these experiments just looking for parasites?"

Abe: "Well that's the great thing about it! I mean if you really can look at everything, then we can answer a lot of questions at once. Like what about the theory that OFD patients have greater p123 signaling? We'll be able to count the p123 and answer that one right off the bat, too."

Ben: "Hold on bucko, what are you saying? If we just look at 25 OFD patients we won't have any idea whether p123 is high or low. We'll just know what it is on average in those patients, we have no basis for comparison."

Charles: "Yea Abe, slow down. We need to think through this. What are the proper controls?"

Abe: "Proper controls? Look we only have 25 runs of this thing, we need to be focusing on interesting samples, not normal everyday people. What if we don't look at enough sick patients and miss something important?"

Charles: "We can't do this without controls. There's just no way. You have to be able to compare the patients to some estimate of what's normal, and there's no other way to know what's normal without using some runs of the Panopticon. I mean, we could use previous estimates of p123 or parasite prevalence in the general population, but there's no way that we can really believe that those are accurate."

Ben: "Yea, I remember hearing that p123 may have three or more isoforms that have been undetected in our blotting assay. The Panopticon is powerful enough to see those."

Delta: "It will see those... (Mysteriously) but what else will it see?"

Ben: "What do you mean, Delta?"

Delta: "What else will it see, under the surface? It's true we may see p123 isoforms we expect, but what about those we don't expect?"

Ben: "(Condescending) Well we'll look at those, too. Now Abe do you see why we need to run some normal patients through the machine, too?"

Abe: "Yea I guess, but I think we should do as few as possible."

Ben: "Well yea, I mean you only need to run a few controls."

Charles: "Do you guys just completely not get it? We're not running the kind of control where we know what to expect. This isn't like running a PCR with water instead of DNA template. We don't just need to know what's normal, we need to know the variability of normal. Sure, we might put two people in and they might not have parasites, but what if the third one would have? If we see parasites in half of our patients we still won't know if that's normal for the general population?"

Abe: (Sighs) "Well what do you recommend? We can't waste all of these runs."

Charles: "Split it down the middle. Half the runs, or I guess 13 if it makes you happier, could be patients, 12 could be normal people."

Abe: "I'd say it was a shame, but I guess we can still answer so many different questions."

Delta: "The more questions you ask, the more will slip through your fingers."

Ben: "Well I know that doesn't make any sense. The more questions the merrier."

Delta: "Seeing everything is like seeing nothing. It is a true Panopticon. The original Panopticon was a tower in the middle of a prison. Each cell faced the center, and the guards could see all cells from their vantage point. So will you be within the Panopticon. You see all, but in this sight you become imprisoned. Just as you can view each cell, you will find you can see none of them."

Abe: (Laughing) "So speaks the oracle! Did that make any sense?"

Ben: "Not that I can tell"

Charles: "No"

Delta: "Go ahead then."

Abe: "I don't know what Delta is talking about, but let's just run 20 people, 10 healthy, 10 sick. We'll have 5 left over just in case something goes wrong."

You decide to allow your graduate students to proceed. They agree to try the machine on ten patients and ten healthy people and analyze the data. You are pleased that they've come to the right conclusion and put together a case-control design. You worry, though, about Delta's ominous prophecy for this experiment. Perhaps you will find out what she means when the data comes in...

Continued later.

Tuesday, December 8, 2009

On purposeful learning

I was cleaning out my Onenote files the other day and came across a poem by WB Yeats I liked and saved:

"What Then?"

His chosen comrades thought at school
He must grow a famous man;
He thought the same and lived by rule,
All his twenties crammed with toil;
`What then?' sang Plato's ghost. `What then?'

Everything he wrote was read,
After certain years he won
Sufficient money for his need,
Friends that have been friends indeed;
`What then?' sang Plato's ghost. `What then?'

All his happier dreams came true -
A small old house, wife, daughter, son,
Grounds where plum and cabbage grew,
Poets and Wits about him drew;
`What then?' sang Plato's ghost. `What then?'

`The work is done,' grown old he thought,
`According to my boyish plan;
Let the fools rage, I swerved in naught,
Something to perfection brought';
But louder sang that ghost, `What then?'

I mean to post something science-based soon, but this is a continuation of conversations we've had over the past semester. What is the end goal of accruing knowledge for all of you? How will we use it? Or will we end up spinning our wheels, making minute discoveries only for the sake of accruing the grants we require for personal survival?

A Risky Statement on Gender

Before I get myself into too much trouble, I want to mention that N=2.

So I am sitting in my bioinformatics class during presentations and I am noticing an interesting phenomenon that I can't help but post about. Each student is supposed to present, but students can form teams.

Here's the result: the two girls in the class have formed their own two-person team; of the five teams, none consists of two males. In the teams with one male and one female, the female begins the presentation, and the male jumps in about two minutes into the 15-minute presentation and (loudly) completes the remainder. The female sits patiently for the balance of the presentation, but the male never hands the torch back.


My tentative conclusion: the barriers to women in science are often subtle, and they are not only institutional.

Tuesday, December 1, 2009

A Comment on Bioinformatics

So this is my summary of bioinformatics approaches to sequence alignment, phylogeny, and structural prediction:
  1. Create a set of assumptions so that you can use dynamic programming
  2. Use dynamic programming
  3. Forget how ridiculous your assumptions were
There you go. I just saved you hours of coursework. (If you want to see step 2 in the flesh, there's a sketch below.)
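Since I'm being glib, here's roughly what step 2 looks like: a bare-bones Needleman-Wunsch global aligner in Python. The scoring scheme is a made-up example, and it's exactly where the step 1 assumptions hide:

    # Minimal Needleman-Wunsch global alignment score (step 2 above).
    # The match/mismatch/gap numbers are the step 1 assumptions.
    def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
        rows, cols = len(a) + 1, len(b) + 1
        # score[i][j] = best score aligning a[:i] against b[:j]
        score = [[0] * cols for _ in range(rows)]
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
        for i in range(1, rows):
            for j in range(1, cols):
                diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
        return score[-1][-1]

    print(needleman_wunsch("GATTACA", "GCATGCU"))  # best global score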

Monday, November 23, 2009

The Hoard addendum

So I think my post about the Hoard was somewhat rambling and failed to make any clear point. My concern is this: science has outgrown its incentive structure. Good science needs to be done on a scale that involves many hardworking individuals, not all of whom will be writing the manuscript and scribbling their names at the front of the author list.

Take, as an example, a massive study to sequence genes in cancer X. This is going to involve intellectual contributions from dozens of people, if not hundreds. Doctors will enroll patients and provide detailed clinical annotations of their progress. Surgeons will carefully select samples. Pathologists will carefully grade those samples and decide which can truly be classified as cancer X and sent to the lab. Lab workers will perfect protocols and churn samples through a pipeline they design. Bioinformaticians and statisticians will then undertake a rigorous analysis of the resulting data, perhaps even writing new programs and methods for processing it.

The result, surely, is going to be an enormous multiauthor publication or series of publications. None of this could happen without a few organizing minds at the top, and they, rightfully, claim a great deal of the credit. But what of the multitude of other researchers involved? They are sandwiched somewhere between et and al., without the due credit many of them deserve. Much of their work is crammed, often in tiny font, into the oft-discarded methods section.

To come back around, I am concerned for my place as a growing scientist in this sea of collaborative studies. I truly enjoy the thrill of pushing back scientific boundaries, and I think modern sequencing specifically offers a powerful tool to do just that. But as a new graduate student who will not be leading these studies, I worry that such projects get me no closer to a first-author publication, leaving me treading water with respect to my requirements for graduation. I want very much to be a good scientist, and even more to work on important discoveries, but I worry that some ways of doing that come at the expense of my own career.

Saturday, November 21, 2009

The Panopticon, Part I

First off, I just want to mention that the word panopticon is one of my favorites. This post is going to be about something that might more properly be called "Heisenberg's Device," but I am going with The Panopticon. The traditional sense of panopticon is more dire than the one I describe here, though I hope to draw a line connecting the two in later posts.

I'm going to lay out this story over a couple of days, since I don't want to spend too much time on it each day. A little suspense never hurt anyone. No device like the one I describe exists, or really could exist, but I think it's an interesting point from which to inquire about our scientific methods.

Please forgive my use of the second person. I like it because it makes the whole thing feel like the lead up to a question, which is exactly what this story is intended to be.

Here we go:

You are a biologist tasked with the study of disease X. You've been studying the disease for decades, and you understand a great deal about the workings of this illness, from the molecular biology to the physiology to the pathology and epidemiology. Your efforts have been fruitful. Bit by bit you've knocked back the boundaries of ignorance and revealed certain properties of disease X that have garnered you recognition in your field, though done less to actually cure it.

You come in to work one day and notice a large package sitting in the hall outside your lab. It has your name on it, but no other identifying information. Curious, you begin to rip into this strange gift. When you tear all the packaging away you find a device that looks not unlike a refrigerator, coated in a series of snaking tubes, valves and wires. Like a refrigerator it has a door in the front, which can be sealed with three enormous locks. On the back it has an industrial-scale plug, a USB port with cord and, tucked away in a small leather pouch, a booklet.

Of course you grab the booklet, looking for some information, any information, about what this strange shipment is. Have your graduate students gotten carried away with themselves? Have the minus 80 freezer designers gone all 'steampunk'?

The front of the booklet has only the following words in bold letters:

PANOPTICON

Twenty-five uses

Tantalized, you open the manual to read further. It says:

Greetings! You are one of a select group of scientists chosen to receive our latest revolutionary technology. We are aware of your tireless pursuit of knowledge and your prior successes, and we are assured that you will know how best to use our device to maximal benefit. The machine we have provided (the Panopticon, hereafter) is just another tool in your pursuit of the cure. We provide 25 uses, all this machine is capable of, at no cost to you and with no obligation of future purchase.
This device has been developed in secret by our labs and is able to perform perhaps the most rigorous scientific assay conceivable. With just a simple setup procedure, this machine is capable of detecting and reporting the precise position and movements (within Heisenberg's limitations, of course) of every atom and molecule contained within the chamber. These data can be downloaded to a computer, where they can be stored for analysis and detailed examination. The machine can track this information for five seconds before it becomes inactivated due to the risk of power overload. Never fear, though: the technology is perfectly safe and can even assay living subjects with no harm to or effect on the assayee (see: FDA certification information in Appendix F).
The details of operation and the proper interpretation of the data output formats are provided in this manual, but we have removed key elements of the Panopticon's operation which we consider trade secrets. We hope that you will make use of your 25 assays fruitfully. Good luck!
You sit down at one of your lab's benches, confused. Every position of every atom and molecule in the chamber? For five seconds? No microscope or imaging technology even comes close! It's like transmission electron microscopy in real time, on a massive, massive scale. And in living subjects! You take a moment to ponder the implications.


So that's it for today. I'll explore the reactions to the machine in later posts.



Friday, November 20, 2009

The Hoard

Science is built on the daring and unconventional ideas of a few going against the hive mentality of the pack. Things that seem like strange artifacts in the data emerge as cryptic signposts pointing to bold new ideas. The bizarre non-Newtonian behavior of light led to relativity, and the equally bizarre downregulation of supposedly overexpressed genes in petunias eventually became RNAi.

But in the era of big science, how do we pick what anomalies are worth following up on? As science becomes more and more dependent upon technology to progress, any line of inquiry becomes expensive both in dollars and man hours. Within this paradigm, is independent investigation possible?

I would argue that in many ways it is not, but our current model still does little to promote the large-scale collaboration necessary for modern research. There is still a big emphasis (in biology at least) on first-author publications rather than on roles in large collaborative studies. Collaborative openness, though touted as a cornerstone of many an institution, rarely makes careers. We like to put a name on a given discovery, or maybe two names. We award Nobel Prizes to a handful of researchers, not to a collaborative team.


That's not to say that we're not trying. Take The Cancer Genome Atlas (TCGA). This massive effort by six major research centers will sequence tumors and matched normal tissue from the same individuals across dozens, then hundreds, then thousands of patients. Eventually it will trace the contours of the cancer genome at extremely high resolution by deploying very powerful (but expensive) sequencing technology on a massive scale. It is a sort of Google Maps of the cancer genome, and the atlas will be publicly available at no cost to its users.

It will be interesting to see if this new paradigm works. As great as these new, massive, collaborative databases are, I think we run one risk: a sort of inverse tragedy of the commons. If the data is public and everyone has access to it, no one has a vested interest in following up on it. Publishers just aren't as impressed with computational follow-up as they are with studies that generate new data. Furthermore, we seem to have very little interest, in our modern scientific society, in 'negative' results. Much of what these databases will do, I think, is wash away the apparent significance of correlative studies done with our scientific eye trained on just a few pathways. If we look at two genes and see that they go up and down together, we call them a signaling pathway. But if we look at 2,000 genes and see them go up and down together, do we still come to the same conclusion?
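If you want to see the problem in miniature, generate pure noise for a couple thousand 'genes' across a handful of samples and count how many pairs 'go up and down together.' A toy sketch (all numbers arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    genes = rng.normal(size=(2000, 10))       # 2000 "genes", 10 samples, pure noise
    corr = np.corrcoef(genes)                 # all pairwise correlations
    pairs = corr[np.triu_indices(2000, k=1)]  # ~2 million unique pairs
    print((np.abs(pairs) > 0.8).sum())        # thousands of noise "pathways"

With only ten samples per gene, thousands of pairs clear an |r| > 0.8 bar by chance alone.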

It's a rocky road we embark upon.

Tuesday, November 17, 2009

On the dilution of science and loss of value

The other day, Will and I were discussing the vast number of scientific papers already published and how little room there will be for our own contributions. In our despair, we forgot a crucial part of life: quality will always outlast quantity. In a strictly capitalistic sense, while there may be imitators, good products will always be bought and sold. Take, for instance, the iPhone App Store. At its conception, the few apps that existed served very specific purposes, and developers were able to charge a fair but profitable price for their services. Once the store began to get diluted with apps such as "Goal2Action" and "Looptastic Gold," the power of the market shifted back to consumers to find and support quality apps. So, in regard to science, how far can I take my favorite metaphor? Who are our consumers?

The scientific community plays a large role in judging the quality of our work, but in a deeper philosophical sense, Time plays the final judge. Because as scientists, what we're really striving for is to discover truth. Pontius Pilate once asked "What is truth?", and the answer is maintained by methods that are still unreachable to us. Therefore the discovery of truth will always remain the final barometer for our work. Scientific discoveries such as Mendel's postulations on genetics or Galileo's astronomical work (haha) have stood the test of time. Even Darwin's evolutionary thought has accrued evidence to defend itself. And while our contributions might not be as large, if they reflect truth, they should withstand both the scrutiny of our peers and the interrogation of Time.

And that is what we should be aiming for: to understand truth, as opposed to publishing for the sake of our careers. Ambition is not necessarily evil, but it should not be at the forefront of our motivations. Too often in science we see work done because it is required for a grant, as opposed to the "purer" motivation of simply wanting to know, for the advancement of knowledge. Such work devalues the body of science, because merely spinning your wheels gains you no distance. And the community itself is to blame for adopting an environment that is a microcosm of the capitalistic world at large, a model that struggles to succeed because the end goal is always selfish achievement.

So while it can be discouraging to see the volumes and volumes of scientific literature, I find peace that not all of it is noteworthy, not all of it is significant, and not all of it is true. And I remain hopeful that as I continue to seek truth, I will find it.

Friday, November 13, 2009

The Blog to Nowhere

So I never really introduced this blog. Since I am not really sure who is reading it (anyone?) I want to take an opportunity to say a few things, if only as kind of a mission statement for my own reference.

First off, this blog is about mulling through the big ideas in science. I've noticed that graduate school often follows a long decline from the big picture to a sort of niche myopia, where you are aware only of the things going on in your subfield. The technical details of the day to day work of research overwhelm those high minded considerations that made science interesting in the first place. There are so many exciting things happening in biological research, and cancer research in particular, that I think deserve a wider conversation.

Secondly, I want to write about the culture of science and science education. In some ways science is nothing more than a culture. It's a way of thinking about ideas, communicating those ideas and evaluating their utility. The way we do science is inherently linked to the institutions we've built up to pursue it. Better technology and better science is as much about designing the right culture of inquiry as designing the right experiment.

Thirdly, I hope to improve, if slightly, my communication of scientific ideas. I hope to make each post clear, contained and concise. For some reason the scientific writing style has become an impenetrable thicket of technical language. So many of the journal articles I have read are accessible only to the most up-to-date members of the field. They can be unreachable even to those using the same model organism, but studying different areas. To my mind this is a deep weakness. An idea is only as good as its communicability, and while it's great to be the first to discover something you've done very little if it doesn't reach the person who can use it to maximal impact. Wherever possible I hope to practice avoiding the technical language and giving the complete background.


Lastly I want to have a record of my naive youthful hopes and dreams when I'm a grizzled senior grad student so that maybe I can keep the flame alive when the going gets rough.

Wednesday, November 11, 2009

Biology and Black Holes

I attended a talk Monday in which the speaker proposed that biology is on the cusp of a new era. This era, he claims, will see biology increasingly resembling the discipline of astronomy. We will more and more be training our genomic telescopes on small parts of the genome, building models, and testing those models against other parts of the genome.

While I don't disagree with him that this is where the field is going, I think we need to do everything we can to avoid the possible consequences. The astronomical model seems, to me, to be a black hole of correlative, descriptive analysis with no useful predictions or clear connection to human application. We risk getting lost in a biological string theory, which makes no substantive claims but which drains brainpower and dollars from more applicable efforts.

What do I mean by this? Take ChIP-seq studies, for example. These are incredibly powerful experiments that document the behavior of transcription factors throughout the genome, revealing new interactions and possible regulatory pathways. But they often do so in a deluge, and teasing apart the specific and biologically relevant (read: predictive and useful) associations from the less specific, less relevant results can be an entire career's worth of work. In short, we have more data than we know what to do with.

I don't want to sound totally negative about biological astronomy. If we train our telescopes specifically on disease states, we may be able to sort out relationships between genomic, epigenetic, transcriptional or other states and prognosis, or even treatment response. Massive biological data coupled with excellent clinical annotation could go a long way towards personalized medicine. But in the complex regulatory network of the cell, it seems unlikely that any simple interpretation of these data sets will emerge anytime soon. For now, we should expect biological astrology: we can make some predictions about the future, but be damned if we know anything about the mechanism.

Thursday, November 5, 2009

Dear Graduate Student Professors

I am currently sitting in genetics class reflecting on the quality of graduate education in science. Let's just say that the class is taught with a 'challenging' style. I'm going to list a few key requirements of good graduate school teaching that are often disregarded.

  1. Enunciate. This is not that hard, people.
  2. Get to know your students' names. Grad classes tend to be small, and any one of us could be a future colleague. At least make an effort.
  3. Look at your slides before class and understand the flow of your presentation. By this I mean understand the logical progression of the ideas. Make sure that everything you need to understand a slide is presented before you arrive at that slide.
  4. If you're explaining an experimental result or protocol, take your time. These tend to make a lot of sense after you understand them, but are impenetrable the first time you look at them. That's usually because there are a lot of tools that go into the experiment with which students may be unfamiliar. How can you understand a genome wide association study if you don't know what a SNP is?
  5. When you plan to ask questions, make sure that the answer is actually available given the information you've presented. If students can't guess the answer then it's a bad question or you've failed to lay out the setup to the question. If a student answers your question incorrectly explain clearly but without condescension why the answer is wrong.
  6. Do not get stuck on one slide. Your students, as interested as they may be, will start to tune out.
  7. Balance and manage questions in class. This is an art. Don't get bogged down, but don't race through material and leave everyone in the dust. Actively ask your students about your pacing, perhaps on an individual basis to avoid peer pressure.
  8. Vary your cadence. Show the importance of a particular part of your presentation with vocal emphasis. 
  9. Don't be afraid of the blackboard. It's a great way to draw out and clarify a point.
  10. If you're teaching without a textbook, make sure that students who get lost have a written resource with complete information to which they can refer (this doesn't have to be your PowerPoint slides, but that's one good place).
Some of the above are so damn simple that I can't understand why anyone wouldn't be able to manage them. Others are not as easy to live up to, but your students will massively appreciate your efforts if you try.

PS: To add a few more...
  • Never ever ever condescend to your students. It's immature and unprofessional.
  • Avoid the adversarial model of professor vs. student.
  • Did I mention enunciate?

Sitting in class

Hello dear readers. This is my first post.

As I'm sitting in class, I'm wondering about effective techniques for student learning. Most of those techniques are not on display in this class. To be fair, graduate school classes are no worse than medical school classes; just replace one bad lecture with another. At least medical school will eventually allow us to take a hands-on approach to the material, with third- and fourth-year rotations, whereas graduate school will never allow me to apply my skeletal knowledge of transposons in the lab.

Wednesday, November 4, 2009

Karmic Koalas Crash Eclipse

So I took the plunge and updated my system to 9.10 (karmic koala) from 9.04 (jaunty jackalope). So far I'm happy with the small aesthetic changes and what seems to be a mild speed increase, but I was miffed to find out that the new release broke my version of Eclipse. I used the workaround found here, which worked well. Ubuntu is great and all, but it's moments like this that make me wonder if it'll ever make it to the mainstream.

Saturday, October 31, 2009

Reference Genome(s?)

Of the many interesting subjects at the Genome Informatics meeting at Cold Spring Harbor Labs, one of the most engaging surrounded the release of the latest human genome reference. I'll call it hg19.

The issue surrounding the release is as follows. Many people have been using the old reference, hg18, for years. Not only is it more familiar, but its coordinates have become a sort of canonical framework on which we have draped massive amounts of auxiliary information. No doubt some programmers (myself included, unfortunately) have hard-coded coordinates from the old genome into their utilities. Converting will be something like the switch away from two-digit dates at Y2K: a much smaller scale, but no less of a pain for those who have to do it.
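To make the pain concrete, here's a toy sketch of the problem in Python. The coordinates and offsets below are invented for illustration; real liftover chain files have to handle splits, inversions and gaps:

    # A utility that pinned a gene to old-reference coordinates:
    MY_GENE_HG18 = ("chr17", 7_512_445, 7_531_642)  # made-up coordinates

    # The simplest possible "liftover": a per-chromosome offset map from
    # the old assembly to the new one. Real chain files are far messier.
    OFFSETS_HG18_TO_HG19 = {"chr17": 58_000}  # fictional offset

    def liftover(region):
        chrom, start, end = region
        shift = OFFSETS_HG18_TO_HG19[chrom]
        return (chrom, start + shift, end + shift)

    print(liftover(MY_GENE_HG18))  # every hard-coded coordinate must move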

What's more interesting is that the new reference, and increasingly future references, includes so-called "alternate assemblies." These are regions of a chromosome for which different well-sequenced individuals carry rearrangements and mutations large enough, and different enough, to require a completely different reference sequence. This, too, is likely to create a new mess of complications on the informatics side. Sure, your sequence maps to chromosome 17, but whose chromosome 17? The "alternate" or the "reference" chr17? Both?

The plot will only thicken as the 1000 Genomes Project becomes more accessible to public use. Imagine not two or three alternates, but thousands! Moreover, what sorts of changes warrant an alternate assembly, and at what population frequency? How will sequences be mapped to these assemblies in their ever-expanding multiplicity? How will the assemblies be stored locally? It seems impractical to have one thousand 3 GB files floating around with individual genomes.

Of course, an alternative is to keep the main reference and maintain a database of changes to that reference, such as dbSNP. The question there becomes how to manage large and unusual sorts of changes, such as big indels and chromosomal rearrangements. Moreover, won't we bias ourselves against 'non-reference' alleles if all of our mapping algorithms use the reference only?
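Here's the diff model in miniature (a toy sketch, not dbSNP's actual format). SNPs are trivial to apply; the moment you allow an indel, every downstream coordinate shifts and position-keyed annotations go stale:

    reference = "ACGTGCCGA"

    def apply_snp(seq, pos, alt):
        """Substitute one base at 0-based position pos. Coordinates survive."""
        return seq[:pos] + alt + seq[pos + 1:]

    def apply_insertion(seq, pos, ins):
        """Insert bases at pos. Every downstream coordinate now shifts."""
        return seq[:pos] + ins + seq[pos:]

    print(apply_snp(reference, 4, "T"))         # ACGTTCCGA, same length
    print(apply_insertion(reference, 4, "AA"))  # ACGTAAGCCGA, 2 bp longer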

The appropriate path, to my mind, is to abandon the present model of the reference genome as a flat file listing nucleotides. We need to think of it, increasingly, as a graph, with alternate paths through that graph representing individual human genomes. Such a model would let us map more easily to variant genotypes, and could compress the hard disk space required to store multiple genomes on our clusters.
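To make that concrete, here's a toy sketch of a sequence graph in Python (the structure is invented for illustration, not any real format). Shared sequence is stored once, and each individual genome is just a path through the graph:

    # Nodes hold chunks of sequence; edges say which chunks can follow which.
    nodes = {
        1: "ACGT",  # shared flanking sequence
        2: "G",     # reference allele
        3: "T",     # alternate allele
        4: "CCGA",  # shared flanking sequence
    }
    edges = {1: [2, 3], 2: [4], 3: [4]}

    def sequence(path):
        """Reconstruct a linear genome from a path of node IDs."""
        return "".join(nodes[n] for n in path)

    print(sequence([1, 2, 4]))  # ACGTGCCGA -- the "reference" path
    print(sequence([1, 3, 4]))  # ACGTTCCGA -- a variant individual

Note how the two genomes share storage for everything but the single variant node; that's where the disk savings would come from.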

We'd risk, of course, decreased accessibility for new users. A text file with a list of A's, T's, G's and C's is easy for people to understand and start working with. That's why any conversion to a new standard would have to be spearheaded by a well-funded group with the means to create a suite of conversion tools, making a seamless transition back to FASTA files for those who are used to them. Such a group would also need to design new utilities for alignment, variant calling, visualization and downstream analysis (among other things). Moreover, they'd have to come up with a new coordinate system (scary) to map back and forth between old and new results.

This is no small task, but it needs to happen. It's time to stop treating all genomes as variants on Watson or Venter.

Thursday, October 29, 2009

First Post! The War on Cancer: Silver Bullets or Atom Bombs

Sitting here at the Genome Informatics meeting at Cold Spring Harbor Labs, I am newly awakened to the vast broader community working on many of the scientific problems I often feel I toil on in isolation. Moreover, I'm amazed by the amount of collective knowledge and competence in dealing with the informatics problems associated with sequencing-by-synthesis technology specifically and genomic biology broadly.

So here we go! I'm going to take a stab at creating a public space for my thoughts, and perhaps even a site for dialogue with other researchers. Which is to say: comment liberally! Correct my foolishness or debate ideas. I certainly don't expect much traffic, but leave a note if you stop in.

Now for the main course.

Silver Bullets or Atom Bombs:

In conversations with one of my previous PIs earlier this week I got the chance to muse about the oncoming deluge of cancer genomics data. I think everyone in bioinformatics, and in the biology community in general, is thrilled with the enormous number of new tools now available through sequencing, but there remains the hovering question: what are we going to do with all of this new data? More to the point, what are we looking for?

Two possibilities are well known. Each stems from a sort of overarching category of cancer therapy. On the one hand you have the atom bombs. These are the old standbys of the arsenal, the drugs that blast a cell with toxic insult and hope for the best. These often target the genomic DNA, either through antimetabolites or through damaging alkylating agents. Presumably the cancerous cells, with their rapid replication, will suffer the most, but there are many victims of the therapy and the toxicity is intense.

On the other hand you have the (usually newer) silver bullets. Here I am thinking of imatinib or trastuzumab. These therapies are intended to be trained killers, entering the body and exploiting a cancer cell's weak points to mount a targeted assault on proteins the cancer depends upon. Without these signals the tumors remit, often with less toxicity to the patient.

World Maps

There are two ways in which the genomic revolution could obviously help cancer treatment. The first is a sort of world map. Right now oncologists drop their atom bomb therapies based on tumor pathology, which uses information like the tumor's location and appearance under a microscope for categorization. Some categories of tumor respond very well to some therapies, others extremely poorly. But some tumor types seem to have very heterogeneous responses to the same therapy. Some patients experience full remissions while others see very little benefit. Furthermore patients differ in their ability to withstand an atom bomb therapy, with some suffering severe toxicity and dying from the treatment, rather than the disease.

For these, we need a more precise world map. If you're going to be dropping atom bombs, even as a last resort, you sure as hell would like to be dropping them in the right place. Finer scale categorization of tumors based on either their genotype or expression profiles (or both) could hone our categories and put clearer boundaries on the map. Perhaps carboplatin is a good drug for melanoma, but only if the cells have a specific cell cycle checkpoint defect. There might be no way to see such a defect just from histology.

Just as you need a map of where to bomb, you need a map of where NOT to bomb. You want to destroy Melanomatown but spare Bonemarrowburg. Some patients need to be spared a treatment because of the toxicity susceptibilities of their normal, healthy cells, and there may be markers of such susceptibility in the genome.

This line of inquiry amounts to careful epidemiology and delineation of differences between tumors and individuals. No new therapies are necessary, just better characterizations of the patients and their disease.

The Golden Gun

Then there's the prospect of new targets. Perhaps now we can find new targets which, like BCR-ABL in CML, fall to a single, focused attack. To me this is the more tantalizing possibility. Perhaps we can find a whole host of tumors with such pinpoint weaknesses and hit them hard. To quote Brian J. Druker et al.:

...[O]ne of the major issues is to identify appropriate targets for drug development. Although the Abl kinase inhibitor has been useful for clinical concept validation, several features of CML may make the success of a kinase inhibitor as a single agent unique for this cancer. The Bcr-Abl tyrosine kinase, present in 95% of patients, is sufficient to cause the disease, and in early disease, it may represent the sole molecular abnormality. Few other malignant diseases can be ascribed to a single molecular defect in a protein kinase. (link)

Protein kinases make nice targets because of their susceptibility to small molecule inhibitors, but they are by no means the only possible good targets. With a more detailed understanding of cancer genome architecture we may find the variants which, like CML, have a "kingpin" gene regulating the whole cancer phenotype. These may have been too subtle to see by histology, but shine out in expression analysis or genome resequencing. Once identified, we can design a slew of new bullets to go into a sort of golden gun. Simple assays can then determine whether a patient will be susceptible to one of these targeted therapies, and we can hit the tumors exactly where they are weak. By hitting the kingpin of proliferation we could reduce the possibility of drug resistance and drug toxicity.

How?


This is just my conception of the field. Many people are already discussing these issues, and I don't pretend that there's a single original idea here. But it's fun to play with big ideas, and you have to start somewhere. I've sort of answered the large question of "what are we trying to do?" but I will end with a series of questions that to me have no obvious answer.

How do we focus our efforts in finding targets?
How do we pull relevant genomic/phenotypic/histologic etc. features out of an ever growing list?
How do we target kingpin changes in cancers for which there is no easy pharmacologic target?
How does cancer 'evolve' drug resistance and how do we prevent it?
How do we match patients and tumors to treatment regimens, given that N decreases the more finely we partition the individuals?
How do we make genomic information available to clinicians?

I'll leave it at that. If you read this far thanks, and I hope I didn't go overboard.