Wednesday, December 23, 2009

Paradigm 4

If there's one thing I hate, it's the monotony of generating data. I'm working on running sequencing on 48 samples right now, and let me tell you, that is not a bag of laughs. I am all too aware that what I am doing could probably be done faster and better by a halfway decently designed machine, and consequently I just sit there daydreaming of a world in which I have that machine. This leads to pipetting errors, which lead to more frustration, which, as you can imagine, leads to more daydreaming. It's a cycle of violence on the microliter scale.

Which is all just an intro to explaining why I loved this article in the New York Times on the data deluge. Sequencing machines (among other things) are generating so much data that it's actually the analysis that becomes the limiting step. Eventually there will be so much data output that there will be little need for pipetters and a great need for analyzers. That's right, the limiting reagent is actually human brain hours.

Mind you, not just any human brain hours will do. To understand this data we need people well versed in computer science. Computers, after all, are the only things capable of reading off the billions of bases involved in anything approaching reasonable time. A human reading of just a single human genome (roughly 3 billion bases) at one base per second (no breaks, no sleeping) would take over 95 years to complete.
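For anyone who wants to check that figure, here is a quick back-of-the-envelope sketch. The genome length is an assumed round number (about 3 billion bases for one haploid human genome), and `years_to_read` is just a name made up for this illustration.

```python
# Back-of-the-envelope check; the genome length is an assumed round figure.
GENOME_BASES = 3_000_000_000              # assumption: ~3 billion bases (haploid human genome)
BASES_PER_SECOND = 1                      # reading rate used in the text
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # no breaks, no sleeping


def years_to_read(bases, rate=BASES_PER_SECOND):
    """Years of nonstop reading needed to get through `bases` bases."""
    return bases / rate / SECONDS_PER_YEAR


years = years_to_read(GENOME_BASES)
print(f"{years:.1f} years")  # just over 95 under these assumptions
```

A slightly larger genome estimate pushes the number up a few years, so "over 95" is, if anything, conservative.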

Our limiting reagent brain needs to be versed in statistics, to allow for the fact that any comparisons are made on the genomic scale and possibly between large populations. Signal is well hidden by noise, and the noise isn't even necessarily as random as we would like it to be. After all, this is a living, breathing genome we're talking about, not a string of A's, C's, G's and T's as we often imagine it.

Which brings us to the third necessity. The analytical brain that we need also, ideally, should have a strong understanding of molecular biology and the biology of any disease in question. Cells are complicated. Ridiculously complicated. But we do know a pretty enormous amount about how they operate. A thorough understanding of this prior knowledge helps us ask more pointed questions of the data in hand.

Have I overdetermined the system yet? Probably. In the end, deciphering this data is going to take a lot of collaboration. I've seen a lot of attempts at all-in-one prepackaged analysis engines for sequencing data. None of them, so far, looks very impressive. Moreover, understanding the output of such packages is its own special challenge, since their inner workings are often closed source or poorly documented. Thus it's often hard to trust or interpret results that you don't generate yourself.

So will this data flood answer the big questions of our age? Are we going to find the cure for cancer? Perhaps at least the cure for a cancer? If we do, it will be not only because of our ability to design and execute good experiments, but also because of our creativity in sifting the results.

Friday, December 11, 2009

The Panopticon, Part II: Control

Having just opened a machine of revolutionary scientific power, you convene your graduate students to discuss the possibilities.  

Yes, I know this is ridiculous. What PI respects and trusts his grad students enough to share this kind of information? He'd just go straight for his fellow PIs, right? Just roll with it, people.

Sitting down with your students, you first carefully explain to them the circumstances of finding the machine and the mysterious booklet with its unbelievable claims (see Part I). Your students sit around and listen attentively, and with increasing eagerness lean forward in their chairs. They've seen the strange device stowed away in the corner of the lab, and they know this is not one of your endless hypotheticals. You end your explanation quickly and ask:

"So, what should we do with the machine? We have 25 uses and we'd better use them well!"

Abe: "Well clearly we should start analyzing people with Our Favorite Disease! We can look at 25 of them, which should give us a good sense of what's going on in OFD."


Ben: "What's going on in OFD? Do we even know what we're looking for?"

Abe: "Well first of all we can see if parasite X is present in OFD. I mean that theory has been kicking around for a long time now."

Charles: "But what if there are no parasites, or only a few parasites? Do you want to waste all of these experiments just looking for parasites?"

Abe: "Well that's the great thing about it! I mean if you really can look at everything, then we can answer a lot of questions at once. Like what about the theory that OFD patients have greater p123 signaling? We'll be able to count the p123 and answer that one right off the bat, too."

Ben: "Hold on bucko, what are you saying? If we just look at 25 OFD patients we won't have any idea whether p123 is high or low. We'll just know what it is on average in those patients, we have no basis for comparison."

Charles: "Yea Abe, slow down. We need to think through this. What are the proper controls?"

Abe: "Proper controls? Look we only have 25 runs of this thing, we need to be focusing on interesting samples, not normal everyday people. What if we don't look at enough sick patients and miss something important?"

Charles: "We can't do this without controls. There's just no way. You have to be able to compare the patients to some estimate of what's normal, and there's no way to know what's normal without using some runs of the Panopticon. I mean, we could use previous estimates of p123 or parasite prevalence in the general population, but there's no way that we can really believe that those are accurate."

Ben: "Yea, I remember hearing that p123 may have three or more isoforms that have been undetected in our blotting assay. The Panopticon is powerful enough to see those."

Delta: "It will see those... (Mysteriously) but what else will it see?"

Ben: "What do you mean, Delta?"

Delta: "What else will it see, under the surface? It's true we may see p123 isoforms we expect, but what about those we don't expect?"

Ben: "(Condescending) Well we'll look at those, too. Now Abe do you see why we need to run some normal patients through the machine, too?"

Abe: "Yea I guess, but I think we should do as few as possible."

Ben: "Well yea, I mean you only need to run a few controls."

Charles: "Do you guys just completely not get it? We're not running the kind of control where we know what to expect. This isn't like running a PCR with water instead of DNA template. We don't just need to know what's normal, we need to know the variability of normal. Sure, we might put two people in and they might not have parasites, but what if the third one would have? If we see parasites in half of our patients, we still won't know if that's normal for the general population."

Abe: (Sighs) "Well what do you recommend? We can't waste all of these runs."

Charles: "Split it down the middle. Half the runs, or I guess 13 if it makes you happier, could be patients, 12 could be normal people."

Abe: "I'd say it was a shame, but I guess we can still answer so many different questions."

Delta: "The more questions you ask, the more will slip through your fingers."

Ben: "Well I know that doesn't make any sense. The more questions the merrier."

Delta: "Seeing everything is like seeing nothing. It is a true Panopticon. The original Panopticon was a tower in the middle of a prison. Each cell faced the center, and the guards could see all cells from their vantage point. So will you be within the Panopticon. You see all, but in this sight you become imprisoned. Just as you can view each cell, you will find you can see none of them."

Abe: (Laughing) "So speaks the oracle! Did that make any sense?"

Ben: "Not that I can tell"

Charles: "No"

Delta: "Go ahead then."

Abe: "I don't know what Delta is talking about, but let's just run 20 people, 10 healthy, 10 sick. We'll have 5 left over just in case something goes wrong."

You decide to allow your graduate students to proceed. They agree to try the machine on ten patients and ten healthy people, and then analyze the data. You are pleased that they've come to the right conclusion and put together a case-control design. You worry, though, about Delta's ominous prophecy for this experiment. Perhaps you will find out what she means when the data comes in...
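As an aside, Charles's point about the variability of normal can be given a rough number. The sketch below asks: if none of n healthy controls shows parasites, how common could parasites still be in the general population? It assumes the controls are independent draws from a population with one fixed prevalence, and it uses a one-sided 95% bound; both the cutoff and the function name are choices made for this illustration, not anything the students agreed on.

```python
def max_prevalence_given_zero_hits(n_controls, alpha=0.05):
    """Exact one-sided upper bound on prevalence when 0 of n controls are positive.

    With true prevalence p, the chance of seeing zero positives in n
    independent controls is (1 - p) ** n. Setting that equal to alpha and
    solving for p gives the largest prevalence still compatible with the data.
    """
    return 1 - alpha ** (1 / n_controls)


# 2 controls is the "only a few" plan; 10 and 12 are the splits discussed.
for n in (2, 10, 12):
    bound = max_prevalence_given_zero_hits(n)
    print(f"{n:2d} clean controls: prevalence could still be up to {bound:.0%}")
```

Under these assumptions, two clean controls cannot rule out a parasite that infects most of the population, and even ten leave room for roughly a quarter of it.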

Continued later.

Tuesday, December 8, 2009

On purposeful learning

I was cleaning out my OneNote files the other day and came across a poem by W. B. Yeats that I had liked and saved:

"What Then?"

His chosen comrades thought at school
He must grow a famous man;
He thought the same and lived by rule,
All his twenties crammed with toil;
`What then?' sang Plato's ghost. `What then?'

Everything he wrote was read,
After certain years he won
Sufficient money for his need,
Friends that have been friends indeed;
`What then?' sang Plato's ghost. `What then?'

All his happier dreams came true -
A small old house, wife, daughter, son,
Grounds where plum and cabbage grew,
Poets and Wits about him drew;
`What then?' sang Plato's ghost. `What then?'

`The work is done,' grown old he thought,
`According to my boyish plan;
Let the fools rage, I swerved in naught,
Something to perfection brought';
But louder sang that ghost, `What then?'

I mean to write a science-based post soon, but this one is a continuation of conversations we've had over the past semester. What is the end goal of accruing knowledge for all of you? How will we use it? Or will we end up spinning our wheels on discoveries of minutiae, only for the sake of accruing the grants we require for personal survival?

A Risky Statement on Gender

Before I get myself into too much trouble, I want to mention that N=2.

So I am sitting in my bioinformatics class during presentations and I am noticing an interesting phenomenon that I can't help but post about. Each student is supposed to present, but students can form teams.

Here's the result: The two girls in the group are each part of a two-person team. There are no two-person teams with two males (of five). In the teams with a male and a female, the female begins the presentation, and the male jumps in about 2 minutes into the 15-minute presentation and (loudly) completes the remainder. The female sits patiently for the balance of the presentation, but the male never hands the torch back.


My tentative conclusion: the barriers to women in science are often subtle, and are not only institutional.

Tuesday, December 1, 2009

A Comment on Bioinformatics

So this is my summary of bioinformatics approaches to sequence alignment, phylogeny, and structural prediction:
  1. Create a set of assumptions so that you can use dynamic programming
  2. Use dynamic programming
  3. Forget how ridiculous your assumptions were
There you go. I just saved you hours of coursework.
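For the curious, steps 1 and 2 might look something like the sketch below: a Needleman-Wunsch-style global alignment score. The assumptions doing the work in step 1 are that every column of the alignment scores independently of its neighbors and that each gap position costs the same fixed penalty. The function name and the scoring values (+1 match, -1 mismatch, -2 gap) are made up for illustration and not taken from any particular course or package.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b (Needleman-Wunsch style).

    Step 1 (assumptions): columns score independently, gaps cost a flat
    penalty per position. That is what lets the best score for two prefixes
    be built from the best scores of shorter prefixes.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing means paying for gaps all the way.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2 (dynamic programming): fill the table from shorter prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7: seven matches, no gaps
```

Step 3 is left as an exercise.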