Sunday, October 31, 2010

The Hazard of High Throughput

We live in a high throughput age. Science is no exception. Microarrays, high throughput sequencing and spectrophotometers generate data on a scale and scope that would have taken years, decades, or centuries with the old generation of technology. We are generating data on a scale that could not have been conceived of a decade ago.

Great power... let's see what comes next....

The challenge, of course, is with the analysis. The data now generated is so massive that it cannot be simultaneously visualized or processed in its raw form. Nor can t-tests get you where you want to go with statistical analysis, so many are the comparisons being made.

Enter the world of the Bonferroni correction and the Benjamini-Hochberg false discovery rate. These statistical methods allow us to sift through such enormous data sets to focus on results that are significantly different from random expectation.
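
Just to make the distinction concrete, here's a minimal sketch of the two procedures in Python; the p-values are made up for illustration, and alpha is the usual 0.05:

    # A minimal sketch of the two corrections; p-values invented for
    # illustration, alpha at the usual 0.05.

    def bonferroni(pvals, alpha=0.05):
        # Reject only tests whose p-value beats alpha / (number of tests).
        m = len(pvals)
        return [p <= alpha / m for p in pvals]

    def benjamini_hochberg(pvals, alpha=0.05):
        # Step-up procedure: with p-values sorted ascending, find the largest
        # rank k with p_(k) <= (k / m) * alpha, then reject ranks 1..k.
        m = len(pvals)
        order = sorted(range(m), key=lambda i: pvals[i])
        k_max = 0
        for rank, i in enumerate(order, start=1):
            if pvals[i] <= rank * alpha / m:
                k_max = rank
        reject = [False] * m
        for rank, i in enumerate(order, start=1):
            reject[i] = rank <= k_max
        return reject

    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print(sum(bonferroni(pvals)))          # 1 discovery: very conservative
    print(sum(benjamini_hochberg(pvals)))  # 2 discoveries: a gentler filter

Bonferroni divides alpha by the number of tests and grows paranoid fast; Benjamini-Hochberg tolerates a controlled fraction of false discoveries in exchange for more power, which is usually the right bargain when you have ten thousand probes.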

The hazard comes with these methods' complexity and somewhat obscure statistical assumptions. Many scientists are very well versed in the hypotheses of their discipline, but less so in the mathematics. There are so many ways to go wrong in applying these methods in a cookie cutter way that it boggles the mind. Along the lines of "Correlation does not imply causation" there are other such gems as: "Difference in significance does not imply significant difference".
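
To see why that last gem matters, consider a made-up example: drug A beats control at p ≈ 0.04, drug B misses at p ≈ 0.09, and yet A and B are statistically indistinguishable from each other. A quick sketch, assuming SciPy is on hand (the summary statistics are invented):

    # "Difference in significance does not imply significant difference":
    # two drugs against a shared control, with invented summary statistics
    # (mean, standard deviation, n) for each group.
    from scipy.stats import ttest_ind_from_stats

    # Drug A vs control: p ~ 0.04, "significant"
    print(ttest_ind_from_stats(0.55, 1.0, 30, 0.0, 1.0, 30).pvalue)
    # Drug B vs control: p ~ 0.09, "not significant"
    print(ttest_ind_from_stats(0.45, 1.0, 30, 0.0, 1.0, 30).pvalue)
    # Drug A vs drug B directly: p ~ 0.70, no detectable difference
    print(ttest_ind_from_stats(0.55, 1.0, 30, 0.45, 1.0, 30).pvalue)

The right move is to test the contrast you actually care about, not to eyeball two separate significance stars.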

This last mistake featured prominently in stage 1 of a statistical analysis in a manuscript from a good lab that just crossed my boss's desk. This lab has produced prominent publications in the past, and I was surprised to see this in their analysis.

What most surprised me was that the statistical method, buried deep within the methods section at the end of the manuscript, did not arouse the ire of my boss or the other lab member who read the paper. It was viewed as a good enough answer to a tough problem. That, plus the prominence of the last author, led to a minor note somewhere in the review.

Reviews don't have dissenting opinions, but let me put one here. Statistical methods are important. So important that, in a paper that uses a high throughput method at its start, they often form the backbone for every follow-up experiment. They should not be relegated to a footnote in the back, and they should, wherever possible, be declared before the data are even generated, to avoid the nefarious problems of overfitting.

Papers that use statistics need statistically minded reviewers. If we aren't careful, we'll be fooled by randomness.

Sunday, October 17, 2010

Formulary of Ten

Here's a quick question: if you had to trim the formulary of a hospital down to just 10 adult meds, what would they be? I'm sure a careful and informed individual could make a comprehensive choice based on quality-adjusted life year (QALY) information and some epidemiology of the area. Here's what I am thinking:

1) Morphine
2) Aspirin
3-4) One or two antibiotics (Guess who doesn't remember micro enough to know which ones? I am guessing Cipro/levo and Vancomycin)
5) Insulin
6) Beta-blocker
7) A statin
8) Doxorubicin (What my boss called: the one chemo you'd bring to a desert island)
9) Warfarin
10) Lidocaine

Thoughts? It might be a little over-covered for heart disease and under-covered for diabetes. Also, is it even worth it to have a chemo agent, from a utilitarian perspective? I know that's heretical from someone who is interested in cancer research, but how much life are you really adding with one therapy alone?

Also: does Morphine/lidocaine deserve a place? Pain management is important, and I didn't want to leave it out. But when there are other diseases being left uncovered, should this be a top priority?

Anyone care to offer an opinion? It's an interesting way of thinking about priorities in U.S. medicine, and pondering how much we take for granted. What medicines are those that we can't make do without?

Thursday, October 14, 2010

Peer Review

If you've spent some time in graduate school you might have learned the open secret about peer review: it's not entirely done by peers, and it's not always that thorough of a review.

Like it or not, grad students often find these reviews on their desk. I've heard of PIs leaving the job entirely to grad students, though this does not occur in my lab. It's understandable, though, that this might happen. PIs have huge demands on their time. As invaluable as their opinions are, there is no way some PIs can field every manuscript. Apparently they have their own concerns about peer review (some of which apparently involve colorful dinos).

But what of the reviews performed by graduate students? There are two ways of looking at the duty of peer review.

Looked at one way, a peer review is a learning opportunity. Often the material details developments on the edge of some field that touches at least tangentially upon the student's research. It's a chance to see another lab's raw, early-draft manuscript, and to learn what merits publication in high level journals and what does not. It is a learning experience, and a chance to contribute to the body of science as a whole.

But let's just step out of the shiny world of gumdrops and candy canes for a moment. Peer review can be an inane chore. While students provide some value added to the journal and the author of the manuscript, it's harder to see where the review process benefits their own progression up the ladder. Their role in the review is, effectively, anonymous, and comes with no honor, distinctions, gold stars, pats on the head, brownie points or first author publications.

Stated simply, there is no clear match between incentives and the quality of the peer review. Mistakes are often buried somewhere deep in the unending, jargon-filled paragraphs of the methods section. Whether a grad student takes the effort to check these methods, line by line, comes down to a question of how many hours (if any) of sleep they might prefer to have that evening.

Where's the incentive to dig in? Some grad students seem to possess a deep personal drive to throw other scientists under the bus, but it's probably a minority at most institutions. We can't rely on pure sadism to drive the scientific engine. There must be a way to reward careful and well considered reviews, particularly where they find obscure errors and tenuous methods.

I don't claim to know what the reward should be or how it could be structured. I've considered the possibility that confidential peer review is a mistake, and that publications should instead be edited by the journal seeking to publish. If Nature wants the value added of an expert opinion in the field, let them pay for it! Certainly they demand payment for their subscriptions, so why should their product be provided for free?

At risk of shifting to a seemingly radical alternative, perhaps an open-access, open-comment system is the way to go. Take all comers that pass a basic editorial spot check, and allow insightful, observational comments to come from the community. Those comments can then be tied to the reputation of those who make them. Great insights can be noticed, and unnecessary bickering (I'm looking at you, reviewer #3) can be ignored. Online systems for grading and sorting comments based on reputation exist in many forms. Perhaps it's time to turn them loose on science.

Concurrently, give labs the hard task of determining their own publication threshold. Perhaps more self review will go on in-house if authors know that they can publish whatever they please, and that they'd better get it right the first time. Perhaps some systems could allow papers to have a 'versioning' system, wherein they could be updated (to a point) to reflect public comment.

I don't claim that pay-to-review or open-comment systems solve the problems inherent in the current publication regime, but I think they deserve consideration. Let us at least recognize that good work deserves good incentives, and that adding motivation to peer review can only improve our scientific rigor.

Grad Year 1, Tiny Learnings

Tiny learnings from grad year 1, stated briefly, for my own purposes. Trite? Maybe. Suck it up.

Mentorship is everywhere; always take the opportunity to meet people doing different things in different labs. The guy down the hall may solve the problem that's been killing you all month. That actually happened to me today, and it was great.

Look out for #1, a little bit. Staying cognizant that the goal is science, remember that not everyone has your interests at heart. Example: you are very cheap labor for your PI, and he/she isn't itching to see you leave. This cuts both ways: remember that you might not be respecting someone else's need to look out for #1. Example: your PI has duties besides being on-call for you 24/7.
Related fact: when your PI says it's easy, it's probably hard. If he says it's doable, it'll probably consume all of your attention for as long as you want to work on it. Remember that the person you're talking to has years of experience on you.

Write up your research goals early and often, and set some timelines. I almost never make my own deadlines, and almost always discover other goals that I had forgotten. But you can get a long way by taking a step back and seeing the big picture in which you are swimming, rather than that one method that hasn't been working for a month. Stupid library preps...

All that is gold does not glitter, and all that glitters is not gold. Corollary: Not all new technology is what it claims to be. Stronger claim: It never is. Stay frosty.

When you are on a roll, rock it. There is no scarcer resource than genuine excitement. Push it.

Tons of papers are wrong. Downright wrong. Sometimes scandalously wrong. It's embarrassing, and some of it represents systemic problems with peer review and that whole mess. You can moan about it for a long time (I did). Not sure if that's worth the bellyache, but I'll keep you posted.

Don't make second-hand goals that stake your success on the assessment of the Nature/Science/NEJM intelligentsia, or the ivy pillars of the academe. Those gals and guys have their own little world, and it should come as no surprise that the people in that club house aren't always all that fun to play with anyway.

Say something, or suck it up. I love to whine. It's a problem that I have. I'm still waiting for an example of when it has helped me. If you have problems with something that's going on in the lab, nip it in the bud and have a conversation. You might learn something, you might stop the problem, or you might at least feel better for having it off your chest. Otherwise suck it up.

The goal is knowledge. The goal is not a first author publication. Find the pleasure in solving the puzzles and exploring the science; it is its own joy. The other rewards may find their way to you somehow or another. Remember what you are building, and why you are building it. Remember your ideal.

Saturday, August 21, 2010

Words that I am getting sick of

This is a silly thing to write about, but there are some phrases I am getting tired of. Very tired.

  • X abrogates the activity of Y signaling - Seriously? Abrogates? Again? Biologists bring this one out at every opportunity, probably because they think it makes them look scholarly.
  • X is essential to our understanding of Y - It's usually not. It might help, but it's not essential.
  • Almost infinite - This one goes without saying. It's a lot, I get it. No need to go overboard.
  • On the ground - You don't see this in science, but it's everywhere in the news. Mix it up, people.
That's all for now. Next time I see "abrogates," a paper is getting thrown across the room.

Tuesday, May 18, 2010

The Enzyme is NOT the Protein

In my previous post I emphasized that the map is not the territory. I took String Theory off the shelf, dusted it off a little, and then mercilessly beat it up for no good reason. That was all well and good, but I want to address a point about biology that I think we sometimes lose sight of.

The enzyme is not the protein. More precisely, the enzymatic activity is not the protein. Said most completely, the function is not the protein.

Really, I am attacking a map. It's a map many have seen. It looks something like this. Or even like this.

As biologists, we often want to identify a protein's function. We're sort of obsessed with it, I guess. If a protein is necessary to survive, presumably it does something in the cell that allows life to happen. Maybe it helps break down bad stuff, or build up good stuff. Maybe it replicates DNA. Maybe it guards against invaders somehow. Perhaps it's a messenger of some kind, 'transducing' signals through the cytoplasmic ether.

We're helped along by the fact that a lot of genes really do seem to have a particular task to which they are assigned. Hexokinase has a very specific enzymatic reaction that it appears to be dedicated to catalyzing. It lives for the service of catalysis. Our very understanding of genes is driven largely by the observation of Mendelian inheritance of genes that break these rigidly defined and clearly necessary functions. Animals with defects in these sorts of genes often suffer most obviously from some metabolic dysfunction, and we assign the gene to the metabolic dysfunction.

Now that we have moved past simple metabolism to much more murky phenotypes, we seem to still be tied to the idea of proteins as acting to fulfill a certain function. It's as though they are machines designed to act as some cog that a watchmaker planned to use. Some examples: p53 protects against tumors. It's a tumor suppressor. Hif is a hypoxia sensor. VEGF is an angiogenesis factor.

Why can't these proteins have hobbies? Let's remember that evolution pressures a cell to survive, not to be elegant. In as much as survival is elegant, I guess that gets the cell there. But in the end the thing we're talking about is the most complicated gemisch of protein you could think of, and it will do just about anything to get by. Who's to say that VEGF, in its off hours, doesn't swing by the glycolytic pathway for a little regulatory interlude. Perhaps Hif, during lunch hours, cruises by the spliceosome for a little slice and dice. Even the 'housekeeping gene' and paragon enzyme GAPDH appears to spend lazy Sundays in the nucleus, a horrifying prediction for the one-protein-one-function minded.

Let us remember that what really happens in the cell is a very, very complicated mess of reactions. Once a day, in some cell in the human body, I'd guess that every possible protein interaction pairing can and does occur. There's no reason the cell hasn't evolved to use some of those strange pairings to give it a little more juice in its endless quest for self-replication.


Which is all just to say that a protein doesn't need an easy-to-pin-down function to be very important for the cell. Nor does a transcript with a well-defined function necessarily lack other very important roles.


Here is the question we should be most concerned with. In all the cartographic glory of drawing out these maps, have we missed something essential? I've argued above that we've probably missed the fact that some proteins act in different places, and again missed that some places might occasionally be occupied by different proteins. I would argue that this is more than a frivolous attack: it explains why our experiments are so difficult to replicate and why real advances are only rarely driven by deduction alone. We've only just scratched the surface of the combinatorial possibilities.

Monday, May 17, 2010

The Map is NOT the Territory

I know very little about Alfred Korzybski, and even less about general semantics for which he is famous. I do, however, know his most famous quote: "The map is not the territory". So it is with complete academic ignorance that I co-opt the term for use in the world of biology. You've been warned.

Here's the idea, as I conceive it. You have a map. You have a model. You have a theory. Your theory is awesome, beautiful, exciting and entrancing. Let's say it's string-theory. Let's say it's relativity. Let's say it's evolution. Whatever.

This thing you have, this idea, it's a map. It's a guide to something. It's a flat piece of paper that represents and distills something about reality. It is not its own reality. It is not its own truth. Its serenity is not the same as the actual cold hard truth of the thing that it describes.

In the case of a map, like the kind you hang on your wall, this is obvious. No one is lining up troops to defend the borders on the map in the atlas on your coffee table. They are lining up to defend the real borders of the real states on the real rivers of the real earth.

But in the ivory tower there is a proneness to confusion. String theory, perhaps the best example I can think of, is lauded for its elegance and seeming brilliance. Few people could have imagined an explanation for the complexity of quantum physics and relativity in a set of dimensions coiled down so small that they cannot be perceived at our scale of life. It is an impressive theory.

Where is the territory to go with it? String theory has yet to make a single testable prediction. Its details are so arcane that those who study it seem to inevitably be lost in its folded dimensions, content to treat the theory as a platonic ideal to which the universe we live in might aspire.



The map exists in service of those who live in the territory. Our theories exist in service of our ability to make predictions and interact with our world. We draw Australia into our map because it helps us to make predictions about the consequences of certain actions (e.g., hey, I wonder what will happen if I sail south from Indonesia).

In Biology we are plagued by pseudo-predictive models. We spend a lot of time flailing around trying to come up with a "mechanism" to explain our observations. We often find that we can come up with two or three. Sometimes we bother to test the predictions that our hypothetical mechanism would imply. Too much of the time the data show muddled and confusing support. We often pick and choose the experiments that seem to bolster our point, and dig in for the academic fight over the arcana we've brought into the world.

In the end, the mechanism doesn't matter. The elegance of our theory doesn't matter. We can fight over the lines on the map until Armageddon, but what matters is whether we've done something positive in the real world. This should be completely obvious, but I am shocked at how quickly I am losing touch with that simple fact. Keep your wits about you, and tuck Korzybski's saying in the back of your mind.

Sunday, April 18, 2010

Evolution and Financial Markets

I am way out of my league writing this post, but life is short and it's fun to get ideas out there.

I was listening to one of those iTunesU podcasts about behavioral economics and I was moved to throw the following hypothesis out into the intrawebular ether. It's an explanatory hypothesis, and I can't figure out any useful predictions that it might make, but when has that ever stopped anyone before?

The idea is this: the boom and bust cycle of the markets is driven by an ingrained human tendency to barrel forward full bore when we and those around us have resources, and to hoard resources when we and those around us have little. The key here is the social nature of the tendency: we pay attention not just to our own resources, but also to the perceived total resources available in the environment.

Furthermore, this tendency might be the result of evolutionary pressure. When the environment is perceived as resource rich, the most evolutionarily favorable strategy is to voraciously consume those resources rapidly. This has two benefits. First, it makes those resources available to an individual for making offspring. Second, it removes those resources from the pool, keeping them away from equally hungry competitors.

Biology has shown time and time again that this approach has consequences. In almost every species studied that I know of, permitting organisms to eat as much as they want shortens their lifespans and decreases their overall health. These animals are willing to make a sacrifice to be able to consume as much as possible and have the best chance of leaving the population with abundant offspring.

However, this technique only works so long as the resources are present. When hard times come, animals convert to a different strategy, sometimes a radically different strategy. In C. elegans, for example, relative starvation causes the worm to choose a completely different life cycle. The worm forms a 'dauer' instead of an adult. This smaller form lives many-fold longer, and waits until better times to spring into its adult form. When resources are scarce, survival beats out consumption.

And of course, humans are no exception. In good times, when resources are abundant, we are prone to consume those resources. Moreover, we are prone to turn into the voracious creature of our evolutionary history, sacrificing our future financial health to ensure we are not left behind in the rush. When times are hard, however, we throw a bit of a switch. We act much more cautiously, recognizing that surviving through to the next boom is our top priority. We are much more cautious with our investments, and put in the due diligence that was starkly absent in the good times.

I know I haven't really identified anything new here, but it was an idea that had never occurred to me. What does all this mean? My one conclusion would be that any large financial system, inasmuch as it tends to aggregate human perception, will be prone to boom and bust. The deep psychological root of the tendency (this is something that goes all the way down to worms, after all) means that it will happen again, and there is probably little that we can do to stop it.

The Open Revolution

Read this: http://www.nytimes.com/2010/04/18/education/edlife/18open-t.html

The question raised most loudly, I think, is how educational institutions and the educational process should change in a world where educational materials are much, much more freely available. Like everything else in the internet age, there are two fundamental things missing. The first of these is human contact. You know, the kind you get from actually sitting in the room with a group of fellow students working on the same thing. The second is a filter. How on earth do you find the good and reputable information in the vast sea of the internet? Who compiles a menu of course offerings that pulls together the richest and most savory elements? After all, you could try reading Wikipedia from A to Z, but I don't know that that would really be an education.

Finding ways to incorporate contact and filters into university education is a challenge, but one that learners can solve independently. There's always meetup.com for finding a learning group, and online aggregate ratings can steer you towards the good courses (if imperfectly). But there is one problem that fundamentally cannot be solved by a single person working to elevate their education on an individual level.

This problem is certification. Once you complete a course at a traditional university it gives you credit towards something called a degree. This degree is backed by the institution, and comes with a reputation that acts as a signal to people in the marketplace. As odd and unsavory as it may sound, the degree 'brands' you, and gives you the benefits (and sometimes detriments) that brand confers. There is nothing I can think of that allows an independent learner to acquire such a certifying brand. Institutions that acknowledge and certify learned skills are desperately needed if the open courseware revolution is going to take another leap forward.

Friday, March 19, 2010

The Six Degrees of Functional Genomics

I enjoyed a whirlwind functional genomics talk by Steve Kay today about the circadian rhythm in a number of model organisms. Perhaps most interesting was his outline of the functional genomic approach. The steps were, paraphrased, as follows:

1) Identify the elements of the circadian rhythm machinery
2) Model and generate quantitative hypotheses about this machinery
3) Synthetically reproduce the machinery

He went on to describe how we are working on wrapping up step 1 and just beginning to dip our toes into step 2.

I should start by saying that I think the research described was truly impressive, bringing to bear a large number of high throughput techniques to answer a question in a way that went beyond the usual model of figuring out what "your favorite gene" has to do with process X. He's turned things on their head, asking what gene Y has to do with "my favorite process", and answering the question en masse.

However, after he delved into the previous work on circadian rhythm I was left a little worried. In a process where, as with the cell cycle machinery, transcriptional and translational control are so key, how can you avoid the fact that a vast swath of the cell's general control mechanisms for these fundamental processes will in some way also affect the circadian rhythm? At some point, any element of the cell that isn't completely inert is going to affect any process you choose, albeit in perhaps a small way.

Which brings me to the Six Degrees of Functional Genomics. No part of the cell exists in isolation. No process in the cell can, really, be excised from the context of the larger cell. What we're talking about, after all, is a tiny little bag of water packed chock full of different proteins. True, there are compartments, but these compartments have a nasty way of communicating with each other. In the end, every protein in the cell is functionally related to every other protein in the cell, given enough degrees of separation. Quantitative models are likely to look more like weather models than a nicely damped oscillator. In the end, it's all very dependent on initial conditions, and most predictions will be probabilistic in nature.

To try to develop quantitative models of cellular behavior given current knowledge may be like trying to model all of human social interaction by the Facebook "friends" network. It's heavily biased for certain kinds of interactions, like those between college dorm mates. It's true, these relationships are very important, and they tell you a lot about how the cell behaves day to day. But there's almost certainly a set of interactions that we don't know we don't know, like the interactions with parents and family (not everyone wants their parents seeing all those facebook photos).

To this extent it probably is important to try to develop the networks we currently have. We do still want to find key elements of our processes of interest. Hopefully we don't get so carried away looking for the next gene that we forget the complexity of cellular function as we try to build ever bigger piles of genes in our category of interest.

I'm aware that I am setting up a bit of a straw man here, and I don't pretend that Dr. Kay is unaware of these kinds of concerns. But I think when the broader scientific community looks to functional genomics and computational biology for answers, it needs to be aware of the fundamental limitations. Frankly, I am not sure the broader scientific community looks to functional genomics and computational biology for much of anything, but that has its own problems.

Friday, February 5, 2010

Tool Time

To start my note off on a tangent, I want to recommend the "getting to work early" paradigm. Arrive at work before 7am and you'll find very few souls clogging the arteries of the building, and very few distractions to sidetrack this grad student's taxed neural network. It is a time for settling in and thinking about the big picture. It is also a time for using all those adverbs that your PI has stricken from your science writing. Even the word "abrogate" gets boring if repeated endlessly like a sitcom laugh track.

So let me expound, veritably explicate, upon the following question: What are the bioinformatic tools that I wish I had for my research? The answer comes like a dam burst. There is simply too much material to stay above water.

  • Multiple Reference Short Read Mapping: By this I mean a tool that corrals the multiplicity of reference human genomes and variant annotations and links them together for read mapping. This might sound silly, since most reference mappers can handle a SNP here or there. But there are a number of "everything that can go wrong will" sorts of scenarios, where a common SNP variant or two can lead to horrifically erroneous mapping. This propagates into very confusing results down the line. Such results have to be carefully untangled by hand before they reveal their fundamentally invalid core. With the tools we now have available in the human genome, multiple reference mapping is becoming a must-have app. So if you're out there, WashU, Broad, Sanger, BGI: hear my prayer.
  • Base Quality Retouching: Like the photographs that grace the covers of the latest supermarket magazines, the output that spools off of an Illumina GA needs a little retouching. Sometimes it needs a lot of retouching. There are the issues of PCR duplicates, nucleotide chemistry and failed cycles. These are technical problems that may or may not go away any time soon. In the meantime, we need better base quality numbers. Is that set of 8 reads calling an A instead of G a SNP? Hard to say if you can't believe your base call qualities. MarkDuplicates (Picard) and the GATK coming out of the Broad might have this problem mostly solved, but they remain to be packaged into a neat little bundle and handed out like candy to the rest of us.
  • The Mapping Quality Problem: Anyone who has played with high throughput sequencing technology will know about this problem. What does it mean that a read maps to a given location? Suppose it maps to one location perfectly, but to 25 others with one mismatch. Suppose instead that it had mapped to one location with one mismatch and only two with two mismatches. Which gets the better mapping quality? How are these situations even comparable? I have my own thoughts on a Bayesian way of handling this situation (see the sketch after this list). Maybe just saying the word Bayesian is enough to conjure my solution, and maybe it's too naive to be useful in implementation. Regardless, we need an answer sooner rather than later, lest interesting loci perish for want of a good sequencing read to feed them.
  • The SNP caller to end all SNP callers: This really does not deserve an explanation. (1) We want SNPs. (2) We want confidence scores for those SNPs that are remotely close to correct. The first part is done, we can call SNPs, but I'll be damned if I believe the kinds of confidence scores we assign to them. Getting this problem solved really requires getting the three problems above solved first.
  • Structural Variants for the Rest of Us: The gsMapper has a nice little tool for calling structural variants. Of course, 454 reads are quite amenable to this kind of work due to their length. Paired-end Illumina reads should be perfectly functional too, though. I have yet to see an easy-to-use structural variant caller whose results I can sink my teeth into. I've seen a number of ad hoc tools, and some very high level tools which are nearly impossible to use. To get this problem solved rightly we probably need the mapping quality problem solved first.
  • De PseudoNovo RefSembly: No I am not just trying to smash words together to sound smart. I bring this tool up because, in my ideal world, the tools are bountiful, the data overfloweth, and every grad student is above average. In this imagined world we also have this little gem of a tool for particular problems. Sometimes you have a reference. Sometimes you have multiple references. Sometimes you have some reads that map to the reference, and some reads that you think might represent some new genetic material. You'd like to map to the genome, but you'd also like to put together those delicious additional morsels into something that approximates a meal. For this you want Ref-Sembly, a tool that uses the a priori information from the genome you are working from but also openly allows and embraces the possibility of additional sequence. Such a tool should make a best guess at what such underlying sequence is and provide information about how that sequence might connect to the reference you've dutifully provided. Currently, I think people map reads to the genome and just cram the unmapped refuse into a de novo assembler. I'm not going to say that this is wrong, but, in my heart of hearts, I don't feel that it is fully right. Assembly off of a reference needs to be more nuanced than a garbage compactor. 
  • A Visualization Suite that Doesn't Crash My Computer When I Try to Look at Tens of Thousands of Reads: Does my request defy the bounds of computer science? Is my measly 8 gigs of RAM insufficient for your hungry java app? All I know is this: there is currently no replacement for putting eyes on data. I can see an indel coming from a mile away if I can visualize my reads. IGV is my current tool of choice, but it craps out (for me) when the coverage gets deep. Unfortunately this is where I need the tool the most. Maybe the answer is that I should get some more sticks of RAM, but I have to imagine that the coverage is only going to get deeper, and the problem will continue to mount. 
  • A Visualization Suite that Produces Poster-Ready Images: UCSC genome browser comes close. Very very close. But the difficulty in customizing the visualization and the granularity of the images (with their horrific font) makes this a step down from my ideal. If only there were a "whimsical" button to enhance the graphic appeal of the data it already displays, then I think we would be there. If I am looking across 100 kb of sequence I need my exons to have a little more flair than a vertical line one pixel thick. My guess is that my hypothetical reader is now laughing that I didn't notice the "Visualize, with Feeling" button, tagged with the infamous 'blink' html, that sits dead center on the home page at genome.ucsc.edu. Maybe that person will email me.
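
Since I teased a Bayesian take on the mapping quality problem above, here is a toy sketch of what I have in mind. The per-base error rate and read length are invented for illustration, and a real caller would have to use per-base quality scores and worry about indels:

    # Toy Bayesian mapping quality: treat each candidate alignment's
    # likelihood as a function of its mismatch count, then score the best
    # hit by its posterior probability of being the true origin.
    import math

    def mapping_quality(mismatch_counts, read_len=36, err=0.01):
        # Likelihood of the read given each candidate location, assuming a
        # uniform per-base error rate (a big simplification).
        likelihoods = [(err ** k) * ((1 - err) ** (read_len - k))
                       for k in mismatch_counts]
        p_correct = max(likelihoods) / sum(likelihoods)
        p_wrong = max(1.0 - p_correct, 1e-30)  # guard against log(0)
        return -10.0 * math.log10(p_wrong)     # phred scaling

    # One perfect hit plus 25 one-mismatch hits: quality ~7
    print(mapping_quality([0] + [1] * 25))
    # One one-mismatch hit plus two two-mismatch hits: quality ~17
    print(mapping_quality([1, 2, 2]))

On this toy account, the lone one-mismatch hit is the more trustworthy mapping of the two scenarios, which matches my intuition about why repeat-riddled perfect hits are so treacherous.
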
That's it for now. I think I have exorcised the adverb demon that haunts my scientific writing. I return to the keyboard and the pipette knowing that my salvation is temporary, and that the thirst for flowery exposition shall rise again.

Sunday, January 24, 2010

Presenting to Peers

So this is just a random comment, but I realized recently how infrequently a grad student gets the opportunity to present to his/her peers. Well, actually, I didn't realize it; a friend of mine in the program realized it and brought it to all of our attention. He offered to start off a set of presentations that we, as students, would give to each other.

This may sound a little ridiculous at first. You're thinking: "What on earth is he talking about? Poster sessions, journal clubs, lab meetings, maybe a departmental presentation or two aren't enough?" Well, yeah, I guess those are quite considerable. Our peers form most of the audience for those presentations. But in those circumstances there is almost always an authority figure present. There's your PI, or even other PIs and members of your thesis committee. There's a program director, or maybe even a judge who is looking for the presentation that wins a prize.

And that is great. Don't get me wrong; we need that. But where is the opportunity for us to grow into independent scholars? Where is the opportunity for us to shape a presentation style that is designed to speak to an audience that understands the material at our level, and on our own terms? I've noticed that the talks that are most captivating at conferences are those delivered with a sense of familiarity and comfort with both the subject matter and the audience. They're often delivered with humor and the occasional hint of wry, self-deprecating humility. If we want to train not just scientists, but communicators, we should give them the chance to practice their art in an uninhibited setting. I'm concerned that always reporting our results in front of people we need to impress may further entrench systems of jargon and insular academic perspective.

I'm not trying to badmouth lab meetings or committee meetings. Those are opportunities to expose our line of research to outside challenge, and sometimes even outside attack. We need that. We need to learn to think like scientists. That means constantly revisiting our own assumptions and our own familiarity with our discipline. But maybe every once in a while the big guys could step out of the room and us baby scientists could talk about what we do to each other. It might save a little adrenaline for another day, and it might foster lines of cooperation between students that could last through our careers. And who knows, maybe we wouldn't have to sit through so many presentations that sound like someone reading through their alphabet soup.

Friday, January 15, 2010

Too many articles, too little time

How on earth do you drudge through the literature? I guess a good number of people read the abstracts and skim for interesting figures. A few others hone their area of interest into such a tiny corner that they can actually be completely up to date (in, say, VHL and HIF1alpha interactions in hypoxic conditions in HeLa cells under 10% FBS concentration, or whatever).

But where is the fun in that? As someone who got interested in science because it was just so damn cool, I don't want to give up that connection to the broader picture. I'm not particularly interested in participating in the construction of my very own, brand new pigeonhole. I can see the wooden outlines now: using sequencing technology for subtyping of malignant melanoma in a clinical setting using gene list X. I will know everything about how to use sequencing technology Y to look at melanoma genes X in patient group Z. I mean, I will be able to take the kids to school with my most scholarly, mind-numbingly in-depth knowledge of XYZ. But I will have forgotten the wider goal, the bigger picture. I will have specialized in the War of 1812 only to find that World War Eleventy-Two is raging on without me.

Solutions? Google Reader sure seems like a good shot at staying mildly up to date. Even there I get woefully behind. What I really need is a second brain that can sift through all the sludge to find those nuggets of thrilling wisdom. Anyone know where I can get one of those?