I am stuck on a database design problem: trying to understand how best to design a database for the archiving I am trying to do. I am not an expert in this field and could use some ideas or suggestions. I will define my specifications as best as I can. Let me know if you need more details.
to give you a background: we design primer libraries: each library is simply a set of regions we are focusing on. eg: library 1 can have 5 regions: chromosome 1 : position 1 to 10, chromosome 2: position 1 to 500 and so on and so forth. there is no limit to the number of regions a single library can include. we typically see about 1000-3000 regions in one library. (
next we run samples using these libraries. This means we take a sample and look at each region defined in the library of interest (to put it crudely). what I want to collect is performance metrics on how well a library performed across multiple samples. We have defined performance metrics which we can collect for each region within each primer library for as many samples as were run. what is the best way to store this information in a table / database structure? The simple one I can think of is having a single table for all libraries like below with these columns. Each sample will go in as a bunch of rows (one per region).
SampleName AnalysisID LibraryName RegionID Region metric being tracked
However, the main objective of creating this database is so that it is easy to look at "poorly" performing regions across different samples and determine consistency. for this, it would help if we had only one row per region of a particular library and each sample being a new column (somewhat like an excel table). But then, different samples get run for each library and therefore if sample names were columns, we would have a large number of columns growing quickly and lot of empty regions where sample 1 was run for library 1 but not for library 2. So does that mean I create a different table for each library? - Maybe. It just doesn't sounds right.
Is there a better solution? thanks for reading this email. Your expertise will be much appreciated.
I don't know that I'm going to be able to give you a specific answer to this question without seeing everything you've done for design up to this point and having more of an understanding of the data being requested. But, I'll say this...
From the sounds of it, you're attempting to either pivot your data or denormalize it before you've actually done a normalized structure. My strongest suggestion is to create the normalized structure that accurately and easily stores the data first. Then figure out what you need to do for reporting purposes. You have alternatives for this. You can create additional structures as needed, or you can use pivot methods, views, or just simple queries to restructure the data stored in more meaningful ways.
Don't try to force the structure to support formatting or reporting. Instead, focus the structure on storing the data in the most efficient manner. Then, from there you can do the other work as needed.
answered Aug 06, 2012 at 01:59 PM
Grant Fritchey ♦♦