I've broken this post into 3 parts due to character limit.
Hello,
I’m trying to figure out how to organize data from lab results so that the table relationships are in 3NF, and from that create one or more queries that allow the relative difference (RD) and running average RD for each sample, between instruments at the home lab and the off-site labs, to be plotted.
Specifically, I’m having trouble creating the queries for steps 2, 3, and 8, but I’ll show the entire process in case anyone has a better suggestion on how to get the data from start to end.
Note: I’m sure someone will ask about this, so I’ll explain it now. Currently, lab procedure is to run each sample twice, but there is no guarantee that won’t change in the future. Because of that, I can’t simply have 4 columns (one per run at each lab) for each sample, because it may become 2 runs at the off-site lab and 1 run at the home lab, 3 runs at both labs, etc., which would ruin 3NF. Instead, each run has its own record. That way, if we change the number of runs done for each sample in the future, the database will still be in 3NF.
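To make the one-record-per-run idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (lab, instrument, run, run_time_sec, and so on) are hypothetical, and the sketch assumes each instrument belongs to exactly one lab, which matches the sample data but is not stated as a rule. Dates are stored as ISO text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: lab name depends only on the instrument,
-- so it lives in its own table instead of being repeated on every run.
CREATE TABLE lab (
    lab_id   INTEGER PRIMARY KEY,
    lab_name TEXT NOT NULL UNIQUE
);
CREATE TABLE instrument (
    instrument_id INTEGER PRIMARY KEY,
    lab_id        INTEGER NOT NULL REFERENCES lab(lab_id)
);
-- One row per run, so the number of runs per sample is not fixed.
CREATE TABLE run (
    run_id        INTEGER PRIMARY KEY,
    run_date      TEXT NOT NULL,
    instrument_id INTEGER NOT NULL REFERENCES instrument(instrument_id),
    sample_id     INTEGER NOT NULL,
    run_time_sec  REAL NOT NULL
);
""")

conn.executemany("INSERT INTO lab VALUES (?, ?)", [(1, "A"), (2, "Home")])
conn.executemany("INSERT INTO instrument VALUES (?, ?)", [(1, 1), (5, 2)])
# Runs 1, 2, 9 and 10 from the example data (sample 101).
conn.executemany(
    "INSERT INTO run VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2019-11-06 10:05:00", 1, 101, 500),
        (2, "2019-11-06 10:05:00", 1, 101, 520),
        (9, "2019-11-11 14:10:00", 5, 101, 440),
        (10, "2019-11-11 14:10:00", 5, 101, 450),
    ],
)

# The run count per sample and instrument is derived, not baked into columns.
counts = conn.execute("""
    SELECT sample_id, instrument_id, COUNT(*)
    FROM run
    GROUP BY sample_id, instrument_id
    ORDER BY sample_id, instrument_id
""").fetchall()
print(counts)
```

With this layout, a third run for a sample is just another row in run, and no table definition has to change.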
As an example scenario, let’s say I wanted to compare the RD and the running average RD between instrument 1 and the instruments used at the home lab. Here is example output from a query across several tables; it has all the relevant data I start with for this process:
Run ID | Date | Lab Name | Instrument ID | Sample ID | Run Time (sec) |
1 | 11/6/19 10:05:00 AM | A | 1 | 101 | 500 |
2 | 11/6/19 10:05:00 AM | A | 1 | 101 | 520 |
3 | 11/7/19 12:00:00 PM | A | 2 | 102 | 350 |
4 | 11/7/19 12:00:00 PM | A | 2 | 102 | 380 |
5 | 11/7/19 1:00:00 PM | B | 3 | 103 | 395 |
6 | 11/7/19 1:00:00 PM | B | 3 | 103 | 392 |
7 | 11/11/19 2:00:00 PM | Home | 4 | 102 | 375 |
8 | 11/11/19 2:00:00 PM | Home | 4 | 102 | 345 |
9 | 11/11/19 2:10:00 PM | Home | 5 | 101 | 440 |
10 | 11/11/19 2:10:00 PM | Home | 5 | 101 | 450 |
11 | 11/11/19 2:20:00 PM | Home | 4 | 103 | 400 |
12 | 11/11/19 2:20:00 PM | Home | 4 | 103 | 400 |
13 | 11/12/19 2:00:00 PM | A | 2 | 211 | 343 |
14 | 11/12/19 2:00:00 PM | A | 2 | 211 | 343 |
15 | 11/12/19 3:00:00 PM | B | 3 | 205 | 535 |
16 | 11/12/19 3:00:00 PM | B | 3 | 205 | 560 |
17 | 11/14/19 9:00:00 AM | A | 1 | 214 | 295 |
18 | 11/14/19 9:00:00 AM | A | 1 | 214 | 310 |
19 | 11/18/19 1:50:00 PM | Home | 5 | 205 | 540 |
20 | 11/18/19 1:50:00 PM | Home | 5 | 205 | 542 |
21 | 11/18/19 1:55:00 PM | A | 1 | 330 | 425 |
22 | 11/18/19 1:55:00 PM | A | 1 | 330 | 520 |
23 | 11/18/19 2:00:00 PM | Home | 4 | 214 | 315 |
24 | 11/18/19 2:00:00 PM | Home | 4 | 214 | 320 |
25 | 11/18/19 2:10:00 PM | Home | 5 | 211 | 360 |
26 | 11/18/19 2:10:00 PM | Home | 5 | 211 | 350 |
What I want to end up with is this, which will allow me to plot RD or Running Average RD vs. Sample ID:
Run ID | Date | Lab Name | Instrument ID | Sample ID | RD | Running Average RD |
1 | 11/6/19 10:05:00 AM | A | 1 | 101 | 14.61% | 14.61% |
17 | 11/14/19 9:00:00 AM | A | 1 | 214 | -4.72% | 4.94% |
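For reference, the two target rows above can be reproduced with a short sketch in Python's built-in sqlite3 module. It rests on a few assumptions that are inferred from the numbers rather than stated: RD = (mean run time on the chosen instrument - mean run time at the home lab) / mean run time at the home lab, each output row is labelled with the lowest Run ID for that sample on the chosen instrument, and the running average is taken in Run ID order as a stand-in for date order. The flat runs table and the rd_table helper are hypothetical names.

```python
import sqlite3

# (run_id, lab, instrument_id, sample_id, run_time_sec): a subset of the rows above
runs = [
    (1, "A", 1, 101, 500), (2, "A", 1, 101, 520),
    (9, "Home", 5, 101, 440), (10, "Home", 5, 101, 450),
    (17, "A", 1, 214, 295), (18, "A", 1, 214, 310),
    (23, "Home", 4, 214, 315), (24, "Home", 4, 214, 320),
    (21, "A", 1, 330, 425), (22, "A", 1, 330, 520),  # sample 330 has no home run
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_id INTEGER PRIMARY KEY, lab TEXT, "
    "instrument_id INTEGER, sample_id INTEGER, run_time REAL)"
)
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", runs)

# Average each sample per side first, then join the two sides on sample_id.
# The inner join drops samples that were not run at the home lab.
RD_QUERY = """
SELECT o.first_run, o.sample_id,
       (o.avg_time - h.avg_time) / h.avg_time AS rd
FROM (SELECT MIN(run_id) AS first_run, sample_id, AVG(run_time) AS avg_time
      FROM runs WHERE instrument_id = ? GROUP BY sample_id) AS o
JOIN (SELECT sample_id, AVG(run_time) AS avg_time
      FROM runs WHERE lab = 'Home' GROUP BY sample_id) AS h
  ON h.sample_id = o.sample_id
ORDER BY o.first_run
"""

def rd_table(conn, instrument_id):
    """Return (run_id, sample_id, RD %, running average RD %) rows."""
    rows, total = [], 0.0
    cursor = conn.execute(RD_QUERY, (instrument_id,))
    for n, (run_id, sample_id, rd) in enumerate(cursor, start=1):
        total += rd  # accumulate unrounded RD values
        rows.append((run_id, sample_id,
                     round(rd * 100, 2), round(total / n * 100, 2)))
    return rows

result = rd_table(conn, 1)
print(result)  # [(1, 101, 14.61, 14.61), (17, 214, -4.72, 4.94)]
```

The running average is accumulated in Python here only to keep the SQL basic; a window function or a correlated subquery could do the same inside the database, depending on what the engine supports.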
Here are the steps I’ve figured out so far.