Sorry for being unclear. In the .csv file, the first row contains the names of the loci (e.g. 84_16, 90_71, 122_16, etc.). These names aren’t really important; I think they are computer-generated by the DNA sequencing process.
The remaining rows represent the fish, with every two rows belonging to one individual. Each row holds half the DNA (there are two strands, so two rows per fish). The program is designed to go through both strands at each locus for each fish. It’s easier to explain with an example. The values in the .csv are coded as:
1 = A
2 = C
3 = G
4 = T
If we look at the first strand of the first fish (FC75_Sesoko2011_Fish1Vsens5) at the first locus (84_16), we get a value of 4 in the .csv. This means we increment the T count at that locus by 0.5. The second strand for that fish at the same locus is also a 4, so we increment T by another 0.5 (so it’s now at 1.0). The values for the first fish (FC75_Sesoko2011_Fish1Vsens5) at the first locus (84_16) would therefore be A = 0.0, C = 0.0, G = 0.0 and T = 1.0.
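To check I've understood the counting rule, here is a minimal Python sketch of the per-locus step. It assumes every cell holds one of the codes 1 to 4 (an unknown code such as a missing-data marker raises `KeyError`, since the post doesn't say how missing values are stored). The function name `locus_frequencies` is my own, not from any existing program.

```python
# Coding from the post: 1 = A, 2 = C, 3 = G, 4 = T
CODE_TO_BASE = {1: "A", 2: "C", 3: "G", 4: "T"}
BASES = ("A", "C", "G", "T")


def locus_frequencies(strand1: int, strand2: int) -> dict:
    """Return the A/C/G/T values for one fish at one locus.

    Each strand adds 0.5 to the base it carries, so two identical
    strands give 1.0 and two different strands give 0.5 + 0.5.
    Assumes both codes are in 1..4 (anything else raises KeyError).
    """
    freqs = {base: 0.0 for base in BASES}
    for code in (strand1, strand2):
        freqs[CODE_TO_BASE[code]] += 0.5
    return freqs


# First fish, first locus (84_16): both strands are 4 (T)
print(locus_frequencies(4, 4))  # {'A': 0.0, 'C': 0.0, 'G': 0.0, 'T': 1.0}
```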
She wants the header row of the output for the first locus (ignoring the first two columns) to look like:
84_16_A, 84_16_C, 84_16_G, 84_16_T
and the row for that fish would be 0.0, 0.0, 0.0, 1.0
This process extends to all loci, so for the first fish at the second locus (90_71) the header would continue with
90_71_A, 90_71_C, 90_71_G, 90_71_T
and that row would continue with 1.0, 0.0, 0.0, 0.0
For the first fish at the third locus (122_16), the header would continue with
122_16_A, 122_16_C, 122_16_G, 122_16_T
and that row would continue with 0.0, 0.5, 0.0, 0.5
In total, for just the first three loci, the header row would look like this:
Fish_ID,Population,84_16_A,84_16_C,84_16_G,84_16_T,90_71_A,90_71_C,90_71_G,90_71_T,122_16_A,122_16_C,122_16_G,122_16_T
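The header is just each locus name repeated four times with a base suffix, after the two ID columns. A small sketch of that expansion, assuming the locus names are already read from the first row of the input (the function name `output_header` is hypothetical):

```python
BASES = ("A", "C", "G", "T")


def output_header(loci):
    """Build the output header: Fish_ID, Population, then four columns per locus."""
    return ["Fish_ID", "Population"] + [
        f"{locus}_{base}" for locus in loci for base in BASES
    ]


print(",".join(output_header(["84_16", "90_71", "122_16"])))
```

Run on the first three loci, this prints the header row shown above.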
The second row would be:
FC75_Sesoko2011_Fish1Vsens5,1,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.5,0.0,0.5
Basically, in every row, all values except the first two columns (fish name and population) will be 0.0, 0.5, or 1.0.
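Putting it together, here is a rough Python sketch of the whole conversion using only the standard `csv` module. Several things are assumptions, not confirmed above: the first two input columns hold the fish ID and population; the fish ID and population are taken from the first row of each pair; every genotype cell is a code from 1 to 4; and blank lines are skipped. `convert`, `convert_file` and the sample input are hypothetical. The sample rebuilds the first fish from the values described above ("Fish" and "Pop" are placeholder column names, and the strand order at 122_16 is a guess).

```python
import csv
import io

# Coding from the post: 1 = A, 2 = C, 3 = G, 4 = T
CODE_TO_BASE = {"1": "A", "2": "C", "3": "G", "4": "T"}
BASES = ("A", "C", "G", "T")


def convert(rows):
    """Convert strand-per-row genotypes into one row of base values per fish.

    rows[0] is the input header (two ID columns, then locus names); each
    following pair of rows holds the two strands of one fish.
    """
    header, data = rows[0], rows[1:]
    loci = header[2:]
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of strand rows")

    out = [["Fish_ID", "Population"] + [f"{locus}_{b}" for locus in loci for b in BASES]]
    # Step through the data two rows (one fish) at a time
    for strand1, strand2 in zip(data[0::2], data[1::2]):
        fish_id, population = strand1[0], strand1[1]
        values = []
        for col in range(2, 2 + len(loci)):
            counts = dict.fromkeys(BASES, 0.0)
            for code in (strand1[col], strand2[col]):
                counts[CODE_TO_BASE[code.strip()]] += 0.5  # each strand is half
            values.extend(counts[b] for b in BASES)
        out.append([fish_id, population] + [f"{v:.1f}" for v in values])
    return out


def convert_file(in_path, out_path):
    """Read the input .csv, convert it, and write the result (paths are hypothetical)."""
    with open(in_path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(convert(rows))


sample = """Fish,Pop,84_16,90_71,122_16
FC75_Sesoko2011_Fish1Vsens5,1,4,1,2
FC75_Sesoko2011_Fish1Vsens5,1,4,1,4
"""
for row in convert(list(csv.reader(io.StringIO(sample)))):
    print(",".join(row))
```

For the sample, this prints the header row and the first fish's row exactly as written above. On the real data, `convert_file("input.csv", "output.csv")` would do the same for every fish, provided the assumptions about the input layout hold.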
I hope that’s intelligible. Please ask if you need me to explain it better.