User Tools

Site Tools


genetica:parsing2

How to analyze interSNP results of DB_TEST_BestMarkerCombi2.txt

The main purpose here is the same that at SingleMarker but a little more tricky.

The interSNP file contents are,

No
Chr_1
rs_No_1
Pos_No_1
p-Single-marker_1
Chr_2
rs_No_2
Pos_No_2
p-Single-marker_2
p-value
p-value_corr
SNP1_Allele_A
SNP1_Allele_B
SNP2_Allele_A
SNP2_Allele_B
beta_x1
se_x1
beta_x1D
se_x1D
beta_x2
se_x2
beta_x2D
se_x2D
beta_x1x2
se_x1x2
beta_x1x2D
se_x1x2D
beta_x1Dx2
se_x1Dx2
beta_x1Dx2D
se_x1Dx2D

:-P Lot of helpless info here. But don't despair, we are going to make it quickly now.

First, parse it again with awk for sanity. Now we will try just CHR SNP BP and P, let see if plink can do the job

echo "CHR SNP BP P" > tmpfile; tail -n +2  myresults.file | awk -F'\t' '{print $2,$3,$4,$5"\n"$6,$7,$8,$9}' >> tmpfile

I tried this,

echo "CHR SNP BP P" > test_clean_combi.txt; tail -n +2  /home/data/Bonn_GWAS/TEST5/intersnp_TEST5/outputs/ADMURimpQC2_TEST5_BestMarkerCombi2.txt | awk -F'\t' '{print $2,$3,$4,$5"\n"$6,$7,$8,$9}' >> test_clean_combi.txt

and then

plink --bfile ~/data/Variomics/ADMURimpQC2 --clump test_clean_combi.txt --clump-p1 0.01

with bad luck,

$ wc -l ADMURimpQC2_TEST5_BestMarkerCombi2.clumped
236 ADMURimpQC2_TEST5_BestMarkerCombi2.clumped

So what now? Let's try with P=0.1. This is not wrong at all. The point here is that each SNP has a low effect over the disease but a combination of them could has some noticeable effect.

plink --bfile ~/data/Variomics/ADMURimpQC2 --clump test_clean_combi.txt --clump-p1 0.1
$ wc -l ADMURimpQC2_TEST5_BestMarkerCombi2.clumped
1979 ADMURimpQC2_TEST5_BestMarkerCombi2.clumped

So now run the test and,

8-o But of course, it is too good to be true. A closer look at the awk command shows we are using the p values for single marker of each SNP. This is at least curious and I think it must be taken into account in some way.

But, let's see what happens if the combined p value is used.

$ echo "CHR SNP BP P" > test_clean_combi2.txt; tail -n +2  /home/data/Bonn_GWAS/TEST5/intersnp_TEST5/outputs/ADMURimpQC2_TEST5_BestMarkerCombi2.txt | awk -F'\t' '{print $2,$3,$4,$10"\n"$6,$7,$8,$10}' >> test_clean_combi2.txt
$ plink --bfile ~/data/Variomics/ADMURimpQC2 --clump test_clean_combi2.txt --clump-p1 0.1

So far so good,

$ plink --bfile ~/data/Variomics/ADMURimpQC2 --clump test_clean_combi2.txt --clump-p1 0.1

give us,

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Web-based version check ( --noweb to skip )
Connecting to web...  OK, v1.07 is current

Writing this text to log file [ plink.log ]
Analysis started: Thu Mar  7 09:22:41 2013

Options in effect:
        --bfile /home/osotolongo/data/Variomics/ADMURimpQC2
        --clump test_clean_combi2.txt
        --clump-p1 0.1

Reading map (extended format) from [ /home/osotolongo/data/Variomics/ADMURimpQC2.bim ] 
1034238 markers to be included from [ /home/osotolongo/data/Variomics/ADMURimpQC2.bim ]
Reading pedigree information from [ /home/osotolongo/data/Variomics/ADMURimpQC2.fam ] 
1088 individuals read from [ /home/osotolongo/data/Variomics/ADMURimpQC2.fam ] 
1088 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
319 cases, 769 controls and 0 missing
511 males, 577 females, and 0 of unspecified sex
Reading genotype bitfile from [ /home/osotolongo/data/Variomics/ADMURimpQC2.bed ] 
Detected that binary PED file is v1.00 SNP-major mode
Before frequency and genotyping pruning, there are 1034238 SNPs
1088 founders and 0 non-founders found
1130 heterozygous haploid genotypes; set to missing
Writing list of heterozygous haploid genotypes to [ plink.hh ]
Total genotyping rate in remaining individuals is 0.986847
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 1034238 SNPs
After filtering, 319 cases, 769 controls and 0 missing
After filtering, 511 males, 577 females, and 0 of unspecified sex

Parameters for --clump:
          p-value threshold for index SNPs = 0.1
      Physical (kb) threshold for clumping = 250
     LD (r-squared) threshold for clumping = 0.5
        p-value threshold for clumped SNPs = 0.01

Reading results for clumping from [ test_clean_combi2.txt ]
Extracting fields SNP and P
Indexing on all files
Writing clumped results file to [ plink.clumped ]

Analysis finished: Thu Mar  7 09:27:40 2013

and surprising, the results are also very good,

But quite different. Why?

genetica/parsing2.txt · Last modified: 2020/08/04 10:58 (external edit)