FACEHBI longitudinal analysis, v5
Once v5 is done, I have three imaging timepoints. Three points do not allow for much, given the number of covariates involved, but I can always play around with them and see what comes out.
qdec table
https://surfer.nmr.mgh.harvard.edu/fswiki/LongitudinalTwoStageModel
First step: build the qdec table. I will assume I already have a table with all the variables I want to analyze at each timepoint. Something like this,
[osotolongo@brick04 long_facehbi_v025]$ head 'longitudinal_lang_025.csv'
facehbi_id_fac,LID,internal_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac,fe_letter_fluency_np,fe_category_fluency_np,fe_verb_fluency_np,pyramid_palmtrees_images_fac,pyramid_palmtrees_words_fac,kissing_dancing_images_fac,kissing_dancing_words_fac,boston60_free_fac,action_naming_free_fac
F001,0001,20090806,0,0,71,0,5,22,17,13,,,,,,
F001,0001,20090806,2,0,73,0,5,20,22,19,50,51,50,50,50,51
F001,0001,20090806,5,0,76,0,,24,18,14,51,51,50,50,47,49
F002,0002,20131084,0,0,70,1,9,23,25,22,,,,,,
F002,0002,20131084,2,0,72,1,9,24,23,29,52,52,51,52,50,54
F003,0003,20130456,0,0,70,0,6,14,17,11,,,,,,
F003,0003,20130456,2,0,72,0,6,10,25,17,51,51,42,51,48,44
F004,0004,20080130,0,0,76,0,8,18,15,20,,,,,,
F005,0005,20141272,0,0,68,1,9,22,26,28,,,,,,
This basically comes from the FACEHBI table that gets shared every so often and that is supposed to be clean, correct, etc.
Now, once all the previous file-wrangling problems are solved, we build the qdec file.
Once we have the rules, the info has to be arranged according to the qdec format.
and then, if we run,
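As a rough sketch of that rearrangement (not the actual make_qdec0.pl, whose mapping rules are not shown here; the fsid naming and column positions below are illustrative only, following the CSV layout above):

```shell
# Hypothetical sketch: build a qdec-style table from the longitudinal CSV.
# Column positions follow the CSV shown above; the fsid/fsid-base naming
# scheme here is made up, and the visit number stands in for years.
cat > sample.csv <<'EOF'
facehbi_id_fac,LID,internal_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac
F001,0001,20090806,0,0,71,0,5
F001,0001,20090806,2,0,73,0,5
EOF
echo "fsid fsid-base years gender age education" > qdec_sketch.table
awk -F',' 'NR > 1 {
  gender = ($7 == 0) ? "Female" : "Male"
  # fsid encodes visit + subject; fsid-base is the within-subject template
  printf "v%s_%s flong_%s %s %s %s %s\n", $4, $2, $2, $4, gender, $6, $8
}' sample.csv >> qdec_sketch.table
cat qdec_sketch.table
```

The key constraint is that every fsid-base groups all of a subject's timepoints under one template directory in SUBJECTS_DIR.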
[osotolongo@brick04 long_facehbi_v025]$ ./make_qdec0.pl > qdec0.table
[osotolongo@brick04 long_facehbi_v025]$ long_mris_slopes --qdec qdec0.table --meas thickness --hemi lh --do-pc1 --do-label --stack-pc1 lh.qdec0.thickness-pc1.stack.mgh --isec-labels lh.qdec0.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
[osotolongo@brick04 long_facehbi_v025]$ mri_glmfit --osgm --glmdir lh.qdec0.thickness-pc1.fwhm10 --y lh.qdec0.thickness-pc1.stack.fwhm10.mgh --label lh.qdec0.fsaverage.cortex.label --surf fsaverage lh
[osotolongo@brick04 long_facehbi_v025]$ mri_glmfit-sim --glmdir lh.qdec0.thickness-pc1.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
We are looking for the regions where there is a percent change in thickness across the sample (or something like that).
long to cross
I am going to try something unusual. With so many covariates, it will be very hard to do anything with at most three timepoints. But I can collapse the data to cross-sectional and run an FSGD analysis. Let's see how this turns out.
[osotolongo@brick03 long_facehbi_v025]$ head qdec0.table
fsid fsid-base years gender age education
facehbi_0001 flong_0001 0 Female 71 5
f2cehbi_0001 flong_0001 2.1 Female 73 5
f5cehbi_0001 flong_0001 3 Female 76 5
facehbi_0005 flong_0005 0 Male 68 9
f2cehbi_0004 flong_0005 2 Male 70 9
f5cehbi_0002 flong_0005 3 Male 73 9
facehbi_0006 flong_0006 0 Female 64 7
f2cehbi_0005 flong_0006 2.1 Female 66 7
f5cehbi_0003 flong_0006 3 Female 69 7
[osotolongo@brick03 long_facehbi_v025]$ long_qdec_table --qdec qdec0.table --cross --out cross.qdec0.table
Parsing the qdec table: qdec0.table
Collapsing the qdec table to CROSS
Writing the qdec table: cross.qdec0.table
[osotolongo@brick03 long_facehbi_v025]$ head cross.qdec0.table
fsid years gender age education
flong_0001 1.7 Female 73.33333333333333 5.0
flong_0005 1.6666666666666667 Male 70.33333333333333 9.0
flong_0006 1.7 Female 66.33333333333333 7.0
flong_0007 1.6666666666666667 Male 61.333333333333336 9.0
flong_0008 1.8333333333333333 Female 57.333333333333336 8.0
flong_0009 1.7 Female 69.33333333333333 8.0
flong_0010 1.7 Male 70.33333333333333 9.0
flong_0013 1.8 Female 71.33333333333333 5.0
flong_0014 1.7 Female 67.33333333333333 8.0
Well, this could be interesting.
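Note what the --cross collapse is doing: each numeric covariate is simply averaged over a subject's timepoints, which you can check by hand (a one-liner reproducing flong_0001's mean age from the table above):

```shell
# The collapsed age for flong_0001 is the mean of its three timepoint ages
mean=$(echo "71 73 76" | awk '{ for (i = 1; i <= NF; i++) s += $i; printf "%.2f", s / NF }')
echo "${mean}"   # 73.33, matching 73.3333... in cross.qdec0.table
```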
- qdec2fsgd.sh
#!/bin/bash
ifile=$1
ofile=${ifile%.dat}'.fsgd'
echo "GroupDescriptorFile 1" > ${ofile}
echo "Title FACEHBI_LNG" >> ${ofile}
echo "Class Main" >> ${ofile}
long_qdec_table --qdec ${ifile} --cross --out tmp.cross
awk '{print "Input",$1,"Main",$2,$3,$4,$5,$6}' tmp.cross | sed 's/Input fsid Main/Variables/;s/Female/0/;s/Male/1/' >> ${ofile}
echo "0 0 0 0 0 1" > ${ifile%.dat}'.mtx'
and now, with this, what I do is,
./qdec2fsgd.sh var9.qdec/qdec.table.dat
mri_glmfit --y lh.qdec0.thickness-spc.stack.fwhm10.mgh --fsgd var9.qdec/qdec.table.fsgd --C var9.qdec/qdec.table.mtx --surf fsaverage lh --glmdir var9.qdec/lh.cross.glmdir
mri_glmfit-sim --glmdir var9.qdec/lh.cross.glmdir --cache 3 neg --cwp 0.05 --2spaces
Testing first one
[osotolongo@brick03 throttle]$ ls -ltr
total 56
-rw-rw-r-- 1 osotolongo osotolongo 32797 May 22 10:45 longitudinal_lang_025.csv
-rw-rw-r-- 1 osotolongo osotolongo  3801 May 22 10:45 lmatch.csv
-rw-rw-r-- 1 osotolongo osotolongo  9967 May 22 10:45 flong.csv
-rwxrwxr-x 1 osotolongo osotolongo  2878 May 22 10:56 make_qdec.pl
[osotolongo@brick03 throttle]$ mkdir var9
[osotolongo@brick03 throttle]$ head -n 1 longitudinal_lang_025.csv | sed 's/,/\n/' | cat -n
     1  facehbi_id_fac
     2  LID,internal_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac,fe_letter_fluency_np,fe_category_fluency_np,fe_verb_fluency_np,pyramid_palmtrees_images_fac,pyramid_palmtrees_words_fac,kissing_dancing_images_fac,kissing_dancing_words_fac,boston60_free_fac,action_naming_free_fac
[osotolongo@brick03 throttle]$ head -n 1 longitudinal_lang_025.csv | sed 's/,/\n/g' | cat -n
     1  facehbi_id_fac
     2  LID
     3  internal_id_fac
     4  visit_fac
     5  first_visit_0_2
     6  age
     7  sex
     8  level_education_fac
     9  fe_letter_fluency_np
    10  fe_category_fluency_np
    11  fe_verb_fluency_np
    12  pyramid_palmtrees_images_fac
    13  pyramid_palmtrees_words_fac
    14  kissing_dancing_images_fac
    15  kissing_dancing_words_fac
    16  boston60_free_fac
    17  action_naming_free_fac
[osotolongo@brick03 throttle]$ ./make_qdec.pl fe_letter_fluency_np > var9/qdec.table
[osotolongo@brick03 throttle]$ head var9/qdec.table
fsid fsid-base years gender age education fe_letter_fluency_np
facehbi_0001 flong_0001 0 Female 71 5 22
f2cehbi_0001 flong_0001 2.1 Female 73 5 20
f5cehbi_0001 flong_0001 3 Female 76 5 24
facehbi_0005 flong_0005 0 Male 68 9 22
f2cehbi_0004 flong_0005 2 Male 70 9 20
f5cehbi_0002 flong_0005 3 Male 73 9 23
facehbi_0006 flong_0006 0 Female 64 7 24
f2cehbi_0005 flong_0006 2.1 Female 66 7 23
f5cehbi_0003 flong_0006 3 Female 69 7 18
[osotolongo@brick03 throttle]$ ./qdec2fsgd.sh var9/qdec.table
[osotolongo@brick03 throttle]$ head var9/qdec.table.fsgd
GroupDescriptorFile 1
Title FACEHBI_LNG
Class Main
Variables years gender age education fe_letter_fluency_np
Input flong_0001 Main 1.7 0 73.33333333333333 5.0 22.0
Input flong_0005 Main 1.6666666666666667 1 70.33333333333333 9.0 21.666666666666668
Input flong_0006 Main 1.7 0 66.33333333333333 7.0 21.666666666666668
Input flong_0007 Main 1.6666666666666667 1 61.333333333333336 9.0 25.333333333333332
Input flong_0008 Main 1.8333333333333333 0 57.333333333333336 8.0 15.666666666666666
Input flong_0009 Main 1.7 0 69.33333333333333 8.0 20.333333333333332
[osotolongo@brick03 throttle]$ cat var9/qdec.table.mtx
0 0 0 0 0 1
- dospc.sh
#!/bin/bash
var=$1
# everybody look to their left
long_mris_slopes --qdec var${var}/qdec.table --meas thickness --hemi lh --do-spc --do-label --stack-spc var${var}/lh.qdec.thickness-spc.stack.mgh --isec-labels var${var}/lh.qdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/lh.qdec.thickness-spc.stack.mgh --fsgd var${var}/qdec.table.fsgd --C var${var}/qdec.table.mtx --glmdir var${var}/lh.qdec.thickness-spc.fwhm10 --surf fsaverage lh
mri_glmfit-sim --glmdir var${var}/lh.qdec.thickness-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/lh.qdec.thickness-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# everybody look to their right
long_mris_slopes --qdec var${var}/qdec.table --meas thickness --hemi rh --do-spc --do-label --stack-spc var${var}/rh.qdec.thickness-spc.stack.mgh --isec-labels var${var}/rh.qdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/rh.qdec.thickness-spc.stack.mgh --fsgd var${var}/qdec.table.fsgd --C var${var}/qdec.table.mtx --glmdir var${var}/rh.qdec.thickness-spc.fwhm10 --surf fsaverage rh
mri_glmfit-sim --glmdir var${var}/rh.qdec.thickness-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/rh.qdec.thickness-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# Can you feel that? Yeah
grep -v "^#" var${var}/*h.qdec.thickness-spc.fwhm10/*/*.summary
[osotolongo@brick03 throttle]$ ./dospc.sh 9
...
...
...
var9/rh.qdec.thickness-spc.fwhm10/qdec.table/cache.th30.pos.sig.cluster.summary:   1  4.7058  44663  10.70  45.8  8.5  -33.1  0.03292  0.02977  0.03607  14  51.01  middletemporal
Cortical volume
To run this same analysis on volume instead, I first have to extract the eTIV.
[osotolongo@brick03 throttle]$ for y in $(seq 9 17); do for x in $(grep "Input" var${y}/qdec.table.fsgd | awk '{print $2}'); do v=$(mri_segstats --subject ${x} --etiv-only | grep "mm\^3" | awk '{print $4}'); echo "${x} ${v}"; done > var${y}/eitv.tsv; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do cd var${x}; Rscript ../getzicv.r; cd ..; done
[osotolongo@brick03 throttle]$ head -n 3 var9/qdec.table.fsgd > headers.txt
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do tail -n +5 var${x}/qdec.table.fsgd > var${x}/tmp_fsgd; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do join -1 2 -2 1 var${x}/tmp_fsgd var${x}/zeitv.tsv | awk -F " Input " '{print "Input",$1,$2 }' > var${x}/tmp_full; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do labs=$(head -n 4 var${x}/qdec.table.fsgd | tail -n 1); c=$(echo "${labs} zEITV"); cat headers.txt > var${x}/vqdec.table.fsgd; echo ${c} >> var${x}/vqdec.table.fsgd; cat var${x}/tmp_full >> var${x}/vqdec.table.fsgd; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do echo "0 0 0 0 0 1 0" > var${x}/vqdec.table.mtx; done
- dovspc.sh
#!/bin/bash
var=$1
# everybody look to their left
long_mris_slopes --qdec var${var}/qdec.table --meas volume --hemi lh --do-spc --do-label --stack-spc var${var}/lh.qdec.volume-spc.stack.mgh --isec-labels var${var}/lh.vqdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/lh.qdec.volume-spc.stack.mgh --fsgd var${var}/vqdec.table.fsgd --C var${var}/vqdec.table.mtx --glmdir var${var}/lh.qdec.volume-spc.fwhm10 --surf fsaverage lh
mri_glmfit-sim --glmdir var${var}/lh.qdec.volume-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/lh.qdec.volume-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# everybody look to their right
long_mris_slopes --qdec var${var}/qdec.table --meas volume --hemi rh --do-spc --do-label --stack-spc var${var}/rh.qdec.volume-spc.stack.mgh --isec-labels var${var}/rh.vqdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/rh.qdec.volume-spc.stack.mgh --fsgd var${var}/vqdec.table.fsgd --C var${var}/vqdec.table.mtx --glmdir var${var}/rh.qdec.volume-spc.fwhm10 --surf fsaverage rh
mri_glmfit-sim --glmdir var${var}/rh.qdec.volume-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/rh.qdec.volume-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# Can you feel that? Yeah
grep -v "^#" var${var}/*h.qdec.volume-spc.fwhm10/*/*.summary
and
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do ./dovspc.sh ${x}; done
Investigating variables
The first step is always to download the most up-to-date FACEHBI macrotable. We clean it with one of the pipeline tools,
[osotolongo@brick03 extended]$ perl ~/acenip/tools/xls2csv.pl FACEHBI\ macrotabla\ v0123456\ 2022-12-22\ version\ 1.xlsx > macrotabla.csv
Now, to locate the variables we want, we run,
[osotolongo@brick03 extended]$ head -n 1 macrotabla.csv | sed 's/,/\n/g' | cat -n > fields.txt
and save the variable names to a file. For now, we only want a few of them,
[osotolongo@brick03 extended]$ awk -F',' 'BEGIN { OFS = ","} {print $1,$2,$3,$4,$5,$6,$36,$239,$240,$241,$302,$303,$312,$313,$304,$314,$278,$279,$506,$68,$489}' macrotabla.csv | grep -v "^,,,,,," > facehbi_sel_long_0to6.csv
[osotolongo@brick03 extended]$ head facehbi_sel_long_0to6.csv
internal_id_fac,facehbi_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac,fe_letter_fluency_np,fe_category_fluency_np,fe_verb_fluency_np,pyramid_palmtrees_images_fac,pyramid_palmtrees_words_fac,kissing_dancing_images_fac,kissing_dancing_words_fac,boston60_free_fac,action_naming_free_fac,syndromic_diagnosis,primary_diagnosis,fbb_pet_centilod,scd_plus_apoe4_fac,apoe
20040516,F012,0,0,85,0,8,20,13,12,,,,,50,48,scd,scd,-24.765025349999998,0,e2e3
20040516,F012,1,0,86,0,,20,12,11,,,,,,,mci,amnestic possible mci,,,
20040516,F012,2,0,87,0,8,14,8,9,45,44,,,41,,mci,amnestic possible mci,,,
20050260,F166,0,0,78,0,4,14,12,9,50,52,48,48,55,50,scd,scd,-2.0989695859999999,0,e3e3
20050260,F166,1,0,79,0,,5,13,10,,,,,,,mci,non-amnestic possible mci,,,
20050260,F166,2,0,80,0,4,10,15,6,52,52,48,52,50,49,mci,non-amnestic possible mci,-3.4808662379999999,,
20050260,F166,3,0,81,0,,11,13,13,,,,,,,mci,non-amnestic possible mci,,,
20050260,F166,4,0,82,0,,8,9,5,,,,,,,mci,amnestic possible mci,,,
20050260,F166,5,0,83,0,,8,10,6,50,50,,,51,38,mci,non-amnestic mci,11.428131260000001,,
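Hardcoding column indexes like $36 or $506 is fragile if the macrotable layout changes between releases. A hypothetical helper (the header file and function name below are made up) resolves the index from the name in fields.txt-style output instead:

```shell
# Hypothetical helper: resolve a 1-based column number from its header name,
# so the awk selection does not depend on hardcoded indexes.
# header.csv is a toy stand-in for the real macrotable header line.
cat > header.csv <<'EOF'
internal_id_fac,facehbi_id_fac,visit_fac,age,sex,fbb_pet_centilod
EOF
colnum () {
  head -n 1 header.csv | tr ',' '\n' | grep -n -x "$1" | cut -d: -f1
}
colnum fbb_pet_centilod   # prints the 1-based index (here: 6)
```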
Now I have to separate those who convert (for example) from those who do not.
$ grep 'mci' facehbi_sel_long_0to6.csv | awk -F',' '{print $2}' | sort | uniq > conversors.list
$ while read s; do grep "${s}" facehbi_sel_long_0to6.csv; done < conversors.list > flong_conversors.csv
$ grep -v -f conversors.list facehbi_sel_long_0to6.csv > flong_non_conversors.csv
and once that is done, I can take a look,
- explore_dx.rmd
---
title: longitudinal NP data by DX
output: html_document
---

What we did is select some (NP) variables and split them into two files according to syndromic diagnosis conversion (subjects who go scd -> mci at some point are converters).

## Plots

```{r message = FALSE, warning = FALSE}
library(ggplot2)
fc <- read.csv("flong_conversors.csv")
fnc <- read.csv("flong_non_conversors.csv")
langs <- c("fe_letter_fluency_np", "fe_category_fluency_np", "fe_verb_fluency_np",
           "pyramid_palmtrees_images_fac", "pyramid_palmtrees_words_fac",
           "kissing_dancing_images_fac", "kissing_dancing_words_fac",
           "boston60_free_fac", "action_naming_free_fac")
for (lang in langs) {
  ggplot(fnc, aes(x = .data[[lang]], fill = factor(visit_fac))) +
    geom_histogram(colour = "black", lwd = 0.75, linetype = 1, position = "identity", alpha = 0.75) -> p
  p + labs(title = "No converters") -> p
  print(p)
  ggplot(fc, aes(x = .data[[lang]], fill = factor(visit_fac))) +
    geom_histogram(colour = "black", lwd = 0.75, linetype = 1, position = "identity", alpha = 0.75) -> p
  p + ggtitle("SCD -> MCI converters") -> p
  print(p)
  ggplot(fnc, aes(x = visit_fac, y = .data[[lang]], fill = factor(visit_fac))) +
    geom_boxplot(colour = "black", lwd = 0.75, linetype = 1, position = "identity", alpha = 0.75) -> p
  p + ggtitle("No converters") -> p
  print(p)
  ggplot(fc, aes(x = visit_fac, y = .data[[lang]], fill = factor(visit_fac))) +
    geom_boxplot(colour = "black", lwd = 0.75, linetype = 1, position = "identity", alpha = 0.75) -> p
  p + ggtitle("SCD -> MCI converters") -> p
  print(p)
  ggplot(fnc, aes(x = visit_fac, y = .data[[lang]], color = facehbi_id_fac)) +
    geom_line() +
    stat_summary(aes(group = 1), geom = "line", fun.y = median, size = 2, color = "red") +
    theme(legend.position = "none") -> p
  p + ggtitle("No converters") -> p
  print(p)
  ggplot(fc, aes(x = visit_fac, y = .data[[lang]], color = facehbi_id_fac)) +
    geom_line() +
    stat_summary(aes(group = 1), geom = "line", fun.y = median, size = 2, color = "red") +
    theme(legend.position = "none") -> p
  p + ggtitle("SCD -> MCI converters") -> p
  print(p)
}
```
If what I want is to extract those who convert to amyloid-positive,
[osotolongo@brick03 extended]$ tail -n +2 facehbi_sel_long_0to6.csv | awk -F',' '{if ($19 && $19 > 20) print $2}' | sort | uniq > amiloid_pos.list
[osotolongo@brick03 extended]$ while read s; do grep "${s}" facehbi_sel_long_0to6.csv; done < amiloid_pos.list > flong_pos_nohead.csv
[osotolongo@brick03 extended]$ head -n 1 facehbi_sel_long_0to6.csv > headers.txt
[osotolongo@brick03 extended]$ cat headers.txt flong_pos_nohead.csv > flong_pos.csv
[osotolongo@brick03 extended]$ grep -v -f amiloid_pos.list facehbi_sel_long_0to6.csv > flong_neg.csv
OK, now I should run some model that tells me whether there are differences between the two categories.
Longitudinal mixed effects model
> fc <- read.csv("flong_conversors_fixed.csv")
> a <- lme(fe_letter_fluency_np ~ age + sex + level_education_fac, data = fc, random = ~ age | facehbi_id_fac, method="ML")
> summary(a)
Linear mixed-effects model fit by maximum likelihood
  Data: fc
       AIC      BIC    logLik
  2335.434 2368.055 -1159.717

Random effects:
 Formula: ~age | facehbi_id_fac
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr
(Intercept) 3.3007040745 (Intr)
age         0.0000334666 0
Residual    2.8988033948

Fixed effects:  fe_letter_fluency_np ~ age + sex + level_education_fac
                        Value Std.Error  DF   t-value p-value
(Intercept)         16.588978  3.942268 364  4.207978  0.0000
age                 -0.074779  0.047785 364 -1.564904  0.1185
sex                  0.933704  0.894277  68  1.044088  0.3001
level_education_fac  0.787147  0.243211  68  3.236476  0.0019
 Correlation:
                    (Intr) age    sex
age                 -0.904
sex                 -0.125  0.165
level_education_fac -0.458  0.060 -0.261

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-2.74802315 -0.65308936 -0.02842913  0.61876952  2.52676949

Number of Observations: 436
Number of Groups: 71
Long-term prediction
Does variable X at baseline predict any change in diagnosis at 5 years?
[osotolongo@brick03 extended]$ awk -F',' '{if ($3=="5") print}' facehbi_sel_long_0to6.csv > facehbi_v5.csv
[osotolongo@brick03 extended]$ awk -F',' '{if ($3==$4) print}' facehbi_sel_long_0to6.csv > facehbi_first.csv
[osotolongo@brick03 extended]$ head -n 1 facehbi_sel_long_0to6.csv > facehbi_headers.csv
[osotolongo@brick03 extended]$ cat facehbi_headers.csv facehbi_first.csv > facehbi_first_h.csv
[osotolongo@brick03 extended]$ cat facehbi_headers.csv facehbi_v5.csv > facehbi_v5_h.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1","$17","$19}' facehbi_v5_h.csv > dxv5.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1","$5","$6","$7","$10","$13","$14","$16","$19}' facehbi_first_h.csv > baseline.csv
[osotolongo@brick03 extended]$ join -t, baseline.csv dxv5.csv > facehbi_response.csv
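One caveat with join(1): it silently drops or mispairs lines when its inputs are not sorted on the join key. The files above happen to come out ordered by internal_id_fac, but sorting explicitly first is the safer pattern. A toy example (file names and contents are illustrative):

```shell
# join(1) expects both inputs sorted on the join field; sort first to be safe
printf 'B,70\nA,71\n' > ages.csv
printf 'A,scd\nB,mci\n' > dx.csv
join -t, <(sort -t, -k1,1 ages.csv) <(sort -t, -k1,1 dx.csv) > joined.csv
cat joined.csv
```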
Basic shit: comparing two groups
> setwd("/nas/osotolongo/long_facehbi_v025/extended")
> f <- read.csv("facehbi_response.csv")
> library(ggplot2)
> ggplot(f, aes(x=syndromic_diagnosis, y=kissing_dancing_images_fac)) + stat_boxplot(geom = "errorbar", width = 0.25) + geom_boxplot()
- explore_response.rmd
---
title: Long-term prediction from the verb variables
output: html_document
---

```{r warning = FALSE}
date()
```

What we did is select some (NP) variables that index verb fluency, to see whether there is any significant change in the diagnosis at visit 5.

### Plots

```{r message = FALSE, warning = FALSE}
library(ggplot2)
fc <- read.csv("facehbi_response.csv")
langs <- c("fe_verb_fluency_np", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "action_naming_free_fac")
for (lang in langs) {
  out_fig <- paste0("response_", lang, ".png")
  ggplot(fc, aes(x = syndromic_diagnosis, y = .data[[lang]], fill = syndromic_diagnosis)) +
    stat_boxplot(geom = "errorbar", width = 0.25) +
    geom_boxplot() -> p
  print(p)
}
```

Since there seem to be significant differences, let's see what the statistics say,

### Stats

```{r message = FALSE, warning = FALSE}
fc <- read.csv("facehbi_response.csv")
langs <- c("fe_verb_fluency_np", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "action_naming_free_fac")
for (lang in langs) {
  a <- t.test(as.formula(paste(lang, '~ factor(syndromic_diagnosis)')), data = fc, na.action = "na.omit")
  print(lang)
  print(a)
}
```
Varying time between visits
An added problem with these data is that some subjects start at visit 0 and others at visit 2. This is why it is important to include the time difference between visits as a covariate.
This is more or less straightforward. First, determine which is each subject's first visit,
[osotolongo@brick03 extended]$ awk -F',' '{if ($3==$4) print}' facehbi_sel_long_0to6.csv | awk -F',' '{print $1","$2","$4}' > first_visit.csv
and then query XNAT for the date of each MRI and subtract them. The only complication is determining which project has to be queried in each case,
- get_dates.sh
#!/bin/bash
echo "internal_id_fac,ydiff"
while read -r line; do
  nhc=$(echo ${line} | awk -F',' '{print $1}')
  sbj=$(echo ${line} | awk -F',' '{print $2}')
  visit=$(echo ${line} | awk -F',' '{print $3}')
  if [[ ${visit} == '0' ]]; then
    date0=$(xnatapic list_experiments --project_id facehbi --subject_id ${sbj} --modality MRI --date | sed 's/.*,//')
  else
    date0=$(xnatapic list_experiments --project_id f2cehbi --subject_id ${sbj} --modality MRI --date | sed 's/.*,//')
  fi
  date5=$(xnatapic list_experiments --project_id f5cehbi --subject_id ${sbj} --modality MRI --date 2>/dev/null | sed 's/.*,//')
  if [[ ! -z ${date5} ]]; then
    ddif=$(( ($(date --date=${date5} +%s) - $(date --date=${date0} +%s))/(60*60*24) ))
    ydif=$(printf "%.2f\n" $(echo "${ddif}/365" | bc -l))
    if [[ $(bc <<< "${ydif} > 0.1") -eq 1 ]]; then
      echo "${nhc},${ydif}"
    fi
  fi
done < first_visit.csv
and then,
[osotolongo@brick03 extended]$ ./get_dates.sh > facehbi_ydiff.csv
[osotolongo@brick03 extended]$ head facehbi_ydiff.csv
internal_id_fac,ydiff
20050260,5.08
20060090,5.15
20070072,5.04
20070303,5.41
20080337,5.31
20080565,5.23
20080716,4.98
20081224,5.01
20090481,5.10
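The date arithmetic in get_dates.sh can be checked in isolation: epoch difference, integer division into days, then division by 365. A minimal reproduction (the two dates are made up, and awk stands in for bc):

```shell
# Minimal reproduction of the year-difference arithmetic in get_dates.sh.
# Dates are illustrative; -u pins the parsing to UTC so DST cannot shift
# the integer day count.
date0="2016-03-01"
date5="2021-04-15"
ddif=$(( ($(date -u --date=${date5} +%s) - $(date -u --date=${date0} +%s)) / (60*60*24) ))
ydif=$(awk -v d=${ddif} 'BEGIN { printf "%.2f", d / 365 }')
echo "${ddif} days -> ${ydif} years"
```

Note that dividing by 365 slightly overestimates years across leap days; for gaps of ~5 years the bias is on the order of a day, which is irrelevant here.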
FSGA
Basically, I now have the data for the analysis.
[osotolongo@brick03 extended]$ join -t, facehbi_response.csv facehbi_ydiff.csv > fsga_data.csv
[osotolongo@brick03 extended]$ head -n 1 fsga_data.csv | sed 's/,/\n/g' | cat -n
     1  internal_id_fac
     2  age
     3  sex
     4  level_education_fac
     5  fe_verb_fluency_np
     6  kissing_dancing_images_fac
     7  kissing_dancing_words_fac
     8  action_naming_free_fac
     9  fbb_pet_centilod
    10  syndromic_diagnosis
    11  fbb_pet_centilod
    12  ydiff
Now I have to arrange the subjects into FreeSurfer directories, pulling the directories from XNAT. I have the subject name and the visit, so this is fairly simple using the API,
#!/bin/bash
data='fsga_data.csv'
v='first_visit.csv'
for x in $(tail -n +2 ${data} | awk -F',' '{print $1}'); do
  l=$(grep ${x} ${v})
  sbj=$(echo ${l} | awk -F',' '{print $2}')
  visit=$(echo ${l} | awk -F',' '{print $3}')
  mkdir ${SUBJECTS_DIR}/${x}
  if [[ ${visit} == '0' ]]; then
    xp=$(xnatapic list_experiments --project_id facehbi --subject_id ${sbj} --modality MRI)
  else
    xp=$(xnatapic list_experiments --project_id f2cehbi --subject_id ${sbj} --modality MRI)
  fi
  stmp=$(mktemp -d)
  stgz=$(xnatapic get_fsresults --experiment_id ${xp} --all-tgz ${stmp})
  tar xzf ${stmp}/${stgz} --strip-components=1 -C ${SUBJECTS_DIR}/${x} --exclude='fsaverage'
  rm -rf ${stmp}/
  echo "${sbj} --> ${xp} --> ${SUBJECTS_DIR}/${x}"
done
and
[osotolongo@brick03 fsga]$ ./getfs.sh
F166 --> XNAT4_E00321 --> /old_nas/subjects/20050260
F146 --> XNAT4_E00301 --> /old_nas/subjects/20060090
F161 --> XNAT4_E00316 --> /old_nas/subjects/20070072
F013 --> XNAT4_E00174 --> /old_nas/subjects/20070303
F070 --> XNAT4_E00227 --> /old_nas/subjects/20080337
F069 --> XNAT4_E00226 --> /old_nas/subjects/20080565
F007 --> XNAT4_E00168 --> /old_nas/subjects/20080716
F151 --> XNAT4_E00306 --> /old_nas/subjects/20081224
F181 --> XNAT4_E00335 --> /old_nas/subjects/20090481
F001 --> XNAT4_E00162 --> /old_nas/subjects/20090806
...
...
Once I have the FS directories, extracting the subjects' ICV is simple. We can also normalize it quickly using R,
- geticv.sh
#!/bin/bash
data='fsga_data.csv'
tdf=$(mktemp)
echo "internal_id_fac,icv" > ${tdf}
for x in $(tail -n +2 ${data} | awk -F',' '{print $1}'); do
  icv=$(grep EstimatedTotalIntraCranialVol /old_nas/subjects/${x}/stats/aseg.stats | awk -F', ' '{print $4}')
  echo "${x},${icv}" >> ${tdf}
done
echo "read.csv('${tdf}') -> x" > convert.r
echo "x\$zICV <- (x\$icv - mean(x\$icv))/sd(x\$icv)" >> convert.r
echo "xw <- x[c(1,3)]" >> convert.r
echo "write.csv(xw, file='icvs.csv', row.names=FALSE, quote=FALSE)" >> convert.r
Rscript convert.r
so we add it to the data,
[osotolongo@brick03 fsga]$ join -t, fsga_data.csv icvs.csv > fsga_data_wicv.csv
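The z-scoring that convert.r performs, z = (x - mean) / sd, can be sanity-checked without R; the snippet below mirrors it in awk with made-up ICV values (sample standard deviation with n-1, matching R's sd()):

```shell
# Sanity check of the z-scoring done in convert.r: z = (x - mean) / sd
# Values are illustrative; sd uses n-1, as R's sd() does.
printf '1400000\n1500000\n1600000\n' > icv_sample.txt
awk '{ x[NR] = $1; s += $1 }
END {
  m = s / NR
  for (i = 1; i <= NR; i++) ss += (x[i] - m)^2
  sd = sqrt(ss / (NR - 1))
  for (i = 1; i <= NR; i++) printf "%.3f\n", (x[i] - m) / sd
}' icv_sample.txt > zicv_check.txt
cat zicv_check.txt
```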
Before running the analysis.
Now, based on these data, we are going to build the fsgd files. Since I have to process several variables, I will make a few templates: first, one to assemble the fsgd files,
- organize.sh
#!/bin/bash
mkdir fsga_<code>
awk -F',' '{if($<code>) print}' fsga_data_wicv.csv > fsga_<code>/fsga_data.csv
awk -F',' '{print "Input "$1" Main",$2,$3,$4,$<code>}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>_body.csv
echo 'GroupDescriptorFile 1' > fsga_<code>/headers_<code>.txt
echo 'Title FACEBI_verbs_<code>' >> fsga_<code>/headers_<code>.txt
echo 'Class Main' >> fsga_<code>/headers_<code>.txt
echo '0 0 0 0 1' > fsga_<code>/<code>.mtx
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>_body.csv > fsga_<code>/<code>.fsgd
awk -F',' '{print "Input "$1" Main",$2,$3,$4,$<code>,$13}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>v_body.csv
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>v_body.csv > fsga_<code>/<code>v.fsgd
echo '0 0 0 0 1 0' > fsga_<code>/<code>v.mtx
and then one to run the steps of each analysis,
- runner.sh
#!/bin/bash
## area
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.area.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.area.10.mgh
mri_glmfit --y lh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>a.glmdir
mri_glmfit --y rh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>a.glmdir
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
## thickness
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.thickness.10.mgh
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.thickness.10.mgh
mri_glmfit --y lh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage lh --glmdir lh.<code>.glmdir
mri_glmfit --y rh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage rh --glmdir rh.<code>.glmdir
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
## volume
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.volume.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.volume.10.mgh
mri_glmfit --y lh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>v.glmdir
mri_glmfit --y rh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>v.glmdir
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces
Finally, I take each template and build the files I have to run for each variable, according to the column it occupies in the data file,
- run_all.sh
#!/bin/bash
code=$1
shift
sed "s/<code>/${code}/g" templates/organize.sh > org${code}.sh
chmod +x org${code}.sh
./org${code}.sh
rm org${code}.sh
sed "s/<code>/${code}/g" templates/runner.sh > fsga_${code}/run.sh
cd fsga_${code}/
chmod +x run.sh
./run.sh
cd ..
grep -v "^#" fsga_${code}/*glmdir/${code}*/*sig.cluster.summary
and if I run,
[osotolongo@brick01 fsga]$ ./run_all.sh 5
it will assemble the fsgd files,
[osotolongo@brick01 fsga]$ head fsga_5/5.fsgd
GroupDescriptorFile 1
Title FACEBI_verbs_5
Class Main
Variables Age Gender Education fe_verb_fluency_np
Input 20050260 Main 78 0 4 9
Input 20060090 Main 70 0 9 22
Input 20070072 Main 76 1 8 9
Input 20070303 Main 69 0 5 12
Input 20080337 Main 75 1 9 17
Input 20080565 Main 73 0 5 12
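A quick structural check is worth running on generated fsgd files: every Input line must carry one value per declared variable (plus the Input keyword, the subject ID, and the class). A hypothetical checker (the sample file below just reproduces the head output above):

```shell
# Hypothetical fsgd consistency check: each Input line must have
# (number of Variables) + 3 fields: Input, subject ID, class, then values.
cat > sample.fsgd <<'EOF'
GroupDescriptorFile 1
Title FACEBI_verbs_5
Class Main
Variables Age Gender Education fe_verb_fluency_np
Input 20050260 Main 78 0 4 9
Input 20060090 Main 70 0 9 22
EOF
awk '/^Variables/ { nvar = NF - 1 }
     /^Input/ && NF != nvar + 3 { bad++; print "bad line:", $0 }
     END { print (bad ? bad : 0), "malformed Input lines" }' sample.fsgd > fsgd_check.txt
cat fsgd_check.txt
```

mri_glmfit will reject malformed tables anyway, but this catches empty-covariate rows before a long run.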
and it will run the analyses,
[osotolongo@brick01 fsga]$ ls fsga_5/*glmdir/
fsga_5/lh.5a.glmdir/:
5v  cache.mri_glmfit-sim.log  dof.dat  mask.mgh  rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd  fwhm.dat  mri_glmfit.log  rvar.mgh  surface  X.mat

fsga_5/lh.5.glmdir/:
5  cache.mri_glmfit-sim.log  dof.dat  mask.mgh  rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd  fwhm.dat  mri_glmfit.log  rvar.mgh  surface  X.mat

fsga_5/lh.5v.glmdir/:
5v  cache.mri_glmfit-sim.log  dof.dat  mask.mgh  rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd  fwhm.dat  mri_glmfit.log  rvar.mgh  surface  X.mat

fsga_5/rh.5a.glmdir/:
5v  cache.mri_glmfit-sim.log  dof.dat  mask.mgh  rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd  fwhm.dat  mri_glmfit.log  rvar.mgh  surface  X.mat

fsga_5/rh.5.glmdir/:
5  cache.mri_glmfit-sim.log  dof.dat  mask.mgh  rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd  fwhm.dat  mri_glmfit.log  rvar.mgh  surface  X.mat

fsga_5/rh.5v.glmdir/:
5v  cache.mri_glmfit-sim.log  dof.dat  mask.mgh  rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd  fwhm.dat  mri_glmfit.log  rvar.mgh  surface  X.mat
for the variable fe_verb_fluency_np, which occupies the fifth column in the data file.
I can automate this with two templates,
- organize.sh
#!/bin/bash
mkdir fsga_<code>
awk -F',' '{if($<code>) print}' fsga_data_wicv.csv > fsga_<code>/fsga_data.csv
awk -F',' '{print "Input "$1" Main",$2,$4,$<code>}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>_body.csv
echo 'GroupDescriptorFile 1' > fsga_<code>/headers_<code>.txt
echo 'Title FACEBI_verbs_<code>' >> fsga_<code>/headers_<code>.txt
echo 'Class Main' >> fsga_<code>/headers_<code>.txt
echo '0 0 0 1' > fsga_<code>/<code>.mtx
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>_body.csv > fsga_<code>/<code>.fsgd
awk -F',' '{print "Input "$1" Main",$2,$4,$<code>,$13}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>v_body.csv
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>v_body.csv > fsga_<code>/<code>v.fsgd
echo '0 0 0 1 0' > fsga_<code>/<code>v.mtx
- runner.sh
#!/bin/bash
## area
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.area.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.area.10.mgh
mri_glmfit --y lh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>a.glmdir
mri_glmfit --y rh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>a.glmdir
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
## thickness
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.thickness.10.mgh
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.thickness.10.mgh
mri_glmfit --y lh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage lh --glmdir lh.<code>.glmdir
mri_glmfit --y rh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage rh --glmdir rh.<code>.glmdir
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
## volume
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.volume.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.volume.10.mgh
mri_glmfit --y lh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>v.glmdir
mri_glmfit --y rh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>v.glmdir
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces
and a script that selects the variable and, for each case, writes and runs all the analyses:
- run_all.sh
#!/bin/bash
code=$1
shift
sed "s/<code>/${code}/g" templates/organize.sh > org${code}.sh
chmod +x org${code}.sh
./org${code}.sh
rm org${code}.sh
sed "s/<code>/${code}/g" templates/runner.sh > fsga_${code}/run.sh
cd fsga_${code}/
chmod +x run.sh
./run.sh
cd ..
grep -v "^#" fsga_${code}/*glmdir/${code}*/*sig.cluster.summary
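The templating step is plain textual substitution: `sed` replaces every `<code>` placeholder with the chosen column number. A minimal sketch with a toy template line (the real templates live under `templates/`):

```shell
# Toy illustration of the <code> placeholder substitution done by run_all.sh.
# The template string here is hypothetical, not the real templates/organize.sh.
code=5
echo 'mkdir fsga_<code>' | sed "s/<code>/${code}/g"
# → mkdir fsga_5
```

Running the same substitution over a whole template file (as `run_all.sh` does) just applies this to every line at once.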
I run this for each variable I want:
[osotolongo@brick03 fsga]$ ./run_all.sh 5
...
...
...
[osotolongo@brick03 fsga]$ ./run_all.sh 7
...
...
...
Going by the header, picking the right column number should be simple:
[osotolongo@brick03 fsga]$ head -n 1 fsga_data_swicv.csv | sed 's/,/\n/g' | cat -n
     1  internal_id_fac
     2  age
     3  sex
     4  level_education_fac
     5  fe_verb_fluency_np
     6  kissing_dancing_images_fac
     7  kissing_dancing_words_fac
     8  boston60_free_fac
     9  action_naming_free_fac
    10  fbb_pet_centilod
    11  syndromic_diagnosis
    12  fbb_pet_centilod
    13  ydiff
    14  zICV
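The same one-liner can be sanity-checked on a toy header (hypothetical file `toy.csv`): splitting the header on commas and numbering the fields gives exactly the `$n` column index that awk will use.

```shell
# Reproduce the column-lookup trick on a small, made-up header.
# Note: the '\n' in the sed replacement requires GNU sed.
printf 'id,age,sex,score\n' > toy.csv
head -n 1 toy.csv | sed 's/,/\n/g' | cat -n
```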
Unfortunately, after running it for each of the variables, I see that the clusters do not survive correction for multiple comparisons.
For example, fe_verb_fluency_np:
Subcortical
I am going to look at the variables against the subcortical changes, so I need to extract the values from the aseg.stats files:
[osotolongo@brick03 extended]$ a=$(cat fsga/fsga_mri.csv | tr '\n' ',' | sed 's/,/ -s /g;s/-s $//')
[osotolongo@brick03 extended]$ asegstats2table -s ${a} --meas volume --tablefile fsga/aseg_stats.txt
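The `tr`/`sed` pipe just turns a one-ID-per-line file into the `-s id1 -s id2 ...` argument string that asegstats2table expects. A toy version (`subjects.txt` is a hypothetical stand-in for `fsga/fsga_mri.csv`):

```shell
# Build the "-s" argument string from a one-subject-per-line list (toy data).
printf 'subj01\nsubj02\nsubj03\n' > subjects.txt
a=$(cat subjects.txt | tr '\n' ',' | sed 's/,/ -s /g;s/-s $//')
echo "-s" ${a}
# → -s subj01 -s subj02 -s subj03
```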
I merge this file with the one I have been using for the FSGA:
library(readr)
fs <- read_tsv("fsga/aseg_stats.txt", show_col_types = FALSE)
colnames(fs)[1] <- 'internal_id_fac'
fv <- read.csv("fsga/fsga_cs.csv")
merge(fs, fv, by = 'internal_id_fac') -> s
and now I can check whether there is any relationship between the subcortical volumes at visit 5 and the verb tests at baseline:
subs <- c("Hippocampus", "Amygdala")
langs <- c("fe_verb_fluency_np", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "action_naming_free_fac")
for (sub in subs){
  # total (left + right) volume of the structure
  s$"tmp" <- eval(parse(text = paste0('s$"Left-', sub, '" + s$"Right-', sub, '"')))
  # adjust for ICV with the residual method
  a <- lm(s$"tmp" ~ s$"EstimatedTotalIntraCranialVol")
  s$"adj" <- s$"tmp" - a$coefficients[2] * (s$"EstimatedTotalIntraCranialVol" - mean(s$"EstimatedTotalIntraCranialVol", na.rm = T))
  for (lang in langs) {
    m <- lm(as.formula(paste0('s$adj ~ s$age + s$level_education_fac + s$', lang)))
    print(paste(sub, "~", lang))
    print(summary(m))
  }
}
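The ICV correction in the loop is the standard residual (covariance) method: regress the raw volume on ICV and subtract the ICV-driven component. In symbols, with $b$ the slope from the `lm` fit of volume against `EstimatedTotalIntraCranialVol`:

```latex
V_{\mathrm{adj}} = V - b \left( \mathrm{ICV} - \overline{\mathrm{ICV}} \right)
```

so subjects with an average head size keep their raw volume, and everyone else is shifted toward what their volume would be at the mean ICV.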
What interests us in particular is the hippocampus:
## [1] "Hippocampus ~ fe_verb_fluency_np"
##
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$",
##     lang)))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1963.2  -388.4     4.9   352.1  2020.0
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)           10212.612    626.877  16.291  < 2e-16 ***
## s$age                   -43.924      7.772  -5.652 1.02e-07 ***
## s$level_education_fac   -63.960     33.030  -1.936  0.05507 .
## s$fe_verb_fluency_np     26.034      9.672   2.692  0.00808 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 598.5 on 125 degrees of freedom
## Multiple R-squared:  0.2855, Adjusted R-squared:  0.2684
## F-statistic: 16.65 on 3 and 125 DF,  p-value: 3.667e-09
##
## [1] "Hippocampus ~ kissing_dancing_images_fac"
##
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$",
##     lang)))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1851.83  -314.39    48.22   385.69  2081.59
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   9082.65    1872.35   4.851 6.76e-06 ***
## s$age                          -46.50      10.14  -4.587 1.82e-05 ***
## s$level_education_fac          -10.95      41.44  -0.264    0.792
## s$kissing_dancing_images_fac    29.18      34.05   0.857    0.394
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 629.6 on 73 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared:  0.2524, Adjusted R-squared:  0.2216
## F-statistic: 8.214 on 3 and 73 DF,  p-value: 8.809e-05
##
## [1] "Hippocampus ~ kissing_dancing_words_fac"
##
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$",
##     lang)))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1620.36  -333.88   -14.26   358.11  1968.13
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  6003.57    2356.73   2.547   0.0132 *
## s$age                         -48.69       9.99  -4.873 7.04e-06 ***
## s$level_education_fac         -18.41      41.56  -0.443   0.6593
## s$kissing_dancing_words_fac    95.09      44.81   2.122   0.0375 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 612 on 67 degrees of freedom
##   (58 observations deleted due to missingness)
## Multiple R-squared:  0.3243, Adjusted R-squared:  0.2941
## F-statistic: 10.72 on 3 and 67 DF,  p-value: 7.654e-06
##
## [1] "Hippocampus ~ action_naming_free_fac"
##
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$",
##     lang)))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2024.53  -357.40    13.68   362.28  2128.60
##
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)              10210.515   1040.783   9.810  < 2e-16 ***
## s$age                      -47.619      8.116  -5.868 4.14e-08 ***
## s$level_education_fac      -42.310     33.791  -1.252    0.213
## s$action_naming_free_fac    12.385     15.673   0.790    0.431
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 625 on 118 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.244,  Adjusted R-squared:  0.2248
## F-statistic: 12.7 on 3 and 118 DF,  p-value: 2.996e-07
and we can compare how the model performs for each variable (the p-value shown is for the language variable's coefficient):
Variable | Adjusted $R^{2}$ | p-value |
---|---|---|
fe_verb_fluency_np | 0.2684 | 0.00808 |
kissing_dancing_images_fac | 0.2216 | 0.394 |
kissing_dancing_words_fac | 0.294 | 0.0375 |
action_naming_free_fac | 0.2248 | 0.431 |
which shows that we should pay special attention to the fe_verb_fluency_np variable.
and/or cortical
Well, what I did with the subcortical volume can also be done with a specific cortical region. Take, for instance, the entorhinal cortex:
[osotolongo@brick03 extended]$ a=$(cat fsga/fsga_mri.csv | tr '\n' ',' | sed 's/,/ -s /g;s/-s $//')
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m volume -t fsga/aparc_lh_vol_stats.txt --hemi lh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_lh_vol_stats.txt
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m volume -t fsga/aparc_rh_vol_stats.txt --hemi rh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_rh_vol_stats.txt
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m thickness -t fsga/aparc_lh_thick_stats.txt --hemi lh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_lh_thick_stats.txt
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m thickness -t fsga/aparc_rh_thick_stats.txt --hemi rh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_rh_thick_stats.txt
Composites
Summary
ATN
I am going to try to obtain the ATN status of each subject at visit 5.
First,
[osotolongo@brick03 extended]$ awk -F ',' '{print $2","$1}' facehbi_sel_long_0to6.csv | uniq | tail -n +2 | sort -t, > facehbi.internos
Now,
[osotolongo@brick03 extended]$ xnat_pullfs.pl -s aseg -p f5cehbi -o base_aseg.csv
[osotolongo@brick03 extended]$ xnat_pullfs.pl -s aparc -p f5cehbi -o base_aparc.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' base_aseg.csv > tmp_aseg.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' base_aparc.csv > tmp_aparc.csv
[osotolongo@brick03 extended]$ join -t, tmp_aseg.csv tmp_aparc.csv | awk -F',' '{$1=$50=$51=""; print $0}' | sed 's/^ //;s/ /,/g;s/,,,/,/g' > base_full.csv
[osotolongo@brick03 extended]$ (head -n 1 base_full.csv && tail -n +2 base_full.csv | sort -t,) > base_full_sorted.csv
[osotolongo@brick03 extended]$ xnat_get_age.pl -x f5cehbi
[osotolongo@brick03 extended]$ sort -t, -n f5cehbi_age_data.csv > f5cehbi_age_data_sorted.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' f5cehbi_age_data_sorted.csv > tmp_age.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' base_full_sorted.csv > tmp_base.csv
[osotolongo@brick03 extended]$ (head -n 1 tmp_age.csv && tail -n +2 tmp_age.csv | sort -t,) > tmp_age_sorted.csv
[osotolongo@brick03 extended]$ (head -n 1 tmp_base.csv && tail -n +2 tmp_base.csv | sort -t,) > tmp_base_sorted.csv
[osotolongo@brick03 extended]$ join -t, tmp_age_sorted.csv tmp_base_sorted.csv | awk -F',' '{$1=$5=$6=""; print $0}' | sed 's/^ //;s/ /,/g;s/,,,/,/g'> input_data.csv
[osotolongo@brick03 extended]$ Rscript nplus.r
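The `awk '{print $1"_"$2","$0}'` prefix is there because join(1) only matches on a single field: gluing the first two columns into one key lets us join on the pair. A toy sketch of the pattern (made-up files and values):

```shell
# Composite-key join: prefix each row with "col1_col2", sort, then join on it.
printf 'F001,20090806,a\nF002,20131084,b\n' > t1.csv
printf 'F001,20090806,x\nF002,20131084,y\n' > t2.csv
awk -F',' '{print $1"_"$2","$0}' t1.csv | sort -t, > k1.csv
awk -F',' '{print $1"_"$2","$0}' t2.csv | sort -t, > k2.csv
join -t, k1.csv k2.csv
```

After the join, the `awk '{$1=...=""; ...}'` step in the real pipeline blanks the key and duplicated columns and the `sed` cleans up the separators.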
And we get the probability of neurodegeneration:
[osotolongo@brick03 extended]$ head classifier_output.csv
Subject_ID,Date,ND,Nprob
F001,2020-01-27,0,0.00457985332697753
F005,2020-02-03,0,0.0697559197037421
F006,2020-01-08,0,0.0250949466283251
F007,2019-12-10,0,1.8018003703755e-05
F008,2020-06-22,1,0.522450817937238
F009,2020-01-23,0,0.202625745362684
F010,2020-01-22,0,0.00476068613625532
F013,2020-06-13,1,0.550435980450324
F013,2020-06-13,1,0.550435980450324
The FBB at v5 is easier:
[osotolongo@brick03 extended]$ (head -n 1 facehbi_sel_long_0to6.csv | awk -F',' '{print "Subject_ID,"$1","$19}' && awk -F',' '{if ($3=="5" && $19) print $2","$1","$19}' facehbi_sel_long_0to6.csv | sort -t,) > v5_centiloid.csv
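The subshell-with-`head`/`tail` pattern used here (and above) keeps the CSV header in place while sorting only the data rows. A toy version on a hypothetical file:

```shell
# Sort a CSV body while leaving the header line first (toy data).
printf 'id,val\nc,3\na,1\nb,2\n' > toy.csv
(head -n 1 toy.csv && tail -n +2 toy.csv | sort -t,) > toy_sorted.csv
cat toy_sorted.csv
```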
I have very few CSF samples, so I cannot use them.
[osotolongo@brick03 extended]$ join -t, v5_centiloid.csv classifier_output.csv | uniq > got_an.csv
[osotolongo@brick03 extended]$ head got_an.csv
Subject_ID,internal_id_fac,fbb_pet_centilod,Date,ND,Nprob
F001,20090806,5.995931423,2020-01-27,0,0.00457985332697753
F005,20141272,-1.8057342300000001,2020-02-03,0,0.0697559197037421
F007,20080716,108.1401346,2019-12-10,0,1.8018003703755e-05
F008,20131483,-4.0650127830000002,2020-06-22,1,0.522450817937238
F009,20141277,31.942835039999999,2020-01-23,0,0.202625745362684
F010,20141280,-13.788423529999999,2020-01-22,0,0.00476068613625532
F013,20070303,42.2375829,2020-06-13,1,0.550435980450324
F014,20100381,2.8114481339999999,2020-03-12,0,0.485388793964639
F015,20141087,-6.2935311299999999,2020-01-30,0,0.00990242072683091
And with a bit of logic I pull out the amyloid-positive subjects with neurodegeneration:
[osotolongo@brick03 extended]$ awk -F',' '{if ($3>13.5 && $5) print $0",1"; else print $0",0"}' got_an.csv | sed 's/Nprob,1/Nprob,AN/' > an.csv
[osotolongo@brick03 extended]$ head an.csv
Subject_ID,internal_id_fac,fbb_pet_centilod,Date,ND,Nprob,AN
F001,20090806,5.995931423,2020-01-27,0,0.00457985332697753,0
F005,20141272,-1.8057342300000001,2020-02-03,0,0.0697559197037421,0
F007,20080716,108.1401346,2019-12-10,0,1.8018003703755e-05,0
F008,20131483,-4.0650127830000002,2020-06-22,1,0.522450817937238,0
F009,20141277,31.942835039999999,2020-01-23,0,0.202625745362684,0
F010,20141280,-13.788423529999999,2020-01-22,0,0.00476068613625532,0
F013,20070303,42.2375829,2020-06-13,1,0.550435980450324,1
F014,20100381,2.8114481339999999,2020-03-12,0,0.485388793964639,0
F015,20141087,-6.2935311299999999,2020-01-30,0,0.00990242072683091,0
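The awk condition encodes A+N+: centiloid (`$3`) above 13.5 for amyloid positivity, and the ND flag (`$5`) set for neurodegeneration. A toy check of that logic on made-up rows in the got_an.csv layout:

```shell
# AN=1 only when centiloid ($3) > 13.5 AND the ND flag ($5) is nonzero.
printf 'S1,id1,20.0,2020-01-01,1,0.9\nS2,id2,5.0,2020-01-01,1,0.9\nS3,id3,20.0,2020-01-01,0,0.1\n' > toy_an.csv
awk -F',' '{if ($3>13.5 && $5) print $0",1"; else print $0",0"}' toy_an.csv
```

Only S1 (amyloid-positive and ND=1) ends up flagged; the `sed 's/Nprob,1/Nprob,AN/'` in the real command just renames the flag column in the header line, which the awk condition happens to mark with a 1.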