FACEHBI v5 longitudinal analysis

With v5 finished, I have 3 imaging timepoints. Three points do not allow for much, given the number of covariates involved, but I can always poke at it and see what comes out.

qdec table

https://surfer.nmr.mgh.harvard.edu/fswiki/LongitudinalTwoStageModel

First step: build the qdec table. I will assume I already have a table with every variable I want to analyze at each timepoint. Something like this,

[osotolongo@brick04 long_facehbi_v025]$ head 'longitudinal_lang_025.csv'
facehbi_id_fac,LID,internal_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac,fe_letter_fluency_np,fe_category_fluency_np,fe_verb_fluency_np,pyramid_palmtrees_images_fac,pyramid_palmtrees_words_fac,kissing_dancing_images_fac,kissing_dancing_words_fac,boston60_free_fac,action_naming_free_fac
F001,0001,20090806,0,0,71,0,5,22,17,13,,,,,,
F001,0001,20090806,2,0,73,0,5,20,22,19,50,51,50,50,50,51
F001,0001,20090806,5,0,76,0,,24,18,14,51,51,50,50,47,49
F002,0002,20131084,0,0,70,1,9,23,25,22,,,,,,
F002,0002,20131084,2,0,72,1,9,24,23,29,52,52,51,52,50,54
F003,0003,20130456,0,0,70,0,6,14,17,11,,,,,,
F003,0003,20130456,2,0,72,0,6,10,25,17,51,51,42,51,48,44
F004,0004,20080130,0,0,76,0,8,18,15,20,,,,,,
F005,0005,20141272,0,0,68,1,9,22,26,28,,,,,,

This basically comes out of the FACEHBI table that gets shared every so often and that is supposedly all clean, correct, etc. LOL

Now, once all the previous file-wrangling problems are solved, let's build the qdec file.

Once we have the rules, the info has to be arranged according to the qdec format.
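The reshaping that make_qdec0.pl performs can be sketched in plain awk. This is a hypothetical simplification: the real script resolves cross-project IDs via lmatch.csv and uses acquisition dates for the years column; here the visit number stands in for years and the LID is reused naively across projects.

```shell
#!/bin/bash
# Sketch (not the real make_qdec0.pl): one qdec line per CSV row, in the order
# fsid, fsid-base, years, gender, age, education.
reshape() {
  awk -F',' 'NR > 1 {
    # visit number -> cross-sectional subject prefix (assumption)
    pre = ($4 == 0) ? "facehbi" : (($4 == 2) ? "f2cehbi" : "f5cehbi")
    gender = ($7 == 0) ? "Female" : "Male"
    print pre "_" $2, "flong_" $2, $4, gender, $6, $8
  }'
}
# demo with rows in the format of longitudinal_lang_025.csv
reshape <<'EOF'
facehbi_id_fac,LID,internal_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac
F001,0001,20090806,0,0,71,0,5
F001,0001,20090806,2,0,73,0,5
EOF
```

The real qdec table is tab-separated; here the default awk output separator is kept for readability.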

and then, if we run,

[osotolongo@brick04 long_facehbi_v025]$ ./make_qdec0.pl > qdec0.table
[osotolongo@brick04 long_facehbi_v025]$ long_mris_slopes --qdec qdec0.table --meas thickness --hemi lh --do-pc1 --do-label --stack-pc1 lh.qdec0.thickness-pc1.stack.mgh --isec-labels lh.qdec0.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
[osotolongo@brick04 long_facehbi_v025]$ mri_glmfit --osgm --glmdir lh.qdec0.thickness-pc1.fwhm10 --y lh.qdec0.thickness-pc1.stack.fwhm10.mgh --label lh.qdec0.fsaverage.cortex.label --surf fsaverage lh
[osotolongo@brick04 long_facehbi_v025]$ mri_glmfit-sim --glmdir lh.qdec0.thickness-pc1.fwhm10 --cache 3 pos --cwp 0.05 --2spaces

We are looking for the regions where the sample shows a percent change in thickness over time (or something like that).

(figures: results on the left and right hemispheres)

long to cross

I'm going to try something odd. With this many covariates it will be very hard to do anything with at most three timepoints. But I can collapse it to cross-sectional and run an fsgd analysis. Let's see how this turns out.
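Collapsing to cross-sectional means, for each fsid-base, averaging the numeric covariates across its timepoints, which is what the long_qdec_table --cross call below produces. A minimal awk sketch of that within-subject averaging (demo data only):

```shell
#!/bin/bash
# Within-subject averaging sketch: group rows by fsid-base (column 2) and
# average a numeric column (here age, column 5), mimicking --cross output.
collapse() {
  awk 'NR > 1 { sum[$2] += $5; n[$2]++ }
       END { for (b in n) print b, sum[b] / n[b] }'
}
collapse <<'EOF'
fsid fsid-base years gender age education
facehbi_0001 flong_0001 0 Female 71 5
f2cehbi_0001 flong_0001 2.1 Female 73 5
f5cehbi_0001 flong_0001 3 Female 76 5
EOF
```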

[osotolongo@brick03 long_facehbi_v025]$ head qdec0.table 
fsid	fsid-base	years	gender	age	education
facehbi_0001	flong_0001	0	Female	71	5
f2cehbi_0001	flong_0001	2.1	Female	73	5
f5cehbi_0001	flong_0001	3	Female	76	5
facehbi_0005	flong_0005	0	Male	68	9
f2cehbi_0004	flong_0005	2	Male	70	9
f5cehbi_0002	flong_0005	3	Male	73	9
facehbi_0006	flong_0006	0	Female	64	7
f2cehbi_0005	flong_0006	2.1	Female	66	7
f5cehbi_0003	flong_0006	3	Female	69	7
[osotolongo@brick03 long_facehbi_v025]$ long_qdec_table --qdec qdec0.table --cross --out cross.qdec0.table
Parsing the qdec table: qdec0.table
Collapsing the qdec table to CROSS 
Writing the qdec table: cross.qdec0.table
[osotolongo@brick03 long_facehbi_v025]$ head cross.qdec0.table
fsid years gender age education
flong_0001 1.7 Female 73.33333333333333 5.0
flong_0005 1.6666666666666667 Male 70.33333333333333 9.0
flong_0006 1.7 Female 66.33333333333333 7.0
flong_0007 1.6666666666666667 Male 61.333333333333336 9.0
flong_0008 1.8333333333333333 Female 57.333333333333336 8.0
flong_0009 1.7 Female 69.33333333333333 8.0
flong_0010 1.7 Male 70.33333333333333 9.0
flong_0013 1.8 Female 71.33333333333333 5.0
flong_0014 1.7 Female 67.33333333333333 8.0

Well, this could be interesting.

qdec2fsgd.sh
#!/bin/bash
# Build a single-class FSGD file from a longitudinal qdec table by first
# collapsing it to cross-sectional with long_qdec_table.
ifile=$1
ofile=${ifile%.dat}'.fsgd'
echo "GroupDescriptorFile 1" > ${ofile}
echo "Title FACEHBI_LNG" >> ${ofile}
echo "Class Main" >> ${ofile}
long_qdec_table --qdec ${ifile} --cross --out tmp.cross
# one "Input <fsid> Main <covariates>" line per subject; recode gender as 0/1
awk '{print "Input",$1,"Main",$2,$3,$4,$5,$6}' tmp.cross | sed 's/Input fsid Main/Variables/;s/Female/0/;s/Male/1/' >> ${ofile}
rm -f tmp.cross
# contrast matrix: intercept plus 5 covariates, testing the last one
echo "0 0 0 0 0 1" > ${ifile%.dat}'.mtx'

and now, with this, what I do is,

./qdec2fsgd.sh var9.qdec/qdec.table.dat
mri_glmfit --y lh.qdec0.thickness-spc.stack.fwhm10.mgh --fsgd var9.qdec/qdec.table.fsgd --C var9.qdec/qdec.table.mtx --surf fsaverage lh --glmdir var9.qdec/lh.cross.glmdir
mri_glmfit-sim --glmdir var9.qdec/lh.cross.glmdir --cache 3 neg --cwp 0.05 --2spaces

Testing the first one

[osotolongo@brick03 throttle]$ ls -ltr
total 56
-rw-rw-r-- 1 osotolongo osotolongo 32797 May 22 10:45 longitudinal_lang_025.csv
-rw-rw-r-- 1 osotolongo osotolongo  3801 May 22 10:45 lmatch.csv
-rw-rw-r-- 1 osotolongo osotolongo  9967 May 22 10:45 flong.csv
-rwxrwxr-x 1 osotolongo osotolongo  2878 May 22 10:56 make_qdec.pl
[osotolongo@brick03 throttle]$ mkdir var9
[osotolongo@brick03 throttle]$ head -n 1 longitudinal_lang_025.csv | sed 's/,/\n/' | cat -n
     1	facehbi_id_fac
     2	LID,internal_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac,fe_letter_fluency_np,fe_category_fluency_np,fe_verb_fluency_np,pyramid_palmtrees_images_fac,pyramid_palmtrees_words_fac,kissing_dancing_images_fac,kissing_dancing_words_fac,boston60_free_fac,action_naming_free_fac
[osotolongo@brick03 throttle]$ head -n 1 longitudinal_lang_025.csv | sed 's/,/\n/g' | cat -n
     1	facehbi_id_fac
     2	LID
     3	internal_id_fac
     4	visit_fac
     5	first_visit_0_2
     6	age
     7	sex
     8	level_education_fac
     9	fe_letter_fluency_np
    10	fe_category_fluency_np
    11	fe_verb_fluency_np
    12	pyramid_palmtrees_images_fac
    13	pyramid_palmtrees_words_fac
    14	kissing_dancing_images_fac
    15	kissing_dancing_words_fac
    16	boston60_free_fac
    17	action_naming_free_fac
[osotolongo@brick03 throttle]$ ./make_qdec.pl fe_letter_fluency_np > var9/qdec.table
[osotolongo@brick03 throttle]$ head var9/qdec.table
fsid	fsid-base	years	gender	age	education	fe_letter_fluency_np
facehbi_0001	flong_0001	0	Female	71	5	22
f2cehbi_0001	flong_0001	2.1	Female	73	5	20
f5cehbi_0001	flong_0001	3	Female	76	5	24
facehbi_0005	flong_0005	0	Male	68	9	22
f2cehbi_0004	flong_0005	2	Male	70	9	20
f5cehbi_0002	flong_0005	3	Male	73	9	23
facehbi_0006	flong_0006	0	Female	64	7	24
f2cehbi_0005	flong_0006	2.1	Female	66	7	23
f5cehbi_0003	flong_0006	3	Female	69	7	18
[osotolongo@brick03 throttle]$ ./qdec2fsgd.sh var9/qdec.table 
[osotolongo@brick03 throttle]$ head var9/qdec.table.fsgd
GroupDescriptorFile 1
Title FACEHBI_LNG
Class Main
Variables years gender age education fe_letter_fluency_np
Input flong_0001 Main 1.7 0 73.33333333333333 5.0 22.0
Input flong_0005 Main 1.6666666666666667 1 70.33333333333333 9.0 21.666666666666668
Input flong_0006 Main 1.7 0 66.33333333333333 7.0 21.666666666666668
Input flong_0007 Main 1.6666666666666667 1 61.333333333333336 9.0 25.333333333333332
Input flong_0008 Main 1.8333333333333333 0 57.333333333333336 8.0 15.666666666666666
Input flong_0009 Main 1.7 0 69.33333333333333 8.0 20.333333333333332
[osotolongo@brick03 throttle]$ cat var9/qdec.table.mtx
0 0 0 0 0 1
dospc.sh
#!/bin/bash
var=$1
# everybody look to their left
long_mris_slopes --qdec var${var}/qdec.table --meas thickness --hemi lh --do-spc --do-label --stack-spc var${var}/lh.qdec.thickness-spc.stack.mgh --isec-labels var${var}/lh.qdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/lh.qdec.thickness-spc.stack.mgh --fsgd var${var}/qdec.table.fsgd --C var${var}/qdec.table.mtx --glmdir var${var}/lh.qdec.thickness-spc.fwhm10 --surf fsaverage lh
mri_glmfit-sim --glmdir var${var}/lh.qdec.thickness-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/lh.qdec.thickness-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# everybody look to their right
long_mris_slopes --qdec var${var}/qdec.table --meas thickness --hemi rh --do-spc --do-label --stack-spc var${var}/rh.qdec.thickness-spc.stack.mgh --isec-labels var${var}/rh.qdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/rh.qdec.thickness-spc.stack.mgh --fsgd var${var}/qdec.table.fsgd --C var${var}/qdec.table.mtx --glmdir var${var}/rh.qdec.thickness-spc.fwhm10 --surf fsaverage rh
mri_glmfit-sim --glmdir var${var}/rh.qdec.thickness-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/rh.qdec.thickness-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# Can you feel that? Yeah
grep -v "^#" var${var}/*h.qdec.thickness-spc.fwhm10/*/*.summary
[osotolongo@brick03 throttle]$ ./dospc.sh 9
... 
...
...
var9/rh.qdec.thickness-spc.fwhm10/qdec.table/cache.th30.pos.sig.cluster.summary:   1        4.7058   44663     10.70     45.8    8.5  -33.1  0.03292  0.02977  0.03607     14       51.01  middletemporal

Cortical volume

To compute the same thing for volume, I need to extract the eTIV.

[osotolongo@brick03 throttle]$ for y in $(seq 9 17); do for x in $(grep "Input" var${y}/qdec.table.fsgd | awk '{print $2}'); do v=$(mri_segstats --subject ${x} --etiv-only | grep "mm\^3" | awk '{print $4}'); echo "${x} ${v}"; done > var${y}/eitv.tsv; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do cd var${x}; Rscript ../getzicv.r; cd ..; done
[osotolongo@brick03 throttle]$ head -n 3 var9/qdec.table.fsgd > headers.txt
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do tail -n +5 var${x}/qdec.table.fsgd > var${x}/tmp_fsgd; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do join -1 2 -2 1 var${x}/tmp_fsgd var${x}/zeitv.tsv | awk -F " Input " '{print "Input",$1,$2 }' > var${x}/tmp_full; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do labs=$(head -n 4 var${x}/qdec.table.fsgd | tail -n 1); c=$(echo "${labs} zEITV"); cat headers.txt > var${x}/vqdec.table.fsgd; echo ${c} >>  var${x}/vqdec.table.fsgd; cat  var${x}/tmp_full >> var${x}/vqdec.table.fsgd; done
[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do echo "0 0 0 0 0 1 0" > var${x}/vqdec.table.mtx; done
dovspc.sh
#!/bin/bash
var=$1
# everybody look to their left
long_mris_slopes --qdec var${var}/qdec.table --meas volume --hemi lh --do-spc --do-label --stack-spc var${var}/lh.qdec.volume-spc.stack.mgh --isec-labels var${var}/lh.vqdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/lh.qdec.volume-spc.stack.mgh --fsgd var${var}/vqdec.table.fsgd --C var${var}/vqdec.table.mtx --glmdir var${var}/lh.qdec.volume-spc.fwhm10 --surf fsaverage lh 
mri_glmfit-sim --glmdir var${var}/lh.qdec.volume-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir var${var}/lh.qdec.volume-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces
# everybody look to their right
long_mris_slopes --qdec var${var}/qdec.table --meas volume --hemi rh --do-spc --do-label --stack-spc var${var}/rh.qdec.volume-spc.stack.mgh --isec-labels var${var}/rh.vqdec.fsaverage.cortex.label --time years --qcache fsaverage --sd /old_nas/subjects/
mri_glmfit --y var${var}/rh.qdec.volume-spc.stack.mgh --fsgd var${var}/vqdec.table.fsgd --C var${var}/vqdec.table.mtx --glmdir var${var}/rh.qdec.volume-spc.fwhm10 --surf fsaverage rh
mri_glmfit-sim --glmdir var${var}/rh.qdec.volume-spc.fwhm10 --cache 3 pos --cwp 0.05 --2spaces 
mri_glmfit-sim --glmdir var${var}/rh.qdec.volume-spc.fwhm10 --cache 3 neg --cwp 0.05 --2spaces 
# Can you feel that? Yeah
grep -v "^#" var${var}/*h.qdec.volume-spc.fwhm10/*/*.summary

and

[osotolongo@brick03 throttle]$ for x in $(seq 9 17); do ./dovspc.sh ${x}; done

Investigating variables

The first step is always to download the most up-to-date FACEHBI macrotable. We clean it with one of the pipeline tools,

[osotolongo@brick03 extended]$ perl ~/acenip/tools/xls2csv.pl FACEHBI\ macrotabla\ v0123456\ 2022-12-22\ version\ 1.xlsx > macrotabla.csv

Now, to find the variables we want, we do,

[osotolongo@brick03 extended]$ head -n 1 macrotabla.csv | sed 's/,/\n/g' | cat -n > fields.txt

and save the variable names to a file. For now we only want a few of them,

[osotolongo@brick03 extended]$ awk -F',' 'BEGIN { OFS = ","} {print $1,$2,$3,$4,$5,$6,$36,$239,$240,$241,$302,$303,$312,$313,$304,$314,$278,$279,$506,$68,$489}' macrotabla.csv | grep -v "^,,,,,," > facehbi_sel_long_0to6.csv
[osotolongo@brick03 extended]$ head facehbi_sel_long_0to6.csv
internal_id_fac,facehbi_id_fac,visit_fac,first_visit_0_2,age,sex,level_education_fac,fe_letter_fluency_np,fe_category_fluency_np,fe_verb_fluency_np,pyramid_palmtrees_images_fac,pyramid_palmtrees_words_fac,kissing_dancing_images_fac,kissing_dancing_words_fac,boston60_free_fac,action_naming_free_fac,syndromic_diagnosis,primary_diagnosis,fbb_pet_centilod,scd_plus_apoe4_fac,apoe
20040516,F012,0,0,85,0,8,20,13,12,,,,,50,48,scd,scd,-24.765025349999998,0,e2e3
20040516,F012,1,0,86,0,,20,12,11,,,,,,,mci,amnestic possible mci,,,
20040516,F012,2,0,87,0,8,14,8,9,45,44,,,41,,mci,amnestic possible mci,,,
20050260,F166,0,0,78,0,4,14,12,9,50,52,48,48,55,50,scd,scd,-2.0989695859999999,0,e3e3
20050260,F166,1,0,79,0,,5,13,10,,,,,,,mci,non-amnestic possible mci,,,
20050260,F166,2,0,80,0,4,10,15,6,52,52,48,52,50,49,mci,non-amnestic possible mci,-3.4808662379999999,,
20050260,F166,3,0,81,0,,11,13,13,,,,,,,mci,non-amnestic possible mci,,,
20050260,F166,4,0,82,0,,8,9,5,,,,,,,mci,amnestic possible mci,,,
20050260,F166,5,0,83,0,,8,10,6,50,50,,,51,38,mci,non-amnestic mci,11.428131260000001,,

Now I have to separate those who convert (for example) from those who don't.

$ grep 'mci' facehbi_sel_long_0to6.csv | awk -F',' '{print $2}' | sort |uniq > conversors.list
$ while read s; do grep "${s}" facehbi_sel_long_0to6.csv; done < conversors.list > flong_conversors.csv
$ grep -v "`cat conversors.list`" facehbi_sel_long_0to6.csv > flong_non_conversors.csv
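The backtick trick above works because grep treats embedded newlines in the pattern as alternative patterns; an equivalent, more explicit form uses -f (read patterns from a file) plus -F (match fixed strings). Demo with throwaway files, not the real data:

```shell
#!/bin/bash
# -f reads one pattern per line from a file, -F matches fixed strings,
# -v inverts the match.
printf 'F166\n' > conversors_demo.list
printf 'F012,scd\nF166,mci\n' > data_demo.csv
grep -Ff conversors_demo.list data_demo.csv     # converters  -> F166,mci
grep -vFf conversors_demo.list data_demo.csv    # the rest    -> F012,scd
rm -f conversors_demo.list data_demo.csv
```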

and once this is done I can take a look,

explore_dx.rmd
---
title: longitudinal NP data by DX
output: html_document
---
 
What we have done is select a few (NP) variables and split them into two files according to syndromic diagnosis conversion (anyone who goes scd -> mci at some point is a converter)
 
## Plots
 
```{r message = FALSE, warning = FALSE}
library(ggplot2)
fc <- read.csv("flong_conversors.csv")
fnc <- read.csv("flong_non_conversors.csv")
langs <- c("fe_letter_fluency_np", "fe_category_fluency_np", "fe_verb_fluency_np", "pyramid_palmtrees_images_fac", "pyramid_palmtrees_words_fac", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "boston60_free_fac", "action_naming_free_fac")
 
for (lang in langs) {
        ggplot(fnc, aes(x = .data[[lang]], fill = factor(visit_fac))) + geom_histogram(colour = "black", lwd = 0.75, linetype= 1, position = "identity", alpha = 0.75) -> p
        p + labs(title="No converters") -> p
        print(p)
        ggplot(fc, aes(x = .data[[lang]], fill = factor(visit_fac))) + geom_histogram(colour = "black", lwd = 0.75, linetype= 1, position = "identity", alpha = 0.75) -> p
        p + ggtitle("SCD -> MCI converters") -> p
        print(p)
        ggplot(fnc, aes(x = visit_fac, y = .data[[lang]], fill = factor(visit_fac))) + geom_boxplot(colour = "black", lwd = 0.75, linetype= 1, position = "identity", alpha = 0.75) -> p
        p + ggtitle("No converters") -> p
        print(p)
        ggplot(fc, aes(x = visit_fac, y = .data[[lang]], fill = factor(visit_fac))) + geom_boxplot(colour = "black", lwd = 0.75, linetype= 1, position = "identity", alpha = 0.75) -> p
        p + ggtitle("SCD -> MCI converters") -> p
        print(p)
        ggplot(fnc, aes(x = visit_fac, y = .data[[lang]], color = facehbi_id_fac)) + geom_line() + stat_summary(aes(group = 1), geom = "line", fun = median, size = 2, color = "red") + theme(legend.position="none") -> p
        p + ggtitle("No converters") -> p
        print(p)
        ggplot(fc, aes(x = visit_fac, y = .data[[lang]], color = facehbi_id_fac)) + geom_line() + stat_summary(aes(group = 1), geom = "line", fun = median, size = 2, color = "red") + theme(legend.position="none") -> p
        p + ggtitle("SCD -> MCI converters") -> p
        print(p)
}
```

If instead I want to extract those who convert to amyloid-positive,

[osotolongo@brick03 extended]$ tail -n +2 facehbi_sel_long_0to6.csv | awk -F',' '{if ($19 && $19 > 20) print $2}' | sort | uniq > amiloid_pos.list
[osotolongo@brick03 extended]$ while read s; do grep "${s}" facehbi_sel_long_0to6.csv; done < amiloid_pos.list > flong_pos_nohead.csv
[osotolongo@brick03 extended]$ head -n 1 facehbi_sel_long_0to6.csv > headers.txt
[osotolongo@brick03 extended]$ cat headers.txt flong_pos_nohead.csv > flong_pos.csv
[osotolongo@brick03 extended]$ grep -v "`cat amiloid_pos.list`" facehbi_sel_long_0to6.csv > flong_neg.csv

OK, now I should run some model that tells me whether there are differences between the two categories.

Longitudinal mixed effects model

> library(nlme)
> fc <- read.csv("flong_conversors_fixed.csv")
> a <- lme(fe_letter_fluency_np ~ age + sex + level_education_fac, data = fc, random = ~ age | facehbi_id_fac, method = "ML")
> summary(a)
Linear mixed-effects model fit by maximum likelihood
  Data: fc 
       AIC      BIC    logLik
  2335.434 2368.055 -1159.717
 
Random effects:
 Formula: ~age | facehbi_id_fac
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 3.3007040745 (Intr)
age         0.0000334666 0     
Residual    2.8988033948       
 
Fixed effects:  fe_letter_fluency_np ~ age + sex + level_education_fac 
                        Value Std.Error  DF   t-value p-value
(Intercept)         16.588978  3.942268 364  4.207978  0.0000
age                 -0.074779  0.047785 364 -1.564904  0.1185
sex                  0.933704  0.894277  68  1.044088  0.3001
level_education_fac  0.787147  0.243211  68  3.236476  0.0019
 Correlation: 
                    (Intr) age    sex   
age                 -0.904              
sex                 -0.125  0.165       
level_education_fac -0.458  0.060 -0.261
 
Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.74802315 -0.65308936 -0.02842913  0.61876952  2.52676949 
 
Number of Observations: 436
Number of Groups: 71 

Long-term prediction

Does variable X at baseline predict any change in diagnosis at 5 years?

[osotolongo@brick03 extended]$ awk -F',' '{if ($3=="5") print}' facehbi_sel_long_0to6.csv > facehbi_v5.csv
[osotolongo@brick03 extended]$ awk -F',' '{if ($3==$4) print}' facehbi_sel_long_0to6.csv > facehbi_first.csv
[osotolongo@brick03 extended]$ head -n 1 facehbi_sel_long_0to6.csv > facehbi_headers.csv
[osotolongo@brick03 extended]$ cat facehbi_headers.csv facehbi_first.csv > facehbi_first_h.csv
[osotolongo@brick03 extended]$ cat facehbi_headers.csv facehbi_v5.csv > facehbi_v5_h.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1","$17","$19}' facehbi_v5_h.csv > dxv5.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1","$5","$6","$7","$10","$13","$14","$16","$19}' facehbi_first_h.csv > baseline.csv
[osotolongo@brick03 extended]$ join -t, baseline.csv dxv5.csv > facehbi_response.csv
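Note that join(1) requires both inputs to be sorted on the join field. baseline.csv and dxv5.csv come from the same source in the same order here, but a defensive pattern sorts first (demo with throwaway files, not the real data):

```shell
#!/bin/bash
# Sort both sides on the join key before joining; with unsorted input,
# join silently drops or mispairs lines.
printf '2,x\n1,y\n' | sort -t, -k1,1 > left.csv
printf '1,q\n2,r\n' | sort -t, -k1,1 > right.csv
join -t, left.csv right.csv    # -> 1,y,q then 2,x,r
rm -f left.csv right.csv
```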

Basic shit: comparing two groups

> setwd("/nas/osotolongo/long_facehbi_v025/extended")
> f <- read.csv("facehbi_response.csv")
> library(ggplot2)
> ggplot(f, aes(x=syndromic_diagnosis, y=kissing_dancing_images_fac)) + stat_boxplot(geom = "errorbar", width = 0.25) + geom_boxplot()
explore_response.rmd
---
title: Long-term prediction of the verb variables
output: html_document
---
 
```{r warning = FALSE}
date() 
```
 
What we have done is select a few (NP) variables that index verbal fluency and check whether there is any significant change in diagnosis at visit 5
 
### Plots
 
```{r message = FALSE, warning = FALSE}
library(ggplot2) 
fc <- read.csv("facehbi_response.csv") 
langs <- c("fe_verb_fluency_np", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "action_naming_free_fac") 
for (lang in langs) {
	ggplot(fc, aes(x = syndromic_diagnosis, y = .data[[lang]], fill = syndromic_diagnosis)) + stat_boxplot(geom = "errorbar", width = 0.25) + geom_boxplot() -> p
	print(p)
}
```
 
Since there seem to be significant differences, let's see what the statistics say,
 
### Stats
 
```{r message = FALSE, warning = FALSE}
fc <- read.csv("facehbi_response.csv") 
langs <- c("fe_verb_fluency_np", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "action_naming_free_fac") 
for (lang in langs) {
  a <- t.test(as.formula(paste(lang, '~ factor(syndromic_diagnosis)')), data = fc, na.action = "na.omit")
  print(lang) 
  print(a)
}
```

See: https://fundacioace-my.sharepoint.com/:u:/g/personal/osotolongo_fundacioace_org/EY-hkq7VtPJAiSctDhiYOcUBKJuWKpt24il-k6SJL9Ctug?e=Fllo6u

Time difference between visits

An added problem with the data is that some subjects start at visit 0 and others at visit 2. Because of this, it is important to include the time difference between visits as a covariate.

This is more or less simple. First, determine which is each subject's first visit,

[osotolongo@brick03 extended]$ awk -F',' '{if ($3==$4) print}' facehbi_sel_long_0to6.csv | awk -F',' '{print $1","$2","$4}' > first_visit.csv

and now look up the dates of each MRI in XNAT and subtract them. The only complication is determining which project to query in each case,

get_dates.sh
#!/bin/bash
echo "internal_id_fac,ydiff"
while read -r line; do
        nhc=$(echo ${line} | awk -F',' '{print $1}')
        sbj=$(echo ${line} | awk -F',' '{print $2}')
        visit=$(echo ${line} | awk -F',' '{print $3}')
        if [[ ${visit} == '0' ]]; then
                date0=$(xnatapic list_experiments --project_id facehbi --subject_id ${sbj} --modality MRI --date | sed 's/.*,//')
        else
                date0=$(xnatapic list_experiments --project_id f2cehbi --subject_id ${sbj} --modality MRI --date | sed 's/.*,//')
        fi
        date5=$(xnatapic list_experiments --project_id f5cehbi --subject_id ${sbj} --modality MRI --date 2>/dev/null| sed 's/.*,//')
        if [[ ! -z ${date5} ]]; then
                ddif=$(( ($(date --date=${date5} +%s) - $(date --date=${date0} +%s))/(60*60*24) ))
                ydif=$(printf "%.2f\n" $(echo "${ddif}/365" | bc -l))
                if [[ $(bc <<< "${ydif} > 0.1") > 0 ]]; then
                        echo "${nhc},${ydif}"
                fi
        fi
done < first_visit.csv
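The date arithmetic in get_dates.sh can be checked in isolation with fixed, made-up dates. In this sketch the dates are parsed in UTC to keep the day count DST-proof, and awk stands in for bc for the division:

```shell
#!/bin/bash
# Epoch-second difference -> whole days -> years to 2 decimal places,
# mirroring the ddif/ydif computation in get_dates.sh (demo values only).
date0='2016-03-01'
date5='2021-04-15'
ddif=$(( ($(date -u --date=${date5} +%s) - $(date -u --date=${date0} +%s)) / (60*60*24) ))
ydif=$(awk -v d=${ddif} 'BEGIN { printf "%.2f", d / 365 }')
echo "${ddif} days = ${ydif} years"   # -> 1871 days = 5.13 years
```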

and then,

[osotolongo@brick03 extended]$ ./get_dates.sh > facehbi_ydiff.csv
[osotolongo@brick03 extended]$ head facehbi_ydiff.csv
internal_id_fac,ydiff
20050260,5.08
20060090,5.15
20070072,5.04
20070303,5.41
20080337,5.31
20080565,5.23
20080716,4.98
20081224,5.01
20090481,5.10

FSGA

Basically, I now have the data for the analysis.

[osotolongo@brick03 extended]$ join -t, facehbi_response.csv facehbi_ydiff.csv > fsga_data.csv
[osotolongo@brick03 extended]$ head -n 1 fsga_data.csv | sed 's/,/\n/g' | cat -n
     1	internal_id_fac
     2	age
     3	sex
     4	level_education_fac
     5	fe_verb_fluency_np
     6	kissing_dancing_images_fac
     7	kissing_dancing_words_fac
     8	action_naming_free_fac
     9	fbb_pet_centilod
    10	syndromic_diagnosis
    11	fbb_pet_centilod
    12	ydiff

Now I have to arrange the subjects into FreeSurfer directories, pulling the directories from XNAT. I have the subject name and the visit, so it is fairly simple using the API,

getfs.sh
#!/bin/bash
 
data='fsga_data.csv'
v='first_visit.csv'
 
for x in  $(tail -n +2 ${data} | awk -F',' '{print $1}'); do
        l=$(grep ${x} ${v})
        sbj=$(echo ${l} | awk -F',' '{print $2}')
        visit=$(echo ${l} | awk -F',' '{print $3}')
        mkdir ${SUBJECTS_DIR}/${x}
        if [[ ${visit} == '0' ]]; then
                xp=$(xnatapic list_experiments --project_id facehbi --subject_id ${sbj} --modality MRI)
        else
                xp=$(xnatapic list_experiments --project_id f2cehbi --subject_id ${sbj} --modality MRI)
        fi
        stmp=$(mktemp -d)
        stgz=$(xnatapic get_fsresults --experiment_id ${xp} --all-tgz ${stmp})
        tar xzf ${stmp}/${stgz}  --strip-components=1 -C ${SUBJECTS_DIR}/${x} --exclude='fsaverage'
        rm -rf ${stmp}/
        echo "${sbj} --> ${xp} --> ${SUBJECTS_DIR}/${x}"
done

and

[osotolongo@brick03 fsga]$ ./getfs.sh 
F166 --> XNAT4_E00321 --> /old_nas/subjects/20050260
F146 --> XNAT4_E00301 --> /old_nas/subjects/20060090
F161 --> XNAT4_E00316 --> /old_nas/subjects/20070072
F013 --> XNAT4_E00174 --> /old_nas/subjects/20070303
F070 --> XNAT4_E00227 --> /old_nas/subjects/20080337
F069 --> XNAT4_E00226 --> /old_nas/subjects/20080565
F007 --> XNAT4_E00168 --> /old_nas/subjects/20080716
F151 --> XNAT4_E00306 --> /old_nas/subjects/20081224
F181 --> XNAT4_E00335 --> /old_nas/subjects/20090481
F001 --> XNAT4_E00162 --> /old_nas/subjects/20090806
...
...

Once I have the FS directories, extracting each subject's ICV is simple. We can also normalize them quickly using R,

geticv.sh
#!/bin/bash
 
data='fsga_data.csv'
tdf=$(mktemp)
echo "internal_id_fac,icv" > ${tdf}
for x in $(tail -n +2 ${data} | awk -F',' '{print $1}'); do
        icv=$(grep EstimatedTotalIntraCranialVol /old_nas/subjects/${x}/stats/aseg.stats | awk -F', ' '{print $4}')
        echo "${x},${icv}" >> ${tdf}
done
 
echo "read.csv('${tdf}') -> x" > convert.r
echo "x\$zICV <- (x\$icv - mean(x\$icv))/sd(x\$icv)" >> convert.r
echo "xw <- x[c(1,3)]" >> convert.r
echo "write.csv(xw, file='icvs.csv', row.names=FALSE, quote=FALSE)" >> convert.r
 
Rscript convert.r

so we add it to the data,

[osotolongo@brick03 fsga]$ join -t, fsga_data.csv icvs.csv > fsga_data_wicv.csv
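The z-scoring that convert.r does in R can also be done in pure awk, reading the file twice: a first pass for the mean and standard deviation, a second to transform. A sketch with made-up ICV values; like R's sd(), it uses the sample (n-1) denominator:

```shell
#!/bin/bash
# Two-pass z-score in awk over a two-column CSV (id,icv), demo data only.
zscore() {
  awk -F',' 'NR == FNR { if (FNR > 1) { s += $2; ss += $2 * $2; n++ }; next }
    FNR == 1 { m = s / n; sd = sqrt((ss - n * m * m) / (n - 1)); print "internal_id_fac,zICV"; next }
    { printf "%s,%.4f\n", $1, ($2 - m) / sd }' "$1" "$1"
}
printf 'internal_id_fac,icv\nA,1400000\nB,1500000\nC,1600000\n' > demo_icv.csv
zscore demo_icv.csv
rm -f demo_icv.csv
```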

Now, before running the analysis, we are going to build the fsgd files from these data. Since I have to process several variables, I will make some templates: first, one to assemble the fsgd files,

organize.sh
#!/bin/bash
mkdir fsga_<code>
awk -F',' '{if($<code>) print}' fsga_data_wicv.csv > fsga_<code>/fsga_data.csv
awk -F',' '{print "Input "$1" Main",$2,$3,$4,$<code>}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>_body.csv
echo 'GroupDescriptorFile 1' > fsga_<code>/headers_<code>.txt
echo 'Title FACEBI_verbs_<code>' >> fsga_<code>/headers_<code>.txt
echo 'Class Main' >> fsga_<code>/headers_<code>.txt
echo '0 0 0 0 1' > fsga_<code>/<code>.mtx
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>_body.csv > fsga_<code>/<code>.fsgd
awk -F',' '{print "Input "$1" Main",$2,$3,$4,$<code>,$13}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>v_body.csv
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>v_body.csv > fsga_<code>/<code>v.fsgd
echo '0 0 0 0 1 0' > fsga_<code>/<code>v.mtx

and then one to run the steps of each analysis,

runner.sh
#!/bin/bash
## area
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.area.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.area.10.mgh
mri_glmfit --y lh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>a.glmdir
mri_glmfit --y rh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>a.glmdir
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
## thickness
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.thickness.10.mgh
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.thickness.10.mgh
mri_glmfit --y lh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage lh --glmdir lh.<code>.glmdir
mri_glmfit --y rh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage rh --glmdir rh.<code>.glmdir
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
## volume
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.volume.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.volume.10.mgh
mri_glmfit --y lh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>v.glmdir
mri_glmfit --y rh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>v.glmdir
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces

Finally, I take each template and build the files to run for each variable, according to the column it occupies in the data file,

run_all.sh
#!/bin/bash
code=$1
shift
sed "s/<code>/${code}/g" templates/organize.sh > org${code}.sh
chmod +x org${code}.sh
./org${code}.sh
rm org${code}.sh
sed "s/<code>/${code}/g" templates/runner.sh > fsga_${code}/run.sh
cd fsga_${code}/
chmod +x run.sh
./run.sh
cd ..
grep -v "^#" fsga_${code}/*glmdir/${code}*/*sig.cluster.summary

and if I run,

[osotolongo@brick01 fsga]$ ./run_all.sh 5

it builds the fsgd files,

[osotolongo@brick01 fsga]$ head fsga_5/5.fsgd 
GroupDescriptorFile 1
Title FACEBI_verbs_5
Class Main
Variables Age Gender Education fe_verb_fluency_np
Input 20050260 Main 78 0 4 9
Input 20060090 Main 70 0 9 22
Input 20070072 Main 76 1 8 9
Input 20070303 Main 69 0 5 12
Input 20080337 Main 75 1 9 17
Input 20080565 Main 73 0 5 12

and runs the analyses,

[osotolongo@brick01 fsga]$ ls fsga_5/*glmdir/
fsga_5/lh.5a.glmdir/:
5v        cache.mri_glmfit-sim.log  dof.dat   mask.mgh        rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd                       fwhm.dat  mri_glmfit.log  rvar.mgh  surface   X.mat
 
fsga_5/lh.5.glmdir/:
5         cache.mri_glmfit-sim.log  dof.dat   mask.mgh        rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd                       fwhm.dat  mri_glmfit.log  rvar.mgh  surface   X.mat
 
fsga_5/lh.5v.glmdir/:
5v        cache.mri_glmfit-sim.log  dof.dat   mask.mgh        rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd                       fwhm.dat  mri_glmfit.log  rvar.mgh  surface   X.mat
 
fsga_5/rh.5a.glmdir/:
5v        cache.mri_glmfit-sim.log  dof.dat   mask.mgh        rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd                       fwhm.dat  mri_glmfit.log  rvar.mgh  surface   X.mat
 
fsga_5/rh.5.glmdir/:
5         cache.mri_glmfit-sim.log  dof.dat   mask.mgh        rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd                       fwhm.dat  mri_glmfit.log  rvar.mgh  surface   X.mat
 
fsga_5/rh.5v.glmdir/:
5v        cache.mri_glmfit-sim.log  dof.dat   mask.mgh        rstd.mgh  sar1.mgh  Xg.dat  y.fsgd
beta.mgh  csd                       fwhm.dat  mri_glmfit.log  rvar.mgh  surface   X.mat

for the variable fe_verb_fluency_np, which occupies the fifth column of the data file.

I can automate this with two templates,

organize.sh
#!/bin/bash
mkdir fsga_<code>
awk -F',' '{if($<code>) print}' fsga_data_wicv.csv > fsga_<code>/fsga_data.csv
awk -F',' '{print "Input "$1" Main",$2,$4,$<code>}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>_body.csv
echo 'GroupDescriptorFile 1' > fsga_<code>/headers_<code>.txt
echo 'Title FACEBI_verbs_<code>' >> fsga_<code>/headers_<code>.txt
echo 'Class Main' >> fsga_<code>/headers_<code>.txt
echo '0 0 0 1' > fsga_<code>/<code>.mtx
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>_body.csv > fsga_<code>/<code>.fsgd
awk -F',' '{print "Input "$1" Main",$2,$4,$<code>,$13}' fsga_<code>/fsga_data.csv | sed 's/Input internal_id_fac Main/Variables/;s/ydiff/Years/;s/age/Age/;s/sex/Gender/;s/level_education_fac/Education/' > fsga_<code>/<code>v_body.csv
cat fsga_<code>/headers_<code>.txt fsga_<code>/<code>v_body.csv > fsga_<code>/<code>v.fsgd
echo '0 0 0 1 0' > fsga_<code>/<code>v.mtx
 
[osotolongo@brick03 fsga]$ cat templates/runner.sh
#!/bin/bash
## area
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.area.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.area.10.mgh
mri_glmfit --y lh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>a.glmdir
mri_glmfit --y rh.<code>.area.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>a.glmdir
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>a.glmdir --cache 3 pos --cwp 0.05 --2spaces
## thickness
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.thickness.10.mgh
mris_preproc --fsgd <code>.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.thickness.10.mgh
mri_glmfit --y lh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage lh --glmdir lh.<code>.glmdir
mri_glmfit --y rh.<code>.thickness.10.mgh --fsgd <code>.fsgd --C <code>.mtx --surf fsaverage rh --glmdir rh.<code>.glmdir
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>.glmdir --cache 3 pos --cwp 0.05 --2spaces
## volume
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.<code>.volume.10.mgh
mris_preproc --fsgd <code>v.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.<code>.volume.10.mgh
mri_glmfit --y lh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage lh --glmdir lh.<code>v.glmdir
mri_glmfit --y rh.<code>.volume.10.mgh --fsgd <code>v.fsgd --C <code>v.mtx --surf fsaverage rh --glmdir rh.<code>v.glmdir
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir lh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 neg --cwp 0.05 --2spaces
mri_glmfit-sim --glmdir rh.<code>v.glmdir --cache 3 pos --cwp 0.05 --2spaces

and a script that picks the variables and, for each case, writes out and runs all the analyses,

run_all.sh
#!/bin/bash
code=$1
shift
sed "s/<code>/${code}/g" templates/organize.sh > org${code}.sh
chmod +x org${code}.sh
./org${code}.sh
rm org${code}.sh
sed "s/<code>/${code}/g" templates/runner.sh > fsga_${code}/run.sh
cd fsga_${code}/
chmod +x run.sh
./run.sh
cd ..
grep -v "^#" fsga_${code}/*glmdir/${code}*/*sig.cluster.summary
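
The `<code>` placeholder mechanism is plain textual substitution with `sed`; as a quick sanity check (toy line, not taken from the templates):

```shell
# Every literal "<code>" in a template line becomes the chosen column number.
line='mkdir fsga_<code>'
echo "$line" | sed "s/<code>/5/g"
# → mkdir fsga_5
```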

I run this for each of the variables I want,

[osotolongo@brick03 fsga]$ ./run_all.sh 5
...
...
...
[osotolongo@brick03 fsga]$ ./run_all.sh 7
...
...
...

Going by the header, it should be straightforward,

[osotolongo@brick03 fsga]$ head  -n 1 fsga_data_swicv.csv | sed 's/,/\n/g' | cat -n
     1	internal_id_fac
     2	age
     3	sex
     4	level_education_fac
     5	fe_verb_fluency_np
     6	kissing_dancing_images_fac
     7	kissing_dancing_words_fac
     8	boston60_free_fac
     9	action_naming_free_fac
    10	fbb_pet_centilod
    11	syndromic_diagnosis
    12	fbb_pet_centilod
    13	ydiff
    14	zICV
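
To avoid miscounting, the column number can also be looked up by name; a minimal sketch (only the first few columns inlined here as a sample):

```shell
# Print the 1-based position of a column given its header name.
header='internal_id_fac,age,sex,level_education_fac,fe_verb_fluency_np'
echo "$header" | tr ',' '\n' | grep -n 'fe_verb_fluency_np' | cut -d: -f1
# → 5
```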

Unfortunately, after running this for each of the variables, I see that the clusters do not survive the correction for multiple comparisons.

Example: fe_verb_fluency_np

Subcortical

I am going to look at the variables against subcortical changes, so I need to extract the values from the aseg.stats files,

[osotolongo@brick03 extended]$ a=$(cat fsga/fsga_mri.csv | tr '\n' ',' | sed 's/,/ -s /g;s/-s $//')
[osotolongo@brick03 extended]$ asegstats2table -s ${a} --meas volume --tablefile fsga/aseg_stats.txt
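
The first line only reshapes the one-ID-per-line subject list into the `id -s id -s …` string that follows the first `-s` on the command line; a toy check with made-up IDs:

```shell
# Turn newline-separated IDs into "id1 -s id2" (the leading -s comes from the command line).
printf '20050260\n20060090\n' > ids_sample.txt
a=$(cat ids_sample.txt | tr '\n' ',' | sed 's/,/ -s /g;s/-s $//')
echo "$a"
```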

I merge this file with the one I have been using for the FSGA,

library(readr)
fs <- read_tsv("fsga/aseg_stats.txt", show_col_types = FALSE)
colnames(fs)[1] <- 'internal_id_fac'
fv <- read.csv("fsga/fsga_cs.csv")
merge(fs, fv, by = 'internal_id_fac') -> s

and now I can check whether there is a relationship between the subcortical volumes at visit 5 and the verb tests at baseline,

subs <- c("Hippocampus", "Amygdala")
langs <- c("fe_verb_fluency_np", "kissing_dancing_images_fac", "kissing_dancing_words_fac", "action_naming_free_fac")
for (sub in subs){
        # total bilateral volume (Left + Right) for this structure
        s$"tmp" <- eval(parse(text = paste0('s$"Left-',sub,'" + s$"Right-',sub,'"')))
        # adjust for head size: remove the ICV slope around the sample mean
        a <- lm(s$"tmp" ~ s$"EstimatedTotalIntraCranialVol")
        s$"adj" <- s$"tmp" - a$coefficients[2] * (s$"EstimatedTotalIntraCranialVol" - mean(s$"EstimatedTotalIntraCranialVol", na.rm = T))
        for (lang in langs) {
                # adjusted volume ~ age + education + language test score
                m <- lm(as.formula(paste0('s$adj ~ s$age + s$level_education_fac + s$', lang)))
                print(paste(sub, "~", lang))
                print(summary(m))
        }
}
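
The ICV correction inside the loop is the standard residual-adjustment (regression) method; in formula form,

```latex
vol_{adj} = vol - b\,\left( ICV - \overline{ICV} \right)
```

where $b$ is the slope from regressing the raw bilateral volume on EstimatedTotalIntraCranialVol.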

What interests us in particular is the hippocampus,

## [1] "Hippocampus ~ fe_verb_fluency_np"
## 
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$", 
##     lang)))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1963.2  -388.4     4.9   352.1  2020.0 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           10212.612    626.877  16.291  < 2e-16 ***
## s$age                   -43.924      7.772  -5.652 1.02e-07 ***
## s$level_education_fac   -63.960     33.030  -1.936  0.05507 .  
## s$fe_verb_fluency_np     26.034      9.672   2.692  0.00808 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 598.5 on 125 degrees of freedom
## Multiple R-squared:  0.2855, Adjusted R-squared:  0.2684 
## F-statistic: 16.65 on 3 and 125 DF,  p-value: 3.667e-09
## 
## [1] "Hippocampus ~ kissing_dancing_images_fac"
## 
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$", 
##     lang)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1851.83  -314.39    48.22   385.69  2081.59 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   9082.65    1872.35   4.851 6.76e-06 ***
## s$age                          -46.50      10.14  -4.587 1.82e-05 ***
## s$level_education_fac          -10.95      41.44  -0.264    0.792    
## s$kissing_dancing_images_fac    29.18      34.05   0.857    0.394    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 629.6 on 73 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared:  0.2524, Adjusted R-squared:  0.2216 
## F-statistic: 8.214 on 3 and 73 DF,  p-value: 8.809e-05
## 
## [1] "Hippocampus ~ kissing_dancing_words_fac"
## 
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$", 
##     lang)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1620.36  -333.88   -14.26   358.11  1968.13 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  6003.57    2356.73   2.547   0.0132 *  
## s$age                         -48.69       9.99  -4.873 7.04e-06 ***
## s$level_education_fac         -18.41      41.56  -0.443   0.6593    
## s$kissing_dancing_words_fac    95.09      44.81   2.122   0.0375 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 612 on 67 degrees of freedom
##   (58 observations deleted due to missingness)
## Multiple R-squared:  0.3243, Adjusted R-squared:  0.2941 
## F-statistic: 10.72 on 3 and 67 DF,  p-value: 7.654e-06
## 
## [1] "Hippocampus ~ action_naming_free_fac"
## 
## Call:
## lm(formula = as.formula(paste0("s$adj ~ s$age + s$level_education_fac + s$", 
##     lang)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2024.53  -357.40    13.68   362.28  2128.60 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              10210.515   1040.783   9.810  < 2e-16 ***
## s$age                      -47.619      8.116  -5.868 4.14e-08 ***
## s$level_education_fac      -42.310     33.791  -1.252    0.213    
## s$action_naming_free_fac    12.385     15.673   0.790    0.431    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 625 on 118 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.244,  Adjusted R-squared:  0.2248 
## F-statistic:  12.7 on 3 and 118 DF,  p-value: 2.996e-07

and we can see how the model performs for each variable,

Variable                     Adjusted $R^{2}$   p (language term)
fe_verb_fluency_np           0.2684             0.00808
kissing_dancing_images_fac   0.2216             0.394
kissing_dancing_words_fac    0.2941             0.0375
action_naming_free_fac       0.2248             0.431

which shows that we should pay special attention to the variable fe_verb_fluency_np.

And/or cortical

What I have done with subcortical volume can also be done with a specific cortical area. Take, for example, the entorhinal cortex,

[osotolongo@brick03 extended]$ a=$(cat fsga/fsga_mri.csv | tr '\n' ',' | sed 's/,/ -s /g;s/-s $//')
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m volume -t fsga/aparc_lh_vol_stats.txt --hemi lh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_lh_vol_stats.txt
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m volume -t fsga/aparc_rh_vol_stats.txt --hemi rh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_rh_vol_stats.txt
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m thickness -t fsga/aparc_lh_thick_stats.txt --hemi lh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_lh_thick_stats.txt
[osotolongo@brick03 extended]$ aparcstats2table -s ${a} -m thickness -t fsga/aparc_rh_thick_stats.txt --hemi rh
SUBJECTS_DIR : /old_nas/subjects
Parsing the .stats files
Building the table..
Writing the table to fsga/aparc_rh_thick_stats.txt
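
From these tables the entorhinal columns can then be pulled out by header name; a hedged sketch (the sample table is invented here, the real file being fsga/aparc_lh_thick_stats.txt):

```shell
# Locate the lh_entorhinal_thickness column by name and cut it out with the subject id.
stats=aparc_lh_thick_sample.txt
printf 'lh.aparc.thickness\tlh_bankssts_thickness\tlh_entorhinal_thickness\n20050260\t2.612\t3.417\n' > "$stats"
col=$(head -n 1 "$stats" | tr '\t' '\n' | grep -n 'entorhinal' | cut -d: -f1)
cut -f1,"$col" "$stats"
```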

Composites

Summary

I have summarized all of this in an Rmarkdown file.

ATN

I am going to try to derive the ATN classification for each subject at visit 5.

First,

[osotolongo@brick03 extended]$ awk -F ',' '{print $2","$1}' facehbi_sel_long_0to6.csv | uniq | tail -n +2 | sort -t, > facehbi.internos

Now,

[osotolongo@brick03 extended]$ xnat_pullfs.pl -s aseg -p f5cehbi -o base_aseg.csv
[osotolongo@brick03 extended]$ xnat_pullfs.pl -s aparc -p f5cehbi -o base_aparc.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' base_aseg.csv > tmp_aseg.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' base_aparc.csv > tmp_aparc.csv
[osotolongo@brick03 extended]$ join -t, tmp_aseg.csv tmp_aparc.csv | awk -F',' '{$1=$50=$51=""; print $0}' | sed 's/^ //;s/ /,/g;s/,,,/,/g' >  base_full.csv
[osotolongo@brick03 extended]$ (head -n 1 base_full.csv && tail -n +2 base_full.csv | sort -t,) > base_full_sorted.csv
[osotolongo@brick03 extended]$ xnat_get_age.pl -x f5cehbi
[osotolongo@brick03 extended]$ sort -t, -n f5cehbi_age_data.csv > f5cehbi_age_data_sorted.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' f5cehbi_age_data_sorted.csv > tmp_age.csv
[osotolongo@brick03 extended]$ awk -F',' '{print $1"_"$2","$0}' base_full_sorted.csv > tmp_base.csv
[osotolongo@brick03 extended]$ (head -n 1 tmp_age.csv && tail -n +2 tmp_age.csv | sort -t,) > tmp_age_sorted.csv
[osotolongo@brick03 extended]$ (head -n 1 tmp_base.csv && tail -n +2 tmp_base.csv | sort -t,) > tmp_base_sorted.csv
[osotolongo@brick03 extended]$ join -t, tmp_age_sorted.csv tmp_base_sorted.csv | awk -F',' '{$1=$5=$6=""; print $0}' | sed 's/^ //;s/ /,/g;s/,,,/,/g'> input_data.csv
[osotolongo@brick03 extended]$ Rscript nplus.r
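
All these `join`s rely on both inputs being sorted on the composite key built in the first field; the pattern in miniature (toy data):

```shell
# join matches rows by the first field; both files must be sorted on it.
printf 'F001_0,F001,10\nF002_0,F002,20\n' > a_sorted.csv
printf 'F001_0,2020-01-27\nF002_0,2020-02-03\n' > b_sorted.csv
join -t, a_sorted.csv b_sorted.csv
# → F001_0,F001,10,2020-01-27
# → F002_0,F002,20,2020-02-03
```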

And we get the probability of neurodegeneration,

[osotolongo@brick03 extended]$ head classifier_output.csv
Subject_ID,Date,ND,Nprob
F001,2020-01-27,0,0.00457985332697753
F005,2020-02-03,0,0.0697559197037421
F006,2020-01-08,0,0.0250949466283251
F007,2019-12-10,0,1.8018003703755e-05
F008,2020-06-22,1,0.522450817937238
F009,2020-01-23,0,0.202625745362684
F010,2020-01-22,0,0.00476068613625532
F013,2020-06-13,1,0.550435980450324
F013,2020-06-13,1,0.550435980450324

The FBB at v5 is easier,

[osotolongo@brick03 extended]$ (head -n 1 facehbi_sel_long_0to6.csv | awk -F',' '{print "Subject_ID,"$1","$19}' && awk -F',' '{if ($3=="5" && $19) print $2","$1","$19}' facehbi_sel_long_0to6.csv | sort -t,) > v5_centiloid.csv
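
The `(head -n 1 … && … | sort)` subshell idiom used here keeps the header row in place while sorting only the body; in miniature:

```shell
# Sort a CSV body without disturbing its header line.
printf 'id,val\nb,2\na,1\n' > demo.csv
(head -n 1 demo.csv && tail -n +2 demo.csv | sort -t, -k1,1) > demo_sorted.csv
cat demo_sorted.csv
```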

I have too few CSF samples, so I cannot use them.

[osotolongo@brick03 extended]$ join -t, v5_centiloid.csv classifier_output.csv | uniq > got_an.csv
[osotolongo@brick03 extended]$ head got_an.csv 
Subject_ID,internal_id_fac,fbb_pet_centilod,Date,ND,Nprob
F001,20090806,5.995931423,2020-01-27,0,0.00457985332697753
F005,20141272,-1.8057342300000001,2020-02-03,0,0.0697559197037421
F007,20080716,108.1401346,2019-12-10,0,1.8018003703755e-05
F008,20131483,-4.0650127830000002,2020-06-22,1,0.522450817937238
F009,20141277,31.942835039999999,2020-01-23,0,0.202625745362684
F010,20141280,-13.788423529999999,2020-01-22,0,0.00476068613625532
F013,20070303,42.2375829,2020-06-13,1,0.550435980450324
F014,20100381,2.8114481339999999,2020-03-12,0,0.485388793964639
F015,20141087,-6.2935311299999999,2020-01-30,0,0.00990242072683091

And with a bit of logic, I pull out those who are amyloid positive and have neurodegeneration,

[osotolongo@brick03 extended]$ awk -F',' '{if ($3>13.5 && $5) print $0",1"; else print $0",0"}' got_an.csv | sed 's/Nprob,1/Nprob,AN/' > an.csv
[osotolongo@brick03 extended]$ head an.csv
Subject_ID,internal_id_fac,fbb_pet_centilod,Date,ND,Nprob,AN
F001,20090806,5.995931423,2020-01-27,0,0.00457985332697753,0
F005,20141272,-1.8057342300000001,2020-02-03,0,0.0697559197037421,0
F007,20080716,108.1401346,2019-12-10,0,1.8018003703755e-05,0
F008,20131483,-4.0650127830000002,2020-06-22,1,0.522450817937238,0
F009,20141277,31.942835039999999,2020-01-23,0,0.202625745362684,0
F010,20141280,-13.788423529999999,2020-01-22,0,0.00476068613625532,0
F013,20070303,42.2375829,2020-06-13,1,0.550435980450324,1
F014,20100381,2.8114481339999999,2020-03-12,0,0.485388793964639,0
F015,20141087,-6.2935311299999999,2020-01-30,0,0.00990242072683091,0
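
As a quick check of the result, the AN flag (column 7) can be tallied; a sketch with two invented rows so it is self-contained (in practice it would read an.csv):

```shell
# Count subjects per AN value, skipping the header line.
printf 'Subject_ID,internal_id_fac,fbb_pet_centilod,Date,ND,Nprob,AN\nF001,20090806,5.99,2020-01-27,0,0.004,0\nF013,20070303,42.23,2020-06-13,1,0.550,1\n' > an_sample.csv
awk -F',' 'NR>1 {n[$7]++} END {for (k in n) print "AN=" k ": " n[k]}' an_sample.csv
```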
neuroimagen/lang_flongv5.txt · Last modified: 2023/09/07 09:05 by osotolongo