Table of Contents
Analisis SBM de K&D en FACEHBI
He de hacer un analisis SBM con la variable kissing & dancing en los sujetos de FACEHBI. una complejidad añadida es que algunos sujetos son de la V0 y otros de la V2, por lo que me he de montar un proyecto nuevo para el analisis.
Sacando los datos
Lo primero que tengo que hacer es dividir el archivo de datos entre los que corresponden a V0 y V2.
[osotolongo@brick03 kissface]$ head -n 96 kissing_freesurfer.csv > kf0.csv [osotolongo@brick03 kissface]$ tail -n 31 kissing_freesurfer.csv > kf2.temp [osotolongo@brick03 kissface]$ head -n 1 kissing_freesurfer.csv > kheader.txt6 [osotolongo@brick03 kissface]$ cat kheader.txt6 kf2.temp > kf2.csv [osotolongo@brick03 kissface]$ rm kf2.temp kheader.txt6
Ahora saco los experimentos de XNAT
[osotolongo@brick03 kissface]$ for x in $(awk -F"," '{print $1}' kf0.csv | grep -v facehbi); do e=$(xnatapic list_experiments --project_id facehbi --subject_id ${x} --modality MRI); echo "${x},${e}"; done > kf0_experiments.csv [osotolongo@brick03 kissface]$ for x in $(awk -F"," '{print $1}' kf2.csv | grep -v facehbi); do e=$(xnatapic list_experiments --project_id f2cehbi --subject_id ${x} --modality MRI); echo "${x},${e}"; done > kf2_experiments.csv [osotolongo@brick03 kissface]$ head kf0_experiments.csv F113,XNAT4_E00269 F115,XNAT4_E00270 F116,XNAT4_E00271 F117,XNAT4_E00272 F118,XNAT4_E00273 F119,XNAT4_E00274 F120,XNAT4_E00275 F121,XNAT4_E00276 F122,XNAT4_E00277 F123,XNAT4_E00278 [osotolongo@brick03 kissface]$ head kf2_experiments.csv F216,XNAT5_E00259 F217,XNAT5_E00260 F218,XNAT5_E00261 F219,XNAT5_E00262 F220,XNAT5_E00263 F221,XNAT5_E00264 F222,XNAT5_E00265 F223,XNAT5_E00266 F224,XNAT5_E00267 F225,XNAT5_E00268 [osotolongo@brick03 kissface]$ wc -l kf0_experiments.csv kf2_experiments.csv 95 kf0_experiments.csv 30 kf2_experiments.csv 125 total
y ahora me bajo los resultados,
[osotolongo@brick03 kissface]$ mkdir tmp [osotolongo@brick03 kissface]$ for x in $(cat kf0_experiments.csv); do sbj=$(echo $x | awk -F"," '{print $1}'); xxp=$(echo $x | awk -F"," '{print $2}'); mkdir fsresults/${sbj}; xnatapic get_fsresults --experiment_id ${xxp} --all-tgz tmp/;tar xzvf tmp/*.tar.gz -C fsresults/${sbj}/ --exclude='fsaverage' --strip-components=1; rm tmp/*.tar.gz; done [osotolongo@brick03 kissface]$ for x in $(cat kf2_experiments.csv); do sbj=$(echo $x | awk -F"," '{print $1}'); xxp=$(echo $x | awk -F"," '{print $2}'); mkdir fsresults/${sbj}; xnatapic get_fsresults --experiment_id ${xxp} --all-tgz tmp/;tar xzvf tmp/*.tar.gz -C fsresults/${sbj}/ --exclude='fsaverage' --strip-components=1; rm tmp/*.tar.gz; done
Antes de seguir, comprobamos que este todo,
[osotolongo@brick03 kissface]$ for x in $(cat kf2_experiments.csv); do sbj=$(echo $x | awk -F"," '{print $1}'); xxp=$(echo $x | awk -F"," '{print $2}'); ls -d fsresults/${sbj}/mri; done | wc -l ls: cannot access fsresults/F216/mri: No such file or directory 29
y no deberia, pero alguno falta, estos los enviamos a procesar a ver que pasa,
[osotolongo@brick03 kissface]$ xnatapic run_pipeline --project_id f2cehbi --pipeline RunFreesurfer --experiment_id XNAT5_E00259 [osotolongo@brick03 kissface]$ squeue | grep xnat 163268 fast RunFrees xnat R 2:02 1 brick05
Montando el proyecto
[osotolongo@brick03 kissface]$ ls fsresults/ | cat -n | awk '{printf("%04d;%s\n",$1,$2)}' > kissface_mri.csv [osotolongo@brick03 kissface]$ for x in `cat kissface_mri.csv`; do trg=$(echo ${x} | awk -F";" '{print $2}'); dst=$(echo ${x} | awk -F";" '{print $1}'); mkdir /nas/data/subjects/kissface_${dst}; cp -r fsresults/${trg}/* /nas/data/subjects/kissface_${dst}/; done
Se supone que ahora ya lo tengo todo copiado en su sitio.
Armando la DB
[osotolongo@brick03 kissface]$ sed 's/;/,/' kissface_mri.csv | sed '1iSubject,PSubject' > kissface_codes.csv [osotolongo@brick03 kissface]$ sed 's/facehbi_id_fac/PSubject/;s/,$//' kissing_freesurfer.csv > kissing_clean.csv [osotolongo@brick03 kissface]$ for x in fsresults/*; do icv=$(grep "EstimatedTotalIntraCranialVol" ${x}/stats/aseg.stats | awk -F"," '{print $4}' | sed 's/^ //'); echo "${x},${icv}" | sed 's/fsresults\///'; done | sed '1iPSubject,EstimatedTotalIntraCranialVol' > kissface_raw_icv.csv [osotolongo@brick03 kissface]$ head kissface_raw_icv.csv PSubject,EstimatedTotalIntraCranialVol F113,1327300.818871 F115,1825737.153699 F116,1591088.289640 F117,1555151.958621 F118,1392708.812305 F119,1815452.525813 F120,1672391.014925 F121,1415862.242416 F122,1333443.454197
Esto lo voy a convertir a z-scores,
- convert_icv.r
setwd("/nas/data/kissface") read.csv("kissface_raw_icv.csv") -> x x$zICV <- (x$EstimatedTotalIntraCranialVol - mean(x$EstimatedTotalIntraCranialVol))/sd(x$EstimatedTotalIntraCranialVol) write.csv(x, file="kissface_zicv.temp", row.names=FALSE)
asi,
[osotolongo@brick03 kissface]$ Rscript convert_icv.r [osotolongo@brick03 kissface]$ head kissface_zicv.temp "PSubject","EstimatedTotalIntraCranialVol","zICV" "F113",1327300.818871,-1.07341914119678 "F115",1825737.153699,2.33416070384843 "F116",1591088.28964,0.729974403694122 "F117",1555151.958621,0.48429424611196 "F118",1392708.812305,-0.626254790292351 "F119",1815452.525813,2.26384943589037 "F120",1672391.014925,1.28580372164305 "F121",1415862.242416,-0.467965443635606 "F122",1333443.454197,-1.0314247702616 [osotolongo@brick03 kissface]$ sed 's/"//g;' kissface_zicv.temp | awk -F"," '{print $1","$3}' > kissface_zicv.csv [osotolongo@brick03 kissface]$ head kissface_zicv.csv PSubject,zICV F113,-1.07341914119678 F115,2.33416070384843 F116,0.729974403694122 F117,0.48429424611196 F118,-0.626254790292351 F119,2.26384943589037 F120,1.28580372164305 F121,-0.467965443635606 F122,-1.0314247702616
Ahora todo junto,
[osotolongo@brick03 kissface]$ join -t, -1 2 -2 1 kissface_codes.csv kissing_clean.csv > kissing_data_0.csv [osotolongo@brick03 kissface]$ join -t, kissing_data_0.csv kissface_zicv.csv > kissface_data.csv [osotolongo@brick03 kissface]$ head kissface_data.csv PSubject,Subject,age,sex,years_education_fac,kissing_dancing_words_fac,zICV F113,0001,79,1,15,50,-1.07341914119678 F115,0002,66,1,19,50,2.33416070384843 F116,0003,55,0,8,50,0.729974403694122 F117,0004,73,0,8,44,0.48429424611196 F118,0005,66,0,12,49,-0.626254790292351 F119,0006,52,1,12,48,2.26384943589037 F120,0007,70,1,23,51,1.28580372164305 F121,0008,73,0,15,50,-0.467965443635606 F122,0009,59,0,10,47,-1.0314247702616
formateando los fsgd
Primero sin ICV,
[osotolongo@brick03 kissface]$ awk -F"," '{print "kissface_"$2","$3","$4","$5","$6}' kissface_data.csv | sed 's/kissface_Subject/Variables/' | sed 's/kissface_\([^,]*\),/Input kissface_\1 Main /; s/,/ /g' > kissface_body_noicv.csv [osotolongo@brick03 kissface]$ head kissface_body_noicv.csv Variables age sex years_education_fac kissing_dancing_words_fac Input kissface_0001 Main 79 1 15 50 Input kissface_0002 Main 66 1 19 50 Input kissface_0003 Main 55 0 8 50 Input kissface_0004 Main 73 0 8 44 Input kissface_0005 Main 66 0 12 49 Input kissface_0006 Main 52 1 12 48 Input kissface_0007 Main 70 1 23 51 Input kissface_0008 Main 73 0 15 50 Input kissface_0009 Main 59 0 10 47 [osotolongo@brick03 kissface]$ cat headers.txt kissface_body_noicv.csv > kissface_noicv.fsgd [osotolongo@brick03 kissface]$ head kissface_noicv.fsgd GroupDescriptorFile 1 Title FACEHBI_KD Class Main Variables age sex years_education_fac kissing_dancing_words_fac Input kissface_0001 Main 79 1 15 50 Input kissface_0002 Main 66 1 19 50 Input kissface_0003 Main 55 0 8 50 Input kissface_0004 Main 73 0 8 44 Input kissface_0005 Main 66 0 12 49 Input kissface_0006 Main 52 1 12 48
y ahora con ICV,
[osotolongo@brick03 kissface]$ awk -F"," '{print "kissface_"$2","$3","$4","$5","$6","$7}' kissface_data.csv | sed 's/kissface_Subject/Variables/' | sed 's/kissface_\([^,]*\),/Input kissface_\1 Main /; s/,/ /g' > kissface_body_icv.csv [osotolongo@brick03 kissface]$ head kissface_body_icv.csv Variables age sex years_education_fac kissing_dancing_words_fac zICV Input kissface_0001 Main 79 1 15 50 -1.07341914119678 Input kissface_0002 Main 66 1 19 50 2.33416070384843 Input kissface_0003 Main 55 0 8 50 0.729974403694122 Input kissface_0004 Main 73 0 8 44 0.48429424611196 Input kissface_0005 Main 66 0 12 49 -0.626254790292351 Input kissface_0006 Main 52 1 12 48 2.26384943589037 Input kissface_0007 Main 70 1 23 51 1.28580372164305 Input kissface_0008 Main 73 0 15 50 -0.467965443635606 Input kissface_0009 Main 59 0 10 47 -1.0314247702616 [osotolongo@brick03 kissface]$ cat headers.txt kissface_body_icv.csv > kissface_icv.fsgd [osotolongo@brick03 kissface]$ head kissface_icv.fsgd GroupDescriptorFile 1 Title FACEHBI_KD Class Main Variables age sex years_education_fac kissing_dancing_words_fac zICV Input kissface_0001 Main 79 1 15 50 -1.07341914119678 Input kissface_0002 Main 66 1 19 50 2.33416070384843 Input kissface_0003 Main 55 0 8 50 0.729974403694122 Input kissface_0004 Main 73 0 8 44 0.48429424611196 Input kissface_0005 Main 66 0 12 49 -0.626254790292351 Input kissface_0006 Main 52 1 12 48 2.26384943589037
Ahora un momento, los mtx,
[osotolongo@brick03 kissface]$ echo "0 0 0 0 1" > noicv.mtx [osotolongo@brick03 kissface]$ echo "0 0 0 0 1 0" > icv.mtx
qcache
OK. antes de seguir, hay que hacer el qcached de los sujetos. estos se hace con
recon-all -s subject -qcache
y hay que lanzarlo para todos los sujetos del proyecto. Generalmente esto esta hecho ya pero nos hemos inventado el proyecto y es la primera vez que hacemos el analisis.
Asi que hay que escribir algo como,
[osotolongo@brick03 kissface]$ mkdir slurm; for x in $(awk -F";" '{print $1}' kissface_mri.csv); do echo '#!/bin/bash' > slurm/rec_${x}.sh; echo "#SBATCH -c 2" >> slurm/rec_${x}.sh; echo "#SBATCH -o slurm/slurm_${x}_%j.out" >> slurm/rec_${x}.sh; echo "recon-all -s kissface_${x} -qcache" >> slurm/rec_${x}.sh; chmod +x slurm/rec_${x}.sh; sbatch slurm/rec_${x}.sh; done
Esto lanza unos cuantos jobs al worload manager pero no demoran demasiado asi que da tiempo a ir escribiendo los scripts de FSGA mientras tanto.
Mejor opción aún es utilizar el script pqcache.pl del pipeline que lanza este commando dentro de un proyecto.
scripting fsga
y ahora los scripts para la ejecucion.
Para cortical thickness,
- fsga_ct.sh
#!/bin/bash mris_preproc --fsgd kissface_noicv.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.kissface.thickness.10.mgh mris_preproc --fsgd kissface_noicv.fsgd --cache-in thickness.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.kissface.thickness.10.mgh mri_glmfit --y lh.kissface.thickness.10.mgh --fsgd kissface_noicv.fsgd --C noicv.mtx --surf fsaverage lh --glmdir lh.kissface.glmdir_ct mri_glmfit --y rh.kissface.thickness.10.mgh --fsgd kissface_noicv.fsgd --C noicv.mtx --surf fsaverage rh --glmdir rh.kissface.glmdir_ct mri_glmfit-sim --glmdir lh.kissface.glmdir_ct --cache 3 neg --cwp 0.05 --2spaces mri_glmfit-sim --glmdir lh.kissface.glmdir_ct --cache 3 pos --cwp 0.05 --2spaces mri_glmfit-sim --glmdir rh.kissface.glmdir_ct --cache 3 neg --cwp 0.05 --2spaces mri_glmfit-sim --glmdir rh.kissface.glmdir_ct --cache 3 pos --cwp 0.05 --2spaces
hacemos,
[osotolongo@brick03 kissface]$ ./fsga_ct.sh
y cuando termina,
[osotolongo@brick03 kissface]$ for x in *h.kissface.glmdir_ct/noicv/*.summary; do echo ${x}; grep -v "^#" ${x}; done lh.kissface.glmdir_ct/noicv/cache.th30.neg.sig.cluster.summary lh.kissface.glmdir_ct/noicv/cache.th30.pos.sig.cluster.summary rh.kissface.glmdir_ct/noicv/cache.th30.neg.sig.cluster.summary rh.kissface.glmdir_ct/noicv/cache.th30.pos.sig.cluster.summary
que dice que para estos parametros no hay clusters significativos.
Ahora el cortical area,
- fsga_sa.sh
#!/bin/bash mris_preproc --fsgd kissface_icv.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.kissface.area.10.mgh mris_preproc --fsgd kissface_icv.fsgd --cache-in area.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.kissface.area.10.mgh mri_glmfit --y lh.kissface.area.10.mgh --fsgd kissface_icv.fsgd --C icv.mtx --surf fsaverage lh --glmdir lh.kissface.glmdir_sa mri_glmfit --y rh.kissface.area.10.mgh --fsgd kissface_icv.fsgd --C icv.mtx --surf fsaverage rh --glmdir rh.kissface.glmdir_sa mri_glmfit-sim --glmdir lh.kissface.glmdir_sa --cache 3 neg --cwp 0.05 --2spaces mri_glmfit-sim --glmdir lh.kissface.glmdir_sa --cache 3 pos --cwp 0.05 --2spaces mri_glmfit-sim --glmdir rh.kissface.glmdir_sa --cache 3 neg --cwp 0.05 --2spaces mri_glmfit-sim --glmdir rh.kissface.glmdir_sa --cache 3 pos --cwp 0.05 --2spaces
lanzamos,
[osotolongo@brick03 kissface]$ ./fsga_sa.sh
y tengo, (fucking maths)
ERROR: matrix is ill-conditioned or badly scaled, condno = 10175.5
Lo que me dice que o escalo todo a z-scores o pongo los ICV en $mm^3$
que jodio, esto en R es algo como,
> read.csv("kissface_data.csv")-> x > x$zAge <- (x$age - mean(x$age))/sd(x$age) > x$zEducation <- (x$years_education_fac - mean(x$years_education_fac))/sd(x$years_education_fac) > x$zKD <- (x$kissing_dancing_words_fac - mean(x$kissing_dancing_words_fac))/sd(x$kissing_dancing_words_fac) > write.csv(x, file = "kissface_xetas_tmp.csv", row.names = FALSE)
y luego,
[osotolongo@brick03 kissface]$ head kissface_xetas_tmp.csv "PSubject","Subject","age","sex","years_education_fac","kissing_dancing_words_fac","zICV","zAge","zEducation","zKD" "F113",1,79,1,15,50,-1.07341914119678,1.87955542314161,0.0893677081683601,-0.0152547055628048 "F115",2,66,1,19,50,2.33416070384843,0.06907672568748,1.00128309764142,-0.0152547055628048 "F116",3,55,0,8,50,0.729974403694122,-1.46286678754294,-1.5064842234095,-0.0152547055628048 "F117",4,73,0,8,44,0.48429424611196,1.04394987047047,-1.5064842234095,-3.82893109626387 "F118",5,66,0,12,49,-0.626254790292351,0.06907672568748,-0.594568833936437,-0.650867437346315 "F119",6,52,1,12,48,2.26384943589037,-1.88066956387851,-0.594568833936437,-1.28648016912983 "F120",7,70,1,23,51,1.28580372164305,0.626147094134904,1.91319848711449,0.620358026220705 "F121",8,73,0,15,50,-0.467965443635606,1.04394987047047,0.0893677081683601,-0.0152547055628048 "F122",9,59,0,10,47,-1.0314247702616,-0.905796419095513,-1.05052652867297,-1.92209290091334 [osotolongo@brick03 kissface]$ sed 's/"//g;' kissface_xetas_tmp.csv | awk -F"," '{printf("%s,%04d,%s,%s,%s,%s,%s\n",$1,$2,$8,$4,$9,$10,$7)}' | sed 's/,0000,/,Subject,/'> zkissface_data.csv [osotolongo@brick03 kissface]$ head zkissface_data.csv PSubject,Subject,zAge,sex,zEducation,zKD,zICV F113,0001,1.87955542314161,1,0.0893677081683601,-0.0152547055628048,-1.07341914119678 F115,0002,0.06907672568748,1,1.00128309764142,-0.0152547055628048,2.33416070384843 F116,0003,-1.46286678754294,0,-1.5064842234095,-0.0152547055628048,0.729974403694122 F117,0004,1.04394987047047,0,-1.5064842234095,-3.82893109626387,0.48429424611196 F118,0005,0.06907672568748,0,-0.594568833936437,-0.650867437346315,-0.626254790292351 F119,0006,-1.88066956387851,1,-0.594568833936437,-1.28648016912983,2.26384943589037 F120,0007,0.626147094134904,1,1.91319848711449,0.620358026220705,1.28580372164305 F121,0008,1.04394987047047,0,0.0893677081683601,-0.0152547055628048,-0.467965443635606 F122,0009,-0.905796419095513,0,-1.05052652867297,-1.92209290091334,-1.0314247702616 [osotolongo@brick03 kissface]$ awk -F"," '{print "kissface_"$2","$3","$4","$5","$6","$7}' zkissface_data.csv | sed 's/kissface_Subject/Variables/' | sed 's/kissface_\([^,]*\),/Input kissface_\1 Main /; s/,/ /g' > zkissface_body_icv.csv [osotolongo@brick03 kissface]$ cat headers.txt zkissface_body_icv.csv > kissface_icv.fsgd [osotolongo@brick03 kissface]$ head kissface_icv.fsgd GroupDescriptorFile 1 Title FACEHBI_KD Class Main Variables zAge sex zEducation zKD zICV Input kissface_0001 Main 1.87955542314161 1 0.0893677081683601 -0.0152547055628048 -1.07341914119678 Input kissface_0002 Main 0.06907672568748 1 1.00128309764142 -0.0152547055628048 2.33416070384843 Input kissface_0003 Main -1.46286678754294 0 -1.5064842234095 -0.0152547055628048 0.729974403694122 Input kissface_0004 Main 1.04394987047047 0 -1.5064842234095 -3.82893109626387 0.48429424611196 Input kissface_0005 Main 0.06907672568748 0 -0.594568833936437 -0.650867437346315 -0.626254790292351 Input kissface_0006 Main -1.88066956387851 1 -0.594568833936437 -1.28648016912983 2.2638494358903
and rerun!!!!!
but same shit,
[osotolongo@brick03 kissface]$ for x in *h.kissface.glmdir_sa/icv/*.summary; do echo ${x}; grep -v "^#" ${x}; done lh.kissface.glmdir_sa/icv/cache.th30.neg.sig.cluster.summary lh.kissface.glmdir_sa/icv/cache.th30.pos.sig.cluster.summary rh.kissface.glmdir_sa/icv/cache.th30.neg.sig.cluster.summary rh.kissface.glmdir_sa/icv/cache.th30.pos.sig.cluster.summary
a ver que pasa con el volumen, (Nota: Como cambie la DB para surface area al volumen no hay que hacerle nada porque usa la misma)
- fsga_v.sh
#!/bin/bash mris_preproc --fsgd kissface_icv.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi lh --out lh.kissface.volume.10.mgh mris_preproc --fsgd kissface_icv.fsgd --cache-in volume.fwhm10.fsaverage --target fsaverage --hemi rh --out rh.kissface.volume.10.mgh mri_glmfit --y lh.kissface.volume.10.mgh --fsgd kissface_icv.fsgd --C icv.mtx --surf fsaverage lh --glmdir lh.kissface.glmdir_v mri_glmfit --y rh.kissface.volume.10.mgh --fsgd kissface_icv.fsgd --C icv.mtx --surf fsaverage rh --glmdir rh.kissface.glmdir_v mri_glmfit-sim --glmdir lh.kissface.glmdir_v --cache 3 neg --cwp 0.05 --2spaces mri_glmfit-sim --glmdir lh.kissface.glmdir_v --cache 3 pos --cwp 0.05 --2spaces mri_glmfit-sim --glmdir rh.kissface.glmdir_v --cache 3 neg --cwp 0.05 --2spaces mri_glmfit-sim --glmdir rh.kissface.glmdir_v --cache 3 pos --cwp 0.05 --2spaces
[osotolongo@brick03 kissface]$ ./fsga_v.sh [osotolongo@brick03 kissface]$ for x in *h.kissface.glmdir_v/icv/*.summary; do echo ${x}; grep -v "^#" ${x}; done lh.kissface.glmdir_v/icv/cache.th30.neg.sig.cluster.summary lh.kissface.glmdir_v/icv/cache.th30.pos.sig.cluster.summary rh.kissface.glmdir_v/icv/cache.th30.neg.sig.cluster.summary rh.kissface.glmdir_v/icv/cache.th30.pos.sig.cluster.summary