Analysis of intra- and inter-observer variability in 4D liver ultrasound landmark labeling.
Authors: Daniel Wulff, Floris Ernst
Journal: Journal of Medical Imaging 12(5), 051807 (published 2025-09-01; Epub 2025-06-30)
DOI: https://doi.org/10.1117/1.JMI.12.5.051807
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207815/pdf/
Purpose: Four-dimensional (4D) ultrasound imaging is widely used in clinics for diagnostics and therapy guidance. Accurate target tracking in 4D ultrasound is crucial for autonomous therapy guidance systems, such as radiotherapy, where precise tumor localization ensures effective treatment. Supervised deep learning approaches rely on reliable ground truth, making accurate labels essential. We investigate the reliability of expert-labeled ground truth data by evaluating intra- and inter-observer variability in landmark labeling for 4D ultrasound imaging in the liver.
Approach: Eight 4D liver ultrasound sequences were labeled by eight expert observers, each labeling eight landmarks three times. Intra- and inter-observer variability was quantified, and observer survey and motion analysis were conducted to determine factors influencing labeling accuracy, such as ultrasound artifacts and motion amplitude.
Results: The mean intra-observer variability ranged from 1.58 mm ± 0.90 mm to 2.05 mm ± 1.22 mm depending on the observer. The inter-observer variability for the two observer groups was 2.68 mm ± 1.69 mm and 3.06 mm ± 1.74 mm. The observer survey and motion analysis revealed that ultrasound artifacts significantly affected labeling accuracy due to limited landmark visibility, whereas motion amplitude had no measurable effect. Our measured mean landmark motion was 11.56 mm ± 5.86 mm.
Conclusions: We highlight variability in expert-labeled ground truth data for 4D ultrasound imaging and identify ultrasound artifacts as a major source of labeling inaccuracies. These findings underscore the importance of addressing observer variability and artifact-related challenges to improve the reliability of ground truth data for evaluating target tracking algorithms in 4D ultrasound applications.
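The abstract reports intra- and inter-observer variability as a mean ± standard deviation in millimetres but does not state the underlying formulas. The sketch below shows one plausible way to compute such figures, and it is an assumption rather than the authors' method: intra-observer variability is taken as the Euclidean distance between repeated labels of the same landmark by the same observer, and inter-observer variability as the distance between different observers' repetition-averaged labels. The array layout `(observers, repetitions, landmarks, 3)` and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of rows in an (n, 3) array."""
    return np.array([np.linalg.norm(a - b) for a, b in combinations(points, 2)])


def intra_observer_variability(labels):
    """Per-observer (mean, std) of distances between repeated labels.

    labels: array of shape (observers, repetitions, landmarks, 3), in mm.
    Assumed definition, not taken from the paper.
    """
    stats = []
    for obs in labels:  # obs has shape (repetitions, landmarks, 3)
        dists = np.concatenate(
            [pairwise_distances(obs[:, k]) for k in range(obs.shape[1])]
        )
        stats.append((dists.mean(), dists.std()))
    return np.array(stats)


def inter_observer_variability(labels):
    """(mean, std) of distances between observers' repetition-averaged labels.

    Assumed definition, not taken from the paper.
    """
    mean_labels = labels.mean(axis=1)  # shape (observers, landmarks, 3)
    dists = np.concatenate(
        [pairwise_distances(mean_labels[:, k]) for k in range(mean_labels.shape[1])]
    )
    return dists.mean(), dists.std()


if __name__ == "__main__":
    # Synthetic stand-in data: 8 observers, 3 repetitions, 8 landmarks.
    rng = np.random.default_rng(0)
    truth = rng.uniform(0.0, 100.0, size=(1, 1, 8, 3))
    labels = truth + rng.normal(0.0, 1.0, size=(8, 3, 8, 3))
    print(intra_observer_variability(labels))
    print(inter_observer_variability(labels))
```

Averaging the repetitions before comparing observers keeps repeat-to-repeat scatter out of the inter-observer figure; comparing every individual label across observers would be an equally defensible choice and would yield larger values.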
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.