Finite impulse response design based on two-level transpose Vedic multiplier for medical image noise reduction
Joghee Prasad, Arun Sekar Rajasekaran, J. Ajayan, Kambatty Bojan Gurumoorthy
ETRI Journal 46(4): 619-632. Published 2024-02-25. DOI: 10.4218/etrij.2023-0335. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0335

Abstract: Medical signal processing requires noise- and interference-free inputs for precise segregation and classification. However, wireless sensing and transmission devices generate noise that tampers with the signal during feature extraction. To address this issue, this article introduces a finite impulse response (FIR) design based on a two-level transpose Vedic multiplier. The proposed architecture identifies the zero-noise impulse across varying sensing intervals. In the first level, transpose array operations with equalization are performed to achieve zero noise at any sensed interval. The transpose occurs between successive array representations of the input with continuity; if continuity is unavailable, the noise interruption is considerable and results in signal tampering. In the second level, the Vedic multiplier optimizes the transpose speed for zero-noise segregation, performed independently for the zero- and nonzero-noise intervals. Finally, the finite impulse response is estimated as the sum of the zero- and nonzero-noise inputs at any finite classification.

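The proposed design is a hardware architecture, but the operation it accelerates is the standard FIR convolution sum; a minimal software sketch for orientation (generic FIR filtering, not the two-level transpose Vedic design itself, with arbitrary smoothing taps):

```python
# Generic FIR filter: y[n] = sum_k h[k] * x[n - k].
# The 4-tap moving-average kernel below is an illustrative
# noise-smoothing choice, not a coefficient set from the paper.
def fir_filter(x, h):
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:          # skip samples before the signal starts
                acc += coeff * x[n - k]
        y.append(acc)
    return y

taps = [0.25, 0.25, 0.25, 0.25]
noisy = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0]
smoothed = fir_filter(noisy, taps)
```
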
Inceptionv3-LSTM-COV: A multi-label framework for identifying adverse reactions to COVID medicine from chemical conformers based on Inceptionv3 and long short-term memory
Pranab Das, Dilwar Hussain Mazumder
ETRI Journal 46(6): 1030-1046. Published 2024-02-25. DOI: 10.4218/etrij.2023-0288. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0288

Abstract: Due to the global COVID-19 pandemic, distinct medicines have been developed for treating coronavirus disease (COVID). However, predicting and identifying potential adverse reactions to these medicines remains a significant challenge in producing effective COVID medication. Accurate prediction of adverse reactions to COVID medications is crucial for ensuring patient safety and medicine success. Recent advancements in the computational models used in pharmaceutical production have opened up new possibilities for detecting such adverse reactions. Given the urgent need for effective COVID medication development, this research presents a multi-label Inceptionv3 and long short-term memory methodology for COVID (Inceptionv3-LSTM-COV) medicine development. The experimental evaluations were conducted on chemical conformer images of COVID medicines. The conformer features are represented in the RGB color channels and extracted using Inceptionv3, GlobalAveragePooling2D, and long short-term memory (LSTM) layers. The results demonstrate that the Inceptionv3-LSTM-COV model outperformed previously reported approaches, including the MLCNN-COV, Inceptionv3, ResNet50, MobileNetv2, VGG19, and DenseNet201 models. The proposed model reported the highest accuracy, 99.19%, in predicting adverse reactions to COVID medicine.

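A multi-label classifier of this kind typically ends in one sigmoid per adverse-reaction label, each thresholded independently (unlike softmax single-label classification). A minimal sketch of that decision step; the label names, logits, and threshold here are illustrative assumptions, not values from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits, label_names, threshold=0.5):
    """Multi-label decision: each label is judged independently,
    so any subset of reactions can be predicted at once."""
    return [name for z, name in zip(logits, label_names)
            if sigmoid(z) >= threshold]

labels = ["nausea", "headache", "fatigue"]        # illustrative labels
print(predict_labels([2.0, -1.5, 0.3], labels))   # → ['nausea', 'fatigue']
```
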
Suboptimal video coding for machines method based on selective activation of in-loop filter
Ayoung Kim, Eun-Vin An, Soon-heung Jung, Hyon-Gon Choo, Jeongil Seo, Kwang-deok Seo
ETRI Journal 46(3): 538-549. Published 2024-02-25. DOI: 10.4218/etrij.2023-0085. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0085

Abstract: A conventional codec aims to increase the compression efficiency for transmission and storage while maintaining video quality. However, as the number of platforms using machine vision rapidly increases, a codec that increases compression efficiency while maintaining the accuracy of machine vision tasks must be devised. Hence, the Moving Picture Experts Group created a standardization process for video coding for machines (VCM) to reduce bitrates while maintaining the accuracy of machine vision tasks. In particular, in-loop filters have been developed to improve both subjective quality and machine vision task accuracy. However, the high computational complexity of in-loop filters limits the development of a high-performance VCM architecture. We analyze the effect of an in-loop filter on VCM performance and propose a suboptimal VCM method based on the selective activation of in-loop filters. The proposed method reduces the computation time for video coding by approximately 5% when using the enhanced compression model and 2% when employing a Versatile Video Coding test model, while maintaining the machine vision accuracy and compression efficiency of the VCM architecture.

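The selective-activation idea can be pictured as a per-frame decision to skip the costly in-loop filters. The sketch below is a hypothetical stand-in: the gain estimate and threshold are invented for illustration and are not the paper's actual selection criterion:

```python
def select_filters(frames, gain_threshold=0.01):
    """Hypothetical selective activation: run the in-loop filters
    only on frames whose estimated quality gain exceeds a threshold.
    `frames` is a list of (frame_id, estimated_gain) pairs."""
    plan = {}
    for frame_id, estimated_gain in frames:
        plan[frame_id] = estimated_gain >= gain_threshold
    return plan

# Frame 1's expected gain is too small to justify the filter cost.
frames = [(0, 0.05), (1, 0.002), (2, 0.03)]
print(select_filters(frames))  # → {0: True, 1: False, 2: True}
```
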
Framework for evaluating code generation ability of large language models
Sangyeop Yeo, Yu-Seung Ma, Sang Cheol Kim, Hyungkook Jun, Taeho Kim
ETRI Journal 46(1): 106-117. Published 2024-02-14. DOI: 10.4218/etrij.2023-0357. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0357

Abstract: Large language models (LLMs) have revolutionized various applications in natural language processing and exhibit proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass-ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is fully automatic, handling the repetitive work involved in generating prompts, conducting inferences, and executing the generated code. A preliminary evaluation focusing on prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass-ratio@n metric.

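The abstract does not give the formula for pass-ratio@n, so the following is one plausible reading (an assumption, not the paper's definition): average, over n generated solutions, the fraction of test cases each solution passes — in contrast to pass@k-style scoring, which credits a solution only when every test case passes:

```python
def pass_ratio_at_n(test_results):
    """test_results: n lists, one per generated solution, each a list
    of booleans for the individual test cases of a problem.
    Returns the mean fraction of test cases passed across the n
    solutions -- one plausible definition of pass-ratio@n."""
    ratios = [sum(result) / len(result) for result in test_results]
    return sum(ratios) / len(ratios)

# Two generated solutions: one passes 3/4 tests, the other 4/4.
# An all-or-nothing metric would score this 0.5; the ratio-based
# metric preserves the partial credit.
results = [[True, True, True, False], [True, True, True, True]]
print(pass_ratio_at_n(results))  # → 0.875
```
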
Joint streaming model for backchannel prediction and automatic speech recognition
Yong-Seok Choi, Jeong-Uk Bang, Seung Hi Kim
ETRI Journal 46(1): 118-126. Published 2024-02-14. DOI: 10.4218/etrij.2023-0358. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0358

Abstract: In human conversations, listeners often utilize brief backchannels such as "uh-huh" or "yeah." Timely backchannels are crucial to understanding and increasing trust among conversational partners. In human-machine conversation systems, users can engage in natural conversations when a conversational agent generates backchannels like a human listener. We propose a method that simultaneously predicts backchannels and recognizes speech in real time. We use a streaming transformer and adopt multitask learning for concurrent backchannel prediction and speech recognition. The experimental results demonstrate the superior performance of our method compared with previous works while maintaining a similar single-task speech recognition performance. Owing to the extremely imbalanced training data distribution, the single-task backchannel prediction model fails to predict any of the backchannel categories, whereas the proposed multitask approach substantially enhances backchannel prediction performance. Notably, in the streaming prediction scenario, the performance of backchannel prediction improves by up to 18.7% compared with existing methods.

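Multitask learning of this kind is commonly trained on a weighted sum of the two task losses, so gradients from both tasks shape the shared streaming encoder. A minimal sketch; the equal weighting is an assumption for illustration, not a value from the paper:

```python
def multitask_loss(asr_loss, backchannel_loss, alpha=0.5):
    """Joint objective: trade off the speech-recognition loss against
    the backchannel-prediction loss. alpha is a tunable weight,
    set to 0.5 here purely for illustration."""
    return alpha * asr_loss + (1.0 - alpha) * backchannel_loss

# Example: the shared encoder is penalized for both tasks at once,
# which is what lets the rare backchannel labels benefit from the
# much richer ASR supervision.
print(multitask_loss(2.0, 0.4))  # → 1.2
```
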
Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets
Kyoungman Bae, Joon-Ho Lim
ETRI Journal 46(1): 59-70. Published 2024-02-14. DOI: 10.4218/etrij.2023-0321. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0321

Abstract: We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both the human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when the score is below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.

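The two modifications can be sketched as a single training step. The callback structure below is an illustrative scaffold; only the loss-averaging rule and the 0.0005 threshold come from the abstract:

```python
def train_step(student_update, teacher_update,
               human_loss, pseudo_loss, feedback_score,
               threshold=0.0005):
    """Sketch of the two modifications described above.
    (1) The student is updated with the average of the losses on
        human-labeled and pseudo-labeled data.
    (2) The teacher is updated only when the feedback score falls
        below the threshold, limiting the influence of noisy
        pseudo-labels."""
    student_update((human_loss + pseudo_loss) / 2.0)
    if feedback_score < threshold:
        teacher_update()

# Illustrative run: feedback 0.0001 < 0.0005, so the teacher steps too.
updates = []
train_step(updates.append, lambda: updates.append("teacher step"),
           human_loss=0.4, pseudo_loss=0.6, feedback_score=0.0001)
print(updates)  # → [0.5, 'teacher step']
```
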
Synthesis of electronically tunable multifunction biquad filter using voltage differencing differential input buffered amplifiers
Sirigul Bunrueangsak, Winai Jaikla, Amornchai Chaichana, Piya Supavarasuwat, Surapong Siripongdee, Peerawut Suwanjan
ETRI Journal 47(1): 144-157. Published 2024-02-14. DOI: 10.4218/etrij.2023-0391. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0391

Abstract: Biquad filters are commonly used in analog circuits for various purposes in signal processing and communication applications. We synthesize an analog active biquad filter with five types of voltage-mode filtering functions. The filter is synthesized using a parallel passive resistor-inductor-capacitor (RLC) network and a unity-gain voltage differencing amplifier. A voltage differencing differential input buffered amplifier (VD-DIBA) is the main active component, and the biquad filter has a three-input single-output (TISO) topology. By replacing the passive inductor and resistor with VD-DIBA-based inductance and resistance simulators with a subtractor, the TISO voltage-mode versatile filter is obtained from two VD-DIBAs, one resistor, and two capacitors connected to ground. The proposed filter provides five types of voltage-mode filtering functions: inverting bandpass and lowpass responses as well as noninverting band-stop, high-pass, and all-pass responses. The all-pass filter requires no additional active components. The three input voltage nodes have high impedance, and a low-impedance output voltage node facilitates cascade connections without additional voltage buffers. In addition, the natural frequency and quality factor can be electronically tuned, and the quality factor is controlled without disturbing the passband gain and natural frequency. The proposed filter is simulated in the Personal Simulation Program with Integrated Circuit Emphasis (PSPICE) and verified experimentally through laboratory tests employing VD-DIBAs implemented with commercially available components.

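For orientation, all five responses share the textbook second-order denominator s^2 + (w0/Q)s + w0^2, where w0 is the natural frequency and Q the quality factor. The lowpass magnitude response, for example, can be evaluated directly (standard biquad algebra, not the specific VD-DIBA realization):

```python
import math

def lowpass_biquad_mag(f, f0, q):
    """Magnitude of the standard second-order lowpass
    H(s) = w0^2 / (s^2 + (w0/q)*s + w0^2), evaluated at s = j*2*pi*f.
    f, f0 in Hz; q is the dimensionless quality factor."""
    w = 2.0 * math.pi * f
    w0 = 2.0 * math.pi * f0
    num = w0 ** 2
    den = math.sqrt((w0 ** 2 - w ** 2) ** 2 + (w0 * w / q) ** 2)
    return num / den

# At f = f0 the gain equals Q -- this is why Q can be tuned
# independently of the passband (DC) gain, which is always 1.
print(round(lowpass_biquad_mag(1000.0, 1000.0, 2.0), 6))  # → 2.0
print(round(lowpass_biquad_mag(0.0, 1000.0, 2.0), 6))     # → 1.0
```
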
Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems
Sanghun Jeon, Jieun Lee, Dohyeon Yeo, Yong-Ju Lee, SeungJun Kim
ETRI Journal 46(1): 22-34. Published 2024-02-14. DOI: 10.4218/etrij.2023-0266. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0266

Abstract: Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Services whose performance degrades under noise can only be deployed as limited systems that assure good performance in certain environments, which impairs the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model that is robust to various noise settings, mimicking the elements of human dialogue recognition. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network extracts features from the log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate of the AVSR model using the three-feature multi-fusion method is 1.711%, compared with the general rate of 3.939%. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.

Performance analysis of multiview video compression based on MIV and VVC multilayer
Jinho Lee, Gun Bang, Jungwon Kang, Mehrdad Teratani, Gauthier Lafruit, Haechul Choi
ETRI Journal 46(6): 1075-1089. Published 2024-02-01. DOI: 10.4218/etrij.2023-0309. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0309

Abstract: To represent immersive media providing a six-degrees-of-freedom experience, Moving Picture Experts Group (MPEG) immersive video (MIV) was developed to compress multiview videos. Meanwhile, the state-of-the-art versatile video coding (VVC) standard also supports multilayer (ML) functionality, enabling the coding of multiview videos. In this study, we designed experimental conditions to assess the performance of these two state-of-the-art standards in terms of objective and subjective quality. We observe that their performances are highly dependent on the conditions of the input source, such as the camera arrangement and the ratio of input views to all views. VVC-ML is efficient when the input source is captured by a planar camera arrangement and many input views are used. Conversely, MIV outperforms VVC-ML when the camera arrangement is non-planar and the ratio of input views to all views is low. In terms of the subjective quality of the synthesized view, VVC-ML causes severe rendering artifacts such as holes when occluded regions exist among the input views, whereas MIV reconstructs the occluded regions correctly but induces rendering artifacts with rectangular shapes at low bitrates.

AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation
Byung Ok Kang, Hyung-Bae Jeon, Yun Kyung Lee
ETRI Journal 46(1): 48-58. Published 2024-01-31. DOI: 10.4218/etrij.2023-0322. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0322

Abstract: This paper presents the development of language tutoring systems for non-native speakers by leveraging advanced end-to-end automatic speech recognition (ASR) and proficiency evaluation. Given the frequent errors in non-native speech, high-performance spontaneous speech recognition must be applied. Our systems accurately evaluate pronunciation and speaking fluency and provide feedback on errors by relying on precise transcriptions. End-to-end ASR is implemented and enhanced by using diverse non-native speaker speech data for model training. For performance enhancement, we combine semisupervised and transfer learning techniques using labeled and unlabeled speech data. Automatic proficiency evaluation is performed by a model trained to maximize the statistical correlation between the fluency score manually determined by a human expert and the calculated fluency score. We developed an English tutoring system for Korean elementary students called EBS AI PengTalk and a Korean tutoring system for foreigners called KSI Korean AI Tutor. Both systems were deployed by South Korean government agencies.

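The proficiency evaluator is trained to maximize the statistical correlation between human and machine fluency scores. The usual measure for this, Pearson's r, is straightforward to compute; a generic sketch with made-up scores (the paper does not specify its correlation measure):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between human fluency scores xs and
    model-predicted fluency scores ys (both same length, n >= 2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative scores: the model tracks the expert closely, so r
# is near 1, which is what the training objective pushes toward.
human = [3.0, 4.5, 2.0, 5.0]
model = [2.8, 4.7, 2.1, 4.9]
print(round(pearson_r(human, model), 3))
```
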