Iran Research Information System


Signal and Data Processing, Volume 17, Issue 4, pages 67-88


Persian title	Spoken Term Detection for Persian News of the Islamic Republic of Iran Broadcasting

Persian abstract	Spoken term detection, or keyword search, aims to detect and locate a set of keywords in a collection of spoken documents (such as lectures and meetings). In this work, a Persian spoken term detection system based on speech recognition is designed and implemented for information retrieval in the audio and video archives of the Islamic Republic of Iran Broadcasting (IRIB). Spoken documents are first transcribed to text by the recognizer, and the search is then run over that text. The Persian speech recognizer is trained on the Large FarsDat corpus. Using a subspace Gaussian mixture model (SGMM), it reaches a word error rate of 2.71% on that corpus and 28.23% on a Persian news corpus. Spoken term detection uses the proxy words method as its baseline, and a method based on a long short-term memory network with connectionist temporal classification (LSTM-CTC) is proposed to improve the detection of out-of-vocabulary (OOV) words. Measured by actual term-weighted value (ATWV) on Large FarsDat, the proxy words system scores 0.9206 for in-vocabulary keywords and 0.2 for OOV keywords; LSTM-CTC raises the OOV score by about fifty percent, to 0.3058. On the Persian news corpus, spoken term detection achieves an ATWV of 0.8008.
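The ATWV figures can be read against the usual definition of term-weighted value. The abstract does not spell the metric out, so the sketch below assumes the NIST STD 2006 formulation: a per-term miss probability plus a per-term false-alarm probability weighted by β = 999.9, averaged over terms and subtracted from 1. `TermCounts`, `term_weighted_value`, and the way counts are supplied are hypothetical names chosen for illustration, not the authors' scoring code.

```python
from dataclasses import dataclass


@dataclass
class TermCounts:
    n_true: int      # reference occurrences of the keyword
    n_correct: int   # detections that match a reference occurrence
    n_spurious: int  # detections with no matching reference (false alarms)


def term_weighted_value(counts, speech_seconds, beta=999.9):
    """Term-weighted value at the system's actual decision threshold.

    Assumes the NIST STD 2006 definition, where the number of
    non-target trials for a term is (speech duration in seconds - n_true).
    """
    p_miss_sum = 0.0
    p_fa_sum = 0.0
    n_terms = 0
    for c in counts:
        if c.n_true == 0:
            continue  # terms that never occur in the reference are skipped
        p_miss_sum += 1.0 - c.n_correct / c.n_true
        p_fa_sum += c.n_spurious / (speech_seconds - c.n_true)
        n_terms += 1
    if n_terms == 0:
        raise ValueError("no keyword has a reference occurrence")
    return 1.0 - (p_miss_sum + beta * p_fa_sum) / n_terms


# Relative gain implied by the two OOV scores reported in the abstract.
relative_gain = (0.3058 - 0.2) / 0.2
```

Under this definition a perfect system scores 1 and a system that returns nothing scores 0, while false alarms are penalized heavily because of the large β. The relative gain computed from the reported OOV scores is roughly 0.53, in line with the "about fifty percent" improvement stated above.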

Persian keywords	Persian spoken term detection, keyword search, speech recognition, IRIB, Kaldi

English title	Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

English abstract	The Islamic Republic of Iran Broadcasting (IRIB), one of the largest broadcasting organizations, produces thousands of hours of media content daily. Its archive is accordingly one of the richest in Iran and holds a huge amount of multimedia data. Monitoring this volume of data, and browsing and retrieving material from the archive, are key problems for the broadcaster. The aim of this research is to design a content retrieval engine for IRIB's media and productions using spoken term detection (STD), also called keyword spotting. The goal of an STD system is to search for a set of keywords in a set of speech documents. One approach to STD uses a speech recognition system: speech is recognized and converted into text, and the text is then searched for the keywords. The variety of speech documents and the limited vocabulary of the recognizer are two challenges of this approach. Large vocabulary continuous speech recognition (LVCSR) systems have a large but finite vocabulary and cannot recognize out-of-vocabulary (OOV) words. LVCSR-based STD systems therefore suffer from the OOV problem and cannot spot OOV keywords. Methods such as sub-word units (e.g., phonemes or syllables) and proxy words have been introduced to overcome the vocabulary limitation and handle OOV keywords. This paper proposes a Persian (Farsi) STD system based on speech recognition that uses the proxy words method to deal with OOV keywords. To improve the performance of this method, we use a Long Short-Term Memory network with Connectionist Temporal Classification (LSTM-CTC). In our experiments, we designed and implemented an LVCSR system for Farsi. The Large FarsDat dataset, which contains 80 hours of speech from 100 speakers, is used to train the speech recognition system, which is implemented with the Kaldi toolkit. Because the dataset is limited, Subspace Gaussian Mixture Models (SGMM) are used to train the acoustic model. The acoustic model is trained on context-dependent triphones, and the language model is a word trigram model. The word error rate (WER) of the speech recognition system is 2.71% on the FarsDat test set and 28.23% on Persian news collected from IRIB data. Term detection is based on weighted finite-state transducers (WFST). In this method, the speech recognizer first converts a speech document into a lattice (the lattice keeps the recognizer's alternative hypotheses with their probabilities rather than only the most probable one), and the lattice is then converted to a WFST. This WFST carries the word probabilities computed by the recognizer. Text retrieval is then used to index and search the WFST output. The proxy words method is used to deal with OOV keywords: each OOV word is represented by in-vocabulary words with a similar pronunciation. To improve the performance of the proxy words method, an LSTM-CTC network is proposed. The network is trained on the characters of isolated words (not continuous sentences); it recomputes the probabilities and re-verifies the proxy outputs. This improves the proxy words method, which suffers from false alarms. Since LSTM-CTC is an end-to-end network trained on characters, it does not need a phonetic lexicon and can support OOV words. Because it is trained on isolated words, it reduces the influence of the language model and relies mainly on acoustic evidence. The proposed STD system achieves an Actual Term Weighted Value (ATWV) of 0.9206 for in-vocabulary keywords; for OOV keywords, the ATWV is 0.2 with the proxy words method, and applying the proposed LSTM-CTC raises it to 0.3058. On the Persian news dataset, the proposed method achieves an ATWV of 0.8008.
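The idea behind proxy words, replacing an OOV keyword with in-vocabulary words that sound similar, can be illustrated with a toy example. This is not the paper's implementation: it swaps in a plain Levenshtein distance over phoneme sequences as the similarity measure, and the lexicon, the romanized words, and the phoneme symbols are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def proxy_words(oov_phones, lexicon, k=2):
    """Return the k in-vocabulary words whose pronunciations are closest
    to the OOV keyword's pronunciation (ties broken alphabetically)."""
    scored = sorted(
        (edit_distance(oov_phones, phones), word)
        for word, phones in lexicon.items()
    )
    return [word for _, word in scored[:k]]


# Hypothetical toy lexicon: word -> phoneme sequence.
LEXICON = {
    "kabir": ["k", "a", "b", "i", "r"],
    "kavir": ["k", "a", "v", "i", "r"],
    "safir": ["s", "a", "f", "i", "r"],
    "ketab": ["k", "e", "t", "a", "b"],
}

# An OOV keyword with an assumed pronunciation; its proxies are then
# searched in the index in place of the keyword itself.
proxies = proxy_words(["k", "a", "b", "i", "n"], LEXICON, k=2)
```

Because proxies only approximate the keyword acoustically, they can also match genuine occurrences of the in-vocabulary words, which is the false-alarm problem that the LSTM-CTC re-verification step described above is meant to reduce.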

English keywords	Persian Spoken Term Detection, IRIB, Persian News, Keyword Spotting, Speech Recognition, Kaldi

Authors	Hadi Veisi (Faculty of New Sciences and Technologies, University of Tehran); Sayed Akbar Ghoreishi (Department of Media Engineering, IRI Broadcast University); Azam Bastanfard (Karaj Islamic Azad University)

URL	http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-117-1&slc_lang=fa&sid=1
Article file	No file has been stored for this article
Article code (DOI)
Language of published article	fa
Topics of published article	Speech processing articles
Type of published article	Applied
