Iran Research Information System

Tuesday, 27 Khordad 1404


Jostarha-ye Zabani, Volume 12, Issue 6, pp. 499-531


Persian title (translated)	A Look at "Text Mining" in Language Research: A Computational Approach to Text Analysis

Persian abstract (translated)	"Text mining" refers to the computational process of analyzing unstructured texts and extracting their hidden linguistic layers and themes. This method is especially important for content or thematic analysis in descriptive and interpretive research. In this process, plain texts are first structured, and then their latent concepts and patterns are summarized, classified, modeled, evaluated and interpreted. Given that this method counts as an interdisciplinary innovation, particularly in discourse studies, its use in the country's academic research deserves to be pursued more seriously. Nevertheless, despite the quantitative and qualitative breadth of international research in this area, the absence of such work among Persian and English articles published inside the country is strongly felt. Therefore, through a theoretical and practical examination of text mining methods and an evaluation of its main tools and methods in Persian and English, this article aims to provide a suitable basis for drawing on the capacities of this methodology in language studies.

Persian keywords (translated)	text mining, unstructured texts, content analysis, thematic analysis, natural language processing

English title	An Overview of Text Mining in Language Studies: The Computational Approach to Text Analytics

English abstract	'Text mining' refers to the computational analysis of unstructured texts to extract latent linguistic layers and themes. It is especially significant for content or thematic analysis in descriptive and interpretive studies. The process begins with structuring plain texts and proceeds with summarizing, classifying, modelling, evaluating and interpreting the concepts and patterns inherent in them. Given that this method counts as an interdisciplinary innovation, especially in discourse studies, it deserves to be pursued more intensively in academic studies. Despite the multitude of English-language studies in this area, there has been little interest to date in text mining among Iranian researchers, as evidenced by the very limited number of local Persian and English studies. Thus, by looking into the theory and practice of text mining and its major analytic tools and methods in Persian and English, this paper aims to prepare the ground for using this methodology in language studies. The last two decades have seen a major increase in the rate and accuracy of knowledge generation in language studies, owing to advances in interdisciplinary work across applied linguistics and computer science. At the heart of methodological innovations, especially in discourse studies, lies 'text mining', whose merits have only recently been appreciated by researchers. 'Text mining', also called 'text data mining' or 'text analysis', is the use of data mining algorithms and methods, such as natural language processing and linguistic as well as statistical techniques, to derive linguistic features, significant patterns and valuable themes from unstructured texts. It involves collecting unstructured data, pre-processing and cleansing them to detect and remove anomalies, and processing and controlling operations (Zhou et al., 2012).
These processes are further broken down into feature extraction, structural analysis, text summarization, text classification, text clustering, and association analysis. Text mining is a complex procedure for extracting valuable, significant patterns and trends from large amounts of textual data, used for such functions as product suggestion analysis, social media opinion mining, and sentiment or trend analysis (He, 2013). Dating back to Feldman and Dagan (1995), text mining is an innovative methodology with a relatively short history, often integrated with corpus analysis to computationally analyze a large body of unstructured texts as a potential source of insight. As a subfield of data mining in computer science and an interdisciplinary method, text mining borrows from corpus and computational linguistics, with the main purpose of extracting the meta-characters that represent textual features (Pons-Porrata et al., 2007). Zhou et al. (2017) believe that, despite its short history, text mining has evolved remarkably into a mainstream research methodology in many interdisciplinary areas in the wake of increasingly rapid developments in data mining. Hashimi et al. (2015) describe text mining as a semi-automated process of collecting, structuring and then analyzing textual data, with the following steps: (a) collecting unstructured data from a variety of sources such as textual documents, social media, web pages, mail and blogs, using specialized corpora for organization, (b) pre-processing and cleansing the data to remove anomalies and unveil latent valuable information using text mining tools, (c) converting the unstructured data into relevant structured formats, (d) discovering the underlying data patterns using word structures, sequences and frequency, and (e) extracting useful knowledge and storing it in a secure database for evaluation, later retrieval, trend analysis and possible decision-making.
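Steps (b) to (d) above can be illustrated with a minimal sketch. It assumes English input, a toy stop-word list and two made-up documents; the function names `preprocess`, `build_term_document_matrix` and `top_terms` are hypothetical and are not taken from the paper or from any text mining tool it reviews.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard resource.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}


def preprocess(text):
    """Step (b): lowercase, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_term_document_matrix(docs):
    """Step (c): turn raw texts into a structured table of term counts."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    # One row per document, one column per vocabulary term.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def top_terms(docs, n=3):
    """Step (d): find the most frequent terms across the whole collection."""
    total = Counter()
    for d in docs:
        total.update(preprocess(d))
    return total.most_common(n)


# Hypothetical sample documents, for illustration only.
docs = [
    "Text mining is the analysis of text.",
    "Mining the web: text, blogs and mail!",
]
vocabulary, matrix = build_term_document_matrix(docs)
print(vocabulary)
print(matrix)
print(top_terms(docs, n=2))
```

Real projects would normally replace the regular-expression tokenizer and the toy stop-word list with language-specific resources, which matters in particular for Persian text.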
Text mining also makes use of lexicometrics, which deals with frequency and co-occurrence analysis of vocabulary to derive structures from texts; sentiment analysis is an application of lexicometrics that looks for positive or negative emotions in documents and has been used in social media analysis to evaluate public opinion (Shangzhen & Lemen, 2016). Text mining is an area of inquiry that deserves to be pursued more intensively in future studies; this paper is thus an attempt to review its basic principles, procedures and main analytic tools and to raise researchers' awareness of its virtues.
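The co-occurrence counting and lexicon-based sentiment analysis mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `POSITIVE` and `NEGATIVE` word sets are a made-up toy lexicon, co-occurrence is counted at the document level, and the function names are hypothetical rather than drawn from the paper.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a toy sentiment lexicon, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence(docs):
    """Count how often each unordered word pair shares a document."""
    pairs = Counter()
    for d in docs:
        # Sorting gives each pair one canonical order, e.g. ("great", "phone").
        words = sorted(set(tokenize(d)))
        pairs.update(combinations(words, 2))
    return pairs


def sentiment_score(text):
    """Number of positive tokens minus number of negative tokens."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


# Hypothetical sample posts, for illustration only.
posts = ["great phone", "great camera bad phone"]
print(cooccurrence(posts)[("great", "phone")])
print(sentiment_score("I love this great phone"))
```

A simple count difference ignores negation and intensity; it is meant to show the idea of lexicon lookup, not a production sentiment method.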

English keywords	Text mining, unstructured texts, content analysis, thematic analysis, natural language processing

Authors	Hadi Masjedy, PhD in TEFL, Hakim Sabzevari University, Sabzevar, Iran; Seyyed Mohammad Reza Adel, Associate Professor of TEFL, Hakim Sabzevari University, Sabzevar, Iran; Seyed Mohammad Reza Amirian, Associate Professor of TEFL, Hakim Sabzevari University, Sabzevar, Iran; Gholamreza Zareian, Associate Professor of TEFL, Hakim Sabzevari University, Sabzevar, Iran

URL	http://lrr.modares.ac.ir/browse.php?a_code=A-10-49659-5&slc_lang=fa&sid=14
Article file	No file has been stored for this article
Article code (DOI)
Language of published article	fa
Subjects of published article
Type of published article	Analytical review articles
