|
International Journal of Engineering, Volume 34, Issue 6, Pages 1413-1418
|
|
|
Persian title |
|
|
Persian abstract |
|
|
Persian keywords |
|
|
English title |
A Signal Processing Method for Text Language Identification |
|
English abstract |
Language identification is a critical step prior to any natural language processing. In this paper, a signal processing method for language identification is proposed. The sequence of characters in a word and the order of words in a stream identify the language. The sequence of characters in a stream provides a signature for recognizing the language without understanding its meaning. This signature can be extracted with signal processing techniques by converting texts into time series. Although several research and commercial software tools have been developed to identify text language, they need a standard dictionary for each language. We propose a dictionary-independent method consisting of three main steps: I) preprocessing, II) clustering, and III) classification. First, the texts are converted to time series using UTF-8 codes. Second, to group similar languages, the obtained series are clustered. Third, each cluster is decomposed into 32 sub-bands using a wavelet packet, and 32 features are extracted from each sub-band. A multilayer perceptron neural network is then used to classify the extracted features. The proposed method was tested on our dataset of 31,000 texts from 31 different languages and achieved 72.20% accuracy for language identification. |
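The preprocessing and decomposition steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the wavelet family, the exact UTF-8 mapping, or the 32 features per sub-band, so the following are assumptions made only for illustration: UTF-8 byte values as the series, a Haar filter, a 5-level full packet tree (2^5 = 32 sub-bands), and a single energy value per sub-band. The function names `text_to_series`, `haar_step`, `wavelet_packet`, and `band_energies` are hypothetical.

```python
import math


def text_to_series(text):
    """Map a text to a numeric series of its UTF-8 byte values.

    Assumption: the abstract only says "UTF-8 codes"; byte values are one
    possible reading.
    """
    return [float(b) for b in text.encode("utf-8")]


def haar_step(signal):
    """One Haar analysis step on an even-length signal.

    Returns the (approximation, detail) halves, each half the input length.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet(signal, levels=5):
    """Full wavelet packet tree: every band is split at every level.

    Returns 2**levels sub-bands in natural tree order (not frequency order).
    The signal is zero-padded to a multiple of 2**levels so each split is even.
    """
    block = 2 ** levels
    padded = list(signal) + [0.0] * (-len(signal) % block)
    bands = [padded]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            next_bands.extend(haar_step(band))
        bands = next_bands
    return bands


def band_energies(bands):
    """Illustrative feature: the energy (sum of squares) of each sub-band."""
    return [sum(x * x for x in band) for band in bands]


series = text_to_series("Language identification is a critical step.")
features = band_energies(wavelet_packet(series))
print(len(features))  # 32
```

Under these assumptions, the resulting 32-element feature vector would be the input to a classifier such as a multilayer perceptron. The clustering step and the classifier itself are omitted here.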
|
English keywords |
Language Identification, Signal Processing, Wavelet Packet Transform, Artificial Neural Network |
|
Authors |
H. Hassanpour | Image Processing & Data Mining Lab, Shahrood University of Technology, Shahrood, Iran
M. M. AlyanNezhadi | Department of Mathematics, University of Science and Technology of Mazandaran, Behshahr, Iran
M. Mohammadi | Department of Information Technology, College of Engineering and Computer Science, Lebanese French University, KR-Iraq
|
|
URL |
https://www.ije.ir/article_130899_c957d20e789ee256a32ad62e043d8652.pdf |
Article file |
No file has been stored for this article |
Article code (DOI) |
|
Language of the published article |
en |
Subjects of the published article |
|
Type of the published article |
|
|
|