Mathematical Linguistics Vol. 23

Vol. 23 No. 1 (June, 2001)	Vol. 23 No. 2 (Sep., 2001)	Vol. 23 No. 3 (Dec., 2001)	Vol. 23 No. 4 (Mar., 2002)
Vol. 23 No. 5 (June, 2002)	Vol. 23 No. 6 (Sep., 2002)	Vol. 23 No. 7 (Dec., 2002)	Vol. 23 No. 8 (Mar., 2003)

Vol. 23 No. 1 (June, 2001)

Classification:
Paper
Authors:
UCHIYAMA Kiyoko (Graduate School of Media and Governance, Keio University), TAKEUCHI Koichi, YOSHIOKA Masaharu, KAGEURA Kyo and KOYAMA Teruo (National Institute of Informatics)
Title:
A Study of Grammatical Categories Based on Grammatical Features for Analysis of Compound Nouns in Specialized Field
Pages:
1-24
Descriptors:
Compound Nouns; Grammatical Categories; Grammatical Features
Abstract:
Most studies of compound noun analysis have relied on semantic information and co-occurrence information, and little attention has been given to grammatical features. In addition, the existing grammatical categories applied to the constituent elements of compounds are too coarse for analyzing compounds. Against this background, our aim here is to extract grammatical features and to establish grammatical categories that might be more suitable for analyzing compounds, and also to examine and evaluate to what extent they are valid and useful. To evaluate the usefulness of the grammatical categories, we assign them to the constituent elements of compounds and examine the correlation between grammatical categories and intra-compound relations. In the process, we also try to clarify the limits of the grammatical approach, that is, the point where semantic information should be taken into account.
Classification:
Paper
Author:
CHOI Hyunchoel (Graduate School of Information Sciences, Tohoku University)
Title:
An Analysis of Diachronic Shift of Accent Nucleus in Japanese Loanwords
Pages:
25-36
Descriptors:
loanwords; accent pattern; diachronic shift; unaccentuation; unrecoverability
Abstract:
In this paper, we examine diachronic change in the accent patterns of Japanese loanwords. The data are the loanwords in four editions (1951, 1966, 1985, 1998) of the NHK Japanese Pronunciation and Accent Dictionary.
We found that:
1. Unaccentuation has increased over time, although the general tendency toward accent change has decreased.
2. From 1966 through 1985, a statistically significant number of words underwent the change word-mid accent --> word-head accent --> unaccentuation. From 1985 through 1998, however, unaccentuation became prominent in the forms "word-head accent --> unaccentuation" and "word-mid accent --> unaccentuation".
3. Most of the words that lost their accent nucleus never recovered it, which reveals the unrecoverability of unaccentuation.
Classification:
Note
Author:
OGINO Tsunao (Tokyo Metropolitan University)
Title:
Papers with Unspecified Sample Size: An Inference of Sample Size from Percentages
Pages:
37-40
Descriptors:
percentage; sample size; inference

Vol. 23 No. 2 (September 20, 2001)

Classification:
Paper
Authors:
TAKEDA Yoshiyuki (Toyohashi University of Technology), UMEMURA Kyoji (Toyohashi University of Technology)
Title:
Document Frequency Analysis Which Realizes Keyword Extraction
Pages:
65-90
Descriptors:
Word Segmentation; adaptation; repetitive occurrence; multilingual; keyword extraction
Abstract:
Adaptation is the degree to which a substring appears twice or more in a document, given that it appears at least once. Adaptation of keywords has been observed in English, and it is similarly observed in Japanese and Chinese. We have observed that adaptation of a keyword tends to have no correlation with just like English. On the other hand, the estimated value varies for strings that are selected at random. We analyzed adaptation using several years of newspaper articles (Japanese and Chinese) and technical abstracts. We tried to extract keywords using the difference between these distributions. We show that adaptation contains information from which keyword boundaries can be obtained. We have developed a keyword extraction system that can extract Japanese and Chinese keywords with the same program. This paper also describes the algorithm in detail.
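The definition of adaptation given in this abstract can be read as a conditional probability, P(count >= 2 | count >= 1), estimated over a document collection. The following is a minimal sketch under that reading, not the authors' system: the function name `adaptation`, the toy documents, and the use of non-overlapping substring counts are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def adaptation(substring: str, documents: Iterable[str]) -> Optional[float]:
    """Estimate P(count >= 2 | count >= 1) for a substring.

    df1 = number of documents containing the substring at least once
    df2 = number of documents containing it at least twice
    Returns None when the substring occurs in no document.
    """
    df1 = 0
    df2 = 0
    for doc in documents:
        # str.count counts non-overlapping occurrences (an assumption here).
        occurrences = doc.count(substring)
        if occurrences >= 1:
            df1 += 1
        if occurrences >= 2:
            df2 += 1
    if df1 == 0:
        return None
    return df2 / df1


# Hypothetical toy collection, for illustration only.
docs = [
    "keyword extraction finds a keyword",
    "one keyword only",
    "nothing relevant here",
    "keyword keyword keyword",
]
print(adaptation("keyword", docs))
```

Because the estimate works on raw substrings, it needs no prior word segmentation, which is consistent with the abstract's claim that one program can handle both Japanese and Chinese.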
Classification:
Paper
Author:
TAKEDA Kanji
Title:
The Context which Sono and Kono Refer to in the Sentence
Pages:
91-109
Descriptors:
demonstrative; sono; kono; reference to the context in the sentence; substituted reference; definite reference; designated reference
Abstract:
We investigated the demonstratives sono and kono when they refer to context within the sentence.
1. We classified the usage of sono and kono into the following three types.
  1. substituted reference: Only the sono/kono in "sono/kono + noun A" refers to the antecedent context. The noun A after sono/kono does not appear in the antecedent context.
  2. definite reference: "sono/kono + noun A" refers to "modifier + noun A" in the antecedent context.
  3. designated reference: "sono/kono + noun A" refers to the noun B or the nominal B in the antecedent context. A and B stand in the relation "sono/kono A is B".
2. By counting sono/kono in linguistic materials, we found that sono tends to be used for substituted reference, kono tends to be used for designated reference, and both sono and kono are used for definite reference.
3. "sono + noun A" refers to antecedent context that serves as a modifier of the noun A, whereas "kono + noun A" refers to antecedent context that explains the noun A.
4. We measured the distance from sono/kono to the antecedent context by counting the number of periods between them. We found that the distance between sono and the antecedent context is regular, whereas the distance between kono and the antecedent context varies widely (it may be long or short).
Classification:
Paper
Author:
ITO Masamitsu (National Institute for Japanese Language)
Title:
The Judgement Standard for Distinguishing Loan Words of Western Origin from Western Words in Japanese Pop Songs
Pages:
110-130
Descriptors:
languages mixture rhetoric; Japanese-English mixed text; Matsutoya Yumi; Ji'iron (Grapho-Lexicology); rinji gairaigo (temporary loan words)
Abstract:
I propose a judgement standard for distinguishing loan words of Western origin from Western words in Japanese pop songs. Japanese pop songs contain many instances of "languages mixture rhetoric". For instance, one sentence of a pop song places an English word or a loan word from English between two Japanese phrases, e.g., "Yagate kuru New Year ga machidoosikute" (I'm looking forward to the New Year which will come soon). Another sentence consists of a Japanese word and an English word or a loan word from English, e.g., "Nyuu iyaa no nagai kiteki" (A long train whistle of the New Year). Such rhetoric is complicated, so it is difficult to distinguish loan words of Western origin from Western words.
I propose some new interpretations of loan words, such as "rinji gairaigo" (temporary loan words). I also propose a judgement standard based on Grapho-Lexicology, as shown below.
1. The loan word standard: Loan words are words of Western origin that are written in kana or romaji.
2. The Western word standard: Western words are words of Western origin that are written in the alphabet.

Vol. 23 No. 3 (December 13, 2001)

Classification:
Paper
Author:
MIZUTANI Sizuo (Institute of Behavioral Sciences)
Title:
Word Classification from the Point of Lattice-Theoretical View
Pages:
135-156
Descriptors:
equality of word properties; word class; morphology; binary vector; lattice; order relation; distance; macroscopic word-class
Abstract:
A word class is an equivalence class based mainly on morphological properties. In this paper, 12 properties were chosen to mark some 2,800 words of modern Japanese that are neither conjugating words nor postpositional particles. Each word is represented by a 12-dimensional (0, 1)-vector whose components are 1 if the word has the property in question and 0 if not. By the equality of such vectors, 160 equivalence classes were found as word classes. In the discussion below, a word class as an equivalence class is identified with the vector that characterizes it. The number 160 might be said to be too large for human memory, but we must bear in mind that real usages of words are highly varied; classes this refined are necessary for exact discussion. On the other hand, to sketch a full picture of this classification, we planned to form "macroscopic word-classes" as clusters that neglect some differences from their "core" word-classes. For this purpose we note that the set W of our vectors is a lattice, so we can introduce an order relation ≦ between a and b in W, where a ≦ b is defined as a ∩ b = b. Starting from a core word-class, say the class a of typical nouns, and linking the other word classes x for which a ≦ x or x ≦ a holds, a cluster can be obtained from the point of view of this order relation. These clusters, for example one starting from the typical noun class and another starting from a typical adverb class, are not always completely separated. However, there usually exist parts where the linkage is too weak for both to be regarded as one cluster. Such parts are slits between the clusters under consideration, at which we can cut and divide the clusters. In this way twelve macroscopic word-classes were found, apart from conjunctions and interjections.
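The construction in this abstract (binary property vectors, equivalence classes by vector equality, and an order relation on the vectors) can be sketched as follows. This is an illustrative sketch only: it uses 3-dimensional vectors instead of 12, the words `w1` to `w5` and their markings are hypothetical, and the function names are assumptions. The order relation follows the abstract's definition literally, so a ≦ b holds when a ∩ b = b, that is, when every property marked in b is also marked in a.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]


def word_classes(marked: Dict[str, Vector]) -> Dict[Vector, List[str]]:
    """Group words into equivalence classes by equality of their vectors."""
    classes: Dict[Vector, List[str]] = defaultdict(list)
    for word, vec in marked.items():
        classes[vec].append(word)
    return dict(classes)


def meet(a: Vector, b: Vector) -> Vector:
    """Componentwise intersection (a ∩ b) of two (0, 1)-vectors."""
    return tuple(x & y for x, y in zip(a, b))


def precedes(a: Vector, b: Vector) -> bool:
    """a ≦ b, defined as in the abstract: a ∩ b = b."""
    return meet(a, b) == b


def linked_classes(core: Vector, classes: Dict[Vector, List[str]]) -> List[Vector]:
    """Classes x, other than the core, with core ≦ x or x ≦ core."""
    return [
        x for x in classes
        if x != core and (precedes(core, x) or precedes(x, core))
    ]


# Hypothetical markings: 3 properties instead of the paper's 12.
marked = {
    "w1": (1, 1, 0),
    "w2": (1, 1, 0),
    "w3": (1, 0, 0),
    "w4": (0, 0, 1),
    "w5": (1, 1, 1),
}
classes = word_classes(marked)
print(len(classes))
print(linked_classes((1, 1, 0), classes))
```

In this toy data the class (0, 0, 1) is not comparable with the core (1, 1, 0), so it stays outside the cluster; how the paper measures weak linkage between clusters is not specified in the abstract and is not modeled here.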
Classification:
Note
Authors:
WATANABE Motohiko (Graduate School of Science and Technology, Keio University), TAKAHASHI Jyunichirou (Graduate School of Science and Technology, Keio University), MIYNO Yohei (Ebara Corp.)
Title:
Statistical Analysis of the Difference between Original and Translated Novels
Pages:
157-169
Descriptors:
Principal Component Analysis; Cluster Analysis; Morphological Analysis; Original and Translated Novels
Abstract:
In literary studies, statistical methods have been used to grasp the characteristics of sentences in order to determine whether novels were written by the same author. However, few studies have used statistical methods to find differences between original and translated novels. This paper uses Principal Component Analysis and Cluster Analysis to show that there are notable differences between original and translated novels. First, four novels written by Haruki Murakami and three novels translated by him in almost the same period were selected. Then the rates of appearance of parts of speech, the rates of appearance of parts of speech just before the "Toten", which is the separator between sentences in Japanese, and the rates of particles just before the "Toten" were calculated. When the rates of appearance of parts of speech were used as variables, differences in sentence style between the original and translated novels could be observed, but no differences could be observed with the other variables. In conclusion, the use of parts of speech in sentences depends on the author, while the use of the "Toten" depends on the habits of the author and the translator.
Classification:
Miscellaneous
Title:
Proceedings of the 45th Annual Meeting
Pages:
170-179
Classification:
Miscellaneous
Title:
Descriptors and Abstracts
Pages:
182-183

Vol. 23 No. 4 (March 14, 2002)

Classification:
Report
Author:
MURATA Minori (Keio University, International Center)
Title:
Functional Words to Support the Logical Structure of a Text: Using Multivariate Analysis to Identify a Text's Genre
Pages:
185-206
Descriptors:
text's genre; multivariate analysis; canonical discriminant analysis; functional words; conjunctive words; particle-phrases; rate of appearance per sentence; logical structure of a text; Japanese for Specific Purposes
Abstract:
It is quite important for advanced students of Japanese for Specific Purposes to understand the underlying logical structure of a text, since understanding this structure enhances the ability to read and write technical papers. Such items as conjunctive words (i.e. the words which function as a conjunction in a sentence: Setsuzoku-goku) and particle-phrases (i.e. the phrases which function as a particle in a sentence: Joshi-sootoo-ku in Fukugo-ji) can provide important clues for understanding the logical structure of a text. The ultimate goal of this study is to clarify the logical structures of technical texts in Japanese by focusing on the functions of conjunctive words and particle-phrases. As a step toward this objective, we chose 132 text samples from six genres: (i) an introductory economics textbook, (ii) papers from the Journal of the Physical Society of Japan, (iii) papers on science and technology, (iv) papers on Japanese literature, (v) editorials from 4 newspapers, and (vi) modern novels. For each sample, we counted the rate of appearance (per sentence) of 62 selected conjunctive words and particle-phrases. The analysis was conducted in the following two steps:
1. We first examined the univariate distributions of the above 62 items and then applied canonical discriminant analysis to the 132 samples: (i) 16 samples, (ii) 24 samples, (iii) 14 samples, (iv) 24 samples, (v) 40 samples selected by random sampling out of 222, and (vi) 14 samples.
2. We then applied the same method to the 3 genres (ii), (iii) and (iv), which had been assigned to the same genre, so-called "technical papers", in the first step.
According to the result of the first step, these genres are classified with 14 conjunctive words and particle-phrases (out of 62) at a high apparent correct classification rate (80%). According to the result of the second step, the words that distinguish the 3 genres were clearly selected. These results indicate the existence of common conjunctive words and particle-phrases in texts having an explicit logical structure.
Classification:
Note
Author:
TANOMURA Tadaharu (Osaka University of Foreign Studies)
Title:
On a Semantic Factor Determinant in the '-na'/'-no' Choice of Keiyoudousi Suffixes: 'Yuumei-na' vs. 'Mumei-na'
Pages:
207-213
Descriptors:
Keiyoudousi; Prenominal Suffix; '-na'; '-no'
Abstract:
The keiyoudousi stem 'yuumei' (well-known) takes the suffix '-na' when it precedes nominal expressions, while its antonymous stem 'mumei' (unknown) takes the suffix '-no'. In this paper, the author examines, using a large quantity of electronic newspaper texts as a corpus, the distribution of the suffixes '-na' and '-no' as they co-occur with stems of the forms 'yuu-X' (with X) and 'mu-X' (without X), and reveals that the suffixal choice depends to a large extent on whether the stem expresses a gradable property or not.
Classification:
Book Review
Author:
KAGEURA Kyo (National Institute of Informatics)
Title:
"Word Frequency Distribution" by Harald Baayen
Pages:
214-219

Vol. 23 No. 5 (June 24, 2002)

Classification:
Paper
Author:
JIN Mingzhe (Sapporo Gakuin University)
Title:
Authorship Attribution Based on N-gram Models in Postpositional Particle of Japanese
Pages:
225-240
Descriptors:
authorship attribution; short-texts; N-gram models; non-eminent writers; postpositional particle of Japanese
Abstract:
Within stylometrics, attribution and descriptive stylistics have hitherto been studied on the works of eminent writers. This paper reports experiments in authorship attribution on short texts written by non-eminent writers. We present an approach to authorship identification that is based on N-gram models of Japanese postpositional particles. The experiments used 60 diaries written by 6 non-eminent writers and 110 compositions written by 11 students. The results show that N-gram models of Japanese postpositional particles are very effective for authorship attribution even on short texts.
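One way to picture N-gram features over postpositional particles is sketched below. The abstract does not state how texts are compared, so everything beyond counting N-grams is an assumption: the sketch assumes the particles have already been extracted in text order by a morphological analyzer, uses relative bigram frequencies as a profile, and picks the nearest author by L1 distance, which is a stand-in and not the paper's method. The particle sequences and author labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Profile = Dict[Tuple[str, ...], float]


def ngram_profile(particles: Sequence[str], n: int = 2) -> Profile:
    """Relative frequencies of particle N-grams in one text."""
    grams = [tuple(particles[i:i + n]) for i in range(len(particles) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


def l1_distance(p: Profile, q: Profile) -> float:
    """Sum of absolute differences between two profiles (assumed measure)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def nearest_author(
    text: Sequence[str], known: Dict[str, Sequence[str]], n: int = 2
) -> str:
    """Return the known author whose particle profile is closest to the text."""
    target = ngram_profile(text, n)
    return min(
        known,
        key=lambda author: l1_distance(target, ngram_profile(known[author], n)),
    )


# Hypothetical particle sequences, already extracted in text order.
known = {
    "A": ["wa", "o", "ni", "wa", "o", "ni", "wa", "o"],
    "B": ["ga", "de", "mo", "ga", "de", "mo", "ga", "de"],
}
unknown = ["wa", "o", "ni", "wa", "o"]
print(nearest_author(unknown, known))
```

Restricting the features to particles keeps the profile small and largely independent of topic vocabulary, which is one plausible reason such features could work on short texts.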
Classification:
Paper
Author:
TAKADA Tomokazu (Graduate School of Letters, Hokkaido University)
Title:
Machine-readable Dictionary and Position Exchanged Kanji Variants
Pages:
241-254
Descriptors:
Machine-readable dictionary; JIS X 0208; position-exchanged kanji variants
Abstract:
A morphological analysis system decomposes sentences into morphemes by using a machine-readable dictionary, so such a dictionary is essential to the whole decomposition process. However, because modern Japanese is written in a mix of kana and kanji, and certain kanji have variants, problems may occur in distinguishing these kanji variants, especially the position-exchanged kanji variants that appear in JIS X 0208. In this paper, I take up 22 pairs of so-called "NEJIRENOKANJI", whose code positions are exchanged between JIS Level 1 kanji and JIS Level 2 kanji, and aim to show how these 22 pairs are used in a machine-readable dictionary.

Vol. 23 No. 6 (September 19, 2002)

Classification:
Report
Authors:
HISANO Masaki (The University of Electro-Communications), YOKOYAMA Shoichi (The National Institute for Japanese Language), NOZAKI Hironari (Aichi University of Education)
Title:
Differences of Character Usage between Two Major Japanese Newspapers, Mainichi Shimbun and Asahi Shimbun
Pages:
277-295
Descriptors:
newspaper; character usage; occurrence rate; frequency; Mainichi Shimbun Newspaper; Asahi Shimbun Newspaper; electronic corpus
Abstract:
Differences in character occurrence rates between two national newspapers of Japan, Mainichi Shimbun and Asahi Shimbun, were investigated. The total number of tokens examined was about 640,000,000, and the number of types was about 6,000. For each character type, signed rank tests were applied to the corresponding 24 monthly occurrence rates and 8 annual occurrence rates of the two newspapers. Significant differences between the two newspapers were observed in about 1,500 types. Many katakana, alphabetic characters, JIS level-1 kanji, and numeric kanji were used more in Mainichi Shimbun, while many hiragana and Arabic numerals appeared more in Asahi Shimbun. The widespread differences in character usage even between two general newspapers call for caution when calculating or using frequencies of characters or words.
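The paired comparison described in this abstract can be illustrated with the rank sums of a Wilcoxon-style signed rank test. This is a sketch under assumptions: the abstract says only "signed rank tests", so the handling of zero differences (dropped) and ties (average ranks) is assumed, the function name is hypothetical, and the paired rates below are made-up numbers rather than data from the report. Significance would then be judged from the smaller rank sum using a table or a normal approximation, which is omitted here.

```python
from typing import List, Sequence, Tuple


def signed_rank_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-), the rank sums of positive and negative differences.

    Zero differences are dropped and tied absolute differences share
    their average rank (both are assumptions of this sketch).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks: List[float] = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired occurrence counts for one character type.
paper_a = [5, 7, 9, 4]
paper_b = [4, 9, 6, 4]
print(signed_rank_sums(paper_a, paper_b))
```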
Classification:
Report
Author:
ITO Masamitsu (The National Institute for Japanese Language)
Title:
A 2-gram List for Semiautomatic Experimentation of Japanese Pop Song Writing
Pages:
296-329
Descriptors:
random generation; second-order word approximation; a 2-gram list; 2-gram automatic pastiche; Japanese popular songs; Matsutoya Yumi
Abstract:
This paper introduces a 2-gram list for semiautomatic experimentation with writing Japanese popular songs. The songs are derived from random generation of second-order word approximations based on C.E. Shannon's information theory. The words in the list come from 70 Japanese popular songs written by Matsutoya Yumi. Using this list, it is possible to make Japanese popular songs semiautomatically.
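A second-order word approximation in Shannon's sense chooses each next word at random according to how often it followed the current word in the source text. The sketch below illustrates that idea with a small 2-gram list; it is not the report's procedure, and the function names, the stopping rule, and the three toy lines (invented words, not actual song lyrics) are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_bigram_list(lines: List[List[str]]) -> Dict[str, List[str]]:
    """Map each word to the list of words that followed it.

    Repeated followers are kept, so a uniform choice from the list is
    proportional to the observed 2-gram frequencies.
    """
    followers: Dict[str, List[str]] = defaultdict(list)
    for words in lines:
        for first, second in zip(words, words[1:]):
            followers[first].append(second)
    return dict(followers)


def generate(
    followers: Dict[str, List[str]], start: str, max_words: int, rng: random.Random
) -> List[str]:
    """Random walk over the 2-gram list, stopping at a dead end or max_words."""
    output = [start]
    while len(output) < max_words and output[-1] in followers:
        output.append(rng.choice(followers[output[-1]]))
    return output


# Invented toy lines, already split into words.
lines = [
    ["umi", "no", "kaze"],
    ["kaze", "no", "oto"],
    ["oto", "ga", "kieru"],
]
followers = build_bigram_list(lines)
print(" ".join(generate(followers, "umi", 6, random.Random(0))))
```

The "semiautomatic" part mentioned in the abstract would correspond to a person selecting or editing the generated lines, which this sketch leaves out.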

Vol. 23 No. 7 (December 16, 2002)

Classification:
Paper
Author:
JUNG Hyeseon (Graduate School, Osaka Prefecture University)
Title:
Exploring the Actual Use of Personal Pronouns in Japanese and Korean; Difference of the Frequencies and the Usage Shown by Questionnaires
Pages:
333-346
Descriptors:
personal pronouns; benefactive expression; introductory sentence; cohortative sentence; conjugation; adnominal modifier; plural of association
Abstract:
This paper tries to verify that native speakers of Korean use personal pronouns more frequently than native speakers of Japanese. A questionnaire on personal pronouns revealed six tendencies among Korean native speakers, compared with their Japanese counterparts: (1) To express first person pronouns at the beginning of introductory sentences, (2) To use the plural first person pronoun at the beginning of cohortative sentences, (3) To use the plural first person pronoun as an adnominal modifier, (4) To use benefactive expressions less frequently than Japanese speakers, who tend to avoid personal pronouns in benefactive sentences, (5) To produce cohortative sentences using the plural first person pronoun at the beginning of the sentence and conjugations of a/oyo forms, which carry four meanings (statement, question, command, or invitation), at the end of the sentence, (6) To use the plural first person pronoun as an adnominal modifier when the singular first person pronoun should be used or when the pronoun does not refer to the speaker's family or a group the speaker belongs to. Findings (1)-(3) describe the fact that Korean speakers use personal pronouns more frequently than Japanese speakers, while findings (4)-(6) explain why the frequencies differ between Japanese and Korean.
Classification:
Note
Author:
UCHIYAMA Kazuya (Hiroshima University)
Title:
A Note on Stylometric Studies
Pages:
347-352
Descriptors:
writing style; zipping; authorship attribution; stylistics
Classification:
Book Review
Author:
KAGEURA Kyo (National Institute of Informatics)
Title:
"Keiryo Gengogaku Nyumon" by ITO Masamitsu
Pages:
353-355
Classification:
Miscellaneous
Title:
Proceedings of the 46th Annual Meeting
Pages:
356-364
Classification:
Miscellaneous
Title:
Descriptors and Abstracts
Pages:
367-368

Vol. 23 No. 8 (March 13, 2003)

Classification:
Paper
Author:
JIN Mingzhe (Sapporo Gakuin University)
Title:
Authorship Attribution and Feature Analysis Using Frequency of JOSHI with SOM
Pages:
369-386
Descriptors:
authorship attribution; feature of a writer; frequency of JOSHI; multivariate analysis; Self-Organizing Map
Abstract:
Several statistical methods of clustering have been widely applied to computational stylistics, or authorship attribution: principal components analysis, factor analysis, multi-dimensional scaling, correspondence analysis, hierarchical clustering, etc. Recently, neural networks have been applied to authorship determination. However, their application has concentrated only on the type that uses a training set. The Self-Organizing Map (SOM) is a neural network technique that does not need a training set for clustering multi-dimensional data. This paper proposes using SOM for authorship attribution, and compares traditional multivariate analysis methods with SOM. Comparisons are made using the frequency of JOSHI (Japanese postpositional particles) in eighty texts written in Japanese by four authors. This study demonstrates that SOM is a powerful technique for computational stylistics, and that the frequency of JOSHI is a significant feature of a writer in Japanese.
Classification:
Paper
Author:
KOIKE Yasushi (Tsukuba University)
Title:
Change of Figurative Adverbs in Modern Japanese
Pages:
387-406
Descriptors:
figuration; MARUDE; figurative adverb; figurative modal form; concord with modality
Abstract:
In this paper, we examine changes in the usage of the "figurative adverbs" marude, atakamo, and sanagara, and changes in the relations between each adverb and its co-occurring forms. Regarding the change in usage, in the Meiji era sanagara tended to be used extensively in narrative sentences and marude in conversational sentences. After the Taisho era, however, marude came to be used extensively in both narrative and conversational sentences. Atakamo was used abundantly in the Meiji era, but in a non-figurative sense. Regarding the change in the co-occurring forms of marude, we found a correlation between marude and "figurative modal forms" such as youda and mitaida. We therefore propose calling this correlation "concord with modality."
Classification:
Miscellaneous
Title:
Index of Volume 23
Pages:
408-411
