Abstract
Language Communicator is a fully automated system that generates multilingual output from monolingual input: it translates source text in one natural language into target text in other languages. In this paper, we present language generation for the Marathi-English pair, where Marathi is the source language and English is the target language. The growing need for global communication makes multilingual machine translation increasingly important. Natural languages are full of ambiguity, so a semantic representation must be explicit and free of ambiguity. For this reason the Universal Semantic Representation (USR) plays an important role in the Language Communicator tool. The USR represents a sentence with layers of information, including Paninian karaka and non-karaka relations between the words of the sentence. One issue the tool addresses is the variety of dialects spoken in different parts of a region or country, with the aim of overcoming the communication barriers that arise mainly from the diversity of regional languages.
Introduction
With advances in technology, computerized systems can now replace human experts in particular domains. Natural language processing (NLP) is an application of artificial intelligence that has been studied for decades to overcome the communication barriers that arise mainly from the diversity of regional languages. The field is now moving toward systems that can handle multiple language pairs, i.e. multilingual systems. Here, monolingual refers to the use of a single language, and multilingual refers to the use of more than one language.
This project develops an authoring tool for writing in one's own language. The text can then be converted into other target languages through the mediation of a controlled language that is very close to the source language. Many machine translation systems are already available, but they do not give a proper word-to-word alignment of the sentence. Translation is not just a matter of finding a target-language word for each source-language word; it should also preserve context and meaning. One advantage of this tool is that its input is a USR (Universal Semantic Representation) carrying layers of information about a sentence, so proper word-to-word alignment is achieved together with semantic disambiguation and user validation. What makes the tool distinctive is its human intervention module, in which the knowledge representation is verified by the user. All rules and facts are written in the CLIPS language. CLIPS stands for C Language Integrated Production System; it is a rule-based language used mainly for building expert systems. It operates by maintaining a list of facts and a set of rules that operate on them.
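The idea of a fact list with rules that operate on it can be illustrated with a minimal forward-chaining loop. This is a Python analogy only, not CLIPS code and not the tool's rule base; the fact format, the rule `karta_rule` and the label `subject-candidate` are hypothetical names chosen for illustration.

```python
# Minimal forward-chaining sketch: a set of facts plus rules that derive new facts.
# The fact tuples and the single rule below are hypothetical, not taken from the tool.

def run_rules(facts, rules):
    """Apply every rule repeatedly until no rule derives a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            derived = rule(known) - known
            if derived:
                known |= derived
                changed = True
    return known


def karta_rule(facts):
    """For each ("karta", noun, verb) fact, derive a subject-candidate fact."""
    derived = set()
    for fact in facts:
        if len(fact) == 3 and fact[0] == "karta":
            _, noun, _verb = fact
            derived.add(("subject-candidate", noun))
    return derived


initial_facts = {("karta", "boy", "go")}
result = run_rules(initial_facts, [karta_rule])
```

After the loop finishes, `result` holds the original fact together with every fact the rules could derive from it. A real CLIPS program expresses the same idea declaratively and leaves rule matching and ordering to the CLIPS engine.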
Because of global commercial contacts, there is now a need for a system that provides multilingual generation. Such a system will help people communicate with each other without knowing each other's native language and so overcome the language barrier. It could also become part of the primary school curriculum, helping students improve their knowledge of a language's grammar for a specific subject.
Objective
The main aim of the project is to generate multilingual output from monolingual input using the Communicator tool. Our specific objective is to generate English from Marathi in order to overcome the language barrier.
Literature survey
In [1], Anil Kumar Singh, Samar Husain, Harshit Surana, Jagadeesh Gorla and Dipti Misra Sharma present "Disambiguating Tense, Aspect and Modality Markers for Correcting Machine Translation Errors", published in the Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007. The paper discusses tense, aspect and modality, which are important elements of natural languages.
All languages mark tense, aspect and modality (TAM) in some way, but the markers do not have a one-to-one mapping across languages. Many errors in machine translation (MT) are due to wrong translation of TAM markers. Reducing them can improve the performance of an MT system.
In [2], Ankita Agarwal, Pramila, Shashi Pal Singh and Ajai Kumar present "Morphological Analyser for Hindi - A Rule Based Implementation", published in the International Journal of Advanced Computer Research (ISSN (print): 2249-7277, ISSN (online): 2277-7970), Volume 4, Number 1, Issue 14, March 2014. The paper focuses on the design of a morphological analyzer for Hindi. The analyzer takes a Hindi sentence or word as input and analyzes it to produce its features along with its root words. The features fall into the categories part of speech, gender, number and person. The tool handles both inflectional and derivational morphemes and follows a rule-based approach.
Proposed system
MONOLINGUAL AND MULTILINGUAL LANGUAGE
Monolingual refers to a single language and multilingual refers to more than one language. Monolingual speakers speak only one language, and multilingual speakers speak more than one. In the Communicator tool, the input is in a single language and the translation can be viewed in multiple languages.
LANGUAGE COMMUNICATOR TOOL
Language Communicator is an authoring tool for machine translation that lets the user convey information in an unambiguous way, so that it can be transferred faithfully to different target languages. Unambiguous and explicit semantic representations make it possible to use transformation grammar across languages. What makes the tool distinctive is its human intervention module, in which the knowledge representation is verified by the user. The user can also modify the information as and when required. Language Communicator is a multilingual tool that converts a natural language into other languages and lets users see the translation in multiple languages. The languages currently supported by the Communicator tool are English, Japanese and German.
UNIVERSAL SEMANTIC REPRESENTATION
The input that Language Communicator takes is semantically disambiguated and user verified. It is the Universal Semantic Representation (USR) of the sentence with layers of information. The USR must be explicit and free of all linguistic ambiguities. The machine can resolve ambiguities only up to a point; the speaker or user is in the best position to notice them and correct them. The USR is stored in a user_csv file with 9 rows:
- Row 1: the local word grouping (LWG), which groups a noun with its modifiers such as adjectives and determiners.
- Row 2: the concepts, taken from a concept dictionary that holds each concept and its language-specific representations.
- Row 3: the index of each word.
- Row 4: ontological information about the nouns.
- Row 5: gender, number and person.
- Row 6: the intra-chunk relation, which gives the dependency relation between the tokens of a chunk.
- Row 7: the inter-chunk relation, which gives the dependency relation between chunk heads in the sentence.
- Row 8: the discourse relation, which gives the adverb relation.
- Row 9: the sentence type, such as assertive, question or imperative.
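The nine-row layout described above could be read into named fields roughly as follows. This is only a sketch: the field names in `ROW_NAMES`, the function `parse_usr` and every value in `SAMPLE` are hypothetical, since the exact file syntax and relation labels used by the tool are not given here.

```python
import csv
import io

# Hypothetical names for the nine USR rows, in the order described in the text.
ROW_NAMES = [
    "word_groups",    # row 1: local word grouping (LWG)
    "concepts",       # row 2: concept ids from the concept dictionary
    "indices",        # row 3: word indices
    "ontology",       # row 4: ontological information about nouns
    "gnp",            # row 5: gender, number, person
    "intra_chunk",    # row 6: relations between tokens of a chunk
    "inter_chunk",    # row 7: relations between chunk heads
    "discourse",      # row 8: discourse relation
    "sentence_type",  # row 9: assertive, question, imperative, ...
]


def parse_usr(text):
    """Parse nine comma-separated rows into a dict keyed by row name."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) != len(ROW_NAMES):
        raise ValueError(f"expected {len(ROW_NAMES)} rows, got {len(rows)}")
    return dict(zip(ROW_NAMES, rows))


# Invented sample for a two-word sentence; all labels and values are placeholders.
SAMPLE = "\n".join([
    "boy,go",
    "boy_1,go_1",
    "1,2",
    "animate,",
    "m sg 3,",
    ",",
    "2:karta,0:main",
    ",",
    "assertive",
])

usr = parse_usr(SAMPLE)
```

Keeping the rows in a dictionary keyed by name makes later steps, such as user verification or mapping to another representation, independent of row position.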
CONCEPT DICTIONARY
The concept dictionary holds the concepts and their language-specific representations. Each concept can have one or more senses, which are disambiguated by the concept ids in the respective language columns. The concept dictionary currently covers Hindi, English, German, Japanese, Tamil and Marathi. The first column is the Label, which represents the concept and is unique, so a concept id in the Label column cannot be repeated. The aim of a separate Label column is to capture the universal semantics of a concept without making it language-dependent. The concepts themselves can be expressed by language-dependent lexemes, but because the concepts are unique, the label should also be unique and mnemonic.
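A small sketch of such a dictionary, with a unique label per concept and one lexeme per language, might look like this. The class `ConceptDictionary`, its method names and the label `water_1` are assumptions made for illustration; the tool's actual storage format and id scheme are not specified here.

```python
class ConceptDictionary:
    """Maps a unique concept label to its language-specific lexemes."""

    def __init__(self):
        self._entries = {}

    def add(self, label, lexemes):
        """Register a concept; a label may be used only once."""
        if label in self._entries:
            raise ValueError(f"duplicate concept label: {label}")
        self._entries[label] = dict(lexemes)

    def lexeme(self, label, language):
        """Return the lexeme for a concept in the given language."""
        return self._entries[label][language]


concepts = ConceptDictionary()
# Hypothetical entry: the label "water_1" stands for one sense of the concept.
concepts.add("water_1", {
    "English": "water",
    "Hindi": "पानी",
    "Marathi": "पाणी",
    "German": "Wasser",
})

english_word = concepts.lexeme("water_1", "English")
marathi_word = concepts.lexeme("water_1", "Marathi")
```

Rejecting duplicate labels in `add` mirrors the requirement that a concept id in the Label column cannot be repeated.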
Output
Language Communicator is an authoring tool for machine translation: text written in one's own language can be converted into another target language through the mediation of a controlled language that is very close to the source language. The input that Language Communicator takes is semantically disambiguated, user verified, and in the Marathi language. It is the Universal Semantic Representation (USR) of the sentence with layers of information. This includes Paninian karaka and non-karaka relations between the words in a sentence, and ontological information about the content words, such as proper noun, pronoun, mass noun and definiteness of a noun. Other information, such as discourse relations, morphological information and sentence type, is also part of the representation; all of this information is obtained from the morph analyser. The USR uses the concept dictionary, which has unique ids for all the content words. This representation is an extended form of controlled language. The MRS (Minimal Recursion Semantics) mapper, or transformation engine, works as a bridge between the USR and MRS representations: it takes the USR as input and maps the Paninian and non-Paninian relations and all other ontological information to the MRS representation. From the MRS representation, the ACE generator produces the final output in the target language, English.
Conclusion
This paper introduces a language generation tool based on natural language processing. The tool can be programmed to generate more than one language using rules and programs written in CLIPS and Python. The languages currently generated are English, German and Japanese from a Hindi user_csv input, and English from a Marathi user_csv input. The supervised method of building resources with the morph analyser suited the available data best.
References
- [1] Anil Kumar Singh, Samar Husain, Harshit Surana, Jagadeesh Gorla, Dipti Misra Sharma. Disambiguating Tense, Aspect and Modality Markers for Correcting Machine Translation Errors. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007.
- [2] Ankita Agarwal, Pramila, Shashi Pal Singh, Ajai Kumar. Morphological Analyser for Hindi - A Rule Based Implementation. International Journal of Advanced Computer Research (ISSN (print): 2249-7277, ISSN (online): 2277-7970), Volume 4, Number 1, Issue 14, March 2014.