Last Updated: Sep 11th 2017
With the advent of cloud machine learning systems, we propose using these technologies to assist and speed up work toward the following main goals:
- Optical Character Recognition (OCR) of any Konkani dialect script (Kannada, Roman, Marathi) text into its Unicode text form.
- Translate any Konkani dialect in Unicode form into English and vice versa (requires grammar analysis).
- Create an audio reader for text in any dialect.
- Voice recognition (realtime or audio file) of Konkani dialects into English.
- Audio translation for videos to Unicode Form.
- Subtitle generation for videos.
- Use these systems to help people learn the Konkani language and its dialects.
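Several of the goals above depend on knowing which script a piece of Unicode text is written in. The following is a minimal sketch of script detection by code point range, not a settled design. The function name `detect_script` and the label "Roman" for ASCII letters are assumptions made for illustration; the Kannada (U+0C80 to U+0CFF) and Devanagari (U+0900 to U+097F, the script used for Marathi) ranges are the standard Unicode blocks.

```python
from collections import Counter

# Unicode block ranges for the non-Roman scripts named in the goals.
SCRIPT_RANGES = {
    "Kannada": (0x0C80, 0x0CFF),
    "Devanagari": (0x0900, 0x097F),  # script used for Marathi
}


def detect_script(text: str) -> str:
    """Return the script that most characters of `text` belong to.

    Hypothetical helper: counts characters per script and returns the
    most frequent one, or "Unknown" if no character matches.
    """
    counts = Counter()
    for ch in text:
        if ch.isascii() and ch.isalpha():
            counts["Roman"] += 1
            continue
        code_point = ord(ch)
        for script, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[script] += 1
                break
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ("ಕೊಂಕಣಿ", "कोंकणी", "Konkani"):
        print(sample, detect_script(sample))
```

This sketch only counts plain ASCII letters as Roman, so Roman-script text that relies heavily on accented letters would need a wider range.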
Because these goals are large, we propose the following technical choices:
- PRINCE2 methodology for managing projects.
- github.com for technology project management
- bitbucket.org for main project documentation
- tesseract-ocr for optical character recognition
- kaldi-asr for automatic speech recognition
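As a rough illustration of how the tesseract-ocr choice might be used, the sketch below builds a command line for the `tesseract` program, using `stdout` as the output base so the recognized text is printed rather than written to a file. The mapping from script to Tesseract language code is an assumption: `kan`, `mar`, and `eng` are existing Tesseract language data names, but whether they recognize Konkani text well enough is untested here. The helper names `build_tesseract_command` and `ocr_image` are hypothetical.

```python
import subprocess

# Assumed mapping from the scripts named in this plan to Tesseract
# language data. A Konkani-specific model may be needed in practice.
SCRIPT_TO_TESSERACT_LANG = {
    "Kannada": "kan",
    "Marathi": "mar",
    "Roman": "eng",
}


def build_tesseract_command(image_path: str, script: str) -> list:
    """Build the argument list for: tesseract IMAGE stdout -l LANG."""
    lang = SCRIPT_TO_TESSERACT_LANG[script]
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, script: str) -> str:
    """Run Tesseract and return the recognized text as a Unicode string.

    Requires the tesseract program and the matching language data
    to be installed on the machine.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, script),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

For example, `ocr_image("page.png", "Kannada")` would run `tesseract page.png stdout -l kan`, where `page.png` is a placeholder file name.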
Last Updated: July 13th, 2017
Proposed: Data collection and display via Excel, R, and RStudio-based tools.
Last Updated: Dec 13th, 2016
Master document.
Master plan: Online web database to search information.
Sub-cultures/communities:
I) SubCulture1:
Aspects of culture1:
- birth ~ traditional children’s songs, naming
- youth ~ communion
- roce ~ hymns, rituals
- wedding ~ marriage
- community festivals ~ harvest, Mother Mary, Christmas, Kuswar
- death ~ procedure, hymns, songs
Aspects of culture2:
- Sayings/Proverbs
- Nicknames for formal names
- Picture Dictionary
- Comprehensive Grammar
- Audio/Video course of language and grammar.
- Genres of music and musicians
- Genres of plays, songs
- Lullabies, children’s songs
- Traditional and old recipes, dishes for festivals, preparation and photographs.
II) SubCulture2:
Similar to SubCulture1.