parser.sandhi_analyzer

Intro

Sandhi Analyzer for Sanskrit words

@author: Karthik Madathil (github: @kmadathil)

Usage

Use the LexicalSandhiAnalyzer to split a sentence (wrapped in a SanskritObject) and retrieve the top 10 splits:

>>> from __future__ import print_function
>>> from sanskrit_parser.parser.sandhi_analyzer import LexicalSandhiAnalyzer
>>> from sanskrit_parser.base.sanskrit_base import SanskritObject, SLP1
>>> sentence = SanskritObject("astyuttarasyAMdishidevatAtmA")
>>> analyzer = LexicalSandhiAnalyzer()
>>> splits = analyzer.getSandhiSplits(sentence).findAllPaths(10)
>>> for split in splits:
...    print(split)
...
[u'asti', u'uttarasyAm', u'diSi', u'devatA', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devat', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devata', u'AtmA']
[u'asti', u'uttara', u'syAm', u'diSi', u'devatA', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devatA', u'at', u'mA']
[u'asti', u'uttarasyAm', u'diSi', u'de', u'vatA', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devata', u'at', u'mA']
[u'asti', u'uttas', u'rasyAm', u'diSi', u'devat', u'AtmA']
[u'asti', u'uttara', u'syAm', u'diSi', u'devat', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'de', u'avatA', u'AtmA']

The analyzer can also look up the morphological tags for a given word form. Note that the database stores words ending in visarga with an 's' at the end:

>>> word = SanskritObject('hares')
>>> tags = analyzer.getMorphologicalTags(word)
>>> for tag in tags:
...    print(tag)
...
('hf#1', set(['cj', 'snd', 'prim', 'para', 'md', 'sys', 'prs', 'v', 'np', 'sg', 'op']))
('hari#1', set(['na', 'mas', 'sg', 'gen']))
('hari#1', set(['na', 'mas', 'abl', 'sg']))
('hari#1', set(['na', 'fem', 'sg', 'gen']))
('hari#1', set(['na', 'fem', 'abl', 'sg']))
('hari#2', set(['na', 'mas', 'sg', 'gen']))
('hari#2', set(['na', 'mas', 'abl', 'sg']))
('hari#2', set(['na', 'fem', 'sg', 'gen']))
('hari#2', set(['na', 'fem', 'abl', 'sg']))
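The visarga note above can be illustrated with a small helper. This is only a sketch under assumptions: the input is taken to be a plain SLP1 string (where visarga is written `H`), and `to_lookup_form` is a hypothetical name, not part of sanskrit_parser. Whether the library performs this normalization itself is not stated on this page.

```python
def to_lookup_form(word):
    """Replace a trailing SLP1 visarga ('H') with 's'; leave other words as-is.

    Hypothetical helper for illustration; not a sanskrit_parser function.
    """
    if word.endswith("H"):
        return word[:-1] + "s"
    return word


print(to_lookup_form("hareH"))  # prints hares
print(to_lookup_form("asti"))   # prints asti
```

The result could then be wrapped in a SanskritObject and passed to getMorphologicalTags as in the example above.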
class sanskrit_parser.parser.sandhi_analyzer.LexicalSandhiAnalyzer(lexical_lookup='combined')

Bases: object

Singleton class to hold methods for Sanskrit lexical sandhi analysis.

We define lexical sandhi analysis to be the process of taking an input sequence and transforming it to a collection (represented by a DAG) of potential sandhi splits of the sequence. Each member of a split is guaranteed to be a valid lexical form.
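To make the idea of a DAG of splits concrete, here is a minimal, self-contained sketch. It is not the library's implementation: it uses a tiny hypothetical lexicon (`TOY_LEXICON`), splits by plain concatenation without applying any sandhi rules, and the names `build_split_graph` and `find_paths` are invented for illustration.

```python
# Toy illustration only: plain concatenation, no sandhi rules applied.
TOY_LEXICON = {"asti", "uttarasyAm", "uttara", "syAm", "diSi"}  # hypothetical


def build_split_graph(text, lexicon):
    """Return a DAG as {start_index: [(word, end_index), ...]}.

    An edge i -> j exists when text[i:j] is in the lexicon and the rest of
    the string from j onwards can still be fully segmented.
    """
    n = len(text)
    # reachable[i] is True when text[i:] can be fully segmented.
    reachable = [False] * (n + 1)
    reachable[n] = True
    graph = {}
    for i in range(n - 1, -1, -1):
        edges = [(text[i:j], j) for j in range(i + 1, n + 1)
                 if text[i:j] in lexicon and reachable[j]]
        if edges:
            graph[i] = edges
            reachable[i] = True
    return graph


def find_paths(graph, end, start=0, limit=10):
    """Enumerate up to `limit` start-to-end paths, depth first."""
    paths = []

    def walk(pos, prefix):
        if len(paths) >= limit:
            return
        if pos == end:
            paths.append(prefix)
            return
        for word, nxt in graph.get(pos, []):
            walk(nxt, prefix + [word])

    walk(start, [])
    return paths


text = "astiuttarasyAmdiSi"
graph = build_split_graph(text, TOY_LEXICON)
for path in find_paths(graph, len(text)):
    print(path)
```

Every path through the toy graph is one candidate split, and every edge is a lexicon entry, which mirrors the guarantee that each member of a split is a valid lexical form.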

getMorphologicalTags(obj, tmap=True)

Get Morphological tags for a word

Params:

obj (SanskritString): word
tmap (Boolean=True): If True, maps tags to our format

Returns:

list: List of (base, tagset) pairs

getSandhiSplits(o, tag=False, pre_segmented=False)

Get all valid Sandhi splits for a string

Params:

o (SanskritString): Input object
tag (Boolean): When True (def=False), return a morphologically tagged graph

Returns:

SandhiGraph: DAG of all possible splits

hasTag(obj, name, tagset)

Check if word matches morphological tags

Params:

obj (SanskritString): word
name (str): name in tag
tagset (set): set of tag elements

Returns:

list: List of (base, tagset) pairs for obj that match (name, tagset), or None
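The matching described here can be sketched as a filter over (base, tagset) pairs. The sketch below is an assumption about the semantics, not the library's code: it treats a pair as matching when its base equals `name` exactly and its tags contain every element of `tagset`. The function name `filter_tags` is hypothetical, and the sample pairs are copied from the getMorphologicalTags output for 'hares' shown earlier.

```python
def filter_tags(pairs, name, tagset):
    """Return the (base, tags) pairs whose base equals `name` and whose tags
    include every element of `tagset`; return None when nothing matches.

    Assumed semantics for illustration only.
    """
    matches = [(base, tags) for base, tags in pairs
               if base == name and tagset <= tags]
    return matches or None


# Sample pairs taken from the 'hares' lookup output.
pairs = [
    ("hari#1", {"na", "mas", "sg", "gen"}),
    ("hari#1", {"na", "mas", "abl", "sg"}),
    ("hari#1", {"na", "fem", "sg", "gen"}),
    ("hari#2", {"na", "mas", "sg", "gen"}),
]

# Masculine genitive readings of hari#1: one pair matches.
print(filter_tags(pairs, "hari#1", {"mas", "gen"}))
# No reading carries the tag requested here, so the result is None.
print(filter_tags(pairs, "hari#1", {"neu"}))
```

Returning None rather than an empty list follows the "or None" wording of the Returns entry.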

preSegmented(sl, tag=False)

Get a SandhiGraph for a pre-segmented sentence

Params:

sl (list of SanskritString): Input object
tag (Boolean): When True (def=False), return a morphologically tagged graph

Returns:

SandhiGraph: DAG of all possible splits

sandhi = <sanskrit_parser.parser.sandhi.Sandhi object>
tagSandhiGraph(g)

Tag a Sandhi Graph with morphological tags for each node

Params:

g (SandhiGraph) : input lexical sandhi graph

sanskrit_parser.parser.sandhi_analyzer.getArgs(argv=None)

Argparse routine. Returns args variable

sanskrit_parser.parser.sandhi_analyzer.main(argv=None)
