parser.sandhi_analyzer
Usage
Use the LexicalSandhiAnalyzer to split a sentence (wrapped in a SanskritObject) and retrieve the top 10 splits:
>>> from __future__ import print_function
>>> from sanskrit_parser.parser.sandhi_analyzer import LexicalSandhiAnalyzer
>>> from sanskrit_parser.base.sanskrit_base import SanskritObject, SLP1
>>> sentence = SanskritObject("astyuttarasyAMdishidevatAtmA")
>>> analyzer = LexicalSandhiAnalyzer()
>>> splits = analyzer.getSandhiSplits(sentence).findAllPaths(10)
>>> for split in splits:
...     print(split)
...
[u'asti', u'uttarasyAm', u'diSi', u'devatA', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devat', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devata', u'AtmA']
[u'asti', u'uttara', u'syAm', u'diSi', u'devatA', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devatA', u'at', u'mA']
[u'asti', u'uttarasyAm', u'diSi', u'de', u'vatA', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'devata', u'at', u'mA']
[u'asti', u'uttas', u'rasyAm', u'diSi', u'devat', u'AtmA']
[u'asti', u'uttara', u'syAm', u'diSi', u'devat', u'AtmA']
[u'asti', u'uttarasyAm', u'diSi', u'de', u'avatA', u'AtmA']
The analyzer can also look up the morphological tags for a given word form. Note that the database stores words ending in visarga with an ‘s’ at the end, which is why the lookup below uses 'hares'.
>>> word = SanskritObject('hares')
>>> tags = analyzer.getMorphologicalTags(word)
>>> for tag in tags:
...     print(tag)
...
('hf#1', set(['cj', 'snd', 'prim', 'para', 'md', 'sys', 'prs', 'v', 'np', 'sg', 'op']))
('hari#1', set(['na', 'mas', 'sg', 'gen']))
('hari#1', set(['na', 'mas', 'abl', 'sg']))
('hari#1', set(['na', 'fem', 'sg', 'gen']))
('hari#1', set(['na', 'fem', 'abl', 'sg']))
('hari#2', set(['na', 'mas', 'sg', 'gen']))
('hari#2', set(['na', 'mas', 'abl', 'sg']))
('hari#2', set(['na', 'fem', 'sg', 'gen']))
('hari#2', set(['na', 'fem', 'abl', 'sg']))
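Because each tagset is a Python set, its printed order is not stable from run to run. The following is a minimal sketch of grouping the returned (base, tagset) pairs by base form and sorting each tagset for display. It runs on a few pairs copied by hand from the output above rather than calling the analyzer, and the name `readings_by_base` is purely illustrative.

```python
from collections import defaultdict

# A few (base, tagset) pairs copied from the output above (not a live lookup).
tags = [
    ("hari#1", {"na", "mas", "sg", "gen"}),
    ("hari#1", {"na", "mas", "abl", "sg"}),
    ("hari#1", {"na", "fem", "sg", "gen"}),
    ("hari#2", {"na", "mas", "sg", "gen"}),
]

# Group the readings by base form; sort each tagset so the display is stable.
readings_by_base = defaultdict(list)
for base, tagset in tags:
    readings_by_base[base].append(sorted(tagset))

for base in sorted(readings_by_base):
    print(base, readings_by_base[base])
```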
class sanskrit_parser.parser.sandhi_analyzer.LexicalSandhiAnalyzer(lexical_lookup='combined')
Bases: object
Singleton class to hold methods for Sanskrit lexical sandhi analysis.
We define lexical sandhi analysis to be the process of taking an input sequence and transforming it to a collection (represented by a DAG) of potential sandhi splits of the sequence. Each member of a split is guaranteed to be a valid lexical form.
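The idea of representing all splits as a DAG can be illustrated with a toy sketch. This is not the library's implementation: it assumes a tiny hypothetical lexicon, treats the input as plain concatenation (no sandhi sound changes are undone), and the helper names `build_split_dag` and `iter_paths` are invented for illustration. Each node is a character offset, each edge is a valid lexical form, and every path from the start to the end of the text is one candidate split.

```python
from itertools import islice

# Toy lexicon (hypothetical); the real analyzer consults a lexical database.
LEXICON = {"asti", "uttarasyAm", "uttara", "syAm", "diSi"}


def build_split_dag(text, lexicon):
    """Map each start offset to the (word, end offset) edges leaving it."""
    dag = {}
    for start in range(len(text)):
        edges = [
            (text[start:end], end)
            for end in range(start + 1, len(text) + 1)
            if text[start:end] in lexicon
        ]
        if edges:
            dag[start] = edges
    return dag


def iter_paths(dag, length, start=0):
    """Yield every word sequence running from `start` to the end of the text."""
    if start == length:
        yield []
        return
    for word, end in dag.get(start, []):
        for rest in iter_paths(dag, length, end):
            yield [word] + rest


# Plain concatenation of three forms; no sandhi has been applied here.
text = "astiuttarasyAmdiSi"
dag = build_split_dag(text, LEXICON)

# Take at most 10 paths, mirroring the path limit used in the usage example.
top_splits = list(islice(iter_paths(dag, len(text)), 10))
for split in top_splits:
    print(split)
```

Storing edges once per offset lets shared prefixes and suffixes be reused across splits, which is the main reason a DAG is more compact than an explicit list of every segmentation.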
getMorphologicalTags(obj, tmap=True)
Get morphological tags for a word.
- Params:
obj (SanskritString): word
tmap (Boolean=True): If True, maps tags to our format
- Returns:
list: List of (base, tagset) pairs
getSandhiSplits(o, tag=False, pre_segmented=False)
Get all valid sandhi splits for a string.
- Params:
o (SanskritString): Input object
tag (Boolean): When True (def=False), return a morphologically tagged graph
- Returns:
SandhiGraph: DAG of all possible splits
hasTag(obj, name, tagset)
Check if a word matches morphological tags.
- Params:
obj (SanskritString): word
name (str): name in tag
tagset (set): set of tag elements
- Returns:
list: List of (base, tagset) pairs for obj that match (name, tagset), or None
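The matching described above can be sketched as follows. The exact matching rule is not stated here, so this sketch assumes that a pair matches when its base equals `name` and the requested `tagset` is a subset of the pair's tags. The function `match_tags` is a hypothetical stand-in that works on plain (base, tagset) pairs and is not part of the library.

```python
def match_tags(candidates, name, tagset):
    """Return the (base, tags) pairs matching name and tagset, or None.

    Assumption: a pair matches when its base equals `name` and every
    element of `tagset` is present in the pair's tags.
    """
    matches = [
        (base, tags)
        for base, tags in candidates
        if base == name and tagset <= tags
    ]
    return matches or None


# Pairs in the shape returned by getMorphologicalTags (hand-written sample).
candidates = [
    ("hari#1", {"na", "mas", "sg", "gen"}),
    ("hari#1", {"na", "mas", "abl", "sg"}),
    ("hari#2", {"na", "fem", "sg", "gen"}),
]

genitives = match_tags(candidates, "hari#1", {"gen"})
print(genitives)

# No candidate has this base, so the result is None.
print(match_tags(candidates, "hari#3", {"gen"}))
```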
preSegmented(sl, tag=False)
Get a SandhiGraph for a pre-segmented sentence.
- Params:
sl (list of SanskritString): Input object
tag (Boolean): When True (def=False), return a morphologically tagged graph
- Returns:
SandhiGraph: DAG of all possible splits
sandhi = <sanskrit_parser.parser.sandhi.Sandhi object>