kyoto_reader.document module
-
class kyoto_reader.document.Document(knp_string: str, doc_id: str, cases: Collection[str], corefs: Collection[str], relax_cases: bool, extract_nes: bool, use_pas_tag: bool)
Bases: object
A class to represent a document of KWDLC, KyotoCorpus, or AnnotatedFKCCorpus.
Parameters:
- knp_string (str) – A KNP-format string of the document.
- doc_id (str) – A document ID.
- cases (Collection[str]) – Cases to extract.
- corefs (Collection[str]) – Coreference relations to extract.
- relax_cases (bool) – Whether to treat relations marked with “≒” as the corresponding unmarked relations (e.g. ガ≒格 -> ガ格).
- extract_nes (bool) – Whether to extract named entities.
- use_pas_tag (bool) – Whether to read predicate-argument structures from <述語項構造: > tags instead of <rel> tags.
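The effect of relax_cases can be pictured with a small, self-contained sketch. It does not use kyoto_reader: the helpers normalize_case and select_relations are hypothetical, and the sketch only assumes that relaxing a label means deleting the “≒” marker, as in the ガ≒格 -> ガ格 example.

```python
from typing import Collection, List

RELAX_MARK = "≒"  # marker for a relation that only approximately holds


def normalize_case(label: str, relax_cases: bool) -> str:
    """Drop the "≒" marker from a case label when relax_cases is True (hypothetical helper)."""
    return label.replace(RELAX_MARK, "") if relax_cases else label


def select_relations(labels: List[str], cases: Collection[str], relax_cases: bool) -> List[str]:
    """Keep only labels whose (possibly relaxed) case is one of `cases`."""
    kept = []
    for label in labels:
        case = normalize_case(label, relax_cases)
        if case in cases:
            kept.append(case)
    return kept


labels = ["ガ", "ガ≒", "ヲ", "ニ≒"]
print(select_relations(labels, {"ガ", "ヲ"}, relax_cases=False))  # ['ガ', 'ヲ']
print(select_relations(labels, {"ガ", "ヲ"}, relax_cases=True))   # ['ガ', 'ガ', 'ヲ']
```

Without relaxation, “ガ≒” is simply not among the requested cases and is dropped; with relaxation it is counted as an ordinary ガ relation.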
-
knp_string
A KNP-format string of the document.
Type: str
-
doc_id
A document ID.
Type: str
-
cases
Cases to extract.
Type: Collection[str]
-
corefs
Coreference relations to extract.
Type: Collection[str]
-
extract_nes
Whether to extract named entities.
Type: bool
-
mentions
A mapping from a document-wide tag ID to the corresponding mention.
Type: Dict[int, Mention]
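One way to read "document-wide tag ID" is as a sentence-local base-phrase index shifted by the number of base phrases in all preceding sentences. The sketch below assumes that numbering scheme (it is not taken from the kyoto_reader source) and uses plain strings in place of Mention objects; sentence_offsets and build_mentions are hypothetical helpers.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def sentence_offsets(sentences: List[List[str]]) -> List[int]:
    """Document-wide ID of the first base phrase in each sentence."""
    # Running total of base-phrase counts, starting at 0 (initial= needs Python 3.8+).
    return list(accumulate((len(s) for s in sentences), initial=0))[:-1]


def build_mentions(sentences: List[List[str]],
                   positions: List[Tuple[int, int]]) -> Dict[int, str]:
    """Map document-wide tag IDs to mentions given (sentence index, local tag ID) pairs."""
    offsets = sentence_offsets(sentences)
    return {offsets[si] + tid: sentences[si][tid] for si, tid in positions}


sentences = [["太郎は", "本を", "買った。"], ["彼は", "それを", "読んだ。"]]
print(build_mentions(sentences, [(0, 0), (0, 1), (1, 0), (1, 1)]))
# {0: '太郎は', 1: '本を', 3: '彼は', 4: 'それを'}
```

Keying by a document-wide ID lets mentions in different sentences (here 太郎は and 彼は) live in one flat dictionary without colliding on their sentence-local index 0.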
-
named_entities
The extracted named entities.
Type: List[NamedEntity]
-
__init__(knp_string: str, doc_id: str, cases: Collection[str], corefs: Collection[str], relax_cases: bool, extract_nes: bool, use_pas_tag: bool) → None
Initialize self. See help(type(self)) for accurate signature.
-
draw_tree(sid: Optional[str] = None, coreference: bool = True, fh: Optional[TextIO] = None) → None
Write out the predicate-argument structures (PAS) and coreference relations of the specified sentence in tree format.
If sid is not specified, write out the trees of all sentences in this document.
Parameters:
- sid (str, optional) – The ID of the target sentence.
- coreference (bool) – If True, also write out coreference relations.
- fh (TextIO, optional) – The output stream.
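draw_tree writes to an optional output stream. A common way to implement such an fh parameter is sketched below with a hypothetical draw_trees function over precomputed tree strings; it assumes that fh=None means standard output, which the description does not state.

```python
import io
import sys
from typing import Dict, Optional, TextIO


def draw_trees(trees: Dict[str, str], sid: Optional[str] = None,
               fh: Optional[TextIO] = None) -> None:
    """Write the tree of one sentence, or of every sentence if sid is None (hypothetical)."""
    # Resolve the default stream at call time rather than at definition time.
    out = fh if fh is not None else sys.stdout
    targets = [sid] if sid is not None else list(trees)
    for target in targets:
        out.write(f"{target}\n{trees[target]}\n")


trees = {"S1": "tree of S1", "S2": "tree of S2"}
buf = io.StringIO()
draw_trees(trees, sid="S2", fh=buf)  # capture a single sentence's tree
print(buf.getvalue(), end="")
draw_trees(trees)  # writes both trees to standard output
```

Using None as the default and looking up sys.stdout inside the function means redirection set up after the function is defined (for example with contextlib.redirect_stdout) is still honored; a default of sys.stdout would be frozen at definition time.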
-
get_arguments(predicate: kyoto_reader.base_phrase.BasePhrase, relax: bool = False, include_optional: bool = False) → Dict[str, List[kyoto_reader.pas.BaseArgument]]
Return all the arguments of the given predicate.
Parameters:
- predicate (BasePhrase) – The base phrase of the predicate.
- relax (bool) – If True, also return arguments that are coreferent with the predicate's arguments.
- include_optional (bool) – If True, also return adverbial arguments such as “すぐに”.
Returns: A mapping from each case to its arguments.
Return type: Dict[str, List[BaseArgument]]
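With relax=True, get_arguments also returns arguments linked to the original ones by coreference. A minimal sketch of that expansion, assuming arguments and coreference links are given as plain dictionaries of strings; expand_arguments is hypothetical and ignores details such as exophoric arguments.

```python
from typing import Dict, List, Set


def expand_arguments(arguments: Dict[str, List[str]],
                     coreference: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each case, append mentions coreferent with its arguments (hypothetical)."""
    expanded: Dict[str, List[str]] = {}
    for case, args in arguments.items():
        result = list(args)
        for arg in args:
            # sorted() keeps the output order deterministic
            for other in sorted(coreference.get(arg, set())):
                if other not in result:
                    result.append(other)
        expanded[case] = result
    return expanded


arguments = {"ガ": ["彼"], "ヲ": ["本"]}
coreference = {"彼": {"太郎"}, "太郎": {"彼"}}
print(expand_arguments(arguments, coreference))  # {'ガ': ['彼', '太郎'], 'ヲ': ['本']}
```

The sketch follows only direct coreference links; if chains were stored pairwise, a transitive closure would be needed to reach every member.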
-
get_entities(bp: kyoto_reader.base_phrase.BasePhrase, include_uncertain: bool = False) → List[kyoto_reader.coreference.Entity]
Return a list of the entities that the specified mention refers to. The mention is given as a BasePhrase.
Parameters:
- bp (BasePhrase) – The base phrase corresponding to the mention.
- include_uncertain (bool) – Whether to also return entities that have an uncertain relation with the mention.
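The include_uncertain switch can be modeled as a choice between two sets of entity IDs per mention. In this sketch, MentionInfo, its eids and eids_unc fields, and entities_of are hypothetical names chosen for illustration, not kyoto_reader's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MentionInfo:  # hypothetical stand-in for a mention
    eids: Set[int] = field(default_factory=set)      # entities referred to with certainty
    eids_unc: Set[int] = field(default_factory=set)  # entities linked by an uncertain relation


def entities_of(mention: MentionInfo, entities: Dict[int, str],
                include_uncertain: bool = False) -> List[str]:
    """Return the entities a mention refers to, optionally including uncertain ones."""
    ids = set(mention.eids)
    if include_uncertain:
        ids |= mention.eids_unc
    return [entities[eid] for eid in sorted(ids)]


entities = {0: "太郎", 1: "花子"}
mention = MentionInfo(eids={0}, eids_unc={1})
print(entities_of(mention, entities))                          # ['太郎']
print(entities_of(mention, entities, include_uncertain=True))  # ['太郎', '花子']
```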
-
get_siblings(mention: kyoto_reader.coreference.Mention, relax: bool = False) → Set[kyoto_reader.coreference.Mention]
Return all the mentions that are in a coreference chain with the specified mention.
Parameters:
- mention (Mention) – A mention.
- relax (bool) – If True, return coreferent mentions as well.
Returns: A set of mentions.
Return type: Set[Mention]
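Collecting siblings can be understood as taking the union of the mentions of every entity the given mention belongs to, then removing the mention itself. A sketch under that assumption, with strings standing in for Mention objects; siblings is a hypothetical helper, and the relax switch is not modeled because its exact semantics are not specified here.

```python
from typing import Dict, Set


def siblings(mention: str, mention_eids: Dict[str, Set[int]],
             entity_mentions: Dict[int, Set[str]]) -> Set[str]:
    """Return every other mention that shares an entity with `mention` (hypothetical)."""
    result: Set[str] = set()
    for eid in mention_eids.get(mention, set()):
        result |= entity_mentions[eid]  # updates result only, not the entity's set
    result.discard(mention)  # a mention is not its own sibling
    return result


entity_mentions = {0: {"太郎", "彼", "自分"}, 1: {"本", "それ"}}
mention_eids = {"太郎": {0}, "彼": {0}, "自分": {0}, "本": {1}, "それ": {1}}
print(sorted(siblings("彼", mention_eids, entity_mentions)))  # ['太郎', '自分']
```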
-
mrph2dmid
A mapping from each morpheme to its document-wide ID.
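Because two morphemes can share a surface form, a mapping like mrph2dmid has to be keyed by morpheme objects rather than by their strings. The sketch below assumes IDs are assigned in reading order across sentences; Morpheme and build_mrph2dmid are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(eq=False)  # keep identity-based equality and hashing
class Morpheme:  # hypothetical stand-in
    surf: str


def build_mrph2dmid(sentences: List[List[Morpheme]]) -> Dict[Morpheme, int]:
    """Number morphemes in reading order across all sentences."""
    flat = (mrph for sentence in sentences for mrph in sentence)
    return {mrph: dmid for dmid, mrph in enumerate(flat)}


sentences = [[Morpheme("本"), Morpheme("は")], [Morpheme("私"), Morpheme("は")]]
mrph2dmid = build_mrph2dmid(sentences)
print([mrph2dmid[m] for s in sentences for m in s])  # [0, 1, 2, 3]
```

With eq=False the dataclass inherits object's __eq__ and __hash__, so the two は morphemes remain distinct keys; keying by surf strings would have collapsed them into one entry.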
-
sentences
A list of the sentences in this document.
Returns: List[Sentence]
-
surf
The surface string of this document.