kyoto_reader.document module

class kyoto_reader.document.Document(knp_string: str, doc_id: str, cases: Collection[str], corefs: Collection[str], relax_cases: bool, extract_nes: bool, use_pas_tag: bool)[source]

Bases: object

A class to represent a document of KWDLC, KyotoCorpus, or AnnotatedFKCCorpus.

Parameters:
  • knp_string (str) – KNP format string of the document.
  • doc_id (str) – A document ID.
  • cases (Collection[str]) – Cases to extract.
  • corefs (Collection[str]) – Coreference relations to extract.
  • relax_cases (bool) – Whether to treat relations marked with “≒” the same as those without it (e.g. ガ≒格 -> ガ格).
  • extract_nes (bool) – Whether to extract named entities.
  • use_pas_tag (bool) – Whether to read predicate-argument structures from <述語項構造: > tags instead of <rel> tags.
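
The effect of cases and relax_cases can be illustrated with a small standalone sketch. It does not use kyoto_reader itself: normalize_case and filter_relations are hypothetical helpers, and relations are modelled as plain (case, target) tuples purely for illustration. The library's actual implementation may differ.

```python
from typing import Collection, List, Tuple

UNCERTAIN_MARK = "≒"  # marker attached to uncertain relations


def normalize_case(case: str, relax_cases: bool) -> str:
    """Drop the "≒" marker when relax_cases is True (hypothetical helper)."""
    return case.replace(UNCERTAIN_MARK, "") if relax_cases else case


def filter_relations(
    relations: List[Tuple[str, str]],
    cases: Collection[str],
    relax_cases: bool,
) -> List[Tuple[str, str]]:
    """Keep only relations whose (possibly relaxed) case is in `cases`."""
    kept = []
    for case, target in relations:
        case = normalize_case(case, relax_cases)
        if case in cases:
            kept.append((case, target))
    return kept


# Toy relations as (case, target) pairs; not real corpus data.
relations = [("ガ", "太郎"), ("ヲ≒", "本"), ("ニ", "図書館")]
strict = filter_relations(relations, cases=["ガ", "ヲ"], relax_cases=False)
relaxed = filter_relations(relations, cases=["ガ", "ヲ"], relax_cases=True)
```

In this sketch, the "ヲ≒" relation is dropped in the strict setting and kept as "ヲ" in the relaxed one.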
knp_string

KNP format string of the document.

Type:str
doc_id

A document ID.

Type:str
cases

Cases to extract.

Type:Collection[str]
corefs

Coreference relations to extract.

Type:Collection[str]
extract_nes

Whether to extract named entities.

Type:bool
sid2sentence

A mapping from a sentence ID to the corresponding sentence.

Type:Dict[str, Sentence]
mentions

A mapping from a document-wide tag ID to the corresponding mention.

Type:Dict[int, Mention]
entities

A mapping from an entity ID to the corresponding entity.

Type:Dict[int, Entity]
named_entities

Extracted named entities.

Type:List[NamedEntity]
__init__(knp_string: str, doc_id: str, cases: Collection[str], corefs: Collection[str], relax_cases: bool, extract_nes: bool, use_pas_tag: bool) → None[source]

Initialize self. See help(type(self)) for accurate signature.

bnst_list() → List[pyknp.knp.bunsetsu.Bunsetsu][source]

Return a list of pyknp Bunsetsu objects.

bp_list() → List[kyoto_reader.base_phrase.BasePhrase][source]

Return list of base phrases.

draw_tree(sid: Optional[str] = None, coreference: bool = True, fh: Optional[TextIO] = None) → None[source]

Write out the PAS and coreference relations in the specified sentence in a tree format.

If sid is not specified, write out the trees for all sentences in this document.

Parameters:
  • sid (str, optional) – A sentence ID of the target sentence.
  • coreference (bool) – If True, write out coreference relations as well.
  • fh (TextIO, optional) – The output stream.
get_arguments(predicate: kyoto_reader.base_phrase.BasePhrase, relax: bool = False, include_optional: bool = False) → Dict[str, List[kyoto_reader.pas.BaseArgument]][source]

Return all the arguments that the given predicate has.

Parameters:
  • predicate (BasePhrase) – A predicate.
  • relax (bool) – If True, return arguments that have a coreference relation with the arguments the predicate has.
  • include_optional (bool) – If True, return adverbial arguments such as “すぐに” as well.
Returns:

A mapping from a case to arguments.

Return type:

Dict[str, List[BaseArgument]]
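
As a rough illustration of what relax=True means for the returned mapping, the following standalone sketch extends each case's argument list with coreferent items. It is not the library's implementation: expand_with_coreference is a hypothetical helper, and arguments are modelled as plain strings rather than BaseArgument objects.

```python
from typing import Dict, List, Set


def expand_with_coreference(
    arguments: Dict[str, List[str]],
    coreferent: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Add, for every argument, the items it is coreferent with (hypothetical)."""
    expanded: Dict[str, List[str]] = {}
    for case, args in arguments.items():
        result = list(args)
        for arg in args:
            # sorted() keeps the output order deterministic
            for other in sorted(coreferent.get(arg, set())):
                if other not in result:
                    result.append(other)
        expanded[case] = result
    return expanded


# Toy data: "彼" is assumed to be coreferent with "太郎".
arguments = {"ガ": ["彼"], "ヲ": ["本"]}
coreferent = {"彼": {"太郎"}}
relaxed = expand_with_coreference(arguments, coreferent)
```

The result keeps the same case keys and only lengthens the argument lists.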

get_entities(bp: kyoto_reader.base_phrase.BasePhrase, include_uncertain: bool = False) → List[kyoto_reader.coreference.Entity][source]

Return a list of entities that the specified mention refers to. The mention is given as a BasePhrase.

Parameters:
  • bp (BasePhrase) – The base phrase that corresponds to the mention.
  • include_uncertain (bool) – Whether to return entities that have an uncertain relation with the mention.
get_predicates() → List[kyoto_reader.base_phrase.BasePhrase][source]

Return list of predicates.

get_siblings(mention: kyoto_reader.coreference.Mention, relax: bool = False) → Set[kyoto_reader.coreference.Mention][source]

Return all the mentions that are in a coreference chain with the specified mention.

Parameters:
  • mention (Mention) – A mention.
  • relax (bool) – If True, return coreferent mentions as well.
Returns:

A set of mentions.

Return type:

Set[Mention]
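
One way to think about siblings is as the other mentions that share at least one entity with the given mention. The sketch below illustrates that idea on plain integer IDs. The siblings function and the mention2entities mapping are hypothetical stand-ins, not part of kyoto_reader, and the sketch ignores the relax option.

```python
from typing import Dict, Set


def siblings(mention_id: int, mention2entities: Dict[int, Set[int]]) -> Set[int]:
    """Return IDs of the other mentions sharing an entity with mention_id."""
    own_entities = mention2entities[mention_id]
    return {
        other
        for other, entities in mention2entities.items()
        if other != mention_id and own_entities & entities
    }


# Toy data: mention ID -> IDs of the entities it refers to.
mention2entities = {0: {0}, 1: {0}, 2: {1}, 3: {0, 1}}
siblings_of_0 = siblings(0, mention2entities)
siblings_of_2 = siblings(2, mention2entities)
```

The mention itself is excluded from its own sibling set in this sketch.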

mrph2dmid

A mapping from morpheme to its document-wide ID.
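
To make the notion of a document-wide ID concrete, here is a minimal sketch that numbers morphemes consecutively across sentence boundaries. It assumes zero-based, consecutive numbering, which the description above does not state. build_document_wide_ids is a hypothetical helper, and morphemes are keyed by (sentence index, position) instead of pyknp Morpheme objects.

```python
from typing import Dict, List, Tuple


def build_document_wide_ids(sentences: List[List[str]]) -> Dict[Tuple[int, int], int]:
    """Map (sentence index, position in sentence) to a running document-wide ID."""
    ids: Dict[Tuple[int, int], int] = {}
    next_id = 0
    for s_idx, morphemes in enumerate(sentences):
        for m_idx in range(len(morphemes)):
            ids[(s_idx, m_idx)] = next_id
            next_id += 1  # the counter is not reset between sentences
    return ids


# Toy document with two sentences of three morphemes each.
sentences = [["太郎", "が", "走る"], ["彼", "は", "速い"]]
toy_mrph2dmid = build_document_wide_ids(sentences)
```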

mrph_list() → List[pyknp.juman.morpheme.Morpheme][source]

Return a list of pyknp Morpheme objects.

pas_list() → List[kyoto_reader.pas.Pas][source]

Return list of predicate-argument structures.

sentences

List of sentences in this document.

Returns:List[Sentence]
stat() → dict[source]

Calculate various kinds of statistics of this document.

surf

A surface expression of this document.

tag_list() → List[pyknp.knp.tag.Tag][source]

Return a list of pyknp Tag objects.