kyoto_reader.document module¶

class kyoto_reader.document.Document(knp_string: str, doc_id: str, cases: Collection[str], corefs: Collection[str], relax_cases: bool, extract_nes: bool, use_pas_tag: bool)[source]¶

Bases: object

A class to represent a document of KWDLC, KyotoCorpus, or AnnotatedFKCCorpus.

Parameters:

knp_string (str) – KNP format string of the document.
doc_id (str) – A document ID.
cases (Collection[str]) – Cases to extract.
corefs (Collection[str]) – Coreference relations to extract.
relax_cases (bool) – Whether to treat relations marked with “≒” as equivalent to those without it (e.g. ガ≒格 -> ガ格).
extract_nes (bool) – Whether to extract named entities.
use_pas_tag (bool) – Whether to read predicate-argument structures from <述語項構造: > tags instead of <rel> tags.
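The effect of relax_cases can be pictured as a normalization of relation names. The following is a minimal sketch of that idea only, not the library's implementation: the helper name relax_relation is hypothetical, and relation names are assumed to be plain strings.

```python
def relax_relation(name: str, relax_cases: bool) -> str:
    """Drop the "≒" marker from a relation name when relax_cases is True.

    Hypothetical helper for illustration; kyoto_reader may do this differently.
    """
    return name.replace("≒", "") if relax_cases else name


# With relax_cases enabled, the "≒" variant collapses onto the plain relation.
for raw in ["ガ格", "ガ≒格", "ヲ≒格"]:
    print(raw, "->", relax_relation(raw, relax_cases=True))
```

With relax_cases=False the name is returned unchanged, so the “≒” variants stay distinct from the plain relations.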

knp_string¶

KNP format string of the document.

Type:	str

doc_id¶

A document ID.

Type:	str

cases¶

Cases to extract.

Type:	Collection[str]

corefs¶

Coreference relations to extract.

Type:	Collection[str]

extract_nes¶

Whether to extract named entities.

Type:	bool

sid2sentence¶

A mapping from a sentence ID to the corresponding sentence.

Type:	Dict[str, Sentence]

mentions¶

A mapping from a document-wide tag ID to the corresponding mention.

Type:	Dict[int, Mention]

entities¶

A mapping from an entity ID to the corresponding entity.

Type:	Dict[int, Entity]

named_entities¶

Extracted named entities.

Type:	List[NamedEntity]

__init__(knp_string: str, doc_id: str, cases: Collection[str], corefs: Collection[str], relax_cases: bool, extract_nes: bool, use_pas_tag: bool) → None[source]¶: Initialize self. See help(type(self)) for accurate signature.

bnst_list() → List[pyknp.knp.bunsetsu.Bunsetsu][source]¶: Return a list of pyknp Bunsetsu objects.

bp_list() → List[kyoto_reader.base_phrase.BasePhrase][source]¶: Return a list of base phrases.

draw_tree(sid: Optional[str] = None, coreference: bool = True, fh: Optional[TextIO] = None) → None[source]¶

Write out the PAS and coreference relations of the specified sentence in a tree format.

If sid is not specified, write out the trees for all the sentences in this document.

Parameters:

sid (str, optional) – A sentence ID of the target sentence.
coreference (bool) – If True, write out coreference relations as well.
fh (TextIO, optional) – The output stream.

get_arguments(predicate: kyoto_reader.base_phrase.BasePhrase, relax: bool = False, include_optional: bool = False) → Dict[str, List[kyoto_reader.pas.BaseArgument]][source]¶

Return all the arguments that the given predicate has.

Parameters:

predicate (BasePhrase) – A predicate.
relax (bool) – If True, also return arguments that are coreferent with the arguments of the predicate.
include_optional (bool) – If True, also return adverbial arguments such as “すぐに”.
Returns:	A mapping from a case to arguments.
Return type:	Dict[str, List[BaseArgument]]

get_entities(bp: kyoto_reader.base_phrase.BasePhrase, include_uncertain: bool = False) → List[kyoto_reader.coreference.Entity][source]¶

Return a list of entities that the specified mention refers to. The mention is given as a BasePhrase.

Parameters:

bp (BasePhrase) – A base phrase corresponding to the mention.
include_uncertain (bool) – Whether to return entities that have an uncertain relation with the mention.
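Combined with bp_list(), get_entities can be used to find every base phrase that refers to at least one entity. The following is a sketch only: entities_per_base_phrase is a hypothetical helper, and _StubDocument is a stand-in with string values so the example runs without a corpus.

```python
from typing import List, Tuple


def entities_per_base_phrase(document, include_uncertain: bool = False) -> List[Tuple[object, list]]:
    """Pair each base phrase with its entities, skipping phrases that have none."""
    pairs = []
    for bp in document.bp_list():
        entities = document.get_entities(bp, include_uncertain=include_uncertain)
        if entities:
            pairs.append((bp, entities))
    return pairs


class _StubDocument:
    """Hypothetical stand-in for Document; base phrases and entities are strings."""

    _certain = {"太郎": ["E0"]}
    _uncertain = {"彼": ["E0"]}

    def bp_list(self):
        return ["太郎", "彼", "走る"]

    def get_entities(self, bp, include_uncertain=False):
        found = list(self._certain.get(bp, []))
        if include_uncertain:
            found += self._uncertain.get(bp, [])
        return found


strict = entities_per_base_phrase(_StubDocument())
relaxed = entities_per_base_phrase(_StubDocument(), include_uncertain=True)
print(len(strict), len(relaxed))
```

The helper returns pairs rather than a dictionary keyed by surface string, since two different base phrases may share the same surface form.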

get_predicates() → List[kyoto_reader.base_phrase.BasePhrase][source]¶: Return a list of predicates.

get_siblings(mention: kyoto_reader.coreference.Mention, relax: bool = False) → Set[kyoto_reader.coreference.Mention][source]¶

Return all the mentions that share a coreference chain with the specified mention.

Parameters:

mention (Mention) – A mention.
relax (bool) – If True, return coreferent mentions as well.
Returns:	A set of mentions.
Return type:	Set[Mention]
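One way to use get_siblings is to group the values of the mentions mapping into coreference clusters. This is a minimal sketch, not part of the library: coreference_clusters is a hypothetical helper, it assumes mentions are hashable (as the Set[Mention] return type suggests), and it makes no assumption about whether get_siblings includes the mention itself. _StubDocument is a stand-in so the example runs without a corpus.

```python
from typing import FrozenSet, Set


def coreference_clusters(document, relax: bool = False) -> Set[FrozenSet]:
    """Collect each mention together with its siblings as one frozenset."""
    clusters = set()
    for mention in document.mentions.values():
        siblings = document.get_siblings(mention, relax=relax)
        if siblings:
            # Union with the mention itself in case it is not in the result.
            clusters.add(frozenset(siblings | {mention}))
    return clusters


class _StubDocument:
    """Hypothetical stand-in for Document; mentions are plain strings."""

    mentions = {0: "太郎", 1: "彼", 2: "パン"}
    _chain = {"太郎", "彼"}

    def get_siblings(self, mention, relax=False):
        return self._chain - {mention} if mention in self._chain else set()


clusters = coreference_clusters(_StubDocument())
print(len(clusters))
```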

mrph2dmid¶: A mapping from morpheme to its document-wide ID.

mrph_list() → List[pyknp.juman.morpheme.Morpheme][source]¶: Return a list of pyknp Morpheme objects.

pas_list() → List[kyoto_reader.pas.Pas][source]¶: Return list of predicate-argument structures.

sentences¶

List of sentences in this document.

Returns:	List[Sentence]

stat() → dict[source]¶: Calculate various statistics of this document.

surf¶: The surface expression of this document.

tag_list() → List[pyknp.knp.tag.Tag][source]¶: Return a list of pyknp Tag objects.