pke - python keyphrase extraction

pke is an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction models, and ships with supervised models trained on the SemEval-2010 dataset.

Installation

To install pke from GitHub using pip:

pip install git+https://github.com/boudinfl/pke.git

pke also requires external resources that can be obtained using:

python -m nltk.downloader stopwords
python -m nltk.downloader universal_tagset
python -m spacy download en_core_web_sm # download the English model

As of April 2019, pke only supports Python 3.6+.
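To confirm that the installation steps above took effect, a quick environment check can help. The following is a minimal sketch, not part of pke: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the check only looks for importable packages. It does not verify the NLTK data (`stopwords`, `universal_tagset`), which are downloaded resources rather than modules.

```python
import sys
from importlib.util import find_spec

# Packages the installation section asks for; en_core_web_sm is installed
# as a regular Python package by `python -m spacy download`.
REQUIRED_MODULES = ("pke", "nltk", "spacy", "en_core_web_sm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Missing modules:", missing_modules())
```

An empty list of missing modules means the packages are importable; anything listed can be installed with the commands above.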

Minimal example

pke provides a standardized API for extracting keyphrases from a document. Start by typing the lines below. To use another model, replace pke.unsupervised.TopicRank with another model class (see Implemented models).

import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document; here the document is expected to be in raw
# format (i.e. a plain text file) and preprocessing is carried out using spacy
extractor.load_document(input='/path/to/input.txt', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection: keyphrases contains the 10 highest-scoring candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)
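The candidate weighting comment above mentions a random walk. To give an intuition for that idea, here is a toy sketch of a PageRank-style random walk over a small weighted, undirected graph. It is an illustration only and not pke's implementation: the function name `random_walk_scores`, the damping value, the iteration count, and the example graph are all assumptions.

```python
def random_walk_scores(graph, damping=0.85, iterations=50):
    """Score nodes of an undirected weighted graph by power iteration.

    `graph` maps each node to a dict {neighbor: edge_weight}; every edge
    is expected to be listed in both directions.
    """
    nodes = list(graph)
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Sum the share of score that each neighbor passes to `node`,
            # proportional to the weight of the connecting edge.
            incoming = 0.0
            for neighbor, weight in graph[node].items():
                total = sum(graph[neighbor].values())
                if total > 0:
                    incoming += scores[neighbor] * weight / total
            updated[node] = (1.0 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


# Hypothetical graph: nodes stand in for candidates, weights for relatedness.
toy_graph = {
    "keyphrase extraction": {"toolkit": 1.0, "model": 2.0},
    "toolkit": {"keyphrase extraction": 1.0},
    "model": {"keyphrase extraction": 2.0},
}
toy_scores = random_walk_scores(toy_graph)
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

Nodes that are strongly connected to other well-scored nodes end up ranked higher, which is the intuition behind graph-based weighting.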

A detailed example is provided in the examples/ directory.
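The steps of the minimal example can also be wrapped in a small helper so that switching models becomes a single argument. This is a sketch under assumptions: `extract_keyphrases` and its `model_factory` parameter are hypothetical names, not part of pke, and the helper relies only on the calls shown in the minimal example, using each model's default settings.

```python
def extract_keyphrases(path, model_factory=None, language="en", n=10):
    """Run the load / select / weight / n-best steps and return the result.

    `model_factory` is any callable returning an extractor that exposes the
    methods used in the minimal example; it defaults to TopicRank.
    """
    if model_factory is None:
        # Imported lazily so the helper can be defined without pke installed.
        import pke
        model_factory = pke.unsupervised.TopicRank

    extractor = model_factory()
    extractor.load_document(input=path, language=language)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    # List of (keyphrase, score) tuples, highest score first.
    return extractor.get_n_best(n=n)
```

For example, `extract_keyphrases('/path/to/input.txt')` runs TopicRank, and passing another model class as `model_factory` runs that model instead, provided its default candidate selection and weighting settings are acceptable.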

Getting started

Tutorials and code documentation are available at https://boudinfl.github.io/pke/.

Implemented models

pke currently implements the following keyphrase extraction models:

Citing pke

If you use pke, please cite the following paper:

@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
