In Smart Harvesting II, we investigated how to schedule data ingestion for curators.
In particular, we examined whether curators' decisions to ingest a particular dataset at a given point in time can be modeled using historical data and features of the data streams (in our case, conference proceedings).
Our first approach to this problem was published as "Prioritizing and Scheduling Conferences for Metadata Harvesting in dblp" at JCDL 2018.
See the Publications page for more details.
Below is a demonstration of the tool we envision. Each conference is assigned a score based on temporal relations of past data ingestions and on features such as rating, size, or internationality.
The curator can decide which features to include in the final score.
The conferences with the highest scores should then be given top priority for ingestion.
For the demo, we assume that the current month is 2018-08.
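The idea of letting the curator pick features and ranking conferences by the resulting score can be sketched as follows. This is only an illustration under stated assumptions, not the scoring used in the tool or the paper: the `Conference` class, the `score` and `rank` functions, the conference keys, and the feature values are all hypothetical, features are assumed to be normalized to the range 0 to 1, and the unweighted mean is a stand-in for whatever combination the actual tool applies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Conference:
    """Hypothetical record: a conference key plus normalized feature values."""
    key: str
    features: dict[str, float]  # assumed to be scaled to [0, 1]


def score(conf: Conference, selected: list[str]) -> float:
    """Combine the selected features into one score (unweighted mean, an assumption)."""
    if not selected:
        raise ValueError("at least one feature must be selected")
    return sum(conf.features[name] for name in selected) / len(selected)


def rank(conferences: list[Conference], selected: list[str]) -> list[Conference]:
    """Order conferences so that the highest score comes first."""
    return sorted(conferences, key=lambda c: score(c, selected), reverse=True)


# Made-up example data; keys and values are purely illustrative.
conferences = [
    Conference("conf/a", {"activity": 0.9, "rating": 0.2}),
    Conference("conf/b", {"activity": 0.4, "rating": 0.8}),
    Conference("conf/c", {"activity": 0.1, "rating": 0.5}),
]

by_rating = [c.key for c in rank(conferences, ["rating"])]
by_activity = [c.key for c in rank(conferences, ["activity"])]
by_both = [c.key for c in rank(conferences, ["activity", "rating"])]
```

With this made-up data, selecting a different feature set changes the order, which mirrors the feature choices offered in the demo (Rating, Activity, Activity+Rating).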
View the ranking that is based on the feature (reset): Rating, Activity, Activity+Rating
Conference Key | Score | Interval | Month | Delay | Last Entry | Expected Next | Activity | Rating | Prominence | Internationality | Size | Affiliations | Log score |