Title: | Explore 'Wikidata' Through Tidy Data Frames |
---|---|
Description: | Query 'Wikidata' API <https://www.wikidata.org/wiki/Wikidata:Main_Page> with ease, get tidy data frames in response, and cache data in a local database. |
Authors: | Giorgio Comai [aut, cre, cph] , EDJNet [fnd] |
Maintainer: | Giorgio Comai <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.9.9000 |
Built: | 2024-11-17 06:27:23 UTC |
Source: | https://github.com/edjnet/tidywikidatar |
Mostly used internally in functions, exported for reference.
tw_check_cache(cache = NULL)
cache |
Defaults to NULL. If NULL, checks current cache settings. If given, returns the given value, ignoring the current cache settings. |
Either TRUE or FALSE, depending on current cache settings.
if (interactive()) {
  tw_check_cache()
}
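The precedence described above (an explicit argument wins, otherwise the session setting is used) can be illustrated with a minimal base R sketch. The function `check_cache_sketch()` and the environment variable `EXAMPLE_TW_CACHE` are hypothetical names chosen for this illustration; they are not part of tidywikidatar, and the package may store its setting differently.

```r
# Hypothetical helper and environment variable, used for this sketch only
check_cache_sketch <- function(cache = NULL) {
  if (!is.null(cache)) {
    # An explicit value takes precedence over the session setting
    return(isTRUE(cache))
  }
  # Otherwise fall back to the (assumed) session-level setting
  identical(Sys.getenv("EXAMPLE_TW_CACHE", unset = "FALSE"), "TRUE")
}

Sys.setenv(EXAMPLE_TW_CACHE = "TRUE")
from_setting <- check_cache_sketch() # follows the session setting
overridden <- check_cache_sketch(FALSE) # explicit argument ignores the setting
Sys.unsetenv("EXAMPLE_TW_CACHE")
after_unset <- check_cache_sketch() # no setting, so caching is treated as off
```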
Checks if the cache folder exists; if not, returns an informative message.
tw_check_cache_folder()
If the cache folder exists, returns TRUE. Otherwise throws an error.
# If cache folder does not exist, it throws an error
tryCatch(tw_check_cache_folder(),
  error = function(e) {
    return(e)
  }
)

# Create cache folder
tw_set_cache_folder(path = fs::path(
  tempdir(),
  "tw_cache_folder"
))
tw_create_cache_folder(ask = FALSE)
tw_check_cache_folder()
Tested only with SQLite and MySQL; may work with other drivers. Used to check if a given cache table is indexed (tables created with any version of tidywikidatar before 0.6 are probably not indexed and therefore less efficient).
tw_check_cache_index( table_name = NULL, type = "item", show_details = FALSE, language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
table_name |
Name of the table in the database. If given, it takes precedence over other parameters. |
type |
Defaults to "item". Type of cache file to output. Values typically used by |
show_details |
Logical, defaults to FALSE. If FALSE, return a logical vector of length one (TRUE if the table was indexed, FALSE if it was not). If TRUE, returns a data frame with more details about the index. |
language |
Defaults to language set with |
response_language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
If show_details is set to FALSE, returns a logical vector of length one (TRUE if the table was indexed, FALSE if it was not). If show_details is set to TRUE, returns a data frame with more details about the index.
if (interactive()) {
  tw_enable_cache()
  tw_set_cache_folder(path = fs::path(
    fs::path_home_r(),
    "R",
    "tw_data"
  ))
  tw_set_language(language = "en")
  tw_check_cache_index()
}
Check if given items are present in cache
tw_check_cached_items( id, language = tidywikidatar::tw_get_language(), cache_connection = NULL, disconnect_db = TRUE )
id |
A character vector. Each element must start with Q, and correspond to a Wikidata identifier. |
language |
Defaults to language set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A character vector with the IDs of items present in cache. If no item is found in cache, returns NULL.
if (interactive()) {
  tw_set_cache_folder(path = tempdir())
  tw_enable_cache()
  tw_create_cache_folder(ask = FALSE)

  # add three items to local cache
  invisible(tw_get(id = "Q180099", language = "en"))
  invisible(tw_get(id = "Q228822", language = "en"))
  invisible(tw_get(id = "Q184992", language = "en"))

  # check if these other items are in cache
  items_in_cache <- tw_check_cached_items(
    id = c(
      "Q180099",
      "Q228822",
      "Q76857"
    ),
    language = "en"
  )
  # it should return only the two items from the current list of id
  # but not other item already in cache
  items_in_cache
}
Mostly used internally by other functions.
tw_check_pid(property, logical_vector = FALSE, non_pid_as_NA = FALSE)
property |
A character vector of one or more Wikidata property identifiers. |
logical_vector |
Logical, defaults to FALSE. If TRUE, returns a logical vector of the same length as input, where TRUE corresponds to seemingly meaningful property identifiers. |
non_pid_as_NA |
Logical, defaults to FALSE. If TRUE (and if |
A character vector with only strings appearing to be Wikidata identifiers; possibly shorter than input
tw_check_pid(property = c("P19", "p20", "Not an property id", "20", NA, "Q5", ""))

tw_check_pid(
  property = c("P19", "p20", "Not an property id", "20", NA, "Q5", ""),
  logical_vector = TRUE
)

tw_check_pid(
  property = c("P19", "p20", "Not an property id", "20", NA, "Q5", ""),
  non_pid_as_NA = TRUE
)
Mostly used internally by other functions.
tw_check_qid(id, logical_vector = FALSE, non_id_as_NA = FALSE)
id |
A character vector of one or more Wikidata identifiers. |
logical_vector |
Logical, defaults to FALSE. If TRUE, returns a logical vector of the same length as input, where TRUE corresponds to seemingly meaningful Q identifiers. |
non_id_as_NA |
Logical, defaults to FALSE. If TRUE (and if |
A character vector with only strings appearing to be Wikidata identifiers; possibly shorter than input
tw_check_qid(id = c("Q180099", "q228822", "Not an id", "00180099", NA, "Q5"))

tw_check_qid(
  id = c("Q180099", "q228822", "Not an id", "00180099", NA, "Q5"),
  logical_vector = TRUE
)

tw_check_qid(
  id = c("Q180099", "q228822", "Not an id", "00180099", NA, "Q5"),
  non_id_as_NA = TRUE
)
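To make the three output modes concrete, here is a minimal base R sketch of this kind of check. It is not the package's implementation: `check_qid_sketch()` is a hypothetical helper, and the rule it applies (a capital "Q" followed only by digits) is an assumption; the actual function may, for example, treat lowercase "q" differently. The sketch also assumes that `logical_vector` takes precedence over `non_id_as_NA`.

```r
# Hypothetical helper, not part of tidywikidatar.
# Assumed rule: a valid identifier is a capital "Q" followed by digits only.
check_qid_sketch <- function(id, logical_vector = FALSE, non_id_as_NA = FALSE) {
  id <- as.character(id)
  # grepl() returns FALSE for NA input, so missing values count as invalid
  looks_valid <- grepl(pattern = "^Q[0-9]+$", x = id)
  if (logical_vector) {
    # Same length as the input, TRUE where the string looks like an identifier
    return(looks_valid)
  }
  if (non_id_as_NA) {
    # Same length as the input, invalid entries replaced by NA
    id[!looks_valid] <- NA_character_
    return(id)
  }
  # Default: keep only valid-looking strings, possibly shorter than the input
  id[looks_valid]
}

ids <- c("Q180099", "q228822", "Not an id", "00180099", NA, "Q5")
check_qid_sketch(ids)
check_qid_sketch(ids, logical_vector = TRUE)
check_qid_sketch(ids, non_id_as_NA = TRUE)
```

Returning a vector of the same length as the input (the last two modes) is convenient when the check is applied to a column of a data frame, where dropping elements would break row alignment.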
Mostly used as a convenience function inside other functions to have consistent inputs.
tw_check_search( search, type = "item", language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, wait = 0, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A string to be searched in Wikidata |
type |
Defaults to "item". Either "item" or "property". |
language |
Language to be used for the search. Can be set once per session with |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame with three columns, id, label, and description, filtered by the above criteria.
# The following two lines should give the same result.

tw_check_search("Sylvia Pankhurst")
tw_check_search(tw_search("Sylvia Pankhurst"))
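The description of the wait argument above says that the waiting time is skipped when data are already cached locally. A minimal base R sketch of that pattern follows. Everything in it is hypothetical: `fake_remote_query()` stands in for a real request and does not contact Wikidata, and an in-memory environment stands in for the local database used by the package.

```r
# Hypothetical stand-in for a remote query; it does not contact Wikidata
fake_remote_query <- function(search) {
  paste("result for", search)
}

# An in-memory environment plays the role of the local cache in this sketch
local_cache <- new.env()

search_with_wait <- function(search, wait = 0) {
  if (exists(search, envir = local_cache, inherits = FALSE)) {
    # Cached locally: return at once, the waiting time is not applied
    return(get(search, envir = local_cache, inherits = FALSE))
  }
  Sys.sleep(wait) # pause only before a query that would reach the server
  result <- fake_remote_query(search)
  assign(search, result, envir = local_cache)
  result
}

first <- search_with_wait("Sylvia Pankhurst", wait = 0.1) # waits, then stores
second <- search_with_wait("Sylvia Pankhurst", wait = 0.1) # served from cache
```

With this arrangement a generous waiting time only slows down the first pass over a list of queries; repeated runs over the same list return immediately.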
Return a connection to be used for caching
tw_connect_to_cache( connection = NULL, RSQLite = NULL, language = tidywikidatar::tw_get_language(), cache = NULL )
connection |
Defaults to NULL. If NULL, uses local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example). |
RSQLite |
Defaults to NULL, expected either NULL or logical. If set to |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
A connection object.
if (interactive()) {
  cache_connection <- pool::dbPool(
    RSQLite::SQLite(), # or e.g. odbc::odbc(),
    Driver = ":memory:", # or e.g. "MariaDB",
    Host = "localhost",
    database = "example_db",
    UID = "example_user",
    PWD = "example_pwd"
  )
  tw_connect_to_cache(cache_connection)

  db_settings <- list(
    driver = "MySQL",
    host = "localhost",
    server = "localhost",
    port = 3306,
    database = "tidywikidatar",
    user = "secret_username",
    pwd = "secret_password"
  )

  tw_connect_to_cache(db_settings)
}
Creates the base cache folder where tidywikidatar caches data.
tw_create_cache_folder(ask = TRUE)
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_create_cache_folder() }
Disable caching for the current session
tw_disable_cache()
Nothing, used for its side effects.
if (interactive()) { tw_disable_cache() }
Ensure that connection to cache is disconnected consistently
tw_disconnect_from_cache( cache = NULL, cache_connection = NULL, disconnect_db = TRUE, language = tidywikidatar::tw_get_language() )
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
language |
Defaults to language set with |
Nothing, used for its side effects.
if (interactive()) {
  tw_get(
    id = c("Q180099"),
    language = "en"
  )
  tw_disconnect_from_cache()
}
A zero-row tibble used internally when tw_get_image_metadata() would not return any value.
tw_empty_image_metadata
A data frame with 0 rows and 19 columns
A zero-row tibble used internally when tw_get() would not return any value.
tw_empty_item
A data frame with 0 rows and 3 columns
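The point of keeping a zero-row object with the right columns is that a function can return it when nothing is found, so results can still be combined without special-casing. The base R sketch below illustrates the idea with plain data frames rather than tibbles. The column names (id, property, value) are an assumption taken from the description of tw_get(); `empty_item_sketch`, `toy_store`, and `get_item_sketch()` are hypothetical names used only here.

```r
# Zero-row frame with the assumed column names and character columns
empty_item_sketch <- data.frame(
  id = character(0),
  property = character(0),
  value = character(0),
  stringsAsFactors = FALSE
)

# Toy stand-in for a data source holding a single statement
toy_store <- data.frame(
  id = "Q180099",
  property = "P31",
  value = "Q5",
  stringsAsFactors = FALSE
)

# Hypothetical retrieval function: falls back to the empty frame
get_item_sketch <- function(id) {
  found <- toy_store[toy_store$id == id, , drop = FALSE]
  if (nrow(found) == 0) {
    return(empty_item_sketch)
  }
  found
}

# Combining results works even though one lookup found nothing
combined <- do.call(rbind, lapply(c("Q180099", "not_in_store"), get_item_sketch))
```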
A zero-row tibble used internally when tw_get_qualifiers() would not return any value.
tw_empty_qualifiers
A data frame with 0 rows and 8 columns
A zero-row tibble used internally when tw_search() would not return any value.
tw_empty_search
A data frame with 0 rows and 4 columns
A zero-row tibble used internally when tw_empty_wikipedia_category_members() would not return any value.
tw_empty_wikipedia_category_members
A data frame with 0 rows and 3 columns
A zero-row tibble used internally when tw_get_wikipedia_page_qid() would not return any value.
tw_empty_wikipedia_page
A data frame with 0 rows and 6 columns
A zero-row tibble used internally when tw_get_wikipedia_page_links() would not return any value.
tw_empty_wikipedia_page_links
A data frame with 0 rows and 8 columns
A zero-row tibble used internally when tw_get_wikipedia_page_sections() would not return any value.
tw_empty_wikipedia_page_sections
A data frame with 0 rows and 8 columns
Enable caching for the current session
tw_enable_cache(SQLite = TRUE)
SQLite |
Logical, defaults to TRUE. Set to FALSE to use custom database options. See |
Nothing, used for its side effects.
if (interactive()) { tw_enable_cache() }
WikidataR
This function is mostly used internally and for testing.
tw_extract_qualifier(id, p, w = NULL)
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector of length 1, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
w |
An object of class Wikidata created with |
A data frame (a tibble) with eight columns: id for the input id, property, qualifier_id, qualifier_property, qualifier_value, rank, qualifier_value_type, and set (to distinguish sets of data when a property is present more than once).
w <- WikidataR::get_item(id = "Q180099")
tw_extract_qualifier(id = "Q180099", p = "P26", w = w)
WikidataR
This function is mostly used internally and for testing.
tw_extract_single(w, language = tidywikidatar::tw_get_language())
w |
An object of class Wikidata created with |
language |
Defaults to language set with |
A data frame (a tibble) with four columns, such as the one created by tw_get.
item <- tryCatch(WikidataR::get_item(id = "Q180099"),
  error = function(e) {
    as.character(e[[1]])
  }
)

tidywikidatar:::tw_extract_single(w = item)
Filter search result and keep only items with matching property and Q identifier
tw_filter( search, p, q, language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, wait = 0, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A data frame generated by |
p |
A character vector of length 1, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
q |
A character vector of length 1, a wikidata id. Must always start with the capital letter "Q", e.g. "Q5" for "human being". |
language |
Language to be used for the search. Can be set once per session with |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame with three columns, id, label, and description, filtered by the above criteria.
tw_search(search = "Margaret Mead", limit = 3) %>%
  tw_filter(p = "P31", q = "Q5")
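Conceptually, this kind of filter keeps a search result only if the corresponding item has a statement with the given property and value. The base R sketch below shows that logic on toy data. It is an illustration rather than the package's code: `filter_sketch()` is a hypothetical helper, and identifiers such as "Q_A" and "Q_B" are placeholders, not real Wikidata items.

```r
# Toy search results with placeholder identifiers (not real Wikidata items)
toy_search <- data.frame(
  id = c("Q_A", "Q_B"),
  label = c("Example person", "Example book"),
  description = c("placeholder description", "placeholder description"),
  stringsAsFactors = FALSE
)

# Toy statements in long format: one row per id, property, and value
toy_statements <- data.frame(
  id = c("Q_A", "Q_A", "Q_B"),
  property = c("P31", "P21", "P31"),
  value = c("Q5", "Q_X", "Q_Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep search rows whose item has property p with value q
filter_sketch <- function(search, statements, p, q) {
  matching <- statements$id[statements$property == p & statements$value == q]
  search[search$id %in% matching, , drop = FALSE]
}

people <- filter_sketch(toy_search, toy_statements, p = "P31", q = "Q5")
people
```

The output keeps the same three columns as the input, which is what allows such a filter to sit in the middle of a pipeline that starts from search results.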
Same as tw_filter(), but consistently returns data frames with a single row.
tw_filter_first( search, p, q, language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, wait = 0, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A data frame generated by |
p |
A character vector of length 1, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
q |
A character vector of length 1, a wikidata id. Must always start with the capital letter "Q", e.g. "Q5" for "human being". |
language |
Language to be used for the search. |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame with one row and three columns, id, label, and description, filtered by the above criteria.
tw_search("Margaret Mead") %>% tw_filter_first(p = "P31", q = "Q5")
A wrapper around tw_filter() that by default keeps only items that are "instance of" (P31) "human being" (Q5).
tw_filter_people( search, language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, stop_at_first = TRUE, wait = 0, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A data frame generated by |
language |
Language to be used for the search. |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
stop_at_first |
Logical, defaults to TRUE. If TRUE, returns only the first match from the search that satisfies the criteria. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame with three columns, id, label, and description; all rows refer to a human being.
tw_search("Ruth Benedict")

tw_search("Ruth Benedict") %>%
  tw_filter_people()
Return (most) information from a Wikidata item in a tidy format
tw_get( id, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0, id_l = NULL )
id |
A character vector, must start with Q, e.g. "Q180099" for the anthropologist Margaret Mead. Can also be a data frame of one row, typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local SQLite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
id_l |
Defaults to NULL. If given, must be an object or list such as the one generated with |
A data.frame (a tibble) with three columns (id, property, and value).
if (interactive()) {
  tw_get(
    id = c("Q180099", "Q228822"),
    language = "en"
  )
}

## using `tw_test_items` in examples in order to show output without calling
## on Wikidata servers

tw_get(
  id = c("Q180099", "Q228822"),
  language = "en",
  id_l = tw_test_items
)
This function does not cache results.
tw_get_all_with_p( p, fields = c("item", "itemLabel", "itemDescription"), language = tidywikidatar::tw_get_language(), method = "SPARQL", wait = 0.1, limit = Inf, return_as_tw_search = TRUE )
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
fields |
A character vector of Wikidata fields. Ignored if |
language |
Defaults to language set with |
method |
Defaults to "SPARQL". The only accepted alternative value is "JSON", to use the JSON-based API instead. |
wait |
Defaults to 0.1. Used only if method is set to "JSON". |
limit |
Defaults to |
return_as_tw_search |
Logical, defaults to TRUE. If TRUE, returns a data frame with three columns (id, label, and description) that can be piped to other |
A data frame with three columns if method is set to "SPARQL", or as many columns as fields if more are given and return_as_tw_search is set to FALSE. A single column with Wikidata identifiers if method is set to "JSON".
if (interactive()) {
  # get all Wikidata items with an ICAO airport code ("P239")
  tw_get_all_with_p(p = "P239", limit = 10)
}
Typically set with tw_set_cache_db()
tw_get_cache_db()
A list with all database parameters as stored in environment variables.
tw_get_cache_db()
Gets location of cache file
tw_get_cache_file(type = NULL, language = tidywikidatar::tw_get_language())
type |
Defaults to NULL. Deprecated. If given, type of cache file to output. Values typically used by |
language |
Defaults to language set with |
A character vector of length one with location of item cache file.
tw_set_cache_folder(path = tempdir())
sqlite_cache_file_location <- tw_get_cache_file() # outputs location of cache file
Gets name of table inside the database
tw_get_cache_table_name( type = "item", language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language() )
type |
Defaults to "item". Type of cache file to output. Values typically used by |
language |
Defaults to language set with |
response_language |
Defaults to language set with |
A character vector of length one with the name of the relevant table in the cache file.
# outputs name of table used in the cache database
tw_get_cache_table_name(type = "item", language = "en")
Retrieve cached item
tw_get_cached_item( id, language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
id |
A character vector, must start with Q, e.g. "Q180099" for the anthropologist Margaret Mead. Can also be a data frame of one row, typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection open. |
If data present in cache, returns a data frame with cached data.
tw_set_cache_folder(path = tempdir())
tw_enable_cache()
tw_create_cache_folder(ask = FALSE)

df_from_api <- tw_get(id = "Q180099", language = "en")

df_from_cache <- tw_get_cached_item(
  id = "Q180099",
  language = "en"
)
Retrieve cached qualifier
tw_get_cached_qualifiers( id, p, language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
id |
A character vector, must start with Q, e.g. "Q180099" for the anthropologist Margaret Mead. Can also be a data frame of one row, typically generated with |
p |
A character vector of length 1, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection open. |
If data present in cache, returns a data frame with cached data.
tw_set_cache_folder(path = tempdir())
tw_enable_cache()
tw_create_cache_folder(ask = FALSE)

df_from_api <- tw_get_qualifiers(id = "Q180099", p = "P26", language = "en")

df_from_cache <- tw_get_cached_qualifiers(
  id = "Q180099",
  p = "P26",
  language = "en"
)

df_from_cache
Retrieve cached search
tw_get_cached_search( search, type = "item", language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), cache = NULL, include_search = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A string to be searched in Wikidata |
type |
Defaults to "item". Either "item" or "property". |
language |
Language to be used for the search. Can be set once per session with |
response_language |
Language to be used for the returned labels and descriptions. Corresponds to the |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
If data present in cache, returns a data frame with cached data.
tw_set_cache_folder(path = tempdir())
tw_enable_cache()
tw_create_cache_folder(ask = FALSE)

search_from_api <- tw_search("Sylvia Pankhurst")
search_from_api

df_from_cache <- tw_get_cached_search("Sylvia Pankhurst")
df_from_cache
Mostly used internally.
tw_get_cached_wikipedia_category_members( category, type = "page", language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
category |
Title of a Wikipedia category page or final parts of its url. Must include "Category:", or equivalent in other languages. If given, url can be left empty, but language must be provided. |
type |
Defaults to "page", defines which kind of members of a category are returned. Valid values include "page", "file", and "subcat" (for sub-category). Corresponds to |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
If data are present in the cache, returns a data frame with the cached data.
if (interactive()) {
  tw_set_cache_folder(path = tempdir())
  tw_enable_cache()
  tw_create_cache_folder(ask = FALSE)

  # Retrieve category members from the API first, so that they are cached
  df_from_api <- tw_get_wikipedia_category_members(
    category = "Category:American women anthropologists",
    language = "en"
  )

  df_from_cache <- tw_get_cached_wikipedia_category_members(
    category = "Category:American women anthropologists",
    language = "en"
  )

  df_from_cache
}
Mostly used internally.
tw_get_cached_wikipedia_page_links( title, language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection open. |
If data are present in the cache, returns a data frame with the cached data.
if (interactive()) {
  tw_set_cache_folder(path = tempdir())
  tw_enable_cache()
  tw_create_cache_folder(ask = FALSE)

  # Retrieve page links from the API first, so that they are cached
  df_from_api <- tw_get_wikipedia_page_links(title = "Margaret Mead", language = "en")

  df_from_cache <- tw_get_cached_wikipedia_page_links(
    title = "Margaret Mead",
    language = "en"
  )

  df_from_cache
}
Mostly used internally.
tw_get_cached_wikipedia_page_qid( title, language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection open. |
If data are present in the cache, returns a data frame with the cached data.
if (interactive()) {
  tw_set_cache_folder(path = tempdir())
  tw_enable_cache()
  tw_create_cache_folder(ask = FALSE)

  df_from_api <- tw_get_wikipedia_page_qid(title = "Margaret Mead", language = "en")

  df_from_cache <- tw_get_cached_wikipedia_page_qid(
    title = "Margaret Mead",
    language = "en"
  )

  df_from_cache
}
Mostly used internally.
tw_get_cached_wikipedia_page_sections( title, language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection open. |
If data are present in the cache, returns a data frame with the cached data.
if (interactive()) {
  tw_set_cache_folder(path = tempdir())
  tw_enable_cache()
  tw_create_cache_folder(ask = FALSE)

  # Retrieve page sections from the API first, so that they are cached
  df_from_api <- tw_get_wikipedia_page_sections(title = "Margaret Mead", language = "en")

  df_from_cache <- tw_get_cached_wikipedia_page_sections(
    title = "Margaret Mead",
    language = "en"
  )

  df_from_cache
}
Get Wikidata description in given language
tw_get_description( id, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector of the same length as the vector of id given, with the Wikidata description in the requested language.
tw_get_description( id = c( "Q180099", "Q228822" ), language = "en" )
Gets a field such as a label or description from a data frame typically generated with tw_get()
tw_get_field(df, field, id, language = tidywikidatar::tw_get_language())
df |
A data frame typically generated with |
field |
A character vector of length one. Typically, either "label" or "description". |
id |
A character vector, typically of Wikidata identifiers. The output will be of the same length and in the same order as the identifiers provided with this parameter. |
language |
Defaults to language set with |
A character vector of the same length, and with data in the same order, as id.
tw_get("Q180099") %>% tw_get_field(field = "label", id = "Q180099")
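The lookup described above can be pictured with a small base R sketch. It is not the package's implementation: `get_field_sketch()` is a hypothetical helper, the sample data frame is made up, and the column names (`id`, `property`, `value`) and the `label_en` naming pattern are assumptions about what a data frame from tw_get() may look like.

```r
# Made-up sample data; column names and "label_en" style are assumptions.
item_df <- data.frame(
  id = c("Q180099", "Q180099", "Q228822"),
  property = c("label_en", "description_en", "label_en"),
  value = c("Margaret Mead", "Example description", "Example label"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one value per requested id, in the same order.
get_field_sketch <- function(df, field, id, language = "en") {
  field_name <- paste(field, language, sep = "_")
  rows <- df[df$property == field_name, , drop = FALSE]
  # match() keeps the output aligned with `id`; unknown ids become NA
  rows$value[match(id, rows$id)]
}

labels <- get_field_sketch(
  item_df,
  field = "label",
  id = c("Q228822", "Q1", "Q180099")
)
labels
```

The point of the sketch is the alignment: the output has one element per requested identifier, in the requested order, with NA where nothing is found.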
Please consult the relevant documentation for reusing content outside Wikimedia: https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia/technical
tw_get_image( id, format = "filename", width = NULL, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
format |
A character vector, defaults to 'filename'. If set to 'commons', outputs the link to the Wikimedia Commons page. If set to "embed", outputs a link that can be used to embed. |
width |
A numeric value, defaults to NULL, relevant only if format is set to 'embed'. If not given, defaults to full resolution image. |
language |
Needed for caching, defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A data frame of two columns, id and image, where image is a reference to the image in the requested format.
tw_get_image("Q180099",
  format = "filename"
)

if (interactive()) {
  tw_get_image("Q180099",
    format = "commons"
  )

  tw_get_image("Q180099",
    format = "embed",
    width = 300
  )
}
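To make the role of `width` more concrete, here is a minimal sketch of how an embeddable link could be built from a file name using the Wikimedia Commons `Special:FilePath` redirect. This is an assumption for illustration only: `commons_embed_url()` is a hypothetical helper, the file name is made up, and the URL returned by tw_get_image() with format = "embed" may be built differently.

```r
# Hypothetical helper, not part of tidywikidatar.
commons_embed_url <- function(filename, width = NULL) {
  # Commons file names use underscores instead of spaces
  encoded <- utils::URLencode(gsub(" ", "_", filename), reserved = TRUE)
  url <- paste0(
    "https://commons.wikimedia.org/wiki/Special:FilePath/",
    encoded
  )
  # Without a width, the link points to the full resolution image
  if (!is.null(width)) {
    url <- paste0(url, "?width=", width)
  }
  url
}

embed_url <- commons_embed_url("Example image.jpg", width = 300)
full_url <- commons_embed_url("Example image.jpg")
```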
Please consult the relevant documentation for reusing content outside Wikimedia: https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia/technical
tw_get_image_metadata( id, image_filename = NULL, only_first = TRUE, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
image_filename |
Defaults to NULL. If NULL, |
only_first |
Defaults to TRUE. If TRUE, returns metadata only for the first image associated with a given Wikidata id. If FALSE, returns all images available. |
language |
Needed for caching, defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A character vector, corresponding to reference to the image in the requested format.
if (interactive()) { tw_get_image_metadata("Q180099") }
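The interplay between `wait` and `attempts` can be sketched in base R as a simple retry loop. This is only an illustration of the general pattern under stated assumptions: `retry_sketch()` and `flaky_fetch()` are hypothetical names, and the package's own retry logic may differ.

```r
# Hypothetical retry helper: calls `fetch` until it succeeds or
# `attempts` is exhausted, pausing `wait` seconds between tries.
retry_sketch <- function(fetch, attempts = 10, wait = 1) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    if (i < attempts) {
      Sys.sleep(wait)
    }
  }
  stop("No response after ", attempts, " attempts.")
}

# Simulated API call that fails twice before succeeding
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) {
    stop("temporary failure")
  }
  "ok"
}

retry_result <- retry_sketch(flaky_fetch, attempts = 5, wait = 0)
```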
Please consult the relevant documentation for reusing content outside Wikimedia: https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia/technical
tw_get_image_metadata_single( id, image_filename = NULL, only_first = TRUE, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, read_cache = TRUE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
image_filename |
Defaults to NULL. If NULL, |
only_first |
Defaults to TRUE. If TRUE, returns metadata only for the first image associated with a given Wikidata id. If FALSE, returns all images available. |
language |
Needed for caching, defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
read_cache |
Logical, defaults to TRUE. Mostly used internally to prevent checking if an item is in cache if it is already known that it is not in cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A character vector, corresponding to reference to the image in the requested format.
if (interactive()) { tw_get_image_metadata_single("Q180099") }
Please consult the relevant documentation for reusing content outside Wikimedia: https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia/technical
tw_get_image_same_length( id, format = "filename", as_tibble = FALSE, only_first = TRUE, width = NULL, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
format |
A character vector, defaults to 'filename'. If set to 'commons', outputs the link to the Wikimedia Commons page. If set to "embed", outputs a link that can be used to embed. |
as_tibble |
Defaults to FALSE. If TRUE, returns a data frame instead of a character vector. |
only_first |
Defaults to TRUE. If TRUE, returns only the first image associated with a given Wikidata id. If FALSE, returns all images available. |
width |
A numeric value, defaults to NULL, relevant only if format is set to 'embed'. If not given, defaults to full resolution image. |
language |
Needed for caching, defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector, corresponding to reference to the image in the requested format.
tw_get_image_same_length("Q180099",
  format = "filename"
)

if (interactive()) {
  tw_get_image_same_length("Q180099",
    format = "commons"
  )

  tw_get_image_same_length("Q180099",
    format = "embed",
    width = 300
  )
}
Get Wikidata label in given language
tw_get_label( id, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector of the same length as the vector of id given, with the Wikidata label in the requested language.
tw_get_label(
  id = c(
    "Q180099",
    "Q228822"
  ),
  language = "en"
)

# If a label is not available, a NA value is returned
if (interactive()) {
  tw_get_label(
    id = c(
      "Q64733534",
      "Q4773904",
      "Q220480"
    ),
    language = "sc"
  )
}
Efficiently get a wide table with various properties of a given set of Wikidata identifiers
tw_get_p_wide( id, p, label = FALSE, property_label_as_column_name = FALSE, both_id_and_label = FALSE, only_first = FALSE, preferred = FALSE, unlist = FALSE, collapse = ";", language = tidywikidatar::tw_get_language(), id_df = NULL, id_df_label = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
label |
Logical, defaults to FALSE. If TRUE, labels of Wikidata Q identifiers are reported instead of the identifiers themselves (or labels are presented alongside them, if |
property_label_as_column_name |
Logical, defaults to FALSE. If FALSE, names of columns with properties are the "P" identifiers of the property. If TRUE, the label of the correspondent property is assigned as column name. |
both_id_and_label |
Logical, defaults to FALSE. Relevant only if |
only_first |
Logical, defaults to FALSE. If TRUE, it just keeps the first relevant property value for each id (or NA if none is available), and returns a character vector. Warning: this likely discards valid values, so make sure this is really what you want. If FALSE, returns a list of the same length as input, with all values for each id stored in a list if more than one is found. |
preferred |
Logical, defaults to FALSE. If TRUE, returns properties that have rank "preferred" if available; if no "preferred" property is found, then it is ignored. |
unlist |
Logical, defaults to FALSE. Typically used when sharing or exporting data as CSV files. Collapses all properties into a single string. The separator is defined by the |
collapse |
Defaults to ";". Character used to separate results when |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
id_df_label |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A data frame, with a column for each given property.
if (interactive()) { tw_get_p_wide( id = c("Q180099", "Q228822", "Q191095"), p = c("P27", "P19", "P20"), label = TRUE, only_first = TRUE ) }
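The effect of `unlist` and `collapse` can be illustrated with a short base R sketch. It only mimics the idea of turning a list column into a single delimited string per row before export: `collapse_values()` is a hypothetical helper, the sample values are made up, and the package's own handling of missing values may differ.

```r
# Made-up property values for three items: two values, one value, none
values_by_id <- list(
  c("Q30", "Q145"),
  "Q30",
  NA_character_
)

# Hypothetical helper: one string per item, NA kept as NA
collapse_values <- function(x, collapse = ";") {
  if (all(is.na(x))) {
    return(NA_character_)
  }
  paste(x, collapse = collapse)
}

collapsed <- vapply(values_by_id, collapse_values, character(1))
collapsed
```

A character vector like this fits in a single CSV column, whereas a list column cannot be written to CSV directly.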
This function wraps tw_get_p(), but always sets only_first and preferred to TRUE so that it always returns a character vector.
tw_get_p1( id, p, latest_start_time = FALSE, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
latest_start_time |
Logical, defaults to FALSE. If TRUE, returns the property that has the most recent start time ("P580") as qualifier. If no such qualifier is found, then it is ignored. |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector of the same length as the input.
tw_get_p1(id = "Q180099", "P26")
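As a rough illustration of what combining `preferred = TRUE` with `only_first = TRUE` implies, the base R sketch below picks a single value from a set of statements. It assumes each statement has a `value` and a `rank` column; `pick_one()` and the sample data are hypothetical, and this is not the package's code.

```r
# Made-up statements for a single item and property
statements <- data.frame(
  value = c("Q1", "Q2", "Q3"),
  rank = c("normal", "preferred", "normal"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: prefer "preferred" rank if present, then keep the first
pick_one <- function(df) {
  if (nrow(df) == 0) {
    return(NA_character_)
  }
  preferred <- df[df$rank == "preferred", , drop = FALSE]
  if (nrow(preferred) > 0) {
    df <- preferred
  }
  df$value[1]
}

picked <- pick_one(statements)
picked_without_preferred <- pick_one(statements[statements$rank == "normal", ])
picked_empty <- pick_one(statements[0, ])
```

Because exactly one value (or NA) comes back per item, the result can be stored as a plain character vector.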
Get Wikidata property of one or more items as a tidy data frame
tw_get_property( id, p, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A tibble, corresponding to the value for the given property. A tibble of zero rows if no relevant property found.
# Who were the doctoral advisors - P184 - of Margaret Mead - Q180099?
advisors <- tw_get_property(id = "Q180099", p = "P184")
advisors

# tw_get_label(advisors)

# It is also possible to get one property for many id
if (interactive()) {
  tw_get_property(
    id = c(
      "Q180099",
      "Q228822"
    ),
    p = "P31"
  )

  # Or many properties for a single id
  tw_get_property(
    id = "Q180099",
    p = c("P21", "P31")
  )
}
Get description of a Wikidata property in a given language
tw_get_property_description( property, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
property |
A character vector of length 1, must start with P, e.g. "P31". |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector of length 1, with the Wikidata description in the requested language.
tw_get_property_description(property = "P31")
Get label of a Wikidata property in a given language
tw_get_property_label( property, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
property |
A character vector. Each element must start with P, e.g. "P31". |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector, with the Wikidata label in the requested language.
tw_get_property_label(property = "P31")
Get label of a Wikidata property in a given language
tw_get_property_label_single( property, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
property |
A character vector. Each element must start with P, e.g. "P31". |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector of length 1, with the Wikidata label in the requested language.
tidywikidatar:::tw_get_property_label_single(property = "P31")
Get Wikidata property of an item as a vector or list of the same length as input
tw_get_property_same_length( id, p, only_first = FALSE, preferred = FALSE, latest_start_time = FALSE, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 ) tw_get_p( id, p, only_first = FALSE, preferred = FALSE, latest_start_time = FALSE, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
only_first |
Logical, defaults to FALSE. If TRUE, it just keeps the first relevant property value for each id (or NA if none is available), and returns a character vector. Warning: this likely discards valid values, so make sure this is really what you want. If FALSE, returns a list of the same length as input, with all values for each id stored in a list if more than one is found. |
preferred |
Logical, defaults to FALSE. If TRUE, returns properties that have rank "preferred" if available; if no "preferred" property is found, then it is ignored. |
latest_start_time |
Logical, defaults to FALSE. If TRUE, returns the property that has the most recent start time ("P580") as qualifier. If no such qualifier is found, then it is ignored. |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A list of the same length as the input (or a character vector if only_first is set to TRUE).
# By default, it returns a list of the same length as input,
# no matter how many values for each id/property
if (interactive()) {
  tw_get_property_same_length(
    id = c("Q180099", "Q228822", "Q76857"),
    p = "P26"
  )

  # Notice that if no relevant match is found, it returns a NA
  # This is useful for piped operations
  tibble::tibble(id = c("Q180099", "Q228822", "Q76857")) %>%
    dplyr::mutate(spouse = tw_get_property_same_length(id, "P26"))

  # Consider unnesting for further analysis
  tibble::tibble(id = c("Q180099", "Q228822", "Q76857")) %>%
    dplyr::mutate(spouse = tw_get_property_same_length(id, "P26")) %>%
    tidyr::unnest(cols = spouse)

  # If you are sure that you are interested only in the first return value,
  # consider setting only_first=TRUE to get a character vector rather than a list
  # Be mindful: you may well be discarding valid values.
  tibble::tibble(id = c("Q180099", "Q228822", "Q76857")) %>%
    dplyr::mutate(spouse = tw_get_property_same_length(id, "P26",
      only_first = TRUE
    ))
}

tw_get_p(id = "Q180099", "P26")
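The difference between the default list output and only_first = TRUE can be sketched in base R without contacting Wikidata. The identifiers and values below are made-up placeholders, and the vapply() call is only an assumption about what "keep the first value" amounts to, not the package's own code.

```r
# Placeholder ids and values (hypothetical, not real Wikidata content)
ids <- c("id_a", "id_b", "id_c")

# Shape of the default output: one list element per input id,
# NA where no value was found
values_by_id <- list(
  c("value_1", "value_2"), # two values for the first id
  "value_3",               # a single value
  NA_character_            # no match
)

# The list has the same length as the input, so it fits as a list-column
stopifnot(length(values_by_id) == length(ids))

# Rough equivalent of only_first = TRUE: keep the first value for each id,
# which yields a character vector and silently drops "value_2"
first_only <- vapply(values_by_id, function(x) x[[1]], character(1))
first_only
```

Note how the second value for the first id is lost in first_only, which is the risk the only_first warning refers to.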
Gets all details of a property
tw_get_property_with_details(id, p, wait = 0)
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A tibble, corresponding to the details for the given property. NULL if no relevant property found.
# Get "female form of label", including language
tw_get_property_with_details(id = "Q64733534", p = "P2521")
Gets all details of a property
tw_get_property_with_details_single(id, p)
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
A tibble, corresponding to the details for the given property. NULL if no relevant property found.
# Get "female form of label", including language
tidywikidatar:::tw_get_property_with_details_single(id = "Q64733534", p = "P2521")
N.B. In order to provide for consistently structured output, this function outputs either id or value for each qualifier. The user should keep in mind that some of these come with additional detail (e.g. the unit, precision, or reference calendar).
tw_get_qualifiers( id, p, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0, id_l = NULL )
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector of length 1, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
id_l |
Defaults to NULL. If given, must be an object or list such as the one generated with |
A data frame (a tibble) with eight columns: id for the input id, property, qualifier_id, qualifier_property, qualifier_value, rank, qualifier_value_type, and set (to distinguish sets of data when a property is present more than once).
if (interactive()) {
  tidywikidatar::tw_get_qualifiers(id = "Q180099", p = "P26", language = "en")
}

## using `tw_test_items` in examples in order to show output without calling
## on Wikidata servers
tidywikidatar::tw_get_qualifiers(
  id = "Q180099",
  p = "P26",
  language = "en",
  id_l = tw_test_items
)
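To make the role of the set column more concrete, here is a minimal base R sketch that groups qualifier rows by set. The data frame is hand-built with placeholder values (none of them real Wikidata content); only the eight column names are taken from the description of the return value.

```r
# Hand-built stand-in for the output; all values are hypothetical placeholders
qualifiers <- data.frame(
  id = "Q_item",
  property = "P_prop",
  qualifier_id = c("Q_value_1", "Q_value_1", "Q_value_2"),
  qualifier_property = c("P_start", "P_end", "P_start"),
  qualifier_value = c("start_1", "end_1", "start_2"),
  rank = "normal",
  qualifier_value_type = "time",
  set = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Rows sharing the same `set` belong to the same statement of the property,
# so splitting on it keeps each statement's qualifiers together
by_set <- split(qualifiers, qualifiers$set)
names(by_set)
```

Here the first statement carries two qualifiers and the second carries one, which would be impossible to tell apart without the set column.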
N.B. In order to provide for consistently structured output, this function outputs either id or value for each qualifier. The user should keep in mind that some of these come with additional detail (e.g. the unit, precision, or reference calendar).
tw_get_qualifiers_single( id, p, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0, id_l = NULL )
id |
A character vector of length 1, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart. |
p |
A character vector of length 1, a property. Must always start with the capital letter "P", e.g. "P31" for "instance of". |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
id_l |
Defaults to NULL. If given, must be an object or list such as the one generated with |
A data frame (a tibble) with eight columns: id for the input id, property, qualifier_id, qualifier_property, qualifier_value, rank, qualifier_value_type, and set (to distinguish sets of data when a property is present more than once).
if (interactive()) {
  tidywikidatar:::tw_get_qualifiers_single(id = "Q180099", p = "P26", language = "en")
}

## using `tw_test_items` in examples in order to show output without calling
## on Wikidata servers
tidywikidatar:::tw_get_qualifiers_single(
  id = "Q180099",
  p = "P26",
  language = "en",
  id_l = tw_test_items
)
Return (most) information from a Wikidata item in a tidy format from a single Wikidata identifier
tw_get_single( id, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, read_cache = TRUE, cache_connection = NULL, disconnect_db = TRUE, wait = 0, id_l = NULL )
id |
A character vector, must start with Q, e.g. "Q180099" for the anthropologist Margaret Mead. Can also be a data frame of one row, typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
read_cache |
Logical, defaults to TRUE. Mostly used internally to prevent checking if an item is in cache if it is already known that it is not in cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
id_l |
Defaults to NULL. If given, must be an object or list such as the one generated with |
A data frame (a tibble) with four columns (id, property, value, and rank). If the item is not found or there is trouble connecting to the server, a data frame with four columns and zero rows is returned, with the warning stored as an attribute, which can be retrieved with attr(output, "warning").
if (interactive()) {
  tidywikidatar:::tw_get_single(
    id = "Q180099",
    language = "en"
  )
}

## using `tw_test_items` in examples in order to show output without calling
## on Wikidata servers
tidywikidatar:::tw_get_single(
  id = "Q180099",
  language = "en",
  id_l = tw_test_items
)
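The failure case described in the return value can be handled by checking for zero rows and reading the "warning" attribute. The sketch below fakes such an output in base R; the warning text is a made-up placeholder, and only the column names and the attribute name come from the description above.

```r
# Stand-in for what a failed lookup is described as returning:
# four columns, zero rows, and the warning stored as an attribute
output <- data.frame(
  id = character(0),
  property = character(0),
  value = character(0),
  rank = character(0),
  stringsAsFactors = FALSE
)
attr(output, "warning") <- "placeholder warning text"

# Check for the empty result and surface the stored warning
lookup_warning <- attr(output, "warning")
if (nrow(output) == 0 && !is.null(lookup_warning)) {
  message("Lookup returned no rows: ", lookup_warning)
}
```

A check of this kind may be useful before binding results from many items together, since an empty data frame would otherwise pass unnoticed.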
Get URL to a Wikipedia article corresponding to a Wikidata Q identifier in given language
tw_get_wikipedia( id, full_link = TRUE, language = tidywikidatar::tw_get_language(), id_df = NULL, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
id |
A character vector, must start with Q, e.g. "Q254" for Wolfgang Amadeus Mozart |
full_link |
Logical, defaults to TRUE. If FALSE, returns only the part of the url that corresponds to the title. |
language |
Defaults to language set with |
id_df |
Defaults to NULL. If given, it should be a data frame typically generated with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A character vector of the same length as the vector of id given, with the Wikipedia link in the requested language.
tw_get_wikipedia(id = "Q180099")
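As a rough illustration of what full_link = FALSE refers to, the sketch below strips everything up to the /wiki/ prefix from an article URL, leaving only the title part. The helper title_from_url() is hypothetical and written in base R; it is not how the package derives the value.

```r
# Hypothetical helper, not part of tidywikidatar:
# drop the scheme, host, and "/wiki/" prefix from a Wikipedia article URL
title_from_url <- function(url) {
  sub("^https?://[^/]+/wiki/", "", url)
}

full <- "https://en.wikipedia.org/wiki/Margaret_Mead"
title_only <- title_from_url(full)
title_only
```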
Mostly used internally
tw_get_wikipedia_base_api_url( url = NULL, title = NULL, language = tidywikidatar::tw_get_language(), action = "query", type = "page" )
url |
A character vector with the full URL to one or more Wikipedia pages. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
action |
Defaults to "query". Usually either "query" or "parse". In principle, any valid action value, see: https://www.mediawiki.org/w/api.php |
type |
Defaults to "page". Either "page" or "category". |
A character vector of base urls to be used with the MediaWiki API
tw_get_wikipedia_base_api_url(title = "Margaret Mead", language = "en")

tw_get_wikipedia_base_api_url(
  title = "Category:American women anthropologists",
  type = "category",
  language = "en"
)
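For orientation, a MediaWiki API URL for a given language edition and page title could be assembled roughly as follows. The helper build_api_url() and the exact query string (format=json, titles=) are assumptions made for illustration; the URL actually returned by tw_get_wikipedia_base_api_url() may carry different or additional parameters.

```r
# Hypothetical helper: NOT the package's implementation.
# Builds a MediaWiki API URL for a given language edition and page title.
build_api_url <- function(title, language = "en", action = "query") {
  paste0(
    "https://", language, ".wikipedia.org/w/api.php",
    "?action=", action,
    "&format=json",
    # percent-encode spaces and reserved characters in the title
    "&titles=", utils::URLencode(title, reserved = TRUE)
  )
}

api_url <- build_api_url("Margaret Mead", language = "en")
api_url
```

The language code only selects the subdomain, which is why title and language need to be given together when no full url is supplied.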
Get all Wikidata Q identifiers of all Wikipedia pages (or files, or subcategories) that are members of the given category.
tw_get_wikipedia_category_members( url = NULL, category = NULL, type = "page", language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
url |
Full URL to a Wikipedia category page. If given, title and language can be left empty. |
category |
Title of a Wikipedia category page or final parts of its url. Must include "Category:", or equivalent in other languages. If given, url can be left empty, but language must be provided. |
type |
Defaults to "page", defines which kind of members of a category are returned. Valid values include "page", "file", and "subcat" (for sub-category). Corresponds to |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A data frame (a tibble) with eight columns: source_title_url, source_wikipedia_title, source_qid, wikipedia_title, wikipedia_id, qid, description, and language.
if (interactive()) {
  sub_categories <- tw_get_wikipedia_category_members(
    category = "Category:American women anthropologists",
    type = "subcat"
  )

  sub_categories

  tw_get_wikipedia_category_members(
    category = sub_categories$wikipedia_title,
    type = "page"
  )
}
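The interplay of the wait and attempts arguments can be pictured as a retry loop that pauses between failed tries. The base R sketch below is a generic illustration under that assumption: with_retries() and flaky() are hypothetical names, and the package's internal retry logic may differ.

```r
# Hypothetical helper: call `f` until it succeeds, at most `attempts` times,
# sleeping `wait` seconds between failed tries
with_retries <- function(f, attempts = 10, wait = 1) {
  stopifnot(attempts >= 1)
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (i < attempts) {
      Sys.sleep(wait)
    }
  }
  stop("All ", attempts, " attempts failed: ", conditionMessage(result))
}

# Simulated request that fails twice before succeeding
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) {
    stop("temporary failure")
  }
  "ok"
}

out <- with_retries(flaky, attempts = 5, wait = 0)
out
```

With the documented defaults (wait = 1, attempts = 10), a loop of this kind would spend at most about nine seconds sleeping on a request that never succeeds.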
Get all Wikidata Q identifiers of all Wikipedia pages (or files, or subcategories) that are members of a given category
tw_get_wikipedia_category_members_single( url = NULL, category = NULL, type = "page", language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
url |
Full URL to a Wikipedia category page. If given, title and language can be left empty. |
category |
Title of a Wikipedia category page or final parts of its url. Must include "Category:", or equivalent in other languages. If given, url can be left empty, but language must be provided. |
type |
Defaults to "page", defines which kind of members of a category are returned. Valid values include "page", "file", and "subcat" (for sub-category). Corresponds to |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A data frame (a tibble) with four columns: wikipedia_title, wikipedia_id, wikidata_id, wikidata_description.
if (interactive()) {
  tidywikidatar:::tw_get_wikipedia_category_members_single(
    category = "Category:American women anthropologists",
    type = "subcat"
  )

  tidywikidatar:::tw_get_wikipedia_category_members_single(
    category = "Category:Puerto Rican women anthropologists",
    type = "page"
  )
}
Get all Wikidata Q identifiers of all Wikipedia pages that appear in one or more pages
tw_get_wikipedia_page_links( url = NULL, title = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
url |
Full URL to a Wikipedia page. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A data frame (a tibble) with eight columns: source_title_url, source_wikipedia_title, source_qid, wikipedia_title, wikipedia_id, qid, description, and language.
if (interactive()) { tw_get_wikipedia_page_links(title = "Margaret Mead", language = "en") }
Get all Wikidata Q identifiers of all Wikipedia pages that appear in a given page
tw_get_wikipedia_page_links_single( url = NULL, title = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10, wikipedia_page_qid_df = NULL )
url |
Full URL to a Wikipedia page. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
wikipedia_page_qid_df |
Defaults to NULL. If given, used to reduce calls to cache. A data frame |
A data frame (a tibble) with four columns: wikipedia_title, wikipedia_id, wikidata_id, wikidata_description.
if (interactive()) { tw_get_wikipedia_page_links_single(title = "Margaret Mead", language = "en") }
Gets the Wikidata Q identifier of one or more Wikipedia pages
tw_get_wikipedia_page_qid( url = NULL, title = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
url |
A character vector with the full URL to one or more Wikipedia pages. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A data frame with six columns, including qid with Wikidata identifiers, and a logical disambiguation to flag when disambiguation pages are returned.
if (interactive()) {
  tw_get_wikipedia_page_qid(title = "Margaret Mead", language = "en")

  # check when Wikipedia returns disambiguation page
  tw_get_wikipedia_page_qid(title = c("Rome", "London", "New York", "Vienna"))
}
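Since the disambiguation column is logical, it can be used directly to drop rows that point to disambiguation pages before passing qid values on. The data frame below is a hand-built stand-in with placeholder values, reduced to three columns for brevity; the real output has more columns.

```r
# Hand-built stand-in for the output; all values are hypothetical placeholders
page_qid <- data.frame(
  title = c("page_a", "page_b", "page_c"),
  qid = c("Q_a", "Q_b", NA_character_),
  disambiguation = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only rows that are not disambiguation pages and have an identifier
usable <- page_qid[!page_qid$disambiguation & !is.na(page_qid$qid), ]
usable
```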
Gets the Wikidata id of a Wikipedia page
tw_get_wikipedia_page_qid_single( title = NULL, url = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
url |
Full URL to a Wikipedia page. If given, title and language can be left empty. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A data frame (a tibble) with seven columns: title, wikipedia_title, wikipedia_id, qid, description, disambiguation, and language.
if (interactive()) { tw_get_wikipedia_page_qid_single(title = "Margaret Mead", language = "en") }
Get links from a specific section of a Wikipedia page
tw_get_wikipedia_page_section_links( url = NULL, title = NULL, section_title = NULL, section_index = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10, wikipedia_page_qid_df = NULL )
url |
Full URL to a Wikipedia page. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
section_title |
Defaults to NULL. If given, it should correspond to the human-readable title of a section of the relevant Wikipedia page. See also |
section_index |
Defaults to NULL. If given, it should correspond to the ordinal of a section of the relevant Wikipedia page. See also |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
wikipedia_page_qid_df |
Defaults to NULL. If given, used to reduce calls to cache. A data frame |
A data frame (a tibble).
if (interactive()) { tw_get_wikipedia_page_section_links(title = "Margaret Mead", language = "en", section_index = 1) }
Get sections of a Wikipedia page
tw_get_wikipedia_page_sections( url = NULL, title = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10 )
url |
Full URL to a Wikipedia page. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
A data frame (a tibble), with the same columns as tw_empty_wikipedia_page_sections.
if (interactive()) { tw_get_wikipedia_page_sections(title = "Margaret Mead", language = "en") }
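The wait and attempts parameters described above can be pictured as a simple retry loop. The following is a minimal sketch in base R, not the package's actual implementation: retry_with_wait() and flaky_call() are hypothetical names introduced only for illustration, and the failure is simulated rather than coming from a real API.

```r
# Hypothetical helper, not part of tidywikidatar: call `fun` up to `attempts`
# times, sleeping `wait` seconds between failed tries.
retry_with_wait <- function(fun, attempts = 10, wait = 1) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (i < attempts) {
      Sys.sleep(wait)
    }
  }
  stop("Failed after ", attempts, " attempts: ", conditionMessage(last_error))
}

# Simulated flaky call (an assumption for the demo): fails twice, then succeeds
counter <- 0
flaky_call <- function() {
  counter <<- counter + 1
  if (counter < 3) {
    stop("simulated time-out")
  }
  "ok"
}

# wait = 0 keeps the demo fast; a real API call would use a positive wait
retry_result <- retry_with_wait(flaky_call, attempts = 10, wait = 0)
```

With a real API, a positive wait spaces out the re-attempts, which is the reason the documented default for the Wikipedia functions is 1 second.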
Get all Wikidata Q identifiers of all Wikipedia pages that appear in a given page
tw_get_wikipedia_page_sections_single( url = NULL, title = NULL, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 1, attempts = 10, wikipedia_page_qid_df = NULL )
url |
Full URL to a Wikipedia page. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 1 due to time-outs with frequent queries. Time to wait between queries to the APIs. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
attempts |
Defaults to 10. Number of times it re-attempts to reach the API before failing. |
wikipedia_page_qid_df |
Defaults to NULL. If given, used to reduce calls to cache. A data frame |
A data frame (a tibble) with four columns: wikipedia_title, wikipedia_id, wikidata_id, wikidata_description.
if (interactive()) { tw_get_wikipedia_page_sections_single(title = "Margaret Mead", language = "en") }
Mostly used internally
tw_get_wikipedia_section_links_api_url( url = NULL, title = NULL, section_index, language = tidywikidatar::tw_get_language() )
url |
A character vector with the full URL to one or more Wikipedia pages. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
section_index |
Required. It should correspond to the ordinal of a section of the relevant Wikipedia page. See also |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
A character vector of base urls to be used with the MediaWiki API
tw_get_wikipedia_section_links_api_url(title = "Margaret Mead", section_index = 1, language = "en")
Mostly used internally
tw_get_wikipedia_sections_api_url( url = NULL, title = NULL, language = tidywikidatar::tw_get_language() )
url |
A character vector with the full URL to one or more Wikipedia pages. If given, title and language can be left empty. |
title |
Title of a Wikipedia page or final parts of its url. If given, url can be left empty, but language must be provided. |
language |
Two-letter language code used to define the Wikipedia version to use. Defaults to language set with |
A character vector of base urls to be used with the MediaWiki API
tw_get_wikipedia_sections_api_url(title = "Margaret Mead", language = "en")
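To give an idea of what a base URL for the MediaWiki API can look like, here is a minimal sketch in base R. It assumes the standard action=parse endpoint with prop=sections. The helper name sketch_sections_api_url() is hypothetical, and the URL actually produced by tw_get_wikipedia_sections_api_url() may use different or additional parameters.

```r
# Hypothetical helper, not the package's implementation: build a MediaWiki
# API URL that asks for the sections of a page on a given language edition.
sketch_sections_api_url <- function(title, language = "en") {
  # Wikipedia page titles use underscores in place of spaces
  page <- utils::URLencode(gsub(" ", "_", title), reserved = TRUE)
  paste0(
    "https://", language, ".wikipedia.org/w/api.php",
    "?action=parse&format=json&prop=sections",
    "&page=", page
  )
}

sections_url <- sketch_sections_api_url("Margaret Mead", language = "en")
sections_url
```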
Tested only with SQLite and MySql. May work with other drivers.
tw_index_cache_item( table_name = NULL, check_first = TRUE, type = "item", show_details = FALSE, language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
table_name |
Name of the table in the database. If given, it takes precedence over other parameters. |
check_first |
Logical, defaults to |
type |
Defaults to "item". Type of cache file to output. Values typically used by |
show_details |
Logical, defaults to |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
To ensure smooth functioning, the search column in the cache table is transformed into a column of type varchar and length 255.
If show_details is set to FALSE, returns nothing: the function is used only for its side effects (adding an index to the caching table). If TRUE, returns a data frame, same as the output of tw_check_cache_index(show_details = TRUE).
if (interactive()) {
  tw_enable_cache()
  tw_set_cache_folder(path = fs::path(fs::path_home_r(), "R", "tw_data"))
  tw_index_cache_item()
}
Tested only with SQLite and MySql. May work with other drivers.
tw_index_cache_search( table_name = NULL, check_first = TRUE, type = "item", show_details = FALSE, language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE )
table_name |
Name of the table in the database. If given, it takes precedence over other parameters. |
check_first |
Logical, defaults to TRUE. If TRUE, then before executing anything on the database it checks if the given table has already been indexed. If it has, it does nothing and returns only an informative message. |
type |
Defaults to "item". Type of cache file to output. Values typically used by |
show_details |
Logical, defaults to FALSE. If FALSE, the function adds the index to the database, but does not return anything. If TRUE, returns a data frame with more details about the index. |
language |
Defaults to language set with |
response_language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
To ensure smooth functioning, the search column in the cache table is transformed into a column of type varchar and length 255.
If show_details is set to FALSE, returns nothing: the function is used only for its side effects (adding an index to the caching table). If TRUE, returns a data frame, same as the output of tw_check_cache_index(show_details = TRUE).
if (interactive()) {
  tw_enable_cache()
  tw_set_cache_folder(path = fs::path(fs::path_home_r(), "R", "tw_data"))
  tw_index_cache_search()
}
Gets labels for all columns with names such as "id" and "property".
tw_label( df, value = TRUE, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
df |
A data frame, typically generated with other |
value |
Logical, defaults to TRUE. If TRUE, it tries to get labels for all presumed identifiers in the column called value. May break if that column includes values that start with Q followed by digits but are not Wikidata identifiers. |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A data frame, with the same shape as the input data frame, but with labels instead of identifiers.
if (interactive()) { tw_get_qualifiers(id = "Q180099", p = "P26", language = "en") %>% head(2) %>% tw_label() }
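The caveat on the value parameter can be illustrated with a quick pre-check. This is a minimal sketch in base R, assuming that a Wikidata item identifier is the letter Q followed only by digits. looks_like_qid() is a hypothetical helper, not a tidywikidatar function, and it is stricter than a "starts with Q and some digits" test.

```r
# Hypothetical helper: TRUE only for strings made of "Q" followed by digits.
looks_like_qid <- function(x) {
  !is.na(x) & grepl("^Q[0-9]+$", x)
}

# Example values (made up for illustration): "Q4 2021" starts with Q and a
# digit, but is a quarter label rather than a Wikidata identifier.
values <- c("Q180099", "Q4 2021", "Q6581072", "1901-12-16", NA)

qid_check <- looks_like_qid(values)
qid_check
```

A check of this kind can help decide whether it is safe to leave value = TRUE when passing a data frame to tw_label().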
The Wikidata Q identifier of European airports found in Eurostat's avia_par_ dataset
tw_qid_airports
A data frame with 429 rows and 1 column:
Q identifiers
https://www.wikidata.org/wiki/Wikidata:Main_Page
A dataset with all the Wikidata items that have "Q27169" (member of the European Parliament) for the property "P39" (position held).
tw_qid_meps
A data frame with 4581 rows and 1 column:
Q identifiers
https://www.wikidata.org/wiki/Wikidata:Main_Page
This function aims to facilitate only the most basic type of queries: return the items that have all of the given property-value pairs. For more details on Wikidata queries, consult: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples. For complex queries, use WikidataQueryServiceR::query_wikidata().
tw_query( query, fields = c("item", "itemLabel", "itemDescription"), language = tidywikidatar::tw_get_language(), return_as_tw_search = TRUE )
query |
A list of named vectors, or a data frame (see example and readme). |
fields |
A character vector of Wikidata fields. Ignored if |
language |
Defaults to language set with |
return_as_tw_search |
Logical, defaults to TRUE. If TRUE, returns a data frame with three columns (id, label, and description) that can be piped to other |
Consider tw_get_all_with_p() if you want to get all items with a given property, irrespective of the value.
A data frame
if (interactive()) {
  query <- list(
    c(p = "P106", q = "Q1397808"),
    c(p = "P21", q = "Q6581072")
  )
  tw_query(query)
}
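Since query may also be given as a data frame, the same property-value pairs can be written as rows. The sketch below assumes the data frame uses the same names as the named vectors above (p and q). The helper sketch_sparql() is hypothetical and only illustrates the kind of SPARQL query that "items having all of these pairs" corresponds to; the query actually sent by tw_query() may differ.

```r
# The two pairs from the example above, one pair per row
# (column names p and q are an assumption mirroring the named vectors)
query_df <- data.frame(
  p = c("P106", "P21"),
  q = c("Q1397808", "Q6581072"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of tidywikidatar: one triple pattern per row,
# so that only items matching every pair are selected.
sketch_sparql <- function(query_df, language = "en") {
  triples <- paste0("?item wdt:", query_df$p, " wd:", query_df$q, " .")
  paste0(
    "SELECT ?item ?itemLabel ?itemDescription WHERE { ",
    paste(triples, collapse = " "),
    " SERVICE wikibase:label { bd:serviceParam wikibase:language \"",
    language, "\" . } }"
  )
}

sparql_sketch <- sketch_sparql(query_df)

# The actual call needs the package and network access
if (interactive() && requireNamespace("tidywikidatar", quietly = TRUE)) {
  tidywikidatar::tw_query(query = query_df)
}
```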
Removes the table where items are cached
tw_reset_item_cache( language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE, ask = TRUE )
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_reset_item_cache() }
Removes the table where qualifiers are cached
tw_reset_qualifiers_cache( language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE, ask = TRUE )
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_reset_qualifiers_cache() }
Removes from cache the table where data typically gathered with tw_get_wikipedia_category_members() are stored.
tw_reset_wikipedia_category_members_cache( language = tidywikidatar::tw_get_language(), type = "page", cache = NULL, cache_connection = NULL, disconnect_db = TRUE, ask = TRUE )
language |
Defaults to language set with |
type |
Defaults to "page", defines which kind of members of a category are returned. Valid values include "page", "file", and "subcat" (for sub-category). Corresponds to |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_reset_wikipedia_category_members_cache() }
Removes from cache the table where data typically gathered with tw_get_wikipedia_page_qid() are stored.
tw_reset_wikipedia_page_cache( language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE, ask = TRUE )
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_reset_wikipedia_page_cache() }
Removes from cache the table where data typically gathered with tw_get_wikipedia_page_links() are stored.
tw_reset_wikipedia_page_links_cache( language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE, ask = TRUE )
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_reset_wikipedia_page_links_cache() }
Removes from cache the table where data typically gathered with tw_get_wikipedia_page_sections() are stored.
tw_reset_wikipedia_page_sections_cache( language = tidywikidatar::tw_get_language(), cache = NULL, cache_connection = NULL, disconnect_db = TRUE, ask = TRUE )
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
ask |
Logical, defaults to TRUE. If FALSE, and cache folder does not exist, it just creates it without asking (useful for non-interactive sessions). |
Nothing, used for its side effects.
if (interactive()) { tw_reset_wikipedia_page_sections_cache() }
By default, this search returns items. Set type to "property" or use tw_search_property() for properties.
tw_search( search, type = "item", language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, wait = 0, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A string to be searched in Wikidata |
type |
Defaults to "item". Either "item" or "property". |
language |
Language to be used for the search. Can be set once per session with |
response_language |
Language to be used for the returned labels and descriptions. Corresponds to the |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame (a tibble) with three columns (id, label, and description), and as many rows as there are results (by default, limited to 10). Four columns when include_search is set to TRUE.
tw_search(search = c("Margaret Mead", "Ruth Benedict"))
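When several strings are searched at once, include_search = TRUE helps tell which rows belong to which search string, and limit caps how many results are requested. The following is a minimal usage sketch built only from the parameters documented above. It assumes the package is installed and that network access is available, so the call is guarded and nothing is claimed here about the exact name of the additional column.

```r
# Two search strings, as in the example above
search_terms <- c("Margaret Mead", "Ruth Benedict")

# Guarded call: runs only in an interactive session with the package installed
if (interactive() && requireNamespace("tidywikidatar", quietly = TRUE)) {
  results <- tidywikidatar::tw_search(
    search = search_terms,
    language = "en",
    limit = 3,             # request fewer results than the default of 10
    include_search = TRUE  # adds the search string as a fourth column
  )
  print(results)
}
```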
This search returns only items, use tw_search_property() for properties.
tw_search_item( search, language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, wait = 0, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A string to be searched in Wikidata |
language |
Language to be used for the search. Can be set once per session with |
response_language |
Language to be used for the returned labels and descriptions. Corresponds to the |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame (a tibble) with three columns (id, label, and description), and as many rows as there are results (by default, limited to 10).
tw_search_item(search = "Sylvia Pankhurst")
This search returns only properties, use tw_search_item() for items.
tw_search_property( search, language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, wait = 0, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search |
A string to be searched in Wikidata |
language |
Language to be used for the search. Can be set once per session with |
response_language |
Language to be used for the returned labels and descriptions. Corresponds to the |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
A data frame (a tibble) with three columns (id, label, and description), and as many rows as there are results (by default, limited to 10).
tw_search_property(search = "gender")
By default, this search returns only items. Set type to "property" or use tw_search_property() for properties.
tw_search_single( search, type = "item", language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), limit = 10, include_search = FALSE, cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE, wait = 0 )
search |
A string to be searched in Wikidata |
type |
Defaults to "item". Either "item" or "property". |
language |
Language to be used for the search. Can be set once per session with |
response_language |
Language to be used for the returned labels and descriptions. Corresponds to the |
limit |
Maximum number of results to be returned. |
include_search |
Logical, defaults to FALSE. If TRUE, the search is returned as an additional column. |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
wait |
In seconds, defaults to 0. Time to wait between queries to Wikidata. If data are cached locally, wait time is not applied. If you are running many queries systematically you may want to add some waiting time between queries. |
A data frame (a tibble) with three columns (id, label, and description), and as many rows as there are results (by default, limited to 10). Four columns when include_search is set to TRUE.
tidywikidatar:::tw_search_single(search = "Sylvia Pankhurst")
Set database connection settings for the session
tw_set_cache_db( db_settings = NULL, driver = NULL, host = NULL, server = NULL, port = NULL, database = NULL, user = NULL, pwd = NULL )
db_settings |
A list of database connection settings (see example) |
driver |
A database driver. Common database drivers include |
host |
Host address, e.g. "localhost". Different drivers use server or host parameter, only one of them is likely needed. |
server |
Server address, e.g. "localhost". Different drivers use server or host parameter, only one of them is likely needed. |
port |
Port to use to connect to the database. |
database |
Database name. |
user |
Database user name. |
pwd |
Password for the database user. |
A list with all given parameters (invisibly).
if (interactive()) {
  # Settings can be provided either as a list
  db_settings <- list(
    driver = "MySQL",
    host = "localhost",
    server = "localhost",
    port = 3306,
    database = "tidywikidatar",
    user = "secret_username",
    pwd = "secret_password"
  )
  tw_set_cache_db(db_settings)

  # or as parameters
  tw_set_cache_db(
    driver = "MySQL",
    host = "localhost",
    server = "localhost",
    port = 3306,
    database = "tidywikidatar",
    user = "secret_username",
    pwd = "secret_password"
  )

  # or ignoring fields that can be left to default values, such as "localhost" and port 3306
  tw_set_cache_db(
    driver = "MySQL",
    database = "tidywikidatar",
    user = "secret_username",
    pwd = "secret_password"
  )
}
Consider using a folder outside your current project directory, e.g. tw_set_cache_folder("~/R/tw_data/"): you will be able to use the same cache in different projects, and prevent cached files from being synced if you use services such as Nextcloud or Dropbox.
tw_set_cache_folder(path = NULL)
tw_get_cache_folder(path = NULL)
path |
A path to a location used for caching data. If the folder does not exist, it will be created. |
The path to the caching folder, if previously set; the same path as given to the function; or the default, tw_data, if none is given.
if (interactive()) {
  tw_set_cache_folder(fs::path(fs::path_home_r(), "R", "tw_data"))
}
tw_get_cache_folder()
Defaults to "en".
tw_set_language(language = NULL)
tw_get_language(language = NULL)
language |
A character vector of length one, with a string of two letters such as "en". For a full list of available values, see: https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all |
A two-letter code for the language, if previously set; the same language as given to the function; or the default, en, if none is given.
if (interactive()) {
  tw_set_language(language = "en")
}
tw_get_language()
A list mostly used for testing with some Wikidata items in the format resulting from WikidataR::get_item()
tw_test_items
A list, an object such as the one resulting from WikidataR::get_item()
Writes item to cache. Typically used internally, but exported to enable custom caching solutions.
tw_write_item_to_cache( item_df, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
item_df |
A data frame with three columns typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it first deletes all rows associated with the item(s) included in |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Nothing, used for its side effects.
tw_set_cache_folder(path = fs::path(tempdir(), paste(sample(letters, 24), collapse = "")))
tw_create_cache_folder(ask = FALSE)
tw_disable_cache()

df_from_api <- tw_get(id = "Q180099", language = "en")

df_from_cache <- tw_get_cached_item(
  id = "Q180099",
  language = "en"
)

is.null(df_from_cache) # expect TRUE, as nothing has yet been stored in cache

tw_write_item_to_cache(
  item_df = df_from_api,
  language = "en",
  cache = TRUE
)

df_from_cache <- tw_get_cached_item(
  id = "Q180099",
  language = "en",
  cache = TRUE
)

is.null(df_from_cache) # expect FALSE, as df_from_cache is now a data frame, same as df_from_api
Mostly used internally by tidywikidatar, use with caution to keep caching consistent.
tw_write_qid_of_wikipedia_page_to_cache( df, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
df |
A data frame typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be given either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local sqlite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Silently returns the same data frame provided as input. Mostly used internally for its side effects.
if (interactive()) {
  df <- tw_get_wikipedia_page_qid(
    title = "Margaret Mead",
    language = "en",
    cache = FALSE
  )

  tw_write_qid_of_wikipedia_page_to_cache(
    df = df,
    language = "en"
  )
}
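The "silently returns the same data frame" behaviour described above can be illustrated with a minimal base R sketch. Everything here is hypothetical and only mimics the pattern: `sketch_cache` is a plain environment standing in for the cache, and `write_to_sketch_cache()` is not part of tidywikidatar.

```r
# Hypothetical stand-in for a cache: a plain environment keyed by table name.
sketch_cache <- new.env()

# Hypothetical writer: stores `df` as a side effect, then returns it invisibly,
# so nothing is printed at the console but the value can still be assigned.
write_to_sketch_cache <- function(df, table_name) {
  assign(table_name, df, envir = sketch_cache)
  invisible(df)
}

# Illustrative one-row input.
pages <- data.frame(
  title = "Margaret Mead",
  qid = "Q180099",
  stringsAsFactors = FALSE
)

# The call prints nothing, yet the returned value equals the input...
returned <- write_to_sketch_cache(pages, "demo_table")
same_as_input <- identical(returned, pages)

# ...and the side effect has stored a copy in the stand-in cache.
stored_copy <- get("demo_table", envir = sketch_cache)
```

Returning the input invisibly is what lets a writer of this kind sit in the middle of a sequence of calls without interrupting it.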
Mostly used internally by tidywikidatar; use with caution to keep caching consistent.
tw_write_qualifiers_to_cache( qualifiers_df, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
qualifiers_df |
A data frame typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local SQLite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Silently returns the same data frame provided as input. Mostly used internally for its side effects.
q_df <- tw_get_qualifiers(
  id = "Q180099",
  p = "P26",
  language = "en",
  cache = FALSE
)

tw_write_qualifiers_to_cache(
  qualifiers_df = q_df,
  language = "en",
  cache = TRUE
)
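The `cache_connection` and `disconnect_db` arguments suggest that one connection can be shared across several writes. The sketch below shows one way this might look. The wrapper `cache_qualifiers_for_ids()` is hypothetical, and it assumes that `tw_connect_to_cache()` and `tw_disconnect_from_cache()` (documented elsewhere in the package) accept the arguments shown; check their help pages before relying on it.

```r
# Hypothetical wrapper: fetch qualifiers for several items and write them
# to the cache through a single, shared connection.
cache_qualifiers_for_ids <- function(ids, p, language = "en") {
  # Assumption: tw_connect_to_cache() returns a connection usable as
  # `cache_connection` by the writing functions.
  db <- tidywikidatar::tw_connect_to_cache(cache = TRUE, language = language)

  # Close the connection when the function exits, even on error.
  on.exit(
    tidywikidatar::tw_disconnect_from_cache(
      cache = TRUE,
      cache_connection = db,
      disconnect_db = TRUE,
      language = language
    ),
    add = TRUE
  )

  for (id in ids) {
    q_df <- tidywikidatar::tw_get_qualifiers(
      id = id,
      p = p,
      language = language,
      cache = FALSE
    )
    # disconnect_db = FALSE keeps the shared connection open between writes.
    tidywikidatar::tw_write_qualifiers_to_cache(
      qualifiers_df = q_df,
      language = language,
      cache = TRUE,
      cache_connection = db,
      disconnect_db = FALSE
    )
  }
  invisible(ids)
}

if (interactive()) {
  tidywikidatar::tw_set_cache_folder(path = fs::path(tempdir(), "tw_cache_folder"))
  tidywikidatar::tw_create_cache_folder(ask = FALSE)
  cache_qualifiers_for_ids(ids = "Q180099", p = "P26")
}
```

The sketch does not handle items without qualifiers for the given property; depending on what `tw_get_qualifiers()` returns in that case, an extra check before writing may be needed.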
Writes search to cache. Typically used internally, but exported to enable custom caching solutions.
tw_write_search_to_cache( search_df, type = "item", language = tidywikidatar::tw_get_language(), response_language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
search_df |
A data frame with four columns typically generated with |
type |
Defaults to "item". Either "item" or "property". |
language |
Language to be used for the search. Can be set once per session with |
response_language |
Language to be used for the returned labels and descriptions. Corresponds to the |
cache |
Defaults to NULL. If given, it should be either TRUE or FALSE. Typically set with |
overwrite_cache |
Defaults to FALSE. If TRUE, overwrites cache. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Nothing, used for its side effects.
tw_set_cache_folder(path = fs::path(tempdir(), paste(sample(letters, 24), collapse = "")))
tw_create_cache_folder(ask = FALSE)
tw_disable_cache()

search_from_api <- tw_search(search = "Sylvia Pankhurst", include_search = TRUE)

search_from_cache <- tw_get_cached_search("Sylvia Pankhurst")
nrow(search_from_cache) == 0 # expect TRUE, as nothing has yet been stored in cache

tw_write_search_to_cache(search_df = search_from_api)

search_from_cache <- tw_get_cached_search("Sylvia Pankhurst")
search_from_cache
Mostly used internally by tidywikidatar; use with caution to keep caching consistent.
tw_write_wikipedia_category_members_to_cache( df, language = tidywikidatar::tw_get_language(), type = "page", cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
df |
A data frame typically generated with |
language |
Defaults to language set with |
type |
Defaults to "page", defines which kind of members of a category are returned. Valid values include "page", "file", and "subcat" (for sub-category). Corresponds to |
cache |
Defaults to NULL. If given, it should be either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local SQLite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Silently returns the same data frame provided as input. Mostly used internally for its side effects.
if (interactive()) {
  df <- tw_get_wikipedia_category_members(
    category = "American women anthropologists",
    language = "en",
    cache = FALSE
  )

  tw_write_wikipedia_category_members_to_cache(
    df = df,
    language = "en"
  )
}
Mostly used internally by tidywikidatar; use with caution to keep caching consistent.
tw_write_wikipedia_page_links_to_cache( df, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
df |
A data frame typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local SQLite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Silently returns the same data frame provided as input. Mostly used internally for its side effects.
if (interactive()) {
  df <- tw_get_wikipedia_page_links(
    title = "Margaret Mead",
    language = "en",
    cache = FALSE
  )

  tw_write_wikipedia_page_links_to_cache(
    df = df,
    language = "en"
  )
}
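Since `overwrite_cache` is described as useful when the upstream data has been updated, a refresh step could be sketched as follows. The helper name `refresh_page_links_cache()` is hypothetical; it only combines the two functions and arguments documented above, and it assumes a cache folder has already been set and created.

```r
# Hypothetical helper: bypass the cache when fetching, then replace whatever
# was previously cached with the fresh result.
refresh_page_links_cache <- function(title, language = "en") {
  # cache = FALSE forces a fresh request instead of reading cached links.
  fresh_df <- tidywikidatar::tw_get_wikipedia_page_links(
    title = title,
    language = language,
    cache = FALSE
  )

  # overwrite_cache = TRUE asks the writer to overwrite existing cached data.
  tidywikidatar::tw_write_wikipedia_page_links_to_cache(
    df = fresh_df,
    language = language,
    cache = TRUE,
    overwrite_cache = TRUE
  )
}

if (interactive()) {
  tidywikidatar::tw_set_cache_folder(path = fs::path(tempdir(), "tw_cache_folder"))
  tidywikidatar::tw_create_cache_folder(ask = FALSE)
  refresh_page_links_cache(title = "Margaret Mead")
}
```

Because the writing function returns its input, the helper also returns the freshly retrieved data frame, invisibly.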
Mostly used internally by tidywikidatar; use with caution to keep caching consistent.
tw_write_wikipedia_page_sections_to_cache( df, language = tidywikidatar::tw_get_language(), cache = NULL, overwrite_cache = FALSE, cache_connection = NULL, disconnect_db = TRUE )
df |
A data frame typically generated with |
language |
Defaults to language set with |
cache |
Defaults to NULL. If given, it should be either TRUE or FALSE. Typically set with |
overwrite_cache |
Logical, defaults to FALSE. If TRUE, it overwrites the table in the local SQLite database. Useful if the original Wikidata object has been updated. |
cache_connection |
Defaults to NULL. If NULL, and caching is enabled, |
disconnect_db |
Defaults to TRUE. If FALSE, leaves the connection to cache open. |
Silently returns the same data frame provided as input. Mostly used internally for its side effects.
if (interactive()) {
  df <- tw_get_wikipedia_page_sections(
    title = "Margaret Mead",
    language = "en",
    cache = FALSE
  )

  tw_write_wikipedia_page_sections_to_cache(
    df = df,
    language = "en"
  )
}