Please be aware that the Python SDK is undergoing significant updates in preparation for its first major release. This upcoming version will introduce breaking changes. We advise caution when using the current version of the SDK, as future updates may require adjustments in your existing code.
Installation in Python - PyPI release
Fusionbase is on PyPI, so you can use pip to install it.
pipinstallfusionbase
If you want to use all features, you must make sure that pandas and numpy are installed.
pipinstallpandaspipinstallnumpy
Fusionbase by default uses the standard JSON library of Python to serialize and locally store data. However, you can use the faster orjson library as a drop-in replacement.
Therefore, just install orjson and Fusionbase will automatically detect and use it.
pipinstallorjson
Getting Started
Got to examples to deep dive into Fusionbase and see various examples on how to use the package.
Here are some Examples for a quick start:
Data Streams
The Data Stream module lets you conveniently access data and metadata of all Data Streams. Each stream can be accessed via its unique stream id or label.
Setup
# Import Fusionbasefrom fusionbase import Fusionbase# Create a new datastream# Provide your API Keyfusionbase =Fusionbase(auth={"api_key": "*** SECRET CREDENTIALS ***"})# If you prefer to have extended logging output and information # Like a progress bar for downloading datastreams etc.# Turn on the log fusionbase =Fusionbase(auth={"api_key": "*** SECRET CREDENTIALS ***"}, log=True)# Get the datastream with the key "28654971"data_stream_key ="28654971"data_stream = fusionbase.get_datastream(data_stream_key)
Human readable datastream information
# Print a nice table containing the meta data of the streamdata_stream.pretty_meta_data()
Getting the data
The samples below show how to retrieve the data of a datastream as a list of dictionaries. Each element in the list represents one row within the dataset.
Note that the data can by hierarchical.
# The following returns the full datastream as a list of dictionaries# It uses a local cache if availabledata = data_stream.get_data()print(data)# Get always the latest data from Fusionbasedata = data_stream.get_data(live=True)print(data)# If you only need a subset of the columns# You'll gain much performance by only selecting those columnsdata = data_stream.get_data(fields=["NAME_OF_COLUMN_1", "NAME_OF_COLUMN_N"])print(data)# If you need only an excerpt of the data you can use skip and limit# The sample below gets the 10 first rowsdata = data_stream.get_data(skip=0, limit=10)print(data)
If you are working with pandas, it is probably the most convenient way to load to data directly as a pandas DataFrame.
# Load the data from Fusionbase, cache it and put it in a pandas DataFramedf = data_stream.as_dataframe()print(df)# Force ignoring the cache and make sure to get the latest datadf = data_stream.as_dataframe(live=True)print(df)
Storing the data
Large datasets potentially do not fit into the memory. Therefore, it is possible to get the data of a stream directly as partitioned files.
The folder structure is automatically created and always like ./{ID-OF-THE-STREAM}/data/*
from pathlib import Path# Store as JSON filesdata_stream.as_json_files(storage_path=Path("./data/"))# Store as CSV filesdata_stream.as_csv_files(storage_path=Path("./data/"))# Store as Pickle filesdata_stream.as_pickle_files(storage_path=Path("./data/"))
Data Services
A data service can be seen as an API that returns a certain output for a specific input. For example, our address normalization service parses an address and returns the structured and normalized parts of it.
Setup
# Import Fusionbasefrom fusionbase.Fusionbase import Fusionbase# Create a new dataservice# Provide your API Key and the Fusionbase API URI (usually: https://api.fusionbase.com/api/v1)fusionbase =Fusionbase(auth={"api_key": "*** SECRET CREDENTIALS ***"})data_service_key ="50527318"data_service = fusionbase.get_dataservice(data_service_key)
Human readable data service information:
# Retrieves the metadata from a Service by giving a Service specific key and prints it nicely to consoledata_service.pretty_meta_data()
Human readable data service definition:
# Retrieve the request definition (such as required parameters) from a Service by giving a Service specific key and print it to console.
data_service.pretty_request_definition()
Invoke a data service:
# Invoke a service by providing input data# The following lines of code are equivalent# Services can be invoked directly by their parameter namesresult = data_service.invoke(company_name="OroraTech GmbH", zip_code="81669")print(result)