stark_qa.tools
stark_qa.tools.api
- stark_qa.tools.api.complete_texts_claude(inputs, **kwargs)
- stark_qa.tools.api.complete_texts_hf(inputs, **kwargs)
- stark_qa.tools.api.get_gpt_outputs(inputs, **kwargs)
- stark_qa.tools.api.get_llm_output(message, model='gpt-4-0125-preview', max_tokens=2048, temperature=1, json_object=False)
A general function to complete a prompt using the specified model.
- Parameters:
message (str or list) – The input message or a list of message dicts.
model (str) – The model to use for completion.
max_tokens (int) – Maximum number of tokens to generate.
temperature (float) – Sampling temperature.
json_object (bool) – Whether to output in JSON format.
- Returns:
The completed text generated by the model.
- Return type:
str
- Raises:
ValueError – If the model is not recognized.
- stark_qa.tools.api.get_llm_outputs(inputs, **kwargs)
- stark_qa.tools.api.parallel_func(func, n_max_nodes=5)
A general function to call a function on a list of inputs in parallel.
- Parameters:
func (callable) – The function to apply.
n_max_nodes (int) – Maximum number of parallel processes.
- Returns:
A wrapper function that applies func in parallel.
- Return type:
callable
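The wrapper pattern described above can be sketched as follows. This is a minimal sketch, not the library's code. It assumes a thread pool (`concurrent.futures.ThreadPoolExecutor`), and it assumes the returned wrapper takes a list of inputs plus shared keyword arguments and returns results in input order. `parallel_func_sketch` and `add` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def parallel_func_sketch(func, n_max_nodes=5):
    """Hypothetical stand-in: wrap `func` so it runs over a list of inputs in parallel."""
    def _wrapper(inputs, **kwargs):
        # Bind shared keyword arguments once, so each worker only receives one input.
        bound = partial(func, **kwargs)
        if not inputs:
            return []
        workers = min(len(inputs), n_max_nodes)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map yields results in the same order as `inputs`.
            return list(pool.map(bound, inputs))
    return _wrapper


def add(x, offset=0):
    return x + offset


parallel_add = parallel_func_sketch(add, n_max_nodes=3)
print(parallel_add([1, 2, 3, 4], offset=10))  # [11, 12, 13, 14]
```

Threads suit I/O-bound work such as remote API calls. For CPU-bound functions, a process pool would usually be the better choice.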
stark_qa.tools.args
- stark_qa.tools.args.load_args(args_dict)
Convert a dictionary into an argparse.Namespace object.
- Parameters:
args_dict (dict) – Dictionary of arguments to be converted.
- Returns:
Namespace object with the arguments.
- Return type:
argparse.Namespace
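The conversion can be illustrated by unpacking the dictionary into `argparse.Namespace`. This minimal sketch assumes every key is a valid attribute name. `load_args_sketch` and the example keys are hypothetical.

```python
import argparse


def load_args_sketch(args_dict):
    # Each dictionary key becomes an attribute on the Namespace.
    return argparse.Namespace(**args_dict)


args = load_args_sketch({"dataset": "toy", "topk": 20})
print(args.dataset, args.topk)  # toy 20
```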
- stark_qa.tools.args.merge_args(args_1, args_2)
Merge two argparse.Namespace objects. Arguments from args_2 have higher priority.
- Parameters:
args_1 (argparse.Namespace) – First namespace object.
args_2 (argparse.Namespace) – Second namespace object.
- Returns:
Merged namespace object.
- Return type:
argparse.Namespace
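The precedence rule can be sketched with a dictionary merge in which later keys win. This is an assumption about the approach, not the library's code. `merge_args_sketch` and the example values are hypothetical.

```python
import argparse


def merge_args_sketch(args_1, args_2):
    # Start from args_1, then let args_2 overwrite any keys the two share.
    merged = {**vars(args_1), **vars(args_2)}
    return argparse.Namespace(**merged)


defaults = argparse.Namespace(lr=0.001, epochs=10)
overrides = argparse.Namespace(lr=0.01)
merged = merge_args_sketch(defaults, overrides)
print(merged.lr, merged.epochs)  # 0.01 10
```

Building a new Namespace leaves both inputs unmodified.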
stark_qa.tools.download_hf
- stark_qa.tools.download_hf.download_hf_file(repo, file, repo_type='dataset', save_as_file=None)
Downloads a file from a Hugging Face repository and saves it to the specified path.
- Parameters:
repo (str) – The repository name.
file (str) – The file path within the repository to download.
repo_type (str) – The type of the repository (e.g., ‘dataset’).
save_as_file (str, optional) – The local file path to save the downloaded file. If not provided, saves the file in the current directory with the same name as the original file.
- stark_qa.tools.download_hf.download_hf_folder(repo, folder, repo_type='dataset', save_as_folder=None)
Downloads a folder from a Hugging Face repository and saves it to the specified directory.
- Parameters:
repo (str) – The repository name.
folder (str) – The folder path within the repository to download.
repo_type (str) – The type of the repository (e.g., ‘dataset’).
save_as_folder (str, optional) – The local directory to save the downloaded folder to. If None (the default), “data/” is used.
stark_qa.tools.graph
- stark_qa.tools.graph.k_hop_subgraph(node_idx, num_hops, edge_index, relabel_nodes=False, num_nodes=None, flow='source_to_target', directed=False)
Extracts the k-hop subgraph around a given node or a list of nodes.
- Parameters:
node_idx (Union[int, List[int], Tensor]) – The central node or a list of central nodes.
num_hops (int) – The number of hops to consider.
edge_index (Tensor) – The edge indices of the graph.
relabel_nodes (bool, optional) – If True, the nodes will be relabeled to a contiguous range. Defaults to False.
num_nodes (Optional[int], optional) – The number of nodes in the graph. Defaults to None.
flow (str, optional) – The flow direction (‘source_to_target’, ‘target_to_source’, ‘bidirectional’). Defaults to ‘source_to_target’.
directed (bool, optional) – If True, the graph is treated as directed. Defaults to False.
- Returns:
The node indices, the edge indices, the indices of the original nodes, and the edge mask.
- Return type:
Tuple[Tensor, Tensor, Tensor, Tensor]
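The following minimal pure-Python sketch illustrates what a k-hop neighborhood is. It uses two parallel lists of edge sources and targets instead of a `Tensor`, and it accepts only a single center node. Hops expand along edges in both directions. The sketch keeps every edge whose endpoints both lie in the resulting node set, which is the induced subgraph. It does not claim to reproduce how the real function's `flow` and `directed` options behave, and it omits relabeling and the edge mask. `k_hop_nodes` is a hypothetical name.

```python
from collections import deque


def k_hop_nodes(center, num_hops, src, dst):
    """Return (sorted node subset, kept edges) for the k-hop neighborhood of `center`."""
    # Build an undirected adjacency map from the parallel edge lists.
    adj = {}
    for u, v in zip(src, dst):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Breadth-first search that stops expanding after `num_hops` levels.
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == num_hops:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    subset = sorted(dist)
    # Keep only edges whose endpoints both fall inside the neighborhood.
    kept = [(u, v) for u, v in zip(src, dst) if u in dist and v in dist]
    return subset, kept


# Path graph 0-1-2-3-4 plus an extra edge 2->5.
src = [0, 1, 2, 3, 2]
dst = [1, 2, 3, 4, 5]
print(k_hop_nodes(2, 1, src, dst))  # ([1, 2, 3, 5], [(1, 2), (2, 3), (2, 5)])
```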
- stark_qa.tools.graph.relabel_graph(subset, edge_index, num_nodes)
Relabels the nodes in the graph to a contiguous range.
- Parameters:
subset (Tensor) – The subset of nodes.
edge_index (Tensor) – The edge indices of the graph.
num_nodes (int) – The number of nodes in the graph.
- Returns:
The relabeled edge indices.
- Return type:
Tensor
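Relabeling maps each surviving node ID to its position in `subset`, so the subgraph's edges use the indices 0 to len(subset) - 1. This minimal sketch uses plain lists and assumes that every edge endpoint appears in `subset`. It does not model the real function's `num_nodes` argument. `relabel_edges` is a hypothetical name.

```python
def relabel_edges(subset, src, dst):
    # Map each original node ID to its position in `subset`.
    new_id = {old: new for new, old in enumerate(subset)}
    return [new_id[u] for u in src], [new_id[v] for v in dst]


subset = [1, 2, 3, 5]
print(relabel_edges(subset, [1, 2, 2], [2, 3, 5]))  # ([0, 1, 1], [1, 2, 3])
```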
stark_qa.tools.io
- stark_qa.tools.io.load_files(save_path)
Load all files from a specified directory.
- Parameters:
save_path (str) – Directory to load the files from.
- Returns:
Dictionary with filenames (without extension) as keys and file contents as values.
- Return type:
dict
- stark_qa.tools.io.read_from_file(file_path)
Read content from a file based on its extension.
- Parameters:
file_path (str) – Path to the file.
- Returns:
Content of the file.
- Return type:
Any (the type depends on the file format)
- Raises:
NotImplementedError – If the file type is not supported.
- stark_qa.tools.io.save_files(save_path, **kwargs)
Save multiple files in a specified directory.
- Parameters:
save_path (str) – Directory to save the files.
**kwargs – Keyword arguments where keys are filenames (without extension) and values are the contents.
- stark_qa.tools.io.write_to_file(file_path, content)
Write content to a file based on its extension.
- Parameters:
file_path (str) – Path to the file.
content – Content to write.
- Raises:
NotImplementedError – If the file type is not supported.
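The extension-based dispatch used by `read_from_file` and `write_to_file` can be sketched as below. The set of supported formats here (`.json` and `.txt` only) is an assumption for illustration, and the library may support others. `write_by_extension` and `read_by_extension` are hypothetical names.

```python
import json
import os
import tempfile


def write_by_extension(file_path, content):
    ext = os.path.splitext(file_path)[1]
    if ext == ".json":
        with open(file_path, "w") as f:
            json.dump(content, f)
    elif ext == ".txt":
        with open(file_path, "w") as f:
            f.write(content)
    else:
        # Mirrors the documented behavior for unsupported types.
        raise NotImplementedError(f"Unsupported file type: {ext}")


def read_by_extension(file_path):
    ext = os.path.splitext(file_path)[1]
    if ext == ".json":
        with open(file_path) as f:
            return json.load(f)
    if ext == ".txt":
        with open(file_path) as f:
            return f.read()
    raise NotImplementedError(f"Unsupported file type: {ext}")


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "meta.json")
    write_by_extension(json_path, {"split": "test"})
    loaded = read_by_extension(json_path)
    txt_path = os.path.join(tmp, "note.txt")
    write_by_extension(txt_path, "hello")
    text_back = read_by_extension(txt_path)

print(loaded, text_back)  # {'split': 'test'} hello
```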
stark_qa.tools.node
- class stark_qa.tools.node.Node
Bases: object
- stark_qa.tools.node.df_row_to_dict(row, column_names=None)
Convert a row of a DataFrame to a dictionary.
- Parameters:
row (pandas.Series) – A row of a DataFrame.
column_names (list, optional) – The list of column names. Defaults to None.
- Returns:
A dictionary that contains the same information as the row.
- Return type:
dict
- stark_qa.tools.node.dict_tree(dictionary, indent=0)
Create a visual tree representation of a dictionary.
- Parameters:
dictionary (dict) – The dictionary to represent as a tree.
indent (int) – The current indentation level.
- Returns:
A string representing the dictionary as a tree.
- Return type:
str
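This section does not document the characters the library uses to draw the tree. The minimal sketch below assumes a `|-- ` prefix and four spaces of indentation per nesting level, and it recurses into nested dictionaries. `dict_tree_sketch` is a hypothetical name.

```python
def dict_tree_sketch(dictionary, indent=0):
    lines = []
    for key, value in dictionary.items():
        prefix = "    " * indent + "|-- "
        if isinstance(value, dict):
            # Print the key on its own line, then its children one level deeper.
            lines.append(f"{prefix}{key}")
            subtree = dict_tree_sketch(value, indent + 1)
            if subtree:
                lines.append(subtree)
        else:
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


tree = dict_tree_sketch({"node": {"type": "paper", "year": 2020}, "split": "test"})
print(tree)
```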
- stark_qa.tools.node.register_node(node, dictionary)
Register the contents of a dictionary onto a Node object.
- Parameters:
node (Node) – The node to register the dictionary to.
dictionary (dict) – The dictionary to register.
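One plausible reading of "registering" a dictionary is to turn each key into an attribute and each nested dictionary into a child node. The minimal sketch below implements that reading. It assumes the keys are valid Python identifiers and uses its own stand-in `Node` class. `register_node_sketch` is a hypothetical name, and the real function may transform keys differently.

```python
class Node:
    """Stand-in for stark_qa.tools.node.Node: a plain attribute container."""


def register_node_sketch(node, dictionary):
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # Nested dictionaries become child Node objects.
            child = Node()
            register_node_sketch(child, value)
            setattr(node, key, child)
        else:
            setattr(node, key, value)


root = Node()
register_node_sketch(root, {"name": "aspirin", "details": {"type": "drug"}})
print(root.name, root.details.type)  # aspirin drug
```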
stark_qa.tools.process_text
- stark_qa.tools.process_text.chunk_text(text, chunk_size)
Split text into chunks of specified size.
- Parameters:
text (str) – Input text to be chunked.
chunk_size (int) – Size of each chunk.
- Returns:
List of text chunks.
- Return type:
list
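The following minimal sketch assumes that `chunk_size` counts characters and that the last chunk may be shorter than the others. The real function might split on tokens or words instead. `chunk_text_sketch` is a hypothetical name.

```python
def chunk_text_sketch(text, chunk_size):
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the string in strides of chunk_size; the final chunk keeps the remainder.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


print(chunk_text_sketch("abcdefgh", 3))  # ['abc', 'def', 'gh']
```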
- stark_qa.tools.process_text.clean_data(item)
Clean text data contained in a string, list, or dictionary.
- Parameters:
item (Union[str, list, dict]) – An object that contains text data; nested lists and dictionaries are cleaned recursively.
- Returns:
The cleaned data in the same format as item.
- stark_qa.tools.process_text.clean_dict(dictionary, remove_values=['', 'nan'])
Clean the dictionary by removing entries whose values appear in remove_values.
- Parameters:
dictionary (dict) – A dictionary to be cleaned.
remove_values (list) – List of values to remove from the dictionary.
- Returns:
Cleaned dictionary.
- Return type:
dict
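The filtering can be sketched as a dictionary comprehension. This minimal sketch checks only top-level entries, and whether the real function recurses into nested values is not stated here. The sketch uses a tuple default instead of the documented list to avoid Python's shared mutable default pitfall. `clean_dict_sketch` is a hypothetical name.

```python
def clean_dict_sketch(dictionary, remove_values=("", "nan")):
    # Keep only entries whose value is not one of the placeholder values.
    return {k: v for k, v in dictionary.items() if v not in remove_values}


print(clean_dict_sketch({"title": "Example", "abstract": "", "year": "nan"}))  # {'title': 'Example'}
```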
- stark_qa.tools.process_text.compact_text(text)
Compact the text by removing unnecessary spaces and fixing punctuation issues.
- Parameters:
text (str) – Input text to be compacted.
- Returns:
Compacted text.
- Return type:
str
- stark_qa.tools.process_text.decode_escapes(s)
Decode escape sequences in a string.
- Parameters:
s (str) – Input string with escape sequences.
- Returns:
Decoded string.
- Return type:
str
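One common way to decode escape sequences in Python is the standard `unicode_escape` codec. Whether the library uses this exact approach is an assumption. This approach is reliable for plain-ASCII input, but non-ASCII characters already present in `s` would be mangled. `decode_escapes_sketch` is a hypothetical name.

```python
import codecs


def decode_escapes_sketch(s):
    # unicode_escape interprets backslash sequences such as \n, \t and \u00e9.
    return codecs.decode(s, "unicode_escape")


raw = "line1\\nline2"  # a literal backslash followed by 'n'
print(decode_escapes_sketch(raw))
```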
- stark_qa.tools.process_text.exact_match_score(prediction, ground_truth)
Calculate the exact match score between prediction and ground truth.
- Parameters:
prediction (str) – Predicted text.
ground_truth (str) – Ground truth text.
- Returns:
Exact match score.
- Return type:
float
- stark_qa.tools.process_text.f1_score(prediction, ground_truth)
Calculate the F1 score between prediction and ground truth.
- Parameters:
prediction (str) – Predicted text.
ground_truth (str) – Ground truth text.
- Returns:
F1 score.
- Return type:
float
- stark_qa.tools.process_text.normalize_answer(s)
Normalize text by removing punctuation, articles, and extra whitespace, and by lowercasing it.
- Parameters:
s (str) – Input text to be normalized.
- Returns:
Normalized text.
- Return type:
str
- stark_qa.tools.process_text.pluralize(singular)
Return the plural form of a given lowercase singular word (English only).
- Parameters:
singular (str) – Singular word.
- Returns:
Plural form of the word.
- Return type:
str
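The library's pluralization rules are not documented in this section. The minimal rule-based sketch below covers only a few regular English patterns and does not handle irregular nouns such as "child". `pluralize_sketch` is a hypothetical name.

```python
def pluralize_sketch(singular):
    # Sibilant endings take "-es": class -> classes, box -> boxes.
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        return singular + "es"
    # Consonant + "y" becomes "-ies": study -> studies (but key -> keys).
    if singular.endswith("y") and len(singular) > 1 and singular[-2] not in "aeiou":
        return singular[:-1] + "ies"
    return singular + "s"


print(pluralize_sketch("study"), pluralize_sketch("gene"))  # studies genes
```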
- stark_qa.tools.process_text.recall_score(prediction, ground_truth)
Calculate the recall score between prediction and ground truth.
- Parameters:
prediction (str) – Predicted text.
ground_truth (str) – Ground truth text.
- Returns:
Recall score.
- Return type:
float
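Answer-matching metrics like these are commonly defined in the style of SQuAD evaluation. Both strings are normalized first. Exact match then compares the whole normalized strings, while F1 and recall compare the two strings as bags of tokens. The sketch below assumes that convention. The library's exact normalization, such as which articles it strips, may differ. All function names ending in `_sketch` are hypothetical.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer_sketch(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # drop English articles
    return " ".join(s.split())  # collapse extra whitespace


def exact_match_sketch(prediction, ground_truth):
    return float(normalize_answer_sketch(prediction) == normalize_answer_sketch(ground_truth))


def _overlap(prediction, ground_truth):
    pred = normalize_answer_sketch(prediction).split()
    gold = normalize_answer_sketch(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count_pred, count_gold) times.
    common = sum((Counter(pred) & Counter(gold)).values())
    return common, len(pred), len(gold)


def recall_sketch(prediction, ground_truth):
    common, _, n_gold = _overlap(prediction, ground_truth)
    return common / n_gold if n_gold else 0.0


def f1_sketch(prediction, ground_truth):
    common, n_pred, n_gold = _overlap(prediction, ground_truth)
    if common == 0:
        return 0.0
    precision, recall = common / n_pred, common / n_gold
    return 2 * precision * recall / (precision + recall)


print(exact_match_sketch("The Eiffel Tower!", "eiffel tower"))  # 1.0
print(recall_sketch("eiffel tower paris", "the eiffel tower"))  # 1.0
print(f1_sketch("eiffel tower paris", "the eiffel tower"))
```

In the last call, precision is 2/3 because "paris" is extra, and recall is 1, so F1 is 0.8.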
- stark_qa.tools.process_text.remove_punctuation(text)
Remove all punctuation from the given text.
- Parameters:
text (str) – Input text from which punctuation will be removed.
- Returns:
Text without punctuation.
- Return type:
str
- stark_qa.tools.process_text.synonym_extractor(phrase)
Extract synonyms for a given phrase using WordNet.
- Parameters:
phrase (str) – Input phrase to find synonyms for.
- Returns:
List of synonyms.
- Return type:
list
stark_qa.tools.seed
- stark_qa.tools.seed.set_seed(seed)
Set the random seed so that results are reproducible.
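The effect of seeding can be shown with Python's built-in `random` module. This minimal sketch seeds only that module. Helpers like this often also seed NumPy and PyTorch, but whether `set_seed` does so is not stated here, and those calls are omitted to keep the sketch dependency-free. `set_seed_sketch` is a hypothetical name.

```python
import random


def set_seed_sketch(seed):
    # Seeding the global generator makes subsequent random draws repeatable.
    random.seed(seed)


set_seed_sketch(42)
first = [random.random() for _ in range(3)]
set_seed_sketch(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```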