stark_qa.tools

stark_qa.tools.api_lib

stark_qa.tools.api

stark_qa.tools.api.complete_texts_claude(inputs, **kwargs)

stark_qa.tools.api.complete_texts_hf(inputs, **kwargs)

stark_qa.tools.api.get_gpt_outputs(inputs, **kwargs)

stark_qa.tools.api.get_llm_output(message, model='gpt-4-0125-preview', max_tokens=2048, temperature=1, json_object=False)[source]

A general function to complete a prompt using the specified model.

Parameters:

message (str or list) – The input message or a list of message dicts.
model (str) – The model to use for completion.
max_tokens (int) – Maximum number of tokens to generate.
temperature (float) – Sampling temperature.
json_object (bool) – Whether to output in JSON format.

Returns:

The completed text generated by the model.

Return type:

str

Raises:

ValueError – If the model is not recognized.

stark_qa.tools.api.get_llm_outputs(inputs, **kwargs)

stark_qa.tools.api.parallel_func(func, n_max_nodes=5)[source]

A general function to call a function on a list of inputs in parallel.

Parameters:

func (callable) – The function to apply.
n_max_nodes (int) – Maximum number of parallel processes.

Returns:

A wrapper function that applies func in parallel.

Return type:

callable

stark_qa.tools.args

stark_qa.tools.args.load_args(args_dict)[source]

Convert a dictionary into an argparse.Namespace object.

Parameters:: args_dict (dict) – Dictionary of arguments to be converted.
Returns:: Namespace object with the arguments.
Return type:: argparse.Namespace

stark_qa.tools.args.merge_args(args_1, args_2)[source]

Merge two argparse.Namespace objects. Arguments from args_2 have higher priority.

Parameters:

args_1 (argparse.Namespace) – First namespace object.
args_2 (argparse.Namespace) – Second namespace object.

Returns:

Merged namespace object.

Return type:

argparse.Namespace

stark_qa.tools.download_hf

stark_qa.tools.download_hf.download_hf_file(repo, file, repo_type='dataset', save_as_file=None)[source]

Downloads a file from a Hugging Face repository and saves it to the specified path.

Parameters:

repo (str) – The repository name.
file (str) – The file path within the repository to download.
repo_type (str) – The type of the repository (e.g., ‘dataset’).
save_as_file (str, optional) – The local file path to save the downloaded file. If not provided, saves the file in the current directory with the same name as the original file.

stark_qa.tools.download_hf.download_hf_folder(repo, folder, repo_type='dataset', save_as_folder=None)[source]

Downloads a folder from a Hugging Face repository and saves it to the specified directory.

Parameters:

repo (str) – The repository name.
folder (str) – The folder path within the repository to download.
repo_type (str) – The type of the repository (e.g., ‘dataset’).
save_as_folder (str, optional) – The local directory to save the downloaded folder. Defaults to “data/”.

stark_qa.tools.graph

stark_qa.tools.graph.k_hop_subgraph(node_idx, num_hops, edge_index, relabel_nodes=False, num_nodes=None, flow='source_to_target', directed=False)[source]

Extracts the k-hop subgraph around a given node or a list of nodes.

Parameters:

node_idx (Union[int, List[int], Tensor]) – The central node or a list of central nodes.
num_hops (int) – The number of hops to consider.
edge_index (Tensor) – The edge indices of the graph.
relabel_nodes (bool, optional) – If True, the nodes will be relabeled to a contiguous range. Defaults to False.
num_nodes (Optional[int], optional) – The number of nodes in the graph. Defaults to None.
flow (str, optional) – The flow direction (‘source_to_target’, ‘target_to_source’, ‘bidirectional’). Defaults to ‘source_to_target’.
directed (bool, optional) – If True, the graph is treated as directed. Defaults to False.

Returns:

The node indices, the edge indices, the indices of the original nodes, and the edge mask.

Return type:

Tuple[Tensor, Tensor, Tensor, Tensor]

stark_qa.tools.graph.relabel_graph(subset, edge_index, num_nodes)[source]

Relabels the nodes in the graph to a contiguous range.

Parameters:

subset (Tensor) – The subset of nodes.
edge_index (Tensor) – The edge indices of the graph.
num_nodes (int) – The number of nodes in the graph.

Returns:

The relabeled edge indices.

Return type:

Tensor

stark_qa.tools.io

stark_qa.tools.io.load_files(save_path)[source]

Load all files from a specified directory.

Parameters:: save_path (str) – Directory to load the files from.
Returns:: Dictionary with filenames (without extension) as keys and file contents as values.
Return type:: dict

stark_qa.tools.io.read_from_file(file_path)[source]

Read content from a file based on its extension.

Parameters:: file_path (str) – Path to the file.
Returns:: Content of the file.
Return type:: content
Raises:: NotImplementedError – If the file type is not supported.

stark_qa.tools.io.save_files(save_path, **kwargs)[source]

Save multiple files in a specified directory.

Parameters:

save_path (str) – Directory to save the files.
**kwargs – Keyword arguments where keys are filenames (without extension) and values are the contents.

stark_qa.tools.io.write_to_file(file_path, content)[source]

Write content to a file based on its extension.

Parameters:

file_path (str) – Path to the file.
content – Content to write.

Raises:

NotImplementedError – If the file type is not supported.

stark_qa.tools.node

class stark_qa.tools.node.Node[source]: Bases: object

stark_qa.tools.node.df_row_to_dict(row, column_names=None)[source]

Convert a row of a DataFrame to a dictionary.

Parameters:

row (pandas.Series) – A row of a DataFrame.
column_names (list, optional) – The list of column names. Defaults to None.

Returns:

A dictionary that contains the same information as the row.

Return type:

dict

stark_qa.tools.node.dict_tree(dictionary, indent=0)[source]

Create a visual tree representation of a dictionary.

Parameters:

dictionary (dict) – The dictionary to represent as a tree.
indent (int) – The current indentation level.

Returns:

A string representing the dictionary as a tree.

Return type:

str

stark_qa.tools.node.register_node(node, dictionary)[source]

Register a dictionary into a Node object.

Parameters:

node (Node) – The node to register the dictionary to.
dictionary (dict) – The dictionary to register.

stark_qa.tools.process_text

stark_qa.tools.process_text.chunk_text(text, chunk_size)[source]

Split text into chunks of specified size.

Parameters:

text (str) – Input text to be chunked.
chunk_size (int) – Size of each chunk.

Returns:

List of text chunks.

Return type:

list

stark_qa.tools.process_text.clean_data(item)[source]

Clean the text data.

Parameters:: item (Union[str, list, dict]) – An object that contains text data which is cleaned iteratively.
Returns:: The cleaned data in the same format as item.

stark_qa.tools.process_text.clean_dict(dictionary, remove_values=['', 'nan'])[source]

Clean the dictionary by removing specific values.

Parameters:

dictionary (dict) – A dictionary to be cleaned.
remove_values (list) – List of values to remove from the dictionary.

Returns:

Cleaned dictionary.

Return type:

dict

stark_qa.tools.process_text.compact_text(text)[source]

Compact the text by removing unnecessary spaces and punctuation issues.

Parameters:: text (str) – Input text to be compacted.
Returns:: Compacted text.
Return type:: str

stark_qa.tools.process_text.decode_escapes(s)[source]

Decode escape sequences in a string.

Parameters:: s (str) – Input string with escape sequences.
Returns:: Decoded string.
Return type:: str

stark_qa.tools.process_text.exact_match_score(prediction, ground_truth)[source]

Calculate the exact match score between prediction and ground truth.

Parameters:

prediction (str) – Predicted text.
ground_truth (str) – Ground truth text.

Returns:

Exact match score.

Return type:

float

stark_qa.tools.process_text.f1_score(prediction, ground_truth)[source]

Calculate the F1 score between prediction and ground truth.

Parameters:

prediction (str) – Predicted text.
ground_truth (str) – Ground truth text.

Returns:

F1 score.

Return type:

float

stark_qa.tools.process_text.normalize_answer(s)[source]

Normalize text by removing punctuation, articles and extra whitespace, and lowercasing the text.

Parameters:: s (str) – Input text to be normalized.
Returns:: Normalized text.
Return type:: str

stark_qa.tools.process_text.pluralize(singular)[source]

Return the plural form of a given lowercase singular word (English only).

Parameters:: singular (str) – Singular word.
Returns:: Plural form of the word.
Return type:: str

stark_qa.tools.process_text.recall_score(prediction, ground_truth)[source]

Calculate the recall score between prediction and ground truth.

Parameters:

prediction (str) – Predicted text.
ground_truth (str) – Ground truth text.

Returns:

Recall score.

Return type:

float

stark_qa.tools.process_text.remove_punctuation(text)[source]

Remove all punctuation from the given text.

Parameters:: text (str) – Input text from which punctuation will be removed.
Returns:: Text without punctuation.
Return type:: str

stark_qa.tools.process_text.synonym_extractor(phrase)[source]

Extract synonyms for a given phrase using WordNet.

Parameters:: phrase (str) – Input phrase to find synonyms for.
Returns:: List of synonyms.
Return type:: list

stark_qa.tools.seed

stark_qa.tools.seed.set_seed(seed)[source]: Sets seed