Parallelizing "Molecular Similarity (Tanimoto etc)"

I am performing pairwise molecular similarity calculations for 300,000 molecules. I use two "SD Reader" components to read the same input file. The output from one of the readers is assigned "IsReference := true". The combined 600,000 records are then sent to the input port of the "Molecular Similarity (Tanimoto etc)" component for calculation. I would like to parallelize the work of the "Molecular Similarity (Tanimoto etc)" component. I have attempted to collapse the component into a subprotocol; however, this generated an error when run.

Ideally, the "Molecular Similarity (Tanimoto etc)" component would read all 300,000 records having "IsReference := true" as a preprocessing stage. Then it would read the non-reference set linearly, processing one record from that set at a time. When run in parallel, each instance of the component would hold the full 300,000-record reference set and then process its batches of non-reference records, each of the designated batch size, linearly.
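To make the intended behavior concrete, here is a minimal sketch in plain Python. It is not Pipeline Pilot code and says nothing about how the component or subprotocol parallelization actually works internally. The function names, the toy fingerprints (sets of on-bit indices), and the batch size are all assumptions made for illustration. A thread pool stands in for the parallel instances; real CPU-bound work would need separate processes or servers.

```python
from concurrent.futures import ThreadPoolExecutor


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_batches(records, batch_size):
    """Split the non-reference records into consecutive batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def score_batch(batch, references):
    """Process one batch linearly against the full reference set."""
    results = []
    for name, fp in batch:
        for ref_name, ref_fp in references:
            results.append((name, ref_name, tanimoto(fp, ref_fp)))
    return results


def run(queries, references, batch_size=2, workers=2):
    """Hand each batch to a worker; every worker sees all references."""
    batches = make_batches(queries, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_batch = pool.map(lambda b: score_batch(b, references), batches)
        return [row for rows in per_batch for row in rows]


# Hypothetical toy data: the same molecules serve as both sets,
# mirroring the two readers on one input file.
molecules = [("mol1", {1, 2, 3}), ("mol2", {2, 3, 4}), ("mol3", {7, 8})]
scores = run(molecules, molecules)
```

The point of the sketch is the data flow: only the non-reference records are split into batches, while the reference list is passed whole to every call of score_batch.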

Unfortunately, it is not clear how parallelization via a subprotocol breaks up the input to "Molecular Similarity (Tanimoto etc)". Is there a technique to ensure that the reference set is sent in full to each parallel instance while the non-reference set is broken up into batches?