Reading data directly from the web and bypassing errors

Hello PP users, I have two questions:

1) Is it possible to read molecules from a web page and process them in a protocol, and if so, how? For example, I want to read all the PubChem molecules (~70 million) directly from the PubChem website, without the pain of downloading them all first, and run some protocols on them.

2) I have already tried loading some of the directories from PubChem, but the protocol keeps failing with an "illegal bond length" error while reading them. How can I skip these bad molecules and keep the protocol running?

I would appreciate any help.

Thank you

nar