Help

1. What is this place?

A.: The Chemical Structure Lookup Service (CSLS) is meant to work as an address book for chemical structures. It has two major modes of operation. In the first, you submit one or more chemical structures as an SD file, as SMILES strings, or in any of the more than 20 other molecular structure formats CSLS understands, and the service determines whether the submitted structures are present in any of the databases currently indexed in CSLS. In the second, you submit a document, and the service tries to extract all chemical information the document might contain - InChI strings, InChIKeys, SMILES strings, molecular formulas, or our NCI/CADD Structure Identifiers (uuuuu, FICuS, or FICTS) - and then conducts a search with the extracted data. You can also enter any of these chemical structure identifiers directly. See below for examples of possible searches.

2. Can I see some examples?

A.: Yes, here are some:

3. What are these NCI/CADD Structure Identifiers "uuuuu", "FICuS", or "FICTS"?

A.: The uuuuu, FICuS, and FICTS identifiers are different variants of our NCI/CADD Structure Identifiers and are calculable, unique structure identifiers for small molecules. All of them are based on hashcode calculations built into the chemoinformatics toolkit CACTVS. The identifiers represent a given chemical structure with different levels of sensitivity to features of the structure such as tautomerism, counterions, isotopes, charges, stereochemistry, etc. A more detailed discussion about the NCI/CADD Structure Identifiers is available here.

If you would like to use the NCI/CADD Structure Identifiers to search the database, it is probably a good strategy to search by uuuuu first. Because the uuuuu is the least feature-sensitive of our identifiers, it will give you the broadest overview of what is in CSLS for this parent structure. After you have had a look at the results, you may then refine your search by using a specific FICuS (e.g. of a specific stereoisomer). This strategy helps minimize the risk of not finding what you are looking for because a structure was incorrectly (or incompletely) represented in the original database.

4. Can I use wildcards in queries?

A.: Only for InChI. For InChIs, wildcards can make a lot of sense because of the layered structure of InChI strings. An InChI string is formed by layers of increasing specificity separated by slashes. For instance, try to look up the following InChI string:

InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)

The search result in CSLS is empty since only an exact lookup of this InChI string is performed. However, you can allow additional layers by adding a wildcard symbol:

InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/*

Now, CSLS reports several structure records in the database with the InChI string:

InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/f/h7H,6H2

All InChI strings contain "/f/h7H,6H2" as an additional layer (fixed hydrogen atom layer). Chemically, they all represent a specific tautomer, whereas the original query

InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)

is a tautomer-invariant InChI string.
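The layer logic behind this wildcard search can be sketched in a few lines of Python. This is only an illustration of the idea, not the matching code used by CSLS; the function names are hypothetical, and the sketch assumes that a trailing "/*" matches any string that repeats all query layers and then adds at least one more.

```python
def inchi_layers(inchi):
    """Split an InChI string into its slash-separated layers."""
    return inchi.split("/")


def matches_wildcard(pattern, candidate):
    """Compare a query pattern with a stored InChI string.

    Assumption: a trailing '/*' means "the same layers, followed by
    at least one additional layer". Without it, only an exact match counts.
    """
    if pattern.endswith("/*"):
        prefix = inchi_layers(pattern[:-2])
        layers = inchi_layers(candidate)
        return len(layers) > len(prefix) and layers[:len(prefix)] == prefix
    return pattern == candidate


query = "InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)"
stored = query + "/f/h7H,6H2"

print(matches_wildcard(query, stored))         # exact lookup fails
print(matches_wildcard(query + "/*", stored))  # wildcard allows the extra layers
```

The two extra layers of the stored string, "f" and "h7H,6H2", are exactly the fixed hydrogen part that distinguishes the specific tautomer from the tautomer-invariant query.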

5. Can I download all structures in CSLS? Or at least specific databases?

A.: No. The purpose of CSLS is not to provide a source of structures, but to allow the user to quickly look up where a structure occurs. For publicly available, downloadable aggregated structure sets, we suggest sources such as the PubChem FTP site.

6. How does "Auto detect" work?

A.: The "Auto detect" mode tries to make the best guess about the type of query you have entered and searches for as much information as is available to answer it. You can submit a single word, number, or structure (as a SMILES string, SD file, etc.) or a complete journal article; the service will try to extract all possible chemical information from your query and look it up in our database. While not 100% error-free, it is the recommended search mode for most common queries.

7. Can I enter several queries at once? Can I mix InChIs, Formulae, etc.?

A.: Yes, you can enter multiple queries in one search. Please separate them with tabs or spaces. Avoid using commas, semicolons, dashes, slashes, parentheses, periods, quotation marks, or plus signs to separate queries, as the search result may be unpredictable. The following example shows one way of submitting multiple queries:

740 741

Placing individual query values on separate lines is even safer:

740
741

Mixing search values is best done the same way:

740
C6H6
InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/*

However, the following queries

740,741

and

740, 741

will not work as intended.
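If you build such query lists in a script, the separator rule can be sketched as follows in Python. The helper names are hypothetical and this is not the parser CSLS uses; it only shows that splitting on whitespace keeps the values apart, while a comma leaves them glued together. It is meant for simple identifier queries, not for structure files, which contain whitespace themselves.

```python
def split_queries(text):
    """Split input into query values on spaces, tabs, and newlines only."""
    return text.split()


def one_per_line(values):
    """Join query values with newlines, the safest form to submit."""
    return "\n".join(values)


print(split_queries("740 741"))    # ['740', '741']
print(split_queries("740,741"))    # ['740,741'] - still one value
print(one_per_line(["740", "C6H6"]))
```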

If you use a molecular structure file (such as an SD file), please note that many different "dialects" of these file formats are in use; our parser may not recognize all of them, especially if they violate syntax rules. Likewise, incorrect chemistry (such as pentavalent carbon) may lead to unexpected results. Similar limitations apply to structure searches by SMILES string. Auto-detect will try to pick SMILES out of a mixed context (such as a journal article), but its capabilities are limited. For example, it will not be able to automatically recognize metallo-organic SMILES.

8. Why wasn't my formula found? I know it should be there.

A.: One common mistake is not to use upper/lower case correctly in formulas. Element symbols are case-sensitive! For example, "CO" is not the same as "Co" - the former is carbon monoxide, the latter cobalt. And "co" is not a meaningful chemical formula at all.

Note: For short formulas like the examples just mentioned, it is also important to use the Formula search mode instead of the Auto detect search mode in CSLS. The Auto detect mode gives the interpretation of a search value as SMILES higher precedence than its interpretation as a formula, i.e. "CO" is regarded as the SMILES representation of methanol, and "Co" and "co" are considered SMILES representations of formaldehyde (if you want to represent carbon monoxide or cobalt as a SMILES string, you have to type "[C-]#[O+]" or "[Co]", respectively).

For longer formulas it is helpful to use the correct Hill order when inputting your formula. The service will try to convert any formula to Hill order automatically, but may not always be successful.
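As a rough illustration of both points (case-sensitive element symbols and Hill order), here is a minimal Python sketch. It is not the conversion CSLS performs; the function names are hypothetical, and it assumes plain formulas without parentheses, charges, or isotope labels. It also does not check symbols against the periodic table. Hill order here means: carbon first, hydrogen second, all other elements alphabetically; without carbon, all elements alphabetically.

```python
import re

# One element symbol (upper-case letter, optional lower-case letter) plus optional count
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a simple formula such as 'C6H6' into a dict of element counts."""
    counts = {}
    pos = 0
    for match in TOKEN.finditer(formula):
        if match.start() != pos:
            break  # unparsable characters between tokens
        symbol, number = match.groups()
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError("cannot parse formula: %r" % formula)
    return counts


def hill_formula(formula):
    """Rewrite a simple formula in Hill order."""
    counts = parse_formula(formula)
    if "C" in counts:
        first = ["C"] + (["H"] if "H" in counts else [])
        order = first + sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] != 1 else "") for s in order)


print(parse_formula("CO"))       # two elements: carbon and oxygen
print(parse_formula("Co"))       # one element: cobalt
print(hill_formula("H12C5ON2"))  # C5H12N2O
```

An all-lower-case input such as "co" raises a ValueError in this sketch, since no element symbol starts with a lower-case letter.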

9. Can I access the service without using the web interface, for instance by SOAP?

A.: Yes, there is a SOAP interface for CSLS available which mimics the functionality of the web service. An example of connecting to the service is:

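#!/usr/bin/perl
use strict;
use warnings;
use SOAP::Lite;

# Read the whole structure file into a single query string
open(my $in, '<', 'file.sdf') or die "Cannot open file.sdf: $!";
my $query = do { local $/; <$in> };
close($in);
#$query = "CCO";  # another possible query
#$query = "740";  # yet another possibility
my $type = "auto"; # auto, Original ID, SMILES, Formula, InChI, InChIKey, uuuuu, FICuS, FICTS

# Call the lookup service described by the WSDL and print the result
print SOAP::Lite
	-> service('http://cactus.nci.nih.gov/lookup/lookup-soap.wsdl')
	-> run($query, $type);
exit;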

It will return a tab-separated list of hits (comparable to the csv file which is downloadable for each search result from the web service).
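If you save the returned text and want to process it further, a small Python sketch like the following could split it into fields. It assumes one hit per line with tab-separated fields; the column layout is not described here, so the sketch assigns no meaning to individual columns, and the function name is hypothetical.

```python
def parse_hits(text):
    """Split tab-separated result text into one list of fields per non-empty line."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


# Placeholder text standing in for a real response
example = "field1\tfield2\nfield3\tfield4\n"
for fields in parse_hits(example):
    print(fields)
```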

10. Can I search for a compound by chemical name?

A.: No, currently not - but we plan to offer this functionality in a later version of the service.

11. Can I have my database added to this service?

A.: Yes! Please send an e-mail to Marc C. Nicklaus to discuss the technical details. Generally, we will be happy to add any small-molecule database to which you have the rights and/or which is public.

12. How do you pronounce "CSLS"?

A.: We pronounce it like "sizzles".

Last Update: 2008-06-12