How to Upload a query dataset
Formatting Data for EXALT upload
The EXALT website performs gene expression signature-based comparisons of microarray data sets. If your microarray data set is in a simple table format or GEO SOFT format, you are ready to use EXALT with the following two-step protocol.
- Step 1 - Upload your microarray data set to the EXALT server and obtain your data set tracking ID.
- Step 2 - Retrieve your EXALT result using your unique data set tracking ID.
The following is a brief description of the various data attributes required for using EXALT.
1. Group Number
Based on your experimental design, select the number of groups in your data set. Every group needs at least two biological replicate samples for analysis. EXALT requires at least 2 and at most 9 groups. If your data set has more than 9 groups, you can either split your input file into multiple sub-datasets or contact us for help.
For any given experiment, make sure to organize the hybridizations into related groups. For example:
- Normal vs. Disease comparison
- Treated vs. Untreated comparison
- Time Course
- Dose Response
- Effect of gene knock-out
- Effect of gene knock-in
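The group rules above (2 to 9 groups, at least two biological replicates each) are easy to check before uploading. The sketch below is a hypothetical local pre-check, not part of EXALT; the function name and the `{group: [sample, ...]}` layout are assumptions made for illustration.

```python
MIN_GROUPS, MAX_GROUPS = 2, 9   # limits stated in the EXALT instructions
MIN_REPLICATES = 2              # biological replicates required per group


def check_group_layout(groups: dict[str, list[str]]) -> list[str]:
    """Return human-readable problems with a {group: [sample, ...]} layout; empty means OK.

    Hypothetical helper: EXALT performs its own checks on the server.
    """
    problems: list[str] = []
    if len(groups) < MIN_GROUPS:
        problems.append(f"{len(groups)} group(s) given; at least {MIN_GROUPS} are required")
    elif len(groups) > MAX_GROUPS:
        problems.append(
            f"{len(groups)} groups given; split the file into sub-datasets of at most {MAX_GROUPS}"
        )
    for name, samples in groups.items():
        if len(samples) < MIN_REPLICATES:
            problems.append(f"group {name!r} has {len(samples)} sample(s); {MIN_REPLICATES} needed")
    return problems


# A two-group wild-type vs. transgenic layout with three replicates per group.
print(check_group_layout({"wt": ["wt1", "wt2", "wt3"], "tg": ["tg1", "tg2", "tg3"]}))  # []
```

An empty list means the layout satisfies the stated limits; anything else names the group or count to fix.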
2. Important pre-conditions for EXALT
Your microarray expression data files must already be pre-processed: for example, filtered for well-measured spots, normalized, and summarized.
It is your responsibility to decide whether a spot, or an entire microarray, is of good quality. You should also make sure that the groups of samples within your experiment are comparable.
There are many sources of systematic variation in microarray experiments that affect the measured gene expression levels. Much of this variation can be removed by normalization, for which many techniques exist.
Sources of systematic variation affect different microarray experiments to different degrees. To compare microarrays, it is therefore important to remove as much of the systematic variation as possible, bringing the data from the different groups to a comparable level.
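EXALT does not prescribe a normalization method. As one illustration, the sketch below implements quantile normalization, a widely used technique that forces every array to share the same value distribution. It is a minimal pure-Python version for intuition (ties are broken by position, a simplification); for real data you would normally use an established microarray analysis package.

```python
def quantile_normalize(columns: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize equal-length columns (one column per array).

    Each value is replaced by the mean, across all arrays, of the values that
    share its rank, so every array ends up with an identical distribution.
    Tied values are ranked by their original position (a simplification).
    """
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all arrays must measure the same probes")

    # Mean of the i-th smallest value across arrays, for every rank i.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n_rows)]

    normalized = []
    for col in columns:
        # Row indices ordered by ascending value within this array.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the values 1.5, 3.5 and 5.5; only their order, which reflects each probe's rank within its array, differs.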
3. SOFT Format
The "SOFT" data format is defined in NCBI GEO as "Simple Omnibus Format in Text (SOFT) which is designed for rapid batch submission (and download) of data". GDS167.soft, from GEO, serves as our SOFT format example.
The following is the format of GDS167.soft:

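A SOFT file mixes metadata lines with a data table, so it can help to split it apart before uploading. The sketch below assumes the layout GEO uses for dataset (GDS) files: `!key = value` attribute lines and a tab-delimited table between `!dataset_table_begin` and `!dataset_table_end` whose first row is the header (ID_REF, IDENTIFIER, then one column per sample). The function names and the example content, including the `GSM_*` sample IDs, are hypothetical.

```python
def parse_gds_soft(text: str) -> tuple[dict[str, str], list[str], list[list[str]]]:
    """Split a GDS-style SOFT file into '!' attributes, the table header, and table rows.

    Repeated attribute keys keep only their last value in this sketch.
    """
    attributes: dict[str, str] = {}
    header: list[str] = []
    rows: list[list[str]] = []
    in_table = False
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            in_table = False
        elif in_table:
            fields = line.split("\t")
            if not header:
                header = fields  # first table row names the columns
            else:
                rows.append(fields)
        elif line.startswith("!") and "=" in line:
            key, value = line[1:].split("=", 1)
            attributes[key.strip()] = value.strip()
        # '^' entity lines and '#' column descriptions are skipped here
    return attributes, header, rows


def sample_count_matches(attributes: dict[str, str], header: list[str]) -> bool:
    """Compare !dataset_sample_count with the number of sample columns in the table."""
    declared = int(attributes["dataset_sample_count"])
    return declared == len(header) - 2  # minus the ID_REF and IDENTIFIER columns


# Hypothetical miniature SOFT file (not real GDS167 content).
example = "\n".join([
    "^DATASET = GDS_EXAMPLE",
    "!dataset_sample_count = 4",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM_A1\tGSM_A2\tGSM_B1\tGSM_B2",
    "AFFX-MurIL2_at\tM16762\t3\t2.4\t0.8\t0.7",
    "!dataset_table_end",
])
attrs, header, rows = parse_gds_soft(example)
print(sample_count_matches(attrs, header))  # True
```

A mismatch between the declared sample count and the table columns is the kind of SOFT error EXALT reports during upload.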
4. File Format
EXALT supports three popular file formats: CSV (*.csv), tab-delimited text (*.txt), and SOFT (*.soft, also tab-delimited). You can edit your file in MS Excel or any text editor, then save it with the corresponding file extension. Please note that text files (*.txt or *.soft) must use ANSI or UTF-8 encoding, not UTF-16 (the encoding some Windows editors label "Unicode").
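Files saved as "Unicode" in some Windows editors are UTF-16, which is the encoding to avoid. The sketch below is a hypothetical pre-upload check: it rejects bytes that start with a UTF-16 byte-order mark or contain NUL bytes (typical of UTF-16 text), and otherwise reports whether the bytes are valid UTF-8. Treating non-UTF-8 input as the Windows "ANSI" code page cp1252 is an assumption.

```python
import codecs


def check_upload_encoding(raw: bytes) -> str:
    """Reject UTF-16 ("Unicode") files; otherwise name the likely encoding.

    Returns "utf-8" when the bytes decode as UTF-8, else "cp1252"
    (an assumption: the usual Windows "ANSI" code page).
    """
    looks_utf16 = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)) or b"\x00" in raw
    if looks_utf16:
        raise ValueError("file appears to be UTF-16; re-save it as UTF-8 or ANSI")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "cp1252"
    return "utf-8"


# Usage (hypothetical file name):
# with open("my_dataset.txt", "rb") as f:
#     print(check_upload_encoding(f.read()))
```

Reading the file in binary mode matters here: opening it as text would already have decoded, and possibly misread, the bytes.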
Importantly, row 1 of the file must be a header row, and columns 1 and 2 are reserved for gene ID columns, such as the local probe ID and the public GenBank accession number or gene symbol (required fields).
Unlike a SOFT file, a user-edited CSV or TXT file has only a one-line header that names the reserved gene ID columns and all samples in the data set. For example, a simple experiment with two groups (wt and tg) and three biological replicates per group (wt1, wt2, wt3 and tg1, tg2, tg3), run on Affymetrix chips, would be summarized as follows:
| affyProbeID | accNum | wt1 | wt2 | wt3 | tg1 | tg2 | tg3 |
|---|---|---|---|---|---|---|---|
| AFFX-MurIL2_at | M16762 | 3 | 2.4 | 0.9 | 0.8 | 0.7 | 0.5 |
| AFFX-MurIL10_at | M37897 | 4.6 | 0.7 | 0.7 | 7.8 | 1 | 2 |
| AFFX-MurIL4_at | M25892 | 11.4 | 7.4 | 9.3 | 44.5 | 63 | 34 |
| AFFX-MurFAS_at | M83649 | 36.6 | 27.7 | 40.2 | 67.6 | 48.6 | 78.1 |
| AFFX-BioB-5_at | J04423 | 71.6 | 79.4 | 56.6 | 67.9 | 84.8 | 91.6 |
| AFFX-BioB-M_at | J04423 | 144.1 | 144.7 | 147.5 | 144 | 171.9 | 229.4 |
| AFFX-BioB-3_at | J04423 | 110.9 | 124.2 | 125 | 101.9 | 141.8 | 201.5 |
| AFFX-BioC-5_at | J04423 | 196 | 222.4 | 188.9 | 183.1 | 222.1 | 300 |
| AFFX-BioC-3_at | J04423 | 155 | 156.8 | 140.1 | 153.6 | 188.9 | 207.2 |
| AFFX-BioDn-5_at | J04423 | 278.2 | 271.5 | 237.2 | 251.3 | 343.4 | 374.5 |
| AFFX-BioDn-3_at | J04423 | 912.8 | 1053.7 | 824.5 | 875.7 | 1240.9 | 1406.9 |
| AFFX-CreX-5_at | X03453 | 1674.1 | 1657.8 | 2043 | 1599.2 | 1385.5 | 2569.2 |
| AFFX-CreX-3_at | X03453 | 2060.5 | 2391.2 | 2248.5 | 2085.1 | 2364.6 | 3267.7 |
| AFFX-DapX-5_at | L38424 | 0.4 | 0.5 | 3.6 | 2.8 | 0.5 | 0.9 |
| AFFX-DapX-M_at | L38424 | 0.6 | 1.8 | 2.6 | 0.7 | 5.5 | 1 |
| AFFX-DapX-3_at | L38424 | 0.4 | 0.7 | 0.4 | 0.9 | 0.7 | 2.4 |
| AFFX-LysX-5_at | X17013 | 2.4 | 0.5 | 0.7 | 0.9 | 1.5 | 0.4 |
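A file laid out like the table above can be checked locally before uploading. The sketch below is hypothetical: it reads one header row, treats columns 1 and 2 as the gene ID columns, and requires every remaining field to be numeric. Rejecting empty or missing values is an assumption about what EXALT accepts, not something the instructions state.

```python
import csv
import io


def read_expression_table(text: str, delimiter: str = "\t"):
    """Parse an EXALT-style table: one header row, two gene ID columns, numeric samples.

    Returns (sample_names, {probe_id: (gene_id, [values...])}).
    Use delimiter="," for CSV files.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    if len(header) < 3:
        raise ValueError("need two gene ID columns plus at least one sample column")
    samples = header[2:]
    data = {}
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
        probe_id, gene_id = row[0], row[1]
        if not gene_id:
            raise ValueError(f"line {line_no}: missing gene identifier in column 2")
        try:
            data[probe_id] = (gene_id, [float(v) for v in row[2:]])
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric expression value") from None
    return samples, data


example = (
    "affyProbeID\taccNum\twt1\twt2\twt3\ttg1\ttg2\ttg3\n"
    "AFFX-MurIL2_at\tM16762\t3\t2.4\t0.9\t0.8\t0.7\t0.5\n"
)
samples, data = read_expression_table(example)
print(samples)  # ['wt1', 'wt2', 'wt3', 'tg1', 'tg2', 'tg3']
```

The returned sample names are the columns you later assign to groups during annotation.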
Uploading a Dataset into EXALT
Before uploading your dataset, ensure that it is properly formatted.
Getting Started
Begin uploading your dataset by choosing the "Uploading a query data set" link on the navigation bar. You will then see the upload page.
On this page, select the number of groups in the dataset you are uploading and specify the location of the query file. After completing these two steps, click the Upload File button.
Annotating the Query to be Uploaded
After the file has been uploaded into EXALT, you need to annotate your query on the annotation page that appears.
At the top of the page you should see the name of the file you just uploaded, followed by 11 fields that need to be filled out. Each field is described below.
- Dataset Name
- This is the identifier used in the database to identify the query. Note: only letters and numbers are allowed in this field; spaces and other special characters (such as %, #, @, !, ...) are removed during uploading. You can choose any meaningful name for your dataset. Our current naming convention is the primary institution, then the primary author, an optional identifier describing the dataset, and finally the short name of the journal followed by the volume number and page number.
- Dataset Title
- This field is for the dataset title. It can contain any characters, up to a maximum of 100 characters.
- Dataset Description
- This field is for a brief description of the dataset and experiment.
- Dataset Platform Technology Type
- This field has two options, single channel and two channel, which indicate whether the microarray experiment used a single-channel platform (e.g., an Affymetrix array) or a two-channel platform, in which two differently labeled samples are hybridized to the same array.
- Dataset platform organism
- Currently EXALT supports human, mouse, and rat arrays.
- Dataset Sample Count
- This field is filled in automatically; usually it does not need to be adjusted.
- Dataset Update Date
- This field is filled in automatically by the EXALT web service with the current date.
- Dataset value type
- This field describes the transformations performed on the data before upload, that is, whether the data were log-transformed or are still untransformed expression values.
- Column 1 Name (ID_REF)
- This field and the next refer to the first two columns of the input file, which must contain the local probe ID and the GenBank accession number. The two may appear in either order, but both are required. In SOFT files, ID_REF refers to the local probe ID field.
- Column 2 Name (IDENTIFIER)
- Together with Column 1 Name, this field describes the first two columns of the input file (the local probe ID and the GenBank accession number, in either order; both are required). In a SOFT file, IDENTIFIER is the same as the public ID field.
- Sample / Group Assignment
- This dynamically generated table lets you specify the experimental set-up and the layout of the groups (and of the columns of the upload file). Assign each sample to a group, ensuring that every group has at least two members. Samples may be assigned to more than one group.
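Several of the annotation fields above have rules that can be checked before submitting the form. The sketch below mirrors the documented rules as hypothetical local helpers: names keep only letters and digits (assumed here to mean ASCII letters), titles are capped at 100 characters, and every group needs at least two members, with samples allowed in more than one group. The log-scale guess for the Dataset value type field is a rough heuristic of our own, not an EXALT check; apply it to a whole column or file, not a single row.

```python
import math
from collections import defaultdict

MAX_TITLE_LENGTH = 100  # documented limit for Dataset Title


def sanitize_dataset_name(name: str) -> str:
    """Drop everything except ASCII letters and digits (assumed to match EXALT's rule)."""
    return "".join(ch for ch in name if ch.isascii() and ch.isalnum())


def check_title(title: str) -> bool:
    return len(title) <= MAX_TITLE_LENGTH


def looks_log_transformed(values: list[float], raw_threshold: float = 100.0) -> bool:
    """Heuristic guess for Dataset value type (hypothetical threshold).

    Negative values (e.g. log ratios) suggest log scale; a maximum above
    raw_threshold suggests untransformed intensities.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("no finite values to inspect")
    if min(finite) < 0:
        return True
    return max(finite) < raw_threshold


def groups_from_assignment(assignment: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert sample -> groups (a sample may be in several groups) into group -> samples."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sample, sample_groups in assignment.items():
        for group in sample_groups:
            groups[group].append(sample)
    return dict(groups)


def undersized_groups(groups: dict[str, list[str]], minimum: int = 2) -> list[str]:
    """Names of groups with fewer than `minimum` members, sorted."""
    return sorted(g for g, members in groups.items() if len(members) < minimum)


print(sanitize_dataset_name("my data #1!"))  # mydata1
assignment = {"wt1": ["wt"], "wt2": ["wt"], "tg1": ["tg"], "tg2": ["tg", "pooled"]}
print(undersized_groups(groups_from_assignment(assignment)))  # ['pooled']
```

Here the hypothetical "pooled" group has only one member, so it would need a second sample before submission.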
After you submit this page, a confirmation page shows the dataset annotation. Please check that everything is correct, because errors will affect the results of your search.
After you submit the reviewed information again, a final page shows a summary of the uploaded dataset and the tracking ID you will need to retrieve your EXALT results once the program has finished running.
The EXALT uploading process automatically detects many file errors and displays them in red text. You can see an example here, where the SOFT file has an error in the !dataset_sample_count field.
Retrieving EXALT Results
To retrieve your EXALT report you need the dataset tracking ID that is provided once you have finished uploading a dataset. Our policy is that results should be ready within 48 hours. If your results are not ready by then, please email us.
To retrieve the results from the EXALT service, use the dataset tracking ID you obtained during uploading (for testing purposes you can instead use the ID of our test dataset, gds167@070212020932In). Paste the tracking ID into the final field of the search bar on the EXALT website, as shown in the picture below. In the first field choose EXALT (Tracking ID), and in the second field choose EXALT report. Once all three fields are filled out, submit your query using the GO button on the right.
After submitting your search, you should be redirected to a page with a link to a zipped folder containing your results. Download it and begin examining your results for interesting signature matches.
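Once the zipped results are downloaded, the three report files described in the section on interpreting results can be located programmatically. The sketch below uses Python's standard `zipfile` module and matches members by base name, because the folder layout inside the archive is not documented; the function names are hypothetical.

```python
import zipfile

# Report file names as given in the EXALT documentation.
REPORT_FILES = ("InterpretABdsIndex.html", "ABseqIndex.html", "ABSum.html")


def find_report_files(archive_path) -> dict[str, str]:
    """Map each expected report name to its member path inside the results zip.

    archive_path may be a file path or a binary file-like object.
    """
    found: dict[str, str] = {}
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            base = member.rsplit("/", 1)[-1]  # zip member paths always use '/'
            if base in REPORT_FILES:
                found[base] = member
    return found


def missing_reports(found: dict[str, str]) -> list[str]:
    """Expected report files that were not found in the archive."""
    return [name for name in REPORT_FILES if name not in found]
```

For example, `find_report_files("exalt_results.zip")` (a hypothetical file name) returns the archive paths of whichever reports are present, and `missing_reports` flags any that are absent.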
Interpreting EXALT Results
The output from the EXALT program is three hyperlinked HTML files that list the matched signatures at varying levels of detail.
The first file, InterpretABdsIndex.html, lists the other datasets that the original query matched, each with a p-value computed from the total number of matches to the many possible signatures in that dataset. For our example dataset GDS167 there is only a self-match, as shown below.
The second file, ABseqIndex.html (the Report Signature Index), lists each query and the number of significant genes it used to query the EXALT database. Each pairwise group comparison generates a possible query, so an experiment with 4 groups can have up to 6 queries. The file then lists the datasets each query matched and, within those datasets, which group comparisons the query matched and how strongly, based on p-value.
The last file, ABSum.html, lists the full details of every match for every query. Here is the output from our example dataset.
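The "up to 6 queries for 4 groups" figure is the number of unordered group pairs, k(k-1)/2 for k groups. The sketch below enumerates those pairs; the group names are hypothetical, and whether each pair actually becomes a query depends on EXALT finding significant genes for it.

```python
from itertools import combinations
from math import comb


def candidate_queries(groups: list[str]) -> list[tuple[str, str]]:
    """List every pairwise group comparison; each is one *possible* query."""
    return list(combinations(groups, 2))


pairs = candidate_queries(["g1", "g2", "g3", "g4"])
print(len(pairs), comb(4, 2))  # 6 6
print(comb(9, 2))              # 36, the most for a 9-group upload
```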
Further information about the details of the matches and how the significance of each match is calculated can be found in the paper.