Genestack Platform overview¶
Clicking on your username (your email address) in the top right corner of the page gives you access to your profile and lets you sign out of the platform.
In your profile you can change your name, password, the name of your organisation and your vendor ID.
Organizations are a way of enforcing group permissions. There are two types of user in an organization – administrators and non-administrators. If you are in the same organization as another user, you can add them to groups you control and share files with them freely. If you are in different organizations, administrators from both organizations first need to approve adding them to the group. You can learn more about data sharing, permissions and groups in the Sharing data and collaboration section.
Vendor IDs are used for application development. Applications you have created will be marked with your vendor ID.
Tasks links to the Task Manager application, where you can monitor running and previous computations.
Data Browser opens the Data Browser application, where you can browse public, private and shared data and search it using complex queries.
Wherever you are on the platform, you can also access a shortcuts menu by clicking on the button in the top left corner of any platform page. It is an easy way to reach the most commonly used applications and folders. Data Browser, Import Data, Manage Applications, Manage Groups, Expression Data Miner, Differential Expression Similarity Search, and Import Template Editor, as well as the folders for created and imported files, can all be found here. You can also click User Guide to access the user documentation.
Let’s look deeper into each of these items.
In Manage Applications you can view the list of all applications available on the platform, both the ones you have written and the public ones.
The Developer button will give you the option to choose which version of an application you want to use.
The ‘minified’ option optimizes loading of the CSS and JS used in the application. You can find more details on minifying in the blog post by Dino Esposito.
The Session and User dropdown menus allow you to choose the version of the application you want to use for your current log-in session and for your current user account respectively. Inherit is the default option and the order of version choice inheritance is Global → User → Session. If you change the version of an application, you also need to reload it to run the version of your choice.
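The inheritance order can be pictured as a fall-through lookup. The following is a minimal sketch in Python, not Genestack code. It assumes that a Session choice overrides a User choice, which in turn overrides the Global default, and that "inherit" means "defer to the previous level". The function name and version labels are hypothetical.

```python
INHERIT = "inherit"


def resolve_version(global_version, user_choice=INHERIT, session_choice=INHERIT):
    """Return the effective application version.

    Assumption: levels are consulted in the order Global -> User -> Session,
    and each level that is not set to "inherit" overrides the previous one.
    """
    effective = global_version
    for choice in (user_choice, session_choice):
        if choice != INHERIT:
            effective = choice
    return effective
```

Under these assumptions, leaving both menus on Inherit yields the global version, while a session-level choice wins over everything else for the current log-in only.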
In order to share data, we use groups. In the Manage Groups section you can change the settings of your current collaboration groups or create new ones and invite other users to join. You can also view and accept all the invitations you have received from other users. Read more about collaboration on Genestack in the Sharing Data and Collaboration section.
In the Manage Users section, you can change the passwords of your users or create new users. Clicking Manage Users takes you to the user management screen. Every user in Genestack Platform belongs to an organisation. When you signed up to use Genestack via the sign-up dialog, we created a new organisation for you, and you automatically became its first user and its administrator. As an organisation administrator you can create as many new users for your organisation as you want; for instance, you can create accounts for your colleagues. Being in one organisation means you can share data without any restrictions. The user management screen gives you an overview of all users in your organisation. You can change a user’s password, make any user an administrator or lock a user out of the system.
You can also create new users. Let’s create a Second User by clicking the Create user button.
You will need to set the user name, email and password. Users added this way are immediately confirmed, and can log in right away.
You can find more about managing users on Genestack from this video.
Here is a list of file types that can be imported into Genestack. Note that gzipped (.gz) and zipped (.zip) files are also supported.
| Genestack file type | Description | Supported file formats |
|---|---|---|
| Continuous Genomic Data | Contains information on continuous genome statistics, e.g. GC% content | … |
| Discrete Genomic Data | Information on discrete regions of the genome with an exact start and … | … |
| Gene Expression Signature | The file includes the list of genes and expression pattern (Log FC) specific to an organism phenotype with possibly additional annotation | … |
| Gene List | Stores a list of genes with possibly … | … |
| Gene Signature Database | A list of annotated gene sets that can be used in enrichment analysis | … |
| Infinium Beta Values | Methylation data matrices containing Beta-value methylation ratios for Illumina Infinium Microarrays | … |
| Infinium Microarray Data | Raw intensity data files for Illumina Infinium Microarrays | … |
| Mapped Reads | Reads aligned to a specific … | … |
| Methylation Array Annotation | Methylation chip annotation containing information about association of microarray probes to known genes | … |
| Microarray Annotation | Annotation file containing information about association of microarray probes to biological entities like genes, transcripts and … | … |
| … | Raw microarray data obtained from a microarray experiment | … |
| … | Files used to annotate metadata | … |
| Raw Reads | Raw sequencing data | … |
| Reference Genome | Reference genome sequence for a specific organism with annotation | FASTA + GFF, FASTA + GTF, FASTA + GFF3 |
| Genetic Variations | Genetic variations files, storing gene sequence variations | … |
Import of Gene Expression Signature and Gene List files
If the file contains both gene names and log fold changes, it is imported as a Gene Expression Signature. If the file only contains gene names, it is imported as a Gene List. The importer looks at the headers of the .tsv file to try to detect which columns may correspond to gene names or log fold changes (common variations are supported, such as ‘gene’/‘symbol’ for gene names, and ‘logFC’/‘log fold change’ for log fold changes). If it fails to detect them, the user will be asked to manually choose the file type and specify the file headers corresponding to gene names or log fold changes. Gene symbols and Ensembl/Entrez gene IDs are currently supported for gene names.
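The decision rule described above can be sketched as follows. This is only an illustration of the idea, not the importer's actual logic: the header sets below contain just the variants named in the text, and the function name is hypothetical.

```python
import csv
import io

# Assumption: only the header variants mentioned in the text are listed here;
# the real importer may accept more.
GENE_HEADERS = {"gene", "symbol"}
LOGFC_HEADERS = {"logfc", "log fold change"}


def detect_file_type(tsv_text):
    """Classify a TSV file by its header row.

    Returns "Gene Expression Signature", "Gene List", or None when no gene
    column is found (the case where the user would be asked to choose).
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    names = {column.strip().lower() for column in header}
    has_genes = bool(names & GENE_HEADERS)
    has_logfc = bool(names & LOGFC_HEADERS)
    if has_genes and has_logfc:
        return "Gene Expression Signature"
    if has_genes:
        return "Gene List"
    return None
```

For example, a file whose header row is `gene` and `logFC` would be classified as a Gene Expression Signature, while a file with only a `symbol` column would be classified as a Gene List.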
When you import files that are detected as raw sequencing or microarray data, Genestack automatically creates a dataset, a special type of folder, and adds the assays to it. Additional documents in any format (e.g. PDF, Word, text, etc.) can be imported as attachments to a dataset. We will discuss the use of attachments below. Some types of files, namely Reference Genome, Gene List, Gene Expression Signature, Gene Signature Database, Genetic Variations, Ontology Files, Dictionary, Microarray Annotation, Methylation Array Annotation, Infinium Beta Values, are not wrapped in datasets on import because they are rarely uploaded and processed as batches.
Analyses performed on Genestack can also create other data types that cannot be imported, such as:
- Affymetrix/Agilent/GenePix Microarrays Normalisation — file with normalized Affymetrix/Agilent/GenePix microarrays data;
- Differential Expression Statistics — expression statistics for change in expression of individual genes or other genomic features between groups of samples, such as fold-changes, p-values, FDR, etc.;
- Genome Annotations — a technical file used for matching GO terms and gene symbols to gene coordinates;
- Mapped Read Counts — file is produced from Mapped Reads and contains the number of reads mapped to each feature of a reference sequence.
There are several ways you can access the Import application:
- using the Import data link on the Dashboard;
- clicking the Import button in the File Manager;
- using an import template. We will describe what an import template is and how to use it later in the guide.
Importing data consists of three steps: first, temporary Upload files with your data are created in the platform; then, a biological data type is assigned to your imported data; finally, you can fill in all required metadata or import it from a text file.
Step 1: Getting data into the platform¶
There are two ways to have your data imported into the platform:
- Upload data from your computer — select or drag-and-drop files.
- Import from URLs (FTP or HTTP/HTTPS) — specify URLs for separate files or directories.
Furthermore, you can reuse your previous Upload files instead of uploading the same data again: select existing files with the Use previous uploads option and then add more data if necessary. This is useful, for example, when you import a dataset with several samples and one of the files was chosen incorrectly or is corrupted. In that case, you only need to upload the one replacement file and can reuse all the other previously uploaded files.
What is an Upload file?
The Upload file is a temporary file that is automatically created during the data importing process. The only purpose of Upload files is to store the data temporarily until the corresponding Genestack files are created and initialized correctly. It is the Genestack files that are used in further bioinformatic data analysis, which is why the platform can periodically remove Upload files without any data being lost.
Data uploading from your computer is carried out in multiple streams to increase upload speed. Import from URLs is performed in the background, which means that even while these files are being uploaded, you can edit their metadata and use them in pipelines.
If you lose your Internet connection during an upload, you will be able to resume unfinished uploads later.
Click the Import files button to proceed.
Step 2: Format recognition¶
After your data is uploaded, Genestack automatically recognizes file formats and transforms them into biological data types: raw reads, mapped reads, reference genomes, etc. All format conversions will be handled internally by Genestack. You will not have to worry about formats at all.
If files are unrecognized or recognized incorrectly, you can manually allocate them to a specific data type: drag the Upload file and move it to the green “Choose type” box at the top of the page.
Choose the data type you find suitable:
Click the Create files button to proceed.
Step 3: Editing metainfo¶
By this step, the import has already completed, and you can describe the uploaded data using an Excel-like spreadsheet.
By default, you see all metainfo fields available for the files; you can fill them in or create new custom columns. Click the Add column button, name the new metainfo field and choose its type (Text, Integer, etc.):
You can also choose to apply a naming scheme. This allows you to generate file names automatically based on other metainfo attributes.
Metainfo fields can be associated with specific dictionaries and ontologies. We pre-uploaded some public dictionaries such as the NCBI Taxonomy database for the “Organism” field, the Cellosaurus (a resource on cell lines), the ChEBI for chemical compounds, and the Cell Ontology (cell types in animals).
We also created our own controlled vocabularies to cover Sex, Method and Platform fields. You can find out more about ontologies in the Managing metadata section.
Import with templates¶
You can create your own custom dictionary by importing it into the platform as an OWL, OBO or CSV file and attaching it to the import template.
What is an import template?
Import templates allow you to select what metainfo attributes of your imported files will be tightly controlled (so you don’t lose any information in the process). Import templates allow you to set default fields for file metadata based on file type (e.g. Datasets, Discrete Genomic Data, Genetic Variations, etc.). Of course, if you’re only importing mapped reads, you don’t need to specify metainfo attributes for other data types.
You can select which import template to use in two ways: from the Dashboard, or during the third step of the import process by right-clicking on the import template name (“Default template” is the public one). You can create a copy of an existing import template with the Make a copy option in the context menu.
Genestack will attempt to fill metainfo fields automatically, but you can always edit the contents manually during the import process. By using metainfo templates you can make sure that all of your files will be adequately and consistently described so you will not lose any valuable information. For example, here is the list of metainfo attributes used by default to describe Reference Genome data:
The Import Template Editor application lets you modify existing import templates and create new ones with the proper metainfo fields, requirements and controlled vocabularies. To access the application, right-click on a template’s name and select Import template editor from the “Manage” submenu. To create a new template based on the default one, you can also click Add import template on the Dashboard.
Now let’s say you wish to create an import template to control the metainfo attributes of raw reads (e.g. you always need to know the tissue and sex of your samples). To do this, click on Add import template, then look for the table related to Raw Reads and, for the fields “tissue” and “sex”, change the required fields to Yes. As you can see, the system controls what type of information you can put into your metainfo fields. In this case, for tissue the system will map your entries to the Uberon ontology (an integrative multi-species anatomy ontology) and the metainfo type must be text.
If you want to add other metainfo fields that are not already included in the table, you can do this in the blank rows at the bottom of the table. For each entry, you must specify whether or not the field is required and what its metainfo type is (e.g. text, yes/no, integer).
If you are using a file kind that is not yet listed, you can add a new one by clicking on the Add file kind button. Keep in mind that file kinds are defined in Genestack — you will not be able to create a template entry for a file kind that is not used on the platform.
When you are done, click on the blue Import using this template button. This will take you to the Import Data app, where you can go through the three import steps described above. You can find all the imported files in the “Imported” folder which can be accessed from the Dashboard and from the File Manager.
Apart from editing metainformation manually, you can also import and validate the metainfo attached to the assays and to the dataset on the platform.
Click the Import data from spreadsheet button and select a local CSV or Excel file containing the metadata you would like to associate with the imported files.
Note that the names in the first column of the metadata file must exactly match the names of the data samples on the platform, as shown in the first “Name” column. For example, in our case the metainfo for the second sample does not match any assay and is highlighted in red.
Use the Select file option to manually allocate the imported metadata to an appropriate file.
Columns that are mapped to a metainfo field from the dataset’s template (by default data are imported with “Default” template) are highlighted in green.
At this step, by clicking on a column header, you can specify for each column whether it should be imported and whether it should be mapped to a metainfo key from the import template.
Click Import when you finish editing the table. As a result, the table on the Metainfo Editor page is filled in with the metadata from the Excel file.
While importing a dataset into Genestack, you can also attach various files to it, for example a PDF file with the experiment plan or an R script. When you open your newly imported dataset, all of the attachments will accompany it. They are stored safely on Genestack, so you can download them from the platform later if they get lost on your computer.
How to upload an attachment?
Attachments should be uploaded together with the dataset. In the Data Import application, choose the attachments from your computer along with your dataset. The platform will recognize the raw data, and all additional files that were unrecognised will be added to the dataset as attachments.
You can also upload more attachments or remove existing ones in the Metainfo Editor.
Efficient data search and browsing are at the core of Genestack. The platform provides rapid access to private, shared, and public data and analysis results.
Our platform provides you with a rich collection of freely accessible datasets that we imported from various well-known repositories, such as NCBI GEO, ENA, SRA and ArrayExpress. Data is synchronized regularly from these databases to keep it up to date. There are currently more than 3 million sequencing and microarray assays from over 100,000 public datasets indexed in Genestack. All the public datasets and assays are accompanied by the original metainformation describing the biological data. Generally, this information is not standardized, which makes operations such as browsing data, combining assays from several datasets or reproducing an analysis difficult or even impossible without human involvement. To harmonize raw metadata we apply automated curation, mapping raw entries to controlled terms that we store and maintain in special files called Dictionaries. To prepare these Dictionaries we adopted terms from external ontologies or created them manually. You can also use our standardized and unified terminology to describe your own data or analysis results.
The Data Browser lets you browse these public datasets, as well as your private data and the data shared with you on Genestack. You can access the Data Browser either from the Dashboard or the Shortcuts menu on the left-hand side.
You can search for relevant data with a free-text query, and you can further narrow down datasets by metadata attributes using the checkboxes on the left. These attributes are generated from the metadata associated with datasets. For instance, you can set the filters “Access”, “Method” and “Organism” to “Public”, “Whole Exome Sequencing” and “Mus musculus”, respectively, to find publicly accessible whole exome sequencing data from mice.
The Data Browser also lets you find bioinformatics analysis results associated with raw data. If analyses have been performed on a given dataset and you have access to the results (i.e. they are yours, or they are shared with you), you will find both intermediate results and reports in the Downstream column.
You can then merge data from several datasets into a single combined dataset, or share several datasets with a collaborator at once. To do so, select several datasets and click the Merge… or Share… button, respectively, on the “Briefcase bar” that appears at the bottom of the screen.
If not all the samples meet your search criteria, you can create a subset of a dataset containing only the matching samples and process them separately. To do so, click the link showing the number of matching files in the Matched column of the Data Browser, then click the Make a subset with matching files button to save the files that match the current filters. You can also make a subset on the Metainfo Editor page.
Clicking on the name of any of the datasets will take you to the Metainfo Editor, where you can view (and possibly edit) the metadata of this dataset and its assays.
You can also start building a pipeline step by step directly from the Metainfo Editor page via the Analyse button.
If you want to analyse some part of your dataset, select samples and click the Make a subset button (by default all subsets are created in the folder My datasets).
Click a subset name to open it with the Metainfo Editor application and edit its metainformation if needed.
If you are the owner of a given dataset, you can add more samples to it by clicking the Upload more files button.
You can also remove files from a dataset: select the files you want to exclude and click the Remove files from dataset button.
If you are sure, confirm the removal by clicking the Remove button. Remember that if the files you are excluding from a dataset are not used anywhere else, they will be deleted from the platform and cannot be restored.
If your dataset is made from subsets of other datasets, use the metainfo filters in File Provenance. Open the dataset in File Provenance to see which metadata the samples were selected on, so you can be sure that no significant data was omitted.
Like on any operating system, the File Manager is where you can easily access all of your files, organise them into folders and open them with various applications.
The panel (tree view) on the left-hand side is the file system navigator. Here you can see many different folders. Some special folders are worth mentioning:
Created files is the folder where any new file created by an application on Genestack goes.
Imported files is where imported data goes, organized by date: all files imported at the same time (during one import action) will be located in the same folder.
Uploads contains all the files you have uploaded into Genestack: FASTQ and BAM files, PDF documents, Excel tables etc.
What is the difference between uploads and imported files?
When you have just started importing your files (in various formats like FASTQ, BAM etc.), they all go to the specific storage area (the “Uploads” folder). During import, Genestack will recognize these uploaded files and allocate them to appropriate biological types (you can also do it manually), e.g. sequencing assays, mapped reads etc. These meaningful biological objects are what you work with on Genestack, and these are located in the “Imported files” folder.
The Exports folder contains data ready for export. See the Data export section for more information.
Shared with me gives access to all files that other users have shared with you or that you shared with other users. See the Sharing data and collaboration section for more details.
The Public Data folder contains all of the goodies we have preloaded on Genestack to make life a bit simpler for our users. This folder contains:
- Codon tables: currently 18 different tables such as yeast mitochondrial, vertebrate mitochondrial, blepharisma macronuclear etc.;
- Default template: the import template that is used by default in the data importing process. It provides the list of optional and required metadata fields for each file kind. An ontology or a dictionary can be associated with metadata keys to validate metainfo;
- Dictionaries: dictionaries include terms from external ontologies and are used to curate and harmonize metainfo, e.g. sex, platform, NCBI taxonomy;
- Example results: so you can play around with our platform and see what types of visualizations are available;
- External databases: sets of sequences with associated annotation, e.g. Greengenes for 16S rRNA;
- Genome annotations: for a range of different organisms and platforms (for WES analysis);
- Microarray annotations: annotation lists to be used as the translation table to link probes and common public domain sequences;
- Public analyses: all files created during re-analysis of previously published datasets;
- Reference genomes: various reference genomes for the most commonly analysed organisms;
- Public data flows: all data flows available to our users, including tutorial data flows and the ones found on the Dashboard;
- Public experiments: this is a feature we are particularly proud of. We have pre-loaded the platform with thousands and thousands of publicly available datasets from public repositories such as GEO, ArrayExpress, SRA, and ENA. Currently, we have more than 110,000 datasets in our database;
- Tutorials: the folder contains files we use as examples during various tutorials.
To access the context menu for a given file, either right-click or left-click on the respective entry in the file browser. The topmost entry is the application that was used to generate this file or the application that should be used to view it. The next four entries are submenus for each of the four different types of applications that can be used on the file. Further down are options for viewing and re-using the pipeline used to generate the file. The final section allows you to manage file locations and names. For folders, left-clicking opens the folder, while right-clicking opens the menu. The Add to and Move to actions allow you to link or move a file to a chosen directory.
This does not perform a copy
We use the word “linking” and not “copying” in this context because in Genestack, adding a file to a folder does not physically create a duplicate of that file (unlike copy-pasting in your traditional operating system). It just adds a link to that file from the folder (similar to symbolic links on UNIX).
Show all parent containers shows you a list of all the folders in which the current file is linked. The file accession is a unique identifier attached to each file. Unlike other metainfo attributes, it will never change for any file.
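The difference between linking and copying can be illustrated with a small in-memory model. This is a conceptual sketch only: the `File` class, the folder names and the accession value are hypothetical and do not reflect how Genestack stores files.

```python
class File:
    def __init__(self, accession, name):
        self.accession = accession  # unique identifier that never changes
        self.name = name            # ordinary metainfo, can be edited


folders = {"Imported files": [], "My project": []}

reads = File("GSF000001", "sample_1 raw reads")  # hypothetical accession
folders["Imported files"].append(reads)
folders["My project"].append(reads)  # "Add to": a second link, not a copy

# Both folders point at the same object, so a rename is visible everywhere.
reads.name = "sample_1 raw reads (liver)"


def parent_containers(file, all_folders):
    """List every folder that links to the file."""
    return [
        folder_name
        for folder_name, items in all_folders.items()
        if any(item is file for item in items)
    ]
```

In this model, `parent_containers(reads, folders)` plays the role of Show all parent containers: it reports both folders, because each holds a link to the same file rather than its own duplicate.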
Above the File Manager pane, you can find the Import button. Clicking it takes you to the Import application page, where you can upload your files, import them into the platform and edit their metainfo.
Next to the Import button, you can see a New Folder button. Using it you will be able to create a new folder wherever you want. Another option — New folder with selection — appears when you have selected files and want to put all of them in a separate folder.
The Preprocess, Analyse, Explore and Manage menus at the top of the page correspond to the four groups of applications that can be used to process and view data. These menus will become available when you select a file.
When you choose a file, the system will suggest applications which can work with the specific file type (e.g. sequencing assay). However, you still need to think about the nature of the data. For instance, if you want to align a raw WGBS sequencing assay, Genestack will suggest several mappers, but only the Bisulfite Sequencing Mapping application will be suitable in this case. To figure out what applications are recommended to process WGBS, WES, RNA-Seq or other sequencing data, go to the Applications review section of this guide.
File search in the top-right corner allows you to search for files by metadata (names, organism, method). To limit the search by file type or whether or not the file is shared with you, click on the arrow inside the search box.
Below the search box is a button to access your briefcase. Your briefcase is a place where you can temporarily store files from various folders. To delete an item from your briefcase hover over it and click on the “x” button. To clear all items from the briefcase, select the “Clear all” option.
To add files to your briefcase, hover over each individual file and use the special “briefcase” button or select several files, right-click on them and choose “Add to briefcase…”.
If you select a file, three additional buttons will show up, allowing you to share, delete the file or view metainfo (an “eye”-icon) for the file.
Use the Share button to share your data with colleagues (the share button will not be available if you are using a guest account). Read more about sharing on Genestack in the section Sharing data and collaboration.
The Delete button allows you to remove your files from the system.
The View metainfo button gives you more information about the file: technical (file type, its owner, when the file was created and modified, etc.), biological (e.g. cell line, cell type, organism, etc.), and file permissions.
The Metainfo Editor application enables you to explore metadata for datasets or standalone files. If you have sufficient permissions, you can also edit metadata or import it from a spreadsheet in .xls, .xlsx or .csv format. You can access the Metainfo Editor from anywhere on the platform via the context menu. Metadata editing is also the last step in the data importing process (see the Import section for more information). File metadata is shown in Excel-like tables where columns represent metainfo fields, such as ‘Organism’, ‘Cell line’ or ‘Platform’.
Edit metadata manually¶
By default, the metainfo table is based on the Default Import Template, which you can easily replace with a custom one (learn more about templates in the section Importing data). To do so, click on the template’s name, select Change template, and select the template you want in the pop-up window.
When you start typing in a cell, terms from our controlled dictionaries are suggested where possible. You are free to enter any value; however, we encourage you to use our standardized terminology, which helps you avoid typos and harmonise metadata.
Furthermore, you can add several terms to one metadata field for each file. To do so, enter the first term as usual, click the Add another button and either add one of the existing fields or create your own (i.e. a custom key).
If you create a new metadata field, you also need to specify its type: for example, select “Text” for free-text values and “Integer” or “Decimal” for numeric values.
Click a column name to sort the metadata or to delete the selected column if needed.
Import metainfo data from your computer¶
To begin, click the Import data from spreadsheet button. Then, choose a CSV, XLS or XLSX file with metadata that you would like to attach.
Make sure that the sample names in the imported file are the same as the ones shown in the “Name” column of the Metainfo Editor application. Rows whose names do not match will not be imported. They are marked in red, so you can easily fix them by clicking the “Select file” link.
During the metadata import process you can also decide whether a column should be imported and associate it with another metadata field by clicking on the name of the column.
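The exact-match requirement on sample names can be sketched as below. This is an illustrative model rather than the platform's implementation: it assumes each spreadsheet row is a dictionary whose "Name" entry holds the first column, and the function name is hypothetical.

```python
def match_metadata_rows(sample_names, spreadsheet_rows):
    """Split spreadsheet rows into matched and unmatched by exact name.

    sample_names: names shown in the "Name" column on the platform.
    spreadsheet_rows: list of dicts; the "Name" key holds the first column.
    """
    known = set(sample_names)
    matched, unmatched = {}, []
    for row in spreadsheet_rows:
        name = row["Name"]
        if name in known:
            # Keep every column except the name itself as metadata.
            matched[name] = {key: value for key, value in row.items() if key != "Name"}
        else:
            # These rows correspond to the ones highlighted in red.
            unmatched.append(name)
    return matched, unmatched
```

With platform samples `sample_1` and `sample_2`, a spreadsheet row named `Sample 2` would end up in the unmatched list, because the comparison is exact and case- and whitespace-sensitive in this sketch.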
Compose file names using metainfo keys¶
When you have finished describing your samples, you can use the metadata to name them. Click the Apply naming scheme button and select the metainfo fields that you want to use to create names.
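Conceptually, a naming scheme joins the values of the selected metainfo fields into a file name. The sketch below shows one way this could work. The separator, the handling of empty values and the function name are assumptions made for illustration, not documented platform behaviour.

```python
def apply_naming_scheme(metainfo, keys, separator=" - "):
    """Build a file name from selected metainfo values.

    Assumption: fields that are missing or empty are skipped, and the
    remaining values are joined in the order the keys were selected.
    """
    parts = [str(metainfo[key]) for key in keys if metainfo.get(key)]
    return separator.join(parts)
```

For a sample with organism "Mus musculus", tissue "liver" and no recorded sex, selecting the keys Organism, Tissue and Sex would produce the name "Mus musculus - liver" under these assumptions.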
Make a subset¶
If you just want to analyse some samples from a dataset, you can make a subset. There are two ways of making subsets: select samples you want to analyse using checkboxes, and click Make a subset; or you can open metainfo summary and specify metainfo values that will be used as a rule to create a subset and filter out all non-matching files.
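The second approach, using metainfo values as a rule, amounts to keeping only the samples that match every selected value. A minimal sketch, assuming each sample is represented as a dictionary of metainfo fields (the field names and function name are hypothetical):

```python
def make_subset(samples, rule):
    """Keep samples whose metainfo matches every key/value pair in the rule."""
    return [
        sample
        for sample in samples
        if all(sample.get(key) == value for key, value in rule.items())
    ]
```

For instance, the rule `{"Organism": "Mus musculus", "Tissue": "liver"}` keeps only mouse liver samples and filters out all other files.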
Once you are happy with the metadata for your files, you can proceed to analyse them by clicking the Use dataset button. You can use the suggested visualization applications to explore your files, like “FastQC Report” to check the quality of raw reads, use one of the existing public data flows, or build your own pipeline by adding applications step by step. Moreover, you can share the files with your collaborators and add them to a folder of your choice.
Access control model¶
There are three concepts around access control in Genestack: users, groups and organisations. Each user belongs to a single organisation (typically corresponding to the user’s company or institution, or a specific team within the institution). Organisations have two types of users: regular users and administrators, who have the right to add new users, and deactivate existing ones.
To check which organisation you belong to, you can go to the Profile page, accessible via the menu which opens when you click on your email address at the top-right corner of any page.
If you are an administrator of your organisation, the shortcuts menu that appears when you click on the Genestack logo will also have an additional item, Manage Users, which takes you to the organisation’s user management page.
From there, administrators can add or disable users, and reset passwords.
Sharing in Genestack is done through groups: every user can create any number of groups, and add other users to them. Each file in the system can be shared with any number of groups, who are granted different permissions (read-only, read and write, etc.).
To manage your groups, click on the Genestack logo at the top-left corner of any screen and select Manage groups.
From there, you can create groups using the Create group button, add or remove people from groups, and change users’ privileges within groups. By default, you will be a group administrator of any group that you create.
If you are an administrator of a group, you can click the Add member button to add people to a group. From there you will be prompted for the e-mail of the user you want to add. If they are in your organisation, you will be provided with autocomplete.
Can I add users from other organisations?
You can also add users from other organisations to a group (“cross-organisation group”). However, in that case, every user invitation will need to be approved by an organisation administrator of both your organisation and the other user’s organisation.
Once you have added a user from your organisation to the newly created group, you will also be able to set up their permissions within the group. Within a group, a user can be:
- Non-sharing user (can only view data shared with the group);
- Sharing user (can view data shared with the group, and share data);
- Group administrator (all of the above, and can add/remove users to the group and change users’ privileges).
By default, newly added users will be granted the lowest permission level (Non-sharing user). You can change that using the drop-down next to their name.
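The group roles and the cross-organisation approval rule described above can be summarised in a short model. This is a sketch for illustration only: the enum, the function names and the organisation labels are hypothetical, and the ordering of roles simply encodes "all of the above" from the list.

```python
from enum import IntEnum


class Role(IntEnum):
    NON_SHARING = 1   # can only view data shared with the group
    SHARING = 2       # can also share data with the group
    GROUP_ADMIN = 3   # can also add/remove users and change privileges


DEFAULT_ROLE = Role.NON_SHARING  # lowest level, granted to newly added users


def can_view(role):
    return role >= Role.NON_SHARING


def can_share(role):
    return role >= Role.SHARING


def can_manage_members(role):
    return role >= Role.GROUP_ADMIN


def approvals_needed(inviter_org, invitee_org):
    """Return the organisations whose administrators must approve an invitation."""
    if inviter_org == invitee_org:
        return set()
    return {inviter_org, invitee_org}
```

In this model, inviting a colleague from the same organisation needs no approval, while a cross-organisation invitation needs approval from an administrator of each of the two organisations.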