The Apollo 'Replaced Models' field - explanations and examples

Using the i5k Workspace ‘Replaced Models’ field

 

 


What is the ‘Replaced Models’ field?

  • The ‘Replaced Models’ field is a new field in the Information Editor where you are required to enter the name or ID of the gene model that your manually curated model replaces.

intro 2.png

 

What is the ‘Replaced Models’ field for?

  • The ‘Replaced Models’ field comes into play when we generate an official gene set (OGS) for the genome assembly. The OGS is a single, non-redundant set of gene models that serves as the official representation of the gene content. In most cases in the i5k Workspace, the OGS will be generated after the end of the manual curation period by merging one set of computational gene predictions (usually output from MAKER, e.g. LDEC_v0.5.3-Models) with the manually curated models from Apollo.

  • The gene model ID(s) that you enter in the ‘Replaced Models’ field tell us which computational gene predictions should not be included in the OGS, because your manual curations will replace them. (Credit to Dan Hughes at BCM for the idea.)
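The merge described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual i5k Workspace pipeline: the function name `build_ogs`, the record layout, and the model IDs are all hypothetical.

```python
def build_ogs(predictions, curated):
    """Merge computational predictions with curated models (illustrative only).

    predictions: dict mapping prediction ID -> model record (hypothetical layout)
    curated: list of dicts with keys "name" and "replaced_models",
             where "replaced_models" is a list of prediction IDs or ["NA"]
    """
    # Collect every prediction ID named in a 'Replaced Models' field.
    # 'NA' means the curated model replaces nothing, so it is skipped.
    replaced = set()
    for model in curated:
        for model_id in model["replaced_models"]:
            if model_id.upper() != "NA":
                replaced.add(model_id)

    # Keep only the predictions that no curated model replaces.
    kept = [pid for pid in predictions if pid not in replaced]

    # The gene set is the surviving predictions plus all curated models.
    return sorted(kept) + [m["name"] for m in curated]


# Hypothetical IDs, for illustration only.
predictions = {"LDEC000001-RA": {}, "LDEC000002-RA": {}, "LDEC000003-RA": {}}
curated = [
    {"name": "curated_A", "replaced_models": ["LDEC000001-RA"]},  # replaces one prediction
    {"name": "curated_B", "replaced_models": ["NA"]},             # replaces nothing
]
ogs = build_ogs(predictions, curated)
print(ogs)
```

In this sketch, a prediction that is never named in any ‘Replaced Models’ field stays in the gene set alongside the curated models, which is why an accurate entry matters for avoiding redundant models.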

 

Does the requirement for using the ‘Replaced Models’ field apply to the genome assembly that I’m annotating?

  • If the ‘Replaced Models’ field is present in the Information Editor, then yes, it does.

 

Show me some examples.

  • Simple replacement: The manual curation is the same as the model from LDEC_v0.5.3. Therefore, we enter the LDEC_v0.5.3 model name in the ‘Replaced Models’ field.

 

simple replace.png

 

 

simple replace b cropped.png

 

  • Adding a new model that’s not included in the computational gene predictions. Here, we’re adding a model from the Augustus evidence track, but there’s no corresponding model in the LDEC_v0.5.3 gene predictions track. Because we’re not using the Augustus track in the OGS merge, and there are no models in LDEC_v0.5.3 that should be replaced, we add ‘NA’ to the ‘Replaced Models’ field.

No replace2.png

 

 

merge demo cropped.png

 

  • Splitting a model. The two newly generated models replace the single LDEC_v0.5.3 model that they were generated from.

 

split demo 1 cropped.png

 

split demo 2 cropped.png

 

 

  • Outlier case: curating a model inside another model’s UTR or intron. In this case, the curated model does not share coding sequence (CDS) with the overlapping LDEC_v0.5.3 model. The LDEC_v0.5.3 model does not necessarily have to be replaced, because it is not considered an isoform of the curated model. Therefore, we enter ‘NA’ into the ‘Replaced Models’ field. (If you have additional evidence suggesting that the LDEC_v0.5.3 model should be deleted, please drag the model into the user-created annotation track, and select 'Status -> Delete'.)

 

model in utr cropped.png

 

 

  • Outlier case: curating a model on the opposite strand. In this case, the curated model does not share coding sequence (CDS) with the overlapping LDEC_v0.5.3 model. The LDEC_v0.5.3 model does not necessarily have to be replaced, because it is not considered an isoform of the curated model. Therefore, we enter ‘NA’ into the ‘Replaced Models’ field. (If you have additional evidence suggesting that the LDEC_v0.5.3 model should be deleted, please drag the model into the user-created annotation track, and select 'Status -> Delete'.)

 

 

What happens if I forget to add the ‘Replaced Models’ information?

  • We’ll screen for it at the end of the annotation period. If it’s not present, we’ll send you an email with a URL for the model, asking you to add it.

  • We’ll remind you once. If you don’t add the information after that, we will not include your annotation in the OGS.
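A screen of this kind could look something like the sketch below. It is an assumption about the general approach, not the actual i5k Workspace check: the function name `find_missing_replaced_models` and the record layout are hypothetical.

```python
def find_missing_replaced_models(curated):
    """Return the names of curated models with an empty 'Replaced Models' value.

    curated: list of dicts with a "name" key and an optional
             "replaced_models" string (hypothetical layout)
    """
    missing = []
    for model in curated:
        # Treat an absent key, None, or whitespace-only text as missing.
        value = (model.get("replaced_models") or "").strip()
        if not value:
            missing.append(model["name"])
    return missing


# Hypothetical records, for illustration only.
curated = [
    {"name": "curated_A", "replaced_models": "LDEC000001-RA"},
    {"name": "curated_B", "replaced_models": "NA"},  # 'NA' is a valid entry
    {"name": "curated_C", "replaced_models": ""},    # left blank
    {"name": "curated_D"},                           # field never filled in
]
missing = find_missing_replaced_models(curated)
print(missing)
```

Note that ‘NA’ counts as a completed entry here; only models with no value at all would be flagged for a reminder.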

 

Should I use the ‘Replaced Models’ field in the ‘gene’ or ‘mRNA’ section of the Information Editor?

  • We recommend that you use the ‘mRNA’ section, due to a Web Apollo behavior that, in rare cases, assigns the wrong gene to an mRNA (the Apollo development team is working on a fix). However, if you’re annotating a lot of isoforms for a gene model, adding the ‘Replaced Models’ information to each mRNA will get cumbersome, so use the ‘gene’ section in this case.

 

I have a suggestion for the ‘Replaced Models’ field.

 

I still don’t understand how to use the ‘Replaced Models’ field.