Format of xls data file

Hi All

I have the output of several thousand organic acid MS procedures and need to import the data into a structured database for further analysis. The files I have available are xls (as attached), and as far as I know they are the output from an Agilent 5975C & 7890A mass spec.
I was expecting to find data that gives me abundance vs m/z ratio, i.e. simple xy values, but the xls files have several data points for each compound.
- Is there documentation available that describes the xls format?

- Is there a software library that allows me to import the files so I don't have to parse the xls files?


BTW there are also a whole bunch of other files such as ini, res and txt files.


Any advice appreciated. 


  • Hi


    Do you need a simple chromatogram in XY format for several data files?


    Do you need every spectrum in the data file listed in XY format (m/z:abundance)?


    I suppose you were looking for the latter. That should be in an Excel-readable format (like CSV or TSV), so that every spectrum is on one line and the Excel file has one run per sheet, right?

  • If you are looking for something like the screenshot below, I have a tiny macro that can do the job for you, assuming you use MSD ChemStation (your Excel file has the data path C:\msdchem\1\DATA\2017\Organic acids\April 2017\25 April 2017\). It puts the scans picked from one run in one sheet, one scan per line.
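
As a rough illustration of the "one scan per line" layout described above, here is a minimal Python sketch. The scan values, the `write_scans_csv` name and the `m/z:abundance` cell format are assumptions made for the example; they are not the output of the macro or of ChemStation.

```python
import csv
import io

# Hypothetical scans: (retention time in minutes, [(m/z, abundance), ...]).
scans = [
    (5.012, [(73.0, 1200), (147.1, 830)]),
    (5.018, [(73.0, 1350), (147.1, 910), (221.1, 95)]),
]

def write_scans_csv(scans, handle):
    """Write one scan per row: retention time, then m/z:abundance cells."""
    writer = csv.writer(handle)
    for rt, peaks in scans:
        writer.writerow([rt] + [f"{mz}:{ab}" for mz, ab in peaks])

# Write to an in-memory buffer; a real run would pass an open file instead.
buffer = io.StringIO()
write_scans_csv(scans, buffer)
print(buffer.getvalue())
```

Rows can have different lengths because each scan may contain a different number of ions, so downstream software should not assume a fixed column count.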

  • Hello,


    Could you please provide us with the software product you are using (MSD ChemStation, MassHunter Qual/Quant, etc...) as well as the software revision (found by going to Help>About)?

  • I sell an application that reads GC-MS data files and converts them to CSV files very quickly. It can process all data files in subdirectories, and I can program any output format and destination. Just send me a private message if you are interested.

  • ProteoWizard: Documentation 


    Maybe this is what you are searching for?


    Unfortunately the .ms file format is an Agilent proprietary binary format. As far as I know, the documentation isn't public. But you might be able to reuse the libraries in the software product to read the files. People have done it in the past (see the open-source ProteoWizard, for example).

  • Actually, Hewlett-Packard did publish fairly complete details on the data format of files in the original documentation for the 5970A MSD.  To the best of my knowledge (perhaps Agilent people can correct this if not so) that same format has been continued.  I provide free software (Quadfiles 1.2) at that can read ChemStation files and output their content in text format.  I also provide a code listing in Pascal for reading files, based on the original Hewlett Packard documentation.  Quadfiles will also read Mass Hunter files (quadrupole MS and MS/MS only), but that is based on inferred data structures, since I do not think Agilent has publicly released the Mass Hunter file format at this time.    

  • Thanks for the replies. I'm trying to extract a simple x/y plot of time vs abundance, not each spectrum. I'm not sure why, but I don't see it in the Excel workbook. The workbook has a lot of chemical compounds with data around each one. I'm not an expert in this field, so I don't know how the machine is deriving these values, but I assume there is a prebuilt profile that's part of the software.

    Looks like my best bet is to try to port the Pascal code from quadtech (thanks again!) to .NET so I can load the binary and try to extract the same data as their software. My background is in software development, so I've got a better chance attacking it by this route. I'll also try to get the software version and post it here, and if successful I'll post the code on GitHub.
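
For reference, a time vs abundance trace is usually obtained by summing the abundances within each scan (the total ion chromatogram). The sketch below assumes the scans have already been decoded into Python tuples; the data and the function name are hypothetical.

```python
# Hypothetical decoded scans: (retention time in minutes, [(m/z, abundance), ...]).
scans = [
    (5.012, [(73.0, 1200), (147.1, 830)]),
    (5.018, [(73.0, 1350), (147.1, 910), (221.1, 95)]),
]

def total_ion_chromatogram(scans):
    """Return (retention_time, summed_abundance) for every scan."""
    return [(rt, sum(ab for _mz, ab in peaks)) for rt, peaks in scans]

tic = total_ion_chromatogram(scans)
for rt, total in tic:
    print(rt, total)
```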

  • Once again replying to my own question after thinking some more and realizing that I will need the spectra data to identify compounds at each peak of the GC. 
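
One simple way to find the scans whose spectra are worth looking up is to pick local maxima in the total abundance trace. This is only a naive sketch with made-up numbers and a hypothetical `apex_indices` helper; real data would normally need smoothing and baseline handling first.

```python
def apex_indices(totals, min_abundance=0):
    """Indices of scans whose total abundance is a local maximum."""
    return [
        i
        for i in range(1, len(totals) - 1)
        if totals[i] > totals[i - 1]
        and totals[i] >= totals[i + 1]
        and totals[i] >= min_abundance
    ]

# Hypothetical total abundance per scan, in acquisition order.
totals = [100, 400, 900, 350, 120, 800, 2000, 600]
apexes = apex_indices(totals, min_abundance=500)
print(apexes)
```

The spectrum stored at each returned index is the one to compare against a reference library.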

  • Tarquinv


    Perhaps I just don't understand what you are looking for.  If you are working with ChemStation files and trying to extract every scan's RT, total abundance and full spectrum (M/A pairs), then that is exactly what the program I mentioned to you earlier does.  It is specifically designed to extract all those data into a CSV file in a format designed to then be read by other software (not just Excel).  As far as I can tell, that is precisely what you want, isn't it?  Have you tried Quadfiles yet?


    If the problem is that you want to extract those data from thousands of files, then perhaps what you need is the ability to conduct that processing in batch mode.  If so, then let me know and it should be simple to add an option to do that.  If for some reason you absolutely must use only your own software (sort of re-inventing the wheel to me), then you only need the file structure, the routine to extract longint packed abundance values, and of course a lot of byte swapping, all of which is defined in the source code listed on my website.  This was defined years ago by Hewlett-Packard.  But it does seem to me that Quadfiles can do exactly what you are asking for right now with little or no additional effort.  Let me know if you want to discuss further. 
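
To give an idea of what the byte swapping and unpacking involve, here is a minimal Python sketch. It assumes the file stores multi-byte integers big-endian, and the packing scheme shown (a 16-bit value with a 2-bit base-8 exponent and a 14-bit mantissa) is an assumption for illustration only. The field offsets and bytes are made up; the real layout must be taken from the Hewlett-Packard documentation or the Pascal listing mentioned above.

```python
import struct

def read_u16_be(buf, offset):
    """Read an unsigned 16-bit big-endian integer from buf at offset."""
    return struct.unpack_from(">H", buf, offset)[0]

def read_i32_be(buf, offset):
    """Read a signed 32-bit big-endian integer (a Pascal longint)."""
    return struct.unpack_from(">i", buf, offset)[0]

def unpack_abundance(packed):
    """Expand a packed value. ASSUMED layout: 2-bit exponent, 14-bit mantissa."""
    mantissa = packed & 0x3FFF
    exponent = packed >> 14
    return mantissa * 8 ** exponent

# Hypothetical bytes, not taken from a real data file.
record = bytes([0x00, 0x01, 0xE2, 0x40, 0x40, 0x64])
count = read_i32_be(record, 0)        # bytes 00 01 E2 40 read big-endian
packed = read_u16_be(record, 4)       # bytes 40 64 read big-endian
abundance = unpack_abundance(packed)  # mantissa 100, exponent 1
print(count, abundance)
```

Using an explicit `>` in the `struct` format keeps the byte order independent of the machine the code runs on, so no manual swapping is needed.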





  • This question has been marked as assumed answered.
