Extracting Excel File Path Information Using Excel Formulas

In several of the template Excel spreadsheets I use in my work there are references to clients and version numbers that are included. This is for reference purposes, particularly if the spreadsheet might be printed or exported as a soft copy (e.g. PDF). Whilst it is possible to maintain this information manually I decided that this should be something that could be improved by virtue of the file name containing such information (as these are the very basis of file management on projects). I’d referenced the file path and elements of it in the past and I thought that this would be something that should be relatively straight forward to achieve; and here’s how I got on ….

The Source

First of all I began by retrieving the file path of the spreadsheet using the CELL function.

=CELL("filename",A1)

For my rather convoluted example file path (the strange name/path will be useful later in demonstrating a few things - I’m not suggesting this would be anything like a real convention!) here’s what it gave.

C:\Users\millaste\Desktop\[Filename(.xlsx-example) vexed version v1.0.xlsx]Some.Sheet

In my example spreadsheet (available to download below) this is held in cell B3.

Getting the Folder Path

To get the folder path of the file path, the principle is simply to take the first character through to (and including) the last back slash. In order to do this we need to find the character position of the last back slash so that we can use that with the LEFT function to extract that number of characters from the full path string.

Unfortunately the FIND function in Excel that is used to locate the position of a string of text in another string of text has no option to search a string backwards. Similarly there’s no option to reverse the character sequence of a string. Whilst this could be achieved with VBA I prefer to try and accomplish things with standard Excel formulas to avoid the security implications of relying upon macros. This is a particularly important consideration when sharing the spreadsheets with clients.

The workaround is to use the SUBSTITUTE function. SUBSTITUTE allows you to replace one character string with another character string in the original string of text. So if we can replace the last back slash with another character we can then search specifically for that character and discern the length of the file path. So how does SUBSTITUTE know which is the last back slash? Well it doesn’t, but you can pass it an optional parameter to tell it which instance of a character string to replace. So we simply need to count the number of back slashes and pass that in as a parameter.

Note a back slash is not a valid file name or Excel worksheet name character. You may be wondering why I’m not simply searching for the open square bracket (“[”) instead? Well whilst this isn’t a recommended file character for use in Excel file names (for what are hopefully obvious reasons) it is technically possible to include it in the file name so I went for the back slashes instead.

So how do we count the number of back slashes? Well if we don’t pass the optional instance parameter to the SUBSTITUTE function it will replace all instances and our replacement text string could be of zero length which means if we replace all back slashes with an empty string this will be the same as deleting them from the string. Comparing the length of that string (using the LEN function) to the length of the original string will tell us how many back slashes there are which can then be passed as an instance to our earlier SUBSTITUTE function.

In cell B6 I output the number of slashes using the following function combination, the result being 4.

=LEN(B3)-LEN(SUBSTITUTE(B3,"\",""))

In cell B7 I output the character position of the last back slash using the following function combination, the result being 26.

=FIND("*",SUBSTITUTE(B3,"\","*",B6),1)

Armed with this position I can then use a simple LEFT function to give me the folder path (in cell B12).

=LEFT(B3,B7)

The result is C:\Users\millaste\Desktop\

Getting the File Name

Using a similar principle of locating characters and text replacement we can also find the file name and extension from the path using the square brackets as string delimiters. At this point for this example case I’m assuming that the file name is delimited by the last open square bracket and the last close square bracket (in case there are any such brackets in the file path), but I could just as easily have used the length of the folder path + 1 as the first delimiter.

In the example spreadsheet, cell B8 contains the position of the last open square bracket; a value of 27.

=FIND("*",SUBSTITUTE(B3,"[","*",LEN(B3)-LEN(SUBSTITUTE(B3,"[",""))),1)

Cell B9 contains the position of the last close square bracket; a value of 75.

=FIND("*",SUBSTITUTE(B3,"]","*",LEN(B3)-LEN(SUBSTITUTE(B3,"]",""))),1)

Cell B13 then uses the MID function to select the file name based on the position of the open square bracket and the difference between the position of the open square bracket and the close square bracket (i.e. the length of the file name).

=MID(B3,B8+1,B9-B8-1)

The result of this is

Filename(.xlsx-example) vexed version v1.0.xlsx

To remove the file extension we simply use the same sort of combination of FIND and SUBSTITUTE functions to locate the last period (“.”) in the file name and take all the text to the left of it. Cell B14 contains the function to do this.

=LEFT(B13,FIND("*",SUBSTITUTE(B13,".","*",LEN(B13)-LEN(SUBSTITUTE(B13,".",""))),1)-1)

Getting the Worksheet Name

To get the name of the Excel worksheet I simply use the RIGHT formula to take all of the text in the full file path to the right of the last close square bracket. The result (Some.Sheet) is shown in cell B15.

=RIGHT(B3,LEN(B3)-B9)

Getting the Version Number

That gives me all of the basic elements from the file path, but I actually want to extract additional information from the file name. The example I’m using is a version number. The same principles apply for working with the text strings, but this time rather than single character delimiters I’m working with multiple character delimiters.

In the example spreadsheet, the pre-delimiter is “ v” (held in cell B19) and the post-delimiter is “.xls” (held in cell C19). As long as the characters appear either side of the desired “attribute” you wish to extract and are the last instances of such text in the file name then the combinations I’m sharing here will work (and are why I’ve chosen such a bizarre file name to illustrate the example).

The length of the delimiter strings is now important (as they may not necessarily be a single character - thought they certainly could be) in carrying out our text string operations.

Cell B20 contains the length of the pre-delimiter (2).

=LEN(B19)

Cell C20 contains the length of the post-delimiter (4).

=LEN(C19)

The next row (21) contains the first set of text substitutions. These simply replace each of the delimiters with our asterisks.

Cell B21 contains the pre-delimiter substitution (Filename(.xlsx-example)*exed*ersion*1.0.xlsx).

=SUBSTITUTE(B13,B19,"*")

Cell C21 contains the post-delimiter substitution (Filename(*x-example) vexed version v1.0*x).

=SUBSTITUTE(B13,C19,"*")

Here’s where the delimiter length comes into play. We need to get the difference of the lengths between the unsubstituted string and the substituted string and divide that by the length of the delimiter string minus one to give us the number of delimiter instances we’re working with.

Therefore cell B22 contains the number of instances of the pre-delimiter in the file name (3).

=(LEN(B13)-LEN(SUBSTITUTE(B13,B19,"*")))/(B20-1)

Cell C22 contains the number of instances of the post-delimiter in the file name (2).

=(LEN(B13)-LEN(SUBSTITUTE(B13,C19,"*")))/(C20-1)

Using this we can then carry out the second set of substitution operations to replace the last instance of each delimiter.

Cell B23 therefore contains…

=SUBSTITUTE(B13,B19,"*",B22)

Which gives…

Filename(.xlsx-example) vexed version*1.0.xlsx

And cell C23 contains…

=SUBSTITUTE(B13,C19,"*",C22)

Which gives…

Filename(.xlsx-example) vexed version v1.0*x

From this we can now derive the positions of the delimiters using the FIND function.

Cell B24 contains the position of the last instance of the pre-delimiter in the file name (38).

=FIND("*",B23)

Cell C24 contains the position of the last instance of the post-delimiter in the file name (43).

=FIND("*",C23)

Just like getting the file name in the first place, we can then use the MID function to get the attribute (in this case the version number remember). The start of the attribute string is the position of the pre-delimiter plus the length of the pre-delimiter. The length of the attribute string is the position of the post delimiter minus the sum of the position of the pre-delimiter and the length of the pre-delimiter.

So cell B26 in the example worksheet contains the version number of 1.0.

=MID(B13,B24+B20,C24-B24-B20)

If you want to get at all of this information you could just create a worksheet like the Some.Sheet worksheet in the example spreadsheet, hide it in the workbook and reference the elements as you need them in other worksheets. If you don’t want to use a hidden worksheet, you could of course build out the functions rather than build them up over a set of cells. I’m not advocating this approach as the hidden worksheet is much easier to maintain and debug, but sometimes it can be handy not to have hidden worksheets or simply to show off how awesome your Excel skills are.

The example spreadsheet contains a second worksheet where I’ve done just that and there are a few examples from that worksheet shown below, but I’m sure by now you might want a copy of the spreadsheet so here’s the download link.

All In One

Folder Path

=LEFT(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"\","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"\",""))),1))

File Name (with file extension)

=MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1)

File Name (without file extension)

=LEFT(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1),FIND("*",SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1),".","*",LEN(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1))-LEN(SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1),".",""))),1)-1)

Worksheet Name

=RIGHT(CELL("filename",A1),LEN(CELL("filename",A1))-FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1))

Version

=MID(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1),FIND("*",SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1)," v","*",(LEN(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1))-LEN(SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1)," v","*")))/(LEN(" v")-1)))+LEN(" v"),FIND("*",SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1),".xls","*",(LEN(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1))-LEN(SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1),".xls","*")))/(LEN(".xls")-1)))-FIND("*",SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1)," v","*",(LEN(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1))-LEN(SUBSTITUTE(MID(CELL("filename",A1),FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)+1,FIND("*",SUBSTITUTE(CELL("filename",A1),"]","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"]",""))),1)-FIND("*",SUBSTITUTE(CELL("filename",A1),"[","*",LEN(CELL("filename",A1))-LEN(SUBSTITUTE(CELL("filename",A1),"[",""))),1)-1)," v","*")))/(LEN(" v")-1)))-LEN(" v"))

Summary

The same methods can of course be used to extract other attributes of the file name or even for example the parent folder (last folder in the file path) if that were relevant. As long as you can define the delimiters reliably and they are either the first or the last, you should be able to get the data you want programmatically.

Author: Stephen Millard
Tags: | excel |

Buy me a coffeeBuy me a coffee



Related posts that you may also like to read