InfoQ Homepage Presentations Best Practices to Secure Web Applications

Best Practices to Secure Web Applications

Bookmarks

View Presentation

Speed:

Download

51:55

Summary

Loiane Groner discusses the best practices for secure coding, input validation techniques, the importance of strategic error handling and logging, and how to manage file uploads safely.

Bio

Loiane Groner is a Development Manager at Citibank and has authored books for Packt Publishing. She is a Google Developer Expert in Angular, Microsoft MVP in Developer Technologies, Oracle ACE, Java Champion, and speaker at tech conferences.

About the conference

InfoQ Dev Summit Boston software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

Transcript

Groner: We're going to talk a little bit about API security. Before we get started, we have to understand why we have to do this. For application security, how many companies are handling this today is, you do your planning, development. Developers will push the PRs, going to do the build. There's usually a QA environment, testing, UAT, many companies will call this different things. Then, that's when you're going to raise your security testing request. You're going to ask the InfoSec team, please test my application. Let me know if you find any security vulnerabilities. If they find, goes back to the dev team, "We found this security issue. This is a very high risk for our business, and you have to fix it". Again, goes through the PR, has to go through testing again, rinse, repeat, until you have a clean report or no high-risk vulnerabilities, and you finally can go to production. This has a few caveats.

First, it can cause production delays, because if you have to go through this testing and rinse, repeat, all this cycle until you get a clean report so you can go to production, that can take a while. Or, it's even worse, companies are not even doing security testing through the software development lifecycle, and they're doing this once a year, or not doing at all. There's a very interesting research that was done by the Ponemon Institute, and they say that fixing software defects, or worse, fixing security risks once the product is in production, costs way more than if you are handling that during the development. That's why in the industry, we say there is a shift left happening, because many years ago, we went through all that cultural change of having unit testing done as part of our development cycle, and now we're going through this again.

However, we're talking about security this time. It is much cheaper and much cost effective for the team, for the company as well, for you to handle all those security vulnerabilities, and make sure that your software is secure when you're doing development. Security has to be from day one. It's not a technical debt. It's not something that we're going to add in the next sprint. It has to be part of your user story. It has to be part of your acceptance criteria. It has to be part of your deliverable. I would like to show you a few things that I've learned throughout the years.

My name is Loiane. This talk is from a developer to other developers and leads, so we can go through this cultural change and make sure that security is indeed part of our development phase.

What is API Security?

First of all, whenever we say API security, if you decide to Google this, search this, go into YouTube, try to find a tutorial, you're going to find a lot of tutorials talking about authentication and authorization, especially if you're working with Spring Boot. All my examples here are going to be with Java, because this is the technical stack that I'm most familiar with. All the examples you can easily translate them into a different programming language, framework, or platform. Going back to my question, if you go to YouTube and you search by Java security or Spring security, you'll find a lot of tutorials about authentication and authorization. Security is not only about that. If we take a look at the OWASP Top 10 vulnerabilities that are found each year, and this list is going to change year after year, you find a lot of the same things happening over and again. What I'm going to show you here, at least with all the tips and all these best practices, we can at least make sure that half of this list are not going to happen within our software.

Better Authorization

Let's go through it first. Let's suppose that everybody is doing authentication, so at least user password, or you're using an OAuth service. You're doing that in your software. We still need to handle authorization, which is why you have to make sure that the user that is trying to access your application or trying to perform a certain action within your application, is indeed able to perform that action. How do we make authorization better within our applications? Let's start with the first example, with a bad practice. We're checking here if we can update a course. This is a RESTful API, so we're doing a Post here. We have the ID. We also have the object, the data that we're trying to update. I get the user that's authenticated. I'm checking if this user has a role student. If the user has a role student, they cannot update the course. If I am somebody that I don't know anything about this application, and I'm reviewing this code, I don't know who exactly can actually update this record. It's not clear just reading the code.

A better way of doing that is deny by default. I'm going to write my code, I'm going to write my business logic, and by default, nobody is going to have access to it. What I'm going to do is I'm going to list whoever can actually update it, and everybody else, I'm just going to not allow it. When I read this code now, at least I can see that only admins and only teachers can actually update this record, so it's a little bit better. The other thing is, the majority of the frameworks that we work with, they do have some support to role-based access control. In Java, for example, we handle a lot of things through annotations. When you're working with Spring Boot, you do have some annotation that you can easily add all the roles that are actually allowed to do this. We are working with the deny by default approach. You're free to write your own business logic and leave that part of the authorization, the security check, outside the main business logic. This is great. However, this works perfectly for small systems or systems that you don't have a lot of roles.

I really wish that my applications were the same, that I watch those YouTube tutorials and I have user or admin. That's it. That will be a wonderful world. Unfortunately, it's not like that. What can happen here is role explosion. I'm going to start with the user and admin. Maybe I have a teacher as well, but now it will be really good to have teaching assistants as well. I'm going to grant access to them to my system so they can do a few things on behalf of the teacher as well. Or, maybe we're working with an eLearning platform, I have an account manager. The account manager will also be able to do those things in my system. We start adding more roles to the system. Now my business logic is only one line of code, and I have more code, just doing the pre-authorization part. It can be, when you're reading this, not so good, and we can do better. If you're working with something like this, which actually looks like the projects that I work with, sometimes the authorization level goes to the button that I see on the screen or the link that I can click on the screen. RESTful here, it's really going to depend on what's the role that I have, if I'm able to perform that particular action or not. It really depends on the role and all the actions that I'm able to perform.

When we handle situations like this, it is much easier if we have something that is a little bit more dynamic. There are many different ways that you can do this. If we're using Spring security and Java, of course, you can use a custom security expression. You can design this according to your needs, according to the size of your project and your business. You can maybe have all the mapping, all the authorization within a database or another storage, and you load that, and you have a method or a function that's going to calculate if the user really has access or not. Of course, annotations for the win. We can actually use the annotation and have our method here with the privilege. Now it's a little bit more clear for me that the only users that are able to actually perform this action are the ones that have the course update privilege, that's mapped somewhere else. When we go into those more complex cases, this can be a little bit easier for us. There is no more hardcoding with all those roles within the system.

The other issue that we might face is, I'm logged in. I'm checking if I am authorized or not. Should I have access to update that particular record? If I am a teacher, let's say we are in university, and there are many classes, should anyone update it? If I know the ID, should I be able to update it, just because I know the ID? I know that some of you are using the incremental identity that is generated automatically by the database. We have to be very careful with that. Again, we still can bypass even if we are authorized to use the system. Be very careful with that, and remember to always deny by default.

How exactly do we make that better? One thing that you can do is, once you have the information, again, you go through the authorization, you have to find a way to check if that particular record can be updated by that particular user. Maybe there is some kind of ID, the course teacher ID, you're going to match that against the user ID that's trying to update that record. That way you can make sure that only that that certain user is able to actually perform that action. However, one thing that I see happening a lot, is we're getting that object, the course object, directly from the request. I still have my ID from the path variable that I'm parsing through my request, but the course that I'm actually checking my logic came from the request. It can be something as simple as using Postman or any other similar tool.

You can manipulate it, or if you're a little bit more smart, you can use another tool to intercept the request, change the JSON that is being sent to the request, and the ID here might be something. Again, you can bypass any authorization logic that you have and still update that record in the database, and something that should not happen. Never trust the request. When you have to do something like this, always go back to the database or to the true source of the data, the data source. Check for the true data to make sure that that is actually able. There is a tradeoff here. This is going to be a little bit more slower, because we have to go to the data source. There is a request, milliseconds, but again, it is a small tradeoff that we are willing to pay here just to have our APIs more secure.

Property Level Issues

When we are working with objects, there are still a few other issues that we can run into. This has a very fancy name. Just to give you an example, if I have a user and I'm trying to get the data from the user that is logged in, I have a user and password. I've done this multiple times myself, exposing the entity directly. Because why am I going to create another class or another object that's just a copy of my entity, and then I'm going to expose it. This can lead to some issues. In this particular case, if I'm only trying to expose the username, and I have some common sense, and I know that I'm not going to expose the password in the JSON, so using annotations, I can simply annotate my Get method and have a JsonIgnore. What happens if tomorrow we receive another requirement and we have to capture another field, for example, sensitive data such as social security number or something else.

The developer that is working on this unintentionally forgets to annotate the method to get the social security number, and when we're sending back that information through the request, you are exposing something that you're not supposed to. This can go through pull request reviews, code reviews, and we're not going to notice. That can happen. A way that we have to avoid this is creating the data transfer objects or DTOs. You can use records if you're using a more modern version of Java, or you can just create a class. You have to explicitly tell what are the properties that you want to expose in this case. It's a much better way of doing that. If tomorrow we get, again, the requirement to add sensitive data to our object, we're not going to expose because the public contract, I don't have that information here, and that social security number or whatever other sensitive data that we have to capture, it's going to stay internally within the system.

Then we can enter into another very good discussion here. Should I create a DTO for a request and have another DTO for a response? Again, this is conflict territory. Each one of us will have their own point of view on this. If you are reusing the same DTO for both requests and responses, just be careful. For example, for the request, do not use the ID, if you have the ID or whatever primary key or whatever unique property that you're using to identify that object from your DTO when you're handling requests. This can also slip through the cracks, and then, again, something might happen. It's always best to have one for request, another one for response. In case you have something again, so you have a metric against duplicated lines of code that you have within a project, be very careful with that.

Password/Key Exposure

Now we're able to handle authorization a little bit better. The second part will be the password and key exposure. This seems a little bit of common sense. Who here is going to expose the password and commit that to your GitHub repo and have the database password? There are a few different ways that this can still happen. Many companies, you have your URL, and then you have your resource name, something to help to identify the project. Then you are creating a developer database. Again, I really wish my project was the same thing as those tutorials, that I can simply have a MySQL database and a Docker image with two tables, and that's it. That would be wonderful as well. Especially when you're working with legacy systems and you have that huge database with maybe hundreds of tables as well, it's a little bit more complicated, with lots of data as well.

Some of the companies, they will have their own database in a server or in a cloud that everybody is going to access that database. I don't know about you, but me, personally, I'm not so good with names. That's the hardest thing to do. How am I going to name a class, a variable? What name do I give to my database? I'm just going to give the company name, or maybe the project name, and dev, to indicate that this is a development environment. I'm going to use prod to indicate that this is production. This can be a little bit dangerous. Then for the password, again, I'm not going to remember all the 30 passwords that I have to do for all the services that we use. I'm just going to use something as well as my learningPlatform@Dev. Then for production, I just change that to production. If something like this gets committed into a repository and somebody sees that information, I wonder what happens if I change this from dev to prod or to another upper environment? Be really careful with that. Never leave passwords or any secrets within your properties YAML file or even hardcoded even for lower environments.

Another issue here, is this last line right here. If you're working with JPA, if you're working with Hibernate, there is a way for you that the framework is going to be responsible for checking all the entities that you have in a source code, and it's going to create all the tables for you. It can create, drop, update. There are many different options. This is a big issue. Never use a user ID that is able to make schema changes in your database. Again, deny by default. You'll start with, I need read access to my database, because I have my user ID, so you grant that read access. If your application is also writing to the database, then you grant the write access. If you need any access to execute any store procedures, then you add that access as well, but never grant more access than is actually needed. Be very careful with that. This only works for tutorials. This does not work for real applications.

Input Validation

The third part that I would like to bring to your attention is input validation. This seems to be also common sense. This seems to be something that is very basic as well. We are failing in this, for lots of code that we review. We are just not even adding any kind of validation, and we need to start changing that as well. We have our frontend. It's beautiful, fully validated. I have all the error messages, user experience, chef's kiss. Then if you take a look at the API that's feeding that frontend, that's just this. I have my create method. I have my DTO. There's nothing. It's just simple code. This is a big red flag. How can we improve that? Never trust the input. Again, if you have your frontend fully validated, the user is entering all the data, hits the submit or save button. Sends the request to the API. It passes the data it saved perfectly. Then, again, if you try Postman, or if you try any of the other approaches to actually evoke your API without a frontend, then you start to run into issues. There are no validations.

Always remember that if you're working with an API that is being used by a frontend, the API exists independently from the frontend. We really have to start validating the API. First step, the same validations that you are applying in your frontend, you have to apply in your API as well. That's the minimum that we have to do. I know it's a lot of work, because there's a lot of validations that can go through, especially when we're working with forms, and we do have a lot of forms in some of the applications, but again, always add the validations to your API, at least the same. Remember that your API has to have more validations than your frontend. It is the one that has to be bulletproof and has to hold the fort when we're talking about security.

Make sure that you're validating type, length, formatting, range, and enforcing limits. Java is a beautiful language, because we have something that I like to call annotation driven development. We just start adding all the annotations, and magically, it's going to do all the work behind the scenes for us. When you are annotating your entities, you have the @Column, for example, just to map this particular property from your class to the column in the database, or to the property in the document. Make sure that you're adding the length as well, if it's nullable or not, if it is unique. Try to map your database mapping into your code as well, because, again, that's going to be at least one layer that we can add a security.

In Java, we have a really nice project that's called the Jakarta Beans Validation. If you're a little bit old school, the Java EE Beans Validation as well. Hibernate also has one of the implementations that's called the Hibernate Validators, that you can use to enhance all your entities or all your documents as well. Do not forget to validate strings, when we have a name. Even if you look at this code right here, I see you have some validations, but that's not enough. I don't have all the validations. There is too much damage that I can do if I only have validations for the size, but I'm not validating the string itself. If you try to do a request, can I do !##$, and something, I'm just going to look at my keyboard and add some special characters or weird characters. Is that a valid name? Should it be allowed? Validate strings.

One thing that we usually tend to do, I just go to my keyboard. Let me look and I'm going to type and I'm going to create my regex from my keyboard. If you go to the ASCII table, or if you take a look at the Unicode table, you have hundreds of characters. Characters that I don't even know that exist, or I don't even know the name. Be very careful with that. Always prefer to work with an allowed list. What does it exactly mean? A name. If I need to have or I'm only allowed to have alphanumeric characters with maybe a space, parenthesis, underscore, so that is my name, anything else is going to be deny by default. One other thing that you can do is maybe sanitize as well. It really depends on the project. You can use the approach that, if the user tries, I'm going to not allow it, just going to throw an error. Or you can try to automatically remove those characters, or you sanitize those characters as well. Different approaches for different projects. Just make sure that you are choosing the one that is a better fit for you.

Always remember to secure all the layers. For example, we're working here with three layers. We have our blob controller, and validate all the parameters that your methods have. Do not be shy to use those annotations. It only takes seconds to actually add those annotations over there. One thing that is very important, especially if you're working with pagination, never forget to add an upper limit to your page size. My frontend only allows 100 records per page. That's fine. Here, what if I parse a million, 5 million? What if I try to do a DDoS attack and send multiple requests with 5 million? Is your server able to handle that many requests? You can bring down your service, and that can cause business loss, financial loss to the company as well.

Always make sure that you're adding validation to each and every parameter that your API is receiving. Again, in the service, you're going to repeat that. The good thing is, you've done that in the controller, so Control C, Control V in the service, or maybe you're doing the other way around, the service and then the controller. Make sure that you are propagating all those validations across all the layers. Because, what can happen, depending on the application that you are working with, you can have a service that is being consumed by only one controller, but again, maybe next week, next month, or next year, you have another controller also using that same service. What's going to happen?

If the developer that is now coding the controller, that developer does not do any validation in the controller, at least the service is going to be able to handle any kind of validation and reject any kind of requests. Again, the entity or documents as well, don't be shy to use and add all those annotations. The beautiful thing about this is, if you are handling a request, and sometimes if you have a column or a property that is only able to handle 10 characters, and let's say that you are sending 50 characters through the request, you don't get that truncate message, that exception, and it's going to fail to write into the database. The other beautiful thing about this as well is if you are on the cloud and the service that you're using is charging you per request when you have all these validations in place, you are saving a failed request to the database so that can actually bring some cost saving benefits to the organization when you have all these validations in place.

SQL injection. It's 2024, we still have to talk about SQL injection. That is still happening. Make sure that you are validating, sanitizing your inputs, escaping those special characters that can be used for SQL injection. I know sometimes we don't want to use some kind of Hibernate thing. When you have something a little bit more complex, you want to write your own native queries. Make sure you're not using concatenation. Please, at least a prepare statement. Be a lazy developer, use what the framework has to offer you. Don't try to do things on your own. Many developers have gone through the same issues before, and that's why we have frameworks to try to abstract a few of these things for us. I'm still seeing code during code reviews with concatenations in place. Sad, but it's life.

File Upload

Still talking about input. We're only talking about validating the request. What about files? I work for an industry where we handle a lot of files. I'm not talking about images. I'm talking about Excel files, Word files, PDFs, things like that where you have to read those files, parse it, and then do something with the data that's within the file. Then you go through with the business logic. First rule of thumb, always make sure that you are adding limits to the file size. If the file is too big, ask the user. Again, really depends on what's the business use case here. Try to find, what's your limit, something that is acceptable. Make sure that you are setting that into your application. Again, if you're using Java Spring, two lines of code. Easy. Five seconds and you're done. Make sure to also check for extension and type validations. These can be very deceiving. If you remember a few slides back, never trust the input, because here you can go to the content header, and you can manually change it and deceive the code, if you're checking for the extension in the content header. What do we do?

The issue that we can run into this with the extension is, if your library is expecting one extension, and it's actually something else, you can run into all sorts of issues. Also with the file name, there is one very famous vulnerability that's called the path traversal vulnerability, where the file name, again, we don't know what's the file name. You can use those tools to intercept the request and change it, and have something that is malicious. You can completely wipe out directories of files. I don't know if you're using a NAS, if you're using an S3 bucket, or any kind of storage, but there is a lot of damage that you can do only with a malicious file name. Make sure to also validate that. Be a lazy developer. Use tools that are already available, if you are able to add these dependencies to your project. If you need something that is very simple, very quick, you can use Apache Commons IO. There is a file, you choose file because we love a you choose class. There is a you choose file that you can use to normalize the file name.

If you need something a little bit more robust, you can use Apache Tika that you can actually read the metadata of the file, get the real file extension, sanitize the name of the file. I cannot tell you how many times this library has helped me to close a few vulnerability issues for the applications that I have worked with. Whenever I'm working with file upload, the first thing that I do, do I have Apache Tika in my pom.xml? If I have, then uncover, and then just can copy paste the boilerplate code, or you can create a static method just to run those validations for you and have some reusability as well. Again, if you are indeed saving the file somewhere, be sure that you are running the file through a virus scan. If you're working with spreadsheets or CSVs or documents, again, deny by default. Do my Excel file need to have macros or formulas? My Word document, do I need to allow embedded objects? Does it make sense for my application? Do I have a valid business justification? Make sure that we have all those validations in place. Then you can safely store your file and live happily ever after.

Exception Handling and Logging

Exception handling and logging, this is where we have to be a little bit careful as well. We as developers, and I find this really funny, whenever I'm using a service on the internet and an error occurs, and I see, they're using this tech stack. That's really cool. For me, it is, but for somebody that doesn't have good intentions, might not be. Never expose the stack trace. Log the stack trace, because we as developers, we're going to rely on logs to do some debugging and try to fix some of the production issues. Log it, but do not expose it. Return a friendly and helpful message. Please do not return something like an error occurred, please get in touch with the administrator. What does it mean? Something that is helpful to whoever is seeing the message, but you're not exposing anything.

You're not exposing the technology stack that you are using. Because what happens is, if you expose the technology stack, the person that does not have good intentions might see, let me see if there is any vulnerabilities. You're using Spring. Does Spring have any vulnerabilities that I can try to exploit? That is one of the reasons. Again, if you're using Spring, one line of code that you can add to your properties file or YAML file to not expose the stack trace. Also, be careful with what you are logging. We've watched some talks during this conference here that we as developers, we are responsible. We have to be accountable for the code that we are writing. The beautiful thing of being a developer is that you can work within any industry. With power comes responsibilities. Different industries will have different regulations, so make sure that you're not logging the password, even for debugging purposes.

If you work with personal identifiable information, like first name, last name, email, phone, address, something that can help to identify a person, do not log those in. We have several regulations, GDPR, California has the California Privacy Act. Other states are passing their own regulations. We have to study our programming language, and at the same time, we have to keep ourselves up to date with all these regulations that can impact our jobs as well, to make sure that we're being ethical, and we are writing code that is not infringing any of those laws: financial information, health care data, any kind of confidential business information. Log something that is still helpful to you, to help you to debug those production issues, but do not log something that is sensitive.

One of the things that you can use to remove those sensitive data, especially if you're using the toString to log something, again, remove any sensitive data for your toString. There are annotations that you can do this. I personally prefer to not use annotations on this, because, again, you can forget to annotate in case you're adding a new property. I like to explicitly tell what's my toString here, so I can actually safely log that information if I have to. In case you do have to log user IDs or credit card numbers or any sensitive confidential data, you can mask that data and still be presented in a helpful way to you, or you can use vault tokens as well.

There are many different ways that you can do this, in case you absolutely have to log it. Be very careful with that. Last but not least here, apply rate limits to your APIs. There are many flavors in the industry. It all depends on the size of your application. If you need something that is very quick and easy, you can use Spring AOP. There's also a great library, Bucket4j. If you need a more robust enterprise solution, Redis for the win, among other solutions out there as well. Do apply because, in case your API does have any kind of vulnerability, at least here, you're going to prevent some data mining. At least if you have some rate limit, you can control the damage that's done here. At least have one of the things. If you cannot have it all, at least try to apply a few validations, rate limit so you can decrease the size of the damage.

Testing

Testing. After all we've talked about, of course, we have to test all of this. It's not only our business logic. For testing, make sure that you are adding those exception edge cases as well to your testing. If you only care about percentage of code coverage, this is not going to add any code coverage to your reports, but at least you are testing if you have your validations in place. You know if your security checks are in place.

One of the things that really helps me, especially when I have to write this kind of data, you can use other data sources for this. You can have your invalid data into some sort of file, and load it. There are many ways of doing this. In case you're writing the data yourself, use AI to help you with this. You write two, three, and then the AI is going to pick it up and bind the test, all the rest of the data for you. This is a way to also improve that.

The AI Era

Again, we are in our AI era here, so make sure that you are taking advantage of that. If you are starting to work with projects with AI, because, of course, now it's AI, our companies are going to ask, can you just put an AI on that? Just make sure that we have an AI. In case you are working in one of those projects and you are handling prompt engineering, make sure to validate and sanitize that as well. This is a really cool comic. Make sure that you are validating and sanitizing your input. It doesn't matter the project, always validate and sanitize. Use AI as an ally here. It's a great IntelliSense tool. I really like to use as my best friend coding with me.

You're not sure how to write a unit test for a validation, just ask Copilot, CodeWhisperer, whatever tool that you are using, it can help you with that. In case you're using GitHub now, they're coming out with a lot of services. I really think that this is adding the security within the pipeline itself. Keep your dependencies up to date, that also helps a lot. Add some code scanning. For any security vulnerability, make sure that you're not exposing those passwords. It can also help a lot with that if you do have access to services like this. Of course, there are a lot of other services within the industry as well. It really depends what your company is using. There are great services out there that you can achieve a very similar result.

Education and Training

Of course, you're not going to go back tomorrow and say, team, I think we need to start incorporating a little bit more security within our code. This change does not happen from night to day or from day to night. It is a slow process. We need to mentor junior developers on this and the rest of our team as well. This is a work in progress, through many months. One of the things that I like to do with the folks that I work with is, whenever we're having demos of the product, I'll start asking questions. This is a really nice, cool feature, you're handling a file upload? Are you checking for the file name? Are you validating that? Or, if we have some RESTful API, what are you using for validation? Start asking questions.

Next time you're having those sessions again, ask the same questions again. Next time she's going to ask about that, let's just add it so when she asks, we've already done it. That's a different way of doing that. Provide feedback. Make sure that the requirements are part of your user stories, it is part of the requirements, so we can start to incorporate it as part of the development. One thing that I like to use as well is some security checklist whenever I'm doing code reviews. This is only a suggestion. These are some of the things that I find mostly in the code reviews that I do. Always be kind with the code reviews that you are doing. These are some of the things that I usually check whenever I'm doing code reviews. You can evolve from this. Adapt to something that works better for your team. Again, many flavors available out there.

Questions and Answers

Participant 1: Do you have any recommendations for libraries for file content validation?

Groner: It really depends on what kind of validation you are using. For example, for all the Word documents that I handle, all the spreadsheets that I handle, we usually do not allow macros, formulas, embedded objects. For the content itself, it really depends on the use case that you have. It can be something manual. You can use some OCR tool to help you to do that as well. It's really going to depend.

Participant 1: Since you mentioned Excel files. We do have a use case where users upload Excel files. I was just wondering if there are any off-the-shell libraries that we can use, or do we have to write custom code?

Groner: Depending on what you need, we usually write our own. We only validate for things that we do not allow. If you have a data table you're only trying to extract that data table, we're going to run all the validations on all the types that we have all over again, and validate all the business logic to make sure that that data is what we are expecting. That level of detail, it's usually that we usually write something. Depending on the use case, Google has services for that, and there are a few services out there that you can try to use to help you to go through that.

Participant 2: You're logging what we should expose, what we should return. In our team, we are having this double-edged sword in the sense that we don't want to return sensitive information, like expose our business logic, how we do our profile management. When we have issues escalated to our helpdesk or service centers, we can't find the exact errors by looking at Splunk, because our APIs don't return those important crumbs for us. How can we approach this better? Our architects suggested, for instance, maybe we should use error codes. Like, this is the error code 2 or 3. Have you encountered this issue before? What should we do?

Groner: There are a few different ways that you can approach this. One is you can definitely have your own dictionary of the error codes, as you mentioned, just to help you a little bit with the debugging process. The other way around it is, you can try to mask the data. It'll still be something that is meaningful and it's easier for you to consume, but not something that's going to be exposing any sensitive data. Because often when we're running into production issues, it can be something like a software defect where we have to fix, but it can also be data consistency issues as well.

Those cases are a little bit more difficult to do the debugging. If you have some masking that you still have, like the nature of the data itself, you can still go through that without actually having access to the database, or something like that. It will be one of the approaches that I would try to use. This is very specific. It really depends on the business case, but it helps a little bit. The other thing that you can do as well is some kind of vault. If you have that data, you have some token, and you can log the token that can help you to retrieve the data. That will be another approach as well.

Participant 3: Do you have any suggestions for any tool in the CI/CD pipeline to scan the code quality and check for security inside the code?

Groner: There are a few, like Snyk. There's Sonar. Depending on how you configure Sonar, you can try to catch those as well. Personally, we use a lot of checkmarks to do that, like checkmarks for code. There is still a team of InfoSec that is reviewing the checkmark, what it's flagging to review if it's a real issue or not. There is Black Duck for any kind of CVEs that we have out there for dependencies. There are other tools on the market, but these are some that we use internally, that's global to the organization. If you're using GitHub, GitHub now, they're rolling out a lot of features, and they have the code scanning. A lot of them are free to use if you're actually using GitHub, but a few of them, you still have to have the license of the product in order to be able to use.

Participant 4: You mentioned having validation at all levels. One of the things we've done is pulled out, like we don't have authentication at every level, we just handle that, not even in the service authorization, pull it up to the top level. For something like validation, we also have that at the top level, and not have underlying services or something that we handle just at the base level, like in a controller, so that we don't have to keep adding that in. Is there a difference, in your opinion, on like authorization and authentication versus validation, and why you do validation at every single level, like why that's different?

Groner: I think it's really going to depend on the team itself. You definitely can do a validation only on the controller level, if you want to keep your service layer a little bit more clean. I would definitely add that to the entity as well, because sometimes, we make mistakes. You're going to forget something in the controller level, so at least you have another layer protecting you. If your team has the discipline to always add those validations into the controller, and if that's working for you, that's great. You can continue doing that. It also really depends on the nature of the project. If you have your controller, and then you're calling your service, and maybe you're using microservices architecture, and you don't have multiple controllers, that works really well. If you're working in a monolithic application where you have thousands of controllers and then you have thousands of services.

Then, in one controller, you're making reference to 10 different other services, that becomes a little bit more complex, and you can actually make a mistake when you are trying to reuse that service in a different file. Then if you forget something, that is one of the reasons that I would say, to add in to all layers. It depends on the project. If that's working for you, that is great. For the validation and authorization itself, usually, this is only done on the highest layer, usually in the controller, if we're talking about Java, Spring, or something like that. That's usually where we handle. You don't necessarily need to handle the services in the service layer, unless you have a service that's calling another service. Then you need to have some kind of authorization and authentication, some mechanism in there as well, in case you are interfacing with a different service, like connectivity to a different web service, or what have you.

See more presentations with transcripts

Recorded at:

Oct 23, 2024

Loiane Groner

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?